LinuxLists.cc - [PATCH 2/2] f2fs: support data compression

Subject: Re: [PATCH 2/2] f2fs: support data compression

On Tue, Oct 22, 2019 at 10:16:02AM -0700, Jaegeuk Kim wrote:
> From: Chao Yu <[email protected]>
>
> This patch tries to support compression in f2fs.
>
> - New term named cluster is defined as basic unit of compression, file can
> be divided into multiple clusters logically. One cluster includes 4 << n
> (n >= 0) logical pages, compression size is also cluster size, each of
> cluster can be compressed or not.
>
> - In cluster metadata layout, one special flag is used to indicate cluster
> is compressed one or normal one, for compressed cluster, following metadata
> maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
> data including compress header and compressed data.
>
> - In order to eliminate write amplification during overwrite, F2FS only
> support compression on write-once file, data can be compressed only when
> all logical blocks in file are valid and cluster compress ratio is lower
> than specified threshold.
>
> - To enable compression on regular inode, there are three ways:
> * chattr +c file
> * chattr +c dir; touch dir/file
> * mount w/ -o compress_extension=ext; touch file.ext
>
> Compress metadata layout:
> [Dnode Structure]
> +-----------------------------------------------+
> | cluster 1 | cluster 2 | ......... | cluster N |
> +-----------------------------------------------+
> . . . .
> . . . .
> . Compressed Cluster . . Normal Cluster .
> +----------+---------+---------+---------+ +---------+---------+---------+---------+
> |compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
> +----------+---------+---------+---------+ +---------+---------+---------+---------+
> . .
> . .
> . .
> +-------------+-------------+----------+----------------------------+
> | data length | data chksum | reserved | compressed data |
> +-------------+-------------+----------+----------------------------+
>
> Changelog:
>
> 20190326:
> - fix error handling of read_end_io().
> - remove unneeded comments in f2fs_encrypt_one_page().
>
> 20190327:
> - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
> - don't jump into loop directly to avoid uninitialized variables.
> - add TODO tag in error path of f2fs_write_cache_pages().
>
> 20190328:
> - fix wrong merge condition in f2fs_read_multi_pages().
> - check compressed file in f2fs_post_read_required().
>
> 20190401
> - allow overwrite on non-compressed cluster.
> - check cluster meta before writing compressed data.
>
> 20190402
> - don't preallocate blocks for compressed file.
>
> - add lz4 compress algorithm
> - process multiple post read works in one workqueue
> Now f2fs supports processing post read work in multiple workqueue,
> it shows low performance due to schedule overhead of multiple
> workqueue executing orderly.
>
> - compress: support buffered overwrite
> C: compress cluster flag
> V: valid block address
> N: NEW_ADDR
>
> One cluster contain 4 blocks
>
> before overwrite after overwrite
>
> - VVVV -> CVNN
> - CVNN -> VVVV
>
> - CVNN -> CVNN
> - CVNN -> CVVV
>
> - CVVV -> CVNN
> - CVVV -> CVVV
>
> [Jaegeuk Kim]
> - add tracepoint for f2fs_{,de}compress_pages()
> - fix many bugs and add some compression stats
>
> Signed-off-by: Chao Yu <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>

How was this tested? Shouldn't there be a mount option analogous to
test_dummy_encryption that causes all files to be auto-compressed, so that a
full run of xfstests can be done with compression? I see "compress_extension",
but apparently it's only for a file extension? Also, since reads can involve
any combination of decryption, compression, and verity, it's important to test
as many combinations as possible, including all at once. Has that been done?

I also tried running the fs-verity xfstests on this with
'kvm-xfstests -c f2fs -g verity', but the kernel immediately crashes:

BUG: kernel NULL pointer dereference, address: 0000000000000182
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.0-rc1-00119-g60f351f4c50f #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191013_105130-anatol 04/01/2014
RIP: 0010:__queue_work+0x3e/0x5f0 kernel/workqueue.c:1409
Code: d4 53 48 83 ec 18 89 7d d4 8b 3d c1 bf 2a 01 85 ff 74 17 65 48 8b 04 25 80 5d 01 00 8b b0 0c 07 00 00 85 f6 0f 84 1
RSP: 0018:ffffc900000a8db0 EFLAGS: 00010046
RAX: ffff88807d94e340 RBX: 0000000000000246 RCX: 0000000000000000
RDX: ffff88807d9e0be8 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffffc900000a8df0 R08: 0000000000000000 R09: 0000000000000001
R10: ffff888075f2bc68 R11: 0000000000000000 R12: ffff88807d9e0be8
R13: 0000000000000000 R14: 0000000000000030 R15: ffff88807c2c6780
FS: 0000000000000000(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000182 CR3: 00000000757e3000 CR4: 00000000003406e0
Call Trace:
<IRQ>
queue_work_on+0x67/0x70 kernel/workqueue.c:1518
queue_work include/linux/workqueue.h:494 [inline]
f2fs_enqueue_post_read_work fs/f2fs/data.c:166 [inline]
bio_post_read_processing fs/f2fs/data.c:173 [inline]
f2fs_read_end_io+0xcb/0xe0 fs/f2fs/data.c:195
bio_endio+0xa4/0x1a0 block/bio.c:1818
req_bio_endio block/blk-core.c:242 [inline]
blk_update_request+0xf6/0x310 block/blk-core.c:1462
blk_mq_end_request+0x1c/0x130 block/blk-mq.c:568
virtblk_request_done+0x32/0x80 drivers/block/virtio_blk.c:226
blk_done_softirq+0x98/0xc0 block/blk-softirq.c:37
__do_softirq+0xc1/0x40d kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0xb3/0xc0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
do_IRQ+0x5b/0x110 arch/x86/kernel/irq.c:263
common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:607
</IRQ>
RIP: 0010:native_safe_halt arch/x86/include/asm/irqflags.h:60 [inline]
RIP: 0010:arch_safe_halt arch/x86/include/asm/irqflags.h:103 [inline]
RIP: 0010:default_idle+0x29/0x160 arch/x86/kernel/process.c:580
Code: 90 55 48 89 e5 41 55 41 54 65 44 8b 25 70 64 76 7e 53 0f 1f 44 00 00 e8 95 13 88 ff e9 07 00 00 00 0f 00 2d 8b c0 b
RSP: 0018:ffffc90000073e78 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdc
RAX: ffff88807d94e340 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000046 RSI: 0000000000000006 RDI: ffff88807d94e340
RBP: ffffc90000073e90 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: ffff88807d94e340 R14: 0000000000000000 R15: 0000000000000000
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
default_idle_call+0x1e/0x30 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x1e4/0x210 kernel/sched/idle.c:263
cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
start_secondary+0x151/0x1a0 arch/x86/kernel/smpboot.c:264
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
CR2: 0000000000000182
---[ end trace 86328090a3179142 ]---
RIP: 0010:__queue_work+0x3e/0x5f0 kernel/workqueue.c:1409
Code: d4 53 48 83 ec 18 89 7d d4 8b 3d c1 bf 2a 01 85 ff 74 17 65 48 8b 04 25 80 5d 01 00 8b b0 0c 07 00 00 85 f6 0f 84 1
RSP: 0018:ffffc900000a8db0 EFLAGS: 00010046
RAX: ffff88807d94e340 RBX: 0000000000000246 RCX: 0000000000000000
RDX: ffff88807d9e0be8 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffffc900000a8df0 R08: 0000000000000000 R09: 0000000000000001
R10: ffff888075f2bc68 R11: 0000000000000000 R12: ffff88807d9e0be8
R13: 0000000000000000 R14: 0000000000000030 R15: ffff88807c2c6780
FS: 0000000000000000(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000182 CR3: 00000000757e3000 CR4: 00000000003406e0
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Rebooting in 5 seconds..

2019-10-24 11:47:33

Subject: Re: [PATCH 2/2] f2fs: support data compression

On 10/22, Eric Biggers wrote:
> On Tue, Oct 22, 2019 at 10:16:02AM -0700, Jaegeuk Kim wrote:
> > From: Chao Yu <[email protected]>
> >
> > This patch tries to support compression in f2fs.
> >
> > - New term named cluster is defined as basic unit of compression, file can
> > be divided into multiple clusters logically. One cluster includes 4 << n
> > (n >= 0) logical pages, compression size is also cluster size, each of
> > cluster can be compressed or not.
> >
> > - In cluster metadata layout, one special flag is used to indicate cluster
> > is compressed one or normal one, for compressed cluster, following metadata
> > maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
> > data including compress header and compressed data.
> >
> > - In order to eliminate write amplification during overwrite, F2FS only
> > support compression on write-once file, data can be compressed only when
> > all logical blocks in file are valid and cluster compress ratio is lower
> > than specified threshold.
> >
> > - To enable compression on regular inode, there are three ways:
> > * chattr +c file
> > * chattr +c dir; touch dir/file
> > * mount w/ -o compress_extension=ext; touch file.ext
> >
> > Compress metadata layout:
> > [Dnode Structure]
> > +-----------------------------------------------+
> > | cluster 1 | cluster 2 | ......... | cluster N |
> > +-----------------------------------------------+
> > . . . .
> > . . . .
> > . Compressed Cluster . . Normal Cluster .
> > +----------+---------+---------+---------+ +---------+---------+---------+---------+
> > |compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
> > +----------+---------+---------+---------+ +---------+---------+---------+---------+
> > . .
> > . .
> > . .
> > +-------------+-------------+----------+----------------------------+
> > | data length | data chksum | reserved | compressed data |
> > +-------------+-------------+----------+----------------------------+
> >
> > Changelog:
> >
> > 20190326:
> > - fix error handling of read_end_io().
> > - remove unneeded comments in f2fs_encrypt_one_page().
> >
> > 20190327:
> > - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
> > - don't jump into loop directly to avoid uninitialized variables.
> > - add TODO tag in error path of f2fs_write_cache_pages().
> >
> > 20190328:
> > - fix wrong merge condition in f2fs_read_multi_pages().
> > - check compressed file in f2fs_post_read_required().
> >
> > 20190401
> > - allow overwrite on non-compressed cluster.
> > - check cluster meta before writing compressed data.
> >
> > 20190402
> > - don't preallocate blocks for compressed file.
> >
> > - add lz4 compress algorithm
> > - process multiple post read works in one workqueue
> > Now f2fs supports processing post read work in multiple workqueue,
> > it shows low performance due to schedule overhead of multiple
> > workqueue executing orderly.
> >
> > - compress: support buffered overwrite
> > C: compress cluster flag
> > V: valid block address
> > N: NEW_ADDR
> >
> > One cluster contain 4 blocks
> >
> > before overwrite after overwrite
> >
> > - VVVV -> CVNN
> > - CVNN -> VVVV
> >
> > - CVNN -> CVNN
> > - CVNN -> CVVV
> >
> > - CVVV -> CVNN
> > - CVVV -> CVVV
> >
> > [Jaegeuk Kim]
> > - add tracepoint for f2fs_{,de}compress_pages()
> > - fix many bugs and add some compression stats
> >
> > Signed-off-by: Chao Yu <[email protected]>
> > Signed-off-by: Jaegeuk Kim <[email protected]>
>
> How was this tested? Shouldn't there a mount option analogous to
> test_dummy_encryption that causes all files to be auto-compressed, so that a
> full run of xfstests can be done with compression? I see "compress_extension",
> but apparently it's only for a file extension? Also, since reads can involve
> any combination of decryption, compression, and verity, it's important to test
> as many combinations as possible, including all at once. Has that been done?

This patch should have been marked as an RFC; it still needs as much testing as
possible. I posted it quite early in order to get some reviews and feedback as well.

What I've done so far would look like:
- mkfs.f2fs -f -O encrypt -O quota -O compression -O extra_attr /dev/sdb1
- mount -t f2fs /dev/sdb1 /mnt/test
- mkdir /mnt/test/comp_dir
- f2fs_io setflags compression /mnt/test/comp_dir
- cd /mnt/test/comp_dir
- git clone kernel.git
- compile kernel
- or, fsstress on top of it

>
> I also tried running the fs-verity xfstests on this with
> 'kvm-xfstests -c f2fs -g verity', but the kernel immediately crashes:

I didn't check verity yet. I'll take a look at this soon.

>
> BUG: kernel NULL pointer dereference, address: 0000000000000182
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.0-rc1-00119-g60f351f4c50f #3
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191013_105130-anatol 04/01/2014
> RIP: 0010:__queue_work+0x3e/0x5f0 kernel/workqueue.c:1409
> Code: d4 53 48 83 ec 18 89 7d d4 8b 3d c1 bf 2a 01 85 ff 74 17 65 48 8b 04 25 80 5d 01 00 8b b0 0c 07 00 00 85 f6 0f 84 1
> RSP: 0018:ffffc900000a8db0 EFLAGS: 00010046
> RAX: ffff88807d94e340 RBX: 0000000000000246 RCX: 0000000000000000
> RDX: ffff88807d9e0be8 RSI: 0000000000000000 RDI: 0000000000000001
> RBP: ffffc900000a8df0 R08: 0000000000000000 R09: 0000000000000001
> R10: ffff888075f2bc68 R11: 0000000000000000 R12: ffff88807d9e0be8
> R13: 0000000000000000 R14: 0000000000000030 R15: ffff88807c2c6780
> FS: 0000000000000000(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000182 CR3: 00000000757e3000 CR4: 00000000003406e0
> Call Trace:
> <IRQ>
> queue_work_on+0x67/0x70 kernel/workqueue.c:1518
> queue_work include/linux/workqueue.h:494 [inline]
> f2fs_enqueue_post_read_work fs/f2fs/data.c:166 [inline]
> bio_post_read_processing fs/f2fs/data.c:173 [inline]
> f2fs_read_end_io+0xcb/0xe0 fs/f2fs/data.c:195
> bio_endio+0xa4/0x1a0 block/bio.c:1818
> req_bio_endio block/blk-core.c:242 [inline]
> blk_update_request+0xf6/0x310 block/blk-core.c:1462
> blk_mq_end_request+0x1c/0x130 block/blk-mq.c:568
> virtblk_request_done+0x32/0x80 drivers/block/virtio_blk.c:226
> blk_done_softirq+0x98/0xc0 block/blk-softirq.c:37
> __do_softirq+0xc1/0x40d kernel/softirq.c:292
> invoke_softirq kernel/softirq.c:373 [inline]
> irq_exit+0xb3/0xc0 kernel/softirq.c:413
> exiting_irq arch/x86/include/asm/apic.h:536 [inline]
> do_IRQ+0x5b/0x110 arch/x86/kernel/irq.c:263
> common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:607
> </IRQ>
> RIP: 0010:native_safe_halt arch/x86/include/asm/irqflags.h:60 [inline]
> RIP: 0010:arch_safe_halt arch/x86/include/asm/irqflags.h:103 [inline]
> RIP: 0010:default_idle+0x29/0x160 arch/x86/kernel/process.c:580
> Code: 90 55 48 89 e5 41 55 41 54 65 44 8b 25 70 64 76 7e 53 0f 1f 44 00 00 e8 95 13 88 ff e9 07 00 00 00 0f 00 2d 8b c0 b
> RSP: 0018:ffffc90000073e78 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdc
> RAX: ffff88807d94e340 RBX: 0000000000000001 RCX: 0000000000000000
> RDX: 0000000000000046 RSI: 0000000000000006 RDI: ffff88807d94e340
> RBP: ffffc90000073e90 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> R13: ffff88807d94e340 R14: 0000000000000000 R15: 0000000000000000
> arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
> default_idle_call+0x1e/0x30 kernel/sched/idle.c:94
> cpuidle_idle_call kernel/sched/idle.c:154 [inline]
> do_idle+0x1e4/0x210 kernel/sched/idle.c:263
> cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
> start_secondary+0x151/0x1a0 arch/x86/kernel/smpboot.c:264
> secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
> CR2: 0000000000000182
> ---[ end trace 86328090a3179142 ]---
> RIP: 0010:__queue_work+0x3e/0x5f0 kernel/workqueue.c:1409
> Code: d4 53 48 83 ec 18 89 7d d4 8b 3d c1 bf 2a 01 85 ff 74 17 65 48 8b 04 25 80 5d 01 00 8b b0 0c 07 00 00 85 f6 0f 84 1
> RSP: 0018:ffffc900000a8db0 EFLAGS: 00010046
> RAX: ffff88807d94e340 RBX: 0000000000000246 RCX: 0000000000000000
> RDX: ffff88807d9e0be8 RSI: 0000000000000000 RDI: 0000000000000001
> RBP: ffffc900000a8df0 R08: 0000000000000000 R09: 0000000000000001
> R10: ffff888075f2bc68 R11: 0000000000000000 R12: ffff88807d9e0be8
> R13: 0000000000000000 R14: 0000000000000030 R15: ffff88807c2c6780
> FS: 0000000000000000(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000182 CR3: 00000000757e3000 CR4: 00000000003406e0
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: disabled
> Rebooting in 5 seconds..

2019-10-25 19:34:33

Subject: Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

On 2019/10/23 13:24, Eric Biggers wrote:
> How was this tested? Shouldn't there a mount option analogous to

This should be treated as a pre-RFC version. I have only done simple tests on it
so far, and will do more later in combination with other features.

Thanks,

2019-10-28 11:46:02

Subject: Re: [PATCH 2/2] f2fs: support data compression

On Tue, Oct 22, 2019 at 10:16:02AM -0700, Jaegeuk Kim wrote:
> diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
> index 29020af0cff9..d1accf665c86 100644
> --- a/Documentation/filesystems/f2fs.txt
> +++ b/Documentation/filesystems/f2fs.txt
> @@ -235,6 +235,13 @@ checkpoint=%s[:%u[%]] Set to "disable" to turn off checkpointing. Set to "en
> hide up to all remaining free space. The actual space that
> would be unusable can be viewed at /sys/fs/f2fs/<disk>/unusable
> This space is reclaimed once checkpoint=enable.
> +compress_algorithm=%s Control compress algorithm, currently f2fs supports "lzo"
> + and "lz4" algorithm.
> +compress_log_size=%u Support configuring compress cluster size, the size will
> + be 4kb * (1 << %u), 16kb is minimum size, also it's
> + default size.

kb means kilobits, not kilobytes.
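
For reference, here is a minimal userspace sketch of the arithmetic the option
text describes, written with byte units. The 4 KB block size, the DEMO_* names
and the upper bound are assumptions made for illustration; only the
"4KB * (1 << n)" formula and the 16 KB minimum come from the quoted text.

```c
#include <stddef.h>

#define DEMO_BLOCK_BYTES  4096u /* assumed 4 KB block size */
#define DEMO_MIN_LOG_SIZE 2u    /* 4 KB << 2 = 16 KB, the documented minimum */
#define DEMO_MAX_LOG_SIZE 8u    /* hypothetical cap, only to keep the shift safe */

/* Cluster size in bytes for a given compress_log_size, or 0 if out of range. */
static size_t demo_cluster_bytes(unsigned int log_size)
{
	if (log_size < DEMO_MIN_LOG_SIZE || log_size > DEMO_MAX_LOG_SIZE)
		return 0;
	return (size_t)DEMO_BLOCK_BYTES << log_size;
}
```

With log_size = 2 this gives 16384 bytes, i.e. 4 pages, which appears to line up
with the "4 << n (n >= 0)" wording in the commit message for n = log_size - 2.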

> +compress_extension=%s Support adding specified extension, so that f2fs can
> + enable compression on those corresponding file.

What does "Support adding specified extension" mean? And does "so that f2fs can
enable compression on those corresponding file" mean that f2fs can't enable
compression on other files? Please be clear about what this option does.

>
> ================================================================================
> DEBUGFS ENTRIES
> @@ -837,3 +844,44 @@ zero or random data, which is useful to the below scenario where:
> 4. address = fibmap(fd, offset)
> 5. open(blkdev)
> 6. write(blkdev, address)
> +
> +Compression implementation
> +--------------------------
> +
> +- New term named cluster is defined as basic unit of compression, file can
> +be divided into multiple clusters logically. One cluster includes 4 << n
> +(n >= 0) logical pages, compression size is also cluster size, each of
> +cluster can be compressed or not.
> +
> +- In cluster metadata layout, one special flag is used to indicate cluster
> +is compressed one or normal one, for compressed cluster, following metadata
> +maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
> +data including compress header and compressed data.

In the code it's actually a special block address, not a "flag".

> +
> +- In order to eliminate write amplification during overwrite, F2FS only
> +support compression on write-once file, data can be compressed only when
> +all logical blocks in file are valid and cluster compress ratio is lower
> +than specified threshold.
> +
> +- To enable compression on regular inode, there are three ways:
> +* chattr +c file
> +* chattr +c dir; touch dir/file
> +* mount w/ -o compress_extension=ext; touch file.ext
> +
> +Compress metadata layout:
> + [Dnode Structure]
> + +-----------------------------------------------+
> + | cluster 1 | cluster 2 | ......... | cluster N |
> + +-----------------------------------------------+
> + . . . .
> + . . . .
> + . Compressed Cluster . . Normal Cluster .
> ++----------+---------+---------+---------+ +---------+---------+---------+---------+
> +|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
> ++----------+---------+---------+---------+ +---------+---------+---------+---------+
> + . .
> + . .
> + . .
> + +-------------+-------------+----------+----------------------------+
> + | data length | data chksum | reserved | compressed data |
> + +-------------+-------------+----------+----------------------------+
> diff --git a/fs/f2fs/Kconfig b/fs/f2fs/Kconfig
> index 652fd2e2b23d..c12854c3b1a1 100644
> --- a/fs/f2fs/Kconfig
> +++ b/fs/f2fs/Kconfig
> @@ -6,6 +6,10 @@ config F2FS_FS
> select CRYPTO
> select CRYPTO_CRC32
> select F2FS_FS_XATTR if FS_ENCRYPTION
> + select LZO_COMPRESS
> + select LZO_DECOMPRESS
> + select LZ4_COMPRESS
> + select LZ4_DECOMPRESS

As someone else suggested, there's not much need to support LZO, since LZ4 is
usually better. Also, compression support should be behind a kconfig option, so
it doesn't cause bloat or extra attack surface for people who don't want it.

> diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> new file mode 100644
> index 000000000000..f276d82a67aa
> --- /dev/null
> +++ b/fs/f2fs/compress.c
> @@ -0,0 +1,1066 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * f2fs compress support
> + *
> + * Copyright (c) 2019 Chao Yu <[email protected]>
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/f2fs_fs.h>
> +#include <linux/writeback.h>
> +#include <linux/lzo.h>
> +#include <linux/lz4.h>
> +
> +#include "f2fs.h"
> +#include "node.h"
> +#include <trace/events/f2fs.h>
> +
> +struct f2fs_compress_ops {
> + int (*init_compress_ctx)(struct compress_ctx *cc);
> + void (*destroy_compress_ctx)(struct compress_ctx *cc);
> + int (*compress_pages)(struct compress_ctx *cc);
> + int (*decompress_pages)(struct decompress_io_ctx *dic);
> +};
> +
> +static unsigned int offset_in_cluster(struct compress_ctx *cc, pgoff_t index)
> +{
> + return index % cc->cluster_size;
> +}
> +
> +static unsigned int cluster_idx(struct compress_ctx *cc, pgoff_t index)
> +{
> + return index / cc->cluster_size;
> +}

% and / are slow on values that aren't power-of-2 constants. Since cluster_size
is always a power of 2, how about also keeping cluster_size_bits and doing:

index & (cc->cluster_size - 1)

and
index >> cc->cluster_size_bits

> +
> +static unsigned int start_idx_of_cluster(struct compress_ctx *cc)
> +{
> + return cc->cluster_idx * cc->cluster_size;
> +}

and here:

cc->cluster_idx << cc->cluster_size_bits
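
Put together, the three helpers could look roughly like the userspace sketch
below. struct demo_geom is a hypothetical stand-in for the relevant fields of
struct compress_ctx, and cluster_size_bits is the proposed new field, not
something the patch currently has.

```c
#include <stdint.h>

/* Hypothetical stand-in for the geometry fields of struct compress_ctx. */
struct demo_geom {
	unsigned int cluster_size;      /* pages per cluster, always a power of 2 */
	unsigned int cluster_size_bits; /* log2(cluster_size) */
};

/* Same result as index % cluster_size when cluster_size is a power of 2. */
static unsigned int demo_offset_in_cluster(const struct demo_geom *g,
					   uint64_t index)
{
	return (unsigned int)(index & (g->cluster_size - 1));
}

/* Same result as index / cluster_size. */
static uint64_t demo_cluster_idx(const struct demo_geom *g, uint64_t index)
{
	return index >> g->cluster_size_bits;
}

/* Same result as cluster_idx * cluster_size. */
static uint64_t demo_start_idx_of_cluster(const struct demo_geom *g,
					  uint64_t cluster_idx)
{
	return cluster_idx << g->cluster_size_bits;
}
```

The equivalence only holds as long as cluster_size really is a power of 2 and
cluster_size_bits is kept in sync with it.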

> +bool f2fs_is_compressed_page(struct page *page)
> +{
> + if (!page_private(page))
> + return false;
> + if (IS_ATOMIC_WRITTEN_PAGE(page) || IS_DUMMY_WRITTEN_PAGE(page))
> + return false;
> + return *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC;
> +}

This code implies that there can be multiple page private structures each of
which has a different magic number. But I only see F2FS_COMPRESSED_PAGE_MAGIC.
Where in the code are the other ones?

> +
> +static void f2fs_set_compressed_page(struct page *page,
> + struct inode *inode, pgoff_t index, void *data, refcount_t *r)
> +{
> + SetPagePrivate(page);
> + set_page_private(page, (unsigned long)data);
> +
> + /* i_crypto_info and iv index */
> + page->index = index;
> + page->mapping = inode->i_mapping;
> + if (r)
> + refcount_inc(r);
> +}

It isn't really appropriate to create fake pagecache pages like this. Did you
consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?

> +
> +static void f2fs_put_compressed_page(struct page *page)
> +{
> + set_page_private(page, (unsigned long)NULL);
> + ClearPagePrivate(page);
> + page->mapping = NULL;
> + unlock_page(page);
> + put_page(page);
> +}
> +
> +struct page *f2fs_compress_control_page(struct page *page)
> +{
> + return ((struct compress_io_ctx *)page_private(page))->rpages[0];
> +}
> +
> +int f2fs_init_compress_ctx(struct compress_ctx *cc)
> +{
> + struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode);
> +
> + if (cc->rpages)
> + return 0;
> + cc->rpages = f2fs_kzalloc(sbi, sizeof(struct page *) * cc->cluster_size,
> + GFP_KERNEL);
> + if (!cc->rpages)
> + return -ENOMEM;
> + return 0;
> +}

Is it really okay to be using GFP_KERNEL from ->writepages()?

> +
> +void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
> +{
> + kvfree(cc->rpages);
> +}

The memory is allocated with kzalloc(), so why is it freed with kvfree() and not
just kfree()?

> +
> +int f2fs_compress_ctx_add_page(struct compress_ctx *cc, struct page *page)
> +{
> + unsigned int cluster_ofs;
> +
> + if (!f2fs_cluster_can_merge_page(cc, page->index))
> + return -EAGAIN;

All callers do f2fs_bug_on() if this error is hit, so why not do the
f2fs_bug_on() here instead?

> +
> + cluster_ofs = offset_in_cluster(cc, page->index);
> + cc->rpages[cluster_ofs] = page;
> + cc->nr_rpages++;
> + cc->cluster_idx = cluster_idx(cc, page->index);
> + return 0;
> +}
> +
> +static int lzo_init_compress_ctx(struct compress_ctx *cc)
> +{
> + cc->private = f2fs_kvmalloc(F2FS_I_SB(cc->inode),
> + LZO1X_MEM_COMPRESS, GFP_KERNEL);
> + if (!cc->private)
> + return -ENOMEM;
> +
> + cc->clen = lzo1x_worst_compress(PAGE_SIZE * cc->cluster_size);
> + return 0;
> +}
> +
> +static void lzo_destroy_compress_ctx(struct compress_ctx *cc)
> +{
> + kvfree(cc->private);
> + cc->private = NULL;
> +}
> +
> +static int lzo_compress_pages(struct compress_ctx *cc)
> +{
> + int ret;
> +
> + ret = lzo1x_1_compress(cc->rbuf, cc->rlen, cc->cbuf->cdata,
> + &cc->clen, cc->private);
> + if (ret != LZO_E_OK) {
> + printk_ratelimited("%sF2FS-fs: lzo compress failed, ret:%d\n",
> + KERN_ERR, ret);
> + return -EIO;
> + }
> + return 0;
> +}

Why not use f2fs_err()? The same applies in lots of other places.

> +
> +static int lzo_decompress_pages(struct decompress_io_ctx *dic)
> +{
> + int ret;
> +
> + ret = lzo1x_decompress_safe(dic->cbuf->cdata, dic->clen,
> + dic->rbuf, &dic->rlen);
> + if (ret != LZO_E_OK) {
> + printk_ratelimited("%sF2FS-fs: lzo decompress failed, ret:%d\n",
> + KERN_ERR, ret);
> + return -EIO;
> + }
> +
> + if (dic->rlen != PAGE_SIZE * dic->cluster_size) {
> + printk_ratelimited("%sF2FS-fs: lzo invalid rlen:%zu, "
> + "expected:%lu\n", KERN_ERR, dic->rlen,
> + PAGE_SIZE * dic->cluster_size);
> + return -EIO;
> + }
> + return 0;
> +}
> +
> +static const struct f2fs_compress_ops f2fs_lzo_ops = {
> + .init_compress_ctx = lzo_init_compress_ctx,
> + .destroy_compress_ctx = lzo_destroy_compress_ctx,
> + .compress_pages = lzo_compress_pages,
> + .decompress_pages = lzo_decompress_pages,
> +};
> +
> +static int lz4_init_compress_ctx(struct compress_ctx *cc)
> +{
> + cc->private = f2fs_kvmalloc(F2FS_I_SB(cc->inode),
> + LZO1X_MEM_COMPRESS, GFP_KERNEL);

Why is it using LZO1X_MEM_COMPRESS for LZ4?

> + if (!cc->private)
> + return -ENOMEM;
> +
> + cc->clen = LZ4_compressBound(PAGE_SIZE * cc->cluster_size);
> + return 0;
> +}
> +
> +static void lz4_destroy_compress_ctx(struct compress_ctx *cc)
> +{
> + kvfree(cc->private);
> + cc->private = NULL;
> +}
> +
> +static int lz4_compress_pages(struct compress_ctx *cc)
> +{
> + int len;
> +
> + len = LZ4_compress_default(cc->rbuf, cc->cbuf->cdata, cc->rlen,
> + cc->clen, cc->private);
> + if (!len) {
> + printk_ratelimited("%sF2FS-fs: lz4 compress failed\n",
> + KERN_ERR);
> + return -EIO;
> + }
> + cc->clen = len;
> + return 0;
> +}
> +
> +static int lz4_decompress_pages(struct decompress_io_ctx *dic)
> +{
> + int ret;
> +
> + ret = LZ4_decompress_safe(dic->cbuf->cdata, dic->rbuf,
> + dic->clen, dic->rlen);
> + if (ret < 0) {
> + printk_ratelimited("%sF2FS-fs: lz4 decompress failed, ret:%d\n",
> + KERN_ERR, ret);
> + return -EIO;
> + }
> +
> + if (ret != PAGE_SIZE * dic->cluster_size) {
> + printk_ratelimited("%sF2FS-fs: lz4 invalid rlen:%zu, "
> + "expected:%lu\n", KERN_ERR, dic->rlen,
> + PAGE_SIZE * dic->cluster_size);
> + return -EIO;
> + }
> + return 0;
> +}
> +
> +static const struct f2fs_compress_ops f2fs_lz4_ops = {
> + .init_compress_ctx = lz4_init_compress_ctx,
> + .destroy_compress_ctx = lz4_destroy_compress_ctx,
> + .compress_pages = lz4_compress_pages,
> + .decompress_pages = lz4_decompress_pages,
> +};
> +
> +static void f2fs_release_cluster_pages(struct compress_ctx *cc)
> +{
> + int i;
> +
> + for (i = 0; i < cc->nr_rpages; i++) {
> + inode_dec_dirty_pages(cc->inode);
> + unlock_page(cc->rpages[i]);
> + }
> +}
> +
> +static struct page *f2fs_grab_page(void)
> +{
> + struct page *page;
> +
> + page = alloc_pages(GFP_KERNEL, 0);

This should use alloc_page(), not alloc_pages().

> + if (!page)
> + return NULL;
> + lock_page(page);
> + return page;
> +}
> +
> +static int f2fs_compress_pages(struct compress_ctx *cc)
> +{
> + struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode);
> + struct f2fs_inode_info *fi = F2FS_I(cc->inode);
> + const struct f2fs_compress_ops *cops =
> + sbi->cops[fi->i_compress_algorithm];
> + unsigned int max_len, nr_cpages;
> + int i, ret;
> +
> + trace_f2fs_compress_pages_start(cc->inode, cc->cluster_idx,
> + cc->cluster_size, fi->i_compress_algorithm);
> +
> + ret = cops->init_compress_ctx(cc);
> + if (ret)
> + goto out;
> +
> + max_len = COMPRESS_HEADER_SIZE + cc->clen;
> + cc->nr_cpages = roundup(max_len, PAGE_SIZE) / PAGE_SIZE;
> +
> + cc->cpages = f2fs_kzalloc(sbi, sizeof(struct page *) *
> + cc->nr_cpages, GFP_KERNEL);
> + if (!cc->cpages) {
> + ret = -ENOMEM;
> + goto destroy_compress_ctx;
> + }
> +
> + for (i = 0; i < cc->nr_cpages; i++) {
> + cc->cpages[i] = f2fs_grab_page();
> + if (!cc->cpages[i]) {
> + ret = -ENOMEM;
> + goto out_free_cpages;
> + }

If this fails, then at out_free_cpages it will dereference a NULL pointer in
cc->cpages[i].

> + }
> +
> + cc->rbuf = vmap(cc->rpages, cc->cluster_size, VM_MAP, PAGE_KERNEL);
> + if (!cc->rbuf) {
> + ret = -ENOMEM;
> + goto destroy_compress_ctx;
> + }

Wrong error label. Should be out_free_cpages.
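
To illustrate both points about the unwind path, here is a minimal userspace
sketch, with malloc()/free() standing in for the page and vmap() helpers. All
names (demo_ctx, demo_ctx_setup, the fail_at knob) are hypothetical. The idea
is that the cleanup loop tolerates slots that were never filled, and that
every failure after the page loop jumps to the label that releases the pages.

```c
#include <stdlib.h>

struct demo_ctx {
	void **pages;    /* array of nr_pages buffers */
	size_t nr_pages;
	void *scratch;   /* stands in for the vmap()ed buffer */
};

/*
 * Returns 0 on success, -1 on failure with everything released.
 * fail_at lets a caller simulate an allocation failure at that slot.
 */
static int demo_ctx_setup(struct demo_ctx *c, size_t nr_pages, size_t fail_at)
{
	size_t i;

	c->scratch = NULL;
	c->pages = calloc(nr_pages, sizeof(*c->pages));
	if (!c->pages)
		return -1;
	c->nr_pages = nr_pages;

	for (i = 0; i < nr_pages; i++) {
		c->pages[i] = (i == fail_at) ? NULL : malloc(64);
		if (!c->pages[i])
			goto out_free_pages;
	}

	c->scratch = malloc(128);
	if (!c->scratch)
		goto out_free_pages; /* pages exist by now, so unwind them too */
	return 0;

out_free_pages:
	/* calloc() zeroed the array, so unfilled slots are NULL and skipped */
	for (i = 0; i < nr_pages; i++) {
		if (c->pages[i])
			free(c->pages[i]);
	}
	free(c->pages);
	c->pages = NULL;
	c->nr_pages = 0;
	return -1;
}

static void demo_ctx_teardown(struct demo_ctx *c)
{
	size_t i;

	free(c->scratch);
	c->scratch = NULL;
	for (i = 0; i < c->nr_pages; i++)
		free(c->pages[i]);
	free(c->pages);
	c->pages = NULL;
	c->nr_pages = 0;
}
```

In the patch itself the equivalent would presumably be a NULL check (or a loop
bounded by the number of pages actually grabbed) before calling
f2fs_put_compressed_page().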

> +
> + cc->cbuf = vmap(cc->cpages, cc->nr_cpages, VM_MAP, PAGE_KERNEL);
> + if (!cc->cbuf) {
> + ret = -ENOMEM;
> + goto out_vunmap_rbuf;
> + }

It would be sufficient to map these pages read-only, i.e. use PAGE_KERNEL_RO.

> +
> + ret = cops->compress_pages(cc);
> + if (ret)
> + goto out_vunmap_cbuf;
> +
> + max_len = PAGE_SIZE * (cc->cluster_size - 1) - COMPRESS_HEADER_SIZE;
> +
> + if (cc->clen > max_len) {
> + ret = -EAGAIN;
> + goto out_vunmap_cbuf;
> + }

Since we already know the max length we're willing to compress to (the max
length for any space to be saved), why is more space than that being allocated?
LZ4_compress_default() will return an error if there isn't enough space, so that
error could just be used as the indication to store the data uncompressed.
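
As a rough sketch of that idea: size the destination to the largest output
that would still save space, and treat "did not fit" as "store uncompressed".
A toy run-length encoder stands in for LZ4 here so the example stays
self-contained; the function names are hypothetical, and the budget would
correspond to PAGE_SIZE * (cluster_size - 1) - COMPRESS_HEADER_SIZE in the
patch.

```c
#include <stddef.h>

/*
 * Toy run-length encoder standing in for LZ4: emits (count, byte) pairs.
 * Returns the number of bytes written, or 0 if dst_cap is too small
 * (an empty input also yields 0).
 */
static size_t demo_rle_compress(const unsigned char *src, size_t src_len,
				unsigned char *dst, size_t dst_cap)
{
	size_t in = 0, out = 0;

	while (in < src_len) {
		unsigned char b = src[in];
		size_t run = 1;

		while (in + run < src_len && src[in + run] == b && run < 255)
			run++;
		if (dst_cap - out < 2)
			return 0; /* output would exceed the budget */
		dst[out++] = (unsigned char)run;
		dst[out++] = b;
		in += run;
	}
	return out;
}

/*
 * Returns 1 and sets *clen if the data fit into 'budget' bytes,
 * 0 if the caller should store the cluster uncompressed.
 */
static int demo_try_compress(const unsigned char *src, size_t src_len,
			     unsigned char *dst, size_t budget, size_t *clen)
{
	size_t n = demo_rle_compress(src, src_len, dst, budget);

	if (n == 0)
		return 0;
	*clen = n;
	return 1;
}
```

With this shape there is no need to allocate a worst-case output buffer or to
compare clen against max_len afterwards; the compressor's own bounds check does
both jobs.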

> +
> + cc->cbuf->clen = cpu_to_le32(cc->clen);
> + cc->cbuf->chksum = 0;

What is the point of the chksum field? It's always set to 0 and never checked.

> +
> + vunmap(cc->cbuf);
> + vunmap(cc->rbuf);
> +
> + nr_cpages = roundup(cc->clen + COMPRESS_HEADER_SIZE, PAGE_SIZE) /
> + PAGE_SIZE;
> +
> + for (i = nr_cpages; i < cc->nr_cpages; i++) {
> + f2fs_put_compressed_page(cc->cpages[i]);
> + cc->cpages[i] = NULL;
> + }
> +
> + cc->nr_cpages = nr_cpages;
> +
> + trace_f2fs_compress_pages_end(cc->inode, cc->cluster_idx,
> + cc->clen, ret);
> + return 0;
> +out_vunmap_cbuf:
> + vunmap(cc->cbuf);
> +out_vunmap_rbuf:
> + vunmap(cc->rbuf);
> +out_free_cpages:
> + for (i = 0; i < cc->nr_cpages; i++)
> + f2fs_put_compressed_page(cc->cpages[i]);
> + kvfree(cc->cpages);
> + cc->cpages = NULL;
> +destroy_compress_ctx:
> + cops->destroy_compress_ctx(cc);
> +out:
> + trace_f2fs_compress_pages_end(cc->inode, cc->cluster_idx,
> + cc->clen, ret);
> + return ret;
> +}
> +
> +void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity)
> +{
> + struct decompress_io_ctx *dic =
> + (struct decompress_io_ctx *)page_private(page);
> + struct f2fs_sb_info *sbi = F2FS_I_SB(dic->inode);
> + struct f2fs_inode_info *fi= F2FS_I(dic->inode);
> + const struct f2fs_compress_ops *cops =
> + sbi->cops[fi->i_compress_algorithm];

Where is it checked that i_compress_algorithm is a valid compression algorithm?

> + int ret;
> +
> + dec_page_count(sbi, F2FS_RD_DATA);
> +
> + if (bio->bi_status)
> + dic->err = true;
> +
> + if (refcount_dec_not_one(&dic->ref))
> + return;
> +
> + trace_f2fs_decompress_pages_start(dic->inode, dic->cluster_idx,
> + dic->cluster_size, fi->i_compress_algorithm);
> +
> + /* submit partial compressed pages */
> + if (dic->err) {
> + ret = dic->err;

This sets 'ret' to a bool, whereas elsewhere it's set to a negative error value.
Which one is it?

> + goto out_free_dic;
> + }
> +
> + dic->rbuf = vmap(dic->tpages, dic->cluster_size, VM_MAP, PAGE_KERNEL);
> + if (!dic->rbuf) {
> + ret = -ENOMEM;
> + goto out_free_dic;
> + }
> +
> + dic->cbuf = vmap(dic->cpages, dic->nr_cpages, VM_MAP, PAGE_KERNEL);
> + if (!dic->cbuf) {
> + ret = -ENOMEM;
> + goto out_vunmap_rbuf;
> + }

It would be sufficient to map the source pages read-only.

> +
> + dic->clen = le32_to_cpu(dic->cbuf->clen);
> + dic->rlen = PAGE_SIZE * dic->cluster_size;

Shouldn't it also be verified that the reserved header fields are 0?
Otherwise, it may be difficult to use them for anything in the future.

> +
> + if (dic->clen > PAGE_SIZE * dic->nr_cpages - COMPRESS_HEADER_SIZE) {
> + ret = -EFAULT;
> + goto out_vunmap_cbuf;
> + }

EFAULT isn't an appropriate error code for corrupt on-disk data. It should be
EFSCORRUPTED.
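
A hedged userspace sketch of the two header checks raised above (reserved
fields must be zero, and a bad length is reported as corruption rather than
EFAULT). The struct mirrors the quoted struct compress_data but uses
CPU-order fields; validate_compress_header() and its parameters are
hypothetical names. EFSCORRUPTED is defined here as EUCLEAN, following the
convention used by several Linux filesystems.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117             /* Linux value, for non-Linux builds */
#endif
#define EFSCORRUPTED EUCLEAN    /* "filesystem is corrupted" */

/* Header fields assumed to be already converted to CPU byte order. */
struct compress_header {
    uint32_t clen;
    uint32_t chksum;
    uint32_t reserved[4];
};

/*
 * Returns 0 if the header is plausible for a cluster stored in nr_cpages
 * pages of page_size bytes, or -EFSCORRUPTED if the on-disk data is bad.
 */
static int validate_compress_header(const struct compress_header *hdr,
                                    size_t nr_cpages, size_t page_size)
{
    size_t capacity = page_size * nr_cpages;
    size_t i;

    /* The compressed payload must fit after the header. */
    if (capacity < sizeof(*hdr) || hdr->clen > capacity - sizeof(*hdr))
        return -EFSCORRUPTED;

    /* Reject non-zero reserved fields so they can be given meaning later. */
    for (i = 0; i < 4; i++) {
        if (hdr->reserved[i] != 0)
            return -EFSCORRUPTED;
    }
    return 0;
}
```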

> +
> + ret = cops->decompress_pages(dic);
> +
> +out_vunmap_cbuf:
> + vunmap(dic->cbuf);
> +out_vunmap_rbuf:
> + vunmap(dic->rbuf);
> +out_free_dic:
> + f2fs_set_cluster_uptodate(dic->rpages, dic->cluster_size, ret, verity);

This is passing a -errno value to a function that takes a bool.

> + f2fs_free_dic(dic);
> +
> + trace_f2fs_decompress_pages_end(dic->inode, dic->cluster_idx,
> + dic->clen, ret);

This is freeing 'dic' and then immediately using it again...
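
One way to avoid the use-after-free is simply to emit the trace before
releasing the context. The following is a small userspace sketch under that
assumption; struct dic_sketch, trace_decompress_end() and
finish_decompress() are hypothetical stand-ins, not the kernel's types or
tracepoints.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct decompress_io_ctx. */
struct dic_sketch {
    unsigned int cluster_idx;
    size_t clen;
};

/* Hypothetical stand-in for a tracepoint: it records what it was given. */
struct trace_record {
    unsigned int cluster_idx;
    size_t clen;
    int ret;
};

static struct trace_record last_trace;

static void trace_decompress_end(const struct dic_sketch *dic, int ret)
{
    last_trace.cluster_idx = dic->cluster_idx;
    last_trace.clen = dic->clen;
    last_trace.ret = ret;
}

/* Read everything needed from 'dic' first, then free it. */
static int finish_decompress(struct dic_sketch *dic, int ret)
{
    trace_decompress_end(dic, ret);     /* last use of dic */
    free(dic);                          /* dic must not be touched after this */
    return ret;
}
```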

> +
> +static bool is_page_in_cluster(struct compress_ctx *cc, pgoff_t index)
> +{
> + if (cc->cluster_idx == NULL_CLUSTER)
> + return true;
> + return cc->cluster_idx == cluster_idx(cc, index);
> +}
> +
> +bool f2fs_cluster_is_empty(struct compress_ctx *cc)
> +{
> + return cc->nr_rpages == 0;
> +}
> +
> +static bool f2fs_cluster_is_full(struct compress_ctx *cc)
> +{
> + return cc->cluster_size == cc->nr_rpages;
> +}
> +
> +bool f2fs_cluster_can_merge_page(struct compress_ctx *cc, pgoff_t index)
> +{
> + if (f2fs_cluster_is_empty(cc))
> + return true;
> + if (f2fs_cluster_is_full(cc))
> + return false;
> + return is_page_in_cluster(cc, index);
> +}

Why is the f2fs_cluster_is_full() check needed in f2fs_cluster_can_merge_page()?
If all pages of the cluster have already been added, then the next one can't be
in the same cluster.

> +
> +static bool __cluster_may_compress(struct compress_ctx *cc)
> +{
> + struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode);
> + loff_t i_size = i_size_read(cc->inode);
> + const pgoff_t end_index = ((unsigned long long)i_size)
> + >> PAGE_SHIFT;
> + unsigned offset;
> + int i;
> +
> + for (i = 0; i < cc->cluster_size; i++) {
> + struct page *page = cc->rpages[i];
> +
> + f2fs_bug_on(sbi, !page);
> +
> + if (unlikely(f2fs_cp_error(sbi)))
> + return false;
> + if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> + return false;
> + if (f2fs_is_drop_cache(cc->inode))
> + return false;
> + if (f2fs_is_volatile_file(cc->inode))
> + return false;
> +
> + offset = i_size & (PAGE_SIZE - 1);
> + if ((page->index > end_index) ||
> + (page->index == end_index && !offset))
> + return false;

No need to have a special case for when i_size is a multiple of the page size.
Just replace end_index with 'nr_pages = DIV_ROUND_UP(i_size, PAGE_SIZE)' and
check for page->index >= nr_pages.
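
The equivalence can be checked with a small userspace sketch. It assumes a
4096-byte page; the SKETCH_* macros and both function names are invented for
illustration, and DIV_ROUND_UP is redefined locally with the usual kernel
meaning.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* The check as written in the patch, with its special case. */
static bool beyond_eof_original(uint64_t index, uint64_t i_size)
{
    uint64_t end_index = i_size >> SKETCH_PAGE_SHIFT;
    uint64_t offset = i_size & (SKETCH_PAGE_SIZE - 1);

    return index > end_index || (index == end_index && !offset);
}

/* The suggested form: count the pages, then compare once. */
static bool beyond_eof_simplified(uint64_t index, uint64_t i_size)
{
    uint64_t nr_pages = SKETCH_DIV_ROUND_UP(i_size, SKETCH_PAGE_SIZE);

    return index >= nr_pages;
}
```

For example, with i_size == 4096 both forms treat page index 1 as beyond
EOF, and with i_size == 4097 both treat it as valid.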

> + }
> + return true;
> +}
> +
> +int f2fs_is_cluster_existed(struct compress_ctx *cc)
> +{

This function name doesn't make sense. "is" is present tense whereas "existed"
is past tense. Also, the name implies it returns a bool, whereas actually it
returns a negative errno value, 1, or 2.

> +out_fail:
> + /* TODO: revoke partially updated block addresses */
> + for (i += 1; i < cc->cluster_size; i++) {
> + if (!cc->rpages[i])
> + continue;
> + redirty_page_for_writepage(wbc, cc->rpages[i]);
> + unlock_page(cc->rpages[i]);
> + }
> + return err;

Un-addressed TODO.

> +static void f2fs_init_compress_ops(struct f2fs_sb_info *sbi)
> +{
> + sbi->cops[COMPRESS_LZO] = &f2fs_lzo_ops;
> + sbi->cops[COMPRESS_LZ4] = &f2fs_lz4_ops;
> +}

Why are the compression operations a per-superblock thing? Seems this should be
a global table.
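
A rough sketch of what a global table could look like, combined with the
bounds check asked about earlier for i_compress_algorithm. This is userspace
C with placeholder contents: struct compress_ops here holds only a name, and
get_compress_ops() is a hypothetical helper, not an existing f2fs function.

```c
#include <stddef.h>

enum compress_algorithm_type {
    COMPRESS_LZO,
    COMPRESS_LZ4,
    COMPRESS_MAX,
};

/* Placeholder for the real ops structure (function pointers omitted). */
struct compress_ops {
    const char *name;
};

static const struct compress_ops lzo_ops = { .name = "lzo" };
static const struct compress_ops lz4_ops = { .name = "lz4" };

/* One read-only table shared by every mounted filesystem instance. */
static const struct compress_ops *const compress_ops_table[COMPRESS_MAX] = {
    [COMPRESS_LZO] = &lzo_ops,
    [COMPRESS_LZ4] = &lz4_ops,
};

/*
 * Look up the ops for an algorithm number read from disk. Returns NULL for
 * an out-of-range value so the caller can reject the inode instead of
 * indexing past the end of the table.
 */
static const struct compress_ops *get_compress_ops(unsigned int algorithm)
{
    if (algorithm >= COMPRESS_MAX)
        return NULL;
    return compress_ops_table[algorithm];
}
```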

> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index ba3bcf4c7889..bac96c3a8bc9 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -41,6 +41,9 @@ static bool __is_cp_guaranteed(struct page *page)
> if (!mapping)
> return false;
>
> + if (f2fs_is_compressed_page(page))
> + return false;
> +
> inode = mapping->host;
> sbi = F2FS_I_SB(inode);
>
> @@ -73,19 +76,19 @@ static enum count_type __read_io_type(struct page *page)
>
> /* postprocessing steps for read bios */
> enum bio_post_read_step {
> - STEP_INITIAL = 0,
> STEP_DECRYPT,
> + STEP_DECOMPRESS,
> STEP_VERITY,
> };
>
> struct bio_post_read_ctx {
> struct bio *bio;
> + struct f2fs_sb_info *sbi;
> struct work_struct work;
> - unsigned int cur_step;
> unsigned int enabled_steps;
> };
>
> -static void __read_end_io(struct bio *bio)
> +static void __read_end_io(struct bio *bio, bool compr, bool verity)
> {
> struct page *page;
> struct bio_vec *bv;
> @@ -94,6 +97,11 @@ static void __read_end_io(struct bio *bio)
> bio_for_each_segment_all(bv, bio, iter_all) {
> page = bv->bv_page;
>
> + if (compr && PagePrivate(page)) {
> + f2fs_decompress_pages(bio, page, verity);
> + continue;
> + }
> +
> /* PG_error was set if any post_read step failed */
> if (bio->bi_status || PageError(page)) {
> ClearPageUptodate(page);
> @@ -110,60 +118,67 @@ static void __read_end_io(struct bio *bio)
> bio_put(bio);
> }
>
> +static void f2fs_decompress_bio(struct bio *bio, bool verity)
> +{
> + __read_end_io(bio, true, verity);
> +}
> +
> static void bio_post_read_processing(struct bio_post_read_ctx *ctx);
>
> -static void decrypt_work(struct work_struct *work)
> +static void decrypt_work(struct bio_post_read_ctx *ctx)
> {
> - struct bio_post_read_ctx *ctx =
> - container_of(work, struct bio_post_read_ctx, work);
> -
> fscrypt_decrypt_bio(ctx->bio);
> +}
> +
> +static void decompress_work(struct bio_post_read_ctx *ctx, bool verity)
> +{
> + f2fs_decompress_bio(ctx->bio, verity);
> +}
>
> - bio_post_read_processing(ctx);
> +static void verity_work(struct bio_post_read_ctx *ctx)
> +{
> + fsverity_verify_bio(ctx->bio);
> }
>
> -static void verity_work(struct work_struct *work)
> +static void f2fs_post_read_work(struct work_struct *work)
> {
> struct bio_post_read_ctx *ctx =
> container_of(work, struct bio_post_read_ctx, work);
>
> - fsverity_verify_bio(ctx->bio);
> + if (ctx->enabled_steps & (1 << STEP_DECRYPT))
> + decrypt_work(ctx);
>
> - bio_post_read_processing(ctx);
> + if (ctx->enabled_steps & (1 << STEP_DECOMPRESS)) {
> + decompress_work(ctx,
> + ctx->enabled_steps & (1 << STEP_VERITY));
> + return;
> + }
> +
> + if (ctx->enabled_steps & (1 << STEP_VERITY))
> + verity_work(ctx);
> +
> + __read_end_io(ctx->bio, false, false);
> +}
> +
> +static void f2fs_enqueue_post_read_work(struct f2fs_sb_info *sbi,
> + struct work_struct *work)
> +{
> + queue_work(sbi->post_read_wq, work);
> }
>
> static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
> {
> - /*
> - * We use different work queues for decryption and for verity because
> - * verity may require reading metadata pages that need decryption, and
> - * we shouldn't recurse to the same workqueue.
> - */

Why is it okay (i.e., no deadlocks) to no longer use different work queues for
decryption and for verity? See the comment above which is being deleted.

> + bio = f2fs_grab_read_bio(inode, blkaddr, nr_pages,
> + is_readahead ? REQ_RAHEAD : 0,
> + page->index);
> + if (IS_ERR(bio)) {
> + ret = PTR_ERR(bio);
> + bio = NULL;
> + dic->err = true;

'err' conventionally means a -errno value. Please call this 'failed' instead.

> + /* TODO: cluster can be compressed due to race with .writepage */
> +

Another un-addressed TODO.

> +int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
> +{
> + if (!f2fs_sb_has_encrypt(sbi) &&
> + !f2fs_sb_has_compression(sbi))
> + return 0;
> +
> + sbi->post_read_wq = alloc_workqueue("f2fs_post_read_wq",
> + WQ_UNBOUND | WQ_HIGHPRI,
> + num_online_cpus());

post_read_wq is also needed if verity is enabled.

> +/* For compression */
> +enum compress_algrithm_type {
> + COMPRESS_LZO,
> + COMPRESS_LZ4,
> + COMPRESS_MAX,
> +};

"algorithm" is misspelled.

> +
> +struct compress_data {
> + __le32 clen;
> + __le32 chksum;
> + __le32 reserved[4];
> + char cdata[];
> +};

cdata is binary, not a string. So it should be 'u8', not 'char'.

> +
> +struct compress_ctx {
> + struct inode *inode;
> + unsigned int cluster_size;
> + unsigned int cluster_idx;
> + struct page **rpages;
> + unsigned int nr_rpages;
> + struct page **cpages;
> + unsigned int nr_cpages;
> + void *rbuf;
> + struct compress_data *cbuf;
> + size_t rlen;
> + size_t clen;
> + void *private;
> +};
> +
> +#define F2FS_COMPRESSED_PAGE_MAGIC 0xF5F2C000
> +struct compress_io_ctx {
> + u32 magic;
> + struct inode *inode;
> + refcount_t ref;
> + struct page **rpages;
> + unsigned int nr_rpages;
> +};
> +
> +struct decompress_io_ctx {
> + struct inode *inode;
> + refcount_t ref;
> + struct page **rpages; /* raw pages from page cache */
> + unsigned int nr_rpages;
> + struct page **cpages; /* pages contain compressed data */
> + unsigned int nr_cpages;
> + struct page **tpages; /* temp pages to pad hole in cluster */
> + void *rbuf;
> + struct compress_data *cbuf;
> + size_t rlen;
> + size_t clen;
> + unsigned int cluster_idx;
> + unsigned int cluster_size;
> + bool err;
> +};

Please add comments properly documenting these structures.

> struct f2fs_private_dio {
> @@ -2375,6 +2473,8 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
> /*
> * On-disk inode flags (f2fs_inode::i_flags)
> */
> +#define F2FS_COMPR_FL 0x00000004 /* Compress file */
> +#define F2FS_NOCOMP_FL 0x00000400 /* Don't compress */
> #define F2FS_SYNC_FL 0x00000008 /* Synchronous updates */
> #define F2FS_IMMUTABLE_FL 0x00000010 /* Immutable file */
> #define F2FS_APPEND_FL 0x00000020 /* writes to file may only append */

Please keep these in numerical order.

> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index 386ad54c13c3..e84ef90ffdee 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -407,6 +407,20 @@ static int do_read_inode(struct inode *inode)
> fi->i_crtime.tv_nsec = le32_to_cpu(ri->i_crtime_nsec);
> }
>
> + if (f2fs_has_extra_attr(inode) && f2fs_sb_has_compression(sbi)) {
> + if (F2FS_FITS_IN_INODE(ri, fi->i_extra_isize,
> + i_log_cluster_size)) {
> + fi->i_compressed_blocks =
> + le64_to_cpu(ri->i_compressed_blocks);
> + fi->i_compress_algorithm = ri->i_compress_algorithm;
> + fi->i_log_cluster_size = ri->i_log_cluster_size;
> + fi->i_cluster_size = 1 << fi->i_log_cluster_size;
> + }
> +
> + if ((fi->i_flags & FS_COMPR_FL) && f2fs_may_compress(inode))
> + set_inode_flag(inode, FI_COMPRESSED_FILE);
> + }

Need to validate that these fields are valid.

> @@ -119,6 +119,20 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
> if (F2FS_I(inode)->i_flags & F2FS_PROJINHERIT_FL)
> set_inode_flag(inode, FI_PROJ_INHERIT);
>
> + if (f2fs_sb_has_compression(sbi)) {
> + F2FS_I(inode)->i_compress_algorithm =
> + F2FS_OPTION(sbi).compress_algorithm;
> + F2FS_I(inode)->i_log_cluster_size =
> + F2FS_OPTION(sbi).compress_log_size;
> + F2FS_I(inode)->i_cluster_size =
> + 1 << F2FS_I(inode)->i_log_cluster_size;
> +

Why are these compression fields being set on uncompressed files?

> @@ -810,6 +817,66 @@ static int parse_options(struct super_block *sb, char *options)
> case Opt_checkpoint_enable:
> clear_opt(sbi, DISABLE_CHECKPOINT);
> break;
> + case Opt_compress_algorithm:
> + if (!f2fs_sb_has_compression(sbi)) {
> + f2fs_err(sbi, "Compression feature if off");
> + return -EINVAL;

"if off" => "is off"

> + }
> + name = match_strdup(&args[0]);
> + if (!name)
> + return -ENOMEM;
> + if (strlen(name) == 3 && !strncmp(name, "lzo", 3)) {

!strcmp(name, "lzo")

> + F2FS_OPTION(sbi).compress_algorithm =
> + COMPRESS_LZO;
> + } else if (strlen(name) == 3 &&
> + !strncmp(name, "lz4", 3)) {

!strcmp(name, "lz4")
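
Since strcmp() compares up to and including the terminating NUL, it already
rejects names that are longer or shorter than the literal, so the strlen()
check adds nothing. A small userspace sketch, where
parse_compress_algorithm() is a hypothetical helper and not part of the
patch:

```c
#include <errno.h>
#include <string.h>

enum compress_algorithm_type {
    COMPRESS_LZO,
    COMPRESS_LZ4,
    COMPRESS_MAX,
};

/* Returns the algorithm number, or -EINVAL for an unknown name. */
static int parse_compress_algorithm(const char *name)
{
    if (!strcmp(name, "lzo"))
        return COMPRESS_LZO;
    if (!strcmp(name, "lz4"))
        return COMPRESS_LZ4;
    return -EINVAL;
}
```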

> + F2FS_OPTION(sbi).compress_algorithm =
> + COMPRESS_LZ4;
> + } else {
> + kvfree(name);

Why not kfree()?

> + return -EINVAL;
> + }
> + kvfree(name);
> + case Opt_compress_log_size:
> + if (!f2fs_sb_has_compression(sbi)) {
> + f2fs_err(sbi, "Compression feature if off");
> + return -EINVAL;
> + }

"if off" => "is off"

> + if (args->from && match_int(args, &arg))
> + return -EINVAL;
> + if (arg < MIN_COMPRESS_LOG_SIZE ||
> + arg > MAX_COMPRESS_LOG_SIZE) {
> + f2fs_err(sbi,
> + "Compress cluster log size if out of range");

"if out of range" => "is out of range"

> + return -EINVAL;
> + }
> + F2FS_OPTION(sbi).compress_log_size = arg;
> + break;
> + case Opt_compress_extension:
> + if (!f2fs_sb_has_compression(sbi)) {
> + f2fs_err(sbi, "Compression feature if off");

"if off" => "is off"

> + return -EINVAL;
> + }
> + name = match_strdup(&args[0]);
> + if (!name)
> + return -ENOMEM;
> +
> + ext = F2FS_OPTION(sbi).extensions;
> + ext_cnt = F2FS_OPTION(sbi).compress_ext_cnt;
> +
> + if (strlen(name) >= F2FS_EXTENSION_LEN ||
> + ext_cnt >= COMPRESS_EXT_NUM) {
> + f2fs_err(sbi,
> + "invalid extension length/number");
> + kvfree(name);
> + return -EINVAL;
> + }
> +
> + strcpy(ext[ext_cnt], name);
> + F2FS_OPTION(sbi).compress_ext_cnt++;
> + kvfree(name);

- Eric

2019-10-28 12:49:12

Subject: Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

Eric, Jaegeuk,

On 2019/10/28 6:50, Eric Biggers wrote:
> On Tue, Oct 22, 2019 at 10:16:02AM -0700, Jaegeuk Kim wrote:

Let me update the patch according to comments.

Thanks,

2019-10-29 08:37:24

Subject: Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

On 2019/10/28 6:50, Eric Biggers wrote:
>> +bool f2fs_is_compressed_page(struct page *page)
>> +{
>> + if (!page_private(page))
>> + return false;
>> + if (IS_ATOMIC_WRITTEN_PAGE(page) || IS_DUMMY_WRITTEN_PAGE(page))
>> + return false;
>> + return *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC;
>> +}
>
> This code implies that there can be multiple page private structures each of
> which has a different magic number. But I only see F2FS_COMPRESSED_PAGE_MAGIC.
> Where in the code is the other one(s)?

I'm not sure I understood you correctly, did you mean it needs to introduce
f2fs_is_atomic_written_page() and f2fs_is_dummy_written_page() like
f2fs_is_compressed_page()?

>
>> +
>> +static void f2fs_set_compressed_page(struct page *page,
>> + struct inode *inode, pgoff_t index, void *data, refcount_t *r)
>> +{
>> + SetPagePrivate(page);
>> + set_page_private(page, (unsigned long)data);
>> +
>> + /* i_crypto_info and iv index */
>> + page->index = index;
>> + page->mapping = inode->i_mapping;
>> + if (r)
>> + refcount_inc(r);
>> +}
>
> It isn't really appropriate to create fake pagecache pages like this. Did you
> consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?

We need to store i_crypto_info and iv index somewhere, in order to pass them to
fscrypt_decrypt_block_inplace(), where did you suggest to store them?

>> +
>> +void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
>> +{
>> + kvfree(cc->rpages);
>> +}
>
> The memory is allocated with kzalloc(), so why is it freed with kvfree() and not
> just kfree()?

It was allocated by f2fs_*alloc() which will fallback to kvmalloc() once
kmalloc() failed.

>> +static int lzo_compress_pages(struct compress_ctx *cc)
>> +{
>> + int ret;
>> +
>> + ret = lzo1x_1_compress(cc->rbuf, cc->rlen, cc->cbuf->cdata,
>> + &cc->clen, cc->private);
>> + if (ret != LZO_E_OK) {
>> + printk_ratelimited("%sF2FS-fs: lzo compress failed, ret:%d\n",
>> + KERN_ERR, ret);
>> + return -EIO;
>> + }
>> + return 0;
>> +}
>
> Why not using f2fs_err()? Same in lots of other places.

We use printk_ratelimited at some points where we can afford to lose logs,
otherwise we use f2fs_{err,warn...} to record info as much as possible for
troubleshoot.

>> +
>> + ret = cops->compress_pages(cc);
>> + if (ret)
>> + goto out_vunmap_cbuf;
>> +
>> + max_len = PAGE_SIZE * (cc->cluster_size - 1) - COMPRESS_HEADER_SIZE;
>> +
>> + if (cc->clen > max_len) {
>> + ret = -EAGAIN;
>> + goto out_vunmap_cbuf;
>> + }
>
> Since we already know the max length we're willing to compress to (the max
> length for any space to be saved), why is more space than that being allocated?
> LZ4_compress_default() will return an error if there isn't enough space, so that
> error could just be used as the indication to store the data uncompressed.

AFAIK, there is no such common error code returned from all compression
algorithms indicating there is no room for limited target size, however we need
that information to fallback to write raw pages. Any better idea?

>
>> +
>> + cc->cbuf->clen = cpu_to_le32(cc->clen);
>> + cc->cbuf->chksum = 0;
>
> What is the point of the chksum field? It's always set to 0 and never checked.

When I wrote the initial code, I suspected that I might miss some SPO (sudden
power-off) corner cases in which we failed to write the whole cluster, so I
added that field as a reminder of that case. However, I didn't have time to
cover those cases, which left unfinished code there... :( I'm okay with
deleting it in a formal version.

BTW, for a data checksum feature, I guess we would need to rework the dnode
layout to cover both compressed and non-compressed data.

>
>> +
>> +static bool __cluster_may_compress(struct compress_ctx *cc)
>> +{
>> + struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode);
>> + loff_t i_size = i_size_read(cc->inode);
>> + const pgoff_t end_index = ((unsigned long long)i_size)
>> + >> PAGE_SHIFT;
>> + unsigned offset;
>> + int i;
>> +
>> + for (i = 0; i < cc->cluster_size; i++) {
>> + struct page *page = cc->rpages[i];
>> +
>> + f2fs_bug_on(sbi, !page);
>> +
>> + if (unlikely(f2fs_cp_error(sbi)))
>> + return false;
>> + if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
>> + return false;
>> + if (f2fs_is_drop_cache(cc->inode))
>> + return false;
>> + if (f2fs_is_volatile_file(cc->inode))
>> + return false;
>> +
>> + offset = i_size & (PAGE_SIZE - 1);
>> + if ((page->index > end_index) ||
>> + (page->index == end_index && !offset))
>> + return false;
>
> No need to have a special case for when i_size is a multiple of the page size.
> Just replace end_index with 'nr_pages = DIV_ROUND_UP(i_size, PAGE_SIZE)' and
> check for page->index >= nr_pages.

That is copied from f2fs_write_data_page(); let's clean it up in a separate patch.

>
>> +out_fail:
>> + /* TODO: revoke partially updated block addresses */
>> + for (i += 1; i < cc->cluster_size; i++) {
>> + if (!cc->rpages[i])
>> + continue;
>> + redirty_page_for_writepage(wbc, cc->rpages[i]);
>> + unlock_page(cc->rpages[i]);
>> + }
>> + return err;
>
> Un-addressed TODO.

Will fix a little later.

>> static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
>> {
>> - /*
>> - * We use different work queues for decryption and for verity because
>> - * verity may require reading metadata pages that need decryption, and
>> - * we shouldn't recurse to the same workqueue.
>> - */
>
> Why is it okay (i.e., no deadlocks) to no longer use different work queues for
> decryption and for verity? See the comment above which is being deleted.

Could you explain more about how deadlock happen? or share me a link address if
you have described that case somewhere?

>
>> + /* TODO: cluster can be compressed due to race with .writepage */
>> +
>
> Another un-addressed TODO.

Will fix a little later.

>
>> +int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
>> +{
>> + if (!f2fs_sb_has_encrypt(sbi) &&
>> + !f2fs_sb_has_compression(sbi))
>> + return 0;
>> +
>> + sbi->post_read_wq = alloc_workqueue("f2fs_post_read_wq",
>> + WQ_UNBOUND | WQ_HIGHPRI,
>> + num_online_cpus());
>
> post_read_wq is also needed if verity is enabled.

Yes, we missed this as verity was not merged when implementing this....

Thanks,

2019-10-30 02:59:15

Subject: Re: [PATCH 2/2] f2fs: support data compression

On Tue, Oct 29, 2019 at 04:33:36PM +0800, Chao Yu wrote:
> On 2019/10/28 6:50, Eric Biggers wrote:
> >> +bool f2fs_is_compressed_page(struct page *page)
> >> +{
> >> + if (!page_private(page))
> >> + return false;
> >> + if (IS_ATOMIC_WRITTEN_PAGE(page) || IS_DUMMY_WRITTEN_PAGE(page))
> >> + return false;
> >> + return *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC;
> >> +}
> >
> > This code implies that there can be multiple page private structures each of
> > which has a different magic number. But I only see F2FS_COMPRESSED_PAGE_MAGIC.
> > Where in the code is the other one(s)?
>
> I'm not sure I understood you correctly, did you mean it needs to introduce
> f2fs_is_atomic_written_page() and f2fs_is_dummy_written_page() like
> f2fs_is_compressed_page()?
>

No, I'm asking what is the case where the line

*((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC

returns false?

> >
> >> +
> >> +static void f2fs_set_compressed_page(struct page *page,
> >> + struct inode *inode, pgoff_t index, void *data, refcount_t *r)
> >> +{
> >> + SetPagePrivate(page);
> >> + set_page_private(page, (unsigned long)data);
> >> +
> >> + /* i_crypto_info and iv index */
> >> + page->index = index;
> >> + page->mapping = inode->i_mapping;
> >> + if (r)
> >> + refcount_inc(r);
> >> +}
> >
> > It isn't really appropriate to create fake pagecache pages like this. Did you
> > consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?
>
> We need to store i_crypto_info and iv index somewhere, in order to pass them to
> fscrypt_decrypt_block_inplace(), where did you suggest to store them?
>

The same place where the pages are stored.

> >> +
> >> +void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
> >> +{
> >> + kvfree(cc->rpages);
> >> +}
> >
> > The memory is allocated with kzalloc(), so why is it freed with kvfree() and not
> > just kfree()?
>
> It was allocated by f2fs_*alloc() which will fallback to kvmalloc() once
> kmalloc() failed.

This seems to be a bug in f2fs_kmalloc() -- it inappropriately falls back to
kvmalloc(). As per its name, it should only use kmalloc(). f2fs_kvmalloc()
already exists, so it can be used when the fallback is wanted.

>
> >> +static int lzo_compress_pages(struct compress_ctx *cc)
> >> +{
> >> + int ret;
> >> +
> >> + ret = lzo1x_1_compress(cc->rbuf, cc->rlen, cc->cbuf->cdata,
> >> + &cc->clen, cc->private);
> >> + if (ret != LZO_E_OK) {
> >> + printk_ratelimited("%sF2FS-fs: lzo compress failed, ret:%d\n",
> >> + KERN_ERR, ret);
> >> + return -EIO;
> >> + }
> >> + return 0;
> >> +}
> >
> > Why not using f2fs_err()? Same in lots of other places.
>
> We use printk_ratelimited at some points where we can afford to lose logs,
> otherwise we use f2fs_{err,warn...} to record info as much as possible for
> troubleshoot.
>

It used to be the case that f2fs_msg() was ratelimited. What stops it from
spamming the logs now?

The problem with a bare printk is that it doesn't show which filesystem instance
the message is coming from.

> >> +
> >> + ret = cops->compress_pages(cc);
> >> + if (ret)
> >> + goto out_vunmap_cbuf;
> >> +
> >> + max_len = PAGE_SIZE * (cc->cluster_size - 1) - COMPRESS_HEADER_SIZE;
> >> +
> >> + if (cc->clen > max_len) {
> >> + ret = -EAGAIN;
> >> + goto out_vunmap_cbuf;
> >> + }
> >
> > Since we already know the max length we're willing to compress to (the max
> > length for any space to be saved), why is more space than that being allocated?
> > LZ4_compress_default() will return an error if there isn't enough space, so that
> > error could just be used as the indication to store the data uncompressed.
>
> AFAIK, there is no such common error code returned from all compression
> algorithms indicating there is no room for limited target size, however we need
> that information to fallback to write raw pages. Any better idea?
>

"Not enough room" is the only reasonable way for compression to fail, so all
that's needed is the ability for compression to report errors at all. What
specifically prevents this approach from working?

> >> static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
> >> {
> >> - /*
> >> - * We use different work queues for decryption and for verity because
> >> - * verity may require reading metadata pages that need decryption, and
> >> - * we shouldn't recurse to the same workqueue.
> >> - */
> >
> > Why is it okay (i.e., no deadlocks) to no longer use different work queues for
> > decryption and for verity? See the comment above which is being deleted.
>
> Could you explain more about how deadlock happen? or share me a link address if
> you have described that case somewhere?
>

The verity work can read pages from the file which require decryption. I'm
concerned that it could deadlock if the work is scheduled on the same workqueue.
Granted, I'm not an expert in Linux workqueues, so if you've investigated this
and determined that it's safe, can you explain why?

- Eric

2019-10-30 08:47:07

Subject: Re: [PATCH 2/2] f2fs: support data compression

On 2019/10/30 10:55, Eric Biggers wrote:
> On Tue, Oct 29, 2019 at 04:33:36PM +0800, Chao Yu wrote:
>> On 2019/10/28 6:50, Eric Biggers wrote:
>>>> +bool f2fs_is_compressed_page(struct page *page)
>>>> +{
>>>> + if (!page_private(page))
>>>> + return false;
>>>> + if (IS_ATOMIC_WRITTEN_PAGE(page) || IS_DUMMY_WRITTEN_PAGE(page))
>>>> + return false;
>>>> + return *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC;
>>>> +}
>>>
>>> This code implies that there can be multiple page private structures each of
>>> which has a different magic number. But I only see F2FS_COMPRESSED_PAGE_MAGIC.
>>> Where in the code is the other one(s)?
>>
>> I'm not sure I understood you correctly, did you mean it needs to introduce
>> f2fs_is_atomic_written_page() and f2fs_is_dummy_written_page() like
>> f2fs_is_compressed_page()?
>>
>
> No, I'm asking what is the case where the line
>
> *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC
>
> returns false?

Should be this?

if (!page_private(page))
return false;
f2fs_bug_on(*((u32 *)page_private(page)) != F2FS_COMPRESSED_PAGE_MAGIC)
return true;

>
>>>
>>>> +
>>>> +static void f2fs_set_compressed_page(struct page *page,
>>>> + struct inode *inode, pgoff_t index, void *data, refcount_t *r)
>>>> +{
>>>> + SetPagePrivate(page);
>>>> + set_page_private(page, (unsigned long)data);
>>>> +
>>>> + /* i_crypto_info and iv index */
>>>> + page->index = index;
>>>> + page->mapping = inode->i_mapping;
>>>> + if (r)
>>>> + refcount_inc(r);
>>>> +}
>>>
>>> It isn't really appropriate to create fake pagecache pages like this. Did you
>>> consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?
>>
>> We need to store i_crypto_info and iv index somewhere, in order to pass them to
>> fscrypt_decrypt_block_inplace(), where did you suggest to store them?
>>
>
> The same place where the pages are stored.

We would still need to allocate space for those fields; is there any strong
reason to do so?

>
>>>> +
>>>> +void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
>>>> +{
>>>> + kvfree(cc->rpages);
>>>> +}
>>>
>>> The memory is allocated with kzalloc(), so why is it freed with kvfree() and not
>>> just kfree()?
>>
>> It was allocated by f2fs_*alloc() which will fallback to kvmalloc() once
>> kmalloc() failed.
>
> This seems to be a bug in f2fs_kmalloc() -- it inappropriately falls back to
> kvmalloc(). As per its name, it should only use kmalloc(). f2fs_kvmalloc()
> already exists, so it can be used when the fallback is wanted.

We can introduce f2fs_memalloc() to wrap f2fs_kmalloc() and f2fs_kvmalloc() as
below:

f2fs_memalloc()
{
mem = f2fs_kmalloc();
if (mem)
return mem;
return f2fs_kvmalloc();
}

It can be used in the specific places where we really need it, such as the case
described in 5222595d093e ("f2fs: use kvmalloc, if kmalloc is failed"), where
the original logic was introduced.

>
>>
>>>> +static int lzo_compress_pages(struct compress_ctx *cc)
>>>> +{
>>>> + int ret;
>>>> +
>>>> + ret = lzo1x_1_compress(cc->rbuf, cc->rlen, cc->cbuf->cdata,
>>>> + &cc->clen, cc->private);
>>>> + if (ret != LZO_E_OK) {
>>>> + printk_ratelimited("%sF2FS-fs: lzo compress failed, ret:%d\n",
>>>> + KERN_ERR, ret);
>>>> + return -EIO;
>>>> + }
>>>> + return 0;
>>>> +}
>>>
>>> Why not using f2fs_err()? Same in lots of other places.
>>
>> We use printk_ratelimited at some points where we can afford to lose logs,
>> otherwise we use f2fs_{err,warn...} to record info as much as possible for
>> troubleshoot.
>>
>
> It used to be the case that f2fs_msg() was ratelimited. What stops it from
> spamming the logs now?

https://lore.kernel.org/patchwork/patch/973837/

>
> The problem with a bare printk is that it doesn't show which filesystem instance
> the message is coming from.

We can also print sbi->sb->s_id, as f2fs_printk() does.

>
>>>> +
>>>> + ret = cops->compress_pages(cc);
>>>> + if (ret)
>>>> + goto out_vunmap_cbuf;
>>>> +
>>>> + max_len = PAGE_SIZE * (cc->cluster_size - 1) - COMPRESS_HEADER_SIZE;
>>>> +
>>>> + if (cc->clen > max_len) {
>>>> + ret = -EAGAIN;
>>>> + goto out_vunmap_cbuf;
>>>> + }
>>>
>>> Since we already know the max length we're willing to compress to (the max
>>> length for any space to be saved), why is more space than that being allocated?
>>> LZ4_compress_default() will return an error if there isn't enough space, so that
>>> error could just be used as the indication to store the data uncompressed.
>>
>> AFAIK, there is no such common error code returned from all compression
>> algorithms indicating there is no room for limited target size, however we need
>> that information to fallback to write raw pages. Any better idea?
>>
>
> "Not enough room" is the only reasonable way for compression to fail, so all

At a glance, the comments in the compression code do say it only fails when
dst_buf runs out of space, but in the few implementations I checked it can
also fail for other reasons:
a) dst_buf is too small
b) src_buf is too large/small
c) wrong step
and I may have missed other cases...

Yeah, we can rule out conditions b) and c) in the implementation. However, my
concern is that the implementation would then be too tightly coupled to the
error handling of every compression algorithm, as we're not always aware of
changes in their error handling.

> that's needed is the ability for compression to report errors at all. What
> specifically prevents this approach from working?
>
>>>> static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
>>>> {
>>>> - /*
>>>> - * We use different work queues for decryption and for verity because
>>>> - * verity may require reading metadata pages that need decryption, and
>>>> - * we shouldn't recurse to the same workqueue.
>>>> - */
>>>
>>> Why is it okay (i.e., no deadlocks) to no longer use different work queues for
>>> decryption and for verity? See the comment above which is being deleted.
>>
>> Could you explain more about how deadlock happen? or share me a link address if
>> you have described that case somewhere?
>>
>
> The verity work can read pages from the file which require decryption. I'm
> concerned that it could deadlock if the work is scheduled on the same workqueue.

I assume you've tried one workqueue, and suffered deadlock..

> Granted, I'm not an expert in Linux workqueues, so if you've investigated this
> and determined that it's safe, can you explain why?

I'm not familiar with workqueues... I guess it may not be safe if the work is
scheduled on the same CPU where verity is waiting for data? If the work is
scheduled on another CPU, it may be safe.

I can check that before splitting the workqueue for verity and decrypt/decompress.

Thanks,

>
> - Eric
> .
>

2019-10-30 16:52:12

Subject: Re: [PATCH 2/2] f2fs: support data compression

On Wed, Oct 30, 2019 at 04:43:52PM +0800, Chao Yu wrote:
> On 2019/10/30 10:55, Eric Biggers wrote:
> > On Tue, Oct 29, 2019 at 04:33:36PM +0800, Chao Yu wrote:
> >> On 2019/10/28 6:50, Eric Biggers wrote:
> >>>> +bool f2fs_is_compressed_page(struct page *page)
> >>>> +{
> >>>> + if (!page_private(page))
> >>>> + return false;
> >>>> + if (IS_ATOMIC_WRITTEN_PAGE(page) || IS_DUMMY_WRITTEN_PAGE(page))
> >>>> + return false;
> >>>> + return *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC;
> >>>> +}
> >>>
> >>> This code implies that there can be multiple page private structures each of
> >>> which has a different magic number. But I only see F2FS_COMPRESSED_PAGE_MAGIC.
> >>> Where in the code is the other one(s)?
> >>
> >> I'm not sure I understood you correctly, did you mean it needs to introduce
> >> f2fs_is_atomic_written_page() and f2fs_is_dummy_written_page() like
> >> f2fs_is_compressed_page()?
> >>
> >
> > No, I'm asking what is the case where the line
> >
> > *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC
> >
> > returns false?
>
> Should be this?
>
> if (!page_private(page))
> return false;
> f2fs_bug_on(*((u32 *)page_private(page)) != F2FS_COMPRESSED_PAGE_MAGIC)
> return true;

Yes, that makes more sense, unless there are other cases.
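A minimal userspace sketch of the reworked check discussed above may help. It
assumes that every non-NULL private pointer refers to a structure that begins
with the magic field, which the thread leaves open ("unless there are other
cases"). The toy_* names and the magic value are hypothetical, and assert()
stands in for f2fs_bug_on().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value, used only for this illustration. */
#define TOY_COMPRESSED_PAGE_MAGIC 0xF5F2C000u

/* Stand-in for struct page: only the private field matters here. */
struct toy_page {
	uintptr_t private;
};

/* Stand-in for the per-cluster context; the magic must come first. */
struct toy_compress_io_ctx {
	uint32_t magic;
	unsigned int nr_rpages;
};

static bool toy_is_compressed_page(const struct toy_page *page)
{
	const struct toy_compress_io_ctx *ctx;

	if (!page->private)
		return false;

	/* A mismatch is treated as a bug rather than as "not compressed". */
	ctx = (const struct toy_compress_io_ctx *)page->private;
	assert(ctx->magic == TOY_COMPRESSED_PAGE_MAGIC);
	return true;
}
```

In this form the magic number acts as a sanity check on the private data rather
than as a way to tell several private structures apart.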

>
> >
> >>>
> >>>> +
> >>>> +static void f2fs_set_compressed_page(struct page *page,
> >>>> + struct inode *inode, pgoff_t index, void *data, refcount_t *r)
> >>>> +{
> >>>> + SetPagePrivate(page);
> >>>> + set_page_private(page, (unsigned long)data);
> >>>> +
> >>>> + /* i_crypto_info and iv index */
> >>>> + page->index = index;
> >>>> + page->mapping = inode->i_mapping;
> >>>> + if (r)
> >>>> + refcount_inc(r);
> >>>> +}
> >>>
> >>> It isn't really appropriate to create fake pagecache pages like this. Did you
> >>> consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?
> >>
> >> We need to store i_crypto_info and iv index somewhere, in order to pass them to
> >> fscrypt_decrypt_block_inplace(), where did you suggest to store them?
> >>
> >
> > The same place where the pages are stored.
>
> Still we need allocate space for those fields, any strong reason to do so?
>

page->mapping set implies that the page is a pagecache page. Faking it could
cause problems with code elsewhere.

> >
> >>>> +
> >>>> +void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
> >>>> +{
> >>>> + kvfree(cc->rpages);
> >>>> +}
> >>>
> >>> The memory is allocated with kzalloc(), so why is it freed with kvfree() and not
> >>> just kfree()?
> >>
> >> It was allocated by f2fs_*alloc() which will fallback to kvmalloc() once
> >> kmalloc() failed.
> >
> > This seems to be a bug in f2fs_kmalloc() -- it inappropriately falls back to
> > kvmalloc(). As per its name, it should only use kmalloc(). f2fs_kvmalloc()
> > already exists, so it can be used when the fallback is wanted.
>
> We can introduce f2fs_memalloc() to wrap f2fs_kmalloc() and f2fs_kvmalloc() as
> below:
>
> f2fs_memalloc()
> {
> mem = f2fs_kmalloc();
> if (mem)
> return mem;
> return f2fs_kvmalloc();
> }
>
> It can be used in the specific places where we really need it, like the place
> described in 5222595d093e ("f2fs: use kvmalloc, if kmalloc is failed"), where
> we introduced the original logic.

No, just use kvmalloc(). The whole point of kvmalloc() is that it tries
kmalloc() and then falls back to vmalloc() if it fails.

- Eric
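
The try-then-fall-back behaviour attributed to kvmalloc() above can be modelled
in userspace roughly as follows. This is only an analogy: toy_contig_alloc(),
toy_virt_alloc(), toy_kvmalloc(), toy_kvfree() and TOY_CONTIG_LIMIT are
hypothetical names, the size cap merely simulates a failing contiguous
allocation, and both paths end up in malloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap simulating "a contiguous allocation this big fails". */
#define TOY_CONTIG_LIMIT 4096

struct toy_buf {
	void *ptr;
	bool from_fallback;	/* true if the second allocator was used */
};

/* Models the kmalloc() attempt. */
static void *toy_contig_alloc(size_t size)
{
	if (size > TOY_CONTIG_LIMIT)
		return NULL;
	return malloc(size);
}

/* Models the vmalloc() fallback. */
static void *toy_virt_alloc(size_t size)
{
	return malloc(size);
}

/* One entry point: try the first allocator, then fall back. */
static struct toy_buf toy_kvmalloc(size_t size)
{
	struct toy_buf buf = { toy_contig_alloc(size), false };

	if (!buf.ptr) {
		buf.ptr = toy_virt_alloc(size);
		buf.from_fallback = (buf.ptr != NULL);
	}
	return buf;
}

/* One matching free routine, whichever path allocated the memory. */
static void toy_kvfree(struct toy_buf *buf)
{
	assert(buf);
	free(buf->ptr);
	buf->ptr = NULL;
}
```

The point of the pairing is that callers never hand-roll the fallback: memory
from the combined allocator is always released with the combined free routine.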

2019-10-30 17:07:57

Subject: Re: [PATCH 2/2] f2fs: support data compression

On Wed, Oct 30, 2019 at 04:43:52PM +0800, Chao Yu wrote:
> >>>> static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
> >>>> {
> >>>> - /*
> >>>> - * We use different work queues for decryption and for verity because
> >>>> - * verity may require reading metadata pages that need decryption, and
> >>>> - * we shouldn't recurse to the same workqueue.
> >>>> - */
> >>>
> >>> Why is it okay (i.e., no deadlocks) to no longer use different work queues for
> >>> decryption and for verity? See the comment above which is being deleted.
> >>
> >> Could you explain more about how deadlock happen? or share me a link address if
> >> you have described that case somewhere?
> >>
> >
> > The verity work can read pages from the file which require decryption. I'm
> > concerned that it could deadlock if the work is scheduled on the same workqueue.
>
> I assume you've tried one workqueue, and suffered deadlock..
>
> > Granted, I'm not an expert in Linux workqueues, so if you've investigated this
> > and determined that it's safe, can you explain why?
>
> I'm not familiar with workqueue... I guess it may not safe that if the work is
> scheduled to the same cpu in where verity was waiting for data? if the work is
> scheduled to other cpu, it may be safe.
>
> I can check that before splitting the workqueue for verity and decrypt/decompress.
>

Yes, this is a real problem; try 'kvm-xfstests -c f2fs/encrypt generic/579'.
The worker thread gets deadlocked in f2fs_read_merkle_tree_page() waiting for
the Merkle tree page to be decrypted. This is with the v2 compression patch;
it works fine on current mainline.

INFO: task kworker/u5:0:61 blocked for more than 30 seconds.
Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u5:0 D 0 61 2 0x80004000
Workqueue: f2fs_post_read_wq f2fs_post_read_work
Call Trace:
context_switch kernel/sched/core.c:3384 [inline]
__schedule+0x299/0x6c0 kernel/sched/core.c:4069
schedule+0x44/0xd0 kernel/sched/core.c:4136
io_schedule+0x11/0x40 kernel/sched/core.c:5780
wait_on_page_bit_common mm/filemap.c:1174 [inline]
wait_on_page_bit mm/filemap.c:1223 [inline]
wait_on_page_locked include/linux/pagemap.h:527 [inline]
wait_on_page_locked include/linux/pagemap.h:524 [inline]
wait_on_page_read mm/filemap.c:2767 [inline]
do_read_cache_page+0x407/0x660 mm/filemap.c:2810
read_cache_page+0xd/0x10 mm/filemap.c:2894
f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
verify_page+0x110/0x560 fs/verity/verify.c:120
fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
verity_work fs/f2fs/data.c:142 [inline]
f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
process_one_work+0x225/0x550 kernel/workqueue.c:2269
worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
kthread+0x125/0x140 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
INFO: task kworker/u5:1:1140 blocked for more than 30 seconds.
Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u5:1 D 0 1140 2 0x80004000
Workqueue: f2fs_post_read_wq f2fs_post_read_work
Call Trace:
context_switch kernel/sched/core.c:3384 [inline]
__schedule+0x299/0x6c0 kernel/sched/core.c:4069
schedule+0x44/0xd0 kernel/sched/core.c:4136
io_schedule+0x11/0x40 kernel/sched/core.c:5780
wait_on_page_bit_common mm/filemap.c:1174 [inline]
wait_on_page_bit mm/filemap.c:1223 [inline]
wait_on_page_locked include/linux/pagemap.h:527 [inline]
wait_on_page_locked include/linux/pagemap.h:524 [inline]
wait_on_page_read mm/filemap.c:2767 [inline]
do_read_cache_page+0x407/0x660 mm/filemap.c:2810
read_cache_page+0xd/0x10 mm/filemap.c:2894
f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
verify_page+0x110/0x560 fs/verity/verify.c:120
fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
verity_work fs/f2fs/data.c:142 [inline]
f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
process_one_work+0x225/0x550 kernel/workqueue.c:2269
worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
kthread+0x125/0x140 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Showing all locks held in the system:
1 lock held by khungtaskd/21:
#0: ffffffff82250520 (rcu_read_lock){....}, at: rcu_lock_acquire.constprop.0+0x0/0x30 include/trace/events/lock.h:13
2 locks held by kworker/u5:0/61:
#0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
#0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
#0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
#1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
#1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
#1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
2 locks held by kworker/u5:1/1140:
#0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
#0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
#0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
#1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
#1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
#1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
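
The stall in the trace above can be illustrated with a deliberately simplified
model. It is not how kernel workqueues actually schedule work (there is no
concurrency management, max_active handling or rescuer here); it only assumes
that each verity job holds one worker slot while waiting for one decrypt job,
and that verity jobs were queued first. The function name
toy_unfinished_verity_jobs() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns how many verity jobs never finish in the toy model.
 * slots:        worker slots available to the verity queue
 * verity_jobs:  verity jobs queued, each needing one decrypt job
 * shared_queue: true if decrypt jobs must use the same slots
 */
static int toy_unfinished_verity_jobs(int slots, int verity_jobs,
				      bool shared_queue)
{
	int free_slots = slots;
	int queued_verity = verity_jobs;
	int waiting = 0;	/* verity jobs holding a slot, blocked */
	int queued_decrypt = 0;

	assert(slots > 0 && verity_jobs >= 0);

	for (;;) {
		bool progress = false;

		/* Verity jobs claim every free slot, then block. */
		while (queued_verity > 0 && free_slots > 0) {
			queued_verity--;
			free_slots--;
			waiting++;
			queued_decrypt++;
			progress = true;
		}

		/*
		 * A decrypt job needs a free slot only when the queue is
		 * shared. Completing it lets one verity job finish and
		 * release its slot.
		 */
		if (queued_decrypt > 0 && (!shared_queue || free_slots > 0)) {
			queued_decrypt--;
			waiting--;
			free_slots++;
			progress = true;
		}

		if (!progress)
			break;
	}
	return waiting + queued_verity;
}
```

In this model, two verity jobs on a shared queue with two slots both stay
blocked, which resembles the two kworkers in the log, while a separate decrypt
queue always lets them complete.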

2019-10-30 17:27:01

by Gao Xiang

Subject: Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

Hi Eric,

(add some mm folks...)

On Wed, Oct 30, 2019 at 09:50:56AM -0700, Eric Biggers wrote:

<snip>

> > >>>
> > >>> It isn't really appropriate to create fake pagecache pages like this. Did you
> > >>> consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?
> > >>
> > >> We need to store i_crypto_info and iv index somewhere, in order to pass them to
> > >> fscrypt_decrypt_block_inplace(), where did you suggest to store them?
> > >>
> > >
> > > The same place where the pages are stored.
> >
> > Still we need allocate space for those fields, any strong reason to do so?
> >
>
> page->mapping set implies that the page is a pagecache page. Faking it could
> cause problems with code elsewhere.

This is not closely related to this patch. Faking page->mapping was used in zsmalloc before
nonLRU migration (see material [1]) and is used in erofs now (page->mapping indicates a
nonLRU, short-lifetime temporary page type, and page->private holds per-page information).
As far as I know, a nonLRU page without PAGE_MAPPING_MOVABLE set is safe for most mm code.

On the other hand, I think a NULL page->mapping wastes that field in the precious
page structure... And we cannot tell the page type directly from a NULL alone: it could be
a truncated file page, a just-allocated page, or some kind of internal temporary page...

So my proposal is to use page->mapping to indicate a specific page type for
such nonLRU pages (by some common convention, e.g. pointing to some real structure rather than
just zeroing it out and wasting 8 bytes; it's also natural to indicate a page type by
its `mapping' name)... Since my English is not very good, I delayed it until now...

[1] https://elixir.bootlin.com/linux/v3.18.140/source/mm/zsmalloc.c#L379
https://lore.kernel.org/linux-mm/[email protected]
and some not very related topic: https://lwn.net/Articles/752564/

Thanks,
Gao Xiang

2019-10-30 19:08:15

Subject: Re: [PATCH 2/2] f2fs: support data compression

On 10/30, Eric Biggers wrote:
> On Wed, Oct 30, 2019 at 04:43:52PM +0800, Chao Yu wrote:
> > On 2019/10/30 10:55, Eric Biggers wrote:
> > > On Tue, Oct 29, 2019 at 04:33:36PM +0800, Chao Yu wrote:
> > >> On 2019/10/28 6:50, Eric Biggers wrote:
> > >>>> +bool f2fs_is_compressed_page(struct page *page)
> > >>>> +{
> > >>>> + if (!page_private(page))
> > >>>> + return false;
> > >>>> + if (IS_ATOMIC_WRITTEN_PAGE(page) || IS_DUMMY_WRITTEN_PAGE(page))
> > >>>> + return false;
> > >>>> + return *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC;
> > >>>> +}
> > >>>
> > >>> This code implies that there can be multiple page private structures each of
> > >>> which has a different magic number. But I only see F2FS_COMPRESSED_PAGE_MAGIC.
> > >>> Where in the code is the other one(s)?
> > >>
> > >> I'm not sure I understood you correctly, did you mean it needs to introduce
> > >> f2fs_is_atomic_written_page() and f2fs_is_dummy_written_page() like
> > >> f2fs_is_compressed_page()?
> > >>
> > >
> > > No, I'm asking what is the case where the line
> > >
> > > *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC
> > >
> > > returns false?
> >
> > Should be this?
> >
> > if (!page_private(page))
> > return false;
> > f2fs_bug_on(*((u32 *)page_private(page)) != F2FS_COMPRESSED_PAGE_MAGIC)
> > return true;
>
> Yes, that makes more sense, unless there are other cases.
>
> >
> > >
> > >>>
> > >>>> +
> > >>>> +static void f2fs_set_compressed_page(struct page *page,
> > >>>> + struct inode *inode, pgoff_t index, void *data, refcount_t *r)
> > >>>> +{
> > >>>> + SetPagePrivate(page);
> > >>>> + set_page_private(page, (unsigned long)data);
> > >>>> +
> > >>>> + /* i_crypto_info and iv index */
> > >>>> + page->index = index;
> > >>>> + page->mapping = inode->i_mapping;
> > >>>> + if (r)
> > >>>> + refcount_inc(r);
> > >>>> +}
> > >>>
> > >>> It isn't really appropriate to create fake pagecache pages like this. Did you
> > >>> consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?
> > >>
> > >> We need to store i_crypto_info and iv index somewhere, in order to pass them to
> > >> fscrypt_decrypt_block_inplace(), where did you suggest to store them?
> > >>
> > >
> > > The same place where the pages are stored.
> >
> > Still we need allocate space for those fields, any strong reason to do so?
> >
>
> page->mapping set implies that the page is a pagecache page. Faking it could
> cause problems with code elsewhere.

I've checked it with Minchan, and it seems to be fine for the filesystem to use
this page internally only, not in the pagecache.

>
> > >
> > >>>> +
> > >>>> +void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
> > >>>> +{
> > >>>> + kvfree(cc->rpages);
> > >>>> +}
> > >>>
> > >>> The memory is allocated with kzalloc(), so why is it freed with kvfree() and not
> > >>> just kfree()?
> > >>
> > >> It was allocated by f2fs_*alloc() which will fallback to kvmalloc() once
> > >> kmalloc() failed.
> > >
> > > This seems to be a bug in f2fs_kmalloc() -- it inappropriately falls back to
> > > kvmalloc(). As per its name, it should only use kmalloc(). f2fs_kvmalloc()
> > > already exists, so it can be used when the fallback is wanted.
> >
> > We can introduce f2fs_memalloc() to wrap f2fs_kmalloc() and f2fs_kvmalloc() as
> > below:
> >
> > f2fs_memalloc()
> > {
> > mem = f2fs_kmalloc();
> > if (mem)
> > return mem;
> > return f2fs_kvmalloc();
> > }
> >
> > It can be used in the specific places where we really need it, like the place
> > described in 5222595d093e ("f2fs: use kvmalloc, if kmalloc is failed"), where
> > we introduced the original logic.
>
> No, just use kvmalloc(). The whole point of kvmalloc() is that it tries
> kmalloc() and then falls back to vmalloc() if it fails.
>
> - Eric

2019-10-31 02:17:42

Subject: Re: [PATCH 2/2] f2fs: support data compression

On 2019/10/31 0:50, Eric Biggers wrote:
> No, just use kvmalloc(). The whole point of kvmalloc() is that it tries
> kmalloc() and then falls back to vmalloc() if it fails.

Okay, that's fine with me; let me fix this in another patch.

Thanks,

>
> - Eric
> .
>

2019-10-31 02:24:45

Subject: Re: [PATCH 2/2] f2fs: support data compression

On 2019/10/31 1:02, Eric Biggers wrote:
> On Wed, Oct 30, 2019 at 04:43:52PM +0800, Chao Yu wrote:
>>>>>> static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
>>>>>> {
>>>>>> - /*
>>>>>> - * We use different work queues for decryption and for verity because
>>>>>> - * verity may require reading metadata pages that need decryption, and
>>>>>> - * we shouldn't recurse to the same workqueue.
>>>>>> - */
>>>>>
>>>>> Why is it okay (i.e., no deadlocks) to no longer use different work queues for
>>>>> decryption and for verity? See the comment above which is being deleted.
>>>>
>>>> Could you explain more about how deadlock happen? or share me a link address if
>>>> you have described that case somewhere?
>>>>
>>>
>>> The verity work can read pages from the file which require decryption. I'm
>>> concerned that it could deadlock if the work is scheduled on the same workqueue.
>>
>> I assume you've tried one workqueue, and suffered deadlock..
>>
>>> Granted, I'm not an expert in Linux workqueues, so if you've investigated this
>>> and determined that it's safe, can you explain why?
>>
>> I'm not familiar with workqueue... I guess it may not safe that if the work is
>> scheduled to the same cpu in where verity was waiting for data? if the work is
>> scheduled to other cpu, it may be safe.
>>
>> I can check that before splitting the workqueue for verity and decrypt/decompress.
>>
>
> Yes this is a real problem, try 'kvm-xfstests -c f2fs/encrypt generic/579'.
> The worker thread gets deadlocked in f2fs_read_merkle_tree_page() waiting for
> the Merkle tree page to be decrypted. This is with the v2 compression patch;
> it works fine on current mainline.

Oh, alright...

Let me split them. Thanks very much for all the comments and testing anyway.

Thanks,

>
> INFO: task kworker/u5:0:61 blocked for more than 30 seconds.
> Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u5:0 D 0 61 2 0x80004000
> Workqueue: f2fs_post_read_wq f2fs_post_read_work
> Call Trace:
> context_switch kernel/sched/core.c:3384 [inline]
> __schedule+0x299/0x6c0 kernel/sched/core.c:4069
> schedule+0x44/0xd0 kernel/sched/core.c:4136
> io_schedule+0x11/0x40 kernel/sched/core.c:5780
> wait_on_page_bit_common mm/filemap.c:1174 [inline]
> wait_on_page_bit mm/filemap.c:1223 [inline]
> wait_on_page_locked include/linux/pagemap.h:527 [inline]
> wait_on_page_locked include/linux/pagemap.h:524 [inline]
> wait_on_page_read mm/filemap.c:2767 [inline]
> do_read_cache_page+0x407/0x660 mm/filemap.c:2810
> read_cache_page+0xd/0x10 mm/filemap.c:2894
> f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
> verify_page+0x110/0x560 fs/verity/verify.c:120
> fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
> verity_work fs/f2fs/data.c:142 [inline]
> f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
> process_one_work+0x225/0x550 kernel/workqueue.c:2269
> worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
> kthread+0x125/0x140 kernel/kthread.c:255
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> INFO: task kworker/u5:1:1140 blocked for more than 30 seconds.
> Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u5:1 D 0 1140 2 0x80004000
> Workqueue: f2fs_post_read_wq f2fs_post_read_work
> Call Trace:
> context_switch kernel/sched/core.c:3384 [inline]
> __schedule+0x299/0x6c0 kernel/sched/core.c:4069
> schedule+0x44/0xd0 kernel/sched/core.c:4136
> io_schedule+0x11/0x40 kernel/sched/core.c:5780
> wait_on_page_bit_common mm/filemap.c:1174 [inline]
> wait_on_page_bit mm/filemap.c:1223 [inline]
> wait_on_page_locked include/linux/pagemap.h:527 [inline]
> wait_on_page_locked include/linux/pagemap.h:524 [inline]
> wait_on_page_read mm/filemap.c:2767 [inline]
> do_read_cache_page+0x407/0x660 mm/filemap.c:2810
> read_cache_page+0xd/0x10 mm/filemap.c:2894
> f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
> verify_page+0x110/0x560 fs/verity/verify.c:120
> fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
> verity_work fs/f2fs/data.c:142 [inline]
> f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
> process_one_work+0x225/0x550 kernel/workqueue.c:2269
> worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
> kthread+0x125/0x140 kernel/kthread.c:255
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
>
> Showing all locks held in the system:
> 1 lock held by khungtaskd/21:
> #0: ffffffff82250520 (rcu_read_lock){....}, at: rcu_lock_acquire.constprop.0+0x0/0x30 include/trace/events/lock.h:13
> 2 locks held by kworker/u5:0/61:
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> 2 locks held by kworker/u5:1/1140:
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> .
>

2019-10-31 15:36:58

Subject: Re: [PATCH 2/2] f2fs: support data compression

Hi Chao,

On 10/31, Chao Yu wrote:
> On 2019/10/31 0:50, Eric Biggers wrote:
> > No, just use kvmalloc(). The whole point of kvmalloc() is that it tries
> > kmalloc() and then falls back to vmalloc() if it fails.
>
> Okay, it's fine to me, let me fix this in another patch.

I've fixed some bugs (e.g., mmap). Please apply this in your next patch so that
I can continue to test the new version as early as possible.

With this patch, I could boot up a device and install some apps successfully
with "compress_extension=*".

---
fs/f2fs/compress.c | 229 +++++++++++++++++++++++----------------------
fs/f2fs/data.c | 109 +++++++++++++--------
fs/f2fs/f2fs.h | 22 +++--
fs/f2fs/file.c | 71 +++++++++-----
fs/f2fs/namei.c | 20 +++-
5 files changed, 264 insertions(+), 187 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index f276d82a67aa..e03d57396ea2 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -77,8 +77,9 @@ int f2fs_init_compress_ctx(struct compress_ctx *cc)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode);

- if (cc->rpages)
+ if (cc->nr_rpages)
return 0;
+
cc->rpages = f2fs_kzalloc(sbi, sizeof(struct page *) * cc->cluster_size,
GFP_KERNEL);
if (!cc->rpages)
@@ -88,7 +89,9 @@ int f2fs_init_compress_ctx(struct compress_ctx *cc)

void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
{
- kvfree(cc->rpages);
+ f2fs_reset_compress_ctx(cc);
+ WARN_ON(cc->nr_rpages);
+ kfree(cc->rpages);
}

int f2fs_compress_ctx_add_page(struct compress_ctx *cc, struct page *page)
@@ -224,16 +227,6 @@ static const struct f2fs_compress_ops f2fs_lz4_ops = {
.decompress_pages = lz4_decompress_pages,
};

-static void f2fs_release_cluster_pages(struct compress_ctx *cc)
-{
- int i;
-
- for (i = 0; i < cc->nr_rpages; i++) {
- inode_dec_dirty_pages(cc->inode);
- unlock_page(cc->rpages[i]);
- }
-}
-
static struct page *f2fs_grab_page(void)
{
struct page *page;
@@ -321,6 +314,7 @@ static int f2fs_compress_pages(struct compress_ctx *cc)
trace_f2fs_compress_pages_end(cc->inode, cc->cluster_idx,
cc->clen, ret);
return 0;
+
out_vunmap_cbuf:
vunmap(cc->cbuf);
out_vunmap_rbuf:
@@ -393,10 +387,9 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity)
vunmap(dic->rbuf);
out_free_dic:
f2fs_set_cluster_uptodate(dic->rpages, dic->cluster_size, ret, verity);
- f2fs_free_dic(dic);
-
trace_f2fs_decompress_pages_end(dic->inode, dic->cluster_idx,
dic->clen, ret);
+ f2fs_free_dic(dic);
}

static bool is_page_in_cluster(struct compress_ctx *cc, pgoff_t index)
@@ -443,51 +436,25 @@ static bool __cluster_may_compress(struct compress_ctx *cc)
return false;
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
return false;
- if (f2fs_is_drop_cache(cc->inode))
- return false;
- if (f2fs_is_volatile_file(cc->inode))
- return false;

offset = i_size & (PAGE_SIZE - 1);
if ((page->index > end_index) ||
(page->index == end_index && !offset))
return false;
+ if (page->index != start_idx_of_cluster(cc) + i)
+ return false;
}
return true;
}

-int f2fs_is_cluster_existed(struct compress_ctx *cc)
-{
- struct dnode_of_data dn;
- unsigned int start_idx = start_idx_of_cluster(cc);
- int ret;
- int i;
-
- set_new_dnode(&dn, cc->inode, NULL, NULL, 0);
- ret = f2fs_get_dnode_of_data(&dn, start_idx, LOOKUP_NODE);
- if (ret)
- return ret;
-
- for (i = 0; i < cc->cluster_size; i++, dn.ofs_in_node++) {
- block_t blkaddr = datablock_addr(dn.inode, dn.node_page,
- dn.ofs_in_node);
- if (blkaddr == COMPRESS_ADDR) {
- ret = 1;
- break;
- }
- if (__is_valid_data_blkaddr(blkaddr)) {
- ret = 2;
- break;
- }
- }
- f2fs_put_dnode(&dn);
- return ret;
-}
-
static bool cluster_may_compress(struct compress_ctx *cc)
{
if (!f2fs_compressed_file(cc->inode))
return false;
+ if (f2fs_is_atomic_file(cc->inode))
+ return false;
+ if (f2fs_is_mmap_file(cc->inode))
+ return false;
if (!f2fs_cluster_is_full(cc))
return false;
return __cluster_may_compress(cc);
@@ -495,19 +462,59 @@ static bool cluster_may_compress(struct compress_ctx *cc)

void f2fs_reset_compress_ctx(struct compress_ctx *cc)
{
- if (cc->rpages)
- memset(cc->rpages, 0, sizeof(struct page *) * cc->cluster_size);
cc->nr_rpages = 0;
cc->nr_cpages = 0;
cc->cluster_idx = NULL_CLUSTER;
}

+int is_compressed_cluster(struct compress_ctx *cc, pgoff_t index)
+{
+ struct dnode_of_data dn;
+ unsigned int start_idx = cluster_idx(cc, index) * cc->cluster_size;
+ int ret, i;
+
+ set_new_dnode(&dn, cc->inode, NULL, NULL, 0);
+ ret = f2fs_get_dnode_of_data(&dn, start_idx, LOOKUP_NODE);
+ if (ret) {
+ if (ret == -ENOENT)
+ ret = 0;
+ goto fail;
+ }
+ if (dn.data_blkaddr == COMPRESS_ADDR) {
+ ret = CLUSTER_IS_FULL;
+ for (i = 1; i < cc->cluster_size; i++) {
+ block_t blkaddr;
+
+ blkaddr = datablock_addr(dn.inode,
+ dn.node_page, dn.ofs_in_node + i);
+ if (blkaddr == NULL_ADDR) {
+ ret = CLUSTER_HAS_SPACE;
+ break;
+ }
+ }
+ }
+fail:
+ f2fs_put_dnode(&dn);
+ return ret;
+}
+
+int f2fs_is_compressed_cluster(struct inode *inode, pgoff_t index)
+{
+ struct compress_ctx cc = {
+ .inode = inode,
+ .cluster_size = F2FS_I(inode)->i_cluster_size,
+ };
+
+ return is_compressed_cluster(&cc, index);
+}
+
static void set_cluster_writeback(struct compress_ctx *cc)
{
int i;

for (i = 0; i < cc->cluster_size; i++)
- set_page_writeback(cc->rpages[i]);
+ if (cc->rpages[i])
+ set_page_writeback(cc->rpages[i]);
}

static void set_cluster_dirty(struct compress_ctx *cc)
@@ -515,17 +522,17 @@ static void set_cluster_dirty(struct compress_ctx *cc)
int i;

for (i = 0; i < cc->cluster_size; i++)
- set_page_dirty(cc->rpages[i]);
+ if (cc->rpages[i])
+ set_page_dirty(cc->rpages[i]);
}

-int f2fs_prepare_compress_overwrite(struct compress_ctx *cc,
- struct page **pagep, pgoff_t index,
- void **fsdata, bool prealloc)
+static int prepare_compress_overwrite(struct compress_ctx *cc,
+ struct page **pagep, pgoff_t index, void **fsdata,
+ bool prealloc)
{
- struct inode *inode = cc->inode;
- struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode);
struct bio *bio = NULL;
- struct address_space *mapping = inode->i_mapping;
+ struct address_space *mapping = cc->inode->i_mapping;
struct page *page;
struct dnode_of_data dn;
sector_t last_block_in_bio;
@@ -586,13 +593,12 @@ int f2fs_prepare_compress_overwrite(struct compress_ctx *cc,
}
goto retry;
}
-
}

if (prealloc) {
__do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);

- set_new_dnode(&dn, inode, NULL, NULL, 0);
+ set_new_dnode(&dn, cc->inode, NULL, NULL, 0);

for (i = cc->cluster_size - 1; i > 0; i--) {
ret = f2fs_get_block(&dn, start_idx + i);
@@ -609,7 +615,8 @@ int f2fs_prepare_compress_overwrite(struct compress_ctx *cc,

*fsdata = cc->rpages;
*pagep = cc->rpages[offset_in_cluster(cc, index)];
- return 0;
+ return CLUSTER_IS_FULL;
+
unlock_pages:
for (idx = 0; idx < i; idx++) {
if (cc->rpages[idx])
@@ -626,13 +633,34 @@ int f2fs_prepare_compress_overwrite(struct compress_ctx *cc,
return ret;
}

-void f2fs_compress_write_end(struct inode *inode, void *fsdata,
- bool written)
+int f2fs_prepare_compress_overwrite(struct inode *inode,
+ struct page **pagep, pgoff_t index, void **fsdata)
+{
+ struct compress_ctx cc = {
+ .inode = inode,
+ .cluster_size = F2FS_I(inode)->i_cluster_size,
+ .cluster_idx = NULL_CLUSTER,
+ .rpages = NULL,
+ .nr_rpages = 0,
+ };
+ int ret = is_compressed_cluster(&cc, index);
+
+ if (ret <= 0)
+ return ret;
+
+ /* compressed case */
+ return prepare_compress_overwrite(&cc, pagep, index,
+ fsdata, ret == CLUSTER_HAS_SPACE);
+}
+
+bool f2fs_compress_write_end(struct inode *inode, void *fsdata,
+ pgoff_t index, bool written)
{
struct compress_ctx cc = {
.cluster_size = F2FS_I(inode)->i_cluster_size,
.rpages = fsdata,
};
+ bool first_index = (index == cc.rpages[0]->index);
int i;

if (written)
@@ -640,6 +668,11 @@ void f2fs_compress_write_end(struct inode *inode, void *fsdata,

for (i = 0; i < cc.cluster_size; i++)
f2fs_put_page(cc.rpages[i], 1);
+
+ f2fs_destroy_compress_ctx(&cc);
+
+ return first_index;
+
}

static int f2fs_write_compressed_pages(struct compress_ctx *cc,
@@ -723,6 +756,8 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,

blkaddr = datablock_addr(dn.inode, dn.node_page,
dn.ofs_in_node);
+ fio.page = cc->rpages[i];
+ fio.old_blkaddr = blkaddr;

/* cluster header */
if (i == 0) {
@@ -731,7 +766,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
if (__is_valid_data_blkaddr(blkaddr))
f2fs_invalidate_blocks(sbi, blkaddr);
f2fs_update_data_blkaddr(&dn, COMPRESS_ADDR);
- continue;
+ goto unlock_continue;
}

if (pre_compressed_blocks && __is_valid_data_blkaddr(blkaddr))
@@ -742,13 +777,11 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
f2fs_invalidate_blocks(sbi, blkaddr);
f2fs_update_data_blkaddr(&dn, NEW_ADDR);
}
- continue;
+ goto unlock_continue;
}

f2fs_bug_on(fio.sbi, blkaddr == NULL_ADDR);

- fio.page = cc->rpages[i];
- fio.old_blkaddr = blkaddr;

if (fio.encrypted)
fio.encrypted_page = cc->cpages[i - 1];
@@ -759,6 +792,9 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
cc->cpages[i - 1] = NULL;
f2fs_outplace_write_data(&dn, &fio);
(*submitted)++;
+unlock_continue:
+ inode_dec_dirty_pages(cc->inode);
+ unlock_page(fio.page);
}

if (pre_compressed_blocks) {
@@ -778,10 +814,6 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
f2fs_put_dnode(&dn);
f2fs_unlock_op(sbi);

- f2fs_release_cluster_pages(cc);
-
- cc->rpages = NULL;
-
if (err) {
file_set_keep_isize(inode);
} else {
@@ -791,6 +823,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
up_write(&fi->i_sem);
}
return 0;
+
out_destroy_crypt:
for (i -= 1; i >= 0; i--)
fscrypt_finalize_bounce_page(&cc->cpages[i]);
@@ -824,12 +857,13 @@ void f2fs_compress_write_end_io(struct bio *bio, struct page *page)
return;

for (i = 0; i < cic->nr_rpages; i++) {
+ WARN_ON(!cic->rpages[i]);
clear_cold_data(cic->rpages[i]);
end_page_writeback(cic->rpages[i]);
}

- kvfree(cic->rpages);
- kvfree(cic);
+ kfree(cic->rpages);
+ kfree(cic);
}

static int f2fs_write_raw_pages(struct compress_ctx *cc,
@@ -843,6 +877,7 @@ static int f2fs_write_raw_pages(struct compress_ctx *cc,
for (i = 0; i < cc->cluster_size; i++) {
if (!cc->rpages[i])
continue;
+ BUG_ON(!PageLocked(cc->rpages[i]));
ret = f2fs_write_single_data_page(cc->rpages[i], &_submitted,
NULL, NULL, wbc, io_type);
if (ret) {
@@ -855,9 +890,10 @@ static int f2fs_write_raw_pages(struct compress_ctx *cc,
*submitted += _submitted;
}
return 0;
+
out_fail:
/* TODO: revoke partially updated block addresses */
- for (i += 1; i < cc->cluster_size; i++) {
+ for (++i; i < cc->cluster_size; i++) {
if (!cc->rpages[i])
continue;
redirty_page_for_writepage(wbc, cc->rpages[i]);
@@ -890,9 +926,14 @@ int f2fs_write_multi_pages(struct compress_ctx *cc,
}
write:
if (err == -EAGAIN) {
+ bool compressed = false;
+
f2fs_bug_on(F2FS_I_SB(cc->inode), *submitted);
+ if (is_compressed_cluster(cc, start_idx_of_cluster(cc)))
+ compressed = true;
+
err = f2fs_write_raw_pages(cc, submitted, wbc, io_type);
- if (f2fs_is_cluster_existed(cc) == 1) {
+ if (compressed) {
stat_sub_compr_blocks(cc->inode, *submitted);
F2FS_I(cc->inode)->i_compressed_blocks -= *submitted;
f2fs_mark_inode_dirty_sync(cc->inode, true);
@@ -902,37 +943,6 @@ int f2fs_write_multi_pages(struct compress_ctx *cc,
return err;
}

-int f2fs_is_compressed_cluster(struct compress_ctx *cc, pgoff_t index)
-{
- struct dnode_of_data dn;
- unsigned int start_idx = cluster_idx(cc, index) * cc->cluster_size;
- int ret, i;
-
- set_new_dnode(&dn, cc->inode, NULL, NULL, 0);
- ret = f2fs_get_dnode_of_data(&dn, start_idx, LOOKUP_NODE);
- if (ret) {
- if (ret == -ENOENT)
- ret = 0;
- goto fail;
- }
- if (dn.data_blkaddr == COMPRESS_ADDR) {
- ret = CLUSTER_IS_FULL;
- for (i = 1; i < cc->cluster_size; i++) {
- block_t blkaddr;
-
- blkaddr = datablock_addr(dn.inode,
- dn.node_page, dn.ofs_in_node + i);
- if (blkaddr == NULL_ADDR) {
- ret = CLUSTER_HAS_SPACE;
- break;
- }
- }
- }
-fail:
- f2fs_put_dnode(&dn);
- return ret;
-}
-
struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode);
@@ -991,9 +1001,8 @@ struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc)

dic->rpages = cc->rpages;
dic->nr_rpages = cc->cluster_size;
-
- cc->rpages = NULL;
return dic;
+
out_free:
f2fs_free_dic(dic);
out:
@@ -1011,7 +1020,7 @@ void f2fs_free_dic(struct decompress_io_ctx *dic)
unlock_page(dic->tpages[i]);
put_page(dic->tpages[i]);
}
- kvfree(dic->tpages);
+ kfree(dic->tpages);
}

if (dic->cpages) {
@@ -1020,11 +1029,11 @@ void f2fs_free_dic(struct decompress_io_ctx *dic)
continue;
f2fs_put_compressed_page(dic->cpages[i]);
}
- kvfree(dic->cpages);
+ kfree(dic->cpages);
}

- kvfree(dic->rpages);
- kvfree(dic);
+ kfree(dic->rpages);
+ kfree(dic);
}

void f2fs_set_cluster_uptodate(struct page **rpages,
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index bac96c3a8bc9..b8e0431747b1 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1925,18 +1925,18 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;

/* get rid of pages beyond EOF */
- for (i = cc->nr_rpages - 1; i >= 0; i--) {
+ for (i = 0; i < cc->cluster_size; i++) {
struct page *page = cc->rpages[i];

if (!page)
continue;
- if ((sector_t)page->index < last_block_in_file)
- break;
-
- zero_user_segment(page, 0, PAGE_SIZE);
- if (!PageUptodate(page))
- SetPageUptodate(page);
-
+ if ((sector_t)page->index >= last_block_in_file) {
+ zero_user_segment(page, 0, PAGE_SIZE);
+ if (!PageUptodate(page))
+ SetPageUptodate(page);
+ } else if (!PageUptodate(page)) {
+ continue;
+ }
unlock_page(page);
cc->rpages[i] = NULL;
cc->nr_rpages--;
@@ -2031,6 +2031,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
f2fs_reset_compress_ctx(cc);
*bio_ret = bio;
return 0;
+
out_put_dnode:
f2fs_put_dnode(&dn);
out:
@@ -2100,7 +2101,7 @@ int f2fs_mpage_readpages(struct address_space *mapping,
if (ret)
goto set_error_page;
}
- ret = f2fs_is_compressed_cluster(&cc, page->index);
+ ret = f2fs_is_compressed_cluster(inode, page->index);
if (ret < 0)
goto set_error_page;
else if (!ret)
@@ -2457,7 +2458,8 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
goto redirty_out;

- if (page->index < end_index || f2fs_verity_in_progress(inode))
+ if (f2fs_compressed_file(inode) ||
+ page->index < end_index || f2fs_verity_in_progress(inode))
goto write;

/*
@@ -2533,7 +2535,6 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
f2fs_remove_dirty_inode(inode);
submitted = NULL;
}
-
unlock_page(page);
if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode) &&
!F2FS_I(inode)->cp_task)
@@ -2567,6 +2568,15 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
static int f2fs_write_data_page(struct page *page,
struct writeback_control *wbc)
{
+ struct inode *inode = page->mapping->host;
+
+ if (f2fs_compressed_file(inode)) {
+ if (f2fs_is_compressed_cluster(inode, page->index)) {
+ redirty_page_for_writepage(wbc, page);
+ return AOP_WRITEPAGE_ACTIVATE;
+ }
+ }
+
return f2fs_write_single_data_page(page, NULL, NULL, NULL,
wbc, FS_DATA_IO);
}
@@ -2581,7 +2591,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
enum iostat_type io_type)
{
int ret = 0;
- int done = 0;
+ int done = 0, retry = 0;
struct pagevec pvec;
struct f2fs_sb_info *sbi = F2FS_M_SB(mapping);
struct bio *bio = NULL;
@@ -2639,10 +2649,11 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
else
tag = PAGECACHE_TAG_DIRTY;
retry:
+ retry = 0;
if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
tag_pages_for_writeback(mapping, index, end);
done_index = index;
- while (!done && (index <= end)) {
+ while (!done && !retry && (index <= end)) {
nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
tag);
if (nr_pages == 0)
@@ -2650,25 +2661,42 @@ static int f2fs_write_cache_pages(struct address_space *mapping,

for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
- bool need_readd = false;
-
+ bool need_readd;
readd:
+ need_readd = false;
if (f2fs_compressed_file(inode)) {
+ void *fsdata = NULL;
+ struct page *pagep;
+ int ret2;
+
ret = f2fs_init_compress_ctx(&cc);
if (ret) {
done = 1;
break;
}

- if (!f2fs_cluster_can_merge_page(&cc,
- page->index)) {
- need_readd = true;
+ if (!f2fs_cluster_can_merge_page(&cc, page->index)) {
ret = f2fs_write_multi_pages(&cc,
- &submitted, wbc, io_type);
+ &submitted, wbc, io_type);
+ if (!ret)
+ need_readd = true;
goto result;
}
+ if (f2fs_cluster_is_empty(&cc)) {
+ ret2 = f2fs_prepare_compress_overwrite(inode,
+ &pagep, page->index, &fsdata);
+ if (ret2 < 0) {
+ ret = ret2;
+ done = 1;
+ break;
+ } else if (ret2 &&
+ !f2fs_compress_write_end(inode, fsdata,
+ page->index, true)) {
+ retry = 1;
+ break;
+ }
+ }
}
-
/* give a priority to WB_SYNC threads */
if (atomic_read(&sbi->wb_sync_req[DATA]) &&
wbc->sync_mode == WB_SYNC_NONE) {
@@ -2702,7 +2730,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
if (!clear_page_dirty_for_io(page))
goto continue_unlock;

- if (f2fs_compressed_file(mapping->host)) {
+ if (f2fs_compressed_file(inode)) {
ret = f2fs_compress_ctx_add_page(&cc, page);
f2fs_bug_on(sbi, ret);
continue;
@@ -2754,7 +2782,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
/* TODO: error handling */
}

- if (!cycled && !done) {
+ if ((!cycled && !done) || retry) {
cycled = 1;
index = 0;
end = writeback_index - 1;
@@ -2770,8 +2798,6 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
if (bio)
f2fs_submit_merged_ipu_write(sbi, &bio, NULL);

- f2fs_destroy_compress_ctx(&cc);
-
return ret;
}

@@ -3017,26 +3043,18 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
}

if (f2fs_compressed_file(inode)) {
- struct compress_ctx cc = {
- .inode = inode,
- .cluster_size = F2FS_I(inode)->i_cluster_size,
- .cluster_idx = NULL_CLUSTER,
- .rpages = NULL,
- .nr_rpages = 0,
- };
+ int ret;

*fsdata = NULL;

- err = f2fs_is_compressed_cluster(&cc, index);
- if (err < 0)
+ ret = f2fs_prepare_compress_overwrite(inode, pagep,
+ index, fsdata);
+ if (ret < 0) {
+ err = ret;
goto fail;
- if (!err)
- goto repeat;
-
- err = f2fs_prepare_compress_overwrite(&cc, pagep, index, fsdata,
- err == CLUSTER_HAS_SPACE);
- /* need to goto fail? */
- return err;
+ } else if (ret) {
+ return 0;
+ }
}

repeat:
@@ -3139,7 +3157,7 @@ static int f2fs_write_end(struct file *file,

/* overwrite compressed file */
if (f2fs_compressed_file(inode) && fsdata) {
- f2fs_compress_write_end(inode, fsdata, copied);
+ f2fs_compress_write_end(inode, fsdata, page->index, copied);
goto update_time;
}

@@ -3534,6 +3552,15 @@ static int f2fs_swap_activate(struct swap_info_struct *sis, struct file *file,
if (ret)
return ret;

+ if (f2fs_compressed_file(inode)) {
+ if (F2FS_I(inode)->i_compressed_blocks)
+ return -EINVAL;
+
+ F2FS_I(inode)->i_flags &= ~FS_COMPR_FL;
+ clear_inode_flag(inode, FI_COMPRESSED_FILE);
+ stat_dec_compr_inode(inode);
+ }
+
ret = check_swap_activate(file, sis->max);
if (ret)
return ret;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index d22a4e2bb8b8..9c3399fdd6c1 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2541,6 +2541,7 @@ enum {
FI_ATOMIC_REVOKE_REQUEST, /* request to drop atomic data */
FI_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
FI_COMPRESSED_FILE, /* indicate file's data can be compressed */
+ FI_MMAP_FILE, /* indicate file was mmapped */
};

static inline void __mark_inode_dirty_flag(struct inode *inode,
@@ -2766,6 +2767,11 @@ static inline int f2fs_has_inline_dots(struct inode *inode)
return is_inode_flag_set(inode, FI_INLINE_DOTS);
}

+static inline int f2fs_is_mmap_file(struct inode *inode)
+{
+ return is_inode_flag_set(inode, FI_MMAP_FILE);
+}
+
static inline bool f2fs_is_pinned_file(struct inode *inode)
{
return is_inode_flag_set(inode, FI_PIN_FILE);
@@ -3609,7 +3615,7 @@ void f2fs_destroy_root_stats(void);
#define stat_inc_atomic_write(inode) do { } while (0)
#define stat_dec_atomic_write(inode) do { } while (0)
#define stat_inc_compr_blocks(inode) do { } while (0)
-#define stat_dec_compr_blocks(inode) do { } while (0)
+#define stat_sub_compr_blocks(inode) do { } while (0)
#define stat_update_max_atomic_write(inode) do { } while (0)
#define stat_inc_volatile_write(inode) do { } while (0)
#define stat_dec_volatile_write(inode) do { } while (0)
@@ -3755,13 +3761,13 @@ static inline bool f2fs_post_read_required(struct inode *inode)
* compress.c
*/
bool f2fs_is_compressed_page(struct page *page);
+int is_compressed_cluster(struct compress_ctx *cc, pgoff_t index);
struct page *f2fs_compress_control_page(struct page *page);
void f2fs_reset_compress_ctx(struct compress_ctx *cc);
-int f2fs_prepare_compress_overwrite(struct compress_ctx *cc,
- struct page **page_ret, pgoff_t index,
- void **fsdata, bool prealloc);
-void f2fs_compress_write_end(struct inode *inode, void *fsdata,
- bool written);
+int f2fs_prepare_compress_overwrite(struct inode *inode,
+ struct page **pagep, pgoff_t index, void **fsdata);
+bool f2fs_compress_write_end(struct inode *inode, void *fsdata,
+ pgoff_t index, bool written);
void f2fs_compress_write_end_io(struct bio *bio, struct page *page);
void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity);
bool f2fs_cluster_is_empty(struct compress_ctx *cc);
@@ -3771,7 +3777,7 @@ int f2fs_write_multi_pages(struct compress_ctx *cc,
int *submitted,
struct writeback_control *wbc,
enum iostat_type io_type);
-int f2fs_is_compressed_cluster(struct compress_ctx *cc, pgoff_t index);
+int f2fs_is_compressed_cluster(struct inode *inode, pgoff_t index);
int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
unsigned nr_pages, sector_t *last_block_in_bio,
bool is_readahead);
@@ -3923,6 +3929,8 @@ static inline bool f2fs_force_buffered_io(struct inode *inode,
return true;
if (f2fs_is_multi_device(sbi))
return true;
+ if (f2fs_compressed_file(inode))
+ return true;
/*
* for blkzoned device, fallback direct IO to buffered IO, so
* all IOs can be serialized by log-structured write.
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 8a92e8fd648c..99380c419b87 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -51,7 +51,8 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf)
struct inode *inode = file_inode(vmf->vma->vm_file);
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct dnode_of_data dn = { .node_changed = false };
- int err;
+ bool need_alloc = true;
+ int err = 0;

if (unlikely(f2fs_cp_error(sbi))) {
err = -EIO;
@@ -63,6 +64,18 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf)
goto err;
}

+ if (f2fs_compressed_file(inode)) {
+ int ret = f2fs_is_compressed_cluster(inode, page->index);
+
+ if (ret < 0) {
+ err = ret;
+ goto err;
+ } else if (ret) {
+ f2fs_bug_on(sbi, ret == CLUSTER_HAS_SPACE);
+ need_alloc = false;
+ }
+ }
+
sb_start_pagefault(inode->i_sb);

f2fs_bug_on(sbi, f2fs_has_inline_data(inode));
@@ -78,15 +91,17 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf)
goto out_sem;
}

- /* block allocation */
- __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
- set_new_dnode(&dn, inode, NULL, NULL, 0);
- err = f2fs_get_block(&dn, page->index);
- f2fs_put_dnode(&dn);
- __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
- if (err) {
- unlock_page(page);
- goto out_sem;
+ if (need_alloc) {
+ /* block allocation */
+ __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ err = f2fs_get_block(&dn, page->index);
+ f2fs_put_dnode(&dn);
+ __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
+ if (err) {
+ unlock_page(page);
+ goto out_sem;
+ }
}

/* fill the page */
@@ -492,6 +507,7 @@ static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma)

file_accessed(file);
vma->vm_ops = &f2fs_file_vm_ops;
+ set_inode_flag(inode, FI_MMAP_FILE);
return 0;
}

@@ -1781,8 +1797,18 @@ static int f2fs_setflags_common(struct inode *inode, u32 iflags, u32 mask)
return -EINVAL;
if (iflags & FS_NOCOMP_FL)
return -EINVAL;
- if (S_ISREG(inode->i_mode))
- clear_inode_flag(inode, FI_INLINE_DATA);
+ if (fi->i_flags & FS_COMPR_FL) {
+ int err = f2fs_convert_inline_inode(inode);
+
+ if (err)
+ return err;
+
+ if (!f2fs_may_compress(inode))
+ return -EINVAL;
+
+ set_inode_flag(inode, FI_COMPRESSED_FILE);
+ stat_inc_compr_inode(inode);
+ }
}
if ((iflags ^ fi->i_flags) & FS_NOCOMP_FL) {
if (fi->i_flags & FS_COMPR_FL)
@@ -1793,19 +1819,6 @@ static int f2fs_setflags_common(struct inode *inode, u32 iflags, u32 mask)
f2fs_bug_on(F2FS_I_SB(inode), (fi->i_flags & FS_COMPR_FL) &&
(fi->i_flags & FS_NOCOMP_FL));

- if (fi->i_flags & FS_COMPR_FL) {
- int err = f2fs_convert_inline_inode(inode);
-
- if (err)
- return err;
-
- if (!f2fs_may_compress(inode))
- return -EINVAL;
-
- set_inode_flag(inode, FI_COMPRESSED_FILE);
- stat_inc_compr_inode(inode);
- }
-
if (fi->i_flags & F2FS_PROJINHERIT_FL)
set_inode_flag(inode, FI_PROJ_INHERIT);
else
@@ -1988,6 +2001,12 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)

inode_lock(inode);

+ if (f2fs_compressed_file(inode) && !fi->i_compressed_blocks) {
+ fi->i_flags &= ~FS_COMPR_FL;
+ clear_inode_flag(inode, FI_COMPRESSED_FILE);
+ stat_dec_compr_inode(inode);
+ }
+
if (f2fs_is_atomic_file(inode)) {
if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST))
ret = -EINVAL;
@@ -3190,7 +3209,7 @@ static int f2fs_ioc_set_pin_file(struct file *filp, unsigned long arg)
}

if (f2fs_compressed_file(inode)) {
- if (F2FS_HAS_BLOCKS(inode) || i_size_read(inode)) {
+ if (F2FS_I(inode)->i_compressed_blocks) {
ret = -EOPNOTSUPP;
goto out;
}
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 9f37e95c4a4b..ac0c51cefca2 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -128,9 +128,11 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
1 << F2FS_I(inode)->i_log_cluster_size;

/* Inherit the compression flag in directory */
- if ((F2FS_I(inode)->i_flags & FS_COMPR_FL) &&
- f2fs_may_compress(inode))
+ if ((F2FS_I(dir)->i_flags & FS_COMPR_FL) &&
+ f2fs_may_compress(inode)) {
+ F2FS_I(inode)->i_flags |= F2FS_COMPR_FL;
set_inode_flag(inode, FI_COMPRESSED_FILE);
+ }
}

f2fs_set_inode_flags(inode);
@@ -282,6 +284,7 @@ int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name,
static void set_compress_inode(struct f2fs_sb_info *sbi, struct inode *inode,
const unsigned char *name)
{
+ __u8 (*extlist)[F2FS_EXTENSION_LEN] = sbi->raw_super->extension_list;
unsigned char (*ext)[F2FS_EXTENSION_LEN];
unsigned int ext_cnt = F2FS_OPTION(sbi).compress_ext_cnt;
int i, cold_count, hot_count;
@@ -292,13 +295,24 @@ static void set_compress_inode(struct f2fs_sb_info *sbi, struct inode *inode,
!f2fs_may_compress(inode))
return;

+ down_read(&sbi->sb_lock);
+
ext = F2FS_OPTION(sbi).extensions;

cold_count = le32_to_cpu(sbi->raw_super->extension_count);
hot_count = sbi->raw_super->hot_ext_count;

+ for (i = cold_count; i < cold_count + hot_count; i++) {
+ if (is_extension_exist(name, extlist[i])) {
+ up_read(&sbi->sb_lock);
+ return;
+ }
+ }
+
+ up_read(&sbi->sb_lock);
+
for (i = 0; i < ext_cnt; i++) {
- if (is_extension_exist(name, ext[i]) && !file_is_hot(inode)) {
+ if (is_extension_exist(name, ext[i])) {
F2FS_I(inode)->i_flags |= F2FS_COMPR_FL;
set_inode_flag(inode, FI_COMPRESSED_FILE);
return;
--
2.19.0.605.g01d371f741-goog
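
The cluster check that this patch moves into is_compressed_cluster() can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: it assumes the block addresses of one cluster are already available as a plain array, whereas the real function reads them from the dnode. classify_cluster() and every MODEL_* name are hypothetical, and the numeric values chosen for the address markers and return codes are assumptions made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for NULL_ADDR and COMPRESS_ADDR (values are illustrative). */
#define MODEL_NULL_ADDR     ((uint32_t)0)
#define MODEL_COMPRESS_ADDR ((uint32_t)-2)

/* Assumed stand-ins for the return codes used in the patch. */
enum model_cluster_state {
	MODEL_CLUSTER_NORMAL = 0,	/* first slot is not the compress flag */
	MODEL_CLUSTER_IS_FULL = 1,	/* compressed, no NULL_ADDR slot left */
	MODEL_CLUSTER_HAS_SPACE = 2,	/* compressed, at least one NULL_ADDR slot */
};

/*
 * Classify one cluster from its block addresses. Slot 0 carries the
 * compress flag; the remaining slots are scanned for a NULL_ADDR entry,
 * in the same order as the loop in is_compressed_cluster().
 */
static int classify_cluster(const uint32_t *blkaddrs, size_t cluster_size)
{
	size_t i;

	if (cluster_size == 0 || blkaddrs[0] != MODEL_COMPRESS_ADDR)
		return MODEL_CLUSTER_NORMAL;

	for (i = 1; i < cluster_size; i++) {
		if (blkaddrs[i] == MODEL_NULL_ADDR)
			return MODEL_CLUSTER_HAS_SPACE;
	}
	return MODEL_CLUSTER_IS_FULL;
}
```

In the patch, a "has space" result is what makes f2fs_prepare_compress_overwrite() pass prealloc = true to prepare_compress_overwrite().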

2019-11-01 11:25:40

Subject: Re: [PATCH 2/2] f2fs: support data compression

Hi Jaegeuk,

On 2019/10/31 23:35, Jaegeuk Kim wrote:
> Hi Chao,
>
> On 10/31, Chao Yu wrote:
>> On 2019/10/31 0:50, Eric Biggers wrote:
>>> No, just use kvmalloc(). The whole point of kvmalloc() is that it tries
>>> kmalloc() and then falls back to vmalloc() if it fails.
>>
>> Okay, it's fine to me, let me fix this in another patch.
>
> I've fixed some bugs. (e.g., mmap) Please apply this in your next patch, so that
> I can continue to test new version as early as possible.

Applied, with some fixes as noted in the comments below.

>
> With this patch, I could boot up a device and install some apps successfully
> with "compress_extension=*".

Ah, '*' can put heavy pressure on the compression paths.

>
> ---
> fs/f2fs/compress.c | 229 +++++++++++++++++++++++----------------------
> fs/f2fs/data.c | 109 +++++++++++++--------
> fs/f2fs/f2fs.h | 22 +++--
> fs/f2fs/file.c | 71 +++++++++-----
> fs/f2fs/namei.c | 20 +++-
> 5 files changed, 264 insertions(+), 187 deletions(-)
>
> diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> index f276d82a67aa..e03d57396ea2 100644
> --- a/fs/f2fs/compress.c
> +++ b/fs/f2fs/compress.c
> @@ -77,8 +77,9 @@ int f2fs_init_compress_ctx(struct compress_ctx *cc)
> {
> struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode);
>
> - if (cc->rpages)
> + if (cc->nr_rpages)
> return 0;
> +
> cc->rpages = f2fs_kzalloc(sbi, sizeof(struct page *) * cc->cluster_size,
> GFP_KERNEL);
> if (!cc->rpages)
> @@ -88,7 +89,9 @@ int f2fs_init_compress_ctx(struct compress_ctx *cc)
>
> void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
> {
> - kvfree(cc->rpages);
> + f2fs_reset_compress_ctx(cc);
> + WARN_ON(cc->nr_rpages);

f2fs_reset_compress_ctx() resets cc->nr_rpages to zero, so I removed it for now.
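
To make the point concrete, here is a small userspace sketch of the ordering problem. It is only a model under stated assumptions: struct ctx_model, reset_ctx_model() and both helper functions are hypothetical stand-ins, and the "check before reset" variant is shown purely for contrast, not as the fix that was applied.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NULL_CLUSTER (-1L)	/* assumed stand-in for NULL_CLUSTER */

/* Hypothetical, reduced stand-in for struct compress_ctx. */
struct ctx_model {
	unsigned int nr_rpages;	/* raw pages currently held */
	unsigned int nr_cpages;	/* compressed pages currently held */
	long cluster_idx;
};

/* Mirrors the patched f2fs_reset_compress_ctx(): counters are cleared. */
static void reset_ctx_model(struct ctx_model *cc)
{
	cc->nr_rpages = 0;
	cc->nr_cpages = 0;
	cc->cluster_idx = MODEL_NULL_CLUSTER;
}

/* Reset first, then check: the condition can never be true. */
static bool warn_after_reset_would_fire(struct ctx_model *cc)
{
	reset_ctx_model(cc);
	return cc->nr_rpages != 0;
}

/* Check first, then reset: leftover pages are still visible to the check. */
static bool warn_before_reset_would_fire(struct ctx_model *cc)
{
	bool fired = cc->nr_rpages != 0;

	reset_ctx_model(cc);
	return fired;
}
```

With the reset placed first, the warning is dead code regardless of how many pages the context still held.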

> + kfree(cc->rpages);
> }
>
> int f2fs_compress_ctx_add_page(struct compress_ctx *cc, struct page *page)
> @@ -224,16 +227,6 @@ static const struct f2fs_compress_ops f2fs_lz4_ops = {
> .decompress_pages = lz4_decompress_pages,
> };
>
> -static void f2fs_release_cluster_pages(struct compress_ctx *cc)
> -{
> - int i;
> -
> - for (i = 0; i < cc->nr_rpages; i++) {
> - inode_dec_dirty_pages(cc->inode);
> - unlock_page(cc->rpages[i]);
> - }
> -}
> -
> static struct page *f2fs_grab_page(void)
> {
> struct page *page;
> @@ -321,6 +314,7 @@ static int f2fs_compress_pages(struct compress_ctx *cc)
> trace_f2fs_compress_pages_end(cc->inode, cc->cluster_idx,
> cc->clen, ret);
> return 0;
> +
> out_vunmap_cbuf:
> vunmap(cc->cbuf);
> out_vunmap_rbuf:
> @@ -393,10 +387,9 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity)
> vunmap(dic->rbuf);
> out_free_dic:
> f2fs_set_cluster_uptodate(dic->rpages, dic->cluster_size, ret, verity);
> - f2fs_free_dic(dic);
> -
> trace_f2fs_decompress_pages_end(dic->inode, dic->cluster_idx,
> dic->clen, ret);
> + f2fs_free_dic(dic);
> }
>
> static bool is_page_in_cluster(struct compress_ctx *cc, pgoff_t index)
> @@ -443,51 +436,25 @@ static bool __cluster_may_compress(struct compress_ctx *cc)
> return false;
> if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> return false;
> - if (f2fs_is_drop_cache(cc->inode))
> - return false;
> - if (f2fs_is_volatile_file(cc->inode))
> - return false;
>
> offset = i_size & (PAGE_SIZE - 1);
> if ((page->index > end_index) ||
> (page->index == end_index && !offset))
> return false;
> + if (page->index != start_idx_of_cluster(cc) + i)
> + return false;

Should this be a bug?
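
For reference, the condition in question can be sketched in userspace as below. This is an illustration only: it assumes the page indices of the cluster are given as a plain array, and cluster_start() and cluster_pages_in_order() are hypothetical helpers, not f2fs functions. It models the "return false" behaviour of the hunk above; whether a mismatch should instead be treated as a bug is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* First page index of the cluster that contains `index`. */
static unsigned long cluster_start(unsigned long index,
				   unsigned int cluster_size)
{
	return (index / cluster_size) * cluster_size;
}

/*
 * Return true only if slot i holds page index (cluster start + i) for
 * every slot, i.e. the pages are contiguous and aligned to the cluster.
 */
static bool cluster_pages_in_order(const unsigned long *indices,
				   unsigned int cluster_size)
{
	unsigned long start;
	unsigned int i;

	if (cluster_size == 0)
		return false;

	start = cluster_start(indices[0], cluster_size);
	for (i = 0; i < cluster_size; i++) {
		if (indices[i] != start + i)
			return false;
	}
	return true;
}
```

A cluster with a gap, or one whose first page is not aligned to the cluster size, is rejected by this model rather than reported as an error.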

> }
> return true;
> }
>
> -int f2fs_is_cluster_existed(struct compress_ctx *cc)
> -{
> - struct dnode_of_data dn;
> - unsigned int start_idx = start_idx_of_cluster(cc);
> - int ret;
> - int i;
> -
> - set_new_dnode(&dn, cc->inode, NULL, NULL, 0);
> - ret = f2fs_get_dnode_of_data(&dn, start_idx, LOOKUP_NODE);
> - if (ret)
> - return ret;
> -
> - for (i = 0; i < cc->cluster_size; i++, dn.ofs_in_node++) {
> - block_t blkaddr = datablock_addr(dn.inode, dn.node_page,
> - dn.ofs_in_node);
> - if (blkaddr == COMPRESS_ADDR) {
> - ret = 1;
> - break;
> - }
> - if (__is_valid_data_blkaddr(blkaddr)) {
> - ret = 2;
> - break;
> - }
> - }
> - f2fs_put_dnode(&dn);
> - return ret;
> -}
> -
> static bool cluster_may_compress(struct compress_ctx *cc)
> {
> if (!f2fs_compressed_file(cc->inode))
> return false;
> + if (f2fs_is_atomic_file(cc->inode))
> + return false;
> + if (f2fs_is_mmap_file(cc->inode))
> + return false;
> if (!f2fs_cluster_is_full(cc))
> return false;
> return __cluster_may_compress(cc);
> @@ -495,19 +462,59 @@ static bool cluster_may_compress(struct compress_ctx *cc)
>
> void f2fs_reset_compress_ctx(struct compress_ctx *cc)
> {
> - if (cc->rpages)
> - memset(cc->rpages, 0, sizeof(struct page *) * cc->cluster_size);
> cc->nr_rpages = 0;
> cc->nr_cpages = 0;
> cc->cluster_idx = NULL_CLUSTER;
> }
>
> +int is_compressed_cluster(struct compress_ctx *cc, pgoff_t index)
> +{
> + struct dnode_of_data dn;
> + unsigned int start_idx = cluster_idx(cc, index) * cc->cluster_size;
> + int ret, i;
> +
> + set_new_dnode(&dn, cc->inode, NULL, NULL, 0);
> + ret = f2fs_get_dnode_of_data(&dn, start_idx, LOOKUP_NODE);
> + if (ret) {
> + if (ret == -ENOENT)
> + ret = 0;
> + goto fail;
> + }
> + if (dn.data_blkaddr == COMPRESS_ADDR) {
> + ret = CLUSTER_IS_FULL;
> + for (i = 1; i < cc->cluster_size; i++) {
> + block_t blkaddr;
> +
> + blkaddr = datablock_addr(dn.inode,
> + dn.node_page, dn.ofs_in_node + i);
> + if (blkaddr == NULL_ADDR) {
> + ret = CLUSTER_HAS_SPACE;
> + break;
> + }
> + }
> + }
> +fail:
> + f2fs_put_dnode(&dn);
> + return ret;
> +}
> +
> +int f2fs_is_compressed_cluster(struct inode *inode, pgoff_t index)
> +{
> + struct compress_ctx cc = {
> + .inode = inode,
> + .cluster_size = F2FS_I(inode)->i_cluster_size,
> + };
> +
> + return is_compressed_cluster(&cc, index);
> +}
> +
> static void set_cluster_writeback(struct compress_ctx *cc)
> {
> int i;
>
> for (i = 0; i < cc->cluster_size; i++)
> - set_page_writeback(cc->rpages[i]);
> + if (cc->rpages[i])
> + set_page_writeback(cc->rpages[i]);
> }
>
> static void set_cluster_dirty(struct compress_ctx *cc)
> @@ -515,17 +522,17 @@ static void set_cluster_dirty(struct compress_ctx *cc)
> int i;
>
> for (i = 0; i < cc->cluster_size; i++)
> - set_page_dirty(cc->rpages[i]);
> + if (cc->rpages[i])
> + set_page_dirty(cc->rpages[i]);
> }
>
> -int f2fs_prepare_compress_overwrite(struct compress_ctx *cc,
> - struct page **pagep, pgoff_t index,
> - void **fsdata, bool prealloc)
> +static int prepare_compress_overwrite(struct compress_ctx *cc,
> + struct page **pagep, pgoff_t index, void **fsdata,
> + bool prealloc)
> {
> - struct inode *inode = cc->inode;
> - struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> + struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode);
> struct bio *bio = NULL;
> - struct address_space *mapping = inode->i_mapping;
> + struct address_space *mapping = cc->inode->i_mapping;
> struct page *page;
> struct dnode_of_data dn;
> sector_t last_block_in_bio;
> @@ -586,13 +593,12 @@ int f2fs_prepare_compress_overwrite(struct compress_ctx *cc,
> }
> goto retry;
> }
> -
> }
>
> if (prealloc) {
> __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
>
> - set_new_dnode(&dn, inode, NULL, NULL, 0);
> + set_new_dnode(&dn, cc->inode, NULL, NULL, 0);
>
> for (i = cc->cluster_size - 1; i > 0; i--) {
> ret = f2fs_get_block(&dn, start_idx + i);
> @@ -609,7 +615,8 @@ int f2fs_prepare_compress_overwrite(struct compress_ctx *cc,
>
> *fsdata = cc->rpages;
> *pagep = cc->rpages[offset_in_cluster(cc, index)];
> - return 0;
> + return CLUSTER_IS_FULL;
> +
> unlock_pages:
> for (idx = 0; idx < i; idx++) {
> if (cc->rpages[idx])
> @@ -626,13 +633,34 @@ int f2fs_prepare_compress_overwrite(struct compress_ctx *cc,
> return ret;
> }
>
> -void f2fs_compress_write_end(struct inode *inode, void *fsdata,
> - bool written)
> +int f2fs_prepare_compress_overwrite(struct inode *inode,
> + struct page **pagep, pgoff_t index, void **fsdata)
> +{
> + struct compress_ctx cc = {
> + .inode = inode,
> + .cluster_size = F2FS_I(inode)->i_cluster_size,
> + .cluster_idx = NULL_CLUSTER,
> + .rpages = NULL,
> + .nr_rpages = 0,
> + };
> + int ret = is_compressed_cluster(&cc, index);
> +
> + if (ret <= 0)
> + return ret;
> +
> + /* compressed case */
> + return prepare_compress_overwrite(&cc, pagep, index,
> + fsdata, ret == CLUSTER_HAS_SPACE);
> +}
> +
> +bool f2fs_compress_write_end(struct inode *inode, void *fsdata,
> + pgoff_t index, bool written)
> {
> struct compress_ctx cc = {
> .cluster_size = F2FS_I(inode)->i_cluster_size,
> .rpages = fsdata,
> };
> + bool first_index = (index == cc.rpages[0]->index);
> int i;
>
> if (written)
> @@ -640,6 +668,11 @@ void f2fs_compress_write_end(struct inode *inode, void *fsdata,
>
> for (i = 0; i < cc.cluster_size; i++)
> f2fs_put_page(cc.rpages[i], 1);
> +
> + f2fs_destroy_compress_ctx(&cc);
> +
> + return first_index;
> +
> }
>
> static int f2fs_write_compressed_pages(struct compress_ctx *cc,
> @@ -723,6 +756,8 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
>
> blkaddr = datablock_addr(dn.inode, dn.node_page,
> dn.ofs_in_node);
> + fio.page = cc->rpages[i];
> + fio.old_blkaddr = blkaddr;
>
> /* cluster header */
> if (i == 0) {
> @@ -731,7 +766,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
> if (__is_valid_data_blkaddr(blkaddr))
> f2fs_invalidate_blocks(sbi, blkaddr);
> f2fs_update_data_blkaddr(&dn, COMPRESS_ADDR);
> - continue;
> + goto unlock_continue;
> }
>
> if (pre_compressed_blocks && __is_valid_data_blkaddr(blkaddr))
> @@ -742,13 +777,11 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
> f2fs_invalidate_blocks(sbi, blkaddr);
> f2fs_update_data_blkaddr(&dn, NEW_ADDR);
> }
> - continue;
> + goto unlock_continue;
> }
>
> f2fs_bug_on(fio.sbi, blkaddr == NULL_ADDR);
>
> - fio.page = cc->rpages[i];
> - fio.old_blkaddr = blkaddr;
>
> if (fio.encrypted)
> fio.encrypted_page = cc->cpages[i - 1];
> @@ -759,6 +792,9 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
> cc->cpages[i - 1] = NULL;
> f2fs_outplace_write_data(&dn, &fio);
> (*submitted)++;
> +unlock_continue:
> + inode_dec_dirty_pages(cc->inode);
> + unlock_page(fio.page);
> }
>
> if (pre_compressed_blocks) {
> @@ -778,10 +814,6 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
> f2fs_put_dnode(&dn);
> f2fs_unlock_op(sbi);
>
> - f2fs_release_cluster_pages(cc);
> -
> - cc->rpages = NULL;
> -
> if (err) {
> file_set_keep_isize(inode);
> } else {
> @@ -791,6 +823,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
> up_write(&fi->i_sem);
> }
> return 0;
> +
> out_destroy_crypt:
> for (i -= 1; i >= 0; i--)
> fscrypt_finalize_bounce_page(&cc->cpages[i]);
> @@ -824,12 +857,13 @@ void f2fs_compress_write_end_io(struct bio *bio, struct page *page)
> return;
>
> for (i = 0; i < cic->nr_rpages; i++) {
> + WARN_ON(!cic->rpages[i]);
> clear_cold_data(cic->rpages[i]);
> end_page_writeback(cic->rpages[i]);
> }
>
> - kvfree(cic->rpages);
> - kvfree(cic);
> + kfree(cic->rpages);
> + kfree(cic);
> }
>
> static int f2fs_write_raw_pages(struct compress_ctx *cc,
> @@ -843,6 +877,7 @@ static int f2fs_write_raw_pages(struct compress_ctx *cc,
> for (i = 0; i < cc->cluster_size; i++) {
> if (!cc->rpages[i])
> continue;
> + BUG_ON(!PageLocked(cc->rpages[i]));
> ret = f2fs_write_single_data_page(cc->rpages[i], &_submitted,
> NULL, NULL, wbc, io_type);
> if (ret) {
> @@ -855,9 +890,10 @@ static int f2fs_write_raw_pages(struct compress_ctx *cc,
> *submitted += _submitted;
> }
> return 0;
> +
> out_fail:
> /* TODO: revoke partially updated block addresses */
> - for (i += 1; i < cc->cluster_size; i++) {
> + for (++i; i < cc->cluster_size; i++) {
> if (!cc->rpages[i])
> continue;
> redirty_page_for_writepage(wbc, cc->rpages[i]);
> @@ -890,9 +926,14 @@ int f2fs_write_multi_pages(struct compress_ctx *cc,
> }
> write:
> if (err == -EAGAIN) {
> + bool compressed = false;
> +
> f2fs_bug_on(F2FS_I_SB(cc->inode), *submitted);
> + if (is_compressed_cluster(cc, start_idx_of_cluster(cc)))
> + compressed = true;
> +
> err = f2fs_write_raw_pages(cc, submitted, wbc, io_type);
> - if (f2fs_is_cluster_existed(cc) == 1) {
> + if (compressed) {
> stat_sub_compr_blocks(cc->inode, *submitted);
> F2FS_I(cc->inode)->i_compressed_blocks -= *submitted;
> f2fs_mark_inode_dirty_sync(cc->inode, true);
> @@ -902,37 +943,6 @@ int f2fs_write_multi_pages(struct compress_ctx *cc,
> return err;
> }
>
> -int f2fs_is_compressed_cluster(struct compress_ctx *cc, pgoff_t index)
> -{
> - struct dnode_of_data dn;
> - unsigned int start_idx = cluster_idx(cc, index) * cc->cluster_size;
> - int ret, i;
> -
> - set_new_dnode(&dn, cc->inode, NULL, NULL, 0);
> - ret = f2fs_get_dnode_of_data(&dn, start_idx, LOOKUP_NODE);
> - if (ret) {
> - if (ret == -ENOENT)
> - ret = 0;
> - goto fail;
> - }
> - if (dn.data_blkaddr == COMPRESS_ADDR) {
> - ret = CLUSTER_IS_FULL;
> - for (i = 1; i < cc->cluster_size; i++) {
> - block_t blkaddr;
> -
> - blkaddr = datablock_addr(dn.inode,
> - dn.node_page, dn.ofs_in_node + i);
> - if (blkaddr == NULL_ADDR) {
> - ret = CLUSTER_HAS_SPACE;
> - break;
> - }
> - }
> - }
> -fail:
> - f2fs_put_dnode(&dn);
> - return ret;
> -}
> -
> struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc)
> {
> struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode);
> @@ -991,9 +1001,8 @@ struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc)
>
> dic->rpages = cc->rpages;
> dic->nr_rpages = cc->cluster_size;
> -
> - cc->rpages = NULL;
> return dic;
> +
> out_free:
> f2fs_free_dic(dic);
> out:
> @@ -1011,7 +1020,7 @@ void f2fs_free_dic(struct decompress_io_ctx *dic)
> unlock_page(dic->tpages[i]);
> put_page(dic->tpages[i]);
> }
> - kvfree(dic->tpages);
> + kfree(dic->tpages);
> }
>
> if (dic->cpages) {
> @@ -1020,11 +1029,11 @@ void f2fs_free_dic(struct decompress_io_ctx *dic)
> continue;
> f2fs_put_compressed_page(dic->cpages[i]);
> }
> - kvfree(dic->cpages);
> + kfree(dic->cpages);
> }
>
> - kvfree(dic->rpages);
> - kvfree(dic);
> + kfree(dic->rpages);
> + kfree(dic);
> }
>
> void f2fs_set_cluster_uptodate(struct page **rpages,
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index bac96c3a8bc9..b8e0431747b1 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1925,18 +1925,18 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
> last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
>
> /* get rid of pages beyond EOF */
> - for (i = cc->nr_rpages - 1; i >= 0; i--) {
> + for (i = 0; i < cc->cluster_size; i++) {
> struct page *page = cc->rpages[i];
>
> if (!page)
> continue;
> - if ((sector_t)page->index < last_block_in_file)
> - break;
> -
> - zero_user_segment(page, 0, PAGE_SIZE);
> - if (!PageUptodate(page))
> - SetPageUptodate(page);
> -
> + if ((sector_t)page->index >= last_block_in_file) {
> + zero_user_segment(page, 0, PAGE_SIZE);
> + if (!PageUptodate(page))
> + SetPageUptodate(page);
> + } else if (!PageUptodate(page)) {
> + continue;
> + }
> unlock_page(page);
> cc->rpages[i] = NULL;
> cc->nr_rpages--;
> @@ -2031,6 +2031,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
> f2fs_reset_compress_ctx(cc);
> *bio_ret = bio;
> return 0;
> +
> out_put_dnode:
> f2fs_put_dnode(&dn);
> out:
> @@ -2100,7 +2101,7 @@ int f2fs_mpage_readpages(struct address_space *mapping,
> if (ret)
> goto set_error_page;
> }
> - ret = f2fs_is_compressed_cluster(&cc, page->index);
> + ret = f2fs_is_compressed_cluster(inode, page->index);
> if (ret < 0)
> goto set_error_page;
> else if (!ret)
> @@ -2457,7 +2458,8 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
> if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> goto redirty_out;
>
> - if (page->index < end_index || f2fs_verity_in_progress(inode))
> + if (f2fs_compressed_file(inode) ||
> + page->index < end_index || f2fs_verity_in_progress(inode))
> goto write;
>
> /*
> @@ -2533,7 +2535,6 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
> f2fs_remove_dirty_inode(inode);
> submitted = NULL;
> }
> -
> unlock_page(page);
> if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode) &&
> !F2FS_I(inode)->cp_task)
> @@ -2567,6 +2568,15 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
> static int f2fs_write_data_page(struct page *page,
> struct writeback_control *wbc)
> {
> + struct inode *inode = page->mapping->host;
> +
> + if (f2fs_compressed_file(inode)) {
> + if (f2fs_is_compressed_cluster(inode, page->index)) {
> + redirty_page_for_writepage(wbc, page);
> + return AOP_WRITEPAGE_ACTIVATE;
> + }
> + }
> +
> return f2fs_write_single_data_page(page, NULL, NULL, NULL,
> wbc, FS_DATA_IO);
> }
> @@ -2581,7 +2591,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
> enum iostat_type io_type)
> {
> int ret = 0;
> - int done = 0;
> + int done = 0, retry = 0;
> struct pagevec pvec;
> struct f2fs_sb_info *sbi = F2FS_M_SB(mapping);
> struct bio *bio = NULL;
> @@ -2639,10 +2649,11 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
> else
> tag = PAGECACHE_TAG_DIRTY;
> retry:
> + retry = 0;
> if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> tag_pages_for_writeback(mapping, index, end);
> done_index = index;
> - while (!done && (index <= end)) {
> + while (!done && !retry && (index <= end)) {
> nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
> tag);
> if (nr_pages == 0)
> @@ -2650,25 +2661,42 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
>
> for (i = 0; i < nr_pages; i++) {
> struct page *page = pvec.pages[i];
> - bool need_readd = false;
> -
> + bool need_readd;
> readd:
> + need_readd = false;
> if (f2fs_compressed_file(inode)) {
> + void *fsdata = NULL;
> + struct page *pagep;
> + int ret2;
> +
> ret = f2fs_init_compress_ctx(&cc);
> if (ret) {
> done = 1;
> break;
> }
>
> - if (!f2fs_cluster_can_merge_page(&cc,
> - page->index)) {
> - need_readd = true;
> + if (!f2fs_cluster_can_merge_page(&cc, page->index)) {
> ret = f2fs_write_multi_pages(&cc,
> - &submitted, wbc, io_type);
> + &submitted, wbc, io_type);
> + if (!ret)
> + need_readd = true;
> goto result;
> }
> + if (f2fs_cluster_is_empty(&cc)) {
> + ret2 = f2fs_prepare_compress_overwrite(inode,
> + &pagep, page->index, &fsdata);
> + if (ret2 < 0) {
> + ret = ret2;
> + done = 1;
> + break;
> + } else if (ret2 &&
> + !f2fs_compress_write_end(inode, fsdata,
> + page->index, true)) {
> + retry = 1;
> + break;
> + }
> + }
> }
> -
> /* give a priority to WB_SYNC threads */
> if (atomic_read(&sbi->wb_sync_req[DATA]) &&
> wbc->sync_mode == WB_SYNC_NONE) {
> @@ -2702,7 +2730,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
> if (!clear_page_dirty_for_io(page))
> goto continue_unlock;
>
> - if (f2fs_compressed_file(mapping->host)) {
> + if (f2fs_compressed_file(inode)) {
> ret = f2fs_compress_ctx_add_page(&cc, page);
> f2fs_bug_on(sbi, ret);
> continue;
> @@ -2754,7 +2782,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
> /* TODO: error handling */
> }
>
> - if (!cycled && !done) {
> + if ((!cycled && !done) || retry) {
> cycled = 1;
> index = 0;
> end = writeback_index - 1;
> @@ -2770,8 +2798,6 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
> if (bio)
> f2fs_submit_merged_ipu_write(sbi, &bio, NULL);
>
> - f2fs_destroy_compress_ctx(&cc);
> -
> return ret;
> }
>
> @@ -3017,26 +3043,18 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
> }
>
> if (f2fs_compressed_file(inode)) {
> - struct compress_ctx cc = {
> - .inode = inode,
> - .cluster_size = F2FS_I(inode)->i_cluster_size,
> - .cluster_idx = NULL_CLUSTER,
> - .rpages = NULL,
> - .nr_rpages = 0,
> - };
> + int ret;
>
> *fsdata = NULL;
>
> - err = f2fs_is_compressed_cluster(&cc, index);
> - if (err < 0)
> + ret = f2fs_prepare_compress_overwrite(inode, pagep,
> + index, fsdata);
> + if (ret < 0) {
> + err = ret;
> goto fail;
> - if (!err)
> - goto repeat;
> -
> - err = f2fs_prepare_compress_overwrite(&cc, pagep, index, fsdata,
> - err == CLUSTER_HAS_SPACE);
> - /* need to goto fail? */
> - return err;
> + } else if (ret) {
> + return 0;
> + }
> }
>
> repeat:
> @@ -3139,7 +3157,7 @@ static int f2fs_write_end(struct file *file,
>
> /* overwrite compressed file */
> if (f2fs_compressed_file(inode) && fsdata) {
> - f2fs_compress_write_end(inode, fsdata, copied);
> + f2fs_compress_write_end(inode, fsdata, page->index, copied);
> goto update_time;
> }
>
> @@ -3534,6 +3552,15 @@ static int f2fs_swap_activate(struct swap_info_struct *sis, struct file *file,
> if (ret)
> return ret;
>
> + if (f2fs_compressed_file(inode)) {
> + if (F2FS_I(inode)->i_compressed_blocks)
> + return -EINVAL;
> +
> + F2FS_I(inode)->i_flags &= ~FS_COMPR_FL;
> + clear_inode_flag(inode, FI_COMPRESSED_FILE);
> + stat_dec_compr_inode(inode);
> + }
> +
> ret = check_swap_activate(file, sis->max);
> if (ret)
> return ret;
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index d22a4e2bb8b8..9c3399fdd6c1 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -2541,6 +2541,7 @@ enum {
> FI_ATOMIC_REVOKE_REQUEST, /* request to drop atomic data */
> FI_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
> FI_COMPRESSED_FILE, /* indicate file's data can be compressed */
> + FI_MMAP_FILE, /* indicate file was mmapped */
> };
>
> static inline void __mark_inode_dirty_flag(struct inode *inode,
> @@ -2766,6 +2767,11 @@ static inline int f2fs_has_inline_dots(struct inode *inode)
> return is_inode_flag_set(inode, FI_INLINE_DOTS);
> }
>
> +static inline int f2fs_is_mmap_file(struct inode *inode)
> +{
> + return is_inode_flag_set(inode, FI_MMAP_FILE);
> +}
> +
> static inline bool f2fs_is_pinned_file(struct inode *inode)
> {
> return is_inode_flag_set(inode, FI_PIN_FILE);
> @@ -3609,7 +3615,7 @@ void f2fs_destroy_root_stats(void);
> #define stat_inc_atomic_write(inode) do { } while (0)
> #define stat_dec_atomic_write(inode) do { } while (0)
> #define stat_inc_compr_blocks(inode) do { } while (0)
> -#define stat_dec_compr_blocks(inode) do { } while (0)
> +#define stat_sub_compr_blocks(inode) do { } while (0)
> #define stat_update_max_atomic_write(inode) do { } while (0)
> #define stat_inc_volatile_write(inode) do { } while (0)
> #define stat_dec_volatile_write(inode) do { } while (0)
> @@ -3755,13 +3761,13 @@ static inline bool f2fs_post_read_required(struct inode *inode)
> * compress.c
> */
> bool f2fs_is_compressed_page(struct page *page);
> +int is_compressed_cluster(struct compress_ctx *cc, pgoff_t index);
> struct page *f2fs_compress_control_page(struct page *page);
> void f2fs_reset_compress_ctx(struct compress_ctx *cc);
> -int f2fs_prepare_compress_overwrite(struct compress_ctx *cc,
> - struct page **page_ret, pgoff_t index,
> - void **fsdata, bool prealloc);
> -void f2fs_compress_write_end(struct inode *inode, void *fsdata,
> - bool written);
> +int f2fs_prepare_compress_overwrite(struct inode *inode,
> + struct page **pagep, pgoff_t index, void **fsdata);
> +bool f2fs_compress_write_end(struct inode *inode, void *fsdata,
> + pgoff_t index, bool written);
> void f2fs_compress_write_end_io(struct bio *bio, struct page *page);
> void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity);
> bool f2fs_cluster_is_empty(struct compress_ctx *cc);
> @@ -3771,7 +3777,7 @@ int f2fs_write_multi_pages(struct compress_ctx *cc,
> int *submitted,
> struct writeback_control *wbc,
> enum iostat_type io_type);
> -int f2fs_is_compressed_cluster(struct compress_ctx *cc, pgoff_t index);
> +int f2fs_is_compressed_cluster(struct inode *inode, pgoff_t index);
> int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
> unsigned nr_pages, sector_t *last_block_in_bio,
> bool is_readahead);
> @@ -3923,6 +3929,8 @@ static inline bool f2fs_force_buffered_io(struct inode *inode,
> return true;
> if (f2fs_is_multi_device(sbi))
> return true;
> + if (f2fs_compressed_file(inode))
> + return true;
> /*
> * for blkzoned device, fallback direct IO to buffered IO, so
> * all IOs can be serialized by log-structured write.
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 8a92e8fd648c..99380c419b87 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -51,7 +51,8 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf)
> struct inode *inode = file_inode(vmf->vma->vm_file);
> struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> struct dnode_of_data dn = { .node_changed = false };
> - int err;
> + bool need_alloc = true;
> + int err = 0;
>
> if (unlikely(f2fs_cp_error(sbi))) {
> err = -EIO;
> @@ -63,6 +64,18 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf)
> goto err;
> }
>
> + if (f2fs_compressed_file(inode)) {
> + int ret = f2fs_is_compressed_cluster(inode, page->index);
> +
> + if (ret < 0) {
> + err = ret;
> + goto err;
> + } else if (ret) {
> + f2fs_bug_on(sbi, ret == CLUSTER_HAS_SPACE);
> + need_alloc = false;
> + }
> + }
> +
> sb_start_pagefault(inode->i_sb);
>
> f2fs_bug_on(sbi, f2fs_has_inline_data(inode));
> @@ -78,15 +91,17 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf)
> goto out_sem;
> }
>
> - /* block allocation */
> - __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
> - set_new_dnode(&dn, inode, NULL, NULL, 0);
> - err = f2fs_get_block(&dn, page->index);
> - f2fs_put_dnode(&dn);
> - __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
> - if (err) {
> - unlock_page(page);
> - goto out_sem;
> + if (need_alloc) {
> + /* block allocation */
> + __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
> + set_new_dnode(&dn, inode, NULL, NULL, 0);
> + err = f2fs_get_block(&dn, page->index);
> + f2fs_put_dnode(&dn);
> + __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
> + if (err) {
> + unlock_page(page);
> + goto out_sem;
> + }
> }
>
> /* fill the page */
> @@ -492,6 +507,7 @@ static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma)
>
> file_accessed(file);
> vma->vm_ops = &f2fs_file_vm_ops;
> + set_inode_flag(inode, FI_MMAP_FILE);
> return 0;
> }
>
> @@ -1781,8 +1797,18 @@ static int f2fs_setflags_common(struct inode *inode, u32 iflags, u32 mask)
> return -EINVAL;
> if (iflags & FS_NOCOMP_FL)
> return -EINVAL;
> - if (S_ISREG(inode->i_mode))
> - clear_inode_flag(inode, FI_INLINE_DATA);
> + if (fi->i_flags & FS_COMPR_FL) {

This should test i_flags & F2FS_COMPR_FL.

All uses of FS_{COMPR, NOCOMP}_FL need to be changed to F2FS_{COMPR, NOCOMP}_FL.
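
As a rough illustration of the rename requested here, the checks could be written against the F2FS-prefixed flags. This is a userspace sketch only: the flag values are assumed to mirror the generic FS_COMPR_FL / FS_NOCOMP_FL bits, and both helper functions are hypothetical names, not functions from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, mirroring the generic FS_COMPR_FL / FS_NOCOMP_FL bits. */
#define F2FS_COMPR_FL	0x00000004u
#define F2FS_NOCOMP_FL	0x00000400u

/* Hypothetical helper: the two flags must never be set together. */
static bool compr_flags_valid(uint32_t i_flags)
{
	return !((i_flags & F2FS_COMPR_FL) && (i_flags & F2FS_NOCOMP_FL));
}

/* Hypothetical helper: true when the inode flags request compression. */
static bool wants_compression(uint32_t i_flags)
{
	return (i_flags & F2FS_COMPR_FL) != 0;
}
```

The first helper corresponds to the condition that the f2fs_bug_on() in f2fs_setflags_common() guards against; the second to the `fi->i_flags & ...` test in the hunk above.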

> + int err = f2fs_convert_inline_inode(inode);
> +
> + if (err)
> + return err;
> +
> + if (!f2fs_may_compress(inode))
> + return -EINVAL;
> +
> + set_inode_flag(inode, FI_COMPRESSED_FILE);
> + stat_inc_compr_inode(inode);
> + }
> }
> if ((iflags ^ fi->i_flags) & FS_NOCOMP_FL) {
> if (fi->i_flags & FS_COMPR_FL)
> @@ -1793,19 +1819,6 @@ static int f2fs_setflags_common(struct inode *inode, u32 iflags, u32 mask)
> f2fs_bug_on(F2FS_I_SB(inode), (fi->i_flags & FS_COMPR_FL) &&
> (fi->i_flags & FS_NOCOMP_FL));
>
> - if (fi->i_flags & FS_COMPR_FL) {
> - int err = f2fs_convert_inline_inode(inode);
> -
> - if (err)
> - return err;
> -
> - if (!f2fs_may_compress(inode))
> - return -EINVAL;
> -
> - set_inode_flag(inode, FI_COMPRESSED_FILE);
> - stat_inc_compr_inode(inode);
> - }
> -
> if (fi->i_flags & F2FS_PROJINHERIT_FL)
> set_inode_flag(inode, FI_PROJ_INHERIT);
> else
> @@ -1988,6 +2001,12 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
>
> inode_lock(inode);
>
> + if (f2fs_compressed_file(inode) && !fi->i_compressed_blocks) {
> + fi->i_flags &= ~FS_COMPR_FL;
> + clear_inode_flag(inode, FI_COMPRESSED_FILE);
> + stat_dec_compr_inode(inode);
> + }
> +
> if (f2fs_is_atomic_file(inode)) {
> if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST))
> ret = -EINVAL;
> @@ -3190,7 +3209,7 @@ static int f2fs_ioc_set_pin_file(struct file *filp, unsigned long arg)
> }
>
> if (f2fs_compressed_file(inode)) {
> - if (F2FS_HAS_BLOCKS(inode) || i_size_read(inode)) {
> + if (F2FS_I(inode)->i_compressed_blocks) {
> ret = -EOPNOTSUPP;
> goto out;
> }
> diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
> index 9f37e95c4a4b..ac0c51cefca2 100644
> --- a/fs/f2fs/namei.c
> +++ b/fs/f2fs/namei.c
> @@ -128,9 +128,11 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
> 1 << F2FS_I(inode)->i_log_cluster_size;
>
> /* Inherit the compression flag in directory */
> - if ((F2FS_I(inode)->i_flags & FS_COMPR_FL) &&
> - f2fs_may_compress(inode))
> + if ((F2FS_I(dir)->i_flags & FS_COMPR_FL) &&
> + f2fs_may_compress(inode)) {
> + F2FS_I(inode)->i_flags |= F2FS_COMPR_FL;
> set_inode_flag(inode, FI_COMPRESSED_FILE);
> + }
> }
>
> f2fs_set_inode_flags(inode);
> @@ -282,6 +284,7 @@ int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name,
> static void set_compress_inode(struct f2fs_sb_info *sbi, struct inode *inode,
> const unsigned char *name)
> {
> + __u8 (*extlist)[F2FS_EXTENSION_LEN] = sbi->raw_super->extension_list;
> unsigned char (*ext)[F2FS_EXTENSION_LEN];
> unsigned int ext_cnt = F2FS_OPTION(sbi).compress_ext_cnt;
> int i, cold_count, hot_count;
> @@ -292,13 +295,24 @@ static void set_compress_inode(struct f2fs_sb_info *sbi, struct inode *inode,
> !f2fs_may_compress(inode))
> return;
>
> + down_read(&sbi->sb_lock);
> +
> ext = F2FS_OPTION(sbi).extensions;
>
> cold_count = le32_to_cpu(sbi->raw_super->extension_count);
> hot_count = sbi->raw_super->hot_ext_count;
>
> + for (i = cold_count; i < cold_count + hot_count; i++) {
> + if (is_extension_exist(name, extlist[i])) {
> + up_read(&sbi->sb_lock);
> + return;
> + }
> + }
> +
> + up_read(&sbi->sb_lock);
> +
> for (i = 0; i < ext_cnt; i++) {
> - if (is_extension_exist(name, ext[i]) && !file_is_hot(inode)) {
> + if (is_extension_exist(name, ext[i])) {
> F2FS_I(inode)->i_flags |= F2FS_COMPR_FL;
> set_inode_flag(inode, FI_COMPRESSED_FILE);
> return;
>

2019-11-13 13:12:05

Subject: Re: [PATCH 2/2] f2fs: support data compression

Hi Jaegeuk,

I've split out a separate workqueue for fsverity; please test compression on top of
the last patch.

With the F2FS_FS_COMPRESSION config disabled, all verity test cases appear to pass. I
will do more testing of the compress/encrypt/fsverity combination later.

The diff is below; it is based on the latest g-dev-test branch:

From 5b51682bc3013b8de6dee4906865181c3ded435f Mon Sep 17 00:00:00 2001
From: Chao Yu <[email protected]>
Date: Tue, 12 Nov 2019 10:03:21 +0800
Subject: [PATCH INCREMENT] f2fs: compress: split workqueue for fsverity

Signed-off-by: Chao Yu <[email protected]>
---
fs/f2fs/compress.c | 16 +++++---
fs/f2fs/data.c | 94 +++++++++++++++++++++++++++++++++++-----------
fs/f2fs/f2fs.h | 2 +-
3 files changed, 84 insertions(+), 28 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index f4ce825f12b4..254275325890 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -377,7 +377,7 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity)

dec_page_count(sbi, F2FS_RD_DATA);

- if (bio->bi_status)
+ if (bio->bi_status || PageError(page))
dic->failed = true;

if (refcount_dec_not_one(&dic->ref))
@@ -419,10 +419,14 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity)
out_vunmap_rbuf:
vunmap(dic->rbuf);
out_free_dic:
- f2fs_set_cluster_uptodate(dic->rpages, dic->cluster_size, ret, verity);
+ if (!verity)
+ f2fs_decompress_end_io(dic->rpages, dic->cluster_size,
+ ret, false);
+
trace_f2fs_decompress_pages_end(dic->inode, dic->cluster_idx,
- dic->clen, ret);
- f2fs_free_dic(dic);
+ dic->clen, ret);
+ if (!verity)
+ f2fs_free_dic(dic);
}

static bool is_page_in_cluster(struct compress_ctx *cc, pgoff_t index)
@@ -1086,7 +1090,7 @@ void f2fs_free_dic(struct decompress_io_ctx *dic)
kfree(dic);
}

-void f2fs_set_cluster_uptodate(struct page **rpages,
+void f2fs_decompress_end_io(struct page **rpages,
unsigned int cluster_size, bool err, bool verity)
{
int i;
@@ -1108,4 +1112,4 @@ void f2fs_set_cluster_uptodate(struct page **rpages,
}
unlock_page(rpage);
}
-}
+}
\ No newline at end of file
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c9362a53f8a1..2d64c6ffee84 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -98,7 +98,7 @@ static void __read_end_io(struct bio *bio, bool compr, bool verity)
page = bv->bv_page;

#ifdef CONFIG_F2FS_FS_COMPRESSION
- if (compr && PagePrivate(page)) {
+ if (compr && f2fs_is_compressed_page(page)) {
f2fs_decompress_pages(bio, page, verity);
continue;
}
@@ -115,9 +115,14 @@ static void __read_end_io(struct bio *bio, bool compr, bool verity)
dec_page_count(F2FS_P_SB(page), __read_io_type(page));
unlock_page(page);
}
- if (bio->bi_private)
- mempool_free(bio->bi_private, bio_post_read_ctx_pool);
- bio_put(bio);
+}
+
+static void f2fs_release_read_bio(struct bio *bio);
+static void __f2fs_read_end_io(struct bio *bio, bool compr, bool verity)
+{
+ if (!compr)
+ __read_end_io(bio, false, verity);
+ f2fs_release_read_bio(bio);
}

static void f2fs_decompress_bio(struct bio *bio, bool verity)
@@ -127,19 +132,50 @@ static void f2fs_decompress_bio(struct bio *bio, bool verity)

static void bio_post_read_processing(struct bio_post_read_ctx *ctx);

-static void decrypt_work(struct bio_post_read_ctx *ctx)
+static void f2fs_decrypt_work(struct bio_post_read_ctx *ctx)
{
fscrypt_decrypt_bio(ctx->bio);
}

-static void decompress_work(struct bio_post_read_ctx *ctx, bool verity)
+static void f2fs_decompress_work(struct bio_post_read_ctx *ctx)
+{
+ f2fs_decompress_bio(ctx->bio, ctx->enabled_steps & (1 << STEP_VERITY));
+}
+
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+void f2fs_verify_pages(struct page **rpages, unsigned int cluster_size)
{
- f2fs_decompress_bio(ctx->bio, verity);
+ f2fs_decompress_end_io(rpages, cluster_size, false, true);
}

-static void verity_work(struct bio_post_read_ctx *ctx)
+static void f2fs_verify_bio(struct bio *bio)
{
+ struct page *page = bio_first_page_all(bio);
+ struct decompress_io_ctx *dic =
+ (struct decompress_io_ctx *)page_private(page);
+
+ f2fs_verify_pages(dic->rpages, dic->cluster_size);
+ f2fs_free_dic(dic);
+}
+#endif
+
+static void f2fs_verity_work(struct work_struct *work)
+{
+ struct bio_post_read_ctx *ctx =
+ container_of(work, struct bio_post_read_ctx, work);
+
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+ /* previous step is decompression */
+ if (ctx->enabled_steps & (1 << STEP_DECOMPRESS)) {
+
+ f2fs_verify_bio(ctx->bio);
+ f2fs_release_read_bio(ctx->bio);
+ return;
+ }
+#endif
+
fsverity_verify_bio(ctx->bio);
+ __f2fs_read_end_io(ctx->bio, false, false);
}

static void f2fs_post_read_work(struct work_struct *work)
@@ -148,18 +184,19 @@ static void f2fs_post_read_work(struct work_struct *work)
container_of(work, struct bio_post_read_ctx, work);

if (ctx->enabled_steps & (1 << STEP_DECRYPT))
- decrypt_work(ctx);
+ f2fs_decrypt_work(ctx);

- if (ctx->enabled_steps & (1 << STEP_DECOMPRESS)) {
- decompress_work(ctx,
- ctx->enabled_steps & (1 << STEP_VERITY));
+ if (ctx->enabled_steps & (1 << STEP_DECOMPRESS))
+ f2fs_decompress_work(ctx);
+
+ if (ctx->enabled_steps & (1 << STEP_VERITY)) {
+ INIT_WORK(&ctx->work, f2fs_verity_work);
+ fsverity_enqueue_verify_work(&ctx->work);
return;
}

- if (ctx->enabled_steps & (1 << STEP_VERITY))
- verity_work(ctx);
-
- __read_end_io(ctx->bio, false, false);
+ __f2fs_read_end_io(ctx->bio,
+ ctx->enabled_steps & (1 << STEP_DECOMPRESS), false);
}

static void f2fs_enqueue_post_read_work(struct f2fs_sb_info *sbi,
@@ -176,12 +213,20 @@ static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
* we shouldn't recurse to the same workqueue.
*/

- if (ctx->enabled_steps) {
+ if (ctx->enabled_steps & (1 << STEP_DECRYPT) ||
+ ctx->enabled_steps & (1 << STEP_DECOMPRESS)) {
INIT_WORK(&ctx->work, f2fs_post_read_work);
f2fs_enqueue_post_read_work(ctx->sbi, &ctx->work);
return;
}
- __read_end_io(ctx->bio, false, false);
+
+ if (ctx->enabled_steps & (1 << STEP_VERITY)) {
+ INIT_WORK(&ctx->work, f2fs_verity_work);
+ fsverity_enqueue_verify_work(&ctx->work);
+ return;
+ }
+
+ __f2fs_read_end_io(ctx->bio, false, false);
}

static bool f2fs_bio_post_read_required(struct bio *bio)
@@ -205,7 +250,7 @@ static void f2fs_read_end_io(struct bio *bio)
return;
}

- __read_end_io(bio, false, false);
+ __f2fs_read_end_io(bio, false, false);
}

static void f2fs_write_end_io(struct bio *bio)
@@ -864,6 +909,13 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
return bio;
}

+static void f2fs_release_read_bio(struct bio *bio)
+{
+ if (bio->bi_private)
+ mempool_free(bio->bi_private, bio_post_read_ctx_pool);
+ bio_put(bio);
+}
+
/* This can handle encryption stuffs */
static int f2fs_submit_page_read(struct inode *inode, struct page *page,
block_t blkaddr)
@@ -2023,7 +2075,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
dic->failed = true;
if (refcount_sub_and_test(dic->nr_cpages - i,
&dic->ref))
- f2fs_set_cluster_uptodate(dic->rpages,
+ f2fs_decompress_end_io(dic->rpages,
cc->cluster_size, true,
false);
f2fs_free_dic(dic);
@@ -2053,7 +2105,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
out_put_dnode:
f2fs_put_dnode(&dn);
out:
- f2fs_set_cluster_uptodate(cc->rpages, cc->cluster_size, true, false);
+ f2fs_decompress_end_io(cc->rpages, cc->cluster_size, true, false);
f2fs_destroy_compress_ctx(cc);
*bio_ret = bio;
return ret;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8a3a35b42a37..20067fa3b035 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3795,7 +3795,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
bool is_readahead);
struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc);
void f2fs_free_dic(struct decompress_io_ctx *dic);
-void f2fs_set_cluster_uptodate(struct page **rpages,
+void f2fs_decompress_end_io(struct page **rpages,
unsigned int cluster_size, bool err, bool verity);
int f2fs_init_compress_ctx(struct compress_ctx *cc);
void f2fs_destroy_compress_ctx(struct compress_ctx *cc);
--
2.18.0.rc1
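
The queue selection that the diff above introduces can be summarized in a small userspace sketch. The step names follow the patch, but their numeric values, the `read_queue` enum, and both function names are assumptions made for illustration; the real code enqueues work items rather than returning a queue id.

```c
/* Step bits; names follow the patch, numeric values are assumed. */
enum { STEP_DECRYPT, STEP_DECOMPRESS, STEP_VERITY };

/* Hypothetical labels for the two workqueues involved. */
enum read_queue { QUEUE_NONE, QUEUE_POST_READ, QUEUE_VERITY };

/* First hop, modelled on bio_post_read_processing() in the diff. */
static enum read_queue first_queue(unsigned int enabled_steps)
{
	/* decryption and decompression share the f2fs post-read queue */
	if (enabled_steps & ((1u << STEP_DECRYPT) | (1u << STEP_DECOMPRESS)))
		return QUEUE_POST_READ;
	/* verity alone goes straight to the fsverity queue */
	if (enabled_steps & (1u << STEP_VERITY))
		return QUEUE_VERITY;
	return QUEUE_NONE;	/* nothing to do, finish the read inline */
}

/* Second hop, modelled on the tail of f2fs_post_read_work(). */
static enum read_queue next_queue(enum read_queue cur,
				  unsigned int enabled_steps)
{
	if (cur == QUEUE_POST_READ && (enabled_steps & (1u << STEP_VERITY)))
		return QUEUE_VERITY;
	return QUEUE_NONE;
}
```

In this sketch, verity work never runs on the post-read queue, which is the property the split is meant to provide.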

On 2019/10/31 1:02, Eric Biggers wrote:
> On Wed, Oct 30, 2019 at 04:43:52PM +0800, Chao Yu wrote:
>>>>>> static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
>>>>>> {
>>>>>> - /*
>>>>>> - * We use different work queues for decryption and for verity because
>>>>>> - * verity may require reading metadata pages that need decryption, and
>>>>>> - * we shouldn't recurse to the same workqueue.
>>>>>> - */
>>>>>
>>>>> Why is it okay (i.e., no deadlocks) to no longer use different work queues for
>>>>> decryption and for verity? See the comment above which is being deleted.
>>>>
>>>> Could you explain more about how deadlock happen? or share me a link address if
>>>> you have described that case somewhere?
>>>>
>>>
>>> The verity work can read pages from the file which require decryption. I'm
>>> concerned that it could deadlock if the work is scheduled on the same workqueue.
>>
>> I assume you've tried one workqueue, and suffered deadlock..
>>
>>> Granted, I'm not an expert in Linux workqueues, so if you've investigated this
>>> and determined that it's safe, can you explain why?
>>
>> I'm not familiar with workqueue... I guess it may not safe that if the work is
>> scheduled to the same cpu in where verity was waiting for data? if the work is
>> scheduled to other cpu, it may be safe.
>>
>> I can check that before splitting the workqueue for verity and decrypt/decompress.
>>
>
> Yes this is a real problem, try 'kvm-xfstests -c f2fs/encrypt generic/579'.
> The worker thread gets deadlocked in f2fs_read_merkle_tree_page() waiting for
> the Merkle tree page to be decrypted. This is with the v2 compression patch;
> it works fine on current mainline.
>
> INFO: task kworker/u5:0:61 blocked for more than 30 seconds.
> Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u5:0 D 0 61 2 0x80004000
> Workqueue: f2fs_post_read_wq f2fs_post_read_work
> Call Trace:
> context_switch kernel/sched/core.c:3384 [inline]
> __schedule+0x299/0x6c0 kernel/sched/core.c:4069
> schedule+0x44/0xd0 kernel/sched/core.c:4136
> io_schedule+0x11/0x40 kernel/sched/core.c:5780
> wait_on_page_bit_common mm/filemap.c:1174 [inline]
> wait_on_page_bit mm/filemap.c:1223 [inline]
> wait_on_page_locked include/linux/pagemap.h:527 [inline]
> wait_on_page_locked include/linux/pagemap.h:524 [inline]
> wait_on_page_read mm/filemap.c:2767 [inline]
> do_read_cache_page+0x407/0x660 mm/filemap.c:2810
> read_cache_page+0xd/0x10 mm/filemap.c:2894
> f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
> verify_page+0x110/0x560 fs/verity/verify.c:120
> fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
> verity_work fs/f2fs/data.c:142 [inline]
> f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
> process_one_work+0x225/0x550 kernel/workqueue.c:2269
> worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
> kthread+0x125/0x140 kernel/kthread.c:255
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> INFO: task kworker/u5:1:1140 blocked for more than 30 seconds.
> Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u5:1 D 0 1140 2 0x80004000
> Workqueue: f2fs_post_read_wq f2fs_post_read_work
> Call Trace:
> context_switch kernel/sched/core.c:3384 [inline]
> __schedule+0x299/0x6c0 kernel/sched/core.c:4069
> schedule+0x44/0xd0 kernel/sched/core.c:4136
> io_schedule+0x11/0x40 kernel/sched/core.c:5780
> wait_on_page_bit_common mm/filemap.c:1174 [inline]
> wait_on_page_bit mm/filemap.c:1223 [inline]
> wait_on_page_locked include/linux/pagemap.h:527 [inline]
> wait_on_page_locked include/linux/pagemap.h:524 [inline]
> wait_on_page_read mm/filemap.c:2767 [inline]
> do_read_cache_page+0x407/0x660 mm/filemap.c:2810
> read_cache_page+0xd/0x10 mm/filemap.c:2894
> f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
> verify_page+0x110/0x560 fs/verity/verify.c:120
> fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
> verity_work fs/f2fs/data.c:142 [inline]
> f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
> process_one_work+0x225/0x550 kernel/workqueue.c:2269
> worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
> kthread+0x125/0x140 kernel/kthread.c:255
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
>
> Showing all locks held in the system:
> 1 lock held by khungtaskd/21:
> #0: ffffffff82250520 (rcu_read_lock){....}, at: rcu_lock_acquire.constprop.0+0x0/0x30 include/trace/events/lock.h:13
> 2 locks held by kworker/u5:0/61:
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> 2 locks held by kworker/u5:1/1140:
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> .
>
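
The hang reported above can be pictured with a toy model. It is deliberately simplified and does not describe how kernel workqueues actually manage concurrency: it assumes a fixed number of worker slots, and that every verity item holds a slot while waiting for a Merkle tree page that only a decrypt item can complete. The function and parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * Toy model, not kernel code: each blocked verity item occupies one
 * worker slot until the decrypt item for its Merkle tree page has run.
 */
static bool decrypt_can_run(int worker_slots, int blocked_verity_items,
			    bool verity_has_own_queue)
{
	/* with a separate queue, waiting verity items hold no decrypt slots */
	if (verity_has_own_queue)
		return true;
	/* shared queue: decryption needs a slot no verity item is holding */
	return blocked_verity_items < worker_slots;
}
```

Under these assumptions, once every slot of a shared queue is held by a waiting verity item, the decrypt work can never start, while a dedicated verity queue leaves the decrypt path free.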

2019-11-18 16:14:01

Subject: Re: [PATCH 2/2] f2fs: support data compression

On 11/13, Chao Yu wrote:
> Hi Jaegeuk,
>
> I've split workqueue for fsverity, please test compression based on last patch.
>
> I shutdown F2FS_FS_COMPRESSION config, it looks all verity testcases can pass, will
> do more test for compress/encrypt/fsverity combination later.

Thanks, I applied it and started some tests.

>
> The diff is as below, code base is last g-dev-test branch:
>
> From 5b51682bc3013b8de6dee4906865181c3ded435f Mon Sep 17 00:00:00 2001
> From: Chao Yu <[email protected]>
> Date: Tue, 12 Nov 2019 10:03:21 +0800
> Subject: [PATCH INCREMENT] f2fs: compress: split workqueue for fsverity
>
> Signed-off-by: Chao Yu <[email protected]>
> ---
> fs/f2fs/compress.c | 16 +++++---
> fs/f2fs/data.c | 94 +++++++++++++++++++++++++++++++++++-----------
> fs/f2fs/f2fs.h | 2 +-
> 3 files changed, 84 insertions(+), 28 deletions(-)
>
> diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> index f4ce825f12b4..254275325890 100644
> --- a/fs/f2fs/compress.c
> +++ b/fs/f2fs/compress.c
> @@ -377,7 +377,7 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity)
>
> dec_page_count(sbi, F2FS_RD_DATA);
>
> - if (bio->bi_status)
> + if (bio->bi_status || PageError(page))
> dic->failed = true;
>
> if (refcount_dec_not_one(&dic->ref))
> @@ -419,10 +419,14 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity)
> out_vunmap_rbuf:
> vunmap(dic->rbuf);
> out_free_dic:
> - f2fs_set_cluster_uptodate(dic->rpages, dic->cluster_size, ret, verity);
> + if (!verity)
> + f2fs_decompress_end_io(dic->rpages, dic->cluster_size,
> + ret, false);
> +
> trace_f2fs_decompress_pages_end(dic->inode, dic->cluster_idx,
> - dic->clen, ret);
> - f2fs_free_dic(dic);
> + dic->clen, ret);
> + if (!verity)
> + f2fs_free_dic(dic);
> }
>
> static bool is_page_in_cluster(struct compress_ctx *cc, pgoff_t index)
> @@ -1086,7 +1090,7 @@ void f2fs_free_dic(struct decompress_io_ctx *dic)
> kfree(dic);
> }
>
> -void f2fs_set_cluster_uptodate(struct page **rpages,
> +void f2fs_decompress_end_io(struct page **rpages,
> unsigned int cluster_size, bool err, bool verity)
> {
> int i;
> @@ -1108,4 +1112,4 @@ void f2fs_set_cluster_uptodate(struct page **rpages,
> }
> unlock_page(rpage);
> }
> -}
> +}
> \ No newline at end of file
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index c9362a53f8a1..2d64c6ffee84 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -98,7 +98,7 @@ static void __read_end_io(struct bio *bio, bool compr, bool verity)
> page = bv->bv_page;
>
> #ifdef CONFIG_F2FS_FS_COMPRESSION
> - if (compr && PagePrivate(page)) {
> + if (compr && f2fs_is_compressed_page(page)) {
> f2fs_decompress_pages(bio, page, verity);
> continue;
> }
> @@ -115,9 +115,14 @@ static void __read_end_io(struct bio *bio, bool compr, bool verity)
> dec_page_count(F2FS_P_SB(page), __read_io_type(page));
> unlock_page(page);
> }
> - if (bio->bi_private)
> - mempool_free(bio->bi_private, bio_post_read_ctx_pool);
> - bio_put(bio);
> +}
> +
> +static void f2fs_release_read_bio(struct bio *bio);
> +static void __f2fs_read_end_io(struct bio *bio, bool compr, bool verity)
> +{
> + if (!compr)
> + __read_end_io(bio, false, verity);
> + f2fs_release_read_bio(bio);
> }
>
> static void f2fs_decompress_bio(struct bio *bio, bool verity)
> @@ -127,19 +132,50 @@ static void f2fs_decompress_bio(struct bio *bio, bool verity)
>
> static void bio_post_read_processing(struct bio_post_read_ctx *ctx);
>
> -static void decrypt_work(struct bio_post_read_ctx *ctx)
> +static void f2fs_decrypt_work(struct bio_post_read_ctx *ctx)
> {
> fscrypt_decrypt_bio(ctx->bio);
> }
>
> -static void decompress_work(struct bio_post_read_ctx *ctx, bool verity)
> +static void f2fs_decompress_work(struct bio_post_read_ctx *ctx)
> +{
> + f2fs_decompress_bio(ctx->bio, ctx->enabled_steps & (1 << STEP_VERITY));
> +}
> +
> +#ifdef CONFIG_F2FS_FS_COMPRESSION
> +void f2fs_verify_pages(struct page **rpages, unsigned int cluster_size)
> {
> - f2fs_decompress_bio(ctx->bio, verity);
> + f2fs_decompress_end_io(rpages, cluster_size, false, true);
> }
>
> -static void verity_work(struct bio_post_read_ctx *ctx)
> +static void f2fs_verify_bio(struct bio *bio)
> {
> + struct page *page = bio_first_page_all(bio);
> + struct decompress_io_ctx *dic =
> + (struct decompress_io_ctx *)page_private(page);
> +
> + f2fs_verify_pages(dic->rpages, dic->cluster_size);
> + f2fs_free_dic(dic);
> +}
> +#endif
> +
> +static void f2fs_verity_work(struct work_struct *work)
> +{
> + struct bio_post_read_ctx *ctx =
> + container_of(work, struct bio_post_read_ctx, work);
> +
> +#ifdef CONFIG_F2FS_FS_COMPRESSION
> + /* previous step is decompression */
> + if (ctx->enabled_steps & (1 << STEP_DECOMPRESS)) {
> +
> + f2fs_verify_bio(ctx->bio);
> + f2fs_release_read_bio(ctx->bio);
> + return;
> + }
> +#endif
> +
> fsverity_verify_bio(ctx->bio);
> + __f2fs_read_end_io(ctx->bio, false, false);
> }
>
> static void f2fs_post_read_work(struct work_struct *work)
> @@ -148,18 +184,19 @@ static void f2fs_post_read_work(struct work_struct *work)
> container_of(work, struct bio_post_read_ctx, work);
>
> if (ctx->enabled_steps & (1 << STEP_DECRYPT))
> - decrypt_work(ctx);
> + f2fs_decrypt_work(ctx);
>
> - if (ctx->enabled_steps & (1 << STEP_DECOMPRESS)) {
> - decompress_work(ctx,
> - ctx->enabled_steps & (1 << STEP_VERITY));
> + if (ctx->enabled_steps & (1 << STEP_DECOMPRESS))
> + f2fs_decompress_work(ctx);
> +
> + if (ctx->enabled_steps & (1 << STEP_VERITY)) {
> + INIT_WORK(&ctx->work, f2fs_verity_work);
> + fsverity_enqueue_verify_work(&ctx->work);
> return;
> }
>
> - if (ctx->enabled_steps & (1 << STEP_VERITY))
> - verity_work(ctx);
> -
> - __read_end_io(ctx->bio, false, false);
> + __f2fs_read_end_io(ctx->bio,
> + ctx->enabled_steps & (1 << STEP_DECOMPRESS), false);
> }
>
> static void f2fs_enqueue_post_read_work(struct f2fs_sb_info *sbi,
> @@ -176,12 +213,20 @@ static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
> * we shouldn't recurse to the same workqueue.
> */
>
> - if (ctx->enabled_steps) {
> + if (ctx->enabled_steps & (1 << STEP_DECRYPT) ||
> + ctx->enabled_steps & (1 << STEP_DECOMPRESS)) {
> INIT_WORK(&ctx->work, f2fs_post_read_work);
> f2fs_enqueue_post_read_work(ctx->sbi, &ctx->work);
> return;
> }
> - __read_end_io(ctx->bio, false, false);
> +
> + if (ctx->enabled_steps & (1 << STEP_VERITY)) {
> + INIT_WORK(&ctx->work, f2fs_verity_work);
> + fsverity_enqueue_verify_work(&ctx->work);
> + return;
> + }
> +
> + __f2fs_read_end_io(ctx->bio, false, false);
> }
>
> static bool f2fs_bio_post_read_required(struct bio *bio)
> @@ -205,7 +250,7 @@ static void f2fs_read_end_io(struct bio *bio)
> return;
> }
>
> - __read_end_io(bio, false, false);
> + __f2fs_read_end_io(bio, false, false);
> }
>
> static void f2fs_write_end_io(struct bio *bio)
> @@ -864,6 +909,13 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
> return bio;
> }
>
> +static void f2fs_release_read_bio(struct bio *bio)
> +{
> + if (bio->bi_private)
> + mempool_free(bio->bi_private, bio_post_read_ctx_pool);
> + bio_put(bio);
> +}
> +
> /* This can handle encryption stuffs */
> static int f2fs_submit_page_read(struct inode *inode, struct page *page,
> block_t blkaddr)
> @@ -2023,7 +2075,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
> dic->failed = true;
> if (refcount_sub_and_test(dic->nr_cpages - i,
> &dic->ref))
> - f2fs_set_cluster_uptodate(dic->rpages,
> + f2fs_decompress_end_io(dic->rpages,
> cc->cluster_size, true,
> false);
> f2fs_free_dic(dic);
> @@ -2053,7 +2105,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
> out_put_dnode:
> f2fs_put_dnode(&dn);
> out:
> - f2fs_set_cluster_uptodate(cc->rpages, cc->cluster_size, true, false);
> + f2fs_decompress_end_io(cc->rpages, cc->cluster_size, true, false);
> f2fs_destroy_compress_ctx(cc);
> *bio_ret = bio;
> return ret;
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 8a3a35b42a37..20067fa3b035 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3795,7 +3795,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
> bool is_readahead);
> struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc);
> void f2fs_free_dic(struct decompress_io_ctx *dic);
> -void f2fs_set_cluster_uptodate(struct page **rpages,
> +void f2fs_decompress_end_io(struct page **rpages,
> unsigned int cluster_size, bool err, bool verity);
> int f2fs_init_compress_ctx(struct compress_ctx *cc);
> void f2fs_destroy_compress_ctx(struct compress_ctx *cc);
> --
> 2.18.0.rc1
>
>
>
> On 2019/10/31 1:02, Eric Biggers wrote:
> > On Wed, Oct 30, 2019 at 04:43:52PM +0800, Chao Yu wrote:
> >>>>>> static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
> >>>>>> {
> >>>>>> - /*
> >>>>>> - * We use different work queues for decryption and for verity because
> >>>>>> - * verity may require reading metadata pages that need decryption, and
> >>>>>> - * we shouldn't recurse to the same workqueue.
> >>>>>> - */
> >>>>>
> >>>>> Why is it okay (i.e., no deadlocks) to no longer use different work queues for
> >>>>> decryption and for verity? See the comment above which is being deleted.
> >>>>
> >>>> Could you explain more about how the deadlock happens, or share a link if
> >>>> you have described that case somewhere?
> >>>>
> >>>
> >>> The verity work can read pages from the file which require decryption. I'm
> >>> concerned that it could deadlock if the work is scheduled on the same workqueue.
> >>
> >> I assume you've tried one workqueue, and suffered deadlock..
> >>
> >>> Granted, I'm not an expert in Linux workqueues, so if you've investigated this
> >>> and determined that it's safe, can you explain why?
> >>
> >> I'm not familiar with workqueues... I guess it may not be safe if the work is
> >> scheduled on the same CPU where verity is waiting for data? If the work is
> >> scheduled on another CPU, it may be safe.
> >>
> >> I can check that before splitting the workqueue for verity and decrypt/decompress.
> >>
> >
> > Yes this is a real problem, try 'kvm-xfstests -c f2fs/encrypt generic/579'.
> > The worker thread gets deadlocked in f2fs_read_merkle_tree_page() waiting for
> > the Merkle tree page to be decrypted. This is with the v2 compression patch;
> > it works fine on current mainline.
> >
> > INFO: task kworker/u5:0:61 blocked for more than 30 seconds.
> > Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > kworker/u5:0 D 0 61 2 0x80004000
> > Workqueue: f2fs_post_read_wq f2fs_post_read_work
> > Call Trace:
> > context_switch kernel/sched/core.c:3384 [inline]
> > __schedule+0x299/0x6c0 kernel/sched/core.c:4069
> > schedule+0x44/0xd0 kernel/sched/core.c:4136
> > io_schedule+0x11/0x40 kernel/sched/core.c:5780
> > wait_on_page_bit_common mm/filemap.c:1174 [inline]
> > wait_on_page_bit mm/filemap.c:1223 [inline]
> > wait_on_page_locked include/linux/pagemap.h:527 [inline]
> > wait_on_page_locked include/linux/pagemap.h:524 [inline]
> > wait_on_page_read mm/filemap.c:2767 [inline]
> > do_read_cache_page+0x407/0x660 mm/filemap.c:2810
> > read_cache_page+0xd/0x10 mm/filemap.c:2894
> > f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
> > verify_page+0x110/0x560 fs/verity/verify.c:120
> > fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
> > verity_work fs/f2fs/data.c:142 [inline]
> > f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
> > process_one_work+0x225/0x550 kernel/workqueue.c:2269
> > worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
> > kthread+0x125/0x140 kernel/kthread.c:255
> > ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> > INFO: task kworker/u5:1:1140 blocked for more than 30 seconds.
> > Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > kworker/u5:1 D 0 1140 2 0x80004000
> > Workqueue: f2fs_post_read_wq f2fs_post_read_work
> > Call Trace:
> > context_switch kernel/sched/core.c:3384 [inline]
> > __schedule+0x299/0x6c0 kernel/sched/core.c:4069
> > schedule+0x44/0xd0 kernel/sched/core.c:4136
> > io_schedule+0x11/0x40 kernel/sched/core.c:5780
> > wait_on_page_bit_common mm/filemap.c:1174 [inline]
> > wait_on_page_bit mm/filemap.c:1223 [inline]
> > wait_on_page_locked include/linux/pagemap.h:527 [inline]
> > wait_on_page_locked include/linux/pagemap.h:524 [inline]
> > wait_on_page_read mm/filemap.c:2767 [inline]
> > do_read_cache_page+0x407/0x660 mm/filemap.c:2810
> > read_cache_page+0xd/0x10 mm/filemap.c:2894
> > f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
> > verify_page+0x110/0x560 fs/verity/verify.c:120
> > fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
> > verity_work fs/f2fs/data.c:142 [inline]
> > f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
> > process_one_work+0x225/0x550 kernel/workqueue.c:2269
> > worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
> > kthread+0x125/0x140 kernel/kthread.c:255
> > ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> >
> > Showing all locks held in the system:
> > 1 lock held by khungtaskd/21:
> > #0: ffffffff82250520 (rcu_read_lock){....}, at: rcu_lock_acquire.constprop.0+0x0/0x30 include/trace/events/lock.h:13
> > 2 locks held by kworker/u5:0/61:
> > #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> > #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> > #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> > #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> > #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> > #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> > 2 locks held by kworker/u5:1/1140:
> > #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> > #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> > #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> > #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
> > #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
> > #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
> > .
> >
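
Editor's sketch of the routing that the quoted "split workqueue for fsverity" patch sets up in bio_post_read_processing() and f2fs_post_read_work(): decryption and decompression go to the f2fs post-read workqueue, while verity is always handed to fs-verity's own workqueue, so a verity worker waiting on a Merkle tree page never sits on the queue that has to decrypt that page. This is a user-space model, not kernel code. The enum values and the route names below are assumptions made for illustration; only the STEP_* names and the bit tests come from the patch.

```c
#include <assert.h>

/* Assumed bit positions; the thread shows only the STEP_* names. */
enum bio_post_read_step {
	STEP_DECRYPT,
	STEP_DECOMPRESS,
	STEP_VERITY,
};

#define ALL_STEPS ((1u << STEP_DECRYPT) | (1u << STEP_DECOMPRESS) | \
		   (1u << STEP_VERITY))

/* Hypothetical labels for where the next piece of work runs. */
enum read_route {
	ROUTE_POST_READ_WQ,	/* f2fs_post_read_wq: decrypt and/or decompress */
	ROUTE_VERITY_WQ,	/* fs-verity's separate workqueue */
	ROUTE_COMPLETE_INLINE,	/* no work left, end the read directly */
};

/* Mirrors the branch order of the patched bio_post_read_processing(). */
static enum read_route route_first_step(unsigned int enabled_steps)
{
	assert((enabled_steps & ~ALL_STEPS) == 0);

	if (enabled_steps & ((1u << STEP_DECRYPT) | (1u << STEP_DECOMPRESS)))
		return ROUTE_POST_READ_WQ;
	if (enabled_steps & (1u << STEP_VERITY))
		return ROUTE_VERITY_WQ;
	return ROUTE_COMPLETE_INLINE;
}

/* Mirrors the tail of the patched f2fs_post_read_work(): once decrypt and
 * decompress have run, verity (if enabled) is re-queued elsewhere. */
static enum read_route route_after_post_read(unsigned int enabled_steps)
{
	assert((enabled_steps & ~ALL_STEPS) == 0);

	if (enabled_steps & (1u << STEP_VERITY))
		return ROUTE_VERITY_WQ;
	return ROUTE_COMPLETE_INLINE;
}
```

In this model an encrypted verity file takes two hops, first ROUTE_POST_READ_WQ and then ROUTE_VERITY_WQ, which is the separation the deleted comment asked for.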

2019-11-18 20:59:55

Subject: Re: [PATCH 2/2] f2fs: support data compression

On 11/18, Jaegeuk Kim wrote:
> On 11/13, Chao Yu wrote:
> > Hi Jaegeuk,
> >
> > I've split the workqueue for fsverity; please test compression based on the last patch.
> >
> > With the F2FS_FS_COMPRESSION config disabled, all verity test cases appear to pass. I will
> > do more testing of the compress/encrypt/fsverity combination later.
>
> Thanks, I applied it and started some tests.

I made the modification below to fix the wrong compression check in the read path.

--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -1007,6 +1007,7 @@ struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc)
if (!dic)
return ERR_PTR(-ENOMEM);

+ dic->magic = F2FS_COMPRESSED_PAGE_MAGIC;
dic->inode = cc->inode;
refcount_set(&dic->ref, 1);
dic->cluster_idx = cc->cluster_idx;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 02a2e7261b457..399ba883632a0 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1255,6 +1255,7 @@ struct compress_io_ctx {

/* decompress io context for read IO path */
struct decompress_io_ctx {
+ u32 magic; /* magic number to indicate page is compressed */
struct inode *inode; /* inode the context belong to */
unsigned int cluster_idx; /* cluster index number */
unsigned int cluster_size; /* page count in cluster */
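
Editor's sketch of why a magic field at the start of the context can help the read path: the quoted patch replaces the PagePrivate(page) test with f2fs_is_compressed_page(page), and a page's private pointer may refer to objects other than a decompress context. The thread does not show the body of f2fs_is_compressed_page() or the value of F2FS_COMPRESSED_PAGE_MAGIC, so everything below (the demo_* names, the placeholder magic value, the reduced structures) is hypothetical and only models the idea in user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder value; the real F2FS_COMPRESSED_PAGE_MAGIC is not shown here. */
#define DEMO_COMPRESSED_PAGE_MAGIC 0xC0DEC0DEu

/* Stand-in for struct decompress_io_ctx; magic must stay the first member. */
struct demo_dic {
	uint32_t magic;
	unsigned int cluster_idx;
	unsigned int cluster_size;
};

/* Stand-in for struct page, reduced to its private pointer. */
struct demo_page {
	void *private_data;
};

/* Models the "dic->magic = ..." line added to f2fs_alloc_dic(). */
static void demo_dic_init(struct demo_dic *dic, unsigned int cluster_idx,
			  unsigned int cluster_size)
{
	assert(dic != NULL);
	dic->magic = DEMO_COMPRESSED_PAGE_MAGIC;
	dic->cluster_idx = cluster_idx;
	dic->cluster_size = cluster_size;
}

/* Returns true only if the private object starts with the magic value.
 * Assumes any non-NULL private object is at least 4 bytes long. */
static bool demo_is_compressed_page(const struct demo_page *page)
{
	uint32_t magic;

	if (page == NULL || page->private_data == NULL)
		return false;
	memcpy(&magic, page->private_data, sizeof(magic));
	return magic == DEMO_COMPRESSED_PAGE_MAGIC;
}
```

A page whose private pointer refers to some other object fails the check in this model, which a plain "is private data set" test could not distinguish.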


2019-11-25 18:53:25

Subject: Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

Here is the fix, including my additional fixes:

---
fs/f2fs/compress.c | 114 ++++++++++++++++++--------------
fs/f2fs/data.c | 158 ++++++++++++++++++++++++++++++---------------
fs/f2fs/f2fs.h | 29 +++++++--
fs/f2fs/file.c | 25 +++----
fs/f2fs/inode.c | 7 +-
fs/f2fs/namei.c | 7 +-
6 files changed, 208 insertions(+), 132 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index e9f633c30942..7ebd2bc018bd 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -8,6 +8,7 @@
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/writeback.h>
+#include <linux/backing-dev.h>
#include <linux/lzo.h>
#include <linux/lz4.h>

@@ -86,15 +87,13 @@ int f2fs_init_compress_ctx(struct compress_ctx *cc)

cc->rpages = f2fs_kzalloc(sbi, sizeof(struct page *) <<
cc->log_cluster_size, GFP_NOFS);
- if (!cc->rpages)
- return -ENOMEM;
- return 0;
+ return cc->rpages ? 0 : -ENOMEM;
}

void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
{
- f2fs_reset_compress_ctx(cc);
kfree(cc->rpages);
+ f2fs_reset_compress_ctx(cc);
}

void f2fs_compress_ctx_add_page(struct compress_ctx *cc, struct page *page)
@@ -378,7 +377,7 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity)

dec_page_count(sbi, F2FS_RD_DATA);

- if (bio->bi_status)
+ if (bio->bi_status || PageError(page))
dic->failed = true;

if (refcount_dec_not_one(&dic->ref))
@@ -420,10 +419,14 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity)
out_vunmap_rbuf:
vunmap(dic->rbuf);
out_free_dic:
- f2fs_set_cluster_uptodate(dic->rpages, dic->cluster_size, ret, verity);
+ if (!verity)
+ f2fs_decompress_end_io(dic->rpages, dic->cluster_size,
+ ret, false);
+
trace_f2fs_decompress_pages_end(dic->inode, dic->cluster_idx,
- dic->clen, ret);
- f2fs_free_dic(dic);
+ dic->clen, ret);
+ if (!verity)
+ f2fs_free_dic(dic);
}

static bool is_page_in_cluster(struct compress_ctx *cc, pgoff_t index)
@@ -470,22 +473,18 @@ static bool __cluster_may_compress(struct compress_ctx *cc)
/* beyond EOF */
if (page->index >= nr_pages)
return false;
- if (page->index != start_idx_of_cluster(cc) + i)
- return false;
}
return true;
}

-int is_compressed_cluster(struct compress_ctx *cc, pgoff_t index)
+static int is_compressed_cluster(struct compress_ctx *cc)
{
struct dnode_of_data dn;
- unsigned int start_idx = cluster_idx(cc, index) <<
- cc->log_cluster_size;
int ret;
- int i;

set_new_dnode(&dn, cc->inode, NULL, NULL, 0);
- ret = f2fs_get_dnode_of_data(&dn, start_idx, LOOKUP_NODE);
+ ret = f2fs_get_dnode_of_data(&dn, start_idx_of_cluster(cc),
+ LOOKUP_NODE);
if (ret) {
if (ret == -ENOENT)
ret = 0;
@@ -493,6 +492,8 @@ int is_compressed_cluster(struct compress_ctx *cc, pgoff_t index)
}

if (dn.data_blkaddr == COMPRESS_ADDR) {
+ int i;
+
ret = CLUSTER_IS_FULL;
for (i = 1; i < cc->cluster_size; i++) {
block_t blkaddr;
@@ -516,9 +517,10 @@ int f2fs_is_compressed_cluster(struct inode *inode, pgoff_t index)
.inode = inode,
.log_cluster_size = F2FS_I(inode)->i_log_cluster_size,
.cluster_size = F2FS_I(inode)->i_cluster_size,
+ .cluster_idx = index >> F2FS_I(inode)->i_log_cluster_size,
};

- return is_compressed_cluster(&cc, index);
+ return is_compressed_cluster(&cc);
}

static bool cluster_may_compress(struct compress_ctx *cc)
@@ -536,6 +538,7 @@ static bool cluster_may_compress(struct compress_ctx *cc)

void f2fs_reset_compress_ctx(struct compress_ctx *cc)
{
+ cc->rpages = NULL;
cc->nr_rpages = 0;
cc->nr_cpages = 0;
cc->cluster_idx = NULL_CLUSTER;
@@ -565,19 +568,18 @@ static int prepare_compress_overwrite(struct compress_ctx *cc,
bool prealloc)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode);
- struct bio *bio = NULL;
struct address_space *mapping = cc->inode->i_mapping;
struct page *page;
struct dnode_of_data dn;
sector_t last_block_in_bio;
unsigned fgp_flag = FGP_LOCK | FGP_WRITE | FGP_CREAT;
- unsigned int start_idx = cluster_idx(cc, index) << cc->log_cluster_size;
+ unsigned int start_idx = start_idx_of_cluster(cc);
int i, idx;
int ret;

ret = f2fs_init_compress_ctx(cc);
if (ret)
- goto out;
+ return ret;
retry:
/* keep page reference to avoid page reclaim */
for (i = 0; i < cc->cluster_size; i++) {
@@ -588,26 +590,25 @@ static int prepare_compress_overwrite(struct compress_ctx *cc,
goto unlock_pages;
}

- if (PageUptodate(page)) {
+ if (PageUptodate(page))
unlock_page(page);
- continue;
- }
-
- f2fs_compress_ctx_add_page(cc, page);
+ else
+ f2fs_compress_ctx_add_page(cc, page);
}

if (!f2fs_cluster_is_empty(cc)) {
+ struct bio *bio = NULL;
+
ret = f2fs_read_multi_pages(cc, &bio, cc->cluster_size,
&last_block_in_bio, false);
if (ret)
- goto out;
-
+ return ret;
if (bio)
f2fs_submit_bio(sbi, bio, DATA);

ret = f2fs_init_compress_ctx(cc);
if (ret)
- goto out;
+ return ret;
}

for (i = 0; i < cc->cluster_size; i++) {
@@ -620,10 +621,12 @@ static int prepare_compress_overwrite(struct compress_ctx *cc,
f2fs_put_page(page, 0);

if (!PageUptodate(page)) {
- for (idx = i; idx >= 0; idx--) {
- f2fs_put_page(cc->rpages[idx], 0);
- f2fs_put_page(cc->rpages[idx], 1);
+ for (idx = 0; idx < cc->cluster_size; idx++) {
+ f2fs_put_page(cc->rpages[idx],
+ (idx <= i) ? 1 : 0);
+ cc->rpages[idx] = NULL;
}
+ cc->nr_rpages = 0;
goto retry;
}
}
@@ -658,11 +661,10 @@ static int prepare_compress_overwrite(struct compress_ctx *cc,
release_pages:
for (idx = 0; idx < i; idx++) {
page = find_lock_page(mapping, start_idx + idx);
- f2fs_put_page(page, 0);
f2fs_put_page(page, 1);
+ f2fs_put_page(page, 0);
}
f2fs_destroy_compress_ctx(cc);
-out:
return ret;
}

@@ -671,12 +673,13 @@ int f2fs_prepare_compress_overwrite(struct inode *inode,
{
struct compress_ctx cc = {
.inode = inode,
+ .log_cluster_size = F2FS_I(inode)->i_log_cluster_size,
.cluster_size = F2FS_I(inode)->i_cluster_size,
- .cluster_idx = NULL_CLUSTER,
+ .cluster_idx = index >> F2FS_I(inode)->i_log_cluster_size,
.rpages = NULL,
.nr_rpages = 0,
};
- int ret = is_compressed_cluster(&cc, index);
+ int ret = is_compressed_cluster(&cc);

if (ret <= 0)
return ret;
@@ -687,7 +690,7 @@ int f2fs_prepare_compress_overwrite(struct inode *inode,
}

bool f2fs_compress_write_end(struct inode *inode, void *fsdata,
- pgoff_t index, bool written)
+ pgoff_t index, unsigned copied)

{
struct compress_ctx cc = {
@@ -698,7 +701,7 @@ bool f2fs_compress_write_end(struct inode *inode, void *fsdata,
bool first_index = (index == cc.rpages[0]->index);
int i;

- if (written)
+ if (copied)
set_cluster_dirty(&cc);

for (i = 0; i < cc.cluster_size; i++)
@@ -707,7 +710,6 @@ bool f2fs_compress_write_end(struct inode *inode, void *fsdata,
f2fs_destroy_compress_ctx(&cc);

return first_index;
-
}

static int f2fs_write_compressed_pages(struct compress_ctx *cc,
@@ -857,6 +859,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
fi->last_disk_size = psize;
up_write(&fi->i_sem);
}
+ f2fs_reset_compress_ctx(cc);
return 0;

out_destroy_crypt:
@@ -904,7 +907,8 @@ void f2fs_compress_write_end_io(struct bio *bio, struct page *page)
static int f2fs_write_raw_pages(struct compress_ctx *cc,
int *submitted,
struct writeback_control *wbc,
- enum iostat_type io_type)
+ enum iostat_type io_type,
+ bool compressed)
{
int i, _submitted;
int ret, err = 0;
@@ -912,12 +916,24 @@ static int f2fs_write_raw_pages(struct compress_ctx *cc,
for (i = 0; i < cc->cluster_size; i++) {
if (!cc->rpages[i])
continue;
+retry_write:
BUG_ON(!PageLocked(cc->rpages[i]));
+
ret = f2fs_write_single_data_page(cc->rpages[i], &_submitted,
- NULL, NULL, wbc, io_type);
+ NULL, NULL, wbc, io_type,
+ compressed);
if (ret) {
- if (ret == AOP_WRITEPAGE_ACTIVATE)
+ if (ret == AOP_WRITEPAGE_ACTIVATE) {
unlock_page(cc->rpages[i]);
+ ret = 0;
+ } else if (ret == -EAGAIN) {
+ ret = 0;
+ cond_resched();
+ congestion_wait(BLK_RW_ASYNC, HZ/50);
+ lock_page(cc->rpages[i]);
+ clear_page_dirty_for_io(cc->rpages[i]);
+ goto retry_write;
+ }
err = ret;
goto out_fail;
}
@@ -928,6 +944,8 @@ static int f2fs_write_raw_pages(struct compress_ctx *cc,

out_fail:
/* TODO: revoke partially updated block addresses */
+ BUG_ON(compressed);
+
for (++i; i < cc->cluster_size; i++) {
if (!cc->rpages[i])
continue;
@@ -948,7 +966,6 @@ int f2fs_write_multi_pages(struct compress_ctx *cc,
int err = -EAGAIN;

*submitted = 0;
-
if (cluster_may_compress(cc)) {
err = f2fs_compress_pages(cc);
if (err) {
@@ -964,18 +981,19 @@ int f2fs_write_multi_pages(struct compress_ctx *cc,
bool compressed = false;

f2fs_bug_on(F2FS_I_SB(cc->inode), *submitted);
- if (is_compressed_cluster(cc, start_idx_of_cluster(cc)))
+
+ if (is_compressed_cluster(cc))
compressed = true;

- err = f2fs_write_raw_pages(cc, submitted, wbc, io_type);
+ err = f2fs_write_raw_pages(cc, submitted, wbc,
+ io_type, compressed);
if (compressed) {
stat_sub_compr_blocks(cc->inode, *submitted);
F2FS_I(cc->inode)->i_compressed_blocks -= *submitted;
f2fs_mark_inode_dirty_sync(cc->inode, true);
}
+ f2fs_destroy_compress_ctx(cc);
}
-
- f2fs_reset_compress_ctx(cc);
return err;
}

@@ -988,8 +1006,9 @@ struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc)

dic = f2fs_kzalloc(sbi, sizeof(struct decompress_io_ctx), GFP_NOFS);
if (!dic)
- goto out;
+ return ERR_PTR(-ENOMEM);

+ dic->magic = F2FS_COMPRESSED_PAGE_MAGIC;
dic->inode = cc->inode;
refcount_set(&dic->ref, 1);
dic->cluster_idx = cc->cluster_idx;
@@ -1042,7 +1061,6 @@ struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc)

out_free:
f2fs_free_dic(dic);
-out:
return ERR_PTR(-ENOMEM);
}

@@ -1073,7 +1091,7 @@ void f2fs_free_dic(struct decompress_io_ctx *dic)
kfree(dic);
}

-void f2fs_set_cluster_uptodate(struct page **rpages,
+void f2fs_decompress_end_io(struct page **rpages,
unsigned int cluster_size, bool err, bool verity)
{
int i;
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c10cbd7d1c06..fcdd6d493f83 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -98,7 +98,7 @@ static void __read_end_io(struct bio *bio, bool compr, bool verity)
page = bv->bv_page;

#ifdef CONFIG_F2FS_FS_COMPRESSION
- if (compr && PagePrivate(page)) {
+ if (compr && f2fs_is_compressed_page(page)) {
f2fs_decompress_pages(bio, page, verity);
continue;
}
@@ -115,9 +115,14 @@ static void __read_end_io(struct bio *bio, bool compr, bool verity)
dec_page_count(F2FS_P_SB(page), __read_io_type(page));
unlock_page(page);
}
- if (bio->bi_private)
- mempool_free(bio->bi_private, bio_post_read_ctx_pool);
- bio_put(bio);
+}
+
+static void f2fs_release_read_bio(struct bio *bio);
+static void __f2fs_read_end_io(struct bio *bio, bool compr, bool verity)
+{
+ if (!compr)
+ __read_end_io(bio, false, verity);
+ f2fs_release_read_bio(bio);
}

static void f2fs_decompress_bio(struct bio *bio, bool verity)
@@ -127,19 +132,45 @@ static void f2fs_decompress_bio(struct bio *bio, bool verity)

static void bio_post_read_processing(struct bio_post_read_ctx *ctx);

-static void decrypt_work(struct bio_post_read_ctx *ctx)
+static void f2fs_decrypt_work(struct bio_post_read_ctx *ctx)
{
fscrypt_decrypt_bio(ctx->bio);
}

-static void decompress_work(struct bio_post_read_ctx *ctx, bool verity)
+static void f2fs_decompress_work(struct bio_post_read_ctx *ctx)
{
- f2fs_decompress_bio(ctx->bio, verity);
+ f2fs_decompress_bio(ctx->bio, ctx->enabled_steps & (1 << STEP_VERITY));
}

-static void verity_work(struct bio_post_read_ctx *ctx)
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+static void f2fs_verify_bio(struct bio *bio)
{
+ struct page *page = bio_first_page_all(bio);
+ struct decompress_io_ctx *dic =
+ (struct decompress_io_ctx *)page_private(page);
+
+ f2fs_decompress_end_io(dic->rpages, dic->cluster_size, false, true);
+ f2fs_free_dic(dic);
+}
+#endif
+
+static void f2fs_verity_work(struct work_struct *work)
+{
+ struct bio_post_read_ctx *ctx =
+ container_of(work, struct bio_post_read_ctx, work);
+
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+ /* previous step is decompression */
+ if (ctx->enabled_steps & (1 << STEP_DECOMPRESS)) {
+
+ f2fs_verify_bio(ctx->bio);
+ f2fs_release_read_bio(ctx->bio);
+ return;
+ }
+#endif
+
fsverity_verify_bio(ctx->bio);
+ __f2fs_read_end_io(ctx->bio, false, false);
}

static void f2fs_post_read_work(struct work_struct *work)
@@ -148,18 +179,19 @@ static void f2fs_post_read_work(struct work_struct *work)
container_of(work, struct bio_post_read_ctx, work);

if (ctx->enabled_steps & (1 << STEP_DECRYPT))
- decrypt_work(ctx);
+ f2fs_decrypt_work(ctx);

- if (ctx->enabled_steps & (1 << STEP_DECOMPRESS)) {
- decompress_work(ctx,
- ctx->enabled_steps & (1 << STEP_VERITY));
+ if (ctx->enabled_steps & (1 << STEP_DECOMPRESS))
+ f2fs_decompress_work(ctx);
+
+ if (ctx->enabled_steps & (1 << STEP_VERITY)) {
+ INIT_WORK(&ctx->work, f2fs_verity_work);
+ fsverity_enqueue_verify_work(&ctx->work);
return;
}

- if (ctx->enabled_steps & (1 << STEP_VERITY))
- verity_work(ctx);
-
- __read_end_io(ctx->bio, false, false);
+ __f2fs_read_end_io(ctx->bio,
+ ctx->enabled_steps & (1 << STEP_DECOMPRESS), false);
}

static void f2fs_enqueue_post_read_work(struct f2fs_sb_info *sbi,
@@ -176,12 +208,20 @@ static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
* we shouldn't recurse to the same workqueue.
*/

- if (ctx->enabled_steps) {
+ if (ctx->enabled_steps & (1 << STEP_DECRYPT) ||
+ ctx->enabled_steps & (1 << STEP_DECOMPRESS)) {
INIT_WORK(&ctx->work, f2fs_post_read_work);
f2fs_enqueue_post_read_work(ctx->sbi, &ctx->work);
return;
}
- __read_end_io(ctx->bio, false, false);
+
+ if (ctx->enabled_steps & (1 << STEP_VERITY)) {
+ INIT_WORK(&ctx->work, f2fs_verity_work);
+ fsverity_enqueue_verify_work(&ctx->work);
+ return;
+ }
+
+ __f2fs_read_end_io(ctx->bio, false, false);
}

static bool f2fs_bio_post_read_required(struct bio *bio)
@@ -205,7 +245,7 @@ static void f2fs_read_end_io(struct bio *bio)
return;
}

- __read_end_io(bio, false, false);
+ __f2fs_read_end_io(bio, false, false);
}

static void f2fs_write_end_io(struct bio *bio)
@@ -624,7 +664,8 @@ static int add_ipu_page(struct f2fs_sb_info *sbi, struct bio **bio,

found = true;

- if (bio_add_page(*bio, page, PAGE_SIZE, 0) == PAGE_SIZE) {
+ if (bio_add_page(*bio, page, PAGE_SIZE, 0) ==
+ PAGE_SIZE) {
ret = 0;
break;
}
@@ -858,6 +899,13 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
return bio;
}

+static void f2fs_release_read_bio(struct bio *bio)
+{
+ if (bio->bi_private)
+ mempool_free(bio->bi_private, bio_post_read_ctx_pool);
+ bio_put(bio);
+}
+
/* This can handle encryption stuffs */
static int f2fs_submit_page_read(struct inode *inode, struct page *page,
block_t blkaddr)
@@ -1963,7 +2011,8 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
if (ret)
goto out;

- f2fs_bug_on(sbi, dn.data_blkaddr != COMPRESS_ADDR);
+ if (dn.data_blkaddr != COMPRESS_ADDR)
+ goto out;

for (i = 1; i < cc->cluster_size; i++) {
block_t blkaddr;
@@ -2017,7 +2066,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
dic->failed = true;
if (refcount_sub_and_test(dic->nr_cpages - i,
&dic->ref))
- f2fs_set_cluster_uptodate(dic->rpages,
+ f2fs_decompress_end_io(dic->rpages,
cc->cluster_size, true,
false);
f2fs_free_dic(dic);
@@ -2047,8 +2096,8 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
out_put_dnode:
f2fs_put_dnode(&dn);
out:
- f2fs_set_cluster_uptodate(cc->rpages, cc->cluster_size, true, false);
- f2fs_reset_compress_ctx(cc);
+ f2fs_decompress_end_io(cc->rpages, cc->cluster_size, true, false);
+ f2fs_destroy_compress_ctx(cc);
*bio_ret = bio;
return ret;
}
@@ -2443,7 +2492,8 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
struct bio **bio,
sector_t *last_block,
struct writeback_control *wbc,
- enum iostat_type io_type)
+ enum iostat_type io_type,
+ bool compressed)
{
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
@@ -2488,8 +2538,9 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
goto redirty_out;

- if (f2fs_compressed_file(inode) ||
- page->index < end_index || f2fs_verity_in_progress(inode))
+ if (page->index < end_index ||
+ f2fs_verity_in_progress(inode) ||
+ compressed)
goto write;

/*
@@ -2610,7 +2661,7 @@ static int f2fs_write_data_page(struct page *page,
#endif

return f2fs_write_single_data_page(page, NULL, NULL, NULL,
- wbc, FS_DATA_IO);
+ wbc, FS_DATA_IO, false);
}

/*
@@ -2696,17 +2747,12 @@ static int f2fs_write_cache_pages(struct address_space *mapping,

for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
- bool need_readd = false;
-
+ bool need_readd;
readd:
#ifdef CONFIG_F2FS_FS_COMPRESSION
need_readd = false;

if (f2fs_compressed_file(inode)) {
- void *fsdata = NULL;
- struct page *pagep;
- int ret2;
-
ret = f2fs_init_compress_ctx(&cc);
if (ret) {
done = 1;
@@ -2715,7 +2761,6 @@ static int f2fs_write_cache_pages(struct address_space *mapping,

if (!f2fs_cluster_can_merge_page(&cc,
page->index)) {
-
ret = f2fs_write_multi_pages(&cc,
&submitted, wbc, io_type);
if (!ret)
@@ -2724,6 +2769,10 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
}

if (f2fs_cluster_is_empty(&cc)) {
+ void *fsdata = NULL;
+ struct page *pagep;
+ int ret2;
+
ret2 = f2fs_prepare_compress_overwrite(
inode, &pagep,
page->index, &fsdata);
@@ -2733,24 +2782,27 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
break;
} else if (ret2 &&
!f2fs_compress_write_end(inode,
- fsdata, page->index,
- true)) {
+ fsdata, page->index,
+ 1)) {
retry = 1;
break;
}
+ } else {
+ goto lock_page;
}
}
#endif
-
/* give a priority to WB_SYNC threads */
if (atomic_read(&sbi->wb_sync_req[DATA]) &&
wbc->sync_mode == WB_SYNC_NONE) {
done = 1;
break;
}
-
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+lock_page:
+#endif
done_index = page->index;
-
+retry_write:
lock_page(page);

if (unlikely(page->mapping != mapping)) {
@@ -2782,7 +2834,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
}
#endif
ret = f2fs_write_single_data_page(page, &submitted,
- &bio, &last_block, wbc, io_type);
+ &bio, &last_block, wbc, io_type, false);
if (ret == AOP_WRITEPAGE_ACTIVATE)
unlock_page(page);
#ifdef CONFIG_F2FS_FS_COMPRESSION
@@ -2801,6 +2853,12 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
goto next;
} else if (ret == -EAGAIN) {
ret = 0;
+ if (wbc->sync_mode == WB_SYNC_ALL) {
+ cond_resched();
+ congestion_wait(BLK_RW_ASYNC,
+ HZ/50);
+ goto retry_write;
+ }
goto next;
}
done_index = page->index + 1;
@@ -2817,21 +2875,21 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
if (need_readd)
goto readd;
}
-
pagevec_release(&pvec);
cond_resched();
}
-
#ifdef CONFIG_F2FS_FS_COMPRESSION
/* flush remained pages in compress cluster */
if (f2fs_compressed_file(inode) && !f2fs_cluster_is_empty(&cc)) {
ret = f2fs_write_multi_pages(&cc, &submitted, wbc, io_type);
nwritten += submitted;
wbc->nr_to_write -= submitted;
- /* TODO: error handling */
+ if (ret) {
+ done = 1;
+ retry = 0;
+ }
}
#endif
-
if ((!cycled && !done) || retry) {
cycled = 1;
index = 0;
@@ -3606,14 +3664,8 @@ static int f2fs_swap_activate(struct swap_info_struct *sis, struct file *file,
if (ret)
return ret;

- if (f2fs_compressed_file(inode)) {
- if (F2FS_I(inode)->i_compressed_blocks)
- return -EINVAL;
-
- F2FS_I(inode)->i_flags &= ~FS_COMPR_FL;
- clear_inode_flag(inode, FI_COMPRESSED_FILE);
- stat_dec_compr_inode(inode);
- }
+ if (f2fs_disable_compressed_file(inode))
+ return -EINVAL;

ret = check_swap_activate(file, sis->max);
if (ret)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 11c42042367b..ee7309ca671a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1253,6 +1253,7 @@ struct compress_io_ctx {

/* decompress io context for read IO path */
struct decompress_io_ctx {
+ u32 magic; /* magic number to indicate page is compressed */
struct inode *inode; /* inode the context belong to */
unsigned int cluster_idx; /* cluster index number */
unsigned int cluster_size; /* page count in cluster */
@@ -2737,6 +2738,8 @@ static inline void set_compress_context(struct inode *inode)
F2FS_OPTION(sbi).compress_log_size;
F2FS_I(inode)->i_cluster_size =
1 << F2FS_I(inode)->i_log_cluster_size;
+ F2FS_I(inode)->i_flags |= F2FS_COMPR_FL;
+ set_inode_flag(inode, FI_COMPRESSED_FILE);
}

static inline unsigned int addrs_per_inode(struct inode *inode)
@@ -3390,7 +3393,8 @@ bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio);
int f2fs_write_single_data_page(struct page *page, int *submitted,
struct bio **bio, sector_t *last_block,
struct writeback_control *wbc,
- enum iostat_type io_type);
+ enum iostat_type io_type,
+ bool compressed);
void f2fs_invalidate_page(struct page *page, unsigned int offset,
unsigned int length);
int f2fs_release_page(struct page *page, gfp_t wait);
@@ -3631,8 +3635,8 @@ void f2fs_destroy_root_stats(void);
#define stat_dec_inline_dir(inode) do { } while (0)
#define stat_inc_compr_inode(inode) do { } while (0)
#define stat_dec_compr_inode(inode) do { } while (0)
-#define stat_add_compr_blocks(inode) do { } while (0)
-#define stat_sub_compr_blocks(inode) do { } while (0)
+#define stat_add_compr_blocks(inode, blocks) do { } while (0)
+#define stat_sub_compr_blocks(inode, blocks) do { } while (0)
#define stat_inc_atomic_write(inode) do { } while (0)
#define stat_dec_atomic_write(inode) do { } while (0)
#define stat_update_max_atomic_write(inode) do { } while (0)
@@ -3786,7 +3790,7 @@ void f2fs_reset_compress_ctx(struct compress_ctx *cc);
int f2fs_prepare_compress_overwrite(struct inode *inode,
struct page **pagep, pgoff_t index, void **fsdata);
bool f2fs_compress_write_end(struct inode *inode, void *fsdata,
- pgoff_t index, bool written);
+ pgoff_t index, unsigned copied);
void f2fs_compress_write_end_io(struct bio *bio, struct page *page);
bool f2fs_is_compress_backend_ready(struct inode *inode);
void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity);
@@ -3803,7 +3807,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
bool is_readahead);
struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc);
void f2fs_free_dic(struct decompress_io_ctx *dic);
-void f2fs_set_cluster_uptodate(struct page **rpages,
+void f2fs_decompress_end_io(struct page **rpages,
unsigned int cluster_size, bool err, bool verity);
int f2fs_init_compress_ctx(struct compress_ctx *cc);
void f2fs_destroy_compress_ctx(struct compress_ctx *cc);
@@ -3824,6 +3828,21 @@ static inline struct page *f2fs_compress_control_page(struct page *page)
}
#endif

+static inline u64 f2fs_disable_compressed_file(struct inode *inode)
+{
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+
+ if (!f2fs_compressed_file(inode))
+ return 0;
+ if (fi->i_compressed_blocks)
+ return fi->i_compressed_blocks;
+
+ fi->i_flags &= ~F2FS_COMPR_FL;
+ clear_inode_flag(inode, FI_COMPRESSED_FILE);
+ stat_dec_compr_inode(inode);
+ return 0;
+}
+
#define F2FS_FEATURE_FUNCS(name, flagname) \
static inline int f2fs_sb_has_##name(struct f2fs_sb_info *sbi) \
{ \
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index a688a4cb212b..4163fc3db1a3 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -523,6 +523,9 @@ static int f2fs_file_open(struct inode *inode, struct file *filp)
if (err)
return err;

+ if (!f2fs_is_compress_backend_ready(inode))
+ return -EOPNOTSUPP;
+
err = fsverity_file_open(inode, filp);
if (err)
return err;
@@ -1821,7 +1824,6 @@ static int f2fs_setflags_common(struct inode *inode, u32 iflags, u32 mask)
return -EINVAL;

set_compress_context(inode);
- set_inode_flag(inode, FI_COMPRESSED_FILE);
stat_inc_compr_inode(inode);
}
}
@@ -2016,11 +2018,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)

inode_lock(inode);

- if (f2fs_compressed_file(inode) && !fi->i_compressed_blocks) {
- fi->i_flags &= ~F2FS_COMPR_FL;
- clear_inode_flag(inode, FI_COMPRESSED_FILE);
- stat_dec_compr_inode(inode);
- }
+ f2fs_disable_compressed_file(inode);

if (f2fs_is_atomic_file(inode)) {
if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST))
@@ -3224,20 +3222,15 @@ static int f2fs_ioc_set_pin_file(struct file *filp, unsigned long arg)
goto out;
}

- if (f2fs_compressed_file(inode)) {
- if (F2FS_I(inode)->i_compressed_blocks) {
- ret = -EOPNOTSUPP;
- goto out;
- }
- F2FS_I(inode)->i_flags &= ~F2FS_COMPR_FL;
- clear_inode_flag(inode, FI_COMPRESSED_FILE);
- stat_dec_compr_inode(inode);
- }
-
ret = f2fs_convert_inline_inode(inode);
if (ret)
goto out;

+ if (f2fs_disable_compressed_file(inode)) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
set_inode_flag(inode, FI_PIN_FILE);
ret = F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN];
done:
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7a85060adad5..3fa728f40c2a 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -421,7 +421,8 @@ static int do_read_inode(struct inode *inode)
fi->i_crtime.tv_nsec = le32_to_cpu(ri->i_crtime_nsec);
}

- if (f2fs_has_extra_attr(inode) && f2fs_sb_has_compression(sbi)) {
+ if (f2fs_has_extra_attr(inode) && f2fs_sb_has_compression(sbi) &&
+ (fi->i_flags & F2FS_COMPR_FL)) {
if (F2FS_FITS_IN_INODE(ri, fi->i_extra_isize,
i_log_cluster_size)) {
fi->i_compressed_blocks =
@@ -429,10 +430,8 @@ static int do_read_inode(struct inode *inode)
fi->i_compress_algorithm = ri->i_compress_algorithm;
fi->i_log_cluster_size = ri->i_log_cluster_size;
fi->i_cluster_size = 1 << fi->i_log_cluster_size;
- }
-
- if ((fi->i_flags & F2FS_COMPR_FL) && f2fs_may_compress(inode))
set_inode_flag(inode, FI_COMPRESSED_FILE);
+ }
}

F2FS_I(inode)->i_disk_time[0] = inode->i_atime;
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 00c56a3e944b..ac6b1f946e03 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -122,11 +122,8 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
if (f2fs_sb_has_compression(sbi)) {
/* Inherit the compression flag in directory */
if ((F2FS_I(dir)->i_flags & F2FS_COMPR_FL) &&
- f2fs_may_compress(inode)) {
+ f2fs_may_compress(inode))
set_compress_context(inode);
- F2FS_I(inode)->i_flags |= F2FS_COMPR_FL;
- set_inode_flag(inode, FI_COMPRESSED_FILE);
- }
}

f2fs_set_inode_flags(inode);
@@ -309,9 +306,7 @@ static void set_compress_inode(struct f2fs_sb_info *sbi, struct inode *inode,
if (!is_extension_exist(name, ext[i]))
continue;

- F2FS_I(inode)->i_flags |= F2FS_COMPR_FL;
set_compress_context(inode);
- set_inode_flag(inode, FI_COMPRESSED_FILE);
return;
}
}
--
2.19.0.605.g01d371f741-goog
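
For readers following the f2fs_disable_compressed_file() consolidation above, here is a minimal userspace sketch of the helper's contract. It is not the kernel code: struct mock_inode, mock_disable_compressed_file() and mock_pin_file() are hypothetical stand-ins, and only the control flow is taken from the diff. The helper returns 0 when the file is not compressed or when the flag could be cleared, and a non-zero block count when compressed blocks already exist.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode state the helper touches. */
struct mock_inode {
	bool compressed;            /* models FI_COMPRESSED_FILE / F2FS_COMPR_FL */
	uint64_t compressed_blocks; /* models i_compressed_blocks */
};

/*
 * Same control flow as f2fs_disable_compressed_file() in the diff:
 * 0 means "compression is off now", non-zero means "compressed
 * blocks exist, so the flag was left untouched".
 */
static uint64_t mock_disable_compressed_file(struct mock_inode *inode)
{
	if (!inode->compressed)
		return 0;
	if (inode->compressed_blocks)
		return inode->compressed_blocks;

	inode->compressed = false;
	return 0;
}

/* Caller pattern modelled on f2fs_ioc_set_pin_file(): refuse on non-zero. */
static int mock_pin_file(struct mock_inode *inode)
{
	if (mock_disable_compressed_file(inode))
		return -EOPNOTSUPP;
	return 0;
}
```

In the diff, the swap-activate caller maps a non-zero return to -EINVAL, the pin-file caller maps it to -EOPNOTSUPP, and the atomic-write caller ignores the return value.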

2019-12-11 01:28:11

Subject: Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

Hi Chao,

Let me know if it's okay to integrate the compression patches all together.
I don't have any critical bugs to fix with them now.

Another fix:
---
fs/f2fs/compress.c | 101 ++++++++++++++++++++++++++++-----------------
fs/f2fs/data.c | 15 ++++---
fs/f2fs/f2fs.h | 1 -
3 files changed, 72 insertions(+), 45 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 7ebd2bc018bd..af23ed6deffd 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -73,6 +73,17 @@ static void f2fs_put_compressed_page(struct page *page)
put_page(page);
}

+static void f2fs_put_rpages(struct compress_ctx *cc)
+{
+ unsigned int i;
+
+ for (i = 0; i < cc->cluster_size; i++) {
+ if (!cc->rpages[i])
+ continue;
+ put_page(cc->rpages[i]);
+ }
+}
+
struct page *f2fs_compress_control_page(struct page *page)
{
return ((struct compress_io_ctx *)page_private(page))->rpages[0];
@@ -93,7 +104,10 @@ int f2fs_init_compress_ctx(struct compress_ctx *cc)
void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
{
kfree(cc->rpages);
- f2fs_reset_compress_ctx(cc);
+ cc->rpages = NULL;
+ cc->nr_rpages = 0;
+ cc->nr_cpages = 0;
+ cc->cluster_idx = NULL_CLUSTER;
}

void f2fs_compress_ctx_add_page(struct compress_ctx *cc, struct page *page)
@@ -536,14 +550,6 @@ static bool cluster_may_compress(struct compress_ctx *cc)
return __cluster_may_compress(cc);
}

-void f2fs_reset_compress_ctx(struct compress_ctx *cc)
-{
- cc->rpages = NULL;
- cc->nr_rpages = 0;
- cc->nr_cpages = 0;
- cc->cluster_idx = NULL_CLUSTER;
-}
-
static void set_cluster_writeback(struct compress_ctx *cc)
{
int i;
@@ -602,13 +608,13 @@ static int prepare_compress_overwrite(struct compress_ctx *cc,
ret = f2fs_read_multi_pages(cc, &bio, cc->cluster_size,
&last_block_in_bio, false);
if (ret)
- return ret;
+ goto release_pages;
if (bio)
f2fs_submit_bio(sbi, bio, DATA);

ret = f2fs_init_compress_ctx(cc);
if (ret)
- return ret;
+ goto release_pages;
}

for (i = 0; i < cc->cluster_size; i++) {
@@ -638,9 +644,11 @@ static int prepare_compress_overwrite(struct compress_ctx *cc,

for (i = cc->cluster_size - 1; i > 0; i--) {
ret = f2fs_get_block(&dn, start_idx + i);
- if (ret)
+ if (ret) {
/* TODO: release preallocate blocks */
- goto release_pages;
+ i = cc->cluster_size;
+ goto unlock_pages;
+ }

if (dn.data_blkaddr != NEW_ADDR)
break;
@@ -769,7 +777,11 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
cic->magic = F2FS_COMPRESSED_PAGE_MAGIC;
cic->inode = inode;
refcount_set(&cic->ref, 1);
- cic->rpages = cc->rpages;
+ cic->rpages = f2fs_kzalloc(sbi, sizeof(struct page *) <<
+ cc->log_cluster_size, GFP_NOFS);
+ if (!cic->rpages)
+ goto out_put_cic;
+
cic->nr_rpages = cc->cluster_size;

for (i = 0; i < cc->nr_cpages; i++) {
@@ -793,7 +805,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,

blkaddr = datablock_addr(dn.inode, dn.node_page,
dn.ofs_in_node);
- fio.page = cc->rpages[i];
+ fio.page = cic->rpages[i] = cc->rpages[i];
fio.old_blkaddr = blkaddr;

/* cluster header */
@@ -819,7 +831,6 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,

f2fs_bug_on(fio.sbi, blkaddr == NULL_ADDR);

-
if (fio.encrypted)
fio.encrypted_page = cc->cpages[i - 1];
else if (fio.compressed)
@@ -859,17 +870,22 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
fi->last_disk_size = psize;
up_write(&fi->i_sem);
}
- f2fs_reset_compress_ctx(cc);
+ f2fs_put_rpages(cc);
+ f2fs_destroy_compress_ctx(cc);
return 0;

out_destroy_crypt:
- for (i -= 1; i >= 0; i--)
+ kfree(cic->rpages);
+
+ for (--i; i >= 0; i--)
fscrypt_finalize_bounce_page(&cc->cpages[i]);
for (i = 0; i < cc->nr_cpages; i++) {
if (!cc->cpages[i])
continue;
f2fs_put_page(cc->cpages[i], 1);
}
+out_put_cic:
+ kfree(cic);
out_put_dnode:
f2fs_put_dnode(&dn);
out_unlock_op:
@@ -963,37 +979,39 @@ int f2fs_write_multi_pages(struct compress_ctx *cc,
struct f2fs_inode_info *fi = F2FS_I(cc->inode);
const struct f2fs_compress_ops *cops =
f2fs_cops[fi->i_compress_algorithm];
- int err = -EAGAIN;
+ bool compressed = false;
+ int err;

*submitted = 0;
if (cluster_may_compress(cc)) {
err = f2fs_compress_pages(cc);
- if (err) {
- err = -EAGAIN;
+ if (err == -EAGAIN)
goto write;
- }
+ else if (err)
+ goto put_out;
+
err = f2fs_write_compressed_pages(cc, submitted,
wbc, io_type);
cops->destroy_compress_ctx(cc);
+ if (!err)
+ return 0;
+ f2fs_bug_on(F2FS_I_SB(cc->inode), err != -EAGAIN);
}
write:
- if (err == -EAGAIN) {
- bool compressed = false;
-
- f2fs_bug_on(F2FS_I_SB(cc->inode), *submitted);
+ f2fs_bug_on(F2FS_I_SB(cc->inode), *submitted);

- if (is_compressed_cluster(cc))
- compressed = true;
+ if (is_compressed_cluster(cc))
+ compressed = true;

- err = f2fs_write_raw_pages(cc, submitted, wbc,
- io_type, compressed);
- if (compressed) {
- stat_sub_compr_blocks(cc->inode, *submitted);
- F2FS_I(cc->inode)->i_compressed_blocks -= *submitted;
- f2fs_mark_inode_dirty_sync(cc->inode, true);
- }
- f2fs_destroy_compress_ctx(cc);
+ err = f2fs_write_raw_pages(cc, submitted, wbc, io_type, compressed);
+ if (compressed) {
+ stat_sub_compr_blocks(cc->inode, *submitted);
+ F2FS_I(cc->inode)->i_compressed_blocks -= *submitted;
+ f2fs_mark_inode_dirty_sync(cc->inode, true);
}
+put_out:
+ f2fs_put_rpages(cc);
+ f2fs_destroy_compress_ctx(cc);
return err;
}

@@ -1055,7 +1073,13 @@ struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc)
dic->tpages[i] = cc->rpages[i];
}

- dic->rpages = cc->rpages;
+ dic->rpages = f2fs_kzalloc(sbi, sizeof(struct page *) <<
+ cc->log_cluster_size, GFP_NOFS);
+ if (!dic->rpages)
+ goto out_free;
+
+ for (i = 0; i < dic->cluster_size; i++)
+ dic->rpages[i] = cc->rpages[i];
dic->nr_rpages = cc->cluster_size;
return dic;

@@ -1072,8 +1096,7 @@ void f2fs_free_dic(struct decompress_io_ctx *dic)
for (i = 0; i < dic->cluster_size; i++) {
if (dic->rpages[i])
continue;
- unlock_page(dic->tpages[i]);
- put_page(dic->tpages[i]);
+ f2fs_put_page(dic->tpages[i], 1);
}
kfree(dic->tpages);
}
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 7046b222e8de..19cd03450066 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2099,7 +2099,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
false);
f2fs_free_dic(dic);
f2fs_put_dnode(&dn);
- f2fs_reset_compress_ctx(cc);
+ f2fs_destroy_compress_ctx(cc);
*bio_ret = bio;
return ret;
}
@@ -2117,7 +2117,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,

f2fs_put_dnode(&dn);

- f2fs_reset_compress_ctx(cc);
+ f2fs_destroy_compress_ctx(cc);
*bio_ret = bio;
return 0;

@@ -2125,7 +2125,6 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
f2fs_put_dnode(&dn);
out:
f2fs_decompress_end_io(cc->rpages, cc->cluster_size, true, false);
- f2fs_destroy_compress_ctx(cc);
*bio_ret = bio;
return ret;
}
@@ -2192,8 +2191,10 @@ int f2fs_mpage_readpages(struct address_space *mapping,
max_nr_pages,
&last_block_in_bio,
is_readahead);
- if (ret)
+ if (ret) {
+ f2fs_destroy_compress_ctx(&cc);
goto set_error_page;
+ }
}
ret = f2fs_is_compressed_cluster(inode, page->index);
if (ret < 0)
@@ -2229,11 +2230,14 @@ int f2fs_mpage_readpages(struct address_space *mapping,
#ifdef CONFIG_F2FS_FS_COMPRESSION
if (f2fs_compressed_file(inode)) {
/* last page */
- if (nr_pages == 1 && !f2fs_cluster_is_empty(&cc))
+ if (nr_pages == 1 && !f2fs_cluster_is_empty(&cc)) {
ret = f2fs_read_multi_pages(&cc, &bio,
max_nr_pages,
&last_block_in_bio,
is_readahead);
+ if (ret)
+ f2fs_destroy_compress_ctx(&cc);
+ }
}
#endif
}
@@ -2856,6 +2860,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping,

#ifdef CONFIG_F2FS_FS_COMPRESSION
if (f2fs_compressed_file(inode)) {
+ get_page(page);
f2fs_compress_ctx_add_page(&cc, page);
continue;
}
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 26a4cc1fd686..5d55cef66410 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3765,7 +3765,6 @@ static inline bool f2fs_post_read_required(struct inode *inode)
#ifdef CONFIG_F2FS_FS_COMPRESSION
bool f2fs_is_compressed_page(struct page *page);
struct page *f2fs_compress_control_page(struct page *page);
-void f2fs_reset_compress_ctx(struct compress_ctx *cc);
int f2fs_prepare_compress_overwrite(struct inode *inode,
struct page **pagep, pgoff_t index, void **fsdata);
bool f2fs_compress_write_end(struct inode *inode, void *fsdata,
--
2.24.0.525.g8f36a354ae-goog
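
The rpages change in this fix (dic->rpages and cic->rpages get their own array instead of aliasing cc->rpages) can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: struct page is a dummy, the mock_* names are hypothetical, and calloc()/free() stand in for f2fs_kzalloc()/kfree(). The point it shows is that once the I/O context owns a private copy of the pointer array, the compress context can free its own array right away without leaving the I/O context with a dangling pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Dummy page type; only pointers to it are stored. */
struct page { int id; };

struct mock_cc {
	struct page **rpages;
	unsigned int cluster_size;
};

struct mock_dic {
	struct page **rpages;
	unsigned int cluster_size;
};

/* Give the I/O context its own copy of the pointer array: 0 or -ENOMEM-like -1. */
static int mock_dic_copy_rpages(struct mock_dic *dic, const struct mock_cc *cc)
{
	unsigned int i;

	dic->rpages = calloc(cc->cluster_size, sizeof(*dic->rpages));
	if (!dic->rpages)
		return -1;

	for (i = 0; i < cc->cluster_size; i++)
		dic->rpages[i] = cc->rpages[i];
	dic->cluster_size = cc->cluster_size;
	return 0;
}

/* Models the kfree() + reset in f2fs_destroy_compress_ctx(). */
static void mock_destroy_cc(struct mock_cc *cc)
{
	free(cc->rpages);
	cc->rpages = NULL;
}
```

With the earlier aliasing (dic->rpages = cc->rpages), destroying the compress context while I/O was still in flight would have freed the array the I/O completion path later walks, which is presumably why the earlier revision only reset the context instead of freeing it.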

2019-12-12 15:09:12