2019-06-24 07:25:30

by Gao Xiang

Subject: [PATCH v3 0/8] staging: erofs: decompression inplace approach

This is patch v3 of the erofs decompression in-place approach, sent
out from my personal email since I'm out of the office attending Open
Source Summit China 2019 these days. There is no major change from
PATCH v2, since no noticeable issue has been raised since it landed in
our products; the updates are mainly responses to Chao's suggestions.

See the notes at the bottom, which are taken from RFC PATCH v1 and
describe the principle behind these techniques.

The series is based on the latest staging-next since all dependencies
have already been merged.

changelog from v2:
- wrap up some offsets as macros;
- add some error handling for erofs_get_pcpubuf();
- move some decompression in-place code from PATCH 5 to PATCH 6.

changelog from v1:
- keep Z_EROFS_NR_INLINE_PAGEVECS in unzip_vle.h after switching to
the new decompression backend;
- add some DBG_BUGONs to the new decompression backend to observe
potential issues;
- minor code cleanup.

8<--------

Hi,

After more than half a year of work, the details of erofs
decompression in-place are almost settled and ready for linux-next.

Currently, in-place IO is used whenever the whole compressed extent is
needed, in order to reduce the extra memory overhead of compressed
pages; one extra memcpy (the only memcpy) is then required for each
in-place IO, since a temporary buffer is needed to keep decompression
safe.

However, most LZ-based decompression algorithms, such as LZ4 and LZO,
support in-place decompression by design.

If the iend - oend margin is large enough, decompression can be done
safely in the same buffer, as illustrated below:

      start of compressed logical extent
      |                end of this logical extent
      |                                           |
 _____v___________________________________________v____
 ... |  page 6  |  page 7  |  page 8  |  page 9  | ...
     |__________|__________|__________|__________|
      .                     ^          .         ^
      .                     |     compressed     |
      .                     |        data        |
      .                     .          .         .
      |<           dstsize            >|< margin>|
                                     oend      iend
      op                    ip

Fixed-size output compression can make full use of this feature to
reduce memory overhead and avoid the extra memcpy, in contrast to
fixed-size input compression, since iend is strictly no less than oend
when fixed-size output compression does in-place IO to the last pages.
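
To make the safety condition concrete, here is a minimal userspace C
sketch (an illustration only, not the kernel code; the margin macro
mirrors the LZ4_DECOMPRESS_INPLACE_MARGIN definition added later in
this series):

	#include <stdbool.h>
	#include <stddef.h>
	#include <stdio.h>

	/* worst-case overwrite window for LZ4 in-place decompression */
	#define LZ4_DECOMPRESS_INPLACE_MARGIN(srcsize)	(((srcsize) >> 8) + 32)

	/*
	 * In-place decompression is safe only if the compressed input ends
	 * far enough past the decompressed output end, so the decompressor
	 * never overwrites compressed bytes it has not read yet.
	 */
	static bool inplace_is_safe(size_t oend, size_t iend, size_t srcsize)
	{
		return iend >= oend &&
		       iend - oend >= LZ4_DECOMPRESS_INPLACE_MARGIN(srcsize);
	}

	int main(void)
	{
		/* e.g. a 4096-byte input needs (4096 >> 8) + 32 = 48 bytes */
		printf("margin: %zu\n",
		       (size_t)LZ4_DECOMPRESS_INPLACE_MARGIN(4096));
		printf("safe: %d\n", inplace_is_safe(4048, 4096, 4096) ? 1 : 0);
		return 0;
	}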

In addition, erofs compression indexes have been improved as well by
introducing compacted compression indexes.

Both techniques benefit sequential read (on x86_64, 710.8MiB/s ->
755.4MiB/s; on Kirin980, 725MiB/s -> 812MiB/s), so erofs can reach
sequential read performance similar to ext4 over a larger CR range on
high-speed SSD / NVMe devices as well.

However, note that this is a _CPU vs. storage device_ tradeoff; there
is no absolute performance conclusion that holds for every on-market
combination.

Test images:
 name                       size          CR
 enwik9                     1000000000    1.00
 enwik9_4k.squashfs.img      621211648    1.61
 enwik9_4k.erofs.img         558133248    1.79
 enwik9_8k.squashfs.img      556191744    1.80
 enwik9_16k.squashfs.img     502661120    1.99
 enwik9_128k.squashfs.img    398204928    2.51

Test Environment:
CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz (4 cores, 8 threads)
DDR: 8G
SSD: INTEL SSDPEKKF360G7H
Kernel: Linux 5.2-rc3+ (with lz4-1.8.3 algorithm)

Test configuration:
 squashfs:
   CONFIG_SQUASHFS=y
   CONFIG_SQUASHFS_FILE_DIRECT=y
   CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
   CONFIG_SQUASHFS_LZ4=y
   CONFIG_SQUASHFS_4K_DEVBLK_SIZE=y
 erofs:
   CONFIG_EROFS_FS_USE_VM_MAP_RAM=y
   CONFIG_EROFS_FS_ZIP=y
   CONFIG_EROFS_FS_CLUSTER_PAGE_LIMIT=1
   CONFIG_EROFS_FS_ZIP_CACHE_BIPOLAR=y

with intel_pstate=disable, all 8 CPUs online with
scaling_{min,max}_freq set to 1801000, and the userspace
scaling_governor.

Sequential read results (MiB/s):
                             1      2      3      4      5     avg
 enwik9_4k.ext4.img        767    770    738    726    724    745
 enwik9_4k.erofs.img       756    745    770    746    760    755.4
 enwik9_4k.squashfs.img     90.3   83.0   94.3   90.7   92.6   90.18
 enwik9_8k.squashfs.img    111    108    110    108    110    109.4
 enwik9_16k.squashfs.img   158    163    146    165    174    161.2
 enwik9_128k.squashfs.img  324    314    262    262    296    291.6


Thanks,
Gao Xiang

Gao Xiang (8):
staging: erofs: add compacted ondisk compression indexes
staging: erofs: add compacted compression indexes support
staging: erofs: move per-CPU buffers implementation to utils.c
staging: erofs: move stagingpage operations to compress.h
staging: erofs: introduce generic decompression backend
staging: erofs: introduce LZ4 decompression inplace
staging: erofs: switch to new decompression backend
staging: erofs: integrate decompression inplace

drivers/staging/erofs/Makefile | 2 +-
drivers/staging/erofs/compress.h | 62 ++++
drivers/staging/erofs/data.c | 4 +-
drivers/staging/erofs/decompressor.c | 329 ++++++++++++++++++
drivers/staging/erofs/erofs_fs.h | 68 +++-
drivers/staging/erofs/inode.c | 12 +-
drivers/staging/erofs/internal.h | 52 ++-
drivers/staging/erofs/unzip_vle.c | 368 ++------------------
drivers/staging/erofs/unzip_vle.h | 38 +--
drivers/staging/erofs/unzip_vle_lz4.c | 229 -------------
drivers/staging/erofs/utils.c | 12 +
drivers/staging/erofs/zmap.c | 463 ++++++++++++++++++++++++++
12 files changed, 1006 insertions(+), 633 deletions(-)
create mode 100644 drivers/staging/erofs/compress.h
create mode 100644 drivers/staging/erofs/decompressor.c
delete mode 100644 drivers/staging/erofs/unzip_vle_lz4.c
create mode 100644 drivers/staging/erofs/zmap.c

--
2.17.1


2019-06-24 07:25:31

by Gao Xiang

Subject: [PATCH v3 3/8] staging: erofs: move per-CPU buffers implementation to utils.c

From: Gao Xiang <[email protected]>

This patch moves the per-CPU buffers to utils.c so that the
upcoming generic decompression framework can use them.

Note that I tried generic per-CPU buffer and per-CPU page
approaches to clean this up further, but an obvious performance
regression (about 2% for sequential read) was observed.

Therefore let's leave it as it is for now, just moved to utils.c,
and I'll try to dig into the root cause later.
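
For reference, the calling convention of the pair introduced below is
roughly as follows (a sketch, not part of the diff; erofs_get_pcpubuf()
disables preemption, so sleeping is not allowed until the matching
erofs_put_pcpubuf()):

	void *buf = erofs_get_pcpubuf(0);	/* disables preemption */

	if (IS_ERR(buf))
		return PTR_ERR(buf);
	/* ... use up to EROFS_PCPUBUF_NR_PAGES * PAGE_SIZE bytes ... */
	erofs_put_pcpubuf(buf);			/* re-enables preemption */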

Signed-off-by: Gao Xiang <[email protected]>
---
change log v3:
- handle the error code possibly returned by erofs_get_pcpubuf(), as
pointed out by Chao.

drivers/staging/erofs/internal.h | 26 +++++++++++++++++++
drivers/staging/erofs/unzip_vle.c | 5 ++--
drivers/staging/erofs/unzip_vle.h | 4 +--
drivers/staging/erofs/unzip_vle_lz4.c | 37 +++++++++++----------------
drivers/staging/erofs/utils.c | 12 +++++++++
5 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h
index f3063b13c117..dcbe6f7f5dae 100644
--- a/drivers/staging/erofs/internal.h
+++ b/drivers/staging/erofs/internal.h
@@ -321,6 +321,16 @@ static inline void z_erofs_exit_zip_subsystem(void) {}

/* page count of a compressed cluster */
#define erofs_clusterpages(sbi) ((1 << (sbi)->clusterbits) / PAGE_SIZE)
+#define Z_EROFS_NR_INLINE_PAGEVECS 3
+
+#if (Z_EROFS_CLUSTER_MAX_PAGES > Z_EROFS_NR_INLINE_PAGEVECS)
+#define EROFS_PCPUBUF_NR_PAGES Z_EROFS_CLUSTER_MAX_PAGES
+#else
+#define EROFS_PCPUBUF_NR_PAGES Z_EROFS_NR_INLINE_PAGEVECS
+#endif
+
+#else
+#define EROFS_PCPUBUF_NR_PAGES 0
#endif

typedef u64 erofs_off_t;
@@ -608,6 +618,22 @@ static inline void erofs_vunmap(const void *mem, unsigned int count)
extern struct shrinker erofs_shrinker_info;

struct page *erofs_allocpage(struct list_head *pool, gfp_t gfp);
+
+#if (EROFS_PCPUBUF_NR_PAGES > 0)
+void *erofs_get_pcpubuf(unsigned int pagenr);
+#define erofs_put_pcpubuf(buf) do { \
+ (void)&(buf); \
+ preempt_enable(); \
+} while (0)
+#else
+static inline void *erofs_get_pcpubuf(unsigned int pagenr)
+{
+ return ERR_PTR(-ENOTSUPP);
+}
+
+#define erofs_put_pcpubuf(buf) do {} while (0)
+#endif
+
void erofs_register_super(struct super_block *sb);
void erofs_unregister_super(struct super_block *sb);

diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index 8aea938172df..08f2d4302ecb 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -552,8 +552,7 @@ static int z_erofs_vle_work_iter_begin(struct z_erofs_vle_work_builder *builder,
if (IS_ERR(work))
return PTR_ERR(work);
got_it:
- z_erofs_pagevec_ctor_init(&builder->vector,
- Z_EROFS_VLE_INLINE_PAGEVECS,
+ z_erofs_pagevec_ctor_init(&builder->vector, Z_EROFS_NR_INLINE_PAGEVECS,
work->pagevec, work->vcnt);

if (builder->role >= Z_EROFS_VLE_WORK_PRIMARY) {
@@ -936,7 +935,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
for (i = 0; i < nr_pages; ++i)
pages[i] = NULL;

- z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_VLE_INLINE_PAGEVECS,
+ z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
work->pagevec, 0);

for (i = 0; i < work->vcnt; ++i) {
diff --git a/drivers/staging/erofs/unzip_vle.h b/drivers/staging/erofs/unzip_vle.h
index 902e67d04029..9c53009700cf 100644
--- a/drivers/staging/erofs/unzip_vle.h
+++ b/drivers/staging/erofs/unzip_vle.h
@@ -44,8 +44,6 @@ static inline bool z_erofs_gather_if_stagingpage(struct list_head *page_pool,
*
*/

-#define Z_EROFS_VLE_INLINE_PAGEVECS 3
-
struct z_erofs_vle_work {
struct mutex lock;

@@ -58,7 +56,7 @@ struct z_erofs_vle_work {

union {
/* L: pagevec */
- erofs_vtptr_t pagevec[Z_EROFS_VLE_INLINE_PAGEVECS];
+ erofs_vtptr_t pagevec[Z_EROFS_NR_INLINE_PAGEVECS];
struct rcu_head rcu;
};
};
diff --git a/drivers/staging/erofs/unzip_vle_lz4.c b/drivers/staging/erofs/unzip_vle_lz4.c
index 0daac9b984a8..02e694d9288d 100644
--- a/drivers/staging/erofs/unzip_vle_lz4.c
+++ b/drivers/staging/erofs/unzip_vle_lz4.c
@@ -34,16 +34,6 @@ static int z_erofs_unzip_lz4(void *in, void *out, size_t inlen, size_t outlen)
return -EIO;
}

-#if Z_EROFS_CLUSTER_MAX_PAGES > Z_EROFS_VLE_INLINE_PAGEVECS
-#define EROFS_PERCPU_NR_PAGES Z_EROFS_CLUSTER_MAX_PAGES
-#else
-#define EROFS_PERCPU_NR_PAGES Z_EROFS_VLE_INLINE_PAGEVECS
-#endif
-
-static struct {
- char data[PAGE_SIZE * EROFS_PERCPU_NR_PAGES];
-} erofs_pcpubuf[NR_CPUS];
-
int z_erofs_vle_plain_copy(struct page **compressed_pages,
unsigned int clusterpages,
struct page **pages,
@@ -56,8 +46,9 @@ int z_erofs_vle_plain_copy(struct page **compressed_pages,
char *percpu_data;
bool mirrored[Z_EROFS_CLUSTER_MAX_PAGES] = { 0 };

- preempt_disable();
- percpu_data = erofs_pcpubuf[smp_processor_id()].data;
+ percpu_data = erofs_get_pcpubuf(0);
+ if (IS_ERR(percpu_data))
+ return PTR_ERR(percpu_data);

j = 0;
for (i = 0; i < nr_pages; j = i++) {
@@ -117,7 +108,7 @@ int z_erofs_vle_plain_copy(struct page **compressed_pages,
if (src && !mirrored[j])
kunmap_atomic(src);

- preempt_enable();
+ erofs_put_pcpubuf(percpu_data);
return 0;
}

@@ -131,7 +122,7 @@ int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
unsigned int nr_pages, i, j;
int ret;

- if (outlen + pageofs > EROFS_PERCPU_NR_PAGES * PAGE_SIZE)
+ if (outlen + pageofs > EROFS_PCPUBUF_NR_PAGES * PAGE_SIZE)
return -ENOTSUPP;

nr_pages = DIV_ROUND_UP(outlen + pageofs, PAGE_SIZE);
@@ -144,8 +135,9 @@ int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
return -ENOMEM;
}

- preempt_disable();
- vout = erofs_pcpubuf[smp_processor_id()].data;
+ vout = erofs_get_pcpubuf(0);
+ if (IS_ERR(vout))
+ return PTR_ERR(vout);

ret = z_erofs_unzip_lz4(vin, vout + pageofs,
clusterpages * PAGE_SIZE, outlen);
@@ -174,7 +166,7 @@ int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
}

out:
- preempt_enable();
+ erofs_put_pcpubuf(vout);

if (clusterpages == 1)
kunmap_atomic(vin);
@@ -196,8 +188,9 @@ int z_erofs_vle_unzip_vmap(struct page **compressed_pages,
int ret;

if (overlapped) {
- preempt_disable();
- vin = erofs_pcpubuf[smp_processor_id()].data;
+ vin = erofs_get_pcpubuf(0);
+ if (IS_ERR(vin))
+ return PTR_ERR(vin);

for (i = 0; i < clusterpages; ++i) {
void *t = kmap_atomic(compressed_pages[i]);
@@ -216,13 +209,13 @@ int z_erofs_vle_unzip_vmap(struct page **compressed_pages,
if (ret > 0)
ret = 0;

- if (!overlapped) {
+ if (overlapped) {
+ erofs_put_pcpubuf(vin);
+ } else {
if (clusterpages == 1)
kunmap_atomic(vin);
else
erofs_vunmap(vin, clusterpages);
- } else {
- preempt_enable();
}
return ret;
}
diff --git a/drivers/staging/erofs/utils.c b/drivers/staging/erofs/utils.c
index 3e7d30b6de1d..4bbd3bf34acd 100644
--- a/drivers/staging/erofs/utils.c
+++ b/drivers/staging/erofs/utils.c
@@ -27,6 +27,18 @@ struct page *erofs_allocpage(struct list_head *pool, gfp_t gfp)
return page;
}

+#if (EROFS_PCPUBUF_NR_PAGES > 0)
+static struct {
+ u8 data[PAGE_SIZE * EROFS_PCPUBUF_NR_PAGES];
+} ____cacheline_aligned_in_smp erofs_pcpubuf[NR_CPUS];
+
+void *erofs_get_pcpubuf(unsigned int pagenr)
+{
+ preempt_disable();
+ return &erofs_pcpubuf[smp_processor_id()].data[pagenr * PAGE_SIZE];
+}
+#endif
+
/* global shrink count (for all mounted EROFS instances) */
static atomic_long_t erofs_global_shrink_cnt;

--
2.17.1

2019-06-24 07:25:33

by Gao Xiang

Subject: [PATCH v3 6/8] staging: erofs: introduce LZ4 decompression inplace

From: Gao Xiang <[email protected]>

Compressed data will usually be loaded into the last pages of the
extent (the last page in the 4k case) for in-place decompression
(more specifically, in-place IO), as illustrated below:

      start of compressed logical extent
      |                end of this logical extent
      |                                           |
 _____v___________________________________________v____
 ... |  page 6  |  page 7  |  page 8  |  page 9  | ...
     |__________|__________|__________|__________|
      .                     ^          .         ^
      .                     |     compressed     |
      .                     |        data        |
      .                     .          .         .
      |<           dstsize            >|< margin>|
                                     oend      iend
      op                    ip

Therefore, it's possible to decompress in place (thus with no memcpy
at all) if the margin is sufficient and safe enough [1], and this can
only be implemented for fixed-size output compression, not for
fixed-size input compression.

After decompression in place is implemented, no memcpy is needed for
most in-place IO (about 99% of enwik9), and sequential read improves
as a result (see the following patches for test results).

[1] https://github.com/lz4/lz4/commit/b17f578a919b7e6b078cede2d52be29dd48c8e8c
https://github.com/lz4/lz4/commit/5997e139f53169fa3a1c1b4418d2452a90b01602
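
As a concrete illustration (a sketch of the gate added to
lz4_decompress() below, not a literal excerpt), the in-place path is
taken only when all of the following hold; otherwise the data is still
copied out through generic_copy_inplace_data() first:

	bool can_inplace = !rq->partial_decoding && support_0padding &&
			   rq->out[nr - 1] == rq->in[0] &&
			   rq->inputsize - oend >=
				LZ4_DECOMPRESS_INPLACE_MARGIN(inlen);

For example, with inlen = 4096, the required margin past oend is
(4096 >> 8) + 32 = 48 bytes.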

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
drivers/staging/erofs/compress.h | 1 +
drivers/staging/erofs/decompressor.c | 36 ++++++++++++++++++++++++----
drivers/staging/erofs/erofs_fs.h | 3 ++-
3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/erofs/compress.h b/drivers/staging/erofs/compress.h
index ebeccb1f4eae..c43aa3374d28 100644
--- a/drivers/staging/erofs/compress.h
+++ b/drivers/staging/erofs/compress.h
@@ -17,6 +17,7 @@ enum {
};

struct z_erofs_decompress_req {
+ struct super_block *sb;
struct page **in, **out;

unsigned short pageofs_out;
diff --git a/drivers/staging/erofs/decompressor.c b/drivers/staging/erofs/decompressor.c
index df8fd68a338b..80f1f39719ba 100644
--- a/drivers/staging/erofs/decompressor.c
+++ b/drivers/staging/erofs/decompressor.c
@@ -14,6 +14,9 @@
#endif

#define LZ4_MAX_DISTANCE_PAGES DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE)
+#ifndef LZ4_DECOMPRESS_INPLACE_MARGIN
+#define LZ4_DECOMPRESS_INPLACE_MARGIN(srcsize) (((srcsize) >> 8) + 32)
+#endif

struct z_erofs_decompressor {
/*
@@ -112,7 +115,7 @@ static int lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)
{
unsigned int inputmargin, inlen;
u8 *src;
- bool copied;
+ bool copied, support_0padding;
int ret;

if (rq->inputsize > PAGE_SIZE)
@@ -120,13 +123,38 @@ static int lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)

src = kmap_atomic(*rq->in);
inputmargin = 0;
+ support_0padding = false;
+
+ /* decompression inplace is only safe when 0padding is enabled */
+ if (EROFS_SB(rq->sb)->requirements & EROFS_REQUIREMENT_LZ4_0PADDING) {
+ support_0padding = true;
+
+ while (!src[inputmargin & ~PAGE_MASK])
+ if (!(++inputmargin & ~PAGE_MASK))
+ break;
+
+ if (inputmargin >= rq->inputsize) {
+ kunmap_atomic(src);
+ return -EIO;
+ }
+ }

copied = false;
inlen = rq->inputsize - inputmargin;
if (rq->inplace_io) {
- src = generic_copy_inplace_data(rq, src, inputmargin);
- inputmargin = 0;
- copied = true;
+ const uint oend = (rq->pageofs_out +
+ rq->outputsize) & ~PAGE_MASK;
+ const uint nr = PAGE_ALIGN(rq->pageofs_out +
+ rq->outputsize) >> PAGE_SHIFT;
+
+ if (rq->partial_decoding || !support_0padding ||
+ rq->out[nr - 1] != rq->in[0] ||
+ rq->inputsize - oend <
+ LZ4_DECOMPRESS_INPLACE_MARGIN(inlen)) {
+ src = generic_copy_inplace_data(rq, src, inputmargin);
+ inputmargin = 0;
+ copied = true;
+ }
}

ret = LZ4_decompress_safe_partial(src + inputmargin, out,
diff --git a/drivers/staging/erofs/erofs_fs.h b/drivers/staging/erofs/erofs_fs.h
index 9a9aaf2d9fbb..9f61abb7c1ca 100644
--- a/drivers/staging/erofs/erofs_fs.h
+++ b/drivers/staging/erofs/erofs_fs.h
@@ -21,7 +21,8 @@
* Any bits that aren't in EROFS_ALL_REQUIREMENTS should be
* incompatible with this kernel version.
*/
-#define EROFS_ALL_REQUIREMENTS 0
+#define EROFS_REQUIREMENT_LZ4_0PADDING 0x00000001
+#define EROFS_ALL_REQUIREMENTS EROFS_REQUIREMENT_LZ4_0PADDING

struct erofs_super_block {
/* 0 */__le32 magic; /* in the little endian */
--
2.17.1

2019-06-24 07:25:50

by Gao Xiang

Subject: [PATCH v3 5/8] staging: erofs: introduce generic decompression backend

From: Gao Xiang <[email protected]>

This patch adds a new generic decompression framework
in order to replace the old LZ4-specific decompression code.

Even though LZ4 is still the only supported algorithm, the new
backend is cleaner, and it is easier to integrate new algorithms
into it than into the old, almost hard-coded backend.
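
With this framework in place, wiring up another algorithm should only
require one more decompressors[] entry; a hypothetical sketch (the
Z_EROFS_COMPRESSION_LZMA id and the lzma_* helpers are illustrative
names, not part of this series):

	static struct z_erofs_decompressor decompressors[] = {
		[Z_EROFS_COMPRESSION_SHIFTED] = {
			.name = "shifted"
		},
		[Z_EROFS_COMPRESSION_LZ4] = {
			.prepare_destpages = lz4_prepare_destpages,
			.decompress = lz4_decompress,
			.name = "lz4"
		},
		[Z_EROFS_COMPRESSION_LZMA] = {		/* hypothetical */
			.prepare_destpages = lzma_prepare_destpages,
			.decompress = lzma_decompress,
			.name = "lzma"
		},
	};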

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
change log v3:
- move the extra 0PADDING logic (used as the decompression in-place
enablement mark) to the following patch;

drivers/staging/erofs/Makefile | 2 +-
drivers/staging/erofs/compress.h | 21 ++
drivers/staging/erofs/decompressor.c | 301 +++++++++++++++++++++++++++
3 files changed, 323 insertions(+), 1 deletion(-)
create mode 100644 drivers/staging/erofs/decompressor.c

diff --git a/drivers/staging/erofs/Makefile b/drivers/staging/erofs/Makefile
index 84b412c7a991..adeb5d6e2668 100644
--- a/drivers/staging/erofs/Makefile
+++ b/drivers/staging/erofs/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_EROFS_FS) += erofs.o
ccflags-y += -I $(srctree)/$(src)/include
erofs-objs := super.o inode.o data.o namei.o dir.o utils.o
erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
-erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o zmap.o
+erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o zmap.o decompressor.o

diff --git a/drivers/staging/erofs/compress.h b/drivers/staging/erofs/compress.h
index 1dcfc3b35118..ebeccb1f4eae 100644
--- a/drivers/staging/erofs/compress.h
+++ b/drivers/staging/erofs/compress.h
@@ -9,6 +9,24 @@
#ifndef __EROFS_FS_COMPRESS_H
#define __EROFS_FS_COMPRESS_H

+#include "internal.h"
+
+enum {
+ Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
+ Z_EROFS_COMPRESSION_RUNTIME_MAX
+};
+
+struct z_erofs_decompress_req {
+ struct page **in, **out;
+
+ unsigned short pageofs_out;
+ unsigned int inputsize, outputsize;
+
+ /* indicate the algorithm will be used for decompression */
+ unsigned int alg;
+ bool inplace_io, partial_decoding;
+};
+
/*
* - 0x5A110C8D ('sallocated', Z_EROFS_MAPPING_STAGING) -
* used to mark temporary allocated pages from other
@@ -36,5 +54,8 @@ static inline bool z_erofs_put_stagingpage(struct list_head *pagepool,
return true;
}

+int z_erofs_decompress(struct z_erofs_decompress_req *rq,
+ struct list_head *pagepool);
+
#endif

diff --git a/drivers/staging/erofs/decompressor.c b/drivers/staging/erofs/decompressor.c
new file mode 100644
index 000000000000..df8fd68a338b
--- /dev/null
+++ b/drivers/staging/erofs/decompressor.c
@@ -0,0 +1,301 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * linux/drivers/staging/erofs/decompressor.c
+ *
+ * Copyright (C) 2019 HUAWEI, Inc.
+ * http://www.huawei.com/
+ * Created by Gao Xiang <[email protected]>
+ */
+#include "compress.h"
+#include <linux/lz4.h>
+
+#ifndef LZ4_DISTANCE_MAX /* history window size */
+#define LZ4_DISTANCE_MAX 65535 /* set to maximum value by default */
+#endif
+
+#define LZ4_MAX_DISTANCE_PAGES DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE)
+
+struct z_erofs_decompressor {
+ /*
+ * if destpages have sparse pages, fill them with bounce pages.
+ * it also checks whether destpages indicate continuous physical memory.
+ */
+ int (*prepare_destpages)(struct z_erofs_decompress_req *rq,
+ struct list_head *pagepool);
+ int (*decompress)(struct z_erofs_decompress_req *rq, u8 *out);
+ char *name;
+};
+
+static int lz4_prepare_destpages(struct z_erofs_decompress_req *rq,
+ struct list_head *pagepool)
+{
+ const unsigned int nr =
+ PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
+ struct page *availables[LZ4_MAX_DISTANCE_PAGES] = { NULL };
+ unsigned long unused[DIV_ROUND_UP(LZ4_MAX_DISTANCE_PAGES,
+ BITS_PER_LONG)] = { 0 };
+ void *kaddr = NULL;
+ unsigned int i, j, k;
+
+ for (i = 0; i < nr; ++i) {
+ struct page *const page = rq->out[i];
+
+ j = i & (LZ4_MAX_DISTANCE_PAGES - 1);
+ if (availables[j])
+ __set_bit(j, unused);
+
+ if (page) {
+ if (kaddr) {
+ if (kaddr + PAGE_SIZE == page_address(page))
+ kaddr += PAGE_SIZE;
+ else
+ kaddr = NULL;
+ } else if (!i) {
+ kaddr = page_address(page);
+ }
+ continue;
+ }
+ kaddr = NULL;
+
+ k = find_first_bit(unused, LZ4_MAX_DISTANCE_PAGES);
+ if (k < LZ4_MAX_DISTANCE_PAGES) {
+ j = k;
+ get_page(availables[j]);
+ } else {
+ DBG_BUGON(availables[j]);
+
+ if (!list_empty(pagepool)) {
+ availables[j] = lru_to_page(pagepool);
+ list_del(&availables[j]->lru);
+ DBG_BUGON(page_ref_count(availables[j]) != 1);
+ } else {
+ availables[j] = alloc_pages(GFP_KERNEL, 0);
+ if (!availables[j])
+ return -ENOMEM;
+ }
+ availables[j]->mapping = Z_EROFS_MAPPING_STAGING;
+ }
+ rq->out[i] = availables[j];
+ __clear_bit(j, unused);
+ }
+ return kaddr ? 1 : 0;
+}
+
+static void *generic_copy_inplace_data(struct z_erofs_decompress_req *rq,
+ u8 *src, unsigned int pageofs_in)
+{
+ /*
+ * if in-place decompression is ongoing, those decompressed
+ * pages should be copied in order to avoid being overlapped.
+ */
+ struct page **in = rq->in;
+ u8 *const tmp = erofs_get_pcpubuf(0);
+ u8 *tmpp = tmp;
+ unsigned int inlen = rq->inputsize - pageofs_in;
+ unsigned int count = min_t(uint, inlen, PAGE_SIZE - pageofs_in);
+
+ while (tmpp < tmp + inlen) {
+ if (!src)
+ src = kmap_atomic(*in);
+ memcpy(tmpp, src + pageofs_in, count);
+ kunmap_atomic(src);
+ src = NULL;
+ tmpp += count;
+ pageofs_in = 0;
+ count = PAGE_SIZE;
+ ++in;
+ }
+ return tmp;
+}
+
+static int lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)
+{
+ unsigned int inputmargin, inlen;
+ u8 *src;
+ bool copied;
+ int ret;
+
+ if (rq->inputsize > PAGE_SIZE)
+ return -ENOTSUPP;
+
+ src = kmap_atomic(*rq->in);
+ inputmargin = 0;
+
+ copied = false;
+ inlen = rq->inputsize - inputmargin;
+ if (rq->inplace_io) {
+ src = generic_copy_inplace_data(rq, src, inputmargin);
+ inputmargin = 0;
+ copied = true;
+ }
+
+ ret = LZ4_decompress_safe_partial(src + inputmargin, out,
+ inlen, rq->outputsize,
+ rq->outputsize);
+ if (ret < 0) {
+ errln("%s, failed to decompress, in[%p, %u, %u] out[%p, %u]",
+ __func__, src + inputmargin, inlen, inputmargin,
+ out, rq->outputsize);
+ WARN_ON(1);
+ print_hex_dump(KERN_DEBUG, "[ in]: ", DUMP_PREFIX_OFFSET,
+ 16, 1, src + inputmargin, inlen, true);
+ print_hex_dump(KERN_DEBUG, "[out]: ", DUMP_PREFIX_OFFSET,
+ 16, 1, out, rq->outputsize, true);
+ ret = -EIO;
+ }
+
+ if (copied)
+ erofs_put_pcpubuf(src);
+ else
+ kunmap_atomic(src);
+ return ret;
+}
+
+static struct z_erofs_decompressor decompressors[] = {
+ [Z_EROFS_COMPRESSION_SHIFTED] = {
+ .name = "shifted"
+ },
+ [Z_EROFS_COMPRESSION_LZ4] = {
+ .prepare_destpages = lz4_prepare_destpages,
+ .decompress = lz4_decompress,
+ .name = "lz4"
+ },
+};
+
+static void copy_from_pcpubuf(struct page **out, const char *dst,
+ unsigned short pageofs_out,
+ unsigned int outputsize)
+{
+ const char *end = dst + outputsize;
+ const unsigned int righthalf = PAGE_SIZE - pageofs_out;
+ const char *cur = dst - pageofs_out;
+
+ while (cur < end) {
+ struct page *const page = *out++;
+
+ if (page) {
+ char *buf = kmap_atomic(page);
+
+ if (cur >= dst) {
+ memcpy(buf, cur, min_t(uint, PAGE_SIZE,
+ end - cur));
+ } else {
+ memcpy(buf + pageofs_out, cur + pageofs_out,
+ min_t(uint, righthalf, end - cur));
+ }
+ kunmap_atomic(buf);
+ }
+ cur += PAGE_SIZE;
+ }
+}
+
+static int decompress_generic(struct z_erofs_decompress_req *rq,
+ struct list_head *pagepool)
+{
+ const unsigned int nrpages_out =
+ PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
+ const struct z_erofs_decompressor *alg = decompressors + rq->alg;
+ unsigned int dst_maptype;
+ void *dst;
+ int ret;
+
+ if (nrpages_out == 1 && !rq->inplace_io) {
+ DBG_BUGON(!*rq->out);
+ dst = kmap_atomic(*rq->out);
+ dst_maptype = 0;
+ goto dstmap_out;
+ }
+
+ /*
+ * For the case of small output size (especially much less
+ * than PAGE_SIZE), memcpy the decompressed data rather than
+ * compressed data is preferred.
+ */
+ if (rq->outputsize <= PAGE_SIZE * 7 / 8) {
+ dst = erofs_get_pcpubuf(0);
+ if (IS_ERR(dst))
+ return PTR_ERR(dst);
+
+ rq->inplace_io = false;
+ ret = alg->decompress(rq, dst);
+ if (!ret)
+ copy_from_pcpubuf(rq->out, dst, rq->pageofs_out,
+ rq->outputsize);
+
+ erofs_put_pcpubuf(dst);
+ return ret;
+ }
+
+ ret = alg->prepare_destpages(rq, pagepool);
+ if (ret < 0) {
+ return ret;
+ } else if (ret) {
+ dst = page_address(*rq->out);
+ dst_maptype = 1;
+ goto dstmap_out;
+ }
+
+ dst = erofs_vmap(rq->out, nrpages_out);
+ if (!dst)
+ return -ENOMEM;
+ dst_maptype = 2;
+
+dstmap_out:
+ ret = alg->decompress(rq, dst + rq->pageofs_out);
+
+ if (!dst_maptype)
+ kunmap_atomic(dst);
+ else if (dst_maptype == 2)
+ erofs_vunmap(dst, nrpages_out);
+ return ret;
+}
+
+static int shifted_decompress(const struct z_erofs_decompress_req *rq,
+ struct list_head *pagepool)
+{
+ const unsigned int nrpages_out =
+ PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
+ const unsigned int righthalf = PAGE_SIZE - rq->pageofs_out;
+ unsigned char *src, *dst;
+
+ if (nrpages_out > 2) {
+ DBG_BUGON(1);
+ return -EIO;
+ }
+
+ if (rq->out[0] == *rq->in) {
+ DBG_BUGON(nrpages_out != 1);
+ return 0;
+ }
+
+ src = kmap_atomic(*rq->in);
+ if (!rq->out[0]) {
+ dst = NULL;
+ } else {
+ dst = kmap_atomic(rq->out[0]);
+ memcpy(dst + rq->pageofs_out, src, righthalf);
+ }
+
+ if (rq->out[1] == *rq->in) {
+ memmove(src, src + righthalf, rq->pageofs_out);
+ } else if (nrpages_out == 2) {
+ if (dst)
+ kunmap_atomic(dst);
+ DBG_BUGON(!rq->out[1]);
+ dst = kmap_atomic(rq->out[1]);
+ memcpy(dst, src + righthalf, rq->pageofs_out);
+ }
+ if (dst)
+ kunmap_atomic(dst);
+ kunmap_atomic(src);
+ return 0;
+}
+
+int z_erofs_decompress(struct z_erofs_decompress_req *rq,
+ struct list_head *pagepool)
+{
+ if (rq->alg == Z_EROFS_COMPRESSION_SHIFTED)
+ return shifted_decompress(rq, pagepool);
+ return decompress_generic(rq, pagepool);
+}
+
--
2.17.1

2019-06-24 07:35:14

by Chao Yu

Subject: Re: [PATCH v3 3/8] staging: erofs: move per-CPU buffers implementation to utils.c

On 2019/6/24 15:22, Gao Xiang wrote:
> From: Gao Xiang <[email protected]>
>
> This patch moves the per-CPU buffers to utils.c so that the
> upcoming generic decompression framework can use them.
>
> Note that I tried generic per-CPU buffer and per-CPU page
> approaches to clean this up further, but an obvious performance
> regression (about 2% for sequential read) was observed.
>
> Therefore let's leave it as it is for now, just moved to utils.c,
> and I'll try to dig into the root cause later.
>
> Signed-off-by: Gao Xiang <[email protected]>

Reviewed-by: Chao Yu <[email protected]>

Thanks,

2019-06-24 07:35:58

by Chao Yu

Subject: Re: [PATCH v3 0/8] staging: erofs: decompression inplace approach

On 2019/6/24 15:22, Gao Xiang wrote:
> This is patch v3 of the erofs decompression in-place approach, sent
> out from my personal email since I'm out of the office attending Open
> Source Summit China 2019 these days. There is no major change from
> PATCH v2, since no noticeable issue has been raised since it landed in
> our products; the updates are mainly responses to Chao's suggestions.
>
> See the notes at the bottom, which are taken from RFC PATCH v1 and
> describe the principle behind these techniques.
>
> The series is based on the latest staging-next since all dependencies
> have already been merged.
>
> changelog from v2:
> - wrap up some offsets as macros;
> - add some error handling for erofs_get_pcpubuf();
> - move some decompression in-place code from PATCH 5 to PATCH 6.

Thanks for the update, all these changes look good to me. :)

Thanks,