2021-04-07 18:21:48

by Gao Xiang

Subject: [PATCH v3 00/10] erofs: add big pcluster compression support

Hi folks,

This is the formal version of EROFS big pcluster support, which means
EROFS can compress data into more than 1 fs block after this patchset.

{l,p}cluster are EROFS-specific concepts, standing for `logical cluster'
and `physical cluster' respectively. A logical cluster is the basic unit
of compress indexes in a file's logical mapping; e.g., compress indexes
can be built on 2 blocks rather than 1 block (currently only 1-block
lclusters are supported). A physical cluster is a container of physical
compressed blocks holding compressed data; its size is a multiple of
the lcluster size (e.g., with 4KiB lclusters, a pcluster may hold one
compressed extent spanning 4 blocks, i.e. 16KiB).

Different from previous thoughts, which had a fixed-sized pclusterblks
recorded in the on-disk compress index header, our on-disk design now
allows variable-sized pclusterblks. The main reasons are
- user data varies in compression ratio locally, so a fixed-sized
  pcluster approach wastes space and causes extra read amplification
  for high-CR cases;

- inplace decompression needs zero padding to guarantee its safe
  margin, but we don't want to pad more than 1 fs block for big
  pclusters;

- end users can now customize the pcluster size according to data
  type, since various pcluster sizes can exist in a single file; for
  example, different pcluster sizes can be used for executable code
  and one-shot data. Such a design should be more flexible than many
  other public compression fses. (Btw, each file in EROFS can use at
  most 2 algorithms at the same time via HEAD1/2, which will be
  formally added with LZMA support.)

In brief, EROFS can now compress from variable-sized input to
variable-sized pcluster blocks, as illustrated below:

|<-_lcluster_->|________________________|<-_lcluster_->|
|____._________|_________ .. ___________|_______.______|
     .                                          .
      .                                        .
       .______________________________________.
       |______________| .. |__________________|
       |<------------- pcluster ------------->|

The next step is how to record the compressed block count in
lclusters. In compress indexes, there are 2 concepts called HEAD and
NONHEAD lclusters. The difference is that a HEAD lcluster starts a new
pcluster, while a NONHEAD one does not. It's easy to see that a big
pcluster consists of at least 2 physical blocks, and thus covers at
least 2 lclusters as well.

Therefore, let the delta0 (distance to its HEAD lcluster) of the first
NONHEAD compress index store the compressed block count together with
a special flag, forming a newly introduced CBLKCNT compress index.
This works because its delta0 would otherwise be constantly 1, as
illustrated below:
 ________________________________________________________
|_HEAD_|_CBLKCNT_|_NONHEAD_|_..._|_NONHEAD_|_HEAD_|_HEAD_|
|<------- a pcluster with CBLKCNT ------->|<---->|
                                             ^ a pcluster with
                                               1 lcluster

If another HEAD lcluster immediately follows a HEAD lcluster, there is
no room to record CBLKCNT, but in that case it's easy to know that the
size of the pcluster is 1 lcluster.
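
For illustration, here is a minimal sketch of how the delta0 field of
such a CBLKCNT index can be decoded. It only assumes the
Z_EROFS_VLE_DI_D0_CBLKCNT flag (1 << 11) introduced in patch 5; the
helper name is made up:

#include <stdint.h>

#define Z_EROFS_VLE_DI_D0_CBLKCNT	(1 << 11)

/*
 * Decode delta[0] of the 1st NONHEAD lcluster index: if the CBLKCNT
 * flag is set, the remaining bits carry the compressed block count of
 * the pcluster, and the distance to the HEAD lcluster is known to be
 * constantly 1; otherwise, delta[0] is a plain distance value.
 */
static unsigned int decode_delta0(uint16_t delta0, unsigned int *cblkcnt)
{
	if (delta0 & Z_EROFS_VLE_DI_D0_CBLKCNT) {
		*cblkcnt = delta0 & ~Z_EROFS_VLE_DI_D0_CBLKCNT;
		return 1;	/* distance to HEAD is always 1 */
	}
	*cblkcnt = 0;		/* no CBLKCNT recorded in this index */
	return delta0;
}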

More implementation details about this and the compact indexes are in
the commit messages.

On the runtime performance side, the current EROFS test results are:
 ________________________________________________________________
|  file system  |   size    | seq read | rand read | rand9m read |
|_______________|___________|__MiB/s___|___MiB/s___|____MiB/s____|
|___erofs_4k____|_556879872_|__781.4___|___55.3____|____25.3_____|
|___erofs_16k___|_452509696_|__864.8___|__123.2____|____20.8_____|
|___erofs_32k___|_415223808_|__899.8___|__105.8_*__|____16.8_____|
|___erofs_64k___|_393814016_|__906.6___|___66.6_*__|____11.8_____|
|__squashfs_8k__|_556191744_|___64.9___|___19.3____|_____9.1_____|
|__squashfs_16k_|_502661120_|___98.9___|___38.0____|_____9.8_____|
|__squashfs_32k_|_458784768_|__115.4___|___71.6_*__|____10.0_____|
|_squashfs_128k_|_398204928_|__257.2___|__253.8_*__|____10.9_____|
|____ext4_4k____|____()_____|__786.6___|___28.6____|____27.8_____|


* Squashfs grabs more page cache with grab_cache_page_nowait() to keep
all decompressed data beyond the normal requested readahead (see
squashfs_copy_cache and squashfs_readpage_block).
In principle, EROFS could also cache all such decompressed data
if necessary, yet it's low priority for now and of little use
(rand9m is actually a better rand-read workload, since the amount
of I/O is 9m rather than the full-sized 1000m).

More details are in
https://lore.kernel.org/r/[email protected]

Also, since EROFS is not a fixed-pcluster design, users can apply
different optimized strategies according to data type at mkfs time.
And there is still room to optimize runtime performance for big
pclusters even further.

Finally, it passes ro_fsstress and can also successfully boot
buildroot & an Android system with the android-mainline repo.

current mkfs repo for big pcluster:
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git -b experimental-bigpcluster-compact

Thanks for your time on reading this!

Thanks,
Gao Xiang

changes since v2:
- introduce a new erofs_vm_map_ram() helper to reduce duplicated logic
  and fix an uninitialized variable pointed out by Colin & Joe;
- add an EXPERIMENTAL warning for the new big pcluster feature to end
  users.

changes since v1:
- add a missing vunmap in erofs_pcpubuf_exit();
- refine comments and commit messages.

Gao Xiang (10):
erofs: reserve physical_clusterbits[]
erofs: introduce multipage per-CPU buffers
erofs: introduce physical cluster slab pools
erofs: fix up inplace I/O pointer for big pcluster
erofs: add big physical cluster definition
erofs: adjust per-CPU buffers according to max_pclusterblks
erofs: support parsing big pcluster compress indexes
erofs: support parsing big pcluster compact indexes
erofs: support decompress big pcluster for lz4 backend
erofs: enable big pcluster feature

fs/erofs/Kconfig | 14 ---
fs/erofs/Makefile | 2 +-
fs/erofs/decompressor.c | 235 ++++++++++++++++++++++++----------------
fs/erofs/erofs_fs.h | 31 ++++--
fs/erofs/internal.h | 44 ++++----
fs/erofs/pcpubuf.c | 134 +++++++++++++++++++++++
fs/erofs/super.c | 1 +
fs/erofs/utils.c | 12 --
fs/erofs/zdata.c | 193 +++++++++++++++++++++------------
fs/erofs/zdata.h | 14 +--
fs/erofs/zmap.c | 162 ++++++++++++++++++++++-----
11 files changed, 587 insertions(+), 255 deletions(-)
create mode 100644 fs/erofs/pcpubuf.c

--
2.20.1


2021-04-07 18:22:04

by Gao Xiang

Subject: [PATCH v3 03/10] erofs: introduce physical cluster slab pools

From: Gao Xiang <[email protected]>

Since multiple pcluster sizes could be used at once, the number of
compressed pages will become a variable factor. It's necessary to
introduce slab pools rather than a single slab cache now.

This limits the pclustersize to 1M (Z_EROFS_PCLUSTER_MAX_SIZE) and
gets rid of the obsolete EROFS_FS_CLUSTER_PAGE_LIMIT, which is no
longer used.
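
To illustrate the size-class idea, here is a rough userspace sketch
(assumed names, not the kernel implementation): each request is served
from the smallest pool whose capacity fits, and the trailing
compressed_pages[] flexible array is sized per class, much as
struct_size() does in the actual patch:

#include <stddef.h>
#include <stdlib.h>

struct pcluster {
	unsigned short pclusterpages;
	void *compressed_pages[];	/* flexible array, sized per class */
};

/* 256 corresponds to Z_EROFS_PCLUSTER_MAX_PAGES with 4KiB pages */
static const unsigned int size_classes[] = { 1, 4, 16, 64, 128, 256 };

static struct pcluster *alloc_pcluster(unsigned int nrpages)
{
	size_t i;

	for (i = 0; i < sizeof(size_classes) / sizeof(*size_classes); ++i) {
		struct pcluster *pcl;

		if (nrpages > size_classes[i])
			continue;
		/* pick the smallest class that can hold nrpages */
		pcl = calloc(1, sizeof(*pcl) +
			     size_classes[i] * sizeof(void *));
		if (pcl)
			pcl->pclusterpages = nrpages;
		return pcl;
	}
	return NULL;	/* larger than the maximum supported pcluster */
}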

Acked-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/Kconfig | 14 ----
fs/erofs/erofs_fs.h | 3 +
fs/erofs/internal.h | 3 -
fs/erofs/zdata.c | 172 +++++++++++++++++++++++++++++---------------
fs/erofs/zdata.h | 14 ++--
5 files changed, 126 insertions(+), 80 deletions(-)

diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index 74b0aaa7114c..858b3339f381 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -76,17 +76,3 @@ config EROFS_FS_ZIP

If you don't want to enable compression feature, say N.

-config EROFS_FS_CLUSTER_PAGE_LIMIT
- int "EROFS Cluster Pages Hard Limit"
- depends on EROFS_FS_ZIP
- range 1 256
- default "1"
- help
- Indicates maximum # of pages of a compressed
- physical cluster.
-
- For example, if files in a image were compressed
- into 8k-unit, hard limit should not be configured
- less than 2. Otherwise, the image will be refused
- to mount on this kernel.
-
diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index 626b7d3e9ab7..76777673eb63 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -201,6 +201,9 @@ static inline unsigned int erofs_xattr_entry_size(struct erofs_xattr_entry *e)
e->e_name_len + le16_to_cpu(e->e_value_size));
}

+/* maximum supported size of a physical compression cluster */
+#define Z_EROFS_PCLUSTER_MAX_SIZE (1024 * 1024)
+
/* available compression algorithm types (for h_algorithmtype) */
enum {
Z_EROFS_COMPRESSION_LZ4 = 0,
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index f707d28a46d9..06c294929069 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -194,9 +194,6 @@ static inline int erofs_wait_on_workgroup_freezed(struct erofs_workgroup *grp)
return v;
}
#endif /* !CONFIG_SMP */
-
-/* hard limit of pages per compressed cluster */
-#define Z_EROFS_CLUSTER_MAX_PAGES (CONFIG_EROFS_FS_CLUSTER_PAGE_LIMIT)
#endif /* !CONFIG_EROFS_FS_ZIP */

/* we strictly follow PAGE_SIZE and no buffer head yet */
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index e3f0100d82d1..db296d324333 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -10,6 +10,93 @@

#include <trace/events/erofs.h>

+/*
+ * since pclustersize is variable for big pcluster feature, introduce slab
+ * pools implementation for different pcluster sizes.
+ */
+struct z_erofs_pcluster_slab {
+ struct kmem_cache *slab;
+ unsigned int maxpages;
+ char name[48];
+};
+
+#define _PCLP(n) { .maxpages = n }
+
+static struct z_erofs_pcluster_slab pcluster_pool[] __read_mostly = {
+ _PCLP(1), _PCLP(4), _PCLP(16), _PCLP(64), _PCLP(128),
+ _PCLP(Z_EROFS_PCLUSTER_MAX_PAGES)
+};
+
+static void z_erofs_destroy_pcluster_pool(void)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(pcluster_pool); ++i) {
+ if (!pcluster_pool[i].slab)
+ continue;
+ kmem_cache_destroy(pcluster_pool[i].slab);
+ pcluster_pool[i].slab = NULL;
+ }
+}
+
+static int z_erofs_create_pcluster_pool(void)
+{
+ struct z_erofs_pcluster_slab *pcs;
+ struct z_erofs_pcluster *a;
+ unsigned int size;
+
+ for (pcs = pcluster_pool;
+ pcs < pcluster_pool + ARRAY_SIZE(pcluster_pool); ++pcs) {
+ size = struct_size(a, compressed_pages, pcs->maxpages);
+
+ sprintf(pcs->name, "erofs_pcluster-%u", pcs->maxpages);
+ pcs->slab = kmem_cache_create(pcs->name, size, 0,
+ SLAB_RECLAIM_ACCOUNT, NULL);
+ if (pcs->slab)
+ continue;
+
+ z_erofs_destroy_pcluster_pool();
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+static struct z_erofs_pcluster *z_erofs_alloc_pcluster(unsigned int nrpages)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(pcluster_pool); ++i) {
+ struct z_erofs_pcluster_slab *pcs = pcluster_pool + i;
+ struct z_erofs_pcluster *pcl;
+
+ if (nrpages > pcs->maxpages)
+ continue;
+
+ pcl = kmem_cache_zalloc(pcs->slab, GFP_NOFS);
+ if (!pcl)
+ return ERR_PTR(-ENOMEM);
+ pcl->pclusterpages = nrpages;
+ return pcl;
+ }
+ return ERR_PTR(-EINVAL);
+}
+
+static void z_erofs_free_pcluster(struct z_erofs_pcluster *pcl)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(pcluster_pool); ++i) {
+ struct z_erofs_pcluster_slab *pcs = pcluster_pool + i;
+
+ if (pcl->pclusterpages > pcs->maxpages)
+ continue;
+
+ kmem_cache_free(pcs->slab, pcl);
+ return;
+ }
+ DBG_BUGON(1);
+}
+
/*
* a compressed_pages[] placeholder in order to avoid
* being filled with file pages for in-place decompression.
@@ -37,12 +124,11 @@ typedef tagptr1_t compressed_page_t;
tagptr_fold(compressed_page_t, page, 1)

static struct workqueue_struct *z_erofs_workqueue __read_mostly;
-static struct kmem_cache *pcluster_cachep __read_mostly;

void z_erofs_exit_zip_subsystem(void)
{
destroy_workqueue(z_erofs_workqueue);
- kmem_cache_destroy(pcluster_cachep);
+ z_erofs_destroy_pcluster_pool();
}

static inline int z_erofs_init_workqueue(void)
@@ -59,32 +145,16 @@ static inline int z_erofs_init_workqueue(void)
return z_erofs_workqueue ? 0 : -ENOMEM;
}

-static void z_erofs_pcluster_init_once(void *ptr)
-{
- struct z_erofs_pcluster *pcl = ptr;
- struct z_erofs_collection *cl = z_erofs_primarycollection(pcl);
- unsigned int i;
-
- mutex_init(&cl->lock);
- cl->nr_pages = 0;
- cl->vcnt = 0;
- for (i = 0; i < Z_EROFS_CLUSTER_MAX_PAGES; ++i)
- pcl->compressed_pages[i] = NULL;
-}
-
int __init z_erofs_init_zip_subsystem(void)
{
- pcluster_cachep = kmem_cache_create("erofs_compress",
- Z_EROFS_WORKGROUP_SIZE, 0,
- SLAB_RECLAIM_ACCOUNT,
- z_erofs_pcluster_init_once);
- if (pcluster_cachep) {
- if (!z_erofs_init_workqueue())
- return 0;
-
- kmem_cache_destroy(pcluster_cachep);
- }
- return -ENOMEM;
+ int err = z_erofs_create_pcluster_pool();
+
+ if (err)
+ return err;
+ err = z_erofs_init_workqueue();
+ if (err)
+ z_erofs_destroy_pcluster_pool();
+ return err;
}

enum z_erofs_collectmode {
@@ -169,7 +239,6 @@ static void preload_compressed_pages(struct z_erofs_collector *clt,
struct list_head *pagepool)
{
const struct z_erofs_pcluster *pcl = clt->pcl;
- const unsigned int clusterpages = BIT(pcl->clusterbits);
struct page **pages = clt->compressedpages;
pgoff_t index = pcl->obj.index + (pages - pcl->compressed_pages);
bool standalone = true;
@@ -179,7 +248,7 @@ static void preload_compressed_pages(struct z_erofs_collector *clt,
if (clt->mode < COLLECT_PRIMARY_FOLLOWED)
return;

- for (; pages < pcl->compressed_pages + clusterpages; ++pages) {
+ for (; pages < pcl->compressed_pages + pcl->pclusterpages; ++pages) {
struct page *page;
compressed_page_t t;
struct page *newpage = NULL;
@@ -239,14 +308,13 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
struct z_erofs_pcluster *const pcl =
container_of(grp, struct z_erofs_pcluster, obj);
struct address_space *const mapping = MNGD_MAPPING(sbi);
- const unsigned int clusterpages = BIT(pcl->clusterbits);
int i;

/*
* refcount of workgroup is now freezed as 1,
* therefore no need to worry about available decompression users.
*/
- for (i = 0; i < clusterpages; ++i) {
+ for (i = 0; i < pcl->pclusterpages; ++i) {
struct page *page = pcl->compressed_pages[i];

if (!page)
@@ -271,13 +339,12 @@ int erofs_try_to_free_cached_page(struct address_space *mapping,
struct page *page)
{
struct z_erofs_pcluster *const pcl = (void *)page_private(page);
- const unsigned int clusterpages = BIT(pcl->clusterbits);
int ret = 0; /* 0 - busy */

if (erofs_workgroup_try_to_freeze(&pcl->obj, 1)) {
unsigned int i;

- for (i = 0; i < clusterpages; ++i) {
+ for (i = 0; i < pcl->pclusterpages; ++i) {
if (pcl->compressed_pages[i] == page) {
WRITE_ONCE(pcl->compressed_pages[i], NULL);
ret = 1;
@@ -297,9 +364,9 @@ static inline bool z_erofs_try_inplace_io(struct z_erofs_collector *clt,
struct page *page)
{
struct z_erofs_pcluster *const pcl = clt->pcl;
- const unsigned int clusterpages = BIT(pcl->clusterbits);

- while (clt->compressedpages < pcl->compressed_pages + clusterpages) {
+ while (clt->compressedpages <
+ pcl->compressed_pages + pcl->pclusterpages) {
if (!cmpxchg(clt->compressedpages++, NULL, page))
return true;
}
@@ -413,10 +480,10 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
struct erofs_workgroup *grp;
int err;

- /* no available workgroup, let's allocate one */
- pcl = kmem_cache_alloc(pcluster_cachep, GFP_NOFS);
- if (!pcl)
- return -ENOMEM;
+ /* no available pcluster, let's allocate one */
+ pcl = z_erofs_alloc_pcluster(map->m_plen >> PAGE_SHIFT);
+ if (IS_ERR(pcl))
+ return PTR_ERR(pcl);

atomic_set(&pcl->obj.refcount, 1);
pcl->obj.index = map->m_pa >> PAGE_SHIFT;
@@ -430,24 +497,18 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
else
pcl->algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;

- pcl->clusterbits = 0;
-
/* new pclusters should be claimed as type 1, primary and followed */
pcl->next = clt->owned_head;
clt->mode = COLLECT_PRIMARY_FOLLOWED;

cl = z_erofs_primarycollection(pcl);
-
- /* must be cleaned before freeing to slab */
- DBG_BUGON(cl->nr_pages);
- DBG_BUGON(cl->vcnt);
-
cl->pageofs = map->m_la & ~PAGE_MASK;

/*
* lock all primary followed works before visible to others
* and mutex_trylock *never* fails for a new pcluster.
*/
+ mutex_init(&cl->lock);
DBG_BUGON(!mutex_trylock(&cl->lock));

grp = erofs_insert_workgroup(inode->i_sb, &pcl->obj);
@@ -471,7 +532,7 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,

err_out:
mutex_unlock(&cl->lock);
- kmem_cache_free(pcluster_cachep, pcl);
+ z_erofs_free_pcluster(pcl);
return err;
}

@@ -517,7 +578,7 @@ static int z_erofs_collector_begin(struct z_erofs_collector *clt,

clt->compressedpages = clt->pcl->compressed_pages;
if (clt->mode <= COLLECT_PRIMARY) /* cannot do in-place I/O */
- clt->compressedpages += Z_EROFS_CLUSTER_MAX_PAGES;
+ clt->compressedpages += clt->pcl->pclusterpages;
return 0;
}

@@ -530,9 +591,8 @@ static void z_erofs_rcu_callback(struct rcu_head *head)
struct z_erofs_collection *const cl =
container_of(head, struct z_erofs_collection, rcu);

- kmem_cache_free(pcluster_cachep,
- container_of(cl, struct z_erofs_pcluster,
- primary_collection));
+ z_erofs_free_pcluster(container_of(cl, struct z_erofs_pcluster,
+ primary_collection));
}

void erofs_workgroup_free_rcu(struct erofs_workgroup *grp)
@@ -784,9 +844,8 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
struct list_head *pagepool)
{
struct erofs_sb_info *const sbi = EROFS_SB(sb);
- const unsigned int clusterpages = BIT(pcl->clusterbits);
struct z_erofs_pagevec_ctor ctor;
- unsigned int i, outputsize, llen, nr_pages;
+ unsigned int i, inputsize, outputsize, llen, nr_pages;
struct page *pages_onstack[Z_EROFS_VMAP_ONSTACK_PAGES];
struct page **pages, **compressed_pages, *page;

@@ -866,7 +925,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
overlapped = false;
compressed_pages = pcl->compressed_pages;

- for (i = 0; i < clusterpages; ++i) {
+ for (i = 0; i < pcl->pclusterpages; ++i) {
unsigned int pagenr;

page = compressed_pages[i];
@@ -919,12 +978,13 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
partial = true;
}

+ inputsize = pcl->pclusterpages * PAGE_SIZE;
err = z_erofs_decompress(&(struct z_erofs_decompress_req) {
.sb = sb,
.in = compressed_pages,
.out = pages,
.pageofs_out = cl->pageofs,
- .inputsize = PAGE_SIZE,
+ .inputsize = inputsize,
.outputsize = outputsize,
.alg = pcl->algorithmformat,
.inplace_io = overlapped,
@@ -933,7 +993,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,

out:
/* must handle all compressed pages before ending pages */
- for (i = 0; i < clusterpages; ++i) {
+ for (i = 0; i < pcl->pclusterpages; ++i) {
page = compressed_pages[i];

if (erofs_page_is_managed(sbi, page))
@@ -1236,7 +1296,7 @@ static void z_erofs_submit_queue(struct super_block *sb,
pcl = container_of(owned_head, struct z_erofs_pcluster, next);

cur = pcl->obj.index;
- end = cur + BIT(pcl->clusterbits);
+ end = cur + pcl->pclusterpages;

/* close the main owned chain at first */
owned_head = cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_TAIL,
diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
index b503b353d4ab..942ee69dff6a 100644
--- a/fs/erofs/zdata.h
+++ b/fs/erofs/zdata.h
@@ -10,6 +10,7 @@
#include "internal.h"
#include "zpvec.h"

+#define Z_EROFS_PCLUSTER_MAX_PAGES (Z_EROFS_PCLUSTER_MAX_SIZE / PAGE_SIZE)
#define Z_EROFS_NR_INLINE_PAGEVECS 3

/*
@@ -59,16 +60,17 @@ struct z_erofs_pcluster {
/* A: point to next chained pcluster or TAILs */
z_erofs_next_pcluster_t next;

- /* A: compressed pages (including multi-usage pages) */
- struct page *compressed_pages[Z_EROFS_CLUSTER_MAX_PAGES];
-
/* A: lower limit of decompressed length and if full length or not */
unsigned int length;

+ /* I: physical cluster size in pages */
+ unsigned short pclusterpages;
+
/* I: compression algorithm format */
unsigned char algorithmformat;
- /* I: bit shift of physical cluster size */
- unsigned char clusterbits;
+
+ /* A: compressed pages (can be cached or inplaced pages) */
+ struct page *compressed_pages[];
};

#define z_erofs_primarycollection(pcluster) (&(pcluster)->primary_collection)
@@ -82,8 +84,6 @@ struct z_erofs_pcluster {

#define Z_EROFS_PCLUSTER_NIL (NULL)

-#define Z_EROFS_WORKGROUP_SIZE sizeof(struct z_erofs_pcluster)
-
struct z_erofs_decompressqueue {
struct super_block *sb;
atomic_t pending_bios;
--
2.20.1

2021-04-07 18:22:26

by Gao Xiang

Subject: [PATCH v3 04/10] erofs: fix up inplace I/O pointer for big pcluster

From: Gao Xiang <[email protected]>

When picking up inplace I/O pages, they should be traversed in reverse
order, in alignment with the traversal order of file-backed online
pages. Also, the index should be updated together when preloading
compressed pages.

Previously, only page-sized pclusters were supported, so there was no
problem at all. Also rename `compressedpages' to `icpage_ptr' to
reflect its functionality.
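
The reverse traversal boils down to claiming free slots from the tail
of compressed_pages[] backwards. A userspace sketch of that claiming
loop (assumed names; the kernel uses cmpxchg() rather than the GCC
builtin shown here):

#include <stdbool.h>
#include <stddef.h>

/*
 * Scan from *icpage_ptr back towards base and atomically claim the
 * first still-empty slot for an inplace I/O page, mirroring the
 * reverse traversal order of file-backed online pages.
 */
static bool try_inplace_io(void ***icpage_ptr, void **base, void *page)
{
	while (*icpage_ptr > base)
		if (__sync_bool_compare_and_swap(--*icpage_ptr, NULL, page))
			return true;
	return false;
}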

Acked-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/zdata.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index db296d324333..78e4b598ecca 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -204,7 +204,8 @@ struct z_erofs_collector {

struct z_erofs_pcluster *pcl, *tailpcl;
struct z_erofs_collection *cl;
- struct page **compressedpages;
+ /* a pointer used to pick up inplace I/O pages */
+ struct page **icpage_ptr;
z_erofs_next_pcluster_t owned_head;

enum z_erofs_collectmode mode;
@@ -238,17 +239,19 @@ static void preload_compressed_pages(struct z_erofs_collector *clt,
enum z_erofs_cache_alloctype type,
struct list_head *pagepool)
{
- const struct z_erofs_pcluster *pcl = clt->pcl;
- struct page **pages = clt->compressedpages;
- pgoff_t index = pcl->obj.index + (pages - pcl->compressed_pages);
+ struct z_erofs_pcluster *pcl = clt->pcl;
bool standalone = true;
gfp_t gfp = (mapping_gfp_mask(mc) & ~__GFP_DIRECT_RECLAIM) |
__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;
+ struct page **pages;
+ pgoff_t index;

if (clt->mode < COLLECT_PRIMARY_FOLLOWED)
return;

- for (; pages < pcl->compressed_pages + pcl->pclusterpages; ++pages) {
+ pages = pcl->compressed_pages;
+ index = pcl->obj.index;
+ for (; index < pcl->obj.index + pcl->pclusterpages; ++index, ++pages) {
struct page *page;
compressed_page_t t;
struct page *newpage = NULL;
@@ -360,16 +363,14 @@ int erofs_try_to_free_cached_page(struct address_space *mapping,
}

/* page_type must be Z_EROFS_PAGE_TYPE_EXCLUSIVE */
-static inline bool z_erofs_try_inplace_io(struct z_erofs_collector *clt,
- struct page *page)
+static bool z_erofs_try_inplace_io(struct z_erofs_collector *clt,
+ struct page *page)
{
struct z_erofs_pcluster *const pcl = clt->pcl;

- while (clt->compressedpages <
- pcl->compressed_pages + pcl->pclusterpages) {
- if (!cmpxchg(clt->compressedpages++, NULL, page))
+ while (clt->icpage_ptr > pcl->compressed_pages)
+ if (!cmpxchg(--clt->icpage_ptr, NULL, page))
return true;
- }
return false;
}

@@ -576,9 +577,8 @@ static int z_erofs_collector_begin(struct z_erofs_collector *clt,
z_erofs_pagevec_ctor_init(&clt->vector, Z_EROFS_NR_INLINE_PAGEVECS,
clt->cl->pagevec, clt->cl->vcnt);

- clt->compressedpages = clt->pcl->compressed_pages;
- if (clt->mode <= COLLECT_PRIMARY) /* cannot do in-place I/O */
- clt->compressedpages += clt->pcl->pclusterpages;
+ /* since file-backed online pages are traversed in reverse order */
+ clt->icpage_ptr = clt->pcl->compressed_pages + clt->pcl->pclusterpages;
return 0;
}

--
2.20.1

2021-04-07 18:22:27

by Gao Xiang

Subject: [PATCH v3 05/10] erofs: add big physical cluster definition

From: Gao Xiang <[email protected]>

Big pcluster indicates that the size of compressed data for each
physical pcluster is no longer fixed to the block size, but can be
more than 1 block (more accurately, 1 lcluster).

When the big pcluster feature is enabled for head0/1, delta0 of the
1st non-head lcluster index will keep the block count of this pcluster
in lcluster size instead of 1. Otherwise, the compressed size of the
pcluster is 1 lcluster if the pcluster has no non-head lcluster index.

Also note that the BIG_PCLUSTER feature reuses the COMPR_CFGS feature
bit since it depends on COMPR_CFGS and they will be released together.

Acked-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/erofs_fs.h | 19 +++++++++++++++----
fs/erofs/internal.h | 1 +
2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index 76777673eb63..ecc3a0ea0bc4 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -19,6 +19,7 @@
*/
#define EROFS_FEATURE_INCOMPAT_LZ4_0PADDING 0x00000001
#define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x00000002
+#define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002
#define EROFS_ALL_FEATURE_INCOMPAT EROFS_FEATURE_INCOMPAT_LZ4_0PADDING

#define EROFS_SB_EXTSLOT_SIZE 16
@@ -214,17 +215,20 @@ enum {
/* 14 bytes (+ length field = 16 bytes) */
struct z_erofs_lz4_cfgs {
__le16 max_distance;
- u8 reserved[12];
+ __le16 max_pclusterblks;
+ u8 reserved[10];
} __packed;

/*
* bit 0 : COMPACTED_2B indexes (0 - off; 1 - on)
* e.g. for 4k logical cluster size, 4B if compacted 2B is off;
* (4B) + 2B + (4B) if compacted 2B is on.
+ * bit 1 : HEAD1 big pcluster (0 - off; 1 - on)
+ * bit 2 : HEAD2 big pcluster (0 - off; 1 - on)
*/
-#define Z_EROFS_ADVISE_COMPACTED_2B_BIT 0
-
-#define Z_EROFS_ADVISE_COMPACTED_2B (1 << Z_EROFS_ADVISE_COMPACTED_2B_BIT)
+#define Z_EROFS_ADVISE_COMPACTED_2B 0x0001
+#define Z_EROFS_ADVISE_BIG_PCLUSTER_1 0x0002
+#define Z_EROFS_ADVISE_BIG_PCLUSTER_2 0x0004

struct z_erofs_map_header {
__le32 h_reserved1;
@@ -279,6 +283,13 @@ enum {
#define Z_EROFS_VLE_DI_CLUSTER_TYPE_BITS 2
#define Z_EROFS_VLE_DI_CLUSTER_TYPE_BIT 0

+/*
+ * D0_CBLKCNT will be marked _only_ at the 1st non-head lcluster to store the
+ * compressed block count of a compressed extent (in logical clusters, aka.
+ * block count of a pcluster).
+ */
+#define Z_EROFS_VLE_DI_D0_CBLKCNT (1 << 11)
+
struct z_erofs_vle_decompressed_index {
__le16 di_advise;
/* where to decompress in the head cluster */
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 06c294929069..c4b3938a7e56 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -230,6 +230,7 @@ static inline bool erofs_sb_has_##name(struct erofs_sb_info *sbi) \

EROFS_FEATURE_FUNCS(lz4_0padding, incompat, INCOMPAT_LZ4_0PADDING)
EROFS_FEATURE_FUNCS(compr_cfgs, incompat, INCOMPAT_COMPR_CFGS)
+EROFS_FEATURE_FUNCS(big_pcluster, incompat, INCOMPAT_BIG_PCLUSTER)
EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)

/* atomic flag definitions */
--
2.20.1

2021-04-07 18:22:29

by Gao Xiang

Subject: [PATCH v3 06/10] erofs: adjust per-CPU buffers according to max_pclusterblks

From: Gao Xiang <[email protected]>

Adjust per-CPU buffers on demand since the big pcluster definition is
now available. Also, bail out on unsupported pcluster sizes according
to Z_EROFS_PCLUSTER_MAX_SIZE.
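
With the usual 4KiB block size, this bounds on-disk max_pclusterblks
to 1MiB / 4KiB = 256 blocks. A compile-time sketch of the same bound
(constants assumed here rather than taken from kernel headers):

#include <assert.h>

#define EROFS_BLKSIZ			4096	/* assumed 4KiB blocks */
#define Z_EROFS_PCLUSTER_MAX_SIZE	(1024 * 1024)

/* the largest max_pclusterblks value the kernel will accept */
static_assert(Z_EROFS_PCLUSTER_MAX_SIZE / EROFS_BLKSIZ == 256,
	      "1MiB pcluster limit equals 256 blocks of 4KiB each");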

Acked-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/decompressor.c | 20 ++++++++++++++++----
fs/erofs/internal.h | 2 ++
2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index fb4838c0f0df..900de4725d35 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -32,6 +32,7 @@ int z_erofs_load_lz4_config(struct super_block *sb,
struct erofs_super_block *dsb,
struct z_erofs_lz4_cfgs *lz4, int size)
{
+ struct erofs_sb_info *sbi = EROFS_SB(sb);
u16 distance;

if (lz4) {
@@ -40,16 +41,27 @@ int z_erofs_load_lz4_config(struct super_block *sb,
return -EINVAL;
}
distance = le16_to_cpu(lz4->max_distance);
+
+ sbi->lz4.max_pclusterblks = le16_to_cpu(lz4->max_pclusterblks);
+ if (!sbi->lz4.max_pclusterblks) {
+ sbi->lz4.max_pclusterblks = 1; /* reserved case */
+ } else if (sbi->lz4.max_pclusterblks >
+ Z_EROFS_PCLUSTER_MAX_SIZE / EROFS_BLKSIZ) {
+ erofs_err(sb, "too large lz4 pclusterblks %u",
+ sbi->lz4.max_pclusterblks);
+ return -EINVAL;
+ } else if (sbi->lz4.max_pclusterblks >= 2) {
+ erofs_info(sb, "EXPERIMENTAL big pcluster feature in use. Use at your own risk!");
+ }
} else {
distance = le16_to_cpu(dsb->u1.lz4_max_distance);
+ sbi->lz4.max_pclusterblks = 1;
}

- EROFS_SB(sb)->lz4.max_distance_pages = distance ?
+ sbi->lz4.max_distance_pages = distance ?
DIV_ROUND_UP(distance, PAGE_SIZE) + 1 :
LZ4_MAX_DISTANCE_PAGES;
-
- /* TODO: use max pclusterblks after bigpcluster is enabled */
- return erofs_pcpubuf_growsize(1);
+ return erofs_pcpubuf_growsize(sbi->lz4.max_pclusterblks);
}

static int z_erofs_lz4_prepare_destpages(struct z_erofs_decompress_req *rq,
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index c4b3938a7e56..f1305af50f67 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -63,6 +63,8 @@ struct erofs_fs_context {
struct erofs_sb_lz4_info {
/* # of pages needed for EROFS lz4 rolling decompression */
u16 max_distance_pages;
+ /* maximum possible blocks for pclusters in the filesystem */
+ u16 max_pclusterblks;
};

struct erofs_sb_info {
--
2.20.1

2021-04-07 18:22:34

by Gao Xiang

Subject: [PATCH v3 07/10] erofs: support parsing big pcluster compress indexes

From: Gao Xiang <[email protected]>

When the INCOMPAT_BIG_PCLUSTER sb feature is enabled, legacy compress
indexes will also have the same on-disk header as compact indexes to
keep per-file configurations instead of leaving it zeroed.

If ADVISE_BIG_PCLUSTER is set for a file, CBLKCNT will be loaded for
each pcluster in this file by parsing the 1st non-head lcluster.
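
As a worked example (numbers assumed): with 4KiB lclusters
(lclusterbits = 12), a pcluster compressed into 4 blocks stores
CBLKCNT = 4, so the mapped extent length becomes

	m_plen = compressedlcs << lclusterbits = 4 << 12 = 16384 (16KiB),

while a pcluster without big pcluster enabled simply falls back to
m_plen = 1 << lclusterbits.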

Acked-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/zmap.c | 79 +++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 73 insertions(+), 6 deletions(-)

diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 7fd6bd843471..6c0c47f68b75 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -11,8 +11,10 @@
int z_erofs_fill_inode(struct inode *inode)
{
struct erofs_inode *const vi = EROFS_I(inode);
+ struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb);

- if (vi->datalayout == EROFS_INODE_FLAT_COMPRESSION_LEGACY) {
+ if (!erofs_sb_has_big_pcluster(sbi) &&
+ vi->datalayout == EROFS_INODE_FLAT_COMPRESSION_LEGACY) {
vi->z_advise = 0;
vi->z_algorithmtype[0] = 0;
vi->z_algorithmtype[1] = 0;
@@ -49,7 +51,8 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
if (test_bit(EROFS_I_Z_INITED_BIT, &vi->flags))
goto out_unlock;

- DBG_BUGON(vi->datalayout == EROFS_INODE_FLAT_COMPRESSION_LEGACY);
+ DBG_BUGON(!erofs_sb_has_big_pcluster(EROFS_SB(sb)) &&
+ vi->datalayout == EROFS_INODE_FLAT_COMPRESSION_LEGACY);

pos = ALIGN(iloc(EROFS_SB(sb), vi->nid) + vi->inode_isize +
vi->xattr_isize, 8);
@@ -96,7 +99,7 @@ struct z_erofs_maprecorder {
u8 type;
u16 clusterofs;
u16 delta[2];
- erofs_blk_t pblk;
+ erofs_blk_t pblk, compressedlcs;
};

static int z_erofs_reload_indexes(struct z_erofs_maprecorder *m,
@@ -159,6 +162,15 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
m->clusterofs = 1 << vi->z_logical_clusterbits;
m->delta[0] = le16_to_cpu(di->di_u.delta[0]);
+ if (m->delta[0] & Z_EROFS_VLE_DI_D0_CBLKCNT) {
+ if (!(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+ DBG_BUGON(1);
+ return -EFSCORRUPTED;
+ }
+ m->compressedlcs = m->delta[0] &
+ ~Z_EROFS_VLE_DI_D0_CBLKCNT;
+ m->delta[0] = 1;
+ }
m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
break;
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
@@ -366,6 +378,58 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
return 0;
}

+static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
+ unsigned int initial_lcn)
+{
+ struct erofs_inode *const vi = EROFS_I(m->inode);
+ struct erofs_map_blocks *const map = m->map;
+ const unsigned int lclusterbits = vi->z_logical_clusterbits;
+ unsigned long lcn;
+ int err;
+
+ DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
+ m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
+ if (!(map->m_flags & EROFS_MAP_ZIPPED) ||
+ !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+ map->m_plen = 1 << lclusterbits;
+ return 0;
+ }
+
+ lcn = m->lcn + 1;
+ if (m->compressedlcs)
+ goto out;
+ if (lcn == initial_lcn)
+ goto err_bonus_cblkcnt;
+
+ err = z_erofs_load_cluster_from_disk(m, lcn);
+ if (err)
+ return err;
+
+ switch (m->type) {
+ case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
+ if (m->delta[0] != 1)
+ goto err_bonus_cblkcnt;
+ if (m->compressedlcs)
+ break;
+ fallthrough;
+ default:
+ erofs_err(m->inode->i_sb,
+ "cannot found CBLKCNT @ lcn %lu of nid %llu",
+ lcn, vi->nid);
+ DBG_BUGON(1);
+ return -EFSCORRUPTED;
+ }
+out:
+ map->m_plen = m->compressedlcs << lclusterbits;
+ return 0;
+err_bonus_cblkcnt:
+ erofs_err(m->inode->i_sb,
+ "bogus CBLKCNT @ lcn %lu of nid %llu",
+ lcn, vi->nid);
+ DBG_BUGON(1);
+ return -EFSCORRUPTED;
+}
+
int z_erofs_map_blocks_iter(struct inode *inode,
struct erofs_map_blocks *map,
int flags)
@@ -377,6 +441,7 @@ int z_erofs_map_blocks_iter(struct inode *inode,
};
int err = 0;
unsigned int lclusterbits, endoff;
+ unsigned long initial_lcn;
unsigned long long ofs, end;

trace_z_erofs_map_blocks_iter_enter(inode, map, flags);
@@ -395,10 +460,10 @@ int z_erofs_map_blocks_iter(struct inode *inode,

lclusterbits = vi->z_logical_clusterbits;
ofs = map->m_la;
- m.lcn = ofs >> lclusterbits;
+ initial_lcn = ofs >> lclusterbits;
endoff = ofs & ((1 << lclusterbits) - 1);

- err = z_erofs_load_cluster_from_disk(&m, m.lcn);
+ err = z_erofs_load_cluster_from_disk(&m, initial_lcn);
if (err)
goto unmap_out;

@@ -442,10 +507,12 @@ int z_erofs_map_blocks_iter(struct inode *inode,
}

map->m_llen = end - map->m_la;
- map->m_plen = 1 << lclusterbits;
map->m_pa = blknr_to_addr(m.pblk);
map->m_flags |= EROFS_MAP_MAPPED;

+ err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
+ if (err)
+ goto out;
unmap_out:
if (m.kaddr)
kunmap_atomic(m.kaddr);
--
2.20.1

2021-04-07 18:22:48

by Gao Xiang

Subject: [PATCH v3 09/10] erofs: support decompress big pcluster for lz4 backend

From: Gao Xiang <[email protected]>

Prior to big pcluster, there was only one compressed page, so it was
easy to map. However, when big pcluster is enabled, more work needs
to be done to handle multiple compressed pages. In detail,

- (maptype 0) if there is only one compressed page + no need
to copy inplace I/O, just map it directly as before;

- (maptype 1) if there are more compressed pages + no need to
copy inplace I/O, vmap such compressed pages instead;

- (maptype 2) if inplace I/O needs to be copied, use per-CPU
buffers for decompression then.

Another thing is how to detect whether inplace decompression is
feasible or not (it's still quite easy for non-big pclusters): apart
from the inplace margin calculation, the inplace I/O page reusing
order also needs to be considered for each compressed page. Currently,
if the compressed page is the xth page, it shouldn't be reused as any
of [0 ... nrpages_out - nrpages_in + x]; otherwise, a full copy will
be triggered, as sketched below.
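
The reuse rule can be condensed into a small check (a sketch with
assumed names; the real loop lives in the new
z_erofs_handle_inplace_io()):

#include <stdbool.h>

/*
 * Input page i must not also serve as output page j for any
 * j < nrpages_out - nrpages_in + i, otherwise decompression would
 * overwrite compressed input that is still needed, and a full copy
 * (maptype 2) is required. Assumes nrpages_out >= nrpages_in, which
 * holds since decompressed output is never smaller than its input.
 */
static bool inplace_is_safe(void **in, void **out,
			    unsigned int nrpages_in,
			    unsigned int nrpages_out)
{
	unsigned int i, j;

	for (i = 0; i < nrpages_in; ++i)
		for (j = 0; j < nrpages_out - nrpages_in + i; ++j)
			if (out[j] == in[i])
				return false;	/* overlap: fall back to copy */
	return true;
}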

Although there are some extra optimization ideas for this, I'd like
to make big pcluster work correctly first; obviously, it can be
further optimized later since it has nothing to do with the on-disk
format at all.

Acked-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/decompressor.c | 217 +++++++++++++++++++++++-----------------
fs/erofs/internal.h | 15 +++
2 files changed, 138 insertions(+), 94 deletions(-)

diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 900de4725d35..ff15461b389d 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -120,44 +120,84 @@ static int z_erofs_lz4_prepare_destpages(struct z_erofs_decompress_req *rq,
return kaddr ? 1 : 0;
}

-static void *generic_copy_inplace_data(struct z_erofs_decompress_req *rq,
- u8 *src, unsigned int pageofs_in)
+static void *z_erofs_handle_inplace_io(struct z_erofs_decompress_req *rq,
+ void *inpage, unsigned int *inputmargin, int *maptype,
+ bool support_0padding)
{
- /*
- * if in-place decompression is ongoing, those decompressed
- * pages should be copied in order to avoid being overlapped.
- */
- struct page **in = rq->in;
- u8 *const tmp = erofs_get_pcpubuf(1);
- u8 *tmpp = tmp;
- unsigned int inlen = rq->inputsize - pageofs_in;
- unsigned int count = min_t(uint, inlen, PAGE_SIZE - pageofs_in);
-
- while (tmpp < tmp + inlen) {
- if (!src)
- src = kmap_atomic(*in);
- memcpy(tmpp, src + pageofs_in, count);
- kunmap_atomic(src);
- src = NULL;
- tmpp += count;
- pageofs_in = 0;
- count = PAGE_SIZE;
+ unsigned int nrpages_in, nrpages_out;
+ unsigned int ofull, oend, inputsize, total, i, j;
+ struct page **in;
+ void *src, *tmp;
+
+ inputsize = rq->inputsize;
+ nrpages_in = PAGE_ALIGN(inputsize) >> PAGE_SHIFT;
+ oend = rq->pageofs_out + rq->outputsize;
+ ofull = PAGE_ALIGN(oend);
+ nrpages_out = ofull >> PAGE_SHIFT;
+
+ if (rq->inplace_io) {
+ if (rq->partial_decoding || !support_0padding ||
+ ofull - oend < LZ4_DECOMPRESS_INPLACE_MARGIN(inputsize))
+ goto docopy;
+
+ for (i = 0; i < nrpages_in; ++i) {
+ DBG_BUGON(rq->in[i] == NULL);
+ for (j = 0; j < nrpages_out - nrpages_in + i; ++j)
+ if (rq->out[j] == rq->in[i])
+ goto docopy;
+ }
+ }
+
+ if (nrpages_in <= 1) {
+ *maptype = 0;
+ return inpage;
+ }
+ kunmap_atomic(inpage);
+ might_sleep();
+ src = erofs_vm_map_ram(rq->in, nrpages_in);
+ if (!src)
+ return ERR_PTR(-ENOMEM);
+ *maptype = 1;
+ return src;
+
+docopy:
+ /* Or copy compressed data which can be overlapped to per-CPU buffer */
+ in = rq->in;
+ src = erofs_get_pcpubuf(nrpages_in);
+ if (!src) {
+ DBG_BUGON(1);
+ return ERR_PTR(-EFAULT);
+ }
+
+ tmp = src;
+ total = rq->inputsize;
+ while (total) {
+ unsigned int page_copycnt =
+ min_t(unsigned int, total, PAGE_SIZE - *inputmargin);
+
+ if (!inpage)
+ inpage = kmap_atomic(*in);
+ memcpy(tmp, inpage + *inputmargin, page_copycnt);
+ kunmap_atomic(inpage);
+ inpage = NULL;
+ tmp += page_copycnt;
+ total -= page_copycnt;
++in;
+ *inputmargin = 0;
}
- return tmp;
+ *maptype = 2;
+ return src;
}

static int z_erofs_lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)
{
- unsigned int inputmargin, inlen;
- u8 *src;
- bool copied, support_0padding;
- int ret;
-
- if (rq->inputsize > PAGE_SIZE)
- return -EOPNOTSUPP;
+ unsigned int inputmargin;
+ u8 *headpage, *src;
+ bool support_0padding;
+ int ret, maptype;

- src = kmap_atomic(*rq->in);
+ DBG_BUGON(*rq->in == NULL);
+ headpage = kmap_atomic(*rq->in);
inputmargin = 0;
support_0padding = false;

@@ -165,50 +205,39 @@ static int z_erofs_lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)
if (erofs_sb_has_lz4_0padding(EROFS_SB(rq->sb))) {
support_0padding = true;

- while (!src[inputmargin & ~PAGE_MASK])
+ while (!headpage[inputmargin & ~PAGE_MASK])
if (!(++inputmargin & ~PAGE_MASK))
break;

if (inputmargin >= rq->inputsize) {
- kunmap_atomic(src);
+ kunmap_atomic(headpage);
return -EIO;
}
}

- copied = false;
- inlen = rq->inputsize - inputmargin;
- if (rq->inplace_io) {
- const uint oend = (rq->pageofs_out +
- rq->outputsize) & ~PAGE_MASK;
- const uint nr = PAGE_ALIGN(rq->pageofs_out +
- rq->outputsize) >> PAGE_SHIFT;
-
- if (rq->partial_decoding || !support_0padding ||
- rq->out[nr - 1] != rq->in[0] ||
- rq->inputsize - oend <
- LZ4_DECOMPRESS_INPLACE_MARGIN(inlen)) {
- src = generic_copy_inplace_data(rq, src, inputmargin);
- inputmargin = 0;
- copied = true;
- }
+ rq->inputsize -= inputmargin;
+ src = z_erofs_handle_inplace_io(rq, headpage, &inputmargin, &maptype,
+ support_0padding);
+ if (IS_ERR(src)) {
+ kunmap_atomic(headpage);
+ return PTR_ERR(src);
}

/* legacy format could compress extra data in a pcluster. */
if (rq->partial_decoding || !support_0padding)
ret = LZ4_decompress_safe_partial(src + inputmargin, out,
- inlen, rq->outputsize,
- rq->outputsize);
+ rq->inputsize, rq->outputsize, rq->outputsize);
else
ret = LZ4_decompress_safe(src + inputmargin, out,
- inlen, rq->outputsize);
+ rq->inputsize, rq->outputsize);

if (ret != rq->outputsize) {
erofs_err(rq->sb, "failed to decompress %d in[%u, %u] out[%u]",
- ret, inlen, inputmargin, rq->outputsize);
+ ret, rq->inputsize, inputmargin, rq->outputsize);

WARN_ON(1);
print_hex_dump(KERN_DEBUG, "[ in]: ", DUMP_PREFIX_OFFSET,
- 16, 1, src + inputmargin, inlen, true);
+ 16, 1, src + inputmargin, rq->inputsize, true);
print_hex_dump(KERN_DEBUG, "[out]: ", DUMP_PREFIX_OFFSET,
16, 1, out, rq->outputsize, true);

@@ -217,10 +246,16 @@ static int z_erofs_lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)
ret = -EIO;
}

- if (copied)
- erofs_put_pcpubuf(src);
- else
+ if (maptype == 0) {
kunmap_atomic(src);
+ } else if (maptype == 1) {
+ vm_unmap_ram(src, PAGE_ALIGN(rq->inputsize) >> PAGE_SHIFT);
+ } else if (maptype == 2) {
+ erofs_put_pcpubuf(src);
+ } else {
+ DBG_BUGON(1);
+ return -EFAULT;
+ }
return ret;
}

@@ -270,57 +305,51 @@ static int z_erofs_decompress_generic(struct z_erofs_decompress_req *rq,
const struct z_erofs_decompressor *alg = decompressors + rq->alg;
unsigned int dst_maptype;
void *dst;
- int ret, i;
+ int ret;

- if (nrpages_out == 1 && !rq->inplace_io) {
- DBG_BUGON(!*rq->out);
- dst = kmap_atomic(*rq->out);
- dst_maptype = 0;
- goto dstmap_out;
- }
+ /* two optimized fast paths only for non bigpcluster cases yet */
+ if (rq->inputsize <= PAGE_SIZE) {
+ if (nrpages_out == 1 && !rq->inplace_io) {
+ DBG_BUGON(!*rq->out);
+ dst = kmap_atomic(*rq->out);
+ dst_maptype = 0;
+ goto dstmap_out;
+ }

- /*
- * For the case of small output size (especially much less
- * than PAGE_SIZE), memcpy the decompressed data rather than
- * compressed data is preferred.
- */
- if (rq->outputsize <= PAGE_SIZE * 7 / 8) {
- dst = erofs_get_pcpubuf(1);
- if (IS_ERR(dst))
- return PTR_ERR(dst);
-
- rq->inplace_io = false;
- ret = alg->decompress(rq, dst);
- if (!ret)
- copy_from_pcpubuf(rq->out, dst, rq->pageofs_out,
- rq->outputsize);
-
- erofs_put_pcpubuf(dst);
- return ret;
+ /*
+ * For the case of small output size (especially much less
+ * than PAGE_SIZE), memcpy the decompressed data rather than
+ * compressed data is preferred.
+ */
+ if (rq->outputsize <= PAGE_SIZE * 7 / 8) {
+ dst = erofs_get_pcpubuf(1);
+ if (IS_ERR(dst))
+ return PTR_ERR(dst);
+
+ rq->inplace_io = false;
+ ret = alg->decompress(rq, dst);
+ if (!ret)
+ copy_from_pcpubuf(rq->out, dst, rq->pageofs_out,
+ rq->outputsize);
+
+ erofs_put_pcpubuf(dst);
+ return ret;
+ }
}

+ /* general decoding path which can be used for all cases */
ret = alg->prepare_destpages(rq, pagepool);
- if (ret < 0) {
+ if (ret < 0)
return ret;
- } else if (ret) {
+ if (ret) {
dst = page_address(*rq->out);
dst_maptype = 1;
goto dstmap_out;
}

- i = 0;
- while (1) {
- dst = vm_map_ram(rq->out, nrpages_out, -1);
-
- /* retry two more times (totally 3 times) */
- if (dst || ++i >= 3)
- break;
- vm_unmap_aliases();
- }
-
+ dst = erofs_vm_map_ram(rq->out, nrpages_out);
if (!dst)
return -ENOMEM;
-
dst_maptype = 2;

dstmap_out:
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index f1305af50f67..cb48f9dc50d5 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -402,6 +402,21 @@ int erofs_namei(struct inode *dir, struct qstr *name,
/* dir.c */
extern const struct file_operations erofs_dir_fops;

+static inline void *erofs_vm_map_ram(struct page **pages, unsigned int count)
+{
+ int retried = 0;
+
+ while (1) {
+ void *p = vm_map_ram(pages, count, -1);
+
+ /* retry two more times (totally 3 times) */
+ if (p || ++retried >= 3)
+ return p;
+ vm_unmap_aliases();
+ }
+ return NULL;
+}
+
/* pcpubuf.c */
void *erofs_get_pcpubuf(unsigned int requiredpages);
void erofs_put_pcpubuf(void *ptr);
--
2.20.1

2021-04-07 18:23:21

by Gao Xiang

Subject: [PATCH v3 10/10] erofs: enable big pcluster feature

From: Gao Xiang <[email protected]>

Enable COMPR_CFGS and BIG_PCLUSTER since the implementations are
all settled properly.

Acked-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/erofs_fs.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index ecc3a0ea0bc4..8739d3adf51f 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -20,7 +20,10 @@
#define EROFS_FEATURE_INCOMPAT_LZ4_0PADDING 0x00000001
#define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x00000002
#define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002
-#define EROFS_ALL_FEATURE_INCOMPAT EROFS_FEATURE_INCOMPAT_LZ4_0PADDING
+#define EROFS_ALL_FEATURE_INCOMPAT \
+ (EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
+ EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
+ EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER)

#define EROFS_SB_EXTSLOT_SIZE 16

--
2.20.1