2021-10-07 17:08:58

by Gao Xiang

Subject: [PATCH 0/3] erofs: some decompression improvements

Hi folks,

This patchset is mainly intended as preparation for the upcoming LZMA
support, but it also benefits the existing LZ4 decompression.

The first patch looks up compression algorithms at mapping time
instead of in the decompression frontend; the rest of the patches
build on it.

The second patch introduces another compression HEAD (HEAD2) so that
each file can be compressed with at most two different algorithms,
which can be used for the upcoming LZMA compression and LZ4 range
dictionary compression for various data patterns.
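
A minimal sketch of the idea (the headtype-indexed pick below is my
illustration of the intent; the nibble parsing already lives in
z_erofs_fill_inode_lazy(), see the second patch):

        /* two per-file algorithms, parsed from the map header */
        vi->z_algorithmtype[0] = h->h_algorithmtype & 15;       /* for HEAD1 */
        vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;       /* for HEAD2 */

        /* sketch: pick the per-extent algorithm from its head type */
        if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
                map->m_algorithmformat = vi->z_algorithmtype[1];
        else
                map->m_algorithmformat = vi->z_algorithmtype[0];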

The third patch introduces a new readmore decompression strategy to
avoid partial decompression for large pclusters. It resolves the
randread issue mentioned in the original big pcluster patchset [1]:

randread
Kernel: 5.15.0-rc2+

pclustersize      Vanilla        Patched
        4096    54.6 MiB/s     54.0 MiB/s
       16384   117.4 MiB/s    143.8 MiB/s
       32768   113.6 MiB/s    199.6 MiB/s
       65536    72.8 MiB/s    236.4 MiB/s

The latest version can also be fetched from
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/linux.git -b erofs/readmore

[1] https://lore.kernel.org/r/[email protected]

Thanks,
Gao Xiang

Gao Xiang (3):
erofs: get compression algorithms directly on mapping
erofs: introduce the secondary compression head
erofs: introduce readmore decompression strategy

 fs/erofs/compress.h          |   5 --
 fs/erofs/erofs_fs.h          |   8 ++-
 fs/erofs/internal.h          |  28 ++++++++-
 fs/erofs/zdata.c             | 106 ++++++++++++++++++++++++++---------
 fs/erofs/zmap.c              |  57 ++++++++++++-------
 include/trace/events/erofs.h |   2 +-
 6 files changed, 150 insertions(+), 56 deletions(-)

--
2.20.1


2021-10-07 19:43:12

by Gao Xiang

Subject: [PATCH 1/3] erofs: get compression algorithms directly on mapping

From: Gao Xiang <[email protected]>

Currently, z_erofs_map_blocks returns whether extents are compressed
or not, and the decompression frontend then gets the specific
algorithms.

It works but not quite well in many aspects, for example:
- The decompression frontend has to deal with whether extents are
compressed or not again and look up the algorithms if compressed.
That's duplicated work and exposes too much on-disk mapping detail.

- A new secondary compression head will be introduced later so that
each file can have at most two compression algorithms for different
types of data. It would increase the complexity of the decompression
frontend if still handled in this way;

- A new readmore decompression strategy will be introduced to get
better performance for much bigger pclusters and LZMA, which also
needs the specific algorithm in advance.

Let's look up compression algorithms directly in z_erofs_map_blocks()
instead.
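
To make this concrete, the zdata.c change below boils down to
replacing the flag-based lookup in the frontend with the format
already resolved by the mapping layer:

        /* before: the frontend re-derives the algorithm from a flag */
        if (map->m_flags & EROFS_MAP_ZIPPED)
                pcl->algorithmformat = Z_EROFS_COMPRESSION_LZ4;
        else
                pcl->algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;

        /* after: the mapping layer has already resolved it */
        pcl->algorithmformat = map->m_algorithmformat;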

Signed-off-by: Gao Xiang <[email protected]>
---
 fs/erofs/compress.h          |  5 -----
 fs/erofs/internal.h          | 12 +++++++++---
 fs/erofs/zdata.c             | 12 ++++++------
 fs/erofs/zmap.c              | 19 ++++++++++---------
 include/trace/events/erofs.h |  2 +-
 5 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
index 3701c72bacb2..ad62d1b4d371 100644
--- a/fs/erofs/compress.h
+++ b/fs/erofs/compress.h
@@ -8,11 +8,6 @@

#include "internal.h"

-enum {
- Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
- Z_EROFS_COMPRESSION_RUNTIME_MAX
-};
-
struct z_erofs_decompress_req {
struct super_block *sb;
struct page **in, **out;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 9524e155b38f..48bfc6eb2b02 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -338,7 +338,7 @@ extern const struct address_space_operations z_erofs_aops;
* of the corresponding uncompressed data in the file.
*/
enum {
- BH_Zipped = BH_PrivateStart,
+ BH_Encoded = BH_PrivateStart,
BH_FullMapped,
};

@@ -346,8 +346,8 @@ enum {
#define EROFS_MAP_MAPPED (1 << BH_Mapped)
/* Located in metadata (could be copied from bd_inode) */
#define EROFS_MAP_META (1 << BH_Meta)
-/* The extent has been compressed */
-#define EROFS_MAP_ZIPPED (1 << BH_Zipped)
+/* The extent is encoded */
+#define EROFS_MAP_ENCODED (1 << BH_Encoded)
/* The length of extent is full */
#define EROFS_MAP_FULL_MAPPED (1 << BH_FullMapped)

@@ -355,6 +355,7 @@ struct erofs_map_blocks {
erofs_off_t m_pa, m_la;
u64 m_plen, m_llen;

+ char m_algorithmformat;
unsigned int m_flags;

struct page *mpage;
@@ -368,6 +369,11 @@ struct erofs_map_blocks {
*/
#define EROFS_GET_BLOCKS_FIEMAP 0x0002

+enum {
+ Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
+ Z_EROFS_COMPRESSION_RUNTIME_MAX
+};
+
/* zmap.c */
extern const struct iomap_ops z_erofs_iomap_report_ops;

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 11c7a1aaebad..5c34ef66677f 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -476,6 +476,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
struct erofs_workgroup *grp;
int err;

+ if (!(map->m_flags & EROFS_MAP_ENCODED)) {
+ DBG_BUGON(1);
+ return -EFSCORRUPTED;
+ }
+
/* no available pcluster, let's allocate one */
pcl = z_erofs_alloc_pcluster(map->m_plen >> PAGE_SHIFT);
if (IS_ERR(pcl))
@@ -483,16 +488,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,

atomic_set(&pcl->obj.refcount, 1);
pcl->obj.index = map->m_pa >> PAGE_SHIFT;
-
+ pcl->algorithmformat = map->m_algorithmformat;
pcl->length = (map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT) |
(map->m_flags & EROFS_MAP_FULL_MAPPED ?
Z_EROFS_PCLUSTER_FULL_LENGTH : 0);

- if (map->m_flags & EROFS_MAP_ZIPPED)
- pcl->algorithmformat = Z_EROFS_COMPRESSION_LZ4;
- else
- pcl->algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
-
/* new pclusters should be claimed as type 1, primary and followed */
pcl->next = clt->owned_head;
clt->mode = COLLECT_PRIMARY_FOLLOWED;
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 7a6df35fdc91..9d9c26343dab 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -111,7 +111,7 @@ struct z_erofs_maprecorder {

unsigned long lcn;
/* compression extent information gathered */
- u8 type;
+ u8 type, headtype;
u16 clusterofs;
u16 delta[2];
erofs_blk_t pblk, compressedlcs;
@@ -446,9 +446,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
}
return z_erofs_extent_lookback(m, m->delta[0]);
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- map->m_flags &= ~EROFS_MAP_ZIPPED;
- fallthrough;
case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ m->headtype = m->type;
map->m_la = (lcn << lclusterbits) | m->clusterofs;
break;
default:
@@ -472,7 +471,7 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,

DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
- if (!(map->m_flags & EROFS_MAP_ZIPPED) ||
+ if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
!(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
map->m_plen = 1 << lclusterbits;
return 0;
@@ -609,16 +608,13 @@ int z_erofs_map_blocks_iter(struct inode *inode,
if (err)
goto unmap_out;

- map->m_flags = EROFS_MAP_ZIPPED; /* by default, compressed */
end = (m.lcn + 1ULL) << lclusterbits;

switch (m.type) {
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- if (endoff >= m.clusterofs)
- map->m_flags &= ~EROFS_MAP_ZIPPED;
- fallthrough;
case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
if (endoff >= m.clusterofs) {
+ m.headtype = m.type;
map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
break;
}
@@ -650,12 +646,17 @@ int z_erofs_map_blocks_iter(struct inode *inode,

map->m_llen = end - map->m_la;
map->m_pa = blknr_to_addr(m.pblk);
- map->m_flags |= EROFS_MAP_MAPPED;
+ map->m_flags = EROFS_MAP_MAPPED | EROFS_MAP_ENCODED;

err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
if (err)
goto out;

+ if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
+ map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
+ else
+ map->m_algorithmformat = vi->z_algorithmtype[0];
+
if (flags & EROFS_GET_BLOCKS_FIEMAP) {
err = z_erofs_get_extent_decompressedlen(&m);
if (!err)
diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h
index db4f2cec8360..16ae7b666810 100644
--- a/include/trace/events/erofs.h
+++ b/include/trace/events/erofs.h
@@ -24,7 +24,7 @@ struct erofs_map_blocks;
#define show_mflags(flags) __print_flags(flags, "", \
{ EROFS_MAP_MAPPED, "M" }, \
{ EROFS_MAP_META, "I" }, \
- { EROFS_MAP_ZIPPED, "Z" })
+ { EROFS_MAP_ENCODED, "E" })

TRACE_EVENT(erofs_lookup,

--
2.20.1

2021-10-07 19:43:27

by Gao Xiang

Subject: [PATCH 3/3] erofs: introduce readmore decompression strategy

From: Gao Xiang <[email protected]>

Previously, the EROFS decompression strategy strictly followed the
requested read length. However, partial decompression is quite
inefficient when non-negligible data in big pclusters needs to be
handled, especially for the upcoming LZMA algorithm.

Let's decompress more for such cases, as other filesystems do.
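
Roughly, the strategy works on both edges of a request; a simplified
sketch of the z_erofs_readpage() flow from the patch below:

        /* map the backmost extent first and read up to its end */
        z_erofs_pcluster_readmore(&f, NULL, f.headoffset + PAGE_SIZE - 1,
                                  &pagepool, true);
        err = z_erofs_do_read_page(&f, page, &pagepool);
        /* then feed in the remaining pages of large pclusters as well */
        z_erofs_pcluster_readmore(&f, NULL, 0, &pagepool, false);

On the mapping side, EROFS_GET_BLOCKS_READMORE only maps the whole
extent when its logical length is at least
Z_EROFS_LZ4_READMORE_THRESHOLD (9 * EROFS_BLKSIZ), so small pclusters
keep the old behavior.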

Signed-off-by: Gao Xiang <[email protected]>
---
 fs/erofs/internal.h | 16 ++++++++
 fs/erofs/zdata.c    | 94 ++++++++++++++++++++++++++++++++++++---------
 fs/erofs/zmap.c     |  4 +-
 3 files changed, 94 insertions(+), 20 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 48bfc6eb2b02..e7378795a26c 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -307,6 +307,19 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value)
EROFS_I_DATALAYOUT_BITS);
}

+/*
+ * Different from grab_cache_page_nowait(), reclaiming is never triggered
+ * when allocating new pages.
+ */
+static inline
+struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
+ pgoff_t index)
+{
+ return pagecache_get_page(mapping, index,
+ FGP_LOCK|FGP_CREAT|FGP_NOFS|FGP_NOWAIT,
+ readahead_gfp_mask(mapping) & ~__GFP_RECLAIM);
+}
+
extern const struct super_operations erofs_sops;

extern const struct address_space_operations erofs_raw_access_aops;
@@ -368,6 +381,8 @@ struct erofs_map_blocks {
* approach instead if possible since it's more metadata lightweight.)
*/
#define EROFS_GET_BLOCKS_FIEMAP 0x0002
+/* Used to map the whole extent if non-negligible data is already requested */
+#define EROFS_GET_BLOCKS_READMORE 0x0004

enum {
Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
@@ -375,6 +390,7 @@ enum {
};

/* zmap.c */
+#define Z_EROFS_LZ4_READMORE_THRESHOLD (9 * EROFS_BLKSIZ)
extern const struct iomap_ops z_erofs_iomap_report_ops;

#ifdef CONFIG_EROFS_FS_ZIP
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 5c34ef66677f..a1861a9acfd0 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1377,6 +1377,67 @@ static void z_erofs_runqueue(struct super_block *sb,
z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
}

+static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
+ struct readahead_control *rac,
+ erofs_off_t end,
+ struct list_head *pagepool,
+ bool backmost)
+{
+ struct inode *const inode = f->inode;
+ struct erofs_map_blocks *const map = &f->map;
+ erofs_off_t cur;
+ int err;
+
+ if (backmost) {
+ map->m_la = end;
+ err = z_erofs_map_blocks_iter(inode, map,
+ EROFS_GET_BLOCKS_READMORE);
+ if (err)
+ return;
+ end = round_up(end, PAGE_SIZE);
+
+ /* expand ra for the trailing edge if readahead */
+ if (rac) {
+ loff_t newstart = readahead_pos(rac);
+
+ readahead_expand(rac, newstart, end - newstart);
+ return;
+ }
+ } else {
+ end = round_up(map->m_la, PAGE_SIZE);
+
+ if (!map->m_llen)
+ return;
+ }
+
+ cur = map->m_la + map->m_llen - 1;
+ while (cur >= end) {
+ pgoff_t index = cur >> PAGE_SHIFT;
+ struct page *page;
+
+ page = erofs_grab_cache_page_nowait(inode->i_mapping, index);
+ if (!page)
+ goto skip;
+
+ if (PageUptodate(page)) {
+ unlock_page(page);
+ put_page(page);
+ goto skip;
+ }
+
+ err = z_erofs_do_read_page(f, page, pagepool);
+ if (err)
+ erofs_err(inode->i_sb,
+ "readmore error at page %lu @ nid %llu",
+ index, EROFS_I(inode)->nid);
+ put_page(page);
+skip:
+ if (cur < PAGE_SIZE)
+ break;
+ cur = (index << PAGE_SHIFT) - 1;
+ }
+}
+
static int z_erofs_readpage(struct file *file, struct page *page)
{
struct inode *const inode = page->mapping->host;
@@ -1385,10 +1446,13 @@ static int z_erofs_readpage(struct file *file, struct page *page)
LIST_HEAD(pagepool);

trace_erofs_readpage(page, false);
-
f.headoffset = (erofs_off_t)page->index << PAGE_SHIFT;

+ z_erofs_pcluster_readmore(&f, NULL, f.headoffset + PAGE_SIZE - 1,
+ &pagepool, true);
err = z_erofs_do_read_page(&f, page, &pagepool);
+ z_erofs_pcluster_readmore(&f, NULL, 0, &pagepool, false);
+
(void)z_erofs_collector_end(&f.clt);

/* if some compressed cluster ready, need submit them anyway */
@@ -1409,29 +1473,20 @@ static void z_erofs_readahead(struct readahead_control *rac)
{
struct inode *const inode = rac->mapping->host;
struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
-
- unsigned int nr_pages = readahead_count(rac);
- bool sync = (sbi->ctx.readahead_sync_decompress &&
- nr_pages <= sbi->ctx.max_sync_decompress_pages);
struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
struct page *page, *head = NULL;
+ unsigned int nr_pages;
LIST_HEAD(pagepool);

- trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
-
f.readahead = true;
f.headoffset = readahead_pos(rac);

- while ((page = readahead_page(rac))) {
- prefetchw(&page->flags);
-
- /*
- * A pure asynchronous readahead is indicated if
- * a PG_readahead marked page is hitted at first.
- * Let's also do asynchronous decompression for this case.
- */
- sync &= !(PageReadahead(page) && !head);
+ z_erofs_pcluster_readmore(&f, rac, f.headoffset +
+ readahead_length(rac) - 1, &pagepool, true);
+ nr_pages = readahead_count(rac);
+ trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);

+ while ((page = readahead_page(rac))) {
set_page_private(page, (unsigned long)head);
head = page;
}
@@ -1450,11 +1505,12 @@ static void z_erofs_readahead(struct readahead_control *rac)
page->index, EROFS_I(inode)->nid);
put_page(page);
}
-
+ z_erofs_pcluster_readmore(&f, rac, 0, &pagepool, false);
(void)z_erofs_collector_end(&f.clt);

- z_erofs_runqueue(inode->i_sb, &f, &pagepool, sync);
-
+ z_erofs_runqueue(inode->i_sb, &f, &pagepool,
+ sbi->ctx.readahead_sync_decompress &&
+ nr_pages <= sbi->ctx.max_sync_decompress_pages);
if (f.map.mpage)
put_page(f.map.mpage);

diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index a61cc7f55ef0..7f42a1c8a338 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -673,7 +673,9 @@ int z_erofs_map_blocks_iter(struct inode *inode,
else
map->m_algorithmformat = vi->z_algorithmtype[0];

- if (flags & EROFS_GET_BLOCKS_FIEMAP) {
+ if (flags & EROFS_GET_BLOCKS_FIEMAP ||
+ ((flags & EROFS_GET_BLOCKS_READMORE) &&
+ map->m_llen >= Z_EROFS_LZ4_READMORE_THRESHOLD)) {
err = z_erofs_get_extent_decompressedlen(&m);
if (!err)
map->m_flags |= EROFS_MAP_FULL_MAPPED;
--
2.20.1

2021-10-07 20:49:18

by Gao Xiang

Subject: [PATCH 2/3] erofs: introduce the secondary compression head

From: Gao Xiang <[email protected]>

Previously, the head lcluster of each pcluster could be either a HEAD
or a PLAIN lcluster, indicating whether the whole pcluster is
compressed or not.

In this patch, a new HEAD2 head type is introduced to specify a second
compression algorithm besides the primary one for each compressed
file, which can be used for the upcoming LZMA compression and LZ4
range dictionary compression for various data patterns.

This has been on the EROFS roadmap for years. Complete it now!
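
As a quick illustration, each head type now carries its own big
pcluster advise bit; condensed from the
z_erofs_get_extent_compressedlen() hunk below:

        /* a 1 lcluster-sized pcluster unless the matching bit is set */
        if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
            (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
             !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
            (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2 &&
             !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2)))
                map->m_plen = 1 << lclusterbits;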

Signed-off-by: Gao Xiang <[email protected]>
---
 fs/erofs/erofs_fs.h |  8 +++++---
 fs/erofs/zmap.c     | 34 +++++++++++++++++++++++++---------
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index b0b23f41abc3..f579c8c78fff 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -21,11 +21,13 @@
#define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x00000002
#define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002
#define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004
+#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2 0x00000008
#define EROFS_ALL_FEATURE_INCOMPAT \
(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
- EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
+ EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
+ EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)

#define EROFS_SB_EXTSLOT_SIZE 16

@@ -314,9 +316,9 @@ struct z_erofs_map_header {
*/
enum {
Z_EROFS_VLE_CLUSTER_TYPE_PLAIN = 0,
- Z_EROFS_VLE_CLUSTER_TYPE_HEAD = 1,
+ Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 = 1,
Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD = 2,
- Z_EROFS_VLE_CLUSTER_TYPE_RESERVED = 3,
+ Z_EROFS_VLE_CLUSTER_TYPE_HEAD2 = 3,
Z_EROFS_VLE_CLUSTER_TYPE_MAX
};

diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 9d9c26343dab..a61cc7f55ef0 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -69,11 +69,17 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;

if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
- erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
+ erofs_err(sb, "unknown HEAD1 format %u for nid %llu, please upgrade kernel",
vi->z_algorithmtype[0], vi->nid);
err = -EOPNOTSUPP;
goto unmap_done;
}
+ if (vi->z_algorithmtype[1] >= Z_EROFS_COMPRESSION_MAX) {
+ erofs_err(sb, "unknown HEAD2 format %u for nid %llu, please upgrade kernel",
+ vi->z_algorithmtype[1], vi->nid);
+ err = -EOPNOTSUPP;
+ goto unmap_done;
+ }

vi->z_logical_clusterbits = LOG_BLOCK_SIZE + (h->h_clusterbits & 7);
if (!erofs_sb_has_big_pcluster(EROFS_SB(sb)) &&
@@ -189,7 +195,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
break;
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
m->clusterofs = le16_to_cpu(di->di_clusterofs);
m->pblk = le32_to_cpu(di->di_u.blkaddr);
break;
@@ -446,7 +453,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
}
return z_erofs_extent_lookback(m, m->delta[0]);
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
m->headtype = m->type;
map->m_la = (lcn << lclusterbits) | m->clusterofs;
break;
@@ -470,13 +478,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
int err;

DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
- m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
+ m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
+ m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
+ DBG_BUGON(m->type != m->headtype);
+
if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
- !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+ ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
+ !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
+ ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
+ !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
map->m_plen = 1 << lclusterbits;
return 0;
}
-
lcn = m->lcn + 1;
if (m->compressedlcs)
goto out;
@@ -498,7 +511,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,

switch (m->type) {
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
/*
* if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
* rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
@@ -553,7 +567,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
DBG_BUGON(!m->delta[1] &&
m->clusterofs != 1 << lclusterbits);
} else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
- m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
+ m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
+ m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
/* go on until the next HEAD lcluster */
if (lcn != headlcn)
break;
@@ -612,7 +627,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,

switch (m.type) {
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
if (endoff >= m.clusterofs) {
m.headtype = m.type;
map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
--
2.20.1

2021-10-08 03:03:32

by Yue Hu

Subject: Re: [PATCH 1/3] erofs: get compression algorithms directly on mapping

On Fri, 8 Oct 2021 01:06:03 +0800
Gao Xiang <[email protected]> wrote:

> From: Gao Xiang <[email protected]>
>
> Currently, z_erofs_map_blocks returns whether extents are compressed

Mapping function is z_erofs_map_blocks_iter()?

> or not, and the decompression frontend then gets the specific
> algorithms.
>
> It works but not quite well in many aspects, for example:
> - The decompression frontend has to deal with whether extents are
> compressed or not again and look up the algorithms if compressed.
> That's duplicated work and exposes too much on-disk mapping detail.
>
> - A new secondary compression head will be introduced later so that
> each file can have at most two compression algorithms for different
> types of data. It would increase the complexity of the decompression
> frontend if still handled in this way;
>
> - A new readmore decompression strategy will be introduced to get
> better performance for much bigger pclusters and LZMA, which also
> needs the specific algorithm in advance.
>
> Let's look up compression algorithms directly in z_erofs_map_blocks()

Same mapping function name mistake?

> instead.
>
> Signed-off-by: Gao Xiang <[email protected]>
> ---
> fs/erofs/compress.h | 5 -----
> fs/erofs/internal.h | 12 +++++++++---
> fs/erofs/zdata.c | 12 ++++++------
> fs/erofs/zmap.c | 19 ++++++++++---------
> include/trace/events/erofs.h | 2 +-
> 5 files changed, 26 insertions(+), 24 deletions(-)
>
> diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
> index 3701c72bacb2..ad62d1b4d371 100644
> --- a/fs/erofs/compress.h
> +++ b/fs/erofs/compress.h
> @@ -8,11 +8,6 @@
>
> #include "internal.h"
>
> -enum {
> - Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
> - Z_EROFS_COMPRESSION_RUNTIME_MAX
> -};
> -
> struct z_erofs_decompress_req {
> struct super_block *sb;
> struct page **in, **out;
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 9524e155b38f..48bfc6eb2b02 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -338,7 +338,7 @@ extern const struct address_space_operations z_erofs_aops;
> * of the corresponding uncompressed data in the file.
> */
> enum {
> - BH_Zipped = BH_PrivateStart,
> + BH_Encoded = BH_PrivateStart,
> BH_FullMapped,
> };
>
> @@ -346,8 +346,8 @@ enum {
> #define EROFS_MAP_MAPPED (1 << BH_Mapped)
> /* Located in metadata (could be copied from bd_inode) */
> #define EROFS_MAP_META (1 << BH_Meta)
> -/* The extent has been compressed */
> -#define EROFS_MAP_ZIPPED (1 << BH_Zipped)
> +/* The extent is encoded */
> +#define EROFS_MAP_ENCODED (1 << BH_Encoded)
> /* The length of extent is full */
> #define EROFS_MAP_FULL_MAPPED (1 << BH_FullMapped)
>
> @@ -355,6 +355,7 @@ struct erofs_map_blocks {
> erofs_off_t m_pa, m_la;
> u64 m_plen, m_llen;
>
> + char m_algorithmformat;
> unsigned int m_flags;
>
> struct page *mpage;
> @@ -368,6 +369,11 @@ struct erofs_map_blocks {
> */
> #define EROFS_GET_BLOCKS_FIEMAP 0x0002
>
> +enum {
> + Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
> + Z_EROFS_COMPRESSION_RUNTIME_MAX
> +};
> +
> /* zmap.c */
> extern const struct iomap_ops z_erofs_iomap_report_ops;
>
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 11c7a1aaebad..5c34ef66677f 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -476,6 +476,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
> struct erofs_workgroup *grp;
> int err;
>
> + if (!(map->m_flags & EROFS_MAP_ENCODED)) {
> + DBG_BUGON(1);
> + return -EFSCORRUPTED;
> + }
> +
> /* no available pcluster, let's allocate one */
> pcl = z_erofs_alloc_pcluster(map->m_plen >> PAGE_SHIFT);
> if (IS_ERR(pcl))
> @@ -483,16 +488,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
>
> atomic_set(&pcl->obj.refcount, 1);
> pcl->obj.index = map->m_pa >> PAGE_SHIFT;
> -
> + pcl->algorithmformat = map->m_algorithmformat;
> pcl->length = (map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT) |
> (map->m_flags & EROFS_MAP_FULL_MAPPED ?
> Z_EROFS_PCLUSTER_FULL_LENGTH : 0);
>
> - if (map->m_flags & EROFS_MAP_ZIPPED)
> - pcl->algorithmformat = Z_EROFS_COMPRESSION_LZ4;
> - else
> - pcl->algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> -
> /* new pclusters should be claimed as type 1, primary and followed */
> pcl->next = clt->owned_head;
> clt->mode = COLLECT_PRIMARY_FOLLOWED;
> diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> index 7a6df35fdc91..9d9c26343dab 100644
> --- a/fs/erofs/zmap.c
> +++ b/fs/erofs/zmap.c
> @@ -111,7 +111,7 @@ struct z_erofs_maprecorder {
>
> unsigned long lcn;
> /* compression extent information gathered */
> - u8 type;
> + u8 type, headtype;
> u16 clusterofs;
> u16 delta[2];
> erofs_blk_t pblk, compressedlcs;
> @@ -446,9 +446,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
> }
> return z_erofs_extent_lookback(m, m->delta[0]);
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - map->m_flags &= ~EROFS_MAP_ZIPPED;
> - fallthrough;
> case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + m->headtype = m->type;
> map->m_la = (lcn << lclusterbits) | m->clusterofs;
> break;
> default:
> @@ -472,7 +471,7 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>
> DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
> m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
> - if (!(map->m_flags & EROFS_MAP_ZIPPED) ||
> + if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
> map->m_plen = 1 << lclusterbits;
> return 0;
> @@ -609,16 +608,13 @@ int z_erofs_map_blocks_iter(struct inode *inode,
> if (err)
> goto unmap_out;
>
> - map->m_flags = EROFS_MAP_ZIPPED; /* by default, compressed */
> end = (m.lcn + 1ULL) << lclusterbits;
>
> switch (m.type) {
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - if (endoff >= m.clusterofs)
> - map->m_flags &= ~EROFS_MAP_ZIPPED;
> - fallthrough;
> case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> if (endoff >= m.clusterofs) {
> + m.headtype = m.type;
> map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
> break;
> }
> @@ -650,12 +646,17 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>
> map->m_llen = end - map->m_la;
> map->m_pa = blknr_to_addr(m.pblk);
> - map->m_flags |= EROFS_MAP_MAPPED;
> + map->m_flags = EROFS_MAP_MAPPED | EROFS_MAP_ENCODED;

Seems the new flag EROFS_MAP_ENCODED may be a little redundant?

It is used only for checking whether the file system is corrupted or not when
entering z_erofs_register_collection().

And before collection begins, we check whether EROFS_MAP_MAPPED is
set, and we only do the collection if it is. That is to say, the above
check of EROFS_MAP_ENCODED should never trigger?

Thanks.

>
> err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
> if (err)
> goto out;
>
> + if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
> + map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> + else
> + map->m_algorithmformat = vi->z_algorithmtype[0];
> +
> if (flags & EROFS_GET_BLOCKS_FIEMAP) {
> err = z_erofs_get_extent_decompressedlen(&m);
> if (!err)
> diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h
> index db4f2cec8360..16ae7b666810 100644
> --- a/include/trace/events/erofs.h
> +++ b/include/trace/events/erofs.h
> @@ -24,7 +24,7 @@ struct erofs_map_blocks;
> #define show_mflags(flags) __print_flags(flags, "", \
> { EROFS_MAP_MAPPED, "M" }, \
> { EROFS_MAP_META, "I" }, \
> - { EROFS_MAP_ZIPPED, "Z" })
> + { EROFS_MAP_ENCODED, "E" })
>
> TRACE_EVENT(erofs_lookup,
>

2021-10-08 04:23:48

by Gao Xiang

Subject: Re: [PATCH 1/3] erofs: get compression algorithms directly on mapping

Hi Yue,

On Fri, Oct 08, 2021 at 11:00:29AM +0800, Yue Hu wrote:
> On Fri, 8 Oct 2021 01:06:03 +0800
> Gao Xiang <[email protected]> wrote:
>
> > From: Gao Xiang <[email protected]>
> >
> > Currently, z_erofs_map_blocks returns whether extents are compressed
>
> Mapping function is z_erofs_map_blocks_iter()?

I just meant the compression mapping... I'm fine with the update.

>
> > or not, and the decompression frontend then gets the specific
> > algorithms.
> >
> > It works but not quite well in many aspects, for example:
> > - The decompression frontend has to deal with whether extents are
> > compressed or not again and look up the algorithms if compressed.
> > That's duplicated work and exposes too much on-disk mapping detail.
> >
> > - A new secondary compression head will be introduced later so that
> > each file can have at most two compression algorithms for different
> > types of data. It would increase the complexity of the decompression
> > frontend if still handled in this way;
> >
> > - A new readmore decompression strategy will be introduced to get
> > better performance for much bigger pclusters and LZMA, which also
> > needs the specific algorithm in advance.
> >
> > Let's look up compression algorithms directly in z_erofs_map_blocks()
>
> Same mapping function name mistake?

Ok, will update as well.

>
> > instead.
> >
> > Signed-off-by: Gao Xiang <[email protected]>
> > ---
> > fs/erofs/compress.h | 5 -----
> > fs/erofs/internal.h | 12 +++++++++---
> > fs/erofs/zdata.c | 12 ++++++------
> > fs/erofs/zmap.c | 19 ++++++++++---------
> > include/trace/events/erofs.h | 2 +-
> > 5 files changed, 26 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
> > index 3701c72bacb2..ad62d1b4d371 100644
> > --- a/fs/erofs/compress.h
> > +++ b/fs/erofs/compress.h
> > @@ -8,11 +8,6 @@
> >
> > #include "internal.h"
> >
> > -enum {
> > - Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
> > - Z_EROFS_COMPRESSION_RUNTIME_MAX
> > -};
> > -
> > struct z_erofs_decompress_req {
> > struct super_block *sb;
> > struct page **in, **out;
> > diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> > index 9524e155b38f..48bfc6eb2b02 100644
> > --- a/fs/erofs/internal.h
> > +++ b/fs/erofs/internal.h
> > @@ -338,7 +338,7 @@ extern const struct address_space_operations z_erofs_aops;
> > * of the corresponding uncompressed data in the file.
> > */
> > enum {
> > - BH_Zipped = BH_PrivateStart,
> > + BH_Encoded = BH_PrivateStart,
> > BH_FullMapped,
> > };
> >
> > @@ -346,8 +346,8 @@ enum {
> > #define EROFS_MAP_MAPPED (1 << BH_Mapped)
> > /* Located in metadata (could be copied from bd_inode) */
> > #define EROFS_MAP_META (1 << BH_Meta)
> > -/* The extent has been compressed */
> > -#define EROFS_MAP_ZIPPED (1 << BH_Zipped)
> > +/* The extent is encoded */
> > +#define EROFS_MAP_ENCODED (1 << BH_Encoded)
> > /* The length of extent is full */
> > #define EROFS_MAP_FULL_MAPPED (1 << BH_FullMapped)
> >
> > @@ -355,6 +355,7 @@ struct erofs_map_blocks {
> > erofs_off_t m_pa, m_la;
> > u64 m_plen, m_llen;
> >
> > + char m_algorithmformat;
> > unsigned int m_flags;
> >
> > struct page *mpage;
> > @@ -368,6 +369,11 @@ struct erofs_map_blocks {
> > */
> > #define EROFS_GET_BLOCKS_FIEMAP 0x0002
> >
> > +enum {
> > + Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
> > + Z_EROFS_COMPRESSION_RUNTIME_MAX
> > +};
> > +
> > /* zmap.c */
> > extern const struct iomap_ops z_erofs_iomap_report_ops;
> >
> > diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> > index 11c7a1aaebad..5c34ef66677f 100644
> > --- a/fs/erofs/zdata.c
> > +++ b/fs/erofs/zdata.c
> > @@ -476,6 +476,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
> > struct erofs_workgroup *grp;
> > int err;
> >
> > + if (!(map->m_flags & EROFS_MAP_ENCODED)) {
> > + DBG_BUGON(1);
> > + return -EFSCORRUPTED;
> > + }
> > +
> > /* no available pcluster, let's allocate one */
> > pcl = z_erofs_alloc_pcluster(map->m_plen >> PAGE_SHIFT);
> > if (IS_ERR(pcl))
> > @@ -483,16 +488,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
> >
> > atomic_set(&pcl->obj.refcount, 1);
> > pcl->obj.index = map->m_pa >> PAGE_SHIFT;
> > -
> > + pcl->algorithmformat = map->m_algorithmformat;
> > pcl->length = (map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT) |
> > (map->m_flags & EROFS_MAP_FULL_MAPPED ?
> > Z_EROFS_PCLUSTER_FULL_LENGTH : 0);
> >
> > - if (map->m_flags & EROFS_MAP_ZIPPED)
> > - pcl->algorithmformat = Z_EROFS_COMPRESSION_LZ4;
> > - else
> > - pcl->algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> > -
> > /* new pclusters should be claimed as type 1, primary and followed */
> > pcl->next = clt->owned_head;
> > clt->mode = COLLECT_PRIMARY_FOLLOWED;
> > diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> > index 7a6df35fdc91..9d9c26343dab 100644
> > --- a/fs/erofs/zmap.c
> > +++ b/fs/erofs/zmap.c
> > @@ -111,7 +111,7 @@ struct z_erofs_maprecorder {
> >
> > unsigned long lcn;
> > /* compression extent information gathered */
> > - u8 type;
> > + u8 type, headtype;
> > u16 clusterofs;
> > u16 delta[2];
> > erofs_blk_t pblk, compressedlcs;
> > @@ -446,9 +446,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
> > }
> > return z_erofs_extent_lookback(m, m->delta[0]);
> > case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> > - map->m_flags &= ~EROFS_MAP_ZIPPED;
> > - fallthrough;
> > case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> > + m->headtype = m->type;
> > map->m_la = (lcn << lclusterbits) | m->clusterofs;
> > break;
> > default:
> > @@ -472,7 +471,7 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
> >
> > DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
> > m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
> > - if (!(map->m_flags & EROFS_MAP_ZIPPED) ||
> > + if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> > !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
> > map->m_plen = 1 << lclusterbits;
> > return 0;
> > @@ -609,16 +608,13 @@ int z_erofs_map_blocks_iter(struct inode *inode,
> > if (err)
> > goto unmap_out;
> >
> > - map->m_flags = EROFS_MAP_ZIPPED; /* by default, compressed */
> > end = (m.lcn + 1ULL) << lclusterbits;
> >
> > switch (m.type) {
> > case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> > - if (endoff >= m.clusterofs)
> > - map->m_flags &= ~EROFS_MAP_ZIPPED;
> > - fallthrough;
> > case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> > if (endoff >= m.clusterofs) {
> > + m.headtype = m.type;
> > map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
> > break;
> > }
> > @@ -650,12 +646,17 @@ int z_erofs_map_blocks_iter(struct inode *inode,
> >
> > map->m_llen = end - map->m_la;
> > map->m_pa = blknr_to_addr(m.pblk);
> > - map->m_flags |= EROFS_MAP_MAPPED;
> > + map->m_flags = EROFS_MAP_MAPPED | EROFS_MAP_ENCODED;
>
> Seems the new flag EROFS_MAP_ENCODED may be a little redundant?
>
> It is used only for checking whether the file system is corrupted or not when
> entering z_erofs_register_collection().
>
> And before collection begins, we check whether EROFS_MAP_MAPPED is
> set, and we only do the collection if it is. That is to say, the above
> check of EROFS_MAP_ENCODED should never trigger?

I think it's useful to mark that such data isn't plain data (it cannot
be read directly, plus m_llen != m_plen). For example, it's useful for
fiemap (actually, there is an improvement / a fix for the iomap fiemap
report code to identify/mark the encoded flag).
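
For instance (just a hypothetical sketch, not code from this series;
fe_flags is an illustrative name), an encoded extent could be reported
through the standard fiemap flag:

        if (map->m_flags & EROFS_MAP_ENCODED)
                fe_flags |= FIEMAP_EXTENT_ENCODED;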

BTW, I found that this patchset is still buggy; I'd suggest ignoring
it for now. I need to revise it later.

Thanks,
Gao Xiang

>
> Thanks.