2021-10-08 20:09:55

by Gao Xiang

Subject: [PATCH v2 0/3] erofs: some decompression improvements

Hi folks,

This patchset is mainly intended as preparation for the upcoming LZMA
support, but it also brings some benefits to the existing LZ4
decompression.

The first patch looks up compression algorithms on mapping instead
of in the decompression frontend, which the remaining patches build on.

The second patch introduces another compression HEAD (HEAD2) so that
each file can be compressed with at most two different algorithms,
which can be used for the upcoming LZMA compression and LZ4 range
dictionary compression for different data/access patterns.

The third patch introduces a new readmore decompression strategy that
tries to improve randread for large LZ4 big pclusters and the upcoming
LZMA decompression. It mainly addresses the issue mentioned in the
original big pcluster patchset [1]:

FIO randread
Testdata: enwik9
Kernel: Linux 5.15.0-rc2

pclustersize    Vanilla       Patched
        4096    54.6 MiB/s    56.1 MiB/s
       16384   117.4 MiB/s   145.6 MiB/s
       32768   113.6 MiB/s   203.4 MiB/s
       65536    72.8 MiB/s   236.1 MiB/s

The latest version can also be fetched from
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/linux.git -b erofs/readmore

[1] https://lore.kernel.org/r/[email protected]

Thanks,
Gao Xiang

Changes since v1:
- correct the function name to z_erofs_map_blocks_iter() in the commit
message, as pointed out by Yue;

- fix the readmore logic, which mainly impacts the later LZMA approach;
the Patched numbers above were therefore re-tested.

Gao Xiang (3):
erofs: get compression algorithms directly on mapping
erofs: introduce the secondary compression head
erofs: introduce readmore decompression strategy

fs/erofs/compress.h | 5 --
fs/erofs/erofs_fs.h | 8 ++-
fs/erofs/internal.h | 25 +++++++-
fs/erofs/zdata.c | 111 +++++++++++++++++++++++++++--------
fs/erofs/zmap.c | 55 +++++++++++------
include/trace/events/erofs.h | 2 +-
6 files changed, 151 insertions(+), 55 deletions(-)

--
2.20.1
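
As background for HEAD2: the per-file algorithm pair mentioned above is
stored as two 4-bit fields in one byte of the on-disk map header, which
patch 2 below decodes as two nibbles. A minimal standalone sketch of
that decode (not kernel code; the example value is made up):

	#include <stdio.h>

	int main(void)
	{
		/* hypothetical: HEAD2 algorithm = 1, HEAD1 algorithm = 0 */
		unsigned char h_algorithmtype = 0x10;
		unsigned int z_algorithmtype[2];

		z_algorithmtype[0] = h_algorithmtype & 15;	/* HEAD1/PLAIN lclusters */
		z_algorithmtype[1] = h_algorithmtype >> 4;	/* HEAD2 lclusters */

		printf("HEAD1 algorithm %u, HEAD2 algorithm %u\n",
		       z_algorithmtype[0], z_algorithmtype[1]);
		return 0;
	}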


2021-10-08 20:10:06

by Gao Xiang

Subject: [PATCH v2 1/3] erofs: get compression algorithms directly on mapping

From: Gao Xiang <[email protected]>

Currently, z_erofs_map_blocks_iter() returns whether extents are
compressed or not, and the decompression frontend then looks up the
specific algorithms.

It works, but not quite well in several aspects, for example:
- The decompression frontend has to check again whether extents are
compressed or not and look up the algorithms if they are. That is
duplicated work and exposes too many on-disk mapping details;

- A new secondary compression head will be introduced later so that
each file can have at most 2 compression algorithms for different
types of data. Handling this in the frontend would increase its
complexity;

- A new readmore decompression strategy will be introduced to get
better performance for much bigger pclusters and LZMA, which also
needs to know the specific algorithm in advance.

Let's look up compression algorithms in z_erofs_map_blocks_iter()
directly instead.

Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/compress.h | 5 -----
fs/erofs/internal.h | 12 +++++++++---
fs/erofs/zdata.c | 12 ++++++------
fs/erofs/zmap.c | 19 ++++++++++---------
include/trace/events/erofs.h | 2 +-
5 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
index 3701c72bacb2..ad62d1b4d371 100644
--- a/fs/erofs/compress.h
+++ b/fs/erofs/compress.h
@@ -8,11 +8,6 @@

#include "internal.h"

-enum {
- Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
- Z_EROFS_COMPRESSION_RUNTIME_MAX
-};
-
struct z_erofs_decompress_req {
struct super_block *sb;
struct page **in, **out;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 9524e155b38f..48bfc6eb2b02 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -338,7 +338,7 @@ extern const struct address_space_operations z_erofs_aops;
* of the corresponding uncompressed data in the file.
*/
enum {
- BH_Zipped = BH_PrivateStart,
+ BH_Encoded = BH_PrivateStart,
BH_FullMapped,
};

@@ -346,8 +346,8 @@ enum {
#define EROFS_MAP_MAPPED (1 << BH_Mapped)
/* Located in metadata (could be copied from bd_inode) */
#define EROFS_MAP_META (1 << BH_Meta)
-/* The extent has been compressed */
-#define EROFS_MAP_ZIPPED (1 << BH_Zipped)
+/* The extent is encoded */
+#define EROFS_MAP_ENCODED (1 << BH_Encoded)
/* The length of extent is full */
#define EROFS_MAP_FULL_MAPPED (1 << BH_FullMapped)

@@ -355,6 +355,7 @@ struct erofs_map_blocks {
erofs_off_t m_pa, m_la;
u64 m_plen, m_llen;

+ char m_algorithmformat;
unsigned int m_flags;

struct page *mpage;
@@ -368,6 +369,11 @@ struct erofs_map_blocks {
*/
#define EROFS_GET_BLOCKS_FIEMAP 0x0002

+enum {
+ Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
+ Z_EROFS_COMPRESSION_RUNTIME_MAX
+};
+
/* zmap.c */
extern const struct iomap_ops z_erofs_iomap_report_ops;

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 11c7a1aaebad..5c34ef66677f 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -476,6 +476,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
struct erofs_workgroup *grp;
int err;

+ if (!(map->m_flags & EROFS_MAP_ENCODED)) {
+ DBG_BUGON(1);
+ return -EFSCORRUPTED;
+ }
+
/* no available pcluster, let's allocate one */
pcl = z_erofs_alloc_pcluster(map->m_plen >> PAGE_SHIFT);
if (IS_ERR(pcl))
@@ -483,16 +488,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,

atomic_set(&pcl->obj.refcount, 1);
pcl->obj.index = map->m_pa >> PAGE_SHIFT;
-
+ pcl->algorithmformat = map->m_algorithmformat;
pcl->length = (map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT) |
(map->m_flags & EROFS_MAP_FULL_MAPPED ?
Z_EROFS_PCLUSTER_FULL_LENGTH : 0);

- if (map->m_flags & EROFS_MAP_ZIPPED)
- pcl->algorithmformat = Z_EROFS_COMPRESSION_LZ4;
- else
- pcl->algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
-
/* new pclusters should be claimed as type 1, primary and followed */
pcl->next = clt->owned_head;
clt->mode = COLLECT_PRIMARY_FOLLOWED;
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 7a6df35fdc91..9d9c26343dab 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -111,7 +111,7 @@ struct z_erofs_maprecorder {

unsigned long lcn;
/* compression extent information gathered */
- u8 type;
+ u8 type, headtype;
u16 clusterofs;
u16 delta[2];
erofs_blk_t pblk, compressedlcs;
@@ -446,9 +446,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
}
return z_erofs_extent_lookback(m, m->delta[0]);
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- map->m_flags &= ~EROFS_MAP_ZIPPED;
- fallthrough;
case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ m->headtype = m->type;
map->m_la = (lcn << lclusterbits) | m->clusterofs;
break;
default:
@@ -472,7 +471,7 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,

DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
- if (!(map->m_flags & EROFS_MAP_ZIPPED) ||
+ if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
!(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
map->m_plen = 1 << lclusterbits;
return 0;
@@ -609,16 +608,13 @@ int z_erofs_map_blocks_iter(struct inode *inode,
if (err)
goto unmap_out;

- map->m_flags = EROFS_MAP_ZIPPED; /* by default, compressed */
end = (m.lcn + 1ULL) << lclusterbits;

switch (m.type) {
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- if (endoff >= m.clusterofs)
- map->m_flags &= ~EROFS_MAP_ZIPPED;
- fallthrough;
case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
if (endoff >= m.clusterofs) {
+ m.headtype = m.type;
map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
break;
}
@@ -650,12 +646,17 @@ int z_erofs_map_blocks_iter(struct inode *inode,

map->m_llen = end - map->m_la;
map->m_pa = blknr_to_addr(m.pblk);
- map->m_flags |= EROFS_MAP_MAPPED;
+ map->m_flags = EROFS_MAP_MAPPED | EROFS_MAP_ENCODED;

err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
if (err)
goto out;

+ if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
+ map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
+ else
+ map->m_algorithmformat = vi->z_algorithmtype[0];
+
if (flags & EROFS_GET_BLOCKS_FIEMAP) {
err = z_erofs_get_extent_decompressedlen(&m);
if (!err)
diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h
index db4f2cec8360..16ae7b666810 100644
--- a/include/trace/events/erofs.h
+++ b/include/trace/events/erofs.h
@@ -24,7 +24,7 @@ struct erofs_map_blocks;
#define show_mflags(flags) __print_flags(flags, "", \
{ EROFS_MAP_MAPPED, "M" }, \
{ EROFS_MAP_META, "I" }, \
- { EROFS_MAP_ZIPPED, "Z" })
+ { EROFS_MAP_ENCODED, "E" })

TRACE_EVENT(erofs_lookup,

--
2.20.1
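
The frontend-side effect of this patch can be condensed into a
self-contained mock (simplified stand-in types, not the kernel structs):
before, z_erofs_register_collection() re-derived the format from the
EROFS_MAP_ZIPPED flag; now it inherits whatever the mapping layer chose
and treats unencoded maps as corruption.

	#include <stdio.h>

	#define MAP_ENCODED	(1 << 2)	/* stand-in for EROFS_MAP_ENCODED */

	enum { ALGO_LZ4, ALGO_MAX, ALGO_SHIFTED = ALGO_MAX };

	/* before: the frontend re-derived the format from a boolean flag */
	static int algo_before(int zipped)
	{
		return zipped ? ALGO_LZ4 : ALGO_SHIFTED;
	}

	/* after: the mapping layer already chose it; unencoded maps are
	 * rejected (-EFSCORRUPTED in the real code) */
	static int algo_after(const struct mock_map *map);

	struct mock_map {
		unsigned int m_flags;
		char m_algorithmformat;
	};

	static int algo_after(const struct mock_map *map)
	{
		if (!(map->m_flags & MAP_ENCODED))
			return -1;
		return map->m_algorithmformat;
	}

	int main(void)
	{
		struct mock_map map = { MAP_ENCODED, ALGO_LZ4 };

		printf("before: %d, after: %d\n",
		       algo_before(1), algo_after(&map));
		return 0;
	}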

2021-10-08 20:11:05

by Gao Xiang

Subject: [PATCH v2 2/3] erofs: introduce the secondary compression head

From: Gao Xiang <[email protected]>

Previously, each HEAD lcluster could be typed as either HEAD or PLAIN
to indicate whether the whole pcluster is compressed or not.

In this patch, a new HEAD2 head type is introduced to specify a
compression algorithm other than the primary one for each compressed
file, which can be used for the upcoming LZMA compression and LZ4
range dictionary compression for various data patterns.

This has been on the EROFS roadmap for years. Complete it now!

Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/erofs_fs.h | 8 +++++---
fs/erofs/zmap.c | 36 +++++++++++++++++++++++++++---------
2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index b0b23f41abc3..f579c8c78fff 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -21,11 +21,13 @@
#define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x00000002
#define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002
#define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004
+#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2 0x00000008
#define EROFS_ALL_FEATURE_INCOMPAT \
(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
- EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
+ EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
+ EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)

#define EROFS_SB_EXTSLOT_SIZE 16

@@ -314,9 +316,9 @@ struct z_erofs_map_header {
*/
enum {
Z_EROFS_VLE_CLUSTER_TYPE_PLAIN = 0,
- Z_EROFS_VLE_CLUSTER_TYPE_HEAD = 1,
+ Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 = 1,
Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD = 2,
- Z_EROFS_VLE_CLUSTER_TYPE_RESERVED = 3,
+ Z_EROFS_VLE_CLUSTER_TYPE_HEAD2 = 3,
Z_EROFS_VLE_CLUSTER_TYPE_MAX
};

diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 9d9c26343dab..03945f15ceae 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -69,11 +69,17 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;

if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
- erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
+ erofs_err(sb, "unknown HEAD1 format %u for nid %llu, please upgrade kernel",
vi->z_algorithmtype[0], vi->nid);
err = -EOPNOTSUPP;
goto unmap_done;
}
+ if (vi->z_algorithmtype[1] >= Z_EROFS_COMPRESSION_MAX) {
+ erofs_err(sb, "unknown HEAD2 format %u for nid %llu, please upgrade kernel",
+ vi->z_algorithmtype[1], vi->nid);
+ err = -EOPNOTSUPP;
+ goto unmap_done;
+ }

vi->z_logical_clusterbits = LOG_BLOCK_SIZE + (h->h_clusterbits & 7);
if (!erofs_sb_has_big_pcluster(EROFS_SB(sb)) &&
@@ -189,7 +195,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
break;
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
m->clusterofs = le16_to_cpu(di->di_clusterofs);
m->pblk = le32_to_cpu(di->di_u.blkaddr);
break;
@@ -446,7 +453,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
}
return z_erofs_extent_lookback(m, m->delta[0]);
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
m->headtype = m->type;
map->m_la = (lcn << lclusterbits) | m->clusterofs;
break;
@@ -470,13 +478,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
int err;

DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
- m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
+ m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
+ m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
+ DBG_BUGON(m->type != m->headtype);
+
if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
- !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+ ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
+ !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
+ ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
+ !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
map->m_plen = 1 << lclusterbits;
return 0;
}
-
lcn = m->lcn + 1;
if (m->compressedlcs)
goto out;
@@ -498,7 +511,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,

switch (m->type) {
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
/*
* if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
* rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
@@ -553,7 +567,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
DBG_BUGON(!m->delta[1] &&
m->clusterofs != 1 << lclusterbits);
} else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
- m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
+ m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
+ m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
/* go on until the next HEAD lcluster */
if (lcn != headlcn)
break;
@@ -612,7 +627,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,

switch (m.type) {
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
if (endoff >= m.clusterofs) {
m.headtype = m.type;
map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
@@ -654,6 +670,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,

if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
+ else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
+ map->m_algorithmformat = vi->z_algorithmtype[1];
else
map->m_algorithmformat = vi->z_algorithmtype[0];

--
2.20.1
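
The resulting head-type-to-algorithm selection at the end of
z_erofs_map_blocks_iter() can be restated as a tiny standalone helper
(illustrative only; the constants are simplified stand-ins, and the
LZ4/LZMA slot assignment in main() is hypothetical):

	#include <stdio.h>

	enum { TYPE_PLAIN = 0, TYPE_HEAD1 = 1, TYPE_NONHEAD = 2, TYPE_HEAD2 = 3 };
	enum { ALGO_MAX = 2, ALGO_SHIFTED = ALGO_MAX };

	/* mirrors the selection in the zmap.c hunk above */
	static int pick_algorithm(int headtype, const unsigned int algos[2])
	{
		if (headtype == TYPE_PLAIN)	/* uncompressed: shifted copy */
			return ALGO_SHIFTED;
		if (headtype == TYPE_HEAD2)	/* secondary per-file algorithm */
			return algos[1];
		return algos[0];		/* TYPE_HEAD1: primary algorithm */
	}

	int main(void)
	{
		const unsigned int algos[2] = { 0, 1 };

		printf("%d %d %d\n", pick_algorithm(TYPE_PLAIN, algos),
		       pick_algorithm(TYPE_HEAD1, algos),
		       pick_algorithm(TYPE_HEAD2, algos));
		return 0;
	}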

2021-10-08 20:12:05

by Gao Xiang

Subject: [PATCH v2 3/3] erofs: introduce readmore decompression strategy

From: Gao Xiang <[email protected]>

Previously, the EROFS decompression strategy strictly followed the
readahead window in order to minimize extra memory footprint. However,
it can become inefficient to read only the partial requested data for
big LZ4 pclusters and the upcoming LZMA implementation.

Let's instead request the leading data of a pcluster without
triggering memory reclaim, applied to the LZ4 approach first to boost
100% randread of large big pclusters; it has no real impact on
low-memory scenarios.

It also introduces a way to expand read lengths in order to decompress
the whole pcluster, which is useful for LZMA since the algorithm
itself is relatively slow and CPU-bound, unlike LZ4.

Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/internal.h | 13 ++++++
fs/erofs/zdata.c | 99 ++++++++++++++++++++++++++++++++++++---------
2 files changed, 93 insertions(+), 19 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 48bfc6eb2b02..7f96265ccbdb 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -307,6 +307,19 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value)
EROFS_I_DATALAYOUT_BITS);
}

+/*
+ * Different from grab_cache_page_nowait(), reclaiming is never triggered
+ * when allocating new pages.
+ */
+static inline
+struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
+ pgoff_t index)
+{
+ return pagecache_get_page(mapping, index,
+ FGP_LOCK|FGP_CREAT|FGP_NOFS|FGP_NOWAIT,
+ readahead_gfp_mask(mapping) & ~__GFP_RECLAIM);
+}
+
extern const struct super_operations erofs_sops;

extern const struct address_space_operations erofs_raw_access_aops;
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 5c34ef66677f..febb018e10a7 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1377,6 +1377,72 @@ static void z_erofs_runqueue(struct super_block *sb,
z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
}

+/*
+ * Since partial uptodate is still unimplemented for now, we have to use
+ * approximate readmore strategies as a start.
+ */
+static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
+ struct readahead_control *rac,
+ erofs_off_t end,
+ struct list_head *pagepool,
+ bool backmost)
+{
+ struct inode *inode = f->inode;
+ struct erofs_map_blocks *map = &f->map;
+ erofs_off_t cur;
+ int err;
+
+ if (backmost) {
+ map->m_la = end;
+ /* TODO: pass in EROFS_GET_BLOCKS_READMORE for LZMA later */
+ err = z_erofs_map_blocks_iter(inode, map, 0);
+ if (err)
+ return;
+
+ /* expand ra for the trailing edge if readahead */
+ if (rac) {
+ loff_t newstart = readahead_pos(rac);
+
+ cur = round_up(map->m_la + map->m_llen, PAGE_SIZE);
+ readahead_expand(rac, newstart, cur - newstart);
+ return;
+ }
+ end = round_up(end, PAGE_SIZE);
+ } else {
+ end = round_up(map->m_la, PAGE_SIZE);
+
+ if (!map->m_llen)
+ return;
+ }
+
+ cur = map->m_la + map->m_llen - 1;
+ while (cur >= end) {
+ pgoff_t index = cur >> PAGE_SHIFT;
+ struct page *page;
+
+ page = erofs_grab_cache_page_nowait(inode->i_mapping, index);
+ if (!page)
+ goto skip;
+
+ if (PageUptodate(page)) {
+ unlock_page(page);
+ put_page(page);
+ goto skip;
+ }
+
+ err = z_erofs_do_read_page(f, page, pagepool);
+ if (err)
+ erofs_err(inode->i_sb,
+ "readmore error at page %lu @ nid %llu",
+ index, EROFS_I(inode)->nid);
+ put_page(page);
+skip:
+ if (cur < PAGE_SIZE)
+ break;
+ cur = (index << PAGE_SHIFT) - 1;
+ }
+}
+
static int z_erofs_readpage(struct file *file, struct page *page)
{
struct inode *const inode = page->mapping->host;
@@ -1385,10 +1451,13 @@ static int z_erofs_readpage(struct file *file, struct page *page)
LIST_HEAD(pagepool);

trace_erofs_readpage(page, false);
-
f.headoffset = (erofs_off_t)page->index << PAGE_SHIFT;

+ z_erofs_pcluster_readmore(&f, NULL, f.headoffset + PAGE_SIZE - 1,
+ &pagepool, true);
err = z_erofs_do_read_page(&f, page, &pagepool);
+ z_erofs_pcluster_readmore(&f, NULL, 0, &pagepool, false);
+
(void)z_erofs_collector_end(&f.clt);

/* if some compressed cluster ready, need submit them anyway */
@@ -1409,29 +1478,20 @@ static void z_erofs_readahead(struct readahead_control *rac)
{
struct inode *const inode = rac->mapping->host;
struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
-
- unsigned int nr_pages = readahead_count(rac);
- bool sync = (sbi->ctx.readahead_sync_decompress &&
- nr_pages <= sbi->ctx.max_sync_decompress_pages);
struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
struct page *page, *head = NULL;
+ unsigned int nr_pages;
LIST_HEAD(pagepool);

- trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
-
f.readahead = true;
f.headoffset = readahead_pos(rac);

- while ((page = readahead_page(rac))) {
- prefetchw(&page->flags);
-
- /*
- * A pure asynchronous readahead is indicated if
- * a PG_readahead marked page is hitted at first.
- * Let's also do asynchronous decompression for this case.
- */
- sync &= !(PageReadahead(page) && !head);
+ z_erofs_pcluster_readmore(&f, rac, f.headoffset +
+ readahead_length(rac) - 1, &pagepool, true);
+ nr_pages = readahead_count(rac);
+ trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);

+ while ((page = readahead_page(rac))) {
set_page_private(page, (unsigned long)head);
head = page;
}
@@ -1450,11 +1510,12 @@ static void z_erofs_readahead(struct readahead_control *rac)
page->index, EROFS_I(inode)->nid);
put_page(page);
}
-
+ z_erofs_pcluster_readmore(&f, rac, 0, &pagepool, false);
(void)z_erofs_collector_end(&f.clt);

- z_erofs_runqueue(inode->i_sb, &f, &pagepool, sync);
-
+ z_erofs_runqueue(inode->i_sb, &f, &pagepool,
+ sbi->ctx.readahead_sync_decompress &&
+ nr_pages <= sbi->ctx.max_sync_decompress_pages);
if (f.map.mpage)
put_page(f.map.mpage);

--
2.20.1
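
To make the readmore window arithmetic concrete, here is a small
standalone program reproducing the backmost rounding and the backward
page walk above, with made-up numbers (a read ending at byte 0x13fff
that hits a hypothetical 64 KiB extent covering [0x10000, 0x20000)):

	#include <stdio.h>

	#define PAGE_SIZE	4096UL
	#define round_up(x, y)	((((x) + (y) - 1) / (y)) * (y))

	int main(void)
	{
		unsigned long m_la = 0x10000, m_llen = 0x10000;
		unsigned long end = round_up(0x13fffUL, PAGE_SIZE);	/* 0x14000 */
		unsigned long cur = m_la + m_llen - 1;			/* 0x1ffff */

		/* pages are grabbed backward, as in the zdata.c hunk above;
		 * this prints indices 31 down to 20 */
		while (cur >= end) {
			unsigned long index = cur / PAGE_SIZE;

			printf("readmore page index %lu\n", index);
			if (cur < PAGE_SIZE)
				break;
			cur = index * PAGE_SIZE - 1;
		}
		return 0;
	}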

2021-10-09 01:54:35

by Yue Hu

Subject: Re: [PATCH v2 1/3] erofs: get compression algorithms directly on mapping

On Sat, 9 Oct 2021 04:08:37 +0800
Gao Xiang <[email protected]> wrote:

> From: Gao Xiang <[email protected]>
>
> Currently, z_erofs_map_blocks_iter() returns whether extents are
> compressed or not, and the decompression frontend then looks up the
> specific algorithms.
>
> It works, but not quite well in several aspects, for example:
> - The decompression frontend has to check again whether extents are
> compressed or not and look up the algorithms if they are. That is
> duplicated work and exposes too many on-disk mapping details;
>
> - A new secondary compression head will be introduced later so that
> each file can have at most 2 compression algorithms for different
> types of data. Handling this in the frontend would increase its
> complexity;
>
> - A new readmore decompression strategy will be introduced to get
> better performance for much bigger pclusters and LZMA, which also
> needs to know the specific algorithm in advance.
>
> Let's look up compression algorithms in z_erofs_map_blocks_iter()
> directly instead.
>
> Signed-off-by: Gao Xiang <[email protected]>
> ---
> fs/erofs/compress.h | 5 -----
> fs/erofs/internal.h | 12 +++++++++---
> fs/erofs/zdata.c | 12 ++++++------
> fs/erofs/zmap.c | 19 ++++++++++---------
> include/trace/events/erofs.h | 2 +-
> 5 files changed, 26 insertions(+), 24 deletions(-)
>
> diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
> index 3701c72bacb2..ad62d1b4d371 100644
> --- a/fs/erofs/compress.h
> +++ b/fs/erofs/compress.h
> @@ -8,11 +8,6 @@
>
> #include "internal.h"
>
> -enum {
> - Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
> - Z_EROFS_COMPRESSION_RUNTIME_MAX
> -};
> -
> struct z_erofs_decompress_req {
> struct super_block *sb;
> struct page **in, **out;
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 9524e155b38f..48bfc6eb2b02 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -338,7 +338,7 @@ extern const struct address_space_operations z_erofs_aops;
> * of the corresponding uncompressed data in the file.
> */
> enum {
> - BH_Zipped = BH_PrivateStart,
> + BH_Encoded = BH_PrivateStart,
> BH_FullMapped,
> };
>
> @@ -346,8 +346,8 @@ enum {
> #define EROFS_MAP_MAPPED (1 << BH_Mapped)
> /* Located in metadata (could be copied from bd_inode) */
> #define EROFS_MAP_META (1 << BH_Meta)
> -/* The extent has been compressed */
> -#define EROFS_MAP_ZIPPED (1 << BH_Zipped)
> +/* The extent is encoded */
> +#define EROFS_MAP_ENCODED (1 << BH_Encoded)
> /* The length of extent is full */
> #define EROFS_MAP_FULL_MAPPED (1 << BH_FullMapped)
>
> @@ -355,6 +355,7 @@ struct erofs_map_blocks {
> erofs_off_t m_pa, m_la;
> u64 m_plen, m_llen;
>
> + char m_algorithmformat;
> unsigned int m_flags;
>
> struct page *mpage;
> @@ -368,6 +369,11 @@ struct erofs_map_blocks {
> */
> #define EROFS_GET_BLOCKS_FIEMAP 0x0002
>
> +enum {
> + Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
> + Z_EROFS_COMPRESSION_RUNTIME_MAX
> +};
> +
> /* zmap.c */
> extern const struct iomap_ops z_erofs_iomap_report_ops;
>
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 11c7a1aaebad..5c34ef66677f 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -476,6 +476,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
> struct erofs_workgroup *grp;
> int err;
>
> + if (!(map->m_flags & EROFS_MAP_ENCODED)) {
> + DBG_BUGON(1);
> + return -EFSCORRUPTED;
> + }
> +
> /* no available pcluster, let's allocate one */
> pcl = z_erofs_alloc_pcluster(map->m_plen >> PAGE_SHIFT);
> if (IS_ERR(pcl))
> @@ -483,16 +488,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
>
> atomic_set(&pcl->obj.refcount, 1);
> pcl->obj.index = map->m_pa >> PAGE_SHIFT;
> -
> + pcl->algorithmformat = map->m_algorithmformat;
> pcl->length = (map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT) |
> (map->m_flags & EROFS_MAP_FULL_MAPPED ?
> Z_EROFS_PCLUSTER_FULL_LENGTH : 0);
>
> - if (map->m_flags & EROFS_MAP_ZIPPED)
> - pcl->algorithmformat = Z_EROFS_COMPRESSION_LZ4;
> - else
> - pcl->algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> -
> /* new pclusters should be claimed as type 1, primary and followed */
> pcl->next = clt->owned_head;
> clt->mode = COLLECT_PRIMARY_FOLLOWED;
> diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> index 7a6df35fdc91..9d9c26343dab 100644
> --- a/fs/erofs/zmap.c
> +++ b/fs/erofs/zmap.c
> @@ -111,7 +111,7 @@ struct z_erofs_maprecorder {
>
> unsigned long lcn;
> /* compression extent information gathered */
> - u8 type;
> + u8 type, headtype;
> u16 clusterofs;
> u16 delta[2];
> erofs_blk_t pblk, compressedlcs;
> @@ -446,9 +446,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
> }
> return z_erofs_extent_lookback(m, m->delta[0]);
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - map->m_flags &= ~EROFS_MAP_ZIPPED;
> - fallthrough;
> case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + m->headtype = m->type;
> map->m_la = (lcn << lclusterbits) | m->clusterofs;
> break;
> default:
> @@ -472,7 +471,7 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>
> DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
> m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
> - if (!(map->m_flags & EROFS_MAP_ZIPPED) ||
> + if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
> map->m_plen = 1 << lclusterbits;
> return 0;
> @@ -609,16 +608,13 @@ int z_erofs_map_blocks_iter(struct inode *inode,
> if (err)
> goto unmap_out;
>
> - map->m_flags = EROFS_MAP_ZIPPED; /* by default, compressed */
> end = (m.lcn + 1ULL) << lclusterbits;
>
> switch (m.type) {
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - if (endoff >= m.clusterofs)
> - map->m_flags &= ~EROFS_MAP_ZIPPED;
> - fallthrough;
> case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> if (endoff >= m.clusterofs) {
> + m.headtype = m.type;
> map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
> break;
> }
> @@ -650,12 +646,17 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>
> map->m_llen = end - map->m_la;
> map->m_pa = blknr_to_addr(m.pblk);
> - map->m_flags |= EROFS_MAP_MAPPED;
> + map->m_flags = EROFS_MAP_MAPPED | EROFS_MAP_ENCODED;
>
> err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
> if (err)
> goto out;
>
> + if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
> + map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> + else
> + map->m_algorithmformat = vi->z_algorithmtype[0];
> +
> if (flags & EROFS_GET_BLOCKS_FIEMAP) {
> err = z_erofs_get_extent_decompressedlen(&m);
> if (!err)
> diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h
> index db4f2cec8360..16ae7b666810 100644
> --- a/include/trace/events/erofs.h
> +++ b/include/trace/events/erofs.h
> @@ -24,7 +24,7 @@ struct erofs_map_blocks;
> #define show_mflags(flags) __print_flags(flags, "", \
> { EROFS_MAP_MAPPED, "M" }, \
> { EROFS_MAP_META, "I" }, \
> - { EROFS_MAP_ZIPPED, "Z" })
> + { EROFS_MAP_ENCODED, "E" })

Looks good to me.

Reviewed-by: Yue Hu <[email protected]>

>
> TRACE_EVENT(erofs_lookup,
>

2021-10-09 03:55:27

by Yue Hu

Subject: Re: [PATCH v2 2/3] erofs: introduce the secondary compression head

On Sat, 9 Oct 2021 04:08:38 +0800
Gao Xiang <[email protected]> wrote:

> From: Gao Xiang <[email protected]>
>
> Previously, each HEAD lcluster could be typed as either HEAD or PLAIN
> to indicate whether the whole pcluster is compressed or not.
>
> In this patch, a new HEAD2 head type is introduced to specify a
> compression algorithm other than the primary one for each compressed
> file, which can be used for the upcoming LZMA compression and LZ4
> range dictionary compression for various data patterns.
>
> This has been on the EROFS roadmap for years. Complete it now!
>
> Signed-off-by: Gao Xiang <[email protected]>
> ---
> fs/erofs/erofs_fs.h | 8 +++++---
> fs/erofs/zmap.c | 36 +++++++++++++++++++++++++++---------
> 2 files changed, 32 insertions(+), 12 deletions(-)
>
> diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
> index b0b23f41abc3..f579c8c78fff 100644
> --- a/fs/erofs/erofs_fs.h
> +++ b/fs/erofs/erofs_fs.h
> @@ -21,11 +21,13 @@
> #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x00000002
> #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002
> #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004
> +#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2 0x00000008
> #define EROFS_ALL_FEATURE_INCOMPAT \
> (EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
> EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
> EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
> - EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
> + EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
> + EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
>
> #define EROFS_SB_EXTSLOT_SIZE 16
>
> @@ -314,9 +316,9 @@ struct z_erofs_map_header {
> */
> enum {
> Z_EROFS_VLE_CLUSTER_TYPE_PLAIN = 0,
> - Z_EROFS_VLE_CLUSTER_TYPE_HEAD = 1,
> + Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 = 1,
> Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD = 2,
> - Z_EROFS_VLE_CLUSTER_TYPE_RESERVED = 3,
> + Z_EROFS_VLE_CLUSTER_TYPE_HEAD2 = 3,
> Z_EROFS_VLE_CLUSTER_TYPE_MAX
> };
>
> diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> index 9d9c26343dab..03945f15ceae 100644
> --- a/fs/erofs/zmap.c
> +++ b/fs/erofs/zmap.c
> @@ -69,11 +69,17 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
> vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
>
> if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
> - erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
> + erofs_err(sb, "unknown HEAD1 format %u for nid %llu, please upgrade kernel",
> vi->z_algorithmtype[0], vi->nid);
> err = -EOPNOTSUPP;
> goto unmap_done;
> }
> + if (vi->z_algorithmtype[1] >= Z_EROFS_COMPRESSION_MAX) {
> + erofs_err(sb, "unknown HEAD2 format %u for nid %llu, please upgrade kernel",
> + vi->z_algorithmtype[1], vi->nid);
> + err = -EOPNOTSUPP;
> + goto unmap_done;
> + }

Seems a little duplicated; how about the code below?

if (vi->z_algorithmtype[i] >= Z_EROFS_COMPRESSION_MAX ||
vi->z_algorithmtype[++i] >= Z_EROFS_COMPRESSION_MAX) {
erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
i, vi->z_algorithmtype[i], vi->nid);
err = -EOPNOTSUPP;
goto unmap_done;
}

>
> vi->z_logical_clusterbits = LOG_BLOCK_SIZE + (h->h_clusterbits & 7);
> if (!erofs_sb_has_big_pcluster(EROFS_SB(sb)) &&
> @@ -189,7 +195,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
> m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
> break;
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> m->clusterofs = le16_to_cpu(di->di_clusterofs);
> m->pblk = le32_to_cpu(di->di_u.blkaddr);
> break;
> @@ -446,7 +453,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
> }
> return z_erofs_extent_lookback(m, m->delta[0]);
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> m->headtype = m->type;
> map->m_la = (lcn << lclusterbits) | m->clusterofs;
> break;
> @@ -470,13 +478,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
> int err;
>
> DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
> - m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
> + m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
> + m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
> + DBG_BUGON(m->type != m->headtype);
> +
> if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> - !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
> + ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
> + !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
> + ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
> + !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
> map->m_plen = 1 << lclusterbits;
> return 0;
> }
> -
> lcn = m->lcn + 1;
> if (m->compressedlcs)
> goto out;
> @@ -498,7 +511,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>
> switch (m->type) {
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> /*
> * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
> * rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
> @@ -553,7 +567,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
> DBG_BUGON(!m->delta[1] &&
> m->clusterofs != 1 << lclusterbits);
> } else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> - m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
> + m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
> + m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
> /* go on until the next HEAD lcluster */
> if (lcn != headlcn)
> break;
> @@ -612,7 +627,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>
> switch (m.type) {
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> if (endoff >= m.clusterofs) {
> m.headtype = m.type;
> map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
> @@ -654,6 +670,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>
> if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
> map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> + else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
> + map->m_algorithmformat = vi->z_algorithmtype[1];
> else
> map->m_algorithmformat = vi->z_algorithmtype[0];
>

2021-10-09 04:49:31

by Gao Xiang

Subject: Re: [PATCH v2 2/3] erofs: introduce the secondary compression head

Hi Yue,

On Sat, Oct 09, 2021 at 11:50:32AM +0800, Yue Hu wrote:
> On Sat, 9 Oct 2021 04:08:38 +0800
> Gao Xiang <[email protected]> wrote:
>
> > From: Gao Xiang <[email protected]>
> >
> > Previously, each HEAD lcluster could be typed as either HEAD or PLAIN
> > to indicate whether the whole pcluster is compressed or not.
> >
> > In this patch, a new HEAD2 head type is introduced to specify a
> > compression algorithm other than the primary one for each compressed
> > file, which can be used for the upcoming LZMA compression and LZ4
> > range dictionary compression for various data patterns.
> >
> > This has been on the EROFS roadmap for years. Complete it now!
> >
> > Signed-off-by: Gao Xiang <[email protected]>
> > ---
> > fs/erofs/erofs_fs.h | 8 +++++---
> > fs/erofs/zmap.c | 36 +++++++++++++++++++++++++++---------
> > 2 files changed, 32 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
> > index b0b23f41abc3..f579c8c78fff 100644
> > --- a/fs/erofs/erofs_fs.h
> > +++ b/fs/erofs/erofs_fs.h
> > @@ -21,11 +21,13 @@
> > #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x00000002
> > #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002
> > #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004
> > +#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2 0x00000008
> > #define EROFS_ALL_FEATURE_INCOMPAT \
> > (EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
> > EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
> > EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
> > - EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
> > + EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
> > + EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
> >
> > #define EROFS_SB_EXTSLOT_SIZE 16
> >
> > @@ -314,9 +316,9 @@ struct z_erofs_map_header {
> > */
> > enum {
> > Z_EROFS_VLE_CLUSTER_TYPE_PLAIN = 0,
> > - Z_EROFS_VLE_CLUSTER_TYPE_HEAD = 1,
> > + Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 = 1,
> > Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD = 2,
> > - Z_EROFS_VLE_CLUSTER_TYPE_RESERVED = 3,
> > + Z_EROFS_VLE_CLUSTER_TYPE_HEAD2 = 3,
> > Z_EROFS_VLE_CLUSTER_TYPE_MAX
> > };
> >
> > diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> > index 9d9c26343dab..03945f15ceae 100644
> > --- a/fs/erofs/zmap.c
> > +++ b/fs/erofs/zmap.c
> > @@ -69,11 +69,17 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
> > vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
> >
> > if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
> > - erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
> > + erofs_err(sb, "unknown HEAD1 format %u for nid %llu, please upgrade kernel",
> > vi->z_algorithmtype[0], vi->nid);
> > err = -EOPNOTSUPP;
> > goto unmap_done;
> > }
> > + if (vi->z_algorithmtype[1] >= Z_EROFS_COMPRESSION_MAX) {
> > + erofs_err(sb, "unknown HEAD2 format %u for nid %llu, please upgrade kernel",
> > + vi->z_algorithmtype[1], vi->nid);
> > + err = -EOPNOTSUPP;
> > + goto unmap_done;
> > + }
>
> Seems a little duplicated; how about the code below?
>
> if (vi->z_algorithmtype[i] >= Z_EROFS_COMPRESSION_MAX ||
> vi->z_algorithmtype[++i] >= Z_EROFS_COMPRESSION_MAX) {
> erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
> i, vi->z_algorithmtype[i], vi->nid);
> err = -EOPNOTSUPP;
> goto unmap_done;
> }

Yeah, good simplification. I will update it and rename `i' to `headnr'
here.

Thanks,
Gao Xiang
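
As a side note, the suggested check relies on `||' short-circuiting:
++i (renamed headnr in v3) is only evaluated when the first slot passes,
so the index always points at the offending slot in the error message.
A tiny standalone demonstration of the idiom:

	#include <stdio.h>

	int main(void)
	{
		const unsigned int MAX = 2;
		unsigned int algos[2] = { 0, 9 };	/* slot 1 is invalid (>= MAX) */
		int i = 0;

		if (algos[i] >= MAX || algos[++i] >= MAX)
			printf("unknown HEAD%d format %u\n", i + 1, algos[i]);
		return 0;	/* prints: unknown HEAD2 format 9 */
	}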

2021-10-09 18:14:21

by Gao Xiang

Subject: [PATCH v3 2/3] erofs: introduce the secondary compression head

From: Gao Xiang <[email protected]>

Previously, each HEAD lcluster could be typed as either HEAD or PLAIN
to indicate whether the whole pcluster is compressed or not.

In this patch, a new HEAD2 head type is introduced to specify a
compression algorithm other than the primary one for each compressed
file, which can be used for the upcoming LZMA compression and LZ4
range dictionary compression for various data patterns.

This has been on the EROFS roadmap for years. Complete it now!

Signed-off-by: Gao Xiang <[email protected]>
---
v2: https://lore.kernel.org/r/[email protected]
changes since v2:
- simplify z_algorithmtype check suggested by Yue.

fs/erofs/erofs_fs.h | 8 +++++---
fs/erofs/zmap.c | 38 ++++++++++++++++++++++++++------------
2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index b0b23f41abc3..f579c8c78fff 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -21,11 +21,13 @@
#define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x00000002
#define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002
#define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004
+#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2 0x00000008
#define EROFS_ALL_FEATURE_INCOMPAT \
(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
- EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
+ EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
+ EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)

#define EROFS_SB_EXTSLOT_SIZE 16

@@ -314,9 +316,9 @@ struct z_erofs_map_header {
*/
enum {
Z_EROFS_VLE_CLUSTER_TYPE_PLAIN = 0,
- Z_EROFS_VLE_CLUSTER_TYPE_HEAD = 1,
+ Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 = 1,
Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD = 2,
- Z_EROFS_VLE_CLUSTER_TYPE_RESERVED = 3,
+ Z_EROFS_VLE_CLUSTER_TYPE_HEAD2 = 3,
Z_EROFS_VLE_CLUSTER_TYPE_MAX
};

diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 9d9c26343dab..864d9d5474d5 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -28,7 +28,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
{
struct erofs_inode *const vi = EROFS_I(inode);
struct super_block *const sb = inode->i_sb;
- int err;
+ int err, headnr;
erofs_off_t pos;
struct page *page;
void *kaddr;
@@ -68,9 +68,11 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
vi->z_algorithmtype[0] = h->h_algorithmtype & 15;
vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;

- if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
- erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
- vi->z_algorithmtype[0], vi->nid);
+ headnr = 0;
+ if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX ||
+ vi->z_algorithmtype[++headnr] >= Z_EROFS_COMPRESSION_MAX) {
+ erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
+ headnr + 1, vi->z_algorithmtype[headnr], vi->nid);
err = -EOPNOTSUPP;
goto unmap_done;
}
@@ -189,7 +191,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
break;
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
m->clusterofs = le16_to_cpu(di->di_clusterofs);
m->pblk = le32_to_cpu(di->di_u.blkaddr);
break;
@@ -446,7 +449,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
}
return z_erofs_extent_lookback(m, m->delta[0]);
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
m->headtype = m->type;
map->m_la = (lcn << lclusterbits) | m->clusterofs;
break;
@@ -470,13 +474,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
int err;

DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
- m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
+ m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
+ m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
+ DBG_BUGON(m->type != m->headtype);
+
if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
- !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+ ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
+ !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
+ ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
+ !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
map->m_plen = 1 << lclusterbits;
return 0;
}
-
lcn = m->lcn + 1;
if (m->compressedlcs)
goto out;
@@ -498,7 +507,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,

switch (m->type) {
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
/*
* if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
* rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
@@ -553,7 +563,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
DBG_BUGON(!m->delta[1] &&
m->clusterofs != 1 << lclusterbits);
} else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
- m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
+ m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
+ m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
/* go on until the next HEAD lcluster */
if (lcn != headlcn)
break;
@@ -612,7 +623,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,

switch (m.type) {
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
if (endoff >= m.clusterofs) {
m.headtype = m.type;
map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
@@ -654,6 +666,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,

if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
+ else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
+ map->m_algorithmformat = vi->z_algorithmtype[1];
else
map->m_algorithmformat = vi->z_algorithmtype[0];

--
2.20.1

2021-10-10 01:36:05

by Yue Hu

Subject: Re: [PATCH v3 2/3] erofs: introduce the secondary compression head

On Sun, 10 Oct 2021 02:12:09 +0800
Gao Xiang <[email protected]> wrote:

> From: Gao Xiang <[email protected]>
>
> Previously, each HEAD lcluster could be typed as either HEAD or PLAIN
> to indicate whether the whole pcluster is compressed or not.
>
> In this patch, a new HEAD2 head type is introduced to specify a
> compression algorithm other than the primary one for each compressed
> file, which can be used for the upcoming LZMA compression and LZ4
> range dictionary compression for various data patterns.
>
> This has been on the EROFS roadmap for years. Complete it now!
>
> Signed-off-by: Gao Xiang <[email protected]>
> ---
> v2: https://lore.kernel.org/r/[email protected]
> changes since v2:
> - simplify z_algorithmtype check suggested by Yue.
>
> fs/erofs/erofs_fs.h | 8 +++++---
> fs/erofs/zmap.c | 38 ++++++++++++++++++++++++++------------
> 2 files changed, 31 insertions(+), 15 deletions(-)
>
> diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
> index b0b23f41abc3..f579c8c78fff 100644
> --- a/fs/erofs/erofs_fs.h
> +++ b/fs/erofs/erofs_fs.h
> @@ -21,11 +21,13 @@
> #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x00000002
> #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002
> #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004
> +#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2 0x00000008
> #define EROFS_ALL_FEATURE_INCOMPAT \
> (EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
> EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
> EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
> - EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
> + EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
> + EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
>
> #define EROFS_SB_EXTSLOT_SIZE 16
>
> @@ -314,9 +316,9 @@ struct z_erofs_map_header {
> */
> enum {
> Z_EROFS_VLE_CLUSTER_TYPE_PLAIN = 0,
> - Z_EROFS_VLE_CLUSTER_TYPE_HEAD = 1,
> + Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 = 1,
> Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD = 2,
> - Z_EROFS_VLE_CLUSTER_TYPE_RESERVED = 3,
> + Z_EROFS_VLE_CLUSTER_TYPE_HEAD2 = 3,
> Z_EROFS_VLE_CLUSTER_TYPE_MAX
> };
>
> diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> index 9d9c26343dab..864d9d5474d5 100644
> --- a/fs/erofs/zmap.c
> +++ b/fs/erofs/zmap.c
> @@ -28,7 +28,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
> {
> struct erofs_inode *const vi = EROFS_I(inode);
> struct super_block *const sb = inode->i_sb;
> - int err;
> + int err, headnr;
> erofs_off_t pos;
> struct page *page;
> void *kaddr;
> @@ -68,9 +68,11 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
> vi->z_algorithmtype[0] = h->h_algorithmtype & 15;
> vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
>
> - if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
> - erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
> - vi->z_algorithmtype[0], vi->nid);
> + headnr = 0;
> + if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX ||
> + vi->z_algorithmtype[++headnr] >= Z_EROFS_COMPRESSION_MAX) {
> + erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
> + headnr + 1, vi->z_algorithmtype[headnr], vi->nid);
> err = -EOPNOTSUPP;
> goto unmap_done;
> }
> @@ -189,7 +191,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
> m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
> break;
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> m->clusterofs = le16_to_cpu(di->di_clusterofs);
> m->pblk = le32_to_cpu(di->di_u.blkaddr);
> break;
> @@ -446,7 +449,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
> }
> return z_erofs_extent_lookback(m, m->delta[0]);
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> m->headtype = m->type;
> map->m_la = (lcn << lclusterbits) | m->clusterofs;
> break;
> @@ -470,13 +474,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
> int err;
>
> DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
> - m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
> + m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
> + m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
> + DBG_BUGON(m->type != m->headtype);
> +
> if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> - !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
> + ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
> + !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
> + ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
> + !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
> map->m_plen = 1 << lclusterbits;
> return 0;
> }
> -
> lcn = m->lcn + 1;
> if (m->compressedlcs)
> goto out;
> @@ -498,7 +507,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>
> switch (m->type) {
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> /*
> * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
> * rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
> @@ -553,7 +563,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
> DBG_BUGON(!m->delta[1] &&
> m->clusterofs != 1 << lclusterbits);
> } else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> - m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
> + m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
> + m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
> /* go on until the next HEAD lcluster */
> if (lcn != headlcn)
> break;
> @@ -612,7 +623,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>
> switch (m.type) {
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> if (endoff >= m.clusterofs) {
> m.headtype = m.type;
> map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
> @@ -654,6 +666,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>
> if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
> map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> + else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
> + map->m_algorithmformat = vi->z_algorithmtype[1];
> else
> map->m_algorithmformat = vi->z_algorithmtype[0];
>

Looks good to me.

Reviewed-by: Yue Hu <[email protected]>

2021-10-18 03:45:45

by Gao Xiang

Subject: Re: [PATCH v3 2/3] erofs: introduce the secondary compression head

Hi Chao,

On Sun, Oct 17, 2021 at 11:27:54PM +0800, Chao Yu wrote:
> On 2021/10/10 2:12, Gao Xiang wrote:
> > From: Gao Xiang <[email protected]>
> >
> > Previously, each HEAD lcluster could be typed as either HEAD or PLAIN
> > to indicate whether the whole pcluster is compressed or not.
> >
> > In this patch, a new HEAD2 head type is introduced to specify a
> > compression algorithm other than the primary one for each compressed
> > file, which can be used for the upcoming LZMA compression and LZ4
> > range dictionary compression for various data patterns.
> >
> > This has been on the EROFS roadmap for years. Complete it now!
> >
> > Signed-off-by: Gao Xiang <[email protected]>
> > ---
> > v2: https://lore.kernel.org/r/[email protected]
> > changes since v2:
> > - simplify z_algorithmtype check suggested by Yue.
> >
> > fs/erofs/erofs_fs.h | 8 +++++---
> > fs/erofs/zmap.c | 38 ++++++++++++++++++++++++++------------
> > 2 files changed, 31 insertions(+), 15 deletions(-)
> >
> > diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
> > index b0b23f41abc3..f579c8c78fff 100644
> > --- a/fs/erofs/erofs_fs.h
> > +++ b/fs/erofs/erofs_fs.h
> > @@ -21,11 +21,13 @@
> > #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x00000002
> > #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002
> > #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004
> > +#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2 0x00000008
> > #define EROFS_ALL_FEATURE_INCOMPAT \
> > (EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
> > EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
> > EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
> > - EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
> > + EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
> > + EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
> > #define EROFS_SB_EXTSLOT_SIZE 16
> > @@ -314,9 +316,9 @@ struct z_erofs_map_header {
> > */
> > enum {
> > Z_EROFS_VLE_CLUSTER_TYPE_PLAIN = 0,
> > - Z_EROFS_VLE_CLUSTER_TYPE_HEAD = 1,
> > + Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 = 1,
> > Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD = 2,
> > - Z_EROFS_VLE_CLUSTER_TYPE_RESERVED = 3,
> > + Z_EROFS_VLE_CLUSTER_TYPE_HEAD2 = 3,
>
> The comments above need to be updated as well.

okay, let me revise them now.

Thanks,
Gao Xiang

>
> Thanks,

2021-10-18 03:45:49

by Gao Xiang

Subject: Re: [PATCH v2 3/3] erofs: introduce readmore decompression strategy

On Sun, Oct 17, 2021 at 11:34:22PM +0800, Chao Yu wrote:
> On 2021/10/9 4:08, Gao Xiang wrote:
> > From: Gao Xiang <[email protected]>
> >
> > Previously, the EROFS decompression strategy strictly followed the
> > readahead window in order to minimize extra memory footprint. However,
> > it can become inefficient to read only the partial requested data for
> > big LZ4 pclusters and the upcoming LZMA implementation.
> >
> > Let's instead request the leading data of a pcluster without
> > triggering memory reclaim, applied to the LZ4 approach first to boost
> > 100% randread of large big pclusters; it has no real impact on
> > low-memory scenarios.
> >
> > It also introduces a way to expand read lengths in order to decompress
> > the whole pcluster, which is useful for LZMA since the algorithm
> > itself is relatively slow and CPU-bound, unlike LZ4.
> >
> > Signed-off-by: Gao Xiang <[email protected]>
> > ---
> > fs/erofs/internal.h | 13 ++++++
> > fs/erofs/zdata.c | 99 ++++++++++++++++++++++++++++++++++++---------
> > 2 files changed, 93 insertions(+), 19 deletions(-)
> >
> > diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> > index 48bfc6eb2b02..7f96265ccbdb 100644
> > --- a/fs/erofs/internal.h
> > +++ b/fs/erofs/internal.h
> > @@ -307,6 +307,19 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value)
> > EROFS_I_DATALAYOUT_BITS);
> > }
> > +/*
> > + * Different from grab_cache_page_nowait(), reclaiming is never triggered
> > + * when allocating new pages.
> > + */
> > +static inline
> > +struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
> > + pgoff_t index)
> > +{
> > + return pagecache_get_page(mapping, index,
> > + FGP_LOCK|FGP_CREAT|FGP_NOFS|FGP_NOWAIT,
> > + readahead_gfp_mask(mapping) & ~__GFP_RECLAIM);
> > +}
> > +
> > extern const struct super_operations erofs_sops;
> > extern const struct address_space_operations erofs_raw_access_aops;
> > diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> > index 5c34ef66677f..febb018e10a7 100644
> > --- a/fs/erofs/zdata.c
> > +++ b/fs/erofs/zdata.c
> > @@ -1377,6 +1377,72 @@ static void z_erofs_runqueue(struct super_block *sb,
> > z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
> > }
> > +/*
> > + * Since partial uptodate is still unimplemented for now, we have to use
> > + * approximate readmore strategies as a start.
> > + */
> > +static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
> > + struct readahead_control *rac,
> > + erofs_off_t end,
> > + struct list_head *pagepool,
> > + bool backmost)
> > +{
> > + struct inode *inode = f->inode;
> > + struct erofs_map_blocks *map = &f->map;
> > + erofs_off_t cur;
> > + int err;
> > +
> > + if (backmost) {
> > + map->m_la = end;
> > + /* TODO: pass in EROFS_GET_BLOCKS_READMORE for LZMA later */
> > + err = z_erofs_map_blocks_iter(inode, map, 0);
> > + if (err)
> > + return;
> > +
> > + /* expand ra for the trailing edge if readahead */
> > + if (rac) {
> > + loff_t newstart = readahead_pos(rac);
> > +
> > + cur = round_up(map->m_la + map->m_llen, PAGE_SIZE);
> > + readahead_expand(rac, newstart, cur - newstart);
> > + return;
> > + }
> > + end = round_up(end, PAGE_SIZE);
> > + } else {
> > + end = round_up(map->m_la, PAGE_SIZE);
> > +
> > + if (!map->m_llen)
> > + return;
> > + }
> > +
> > + cur = map->m_la + map->m_llen - 1;
> > + while (cur >= end) {
> > + pgoff_t index = cur >> PAGE_SHIFT;
> > + struct page *page;
> > +
> > + page = erofs_grab_cache_page_nowait(inode->i_mapping, index);
> > + if (!page)
> > + goto skip;
> > +
> > + if (PageUptodate(page)) {
> > + unlock_page(page);
> > + put_page(page);
> > + goto skip;
> > + }
> > +
> > + err = z_erofs_do_read_page(f, page, pagepool);
> > + if (err)
> > + erofs_err(inode->i_sb,
> > + "readmore error at page %lu @ nid %llu",
> > + index, EROFS_I(inode)->nid);
> > + put_page(page);
> > +skip:
> > + if (cur < PAGE_SIZE)
> > + break;
> > + cur = (index << PAGE_SHIFT) - 1;
>
> Looks a little bit weird to readahead backward, any special reason here?

It's due to the do_read_page implementation: I'd like to avoid getting
the exact full extent length (FIEMAP-like) inside do_read_page and only
request the needed range, so the walk has to be backward. The
submission chain can then be built in a forward way.

If the question is why we should read backward at all: as I said in the
commit message, big pclusters matter since we can read in more leading
data at once.

Thanks,
Gao Xiang

>
> Thanks,

2021-10-18 03:46:30

by Chao Yu

Subject: Re: [PATCH v2 1/3] erofs: get compression algorithms directly on mapping

On 2021/10/9 4:08, Gao Xiang wrote:
> From: Gao Xiang <[email protected]>
>
> Currently, z_erofs_map_blocks_iter() returns whether extents are
> compressed or not, and the decompression frontend gets the specific
> algorithms then.
>
> It works but not quite well in many aspects, for example:
> - The decompression frontend has to deal with whether extents are
> compressed or not again and look up the algorithms if compressed.
> It's duplicated and too detailed about the on-disk mapping.
>
> - A new secondary compression head will be introduced later so that
> each file can have 2 compression algorithms at most for different
> types of data. It could increase the complexity of the decompression
> frontend if still handled in this way;
>
> - A new readmore decompression strategy will be introduced to get
> better performance for much bigger pclusters and LZMA, which needs
> the specific algorithm in advance as well.
>
> Let's look up compression algorithms in z_erofs_map_blocks_iter()
> directly instead.
>
> Signed-off-by: Gao Xiang <[email protected]>
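
For reference, here is a hypothetical userspace sketch of the
resulting interface shape (the demo_* names and values are invented
for illustration and are not the in-tree definitions): the mapping
step resolves the algorithm, and the frontend only consumes it.

#include <stdio.h>

/* demo-only constants, not the in-tree Z_EROFS_* values */
enum { DEMO_ALGO_LZ4 = 0, DEMO_ALGO_SHIFTED = 1 };

struct demo_map {
	unsigned int m_algorithmformat;	/* filled during mapping now */
};

/* stands in for z_erofs_map_blocks_iter() resolving the algorithm */
static void demo_map_blocks(struct demo_map *map, int compressed)
{
	map->m_algorithmformat = compressed ? DEMO_ALGO_LZ4
					    : DEMO_ALGO_SHIFTED;
}

int main(void)
{
	struct demo_map map;

	demo_map_blocks(&map, 1);
	/* no second "compressed or not" check needed here */
	printf("algorithm format: %u\n", map.m_algorithmformat);
	return 0;
}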

Reviewed-by: Chao Yu <[email protected]>

Thanks,

2021-10-18 03:46:31

by Gao Xiang

[permalink] [raw]
Subject: [PATCH v4 2/3] erofs: introduce the secondary compression head

From: Gao Xiang <[email protected]>

Previously, the head lcluster of each pcluster could be either a HEAD
or a PLAIN lcluster, indicating whether the whole pcluster is
compressed or not.

In this patch, a new HEAD2 head type is introduced to specify a second
compression algorithm besides the primary one for each compressed file,
which can be used for the upcoming LZMA compression and LZ4 range
dictionary compression for various data patterns.

This has been on the EROFS roadmap for years. Complete it now!
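
As an illustration, here is a minimal userspace sketch of the
two-nibble algorithm packing this enables (the values are made up;
only the "& 15" / ">> 4" decode mirrors the zmap.c hunk below):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* made-up on-disk byte: low nibble = HEAD1 algorithm (0),
	 * high nibble = HEAD2 algorithm (1) */
	uint8_t h_algorithmtype = (1 << 4) | 0;
	uint8_t z_algorithmtype[2] = {
		h_algorithmtype & 15,	/* HEAD1 (primary) */
		h_algorithmtype >> 4,	/* HEAD2 (secondary) */
	};
	int is_head2 = 1;	/* pretend a HEAD2 lcluster was hit */

	printf("selected algorithm: %u\n",
	       (unsigned int)z_algorithmtype[is_head2 ? 1 : 0]);
	return 0;
}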

Reviewed-by: Yue Hu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
changes since v3:
- update comments about on-disk lclusters suggested by Chao.

fs/erofs/erofs_fs.h | 39 ++++++++++++++++++++-------------------
fs/erofs/zmap.c | 41 ++++++++++++++++++++++++++++-------------
2 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index e480b3854d88..87736cbf18cc 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -22,12 +22,14 @@
#define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002
#define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004
#define EROFS_FEATURE_INCOMPAT_DEVICE_TABLE 0x00000008
+#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2 0x00000008
#define EROFS_ALL_FEATURE_INCOMPAT \
(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
- EROFS_FEATURE_INCOMPAT_DEVICE_TABLE)
+ EROFS_FEATURE_INCOMPAT_DEVICE_TABLE | \
+ EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)

#define EROFS_SB_EXTSLOT_SIZE 16

@@ -303,35 +305,34 @@ struct z_erofs_map_header {
#define Z_EROFS_VLE_LEGACY_HEADER_PADDING 8

/*
- * Fixed-sized output compression ondisk Logical Extent cluster type:
- * 0 - literal (uncompressed) cluster
- * 1 - compressed cluster (for the head logical cluster)
- * 2 - compressed cluster (for the other logical clusters)
+ * Fixed-sized output compression on-disk logical cluster type:
+ * 0 - literal (uncompressed) lcluster
+ * 1,3 - compressed lcluster (for HEAD lclusters)
+ * 2 - compressed lcluster (for NONHEAD lclusters)
*
* In detail,
- * 0 - literal (uncompressed) cluster,
+ * 0 - literal (uncompressed) lcluster,
* di_advise = 0
- * di_clusterofs = the literal data offset of the cluster
- * di_blkaddr = the blkaddr of the literal cluster
+ * di_clusterofs = the literal data offset of the lcluster
+ * di_blkaddr = the blkaddr of the literal pcluster
*
- * 1 - compressed cluster (for the head logical cluster)
- * di_advise = 1
- * di_clusterofs = the decompressed data offset of the cluster
- * di_blkaddr = the blkaddr of the compressed cluster
+ * 1,3 - compressed lcluster (for HEAD lclusters)
+ * di_advise = 1 or 3
+ * di_clusterofs = the decompressed data offset of the lcluster
+ * di_blkaddr = the blkaddr of the compressed pcluster
*
- * 2 - compressed cluster (for the other logical clusters)
+ * 2 - compressed cluster (for NONHEAD lclusters)
* di_advise = 2
* di_clusterofs =
- * the decompressed data offset in its own head cluster
- * di_u.delta[0] = distance to its corresponding head cluster
- * di_u.delta[1] = distance to its corresponding tail cluster
- * (di_advise could be 0, 1 or 2)
+ * the decompressed data offset in its own HEAD lcluster
+ * di_u.delta[0] = distance to this HEAD lcluster
+ * di_u.delta[1] = distance to the next HEAD lcluster
*/
enum {
Z_EROFS_VLE_CLUSTER_TYPE_PLAIN = 0,
- Z_EROFS_VLE_CLUSTER_TYPE_HEAD = 1,
+ Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 = 1,
Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD = 2,
- Z_EROFS_VLE_CLUSTER_TYPE_RESERVED = 3,
+ Z_EROFS_VLE_CLUSTER_TYPE_HEAD2 = 3,
Z_EROFS_VLE_CLUSTER_TYPE_MAX
};

diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 1c3b068e5a42..85d0289429b3 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -28,7 +28,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
{
struct erofs_inode *const vi = EROFS_I(inode);
struct super_block *const sb = inode->i_sb;
- int err;
+ int err, headnr;
erofs_off_t pos;
struct page *page;
void *kaddr;
@@ -68,9 +68,11 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
vi->z_algorithmtype[0] = h->h_algorithmtype & 15;
vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;

- if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
- erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
- vi->z_algorithmtype[0], vi->nid);
+ headnr = 0;
+ if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX ||
+ vi->z_algorithmtype[++headnr] >= Z_EROFS_COMPRESSION_MAX) {
+ erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
+ headnr + 1, vi->z_algorithmtype[headnr], vi->nid);
err = -EOPNOTSUPP;
goto unmap_done;
}
@@ -178,7 +180,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
m->clusterofs = 1 << vi->z_logical_clusterbits;
m->delta[0] = le16_to_cpu(di->di_u.delta[0]);
if (m->delta[0] & Z_EROFS_VLE_DI_D0_CBLKCNT) {
- if (!(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+ if (!(vi->z_advise & (Z_EROFS_ADVISE_BIG_PCLUSTER_1 |
+ Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
DBG_BUGON(1);
return -EFSCORRUPTED;
}
@@ -189,7 +192,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
break;
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
m->clusterofs = le16_to_cpu(di->di_clusterofs);
m->pblk = le32_to_cpu(di->di_u.blkaddr);
break;
@@ -446,7 +450,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
}
return z_erofs_extent_lookback(m, m->delta[0]);
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
m->headtype = m->type;
map->m_la = (lcn << lclusterbits) | m->clusterofs;
break;
@@ -470,13 +475,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
int err;

DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
- m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
+ m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
+ m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
+ DBG_BUGON(m->type != m->headtype);
+
if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
- !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+ ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
+ !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
+ ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
+ !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
map->m_plen = 1 << lclusterbits;
return 0;
}
-
lcn = m->lcn + 1;
if (m->compressedlcs)
goto out;
@@ -498,7 +508,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,

switch (m->type) {
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
/*
* if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
* rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
@@ -553,7 +564,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
DBG_BUGON(!m->delta[1] &&
m->clusterofs != 1 << lclusterbits);
} else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
- m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
+ m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
+ m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
/* go on until the next HEAD lcluster */
if (lcn != headlcn)
break;
@@ -613,7 +625,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,

switch (m.type) {
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
- case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+ case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
if (endoff >= m.clusterofs) {
m.headtype = m.type;
map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
@@ -654,6 +667,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,

if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
+ else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
+ map->m_algorithmformat = vi->z_algorithmtype[1];
else
map->m_algorithmformat = vi->z_algorithmtype[0];

--
2.20.1

2021-10-18 03:47:28

by Chao Yu

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] erofs: introduce readmore decompression strategy

On 2021/10/9 4:08, Gao Xiang wrote:
> From: Gao Xiang <[email protected]>
>
> Previously, the EROFS decompression strategy strictly followed the
> readahead window in order to minimize extra memory footprint. However,
> it can become inefficient to read only the partially requested data
> for very big LZ4 pclusters and the upcoming LZMA implementation.
>
> Let's try to request the leading data in a pcluster without
> triggering memory reclaim for the LZ4 approach first, to boost up
> 100% randread of large big pclusters; it has no real impact on
> low-memory scenarios.
>
> It also introduces a way to expand read lengths in order to decompress
> the whole pcluster, which is useful for LZMA since the algorithm
> itself is relatively slow and CPU-bound, while LZ4 is not.
>
> Signed-off-by: Gao Xiang <[email protected]>
> ---
> fs/erofs/internal.h | 13 ++++++
> fs/erofs/zdata.c | 99 ++++++++++++++++++++++++++++++++++++---------
> 2 files changed, 93 insertions(+), 19 deletions(-)
>
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 48bfc6eb2b02..7f96265ccbdb 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -307,6 +307,19 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value)
> EROFS_I_DATALAYOUT_BITS);
> }
>
> +/*
> + * Different from grab_cache_page_nowait(), reclaiming is never triggered
> + * when allocating new pages.
> + */
> +static inline
> +struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
> + pgoff_t index)
> +{
> + return pagecache_get_page(mapping, index,
> + FGP_LOCK|FGP_CREAT|FGP_NOFS|FGP_NOWAIT,
> + readahead_gfp_mask(mapping) & ~__GFP_RECLAIM);
> +}
> +
> extern const struct super_operations erofs_sops;
>
> extern const struct address_space_operations erofs_raw_access_aops;
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 5c34ef66677f..febb018e10a7 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -1377,6 +1377,72 @@ static void z_erofs_runqueue(struct super_block *sb,
> z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
> }
>
> +/*
> + * Since partial uptodate is still unimplemented for now, we have to use
> + * approximate readmore strategies as a start.
> + */
> +static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
> + struct readahead_control *rac,
> + erofs_off_t end,
> + struct list_head *pagepool,
> + bool backmost)
> +{
> + struct inode *inode = f->inode;
> + struct erofs_map_blocks *map = &f->map;
> + erofs_off_t cur;
> + int err;
> +
> + if (backmost) {
> + map->m_la = end;
> + /* TODO: pass in EROFS_GET_BLOCKS_READMORE for LZMA later */
> + err = z_erofs_map_blocks_iter(inode, map, 0);
> + if (err)
> + return;
> +
> + /* expand ra for the trailing edge if readahead */
> + if (rac) {
> + loff_t newstart = readahead_pos(rac);
> +
> + cur = round_up(map->m_la + map->m_llen, PAGE_SIZE);
> + readahead_expand(rac, newstart, cur - newstart);
> + return;
> + }
> + end = round_up(end, PAGE_SIZE);
> + } else {
> + end = round_up(map->m_la, PAGE_SIZE);
> +
> + if (!map->m_llen)
> + return;
> + }
> +
> + cur = map->m_la + map->m_llen - 1;
> + while (cur >= end) {
> + pgoff_t index = cur >> PAGE_SHIFT;
> + struct page *page;
> +
> + page = erofs_grab_cache_page_nowait(inode->i_mapping, index);
> + if (!page)
> + goto skip;
> +
> + if (PageUptodate(page)) {
> + unlock_page(page);
> + put_page(page);
> + goto skip;
> + }
> +
> + err = z_erofs_do_read_page(f, page, pagepool);
> + if (err)
> + erofs_err(inode->i_sb,
> + "readmore error at page %lu @ nid %llu",
> + index, EROFS_I(inode)->nid);
> + put_page(page);
> +skip:
> + if (cur < PAGE_SIZE)
> + break;
> + cur = (index << PAGE_SHIFT) - 1;

Looks a little bit weird to readahead backward, any special reason here?

Thanks,

> + }
> +}
> +
> static int z_erofs_readpage(struct file *file, struct page *page)
> {
> struct inode *const inode = page->mapping->host;
> @@ -1385,10 +1451,13 @@ static int z_erofs_readpage(struct file *file, struct page *page)
> LIST_HEAD(pagepool);
>
> trace_erofs_readpage(page, false);
> -
> f.headoffset = (erofs_off_t)page->index << PAGE_SHIFT;
>
> + z_erofs_pcluster_readmore(&f, NULL, f.headoffset + PAGE_SIZE - 1,
> + &pagepool, true);
> err = z_erofs_do_read_page(&f, page, &pagepool);
> + z_erofs_pcluster_readmore(&f, NULL, 0, &pagepool, false);
> +
> (void)z_erofs_collector_end(&f.clt);
>
> /* if some compressed cluster ready, need submit them anyway */
> @@ -1409,29 +1478,20 @@ static void z_erofs_readahead(struct readahead_control *rac)
> {
> struct inode *const inode = rac->mapping->host;
> struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
> -
> - unsigned int nr_pages = readahead_count(rac);
> - bool sync = (sbi->ctx.readahead_sync_decompress &&
> - nr_pages <= sbi->ctx.max_sync_decompress_pages);
> struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
> struct page *page, *head = NULL;
> + unsigned int nr_pages;
> LIST_HEAD(pagepool);
>
> - trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
> -
> f.readahead = true;
> f.headoffset = readahead_pos(rac);
>
> - while ((page = readahead_page(rac))) {
> - prefetchw(&page->flags);
> -
> - /*
> - * A pure asynchronous readahead is indicated if
> - * a PG_readahead marked page is hitted at first.
> - * Let's also do asynchronous decompression for this case.
> - */
> - sync &= !(PageReadahead(page) && !head);
> + z_erofs_pcluster_readmore(&f, rac, f.headoffset +
> + readahead_length(rac) - 1, &pagepool, true);
> + nr_pages = readahead_count(rac);
> + trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
>
> + while ((page = readahead_page(rac))) {
> set_page_private(page, (unsigned long)head);
> head = page;
> }
> @@ -1450,11 +1510,12 @@ static void z_erofs_readahead(struct readahead_control *rac)
> page->index, EROFS_I(inode)->nid);
> put_page(page);
> }
> -
> + z_erofs_pcluster_readmore(&f, rac, 0, &pagepool, false);
> (void)z_erofs_collector_end(&f.clt);
>
> - z_erofs_runqueue(inode->i_sb, &f, &pagepool, sync);
> -
> + z_erofs_runqueue(inode->i_sb, &f, &pagepool,
> + sbi->ctx.readahead_sync_decompress &&
> + nr_pages <= sbi->ctx.max_sync_decompress_pages);
> if (f.map.mpage)
> put_page(f.map.mpage);
>
>

2021-10-18 03:53:03

by Chao Yu

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] erofs: introduce the secondary compression head

On 2021/10/10 2:12, Gao Xiang wrote:
> From: Gao Xiang <[email protected]>
>
> Previously, the head lcluster of each pcluster could be either a HEAD
> or a PLAIN lcluster, indicating whether the whole pcluster is
> compressed or not.
>
> In this patch, a new HEAD2 head type is introduced to specify a second
> compression algorithm besides the primary one for each compressed file,
> which can be used for the upcoming LZMA compression and LZ4 range
> dictionary compression for various data patterns.
>
> This has been on the EROFS roadmap for years. Complete it now!
>
> Signed-off-by: Gao Xiang <[email protected]>
> ---
> v2: https://lore.kernel.org/r/[email protected]
> changes since v2:
> - simplify z_algorithmtype check suggested by Yue.
>
> fs/erofs/erofs_fs.h | 8 +++++---
> fs/erofs/zmap.c | 38 ++++++++++++++++++++++++++------------
> 2 files changed, 31 insertions(+), 15 deletions(-)
>
> diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
> index b0b23f41abc3..f579c8c78fff 100644
> --- a/fs/erofs/erofs_fs.h
> +++ b/fs/erofs/erofs_fs.h
> @@ -21,11 +21,13 @@
> #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x00000002
> #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002
> #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004
> +#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2 0x00000008
> #define EROFS_ALL_FEATURE_INCOMPAT \
> (EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
> EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
> EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
> - EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
> + EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
> + EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
>
> #define EROFS_SB_EXTSLOT_SIZE 16
>
> @@ -314,9 +316,9 @@ struct z_erofs_map_header {
> */
> enum {
> Z_EROFS_VLE_CLUSTER_TYPE_PLAIN = 0,
> - Z_EROFS_VLE_CLUSTER_TYPE_HEAD = 1,
> + Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 = 1,
> Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD = 2,
> - Z_EROFS_VLE_CLUSTER_TYPE_RESERVED = 3,
> + Z_EROFS_VLE_CLUSTER_TYPE_HEAD2 = 3,

The comments above need to be updated as well.

Thanks,

> Z_EROFS_VLE_CLUSTER_TYPE_MAX
> };
>
> diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> index 9d9c26343dab..864d9d5474d5 100644
> --- a/fs/erofs/zmap.c
> +++ b/fs/erofs/zmap.c
> @@ -28,7 +28,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
> {
> struct erofs_inode *const vi = EROFS_I(inode);
> struct super_block *const sb = inode->i_sb;
> - int err;
> + int err, headnr;
> erofs_off_t pos;
> struct page *page;
> void *kaddr;
> @@ -68,9 +68,11 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
> vi->z_algorithmtype[0] = h->h_algorithmtype & 15;
> vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
>
> - if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
> - erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
> - vi->z_algorithmtype[0], vi->nid);
> + headnr = 0;
> + if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX ||
> + vi->z_algorithmtype[++headnr] >= Z_EROFS_COMPRESSION_MAX) {
> + erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
> + headnr + 1, vi->z_algorithmtype[headnr], vi->nid);
> err = -EOPNOTSUPP;
> goto unmap_done;
> }
> @@ -189,7 +191,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
> m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
> break;
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> m->clusterofs = le16_to_cpu(di->di_clusterofs);
> m->pblk = le32_to_cpu(di->di_u.blkaddr);
> break;
> @@ -446,7 +449,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
> }
> return z_erofs_extent_lookback(m, m->delta[0]);
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> m->headtype = m->type;
> map->m_la = (lcn << lclusterbits) | m->clusterofs;
> break;
> @@ -470,13 +474,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
> int err;
>
> DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
> - m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
> + m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
> + m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
> + DBG_BUGON(m->type != m->headtype);
> +
> if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> - !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
> + ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
> + !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
> + ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
> + !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
> map->m_plen = 1 << lclusterbits;
> return 0;
> }
> -
> lcn = m->lcn + 1;
> if (m->compressedlcs)
> goto out;
> @@ -498,7 +507,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>
> switch (m->type) {
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> /*
> * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
> * rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
> @@ -553,7 +563,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
> DBG_BUGON(!m->delta[1] &&
> m->clusterofs != 1 << lclusterbits);
> } else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> - m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
> + m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
> + m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
> /* go on until the next HEAD lcluster */
> if (lcn != headlcn)
> break;
> @@ -612,7 +623,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>
> switch (m.type) {
> case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> - case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> + case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
> if (endoff >= m.clusterofs) {
> m.headtype = m.type;
> map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
> @@ -654,6 +666,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>
> if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
> map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> + else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
> + map->m_algorithmformat = vi->z_algorithmtype[1];
> else
> map->m_algorithmformat = vi->z_algorithmtype[0];
>
>

2021-10-19 13:00:32

by Chao Yu

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] erofs: introduce readmore decompression strategy

On 2021/10/17 23:42, Gao Xiang wrote:
> On Sun, Oct 17, 2021 at 11:34:22PM +0800, Chao Yu wrote:
>> On 2021/10/9 4:08, Gao Xiang wrote:
>>> From: Gao Xiang <[email protected]>
>>>
>>> Previously, the EROFS decompression strategy strictly followed the
>>> readahead window in order to minimize extra memory footprint. However,
>>> it can become inefficient to read only the partially requested data
>>> for very big LZ4 pclusters and the upcoming LZMA implementation.
>>>
>>> Let's try to request the leading data in a pcluster without
>>> triggering memory reclaim for the LZ4 approach first, to boost up
>>> 100% randread of large big pclusters; it has no real impact on
>>> low-memory scenarios.
>>>
>>> It also introduces a way to expand read lengths in order to decompress
>>> the whole pcluster, which is useful for LZMA since the algorithm
>>> itself is relatively slow and CPU-bound, while LZ4 is not.
>>>
>>> Signed-off-by: Gao Xiang <[email protected]>
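
As a toy illustration of the trailing-edge expansion described above
(made-up numbers; only the round_up() arithmetic mirrors the patch,
while readahead_expand() itself is the real mm helper the patch calls):

#include <stdio.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL
#define demo_round_up(x, y) ((((x) + (y) - 1) / (y)) * (y))

int main(void)
{
	/* made-up pcluster extent and original readahead window */
	uint64_t m_la = 20480, m_llen = 50000;
	uint64_t ra_start = 24576, ra_len = 8192;
	uint64_t new_end = demo_round_up(m_la + m_llen, DEMO_PAGE_SIZE);

	/* readahead_expand() would widen the window like this */
	printf("window [%llu, %llu) -> [%llu, %llu)\n",
	       (unsigned long long)ra_start,
	       (unsigned long long)(ra_start + ra_len),
	       (unsigned long long)ra_start,
	       (unsigned long long)new_end);
	return 0;
}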

Looks fine to me now.

Reviewed-by: Chao Yu <[email protected]>

Thanks,

2021-10-19 13:01:30

by Chao Yu

[permalink] [raw]
Subject: Re: [PATCH v4 2/3] erofs: introduce the secondary compression head

On 2021/10/18 0:57, Gao Xiang wrote:
> From: Gao Xiang <[email protected]>
>
> Previously, the head lcluster of each pcluster could be either a HEAD
> or a PLAIN lcluster, indicating whether the whole pcluster is
> compressed or not.
>
> In this patch, a new HEAD2 head type is introduced to specify a second
> compression algorithm besides the primary one for each compressed file,
> which can be used for the upcoming LZMA compression and LZ4 range
> dictionary compression for various data patterns.
>
> This has been on the EROFS roadmap for years. Complete it now!
>
> Reviewed-by: Yue Hu <[email protected]>
> Signed-off-by: Gao Xiang <[email protected]>

Reviewed-by: Chao Yu <[email protected]>

Thanks,