2019-12-09 15:31:10

by Zaslonko Mikhail

Subject: [PATCH v2 0/6] S390 hardware compression support for kernel zlib

With the IBM z15 mainframe the new DFLTCC instruction is available. It
implements the deflate algorithm in hardware (Nest Acceleration Unit - NXU)
with estimated compression and decompression performance orders of
magnitude faster than the current software zlib.

This patch-set adds s390 hardware compression support to kernel zlib.
The code is based on the userspace zlib implementation:
https://github.com/madler/zlib/pull/410
The coding style is also preserved for future maintainability. Only a
limited set of userspace zlib functions is represented in the kernel.
Apart from that, all memory allocation has to be performed in advance.
Thus, the workarea structures are extended with the parameter lists
required for the DEFLATE CONVERSION CALL instruction.
Since kernel zlib itself does not support gzip headers, only the Adler-32
checksum is processed (it can also be produced by the DFLTCC facility).
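For reference, the Adler-32 checksum mentioned above can be sketched as
follows. This is the straightforward RFC 1950 definition for illustration,
not the kernel's or DFLTCC's actual implementation (which defers the modulo
for speed):

```c
#include <stdint.h>
#include <stddef.h>

/* Reference Adler-32 (RFC 1950): two running 16-bit sums, combined
 * into one 32-bit value. Start with adler == 1 for a new stream. */
#define ADLER_MOD 65521u  /* largest prime below 65536 */

static uint32_t adler32_ref(uint32_t adler, const unsigned char *buf,
                            size_t len)
{
    uint32_t a = adler & 0xffff;         /* sum of bytes, plus 1 */
    uint32_t b = (adler >> 16) & 0xffff; /* sum of the a values  */
    size_t i;

    for (i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```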
As in the userspace implementation, kernel zlib will compress in
hardware on level 1, and in software on all other levels. Decompression
will always happen in hardware (when enabled).
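The level-based dispatch described above reduces to a small predicate. The
following is an illustrative sketch with hypothetical names, not the actual
kernel code:

```c
#include <stdbool.h>

/* Illustrative dispatch: hardware deflate is used on level 1, or on any
 * level when the 'always' override is set, and only when the DFLTCC
 * facility is available. Names are hypothetical. */
static bool use_hw_deflate(int level, bool hw_available, bool force_always)
{
    if (!hw_available)
        return false;
    return force_always || level == 1;
}

/* Inflate is simpler: hardware whenever it is available and enabled. */
static bool use_hw_inflate(bool hw_available)
{
    return hw_available;
}
```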
Two DFLTCC compression calls produce the same results only when both are
made on machines of the same generation and the respective buffers have
the same offset relative to the start of the page. Care should therefore
be taken when using hardware compression if reproducible results are
desired. However, the output always conforms to the DEFLATE standard and
can be inflated in any case.
The new kernel command line parameter 'dfltcc' is introduced to
configure s390 zlib hardware support:
Format: { on | off | def_only | inf_only | always }
  on:       s390 zlib hardware support for compression on level 1
            and decompression (default)
  off:      no s390 zlib hardware support
  def_only: s390 zlib hardware support for deflate only
            (compression on level 1)
  inf_only: s390 zlib hardware support for inflate only
            (decompression)
  always:   same as 'on', but the selected compression level is
            ignored and hardware support is always used (used for
            debugging)
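The mapping from the parameter values above to the two independent enable
flags (deflate and inflate) can be sketched as below. This is a hypothetical
userspace illustration of the semantics, not the kernel's ipl_parm.c code:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical configuration derived from the dfltcc= parameter.
 * Field and function names are illustrative only. */
struct dfltcc_cfg {
    bool deflate;     /* hardware deflate on level 1 */
    bool inflate;     /* hardware inflate */
    bool force_level; /* 'always': ignore the selected level */
};

static bool dfltcc_parse(const char *val, struct dfltcc_cfg *cfg)
{
    cfg->deflate = cfg->inflate = cfg->force_level = false;
    if (strcmp(val, "on") == 0) {
        cfg->deflate = cfg->inflate = true;
    } else if (strcmp(val, "off") == 0) {
        /* everything stays disabled */
    } else if (strcmp(val, "def_only") == 0) {
        cfg->deflate = true;
    } else if (strcmp(val, "inf_only") == 0) {
        cfg->inflate = true;
    } else if (strcmp(val, "always") == 0) {
        cfg->deflate = cfg->inflate = cfg->force_level = true;
    } else {
        return false; /* unknown value */
    }
    return true;
}
```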

The main purpose of integrating NXU support into kernel zlib is the use
of hardware deflate in the btrfs filesystem with on-the-fly compression
enabled. Apart from that, hardware support can also be used during boot
for decompressing the kernel or the ramdisk image.

With the btrfs patch expanding the zlib buffer from 1 to 4 pages (patch 6),
the following performance results have been achieved using a ramdisk
with btrfs. These are relative numbers based on throughput rate and
compression ratio for zlib level 1:

Input data             Deflate rate   Inflate rate   Compression ratio
                       NXU/Software   NXU/Software   NXU/Software
stream of zeroes           1.46           1.02           1.00
random ASCII data         10.44           3.00           0.96
ASCII text (dickens)       6.21           3.33           0.94
binary data (vmlinux)      8.37           3.90           1.02

This means that s390 hardware deflate can provide up to 10 times faster
compression (on level 1) and up to 4 times faster decompression (on all
compression levels) for btrfs zlib.

Disclaimer: Performance results are based on IBM internal tests using the
dd command-line utility on btrfs on a Fedora 30 based internal driver in a
native LPAR on a z15 system. Results may vary based on individual workload,
configuration, and software levels.

Changelog:
v1 -> v2:
- Added new external zlib function to check if s390 Deflate-Conversion
facility is installed and enabled (see patch 5).
- The larger buffer for btrfs zlib workspace is allocated only if
s390 hardware compression is enabled. In case of failure to allocate
4-page buffer, we fall back to a PAGE_SIZE buffer, as proposed
by Josef Bacik (see patch 6).

Mikhail Zaslonko (6):
lib/zlib: Add s390 hardware support for kernel zlib_deflate
s390/boot: Rename HEAP_SIZE due to name collision
lib/zlib: Add s390 hardware support for kernel zlib_inflate
s390/boot: Add dfltcc= kernel command line parameter
lib/zlib: Add zlib_deflate_dfltcc_enabled() function
btrfs: Use larger zlib buffer for s390 hardware compression

.../admin-guide/kernel-parameters.txt | 12 +
arch/s390/boot/compressed/decompressor.c | 8 +-
arch/s390/boot/ipl_parm.c | 14 +
arch/s390/include/asm/setup.h | 7 +
arch/s390/kernel/setup.c | 1 +
fs/btrfs/compression.c | 2 +-
fs/btrfs/zlib.c | 118 +++++---
include/linux/zlib.h | 6 +
lib/Kconfig | 22 ++
lib/Makefile | 1 +
lib/decompress_inflate.c | 13 +
lib/zlib_deflate/deflate.c | 85 +++---
lib/zlib_deflate/deflate_syms.c | 1 +
lib/zlib_deflate/deftree.c | 54 ----
lib/zlib_deflate/defutil.h | 134 ++++++++-
lib/zlib_dfltcc/Makefile | 11 +
lib/zlib_dfltcc/dfltcc.c | 55 ++++
lib/zlib_dfltcc/dfltcc.h | 147 +++++++++
lib/zlib_dfltcc/dfltcc_deflate.c | 280 ++++++++++++++++++
lib/zlib_dfltcc/dfltcc_inflate.c | 149 ++++++++++
lib/zlib_dfltcc/dfltcc_syms.c | 17 ++
lib/zlib_dfltcc/dfltcc_util.h | 124 ++++++++
lib/zlib_inflate/inflate.c | 32 +-
lib/zlib_inflate/inflate.h | 8 +
lib/zlib_inflate/infutil.h | 18 +-
25 files changed, 1163 insertions(+), 156 deletions(-)
create mode 100644 lib/zlib_dfltcc/Makefile
create mode 100644 lib/zlib_dfltcc/dfltcc.c
create mode 100644 lib/zlib_dfltcc/dfltcc.h
create mode 100644 lib/zlib_dfltcc/dfltcc_deflate.c
create mode 100644 lib/zlib_dfltcc/dfltcc_inflate.c
create mode 100644 lib/zlib_dfltcc/dfltcc_syms.c
create mode 100644 lib/zlib_dfltcc/dfltcc_util.h

--
2.17.1


2019-12-09 15:31:55

by Zaslonko Mikhail

Subject: [PATCH v2 2/6] s390/boot: Rename HEAP_SIZE due to name collision

Change the conflicting macro name in preparation for zlib_inflate
hardware support.

Signed-off-by: Mikhail Zaslonko <[email protected]>
---
arch/s390/boot/compressed/decompressor.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/s390/boot/compressed/decompressor.c b/arch/s390/boot/compressed/decompressor.c
index 45046630c56a..368fd372c875 100644
--- a/arch/s390/boot/compressed/decompressor.c
+++ b/arch/s390/boot/compressed/decompressor.c
@@ -30,13 +30,13 @@ extern unsigned char _compressed_start[];
extern unsigned char _compressed_end[];

#ifdef CONFIG_HAVE_KERNEL_BZIP2
-#define HEAP_SIZE 0x400000
+#define BOOT_HEAP_SIZE 0x400000
#else
-#define HEAP_SIZE 0x10000
+#define BOOT_HEAP_SIZE 0x10000
#endif

static unsigned long free_mem_ptr = (unsigned long) _end;
-static unsigned long free_mem_end_ptr = (unsigned long) _end + HEAP_SIZE;
+static unsigned long free_mem_end_ptr = (unsigned long) _end + BOOT_HEAP_SIZE;

#ifdef CONFIG_KERNEL_GZIP
#include "../../../../lib/decompress_inflate.c"
@@ -62,7 +62,7 @@ static unsigned long free_mem_end_ptr = (unsigned long) _end + HEAP_SIZE;
#include "../../../../lib/decompress_unxz.c"
#endif

-#define decompress_offset ALIGN((unsigned long)_end + HEAP_SIZE, PAGE_SIZE)
+#define decompress_offset ALIGN((unsigned long)_end + BOOT_HEAP_SIZE, PAGE_SIZE)

unsigned long mem_safe_offset(void)
{
--
2.17.1

2019-12-09 15:32:03

by Zaslonko Mikhail

Subject: [PATCH v2 6/6] btrfs: Use larger zlib buffer for s390 hardware compression

Due to the small size of the zlib buffer (1 page) set in the btrfs code,
s390 hardware compression is rather limited in terms of performance.
Increasing the buffer size to 4 pages when s390 zlib hardware support is
enabled brings a significant benefit to btrfs zlib (up to 60% better
performance compared to the PAGE_SIZE buffer). In case of memory pressure,
we fall back to a single-page buffer during workspace allocation.

Signed-off-by: Mikhail Zaslonko <[email protected]>
---
fs/btrfs/compression.c | 2 +-
fs/btrfs/zlib.c | 118 +++++++++++++++++++++++++++--------------
2 files changed, 80 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b05b361e2062..f789b356fd8b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1158,7 +1158,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
/* copy bytes from the working buffer into the pages */
while (working_bytes > 0) {
bytes = min_t(unsigned long, bvec.bv_len,
- PAGE_SIZE - buf_offset);
+ PAGE_SIZE - (buf_offset % PAGE_SIZE));
bytes = min(bytes, working_bytes);

kaddr = kmap_atomic(bvec.bv_page);
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index df1aace5df50..0bc0d57ba233 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -20,9 +20,12 @@
#include <linux/refcount.h>
#include "compression.h"

+#define ZLIB_DFLTCC_BUF_SIZE (4 * PAGE_SIZE)
+
struct workspace {
z_stream strm;
char *buf;
+ unsigned long buf_size;
struct list_head list;
int level;
};
@@ -76,7 +79,17 @@ static struct list_head *zlib_alloc_workspace(unsigned int level)
zlib_inflate_workspacesize());
workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
workspace->level = level;
- workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ workspace->buf = NULL;
+ if (zlib_deflate_dfltcc_enabled()) {
+ workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
+ __GFP_NOMEMALLOC | __GFP_NORETRY |
+ __GFP_NOWARN | GFP_NOIO);
+ workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
+ }
+ if (!workspace->buf) {
+ workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ workspace->buf_size = PAGE_SIZE;
+ }
if (!workspace->strm.workspace || !workspace->buf)
goto fail;

@@ -97,6 +110,7 @@ static int zlib_compress_pages(struct list_head *ws,
unsigned long *total_out)
{
struct workspace *workspace = list_entry(ws, struct workspace, list);
+ int i;
int ret;
char *data_in;
char *cpage_out;
@@ -104,6 +118,7 @@ static int zlib_compress_pages(struct list_head *ws,
struct page *in_page = NULL;
struct page *out_page = NULL;
unsigned long bytes_left;
+ unsigned long in_buf_pages;
unsigned long len = *total_out;
unsigned long nr_dest_pages = *out_pages;
const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
@@ -121,9 +136,6 @@ static int zlib_compress_pages(struct list_head *ws,
workspace->strm.total_in = 0;
workspace->strm.total_out = 0;

- in_page = find_get_page(mapping, start >> PAGE_SHIFT);
- data_in = kmap(in_page);
-
out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
if (out_page == NULL) {
ret = -ENOMEM;
@@ -133,12 +145,34 @@ static int zlib_compress_pages(struct list_head *ws,
pages[0] = out_page;
nr_pages = 1;

- workspace->strm.next_in = data_in;
+ workspace->strm.next_in = workspace->buf;
+ workspace->strm.avail_in = 0;
workspace->strm.next_out = cpage_out;
workspace->strm.avail_out = PAGE_SIZE;
- workspace->strm.avail_in = min(len, PAGE_SIZE);

while (workspace->strm.total_in < len) {
+ /* get next set of pages and copy their contents to
+ * the input buffer for the following deflate call
+ */
+ if (workspace->strm.avail_in == 0) {
+ bytes_left = len - workspace->strm.total_in;
+ in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
+ workspace->buf_size / PAGE_SIZE);
+ for (i = 0; i < in_buf_pages; i++) {
+ in_page = find_get_page(mapping,
+ start >> PAGE_SHIFT);
+ data_in = kmap(in_page);
+ memcpy(workspace->buf + i*PAGE_SIZE, data_in,
+ PAGE_SIZE);
+ kunmap(in_page);
+ put_page(in_page);
+ start += PAGE_SIZE;
+ }
+ workspace->strm.avail_in = min(bytes_left,
+ workspace->buf_size);
+ workspace->strm.next_in = workspace->buf;
+ }
+
ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
if (ret != Z_OK) {
pr_debug("BTRFS: deflate in loop returned %d\n",
@@ -155,6 +189,7 @@ static int zlib_compress_pages(struct list_head *ws,
ret = -E2BIG;
goto out;
}
+
/* we need another page for writing out. Test this
* before the total_in so we will pull in a new page for
* the stream end if required
@@ -180,33 +215,42 @@ static int zlib_compress_pages(struct list_head *ws,
/* we're all done */
if (workspace->strm.total_in >= len)
break;
-
- /* we've read in a full page, get a new one */
- if (workspace->strm.avail_in == 0) {
- if (workspace->strm.total_out > max_out)
- break;
-
- bytes_left = len - workspace->strm.total_in;
- kunmap(in_page);
- put_page(in_page);
-
- start += PAGE_SIZE;
- in_page = find_get_page(mapping,
- start >> PAGE_SHIFT);
- data_in = kmap(in_page);
- workspace->strm.avail_in = min(bytes_left,
- PAGE_SIZE);
- workspace->strm.next_in = data_in;
- }
+ if (workspace->strm.total_out > max_out)
+ break;
}
workspace->strm.avail_in = 0;
- ret = zlib_deflate(&workspace->strm, Z_FINISH);
- zlib_deflateEnd(&workspace->strm);
-
- if (ret != Z_STREAM_END) {
- ret = -EIO;
- goto out;
+ /* call deflate with Z_FINISH flush parameter providing more output
+ * space but no more input data, until it returns with Z_STREAM_END
+ */
+ while (ret != Z_STREAM_END) {
+ ret = zlib_deflate(&workspace->strm, Z_FINISH);
+ if (ret == Z_STREAM_END)
+ break;
+ if (ret != Z_OK && ret != Z_BUF_ERROR) {
+ zlib_deflateEnd(&workspace->strm);
+ ret = -EIO;
+ goto out;
+ } else if (workspace->strm.avail_out == 0) {
+ /* get another page for the stream end */
+ kunmap(out_page);
+ if (nr_pages == nr_dest_pages) {
+ out_page = NULL;
+ ret = -E2BIG;
+ goto out;
+ }
+ out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+ if (out_page == NULL) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ cpage_out = kmap(out_page);
+ pages[nr_pages] = out_page;
+ nr_pages++;
+ workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.next_out = cpage_out;
+ }
}
+ zlib_deflateEnd(&workspace->strm);

if (workspace->strm.total_out >= workspace->strm.total_in) {
ret = -E2BIG;
@@ -221,10 +265,6 @@ static int zlib_compress_pages(struct list_head *ws,
if (out_page)
kunmap(out_page);

- if (in_page) {
- kunmap(in_page);
- put_page(in_page);
- }
return ret;
}

@@ -250,7 +290,7 @@ static int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)

workspace->strm.total_out = 0;
workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;

/* If it's deflate, and it's got no preset dictionary, then
we can tell zlib to skip the adler32 check. */
@@ -289,7 +329,7 @@ static int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
}

workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;

if (workspace->strm.avail_in == 0) {
unsigned long tmp;
@@ -340,7 +380,7 @@ static int zlib_decompress(struct list_head *ws, unsigned char *data_in,
workspace->strm.total_in = 0;

workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;
workspace->strm.total_out = 0;
/* If it's deflate, and it's got no preset dictionary, then
we can tell zlib to skip the adler32 check. */
@@ -384,7 +424,7 @@ static int zlib_decompress(struct list_head *ws, unsigned char *data_in,
buf_offset = 0;

bytes = min(PAGE_SIZE - pg_offset,
- PAGE_SIZE - buf_offset);
+ PAGE_SIZE - (buf_offset % PAGE_SIZE));
bytes = min(bytes, bytes_left);

kaddr = kmap_atomic(dest_page);
@@ -395,7 +435,7 @@ static int zlib_decompress(struct list_head *ws, unsigned char *data_in,
bytes_left -= bytes;
next:
workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;
}

if (ret != Z_STREAM_END && bytes_left != 0)
--
2.17.1

2019-12-09 15:32:45

by Zaslonko Mikhail

Subject: [PATCH v2 5/6] lib/zlib: Add zlib_deflate_dfltcc_enabled() function

Add a new function to zlib.h that checks whether the s390
Deflate-Conversion facility is installed and enabled.

Signed-off-by: Mikhail Zaslonko <[email protected]>
---
include/linux/zlib.h | 6 ++++++
lib/zlib_deflate/deflate.c | 6 ++++++
lib/zlib_deflate/deflate_syms.c | 1 +
lib/zlib_dfltcc/dfltcc.h | 3 +++
4 files changed, 16 insertions(+)

diff --git a/include/linux/zlib.h b/include/linux/zlib.h
index 92dbbd3f6c75..c757d848a758 100644
--- a/include/linux/zlib.h
+++ b/include/linux/zlib.h
@@ -191,6 +191,12 @@ extern int zlib_deflate_workspacesize (int windowBits, int memLevel);
exceed those passed here.
*/

+extern int zlib_deflate_dfltcc_enabled (void);
+/*
+ Returns 1 if Deflate-Conversion facility is installed and enabled,
+ otherwise 0.
+*/
+
/*
extern int deflateInit (z_streamp strm, int level);

diff --git a/lib/zlib_deflate/deflate.c b/lib/zlib_deflate/deflate.c
index 9595b32b944a..52177d8527f9 100644
--- a/lib/zlib_deflate/deflate.c
+++ b/lib/zlib_deflate/deflate.c
@@ -59,6 +59,7 @@
#define DEFLATE_RESET_HOOK(strm) do {} while (0)
#define DEFLATE_HOOK(strm, flush, bstate) 0
#define DEFLATE_NEED_CHECKSUM(strm) 1
+#define DEFLATE_DFLTCC_ENABLED() 0
#endif

/* ===========================================================================
@@ -1138,3 +1139,8 @@ int zlib_deflate_workspacesize(int windowBits, int memLevel)
+ zlib_deflate_head_memsize(memLevel)
+ zlib_deflate_overlay_memsize(memLevel);
}
+
+int zlib_deflate_dfltcc_enabled(void)
+{
+ return DEFLATE_DFLTCC_ENABLED();
+}
diff --git a/lib/zlib_deflate/deflate_syms.c b/lib/zlib_deflate/deflate_syms.c
index 72fe4b73be53..24b740b99678 100644
--- a/lib/zlib_deflate/deflate_syms.c
+++ b/lib/zlib_deflate/deflate_syms.c
@@ -12,6 +12,7 @@
#include <linux/zlib.h>

EXPORT_SYMBOL(zlib_deflate_workspacesize);
+EXPORT_SYMBOL(zlib_deflate_dfltcc_enabled);
EXPORT_SYMBOL(zlib_deflate);
EXPORT_SYMBOL(zlib_deflateInit2);
EXPORT_SYMBOL(zlib_deflateEnd);
diff --git a/lib/zlib_dfltcc/dfltcc.h b/lib/zlib_dfltcc/dfltcc.h
index be70c807b62f..1bd9709416fb 100644
--- a/lib/zlib_dfltcc/dfltcc.h
+++ b/lib/zlib_dfltcc/dfltcc.h
@@ -3,6 +3,7 @@
#define DFLTCC_H

#include "../zlib_deflate/defutil.h"
+#include "dfltcc_util.h"

/*
* Tuning parameters.
@@ -121,6 +122,8 @@ dfltcc_inflate_action dfltcc_inflate(z_streamp strm,

#define DEFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_deflate((strm)))

+#define DEFLATE_DFLTCC_ENABLED() is_dfltcc_enabled()
+
#define INFLATE_RESET_HOOK(strm) \
dfltcc_reset((strm), sizeof(struct inflate_state))

--
2.17.1

2019-12-13 16:11:31

by Zaslonko Mikhail

Subject: Re: [PATCH v2 6/6] btrfs: Use larger zlib buffer for s390 hardware compression

Hello,

Could you please review the patch for btrfs below.

Apart from falling back to 1 page, I have made the allocation of the
4-page zlib workspace buffer conditional on the s390 Deflate-Conversion
facility being installed and enabled. Thus, it will take effect on the
s390 architecture only.

Currently, in zlib_compress_pages() I always copy the input pages to the
workspace buffer prior to the zlib_deflate call. Would it make sense to
pass the page itself, as before, when the workspace buf_size is only a
single page?

As for calling zlib_deflate with the Z_FINISH flush parameter in a loop
until Z_STREAM_END is returned, this is in agreement with the zlib manual.
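The draining pattern from the zlib manual can be illustrated with a small
mock: keep calling deflate with Z_FINISH and provide a fresh output buffer
whenever avail_out reaches zero, until Z_STREAM_END is returned. Here
mock_deflate_finish stands in for zlib_deflate; it is not the real zlib
API, only a model of its return-code behavior:

```c
/* zlib-like return codes for the mock. */
#define Z_OK          0
#define Z_STREAM_END  1
#define Z_BUF_ERROR (-5)

struct mock_strm {
    int pending_chunks; /* output still buffered inside "zlib" */
    int avail_out;      /* free bytes in the current output buffer */
};

/* Mock of deflate(strm, Z_FINISH): flushes one pending chunk per call
 * while output space is available, Z_BUF_ERROR when it is not, and
 * Z_STREAM_END once everything has been drained. */
static int mock_deflate_finish(struct mock_strm *s)
{
    if (s->avail_out == 0)
        return Z_BUF_ERROR;    /* no room to flush anything */
    if (s->pending_chunks > 0) {
        s->pending_chunks--;   /* flush one chunk... */
        s->avail_out = 0;      /* ...filling the output buffer */
        return Z_OK;
    }
    return Z_STREAM_END;       /* all pending output drained */
}

/* The draining loop; returns the number of extra output buffers used,
 * or a negative value on error (mirroring the -EIO/-E2BIG handling). */
static int finish_stream(struct mock_strm *s, int max_bufs)
{
    int bufs = 0, ret = Z_OK;

    while (ret != Z_STREAM_END) {
        ret = mock_deflate_finish(s);
        if (ret == Z_STREAM_END)
            break;
        if (ret != Z_OK && ret != Z_BUF_ERROR)
            return -1;         /* hard error */
        if (s->avail_out == 0) {
            if (bufs == max_bufs)
                return -2;     /* out of destination pages */
            s->avail_out = 4096; /* "allocate" a new output page */
            bufs++;
        }
    }
    return bufs;
}
```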

Please see for more details:
https://lkml.org/lkml/2019/12/9/537

Thanks,
Mikhail

On 09.12.2019 16:29, Mikhail Zaslonko wrote:
> Due to the small size of zlib buffer (1 page) set in btrfs code, s390
> hardware compression is rather limited in terms of performance. Increasing
> the buffer size to 4 pages when s390 zlib hardware support is enabled
> would bring significant benefit to btrfs zlib (up to 60% better performance
> compared to the PAGE_SIZE buffer). In case of memory pressure we fall back
> to a single page buffer during workspace allocation.
>
> Signed-off-by: Mikhail Zaslonko <[email protected]>
> ---
> fs/btrfs/compression.c | 2 +-
> fs/btrfs/zlib.c | 118 +++++++++++++++++++++++++++--------------
> 2 files changed, 80 insertions(+), 40 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index b05b361e2062..f789b356fd8b 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -1158,7 +1158,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
> /* copy bytes from the working buffer into the pages */
> while (working_bytes > 0) {
> bytes = min_t(unsigned long, bvec.bv_len,
> - PAGE_SIZE - buf_offset);
> + PAGE_SIZE - (buf_offset % PAGE_SIZE));
> bytes = min(bytes, working_bytes);
>
> kaddr = kmap_atomic(bvec.bv_page);
> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
> index df1aace5df50..0bc0d57ba233 100644
> --- a/fs/btrfs/zlib.c
> +++ b/fs/btrfs/zlib.c
> @@ -20,9 +20,12 @@
> #include <linux/refcount.h>
> #include "compression.h"
>
> +#define ZLIB_DFLTCC_BUF_SIZE (4 * PAGE_SIZE)
> +
> struct workspace {
> z_stream strm;
> char *buf;
> + unsigned long buf_size;
> struct list_head list;
> int level;
> };
> @@ -76,7 +79,17 @@ static struct list_head *zlib_alloc_workspace(unsigned int level)
> zlib_inflate_workspacesize());
> workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
> workspace->level = level;
> - workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> + workspace->buf = NULL;
> + if (zlib_deflate_dfltcc_enabled()) {
> + workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
> + __GFP_NOMEMALLOC | __GFP_NORETRY |
> + __GFP_NOWARN | GFP_NOIO);
> + workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
> + }
> + if (!workspace->buf) {
> + workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> + workspace->buf_size = PAGE_SIZE;
> + }
> if (!workspace->strm.workspace || !workspace->buf)
> goto fail;
>
> @@ -97,6 +110,7 @@ static int zlib_compress_pages(struct list_head *ws,
> unsigned long *total_out)
> {
> struct workspace *workspace = list_entry(ws, struct workspace, list);
> + int i;
> int ret;
> char *data_in;
> char *cpage_out;
> @@ -104,6 +118,7 @@ static int zlib_compress_pages(struct list_head *ws,
> struct page *in_page = NULL;
> struct page *out_page = NULL;
> unsigned long bytes_left;
> + unsigned long in_buf_pages;
> unsigned long len = *total_out;
> unsigned long nr_dest_pages = *out_pages;
> const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
> @@ -121,9 +136,6 @@ static int zlib_compress_pages(struct list_head *ws,
> workspace->strm.total_in = 0;
> workspace->strm.total_out = 0;
>
> - in_page = find_get_page(mapping, start >> PAGE_SHIFT);
> - data_in = kmap(in_page);
> -
> out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> if (out_page == NULL) {
> ret = -ENOMEM;
> @@ -133,12 +145,34 @@ static int zlib_compress_pages(struct list_head *ws,
> pages[0] = out_page;
> nr_pages = 1;
>
> - workspace->strm.next_in = data_in;
> + workspace->strm.next_in = workspace->buf;
> + workspace->strm.avail_in = 0;
> workspace->strm.next_out = cpage_out;
> workspace->strm.avail_out = PAGE_SIZE;
> - workspace->strm.avail_in = min(len, PAGE_SIZE);
>
> while (workspace->strm.total_in < len) {
> + /* get next set of pages and copy their contents to
> + * the input buffer for the following deflate call
> + */
> + if (workspace->strm.avail_in == 0) {
> + bytes_left = len - workspace->strm.total_in;
> + in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
> + workspace->buf_size / PAGE_SIZE);
> + for (i = 0; i < in_buf_pages; i++) {
> + in_page = find_get_page(mapping,
> + start >> PAGE_SHIFT);
> + data_in = kmap(in_page);
> + memcpy(workspace->buf + i*PAGE_SIZE, data_in,
> + PAGE_SIZE);
> + kunmap(in_page);
> + put_page(in_page);
> + start += PAGE_SIZE;
> + }
> + workspace->strm.avail_in = min(bytes_left,
> + workspace->buf_size);
> + workspace->strm.next_in = workspace->buf;
> + }
> +
> ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
> if (ret != Z_OK) {
> pr_debug("BTRFS: deflate in loop returned %d\n",
> @@ -155,6 +189,7 @@ static int zlib_compress_pages(struct list_head *ws,
> ret = -E2BIG;
> goto out;
> }
> +
> /* we need another page for writing out. Test this
> * before the total_in so we will pull in a new page for
> * the stream end if required
> @@ -180,33 +215,42 @@ static int zlib_compress_pages(struct list_head *ws,
> /* we're all done */
> if (workspace->strm.total_in >= len)
> break;
> -
> - /* we've read in a full page, get a new one */
> - if (workspace->strm.avail_in == 0) {
> - if (workspace->strm.total_out > max_out)
> - break;
> -
> - bytes_left = len - workspace->strm.total_in;
> - kunmap(in_page);
> - put_page(in_page);
> -
> - start += PAGE_SIZE;
> - in_page = find_get_page(mapping,
> - start >> PAGE_SHIFT);
> - data_in = kmap(in_page);
> - workspace->strm.avail_in = min(bytes_left,
> - PAGE_SIZE);
> - workspace->strm.next_in = data_in;
> - }
> + if (workspace->strm.total_out > max_out)
> + break;
> }
> workspace->strm.avail_in = 0;
> - ret = zlib_deflate(&workspace->strm, Z_FINISH);
> - zlib_deflateEnd(&workspace->strm);
> -
> - if (ret != Z_STREAM_END) {
> - ret = -EIO;
> - goto out;
> + /* call deflate with Z_FINISH flush parameter providing more output
> + * space but no more input data, until it returns with Z_STREAM_END
> + */
> + while (ret != Z_STREAM_END) {
> + ret = zlib_deflate(&workspace->strm, Z_FINISH);
> + if (ret == Z_STREAM_END)
> + break;
> + if (ret != Z_OK && ret != Z_BUF_ERROR) {
> + zlib_deflateEnd(&workspace->strm);
> + ret = -EIO;
> + goto out;
> + } else if (workspace->strm.avail_out == 0) {
> + /* get another page for the stream end */
> + kunmap(out_page);
> + if (nr_pages == nr_dest_pages) {
> + out_page = NULL;
> + ret = -E2BIG;
> + goto out;
> + }
> + out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> + if (out_page == NULL) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + cpage_out = kmap(out_page);
> + pages[nr_pages] = out_page;
> + nr_pages++;
> + workspace->strm.avail_out = PAGE_SIZE;
> + workspace->strm.next_out = cpage_out;
> + }
> }
> + zlib_deflateEnd(&workspace->strm);
>
> if (workspace->strm.total_out >= workspace->strm.total_in) {
> ret = -E2BIG;
> @@ -221,10 +265,6 @@ static int zlib_compress_pages(struct list_head *ws,
> if (out_page)
> kunmap(out_page);
>
> - if (in_page) {
> - kunmap(in_page);
> - put_page(in_page);
> - }
> return ret;
> }
>
> @@ -250,7 +290,7 @@ static int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>
> workspace->strm.total_out = 0;
> workspace->strm.next_out = workspace->buf;
> - workspace->strm.avail_out = PAGE_SIZE;
> + workspace->strm.avail_out = workspace->buf_size;
>
> /* If it's deflate, and it's got no preset dictionary, then
> we can tell zlib to skip the adler32 check. */
> @@ -289,7 +329,7 @@ static int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
> }
>
> workspace->strm.next_out = workspace->buf;
> - workspace->strm.avail_out = PAGE_SIZE;
> + workspace->strm.avail_out = workspace->buf_size;
>
> if (workspace->strm.avail_in == 0) {
> unsigned long tmp;
> @@ -340,7 +380,7 @@ static int zlib_decompress(struct list_head *ws, unsigned char *data_in,
> workspace->strm.total_in = 0;
>
> workspace->strm.next_out = workspace->buf;
> - workspace->strm.avail_out = PAGE_SIZE;
> + workspace->strm.avail_out = workspace->buf_size;
> workspace->strm.total_out = 0;
> /* If it's deflate, and it's got no preset dictionary, then
> we can tell zlib to skip the adler32 check. */
> @@ -384,7 +424,7 @@ static int zlib_decompress(struct list_head *ws, unsigned char *data_in,
> buf_offset = 0;
>
> bytes = min(PAGE_SIZE - pg_offset,
> - PAGE_SIZE - buf_offset);
> + PAGE_SIZE - (buf_offset % PAGE_SIZE));
> bytes = min(bytes, bytes_left);
>
> kaddr = kmap_atomic(dest_page);
> @@ -395,7 +435,7 @@ static int zlib_decompress(struct list_head *ws, unsigned char *data_in,
> bytes_left -= bytes;
> next:
> workspace->strm.next_out = workspace->buf;
> - workspace->strm.avail_out = PAGE_SIZE;
> + workspace->strm.avail_out = workspace->buf_size;
> }
>
> if (ret != Z_STREAM_END && bytes_left != 0)
>

2019-12-13 17:36:25

by David Sterba

Subject: Re: [PATCH v2 6/6] btrfs: Use larger zlib buffer for s390 hardware compression

On Fri, Dec 13, 2019 at 05:10:10PM +0100, Zaslonko Mikhail wrote:
> Hello,
>
> Could you please review the patch for btrfs below.
>
> Apart from falling back to 1 page, I have set the condition to allocate
> 4-pages zlib workspace buffer only if s390 Deflate-Conversion facility
> is installed and enabled. Thus, it will take effect on s390 architecture
> only.
>
> Currently in zlib_compress_pages() I always copy input pages to the workspace
> buffer prior to zlib_deflate call. Would that make sense, to pass the page
> itself, as before, based on the workspace buf_size (for 1-page buffer)?

Doesn't the copy back and forth kill the improvements brought by the
hw supported decompression?

> As for calling zlib_deflate with Z_FINISH flush parameter in a loop until
> Z_STREAM_END is returned, that comes in agreement with the zlib manual.

The concerns are about a zlib stream that takes 4 pages on input while on
the decompression side only 1 page is available for the output. I.e. as if
the filesystem was created on s390 with dfltcc and then opened on an x86
host. The zlib_deflate(Z_FINISH) happens on the compression side.