2020-01-03 22:35:35

by Zaslonko Mikhail

Subject: [PATCH v3 0/6] S390 hardware support for kernel zlib

With the IBM z15 mainframe the new DFLTCC instruction is available. It
implements the deflate algorithm in hardware (Nest Acceleration Unit -
NXU) with estimated compression and decompression performance orders of
magnitude faster than the current software zlib.

This patch-set adds s390 hardware compression support to kernel zlib.
The code is based on the userspace zlib implementation:
https://github.com/madler/zlib/pull/410
The coding style is also preserved for future maintainability.
There is only a limited set of userspace zlib functions represented in
the kernel. Apart from that, all memory allocation should be performed
in advance. Thus, the workarea structures are extended with the
parameter lists required for the DEFLATE CONVERSION CALL instruction.
Since kernel zlib itself does not support gzip headers, only the
Adler-32 checksum is processed (it can also be produced by the DFLTCC
facility).
As in the userspace implementation, kernel zlib will compress in
hardware on level 1, and in software on all other levels. Decompression
will always happen in hardware (when enabled).
Two DFLTCC compression calls produce the same results only when both
are made on machines of the same generation and the respective buffers
have the same offset relative to the start of the page. Care should
therefore be taken when using hardware compression where reproducible
results are desired. However, it always produces standard-conformant
output, which can be inflated in any case.
The new kernel command line parameter 'dfltcc' is introduced to
configure s390 zlib hardware support:
Format: { on | off | def_only | inf_only | always }
on: s390 zlib hardware support for compression on
level 1 and decompression (default)
off: No s390 zlib hardware support
def_only: s390 zlib hardware support for deflate
only (compression on level 1)
inf_only: s390 zlib hardware support for inflate
only (decompression)
always: Same as 'on' but ignores the selected compression
level, always using hardware support (used for debugging)
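Since DFLTCC output is a standard deflate stream, the conformance and
Adler-32 points above can be illustrated with the ordinary userspace
zlib binding (a hedged sketch in Python, not kernel code; the kernel
API differs, but the stream format is the same):

```python
import struct
import zlib

# Sample input; level 1 is the only level the kernel offloads to DFLTCC.
data = b"btrfs on-the-fly compression sample " * 64

# zlib-wrapped deflate stream at level 1 (the hardware-eligible level)
compressed = zlib.compress(data, 1)

# Any standard-conformant inflater recovers the input, regardless of
# which deflate implementation produced the stream.
assert zlib.decompress(compressed) == data

# The zlib wrapper ends with the big-endian Adler-32 of the input --
# the same checksum the DFLTCC facility can produce.
adler = struct.unpack(">I", compressed[-4:])[0]
assert adler == zlib.adler32(data)
```

The exact bytes of `compressed` may differ between deflate
implementations (and, per the note above, between machine generations
and buffer offsets), but every conformant inflater accepts them.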

The main purpose of integrating NXU support into kernel zlib is the
use of hardware deflate in the btrfs filesystem with on-the-fly
compression enabled. Apart from that, hardware support can also be used
during boot for decompressing the kernel or the ramdisk image.

With the patch for btrfs expanding the zlib buffer from 1 to 4 pages
(patch 6), the following performance results have been achieved using a
ramdisk with btrfs. These are relative numbers based on throughput rate
and compression ratio for zlib level 1:

Input data              Deflate rate   Inflate rate   Compression ratio
                        NXU/Software   NXU/Software   NXU/Software
stream of zeroes        1.46           1.02           1.00
random ASCII data       10.44          3.00           0.96
ASCII text (dickens)    6.21           3.33           0.94
binary data (vmlinux)   8.37           3.90           1.02

This means that s390 hardware deflate can provide up to 10 times
faster compression (on level 1) and up to 4 times faster decompression
(across all compression levels) for btrfs zlib.

Disclaimer: Performance results are based on IBM internal tests using
the dd command-line utility on btrfs on a Fedora 30 based internal
driver in native LPAR on a z15 system. Results may vary based on
individual workload, configuration and software levels.

Changelog:
v2 -> v3:
- In btrfs zlib_compress_pages(), do not copy input pages to the
workspace buffer if s390 hardware compression is not enabled or the
remaining input data fits in one page.
- Leave just a single Kconfig option for s390 zlib hardware
support (CONFIG_ZLIB_DFLTCC)
- Fix the issues reported by kbuild test robot.
v1 -> v2:
- Added new external zlib function to check if s390 Deflate-Conversion
facility is installed and enabled (see patch 5).
- The larger buffer for btrfs zlib workspace is allocated only if
s390 hardware compression is enabled. In case of failure to allocate
4-page buffer, we fall back to a PAGE_SIZE buffer, as proposed
by Josef Bacik (see patch 6).



Mikhail Zaslonko (6):
lib/zlib: Add s390 hardware support for kernel zlib_deflate
s390/boot: Rename HEAP_SIZE due to name collision
lib/zlib: Add s390 hardware support for kernel zlib_inflate
s390/boot: Add dfltcc= kernel command line parameter
lib/zlib: Add zlib_deflate_dfltcc_enabled() function
btrfs: Use larger zlib buffer for s390 hardware compression

.../admin-guide/kernel-parameters.txt | 12 +
arch/s390/boot/compressed/decompressor.c | 8 +-
arch/s390/boot/ipl_parm.c | 14 +
arch/s390/include/asm/setup.h | 7 +
arch/s390/kernel/setup.c | 2 +
fs/btrfs/compression.c | 2 +-
fs/btrfs/zlib.c | 128 +++++---
include/linux/zlib.h | 6 +
lib/Kconfig | 7 +
lib/Makefile | 1 +
lib/decompress_inflate.c | 13 +
lib/zlib_deflate/deflate.c | 85 +++---
lib/zlib_deflate/deflate_syms.c | 1 +
lib/zlib_deflate/deftree.c | 54 ----
lib/zlib_deflate/defutil.h | 134 ++++++++-
lib/zlib_dfltcc/Makefile | 11 +
lib/zlib_dfltcc/dfltcc.c | 55 ++++
lib/zlib_dfltcc/dfltcc.h | 155 ++++++++++
lib/zlib_dfltcc/dfltcc_deflate.c | 280 ++++++++++++++++++
lib/zlib_dfltcc/dfltcc_inflate.c | 149 ++++++++++
lib/zlib_dfltcc/dfltcc_syms.c | 17 ++
lib/zlib_dfltcc/dfltcc_util.h | 103 +++++++
lib/zlib_inflate/inflate.c | 32 +-
lib/zlib_inflate/inflate.h | 8 +
lib/zlib_inflate/infutil.h | 18 +-
25 files changed, 1150 insertions(+), 152 deletions(-)
create mode 100644 lib/zlib_dfltcc/Makefile
create mode 100644 lib/zlib_dfltcc/dfltcc.c
create mode 100644 lib/zlib_dfltcc/dfltcc.h
create mode 100644 lib/zlib_dfltcc/dfltcc_deflate.c
create mode 100644 lib/zlib_dfltcc/dfltcc_inflate.c
create mode 100644 lib/zlib_dfltcc/dfltcc_syms.c
create mode 100644 lib/zlib_dfltcc/dfltcc_util.h

--
2.17.1


2020-01-03 22:35:35

by Zaslonko Mikhail

Subject: [PATCH v3 6/6] btrfs: Use larger zlib buffer for s390 hardware compression

In order to benefit from s390 zlib hardware compression support,
increase the btrfs zlib workspace buffer size from 1 to 4 pages (if
s390 zlib hardware support is enabled on the machine). This brings up
to 60% better performance in hardware on s390 compared to the PAGE_SIZE
buffer and much more compared to the software zlib processing in btrfs.
In case of memory pressure fall back to a single page buffer during
workspace allocation.

Signed-off-by: Mikhail Zaslonko <[email protected]>
---
fs/btrfs/compression.c | 2 +-
fs/btrfs/zlib.c | 128 ++++++++++++++++++++++++++++++-----------
2 files changed, 94 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ee834ef7beb4..6bd0e75a822c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1285,7 +1285,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
/* copy bytes from the working buffer into the pages */
while (working_bytes > 0) {
bytes = min_t(unsigned long, bvec.bv_len,
- PAGE_SIZE - buf_offset);
+ PAGE_SIZE - (buf_offset % PAGE_SIZE));
bytes = min(bytes, working_bytes);

kaddr = kmap_atomic(bvec.bv_page);
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index a6c90a003c12..159486801631 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -20,9 +20,12 @@
#include <linux/refcount.h>
#include "compression.h"

+#define ZLIB_DFLTCC_BUF_SIZE (4 * PAGE_SIZE)
+
struct workspace {
z_stream strm;
char *buf;
+ unsigned long buf_size;
struct list_head list;
int level;
};
@@ -61,7 +64,17 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
zlib_inflate_workspacesize());
workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
workspace->level = level;
- workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ workspace->buf = NULL;
+ if (zlib_deflate_dfltcc_enabled()) {
+ workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
+ __GFP_NOMEMALLOC | __GFP_NORETRY |
+ __GFP_NOWARN | GFP_NOIO);
+ workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
+ }
+ if (!workspace->buf) {
+ workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ workspace->buf_size = PAGE_SIZE;
+ }
if (!workspace->strm.workspace || !workspace->buf)
goto fail;

@@ -78,6 +91,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
unsigned long *total_in, unsigned long *total_out)
{
struct workspace *workspace = list_entry(ws, struct workspace, list);
+ int i;
int ret;
char *data_in;
char *cpage_out;
@@ -85,6 +99,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
struct page *in_page = NULL;
struct page *out_page = NULL;
unsigned long bytes_left;
+ unsigned long in_buf_pages;
unsigned long len = *total_out;
unsigned long nr_dest_pages = *out_pages;
const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
@@ -102,9 +117,6 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
workspace->strm.total_in = 0;
workspace->strm.total_out = 0;

- in_page = find_get_page(mapping, start >> PAGE_SHIFT);
- data_in = kmap(in_page);
-
out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
if (out_page == NULL) {
ret = -ENOMEM;
@@ -114,12 +126,48 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
pages[0] = out_page;
nr_pages = 1;

- workspace->strm.next_in = data_in;
+ workspace->strm.next_in = workspace->buf;
+ workspace->strm.avail_in = 0;
workspace->strm.next_out = cpage_out;
workspace->strm.avail_out = PAGE_SIZE;
- workspace->strm.avail_in = min(len, PAGE_SIZE);

while (workspace->strm.total_in < len) {
+ /* get next input pages and copy the contents to
+ * the workspace buffer if required
+ */
+ if (workspace->strm.avail_in == 0) {
+ bytes_left = len - workspace->strm.total_in;
+ in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
+ workspace->buf_size / PAGE_SIZE);
+ if (in_buf_pages > 1) {
+ for (i = 0; i < in_buf_pages; i++) {
+ if (in_page) {
+ kunmap(in_page);
+ put_page(in_page);
+ }
+ in_page = find_get_page(mapping,
+ start >> PAGE_SHIFT);
+ data_in = kmap(in_page);
+ memcpy(workspace->buf + i*PAGE_SIZE,
+ data_in, PAGE_SIZE);
+ start += PAGE_SIZE;
+ }
+ workspace->strm.next_in = workspace->buf;
+ } else {
+ if (in_page) {
+ kunmap(in_page);
+ put_page(in_page);
+ }
+ in_page = find_get_page(mapping,
+ start >> PAGE_SHIFT);
+ data_in = kmap(in_page);
+ start += PAGE_SIZE;
+ workspace->strm.next_in = data_in;
+ }
+ workspace->strm.avail_in = min(bytes_left,
+ workspace->buf_size);
+ }
+
ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
if (ret != Z_OK) {
pr_debug("BTRFS: deflate in loop returned %d\n",
@@ -136,6 +184,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
ret = -E2BIG;
goto out;
}
+
/* we need another page for writing out. Test this
* before the total_in so we will pull in a new page for
* the stream end if required
@@ -161,33 +210,42 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
/* we're all done */
if (workspace->strm.total_in >= len)
break;
-
- /* we've read in a full page, get a new one */
- if (workspace->strm.avail_in == 0) {
- if (workspace->strm.total_out > max_out)
- break;
-
- bytes_left = len - workspace->strm.total_in;
- kunmap(in_page);
- put_page(in_page);
-
- start += PAGE_SIZE;
- in_page = find_get_page(mapping,
- start >> PAGE_SHIFT);
- data_in = kmap(in_page);
- workspace->strm.avail_in = min(bytes_left,
- PAGE_SIZE);
- workspace->strm.next_in = data_in;
- }
+ if (workspace->strm.total_out > max_out)
+ break;
}
workspace->strm.avail_in = 0;
- ret = zlib_deflate(&workspace->strm, Z_FINISH);
- zlib_deflateEnd(&workspace->strm);
-
- if (ret != Z_STREAM_END) {
- ret = -EIO;
- goto out;
+ /* call deflate with Z_FINISH flush parameter providing more output
+ * space but no more input data, until it returns with Z_STREAM_END
+ */
+ while (ret != Z_STREAM_END) {
+ ret = zlib_deflate(&workspace->strm, Z_FINISH);
+ if (ret == Z_STREAM_END)
+ break;
+ if (ret != Z_OK && ret != Z_BUF_ERROR) {
+ zlib_deflateEnd(&workspace->strm);
+ ret = -EIO;
+ goto out;
+ } else if (workspace->strm.avail_out == 0) {
+ /* get another page for the stream end */
+ kunmap(out_page);
+ if (nr_pages == nr_dest_pages) {
+ out_page = NULL;
+ ret = -E2BIG;
+ goto out;
+ }
+ out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+ if (out_page == NULL) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ cpage_out = kmap(out_page);
+ pages[nr_pages] = out_page;
+ nr_pages++;
+ workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.next_out = cpage_out;
+ }
}
+ zlib_deflateEnd(&workspace->strm);

if (workspace->strm.total_out >= workspace->strm.total_in) {
ret = -E2BIG;
@@ -231,7 +289,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)

workspace->strm.total_out = 0;
workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;

/* If it's deflate, and it's got no preset dictionary, then
we can tell zlib to skip the adler32 check. */
@@ -270,7 +328,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
}

workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;

if (workspace->strm.avail_in == 0) {
unsigned long tmp;
@@ -320,7 +378,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
workspace->strm.total_in = 0;

workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;
workspace->strm.total_out = 0;
/* If it's deflate, and it's got no preset dictionary, then
we can tell zlib to skip the adler32 check. */
@@ -364,7 +422,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
buf_offset = 0;

bytes = min(PAGE_SIZE - pg_offset,
- PAGE_SIZE - buf_offset);
+ PAGE_SIZE - (buf_offset % PAGE_SIZE));
bytes = min(bytes, bytes_left);

kaddr = kmap_atomic(dest_page);
@@ -375,7 +433,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
bytes_left -= bytes;
next:
workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;
}

if (ret != Z_STREAM_END && bytes_left != 0)
--
2.17.1

2020-01-03 22:35:48

by Zaslonko Mikhail

Subject: [PATCH v3 4/6] s390/boot: Add dfltcc= kernel command line parameter

Add the new kernel command line parameter 'dfltcc=' to configure
s390 zlib hardware support.
Format: { on | off | def_only | inf_only | always }
on: s390 zlib hardware support for compression on
level 1 and decompression (default)
off: No s390 zlib hardware support
def_only: s390 zlib hardware support for deflate
only (compression on level 1)
inf_only: s390 zlib hardware support for inflate
only (decompression)
always: Same as 'on' but ignores the selected compression
level always using hardware support (used for debugging)

Signed-off-by: Mikhail Zaslonko <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 12 ++++++++++++
arch/s390/boot/ipl_parm.c | 14 ++++++++++++++
arch/s390/include/asm/setup.h | 7 +++++++
arch/s390/kernel/setup.c | 2 ++
lib/zlib_dfltcc/dfltcc.c | 5 ++++-
lib/zlib_dfltcc/dfltcc.h | 1 +
lib/zlib_dfltcc/dfltcc_deflate.c | 6 ++++++
lib/zlib_dfltcc/dfltcc_inflate.c | 6 ++++++
lib/zlib_dfltcc/dfltcc_util.h | 4 +++-
9 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ade4e6ec23e0..08fcbaa69877 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -834,6 +834,18 @@
dump out devices still on the deferred probe list after
retrying.

+ dfltcc= [HW,S390]
+ Format: { on | off | def_only | inf_only | always }
+ on: s390 zlib hardware support for compression on
+ level 1 and decompression (default)
+ off: No s390 zlib hardware support
+ def_only: s390 zlib hardware support for deflate
+ only (compression on level 1)
+ inf_only: s390 zlib hardware support for inflate
+ only (decompression)
+ always: Same as 'on' but ignores the selected compression
+ level always using hardware support (used for debugging)
+
dhash_entries= [KNL]
Set number of hash buckets for dentry cache.

diff --git a/arch/s390/boot/ipl_parm.c b/arch/s390/boot/ipl_parm.c
index 24ef67eb1cef..357adad991d2 100644
--- a/arch/s390/boot/ipl_parm.c
+++ b/arch/s390/boot/ipl_parm.c
@@ -14,6 +14,7 @@
char __bootdata(early_command_line)[COMMAND_LINE_SIZE];
struct ipl_parameter_block __bootdata_preserved(ipl_block);
int __bootdata_preserved(ipl_block_valid);
+unsigned int __bootdata_preserved(zlib_dfltcc_support) = ZLIB_DFLTCC_FULL;

unsigned long __bootdata(vmalloc_size) = VMALLOC_DEFAULT_SIZE;
unsigned long __bootdata(memory_end);
@@ -229,6 +230,19 @@ void parse_boot_command_line(void)
if (!strcmp(param, "vmalloc") && val)
vmalloc_size = round_up(memparse(val, NULL), PAGE_SIZE);

+ if (!strcmp(param, "dfltcc")) {
+ if (!strcmp(val, "off"))
+ zlib_dfltcc_support = ZLIB_DFLTCC_DISABLED;
+ else if (!strcmp(val, "on"))
+ zlib_dfltcc_support = ZLIB_DFLTCC_FULL;
+ else if (!strcmp(val, "def_only"))
+ zlib_dfltcc_support = ZLIB_DFLTCC_DEFLATE_ONLY;
+ else if (!strcmp(val, "inf_only"))
+ zlib_dfltcc_support = ZLIB_DFLTCC_INFLATE_ONLY;
+ else if (!strcmp(val, "always"))
+ zlib_dfltcc_support = ZLIB_DFLTCC_FULL_DEBUG;
+ }
+
if (!strcmp(param, "noexec")) {
rc = kstrtobool(val, &enabled);
if (!rc && !enabled)
diff --git a/arch/s390/include/asm/setup.h b/arch/s390/include/asm/setup.h
index 69289e99cabd..b241ddb67caf 100644
--- a/arch/s390/include/asm/setup.h
+++ b/arch/s390/include/asm/setup.h
@@ -79,6 +79,13 @@ struct parmarea {
char command_line[ARCH_COMMAND_LINE_SIZE]; /* 0x10480 */
};

+extern unsigned int zlib_dfltcc_support;
+#define ZLIB_DFLTCC_DISABLED 0
+#define ZLIB_DFLTCC_FULL 1
+#define ZLIB_DFLTCC_DEFLATE_ONLY 2
+#define ZLIB_DFLTCC_INFLATE_ONLY 3
+#define ZLIB_DFLTCC_FULL_DEBUG 4
+
extern int noexec_disabled;
extern int memory_end_set;
extern unsigned long memory_end;
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 9cbf490fd162..a577d7df379c 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -111,6 +111,8 @@ unsigned long __bootdata_preserved(__etext_dma);
unsigned long __bootdata_preserved(__sdma);
unsigned long __bootdata_preserved(__edma);
unsigned long __bootdata_preserved(__kaslr_offset);
+unsigned int __bootdata_preserved(zlib_dfltcc_support);
+EXPORT_SYMBOL(zlib_dfltcc_support);

unsigned long VMALLOC_START;
EXPORT_SYMBOL(VMALLOC_START);
diff --git a/lib/zlib_dfltcc/dfltcc.c b/lib/zlib_dfltcc/dfltcc.c
index 7f77e5bb01c6..c30de430b30c 100644
--- a/lib/zlib_dfltcc/dfltcc.c
+++ b/lib/zlib_dfltcc/dfltcc.c
@@ -44,7 +44,10 @@ void dfltcc_reset(
dfltcc_state->param.nt = 1;

/* Initialize tuning parameters */
- dfltcc_state->level_mask = DFLTCC_LEVEL_MASK;
+ if (zlib_dfltcc_support == ZLIB_DFLTCC_FULL_DEBUG)
+ dfltcc_state->level_mask = DFLTCC_LEVEL_MASK_DEBUG;
+ else
+ dfltcc_state->level_mask = DFLTCC_LEVEL_MASK;
dfltcc_state->block_size = DFLTCC_BLOCK_SIZE;
dfltcc_state->block_threshold = DFLTCC_FIRST_FHT_BLOCK_SIZE;
dfltcc_state->dht_threshold = DFLTCC_DHT_MIN_SAMPLE_SIZE;
diff --git a/lib/zlib_dfltcc/dfltcc.h b/lib/zlib_dfltcc/dfltcc.h
index 4782c92bb2ff..be70c807b62f 100644
--- a/lib/zlib_dfltcc/dfltcc.h
+++ b/lib/zlib_dfltcc/dfltcc.h
@@ -8,6 +8,7 @@
* Tuning parameters.
*/
#define DFLTCC_LEVEL_MASK 0x2 /* DFLTCC compression for level 1 only */
+#define DFLTCC_LEVEL_MASK_DEBUG 0x3fe /* DFLTCC compression for all levels */
#define DFLTCC_BLOCK_SIZE 1048576
#define DFLTCC_FIRST_FHT_BLOCK_SIZE 4096
#define DFLTCC_DHT_MIN_SAMPLE_SIZE 4096
diff --git a/lib/zlib_dfltcc/dfltcc_deflate.c b/lib/zlib_dfltcc/dfltcc_deflate.c
index b9d16917c30a..aa4539f5c071 100644
--- a/lib/zlib_dfltcc/dfltcc_deflate.c
+++ b/lib/zlib_dfltcc/dfltcc_deflate.c
@@ -3,6 +3,7 @@
#include "../zlib_deflate/defutil.h"
#include "dfltcc_util.h"
#include "dfltcc.h"
+#include <asm/setup.h>
#include <linux/zutil.h>

/*
@@ -15,6 +16,11 @@ int dfltcc_can_deflate(
deflate_state *state = (deflate_state *)strm->state;
struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);

+ /* Check for kernel dfltcc command line parameter */
+ if (zlib_dfltcc_support == ZLIB_DFLTCC_DISABLED ||
+ zlib_dfltcc_support == ZLIB_DFLTCC_INFLATE_ONLY)
+ return 0;
+
/* Unsupported compression settings */
if (!dfltcc_are_params_ok(state->level, state->w_bits, state->strategy,
dfltcc_state->level_mask))
diff --git a/lib/zlib_dfltcc/dfltcc_inflate.c b/lib/zlib_dfltcc/dfltcc_inflate.c
index 12a93a06bd61..aa9ef23474df 100644
--- a/lib/zlib_dfltcc/dfltcc_inflate.c
+++ b/lib/zlib_dfltcc/dfltcc_inflate.c
@@ -3,6 +3,7 @@
#include "../zlib_inflate/inflate.h"
#include "dfltcc_util.h"
#include "dfltcc.h"
+#include <asm/setup.h>
#include <linux/zutil.h>

/*
@@ -15,6 +16,11 @@ int dfltcc_can_inflate(
struct inflate_state *state = (struct inflate_state *)strm->state;
struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);

+ /* Check for kernel dfltcc command line parameter */
+ if (zlib_dfltcc_support == ZLIB_DFLTCC_DISABLED ||
+ zlib_dfltcc_support == ZLIB_DFLTCC_DEFLATE_ONLY)
+ return 0;
+
/* Unsupported compression settings */
if (state->wbits != HB_BITS)
return 0;
diff --git a/lib/zlib_dfltcc/dfltcc_util.h b/lib/zlib_dfltcc/dfltcc_util.h
index 82cd1950c416..7c0d3bdc50a9 100644
--- a/lib/zlib_dfltcc/dfltcc_util.h
+++ b/lib/zlib_dfltcc/dfltcc_util.h
@@ -4,6 +4,7 @@

#include <linux/zutil.h>
#include <asm/facility.h>
+#include <asm/setup.h>

/*
* C wrapper for the DEFLATE CONVERSION CALL instruction.
@@ -102,7 +103,8 @@ static inline int dfltcc_are_params_ok(

static inline int is_dfltcc_enabled(void)
{
- return test_facility(DFLTCC_FACILITY);
+ return (zlib_dfltcc_support != ZLIB_DFLTCC_DISABLED &&
+ test_facility(DFLTCC_FACILITY));
}

char *oesc_msg(char *buf, int oesc);
--
2.17.1

2020-01-06 16:42:02

by Josef Bacik

Subject: Re: [PATCH v3 6/6] btrfs: Use larger zlib buffer for s390 hardware compression

On 1/3/20 5:33 PM, Mikhail Zaslonko wrote:
> In order to benefit from s390 zlib hardware compression support,
> increase the btrfs zlib workspace buffer size from 1 to 4 pages (if
> s390 zlib hardware support is enabled on the machine). This brings up
> to 60% better performance in hardware on s390 compared to the PAGE_SIZE
> buffer and much more compared to the software zlib processing in btrfs.
> In case of memory pressure fall back to a single page buffer during
> workspace allocation.
>
> Signed-off-by: Mikhail Zaslonko <[email protected]>
> ---
> fs/btrfs/compression.c | 2 +-
> fs/btrfs/zlib.c | 128 ++++++++++++++++++++++++++++++-----------
> 2 files changed, 94 insertions(+), 36 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index ee834ef7beb4..6bd0e75a822c 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -1285,7 +1285,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
> /* copy bytes from the working buffer into the pages */
> while (working_bytes > 0) {
> bytes = min_t(unsigned long, bvec.bv_len,
> - PAGE_SIZE - buf_offset);
> + PAGE_SIZE - (buf_offset % PAGE_SIZE));
> bytes = min(bytes, working_bytes);
>
> kaddr = kmap_atomic(bvec.bv_page);
> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
> index a6c90a003c12..159486801631 100644
> --- a/fs/btrfs/zlib.c
> +++ b/fs/btrfs/zlib.c
> @@ -20,9 +20,12 @@
> #include <linux/refcount.h>
> #include "compression.h"
>
> +#define ZLIB_DFLTCC_BUF_SIZE (4 * PAGE_SIZE)
> +
> struct workspace {
> z_stream strm;
> char *buf;
> + unsigned long buf_size;
> struct list_head list;
> int level;
> };
> @@ -61,7 +64,17 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
> zlib_inflate_workspacesize());
> workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
> workspace->level = level;
> - workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> + workspace->buf = NULL;
> + if (zlib_deflate_dfltcc_enabled()) {
> + workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
> + __GFP_NOMEMALLOC | __GFP_NORETRY |
> + __GFP_NOWARN | GFP_NOIO);
> + workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
> + }
> + if (!workspace->buf) {
> + workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> + workspace->buf_size = PAGE_SIZE;
> + }
> if (!workspace->strm.workspace || !workspace->buf)
> goto fail;
>
> @@ -78,6 +91,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> unsigned long *total_in, unsigned long *total_out)
> {
> struct workspace *workspace = list_entry(ws, struct workspace, list);
> + int i;
> int ret;
> char *data_in;
> char *cpage_out;
> @@ -85,6 +99,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> struct page *in_page = NULL;
> struct page *out_page = NULL;
> unsigned long bytes_left;
> + unsigned long in_buf_pages;
> unsigned long len = *total_out;
> unsigned long nr_dest_pages = *out_pages;
> const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
> @@ -102,9 +117,6 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> workspace->strm.total_in = 0;
> workspace->strm.total_out = 0;
>
> - in_page = find_get_page(mapping, start >> PAGE_SHIFT);
> - data_in = kmap(in_page);
> -
> out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> if (out_page == NULL) {
> ret = -ENOMEM;
> @@ -114,12 +126,48 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> pages[0] = out_page;
> nr_pages = 1;
>
> - workspace->strm.next_in = data_in;
> + workspace->strm.next_in = workspace->buf;
> + workspace->strm.avail_in = 0;
> workspace->strm.next_out = cpage_out;
> workspace->strm.avail_out = PAGE_SIZE;
> - workspace->strm.avail_in = min(len, PAGE_SIZE);
>
> while (workspace->strm.total_in < len) {
> + /* get next input pages and copy the contents to
> + * the workspace buffer if required
> + */
> + if (workspace->strm.avail_in == 0) {
> + bytes_left = len - workspace->strm.total_in;
> + in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
> + workspace->buf_size / PAGE_SIZE);
> + if (in_buf_pages > 1) {
> + for (i = 0; i < in_buf_pages; i++) {
> + if (in_page) {
> + kunmap(in_page);
> + put_page(in_page);
> + }
> + in_page = find_get_page(mapping,
> + start >> PAGE_SHIFT);
> + data_in = kmap(in_page);
> + memcpy(workspace->buf + i*PAGE_SIZE,
> + data_in, PAGE_SIZE);
> + start += PAGE_SIZE;
> + }

Is there a reason to leave the last in_page mapped here? I realize we'll clean
it up further down, but since we're copying everything into the buf anyway why
not just map->copy->unmap for everything?

> + workspace->strm.next_in = workspace->buf;
> + } else {
> + if (in_page) {
> + kunmap(in_page);
> + put_page(in_page);
> + }
> + in_page = find_get_page(mapping,
> + start >> PAGE_SHIFT);
> + data_in = kmap(in_page);
> + start += PAGE_SIZE;
> + workspace->strm.next_in = data_in;
> + }
> + workspace->strm.avail_in = min(bytes_left,
> + workspace->buf_size);
> + }
> +
> ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
> if (ret != Z_OK) {
> pr_debug("BTRFS: deflate in loop returned %d\n",
> @@ -136,6 +184,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> ret = -E2BIG;
> goto out;
> }
> +

Extra newline. Thanks,

Josef

2020-01-06 18:45:19

by David Sterba

Subject: Re: [PATCH v3 6/6] btrfs: Use larger zlib buffer for s390 hardware compression

On Fri, Jan 03, 2020 at 11:33:34PM +0100, Mikhail Zaslonko wrote:
> In order to benefit from s390 zlib hardware compression support,
> increase the btrfs zlib workspace buffer size from 1 to 4 pages (if
> s390 zlib hardware support is enabled on the machine). This brings up
> to 60% better performance in hardware on s390 compared to the PAGE_SIZE
> buffer and much more compared to the software zlib processing in btrfs.
> In case of memory pressure fall back to a single page buffer during
> workspace allocation.

You could also summarize the previous discussion, e.g. the question
whether zlib will decompress the stream produced on larger input
buffers on a system that has only one page available for each
decompression round.

> Signed-off-by: Mikhail Zaslonko <[email protected]>
> ---
> fs/btrfs/compression.c | 2 +-
> fs/btrfs/zlib.c | 128 ++++++++++++++++++++++++++++++-----------
> 2 files changed, 94 insertions(+), 36 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index ee834ef7beb4..6bd0e75a822c 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -1285,7 +1285,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
> /* copy bytes from the working buffer into the pages */
> while (working_bytes > 0) {
> bytes = min_t(unsigned long, bvec.bv_len,
> - PAGE_SIZE - buf_offset);
> + PAGE_SIZE - (buf_offset % PAGE_SIZE));
> bytes = min(bytes, working_bytes);
>
> kaddr = kmap_atomic(bvec.bv_page);
> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
> index a6c90a003c12..159486801631 100644
> --- a/fs/btrfs/zlib.c
> +++ b/fs/btrfs/zlib.c
> @@ -20,9 +20,12 @@
> #include <linux/refcount.h>
> #include "compression.h"
>
> +#define ZLIB_DFLTCC_BUF_SIZE (4 * PAGE_SIZE)

Please add a comment what's this constant for

> +
> struct workspace {
> z_stream strm;
> char *buf;
> + unsigned long buf_size;

int should be enough here

> struct list_head list;
> int level;
> };
> @@ -61,7 +64,17 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
> zlib_inflate_workspacesize());
> workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
> workspace->level = level;
> - workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> + workspace->buf = NULL;
> + if (zlib_deflate_dfltcc_enabled()) {
> + workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
> + __GFP_NOMEMALLOC | __GFP_NORETRY |
> + __GFP_NOWARN | GFP_NOIO);

Why do you use this wild GFP flag combination? I can understand NOWARN,
but why the others?

> + workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
> + }
> + if (!workspace->buf) {
> + workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> + workspace->buf_size = PAGE_SIZE;
> + }
> if (!workspace->strm.workspace || !workspace->buf)
> goto fail;
>
> @@ -78,6 +91,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> unsigned long *total_in, unsigned long *total_out)
> {
> struct workspace *workspace = list_entry(ws, struct workspace, list);
> + int i;

This should be declared in the inner most scope of use

> int ret;
> char *data_in;
> char *cpage_out;
> @@ -85,6 +99,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> struct page *in_page = NULL;
> struct page *out_page = NULL;
> unsigned long bytes_left;
> + unsigned long in_buf_pages;

long does not seem to be necessary, int

> unsigned long len = *total_out;
> unsigned long nr_dest_pages = *out_pages;
> const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
> @@ -102,9 +117,6 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> workspace->strm.total_in = 0;
> workspace->strm.total_out = 0;
>
> - in_page = find_get_page(mapping, start >> PAGE_SHIFT);
> - data_in = kmap(in_page);
> -
> out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> if (out_page == NULL) {
> ret = -ENOMEM;
> @@ -114,12 +126,48 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> pages[0] = out_page;
> nr_pages = 1;
>
> - workspace->strm.next_in = data_in;
> + workspace->strm.next_in = workspace->buf;
> + workspace->strm.avail_in = 0;
> workspace->strm.next_out = cpage_out;
> workspace->strm.avail_out = PAGE_SIZE;
> - workspace->strm.avail_in = min(len, PAGE_SIZE);
>
> while (workspace->strm.total_in < len) {
> + /* get next input pages and copy the contents to
> + * the workspace buffer if required
> + */

Please format the comment properly

> + if (workspace->strm.avail_in == 0) {
> + bytes_left = len - workspace->strm.total_in;
> + in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
> + workspace->buf_size / PAGE_SIZE);
> + if (in_buf_pages > 1) {

int i;

> + for (i = 0; i < in_buf_pages; i++) {
> + if (in_page) {
> + kunmap(in_page);
> + put_page(in_page);
> + }
> + in_page = find_get_page(mapping,
> + start >> PAGE_SHIFT);
> + data_in = kmap(in_page);
> + memcpy(workspace->buf + i*PAGE_SIZE,

spaces around '*': i * PAGE_SIZE

> + data_in, PAGE_SIZE);
> + start += PAGE_SIZE;
> + }
> + workspace->strm.next_in = workspace->buf;
> + } else {
> + if (in_page) {
> + kunmap(in_page);
> + put_page(in_page);
> + }
> + in_page = find_get_page(mapping,
> + start >> PAGE_SHIFT);
> + data_in = kmap(in_page);
> + start += PAGE_SIZE;
> + workspace->strm.next_in = data_in;
> + }
> + workspace->strm.avail_in = min(bytes_left,
> + workspace->buf_size);
> + }
> +
> ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
> if (ret != Z_OK) {
> pr_debug("BTRFS: deflate in loop returned %d\n",
> @@ -136,6 +184,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> ret = -E2BIG;
> goto out;
> }
> +
> /* we need another page for writing out. Test this
> * before the total_in so we will pull in a new page for
> * the stream end if required
> @@ -161,33 +210,42 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> /* we're all done */
> if (workspace->strm.total_in >= len)
> break;
> -
> - /* we've read in a full page, get a new one */
> - if (workspace->strm.avail_in == 0) {
> - if (workspace->strm.total_out > max_out)
> - break;
> -
> - bytes_left = len - workspace->strm.total_in;
> - kunmap(in_page);
> - put_page(in_page);
> -
> - start += PAGE_SIZE;
> - in_page = find_get_page(mapping,
> - start >> PAGE_SHIFT);
> - data_in = kmap(in_page);
> - workspace->strm.avail_in = min(bytes_left,
> - PAGE_SIZE);
> - workspace->strm.next_in = data_in;
> - }
> + if (workspace->strm.total_out > max_out)
> + break;
> }
> workspace->strm.avail_in = 0;
> - ret = zlib_deflate(&workspace->strm, Z_FINISH);
> - zlib_deflateEnd(&workspace->strm);
> -
> - if (ret != Z_STREAM_END) {
> - ret = -EIO;
> - goto out;
> + /* call deflate with Z_FINISH flush parameter providing more output
> + * space but no more input data, until it returns with Z_STREAM_END
> + */

Comment formatting

> + while (ret != Z_STREAM_END) {
> + ret = zlib_deflate(&workspace->strm, Z_FINISH);
> + if (ret == Z_STREAM_END)
> + break;
> + if (ret != Z_OK && ret != Z_BUF_ERROR) {
> + zlib_deflateEnd(&workspace->strm);
> + ret = -EIO;
> + goto out;
> + } else if (workspace->strm.avail_out == 0) {
> + /* get another page for the stream end */
> + kunmap(out_page);
> + if (nr_pages == nr_dest_pages) {
> + out_page = NULL;
> + ret = -E2BIG;
> + goto out;
> + }
> + out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> + if (out_page == NULL) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + cpage_out = kmap(out_page);
> + pages[nr_pages] = out_page;
> + nr_pages++;
> + workspace->strm.avail_out = PAGE_SIZE;
> + workspace->strm.next_out = cpage_out;
> + }
> }
> + zlib_deflateEnd(&workspace->strm);
>
> if (workspace->strm.total_out >= workspace->strm.total_in) {
> ret = -E2BIG;
> @@ -231,7 +289,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>
> workspace->strm.total_out = 0;
> workspace->strm.next_out = workspace->buf;
> - workspace->strm.avail_out = PAGE_SIZE;
> + workspace->strm.avail_out = workspace->buf_size;
>
> /* If it's deflate, and it's got no preset dictionary, then
> we can tell zlib to skip the adler32 check. */
> @@ -270,7 +328,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
> }
>
> workspace->strm.next_out = workspace->buf;
> - workspace->strm.avail_out = PAGE_SIZE;
> + workspace->strm.avail_out = workspace->buf_size;
>
> if (workspace->strm.avail_in == 0) {
> unsigned long tmp;
> @@ -320,7 +378,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
> workspace->strm.total_in = 0;
>
> workspace->strm.next_out = workspace->buf;
> - workspace->strm.avail_out = PAGE_SIZE;
> + workspace->strm.avail_out = workspace->buf_size;
> workspace->strm.total_out = 0;
> /* If it's deflate, and it's got no preset dictionary, then
> we can tell zlib to skip the adler32 check. */
> @@ -364,7 +422,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
> buf_offset = 0;
>
> bytes = min(PAGE_SIZE - pg_offset,
> - PAGE_SIZE - buf_offset);
> + PAGE_SIZE - (buf_offset % PAGE_SIZE));
> bytes = min(bytes, bytes_left);
>
> kaddr = kmap_atomic(dest_page);
> @@ -375,7 +433,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
> bytes_left -= bytes;
> next:
> workspace->strm.next_out = workspace->buf;
> - workspace->strm.avail_out = PAGE_SIZE;
> + workspace->strm.avail_out = workspace->buf_size;
> }
>
> if (ret != Z_STREAM_END && bytes_left != 0)
> --
> 2.17.1
>

2020-01-07 00:20:06

by Eduard Shishkin

Subject: Re: [PATCH v3 6/6] btrfs: Use larger zlib buffer for s390 hardware compression



On 1/6/20 7:43 PM, David Sterba wrote:
> On Fri, Jan 03, 2020 at 11:33:34PM +0100, Mikhail Zaslonko wrote:
>> In order to benefit from s390 zlib hardware compression support,
>> increase the btrfs zlib workspace buffer size from 1 to 4 pages (if
>> s390 zlib hardware support is enabled on the machine). This brings up
>> to 60% better performance in hardware on s390 compared to the PAGE_SIZE
>> buffer and much more compared to the software zlib processing in btrfs.
>> In case of memory pressure fall back to a single page buffer during
>> workspace allocation.
> You could also summarize the previous discussion, e.g. the question
> whether zlib will decompress the stream produced on larger input
> buffers on a system that has only one page available for each
> decompression round.
>
>> Signed-off-by: Mikhail Zaslonko <[email protected]>
>> ---
>> fs/btrfs/compression.c | 2 +-
>> fs/btrfs/zlib.c | 128 ++++++++++++++++++++++++++++++-----------
>> 2 files changed, 94 insertions(+), 36 deletions(-)
>>
>> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
>> index ee834ef7beb4..6bd0e75a822c 100644
>> --- a/fs/btrfs/compression.c
>> +++ b/fs/btrfs/compression.c
>> @@ -1285,7 +1285,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
>> /* copy bytes from the working buffer into the pages */
>> while (working_bytes > 0) {
>> bytes = min_t(unsigned long, bvec.bv_len,
>> - PAGE_SIZE - buf_offset);
>> + PAGE_SIZE - (buf_offset % PAGE_SIZE));
>> bytes = min(bytes, working_bytes);
>>
>> kaddr = kmap_atomic(bvec.bv_page);
>> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
>> index a6c90a003c12..159486801631 100644
>> --- a/fs/btrfs/zlib.c
>> +++ b/fs/btrfs/zlib.c
>> @@ -20,9 +20,12 @@
>> #include <linux/refcount.h>
>> #include "compression.h"
>>
>> +#define ZLIB_DFLTCC_BUF_SIZE (4 * PAGE_SIZE)
> Please add a comment what's this constant for
>
>> +
>> struct workspace {
>> z_stream strm;
>> char *buf;
>> + unsigned long buf_size;
> int should be enough here
>
>> struct list_head list;
>> int level;
>> };
>> @@ -61,7 +64,17 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
>> zlib_inflate_workspacesize());
>> workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
>> workspace->level = level;
>> - workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> + workspace->buf = NULL;
>> + if (zlib_deflate_dfltcc_enabled()) {
>> + workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
>> + __GFP_NOMEMALLOC | __GFP_NORETRY |
>> + __GFP_NOWARN | GFP_NOIO);
> Why do you use this wild GFP flag combination? I can understand NOWARN,
> but why the others?

This addresses the following complaint about an order-2 allocation with
GFP_KERNEL:
https://lkml.org/lkml/2019/11/26/417

Below, a fallback to a single page is implemented, as was suggested
there. So the initial (more costly) allocation should be made with
minimal aggression, to allow the allocator to fail; otherwise that
fallback simply doesn't make sense. Hence such a combination.

Thanks,
Eduard.

>> + workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
>> + }
>> + if (!workspace->buf) {
>> + workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> + workspace->buf_size = PAGE_SIZE;
>> + }
>> if (!workspace->strm.workspace || !workspace->buf)
>> goto fail;
>>
>> @@ -78,6 +91,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>> unsigned long *total_in, unsigned long *total_out)
>> {
>> struct workspace *workspace = list_entry(ws, struct workspace, list);
>> + int i;
> This should be declared in the innermost scope of use
>
>> int ret;
>> char *data_in;
>> char *cpage_out;
>> @@ -85,6 +99,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>> struct page *in_page = NULL;
>> struct page *out_page = NULL;
>> unsigned long bytes_left;
>> + unsigned long in_buf_pages;
> long does not seem to be necessary, int
>
>> unsigned long len = *total_out;
>> unsigned long nr_dest_pages = *out_pages;
>> const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
>> @@ -102,9 +117,6 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>> workspace->strm.total_in = 0;
>> workspace->strm.total_out = 0;
>>
>> - in_page = find_get_page(mapping, start >> PAGE_SHIFT);
>> - data_in = kmap(in_page);
>> -
>> out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>> if (out_page == NULL) {
>> ret = -ENOMEM;
>> @@ -114,12 +126,48 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>> pages[0] = out_page;
>> nr_pages = 1;
>>
>> - workspace->strm.next_in = data_in;
>> + workspace->strm.next_in = workspace->buf;
>> + workspace->strm.avail_in = 0;
>> workspace->strm.next_out = cpage_out;
>> workspace->strm.avail_out = PAGE_SIZE;
>> - workspace->strm.avail_in = min(len, PAGE_SIZE);
>>
>> while (workspace->strm.total_in < len) {
>> + /* get next input pages and copy the contents to
>> + * the workspace buffer if required
>> + */
> Please format the comment properly
>
>> + if (workspace->strm.avail_in == 0) {
>> + bytes_left = len - workspace->strm.total_in;
>> + in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
>> + workspace->buf_size / PAGE_SIZE);
>> + if (in_buf_pages > 1) {
> int i;
>
>> + for (i = 0; i < in_buf_pages; i++) {
>> + if (in_page) {
>> + kunmap(in_page);
>> + put_page(in_page);
>> + }
>> + in_page = find_get_page(mapping,
>> + start >> PAGE_SHIFT);
>> + data_in = kmap(in_page);
>> + memcpy(workspace->buf + i*PAGE_SIZE,
> spaces around '*': i * PAGE_SIZE
>
>> + data_in, PAGE_SIZE);
>> + start += PAGE_SIZE;
>> + }
>> + workspace->strm.next_in = workspace->buf;
>> + } else {
>> + if (in_page) {
>> + kunmap(in_page);
>> + put_page(in_page);
>> + }
>> + in_page = find_get_page(mapping,
>> + start >> PAGE_SHIFT);
>> + data_in = kmap(in_page);
>> + start += PAGE_SIZE;
>> + workspace->strm.next_in = data_in;
>> + }
>> + workspace->strm.avail_in = min(bytes_left,
>> + workspace->buf_size);
>> + }
>> +
>> ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
>> if (ret != Z_OK) {
>> pr_debug("BTRFS: deflate in loop returned %d\n",
>> @@ -136,6 +184,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>> ret = -E2BIG;
>> goto out;
>> }
>> +
>> /* we need another page for writing out. Test this
>> * before the total_in so we will pull in a new page for
>> * the stream end if required
>> @@ -161,33 +210,42 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>> /* we're all done */
>> if (workspace->strm.total_in >= len)
>> break;
>> -
>> - /* we've read in a full page, get a new one */
>> - if (workspace->strm.avail_in == 0) {
>> - if (workspace->strm.total_out > max_out)
>> - break;
>> -
>> - bytes_left = len - workspace->strm.total_in;
>> - kunmap(in_page);
>> - put_page(in_page);
>> -
>> - start += PAGE_SIZE;
>> - in_page = find_get_page(mapping,
>> - start >> PAGE_SHIFT);
>> - data_in = kmap(in_page);
>> - workspace->strm.avail_in = min(bytes_left,
>> - PAGE_SIZE);
>> - workspace->strm.next_in = data_in;
>> - }
>> + if (workspace->strm.total_out > max_out)
>> + break;
>> }
>> workspace->strm.avail_in = 0;
>> - ret = zlib_deflate(&workspace->strm, Z_FINISH);
>> - zlib_deflateEnd(&workspace->strm);
>> -
>> - if (ret != Z_STREAM_END) {
>> - ret = -EIO;
>> - goto out;
>> + /* call deflate with Z_FINISH flush parameter providing more output
>> + * space but no more input data, until it returns with Z_STREAM_END
>> + */
> Comment formatting
>
>> + while (ret != Z_STREAM_END) {
>> + ret = zlib_deflate(&workspace->strm, Z_FINISH);
>> + if (ret == Z_STREAM_END)
>> + break;
>> + if (ret != Z_OK && ret != Z_BUF_ERROR) {
>> + zlib_deflateEnd(&workspace->strm);
>> + ret = -EIO;
>> + goto out;
>> + } else if (workspace->strm.avail_out == 0) {
>> + /* get another page for the stream end */
>> + kunmap(out_page);
>> + if (nr_pages == nr_dest_pages) {
>> + out_page = NULL;
>> + ret = -E2BIG;
>> + goto out;
>> + }
>> + out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>> + if (out_page == NULL) {
>> + ret = -ENOMEM;
>> + goto out;
>> + }
>> + cpage_out = kmap(out_page);
>> + pages[nr_pages] = out_page;
>> + nr_pages++;
>> + workspace->strm.avail_out = PAGE_SIZE;
>> + workspace->strm.next_out = cpage_out;
>> + }
>> }
>> + zlib_deflateEnd(&workspace->strm);
>>
>> if (workspace->strm.total_out >= workspace->strm.total_in) {
>> ret = -E2BIG;
>> @@ -231,7 +289,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>>
>> workspace->strm.total_out = 0;
>> workspace->strm.next_out = workspace->buf;
>> - workspace->strm.avail_out = PAGE_SIZE;
>> + workspace->strm.avail_out = workspace->buf_size;
>>
>> /* If it's deflate, and it's got no preset dictionary, then
>> we can tell zlib to skip the adler32 check. */
>> @@ -270,7 +328,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>> }
>>
>> workspace->strm.next_out = workspace->buf;
>> - workspace->strm.avail_out = PAGE_SIZE;
>> + workspace->strm.avail_out = workspace->buf_size;
>>
>> if (workspace->strm.avail_in == 0) {
>> unsigned long tmp;
>> @@ -320,7 +378,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
>> workspace->strm.total_in = 0;
>>
>> workspace->strm.next_out = workspace->buf;
>> - workspace->strm.avail_out = PAGE_SIZE;
>> + workspace->strm.avail_out = workspace->buf_size;
>> workspace->strm.total_out = 0;
>> /* If it's deflate, and it's got no preset dictionary, then
>> we can tell zlib to skip the adler32 check. */
>> @@ -364,7 +422,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
>> buf_offset = 0;
>>
>> bytes = min(PAGE_SIZE - pg_offset,
>> - PAGE_SIZE - buf_offset);
>> + PAGE_SIZE - (buf_offset % PAGE_SIZE));
>> bytes = min(bytes, bytes_left);
>>
>> kaddr = kmap_atomic(dest_page);
>> @@ -375,7 +433,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
>> bytes_left -= bytes;
>> next:
>> workspace->strm.next_out = workspace->buf;
>> - workspace->strm.avail_out = PAGE_SIZE;
>> + workspace->strm.avail_out = workspace->buf_size;
>> }
>>
>> if (ret != Z_STREAM_END && bytes_left != 0)
>> --
>> 2.17.1
>>

2020-01-07 10:33:37

by Zaslonko Mikhail

Subject: Re: [PATCH v3 6/6] btrfs: Use larger zlib buffer for s390 hardware compression

Hello,

On 06.01.2020 17:40, Josef Bacik wrote:
> On 1/3/20 5:33 PM, Mikhail Zaslonko wrote:
>> In order to benefit from s390 zlib hardware compression support,
>> increase the btrfs zlib workspace buffer size from 1 to 4 pages (if
>> s390 zlib hardware support is enabled on the machine). This brings up
>> to 60% better performance in hardware on s390 compared to the PAGE_SIZE
>> buffer and much more compared to the software zlib processing in btrfs.
>> In case of memory pressure fall back to a single page buffer during
>> workspace allocation.
>>
>> Signed-off-by: Mikhail Zaslonko <[email protected]>
>> ---
>>   fs/btrfs/compression.c |   2 +-
>>   fs/btrfs/zlib.c        | 128 ++++++++++++++++++++++++++++++-----------
>>   2 files changed, 94 insertions(+), 36 deletions(-)
>>
>> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
>> index ee834ef7beb4..6bd0e75a822c 100644
>> --- a/fs/btrfs/compression.c
>> +++ b/fs/btrfs/compression.c
>> @@ -1285,7 +1285,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
>>       /* copy bytes from the working buffer into the pages */
>>       while (working_bytes > 0) {
>>           bytes = min_t(unsigned long, bvec.bv_len,
>> -                PAGE_SIZE - buf_offset);
>> +                PAGE_SIZE - (buf_offset % PAGE_SIZE));
>>           bytes = min(bytes, working_bytes);
>>             kaddr = kmap_atomic(bvec.bv_page);
>> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
>> index a6c90a003c12..159486801631 100644
>> --- a/fs/btrfs/zlib.c
>> +++ b/fs/btrfs/zlib.c
>> @@ -20,9 +20,12 @@
>>   #include <linux/refcount.h>
>>   #include "compression.h"
>>   +#define ZLIB_DFLTCC_BUF_SIZE    (4 * PAGE_SIZE)
>> +
>>   struct workspace {
>>       z_stream strm;
>>       char *buf;
>> +    unsigned long buf_size;
>>       struct list_head list;
>>       int level;
>>   };
>> @@ -61,7 +64,17 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
>>               zlib_inflate_workspacesize());
>>       workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
>>       workspace->level = level;
>> -    workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> +    workspace->buf = NULL;
>> +    if (zlib_deflate_dfltcc_enabled()) {
>> +        workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
>> +                     __GFP_NOMEMALLOC | __GFP_NORETRY |
>> +                     __GFP_NOWARN | GFP_NOIO);
>> +        workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
>> +    }
>> +    if (!workspace->buf) {
>> +        workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> +        workspace->buf_size = PAGE_SIZE;
>> +    }
>>       if (!workspace->strm.workspace || !workspace->buf)
>>           goto fail;
>>   @@ -78,6 +91,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>           unsigned long *total_in, unsigned long *total_out)
>>   {
>>       struct workspace *workspace = list_entry(ws, struct workspace, list);
>> +    int i;
>>       int ret;
>>       char *data_in;
>>       char *cpage_out;
>> @@ -85,6 +99,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>       struct page *in_page = NULL;
>>       struct page *out_page = NULL;
>>       unsigned long bytes_left;
>> +    unsigned long in_buf_pages;
>>       unsigned long len = *total_out;
>>       unsigned long nr_dest_pages = *out_pages;
>>       const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
>> @@ -102,9 +117,6 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>       workspace->strm.total_in = 0;
>>       workspace->strm.total_out = 0;
>>   -    in_page = find_get_page(mapping, start >> PAGE_SHIFT);
>> -    data_in = kmap(in_page);
>> -
>>       out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>>       if (out_page == NULL) {
>>           ret = -ENOMEM;
>> @@ -114,12 +126,48 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>       pages[0] = out_page;
>>       nr_pages = 1;
>>   -    workspace->strm.next_in = data_in;
>> +    workspace->strm.next_in = workspace->buf;
>> +    workspace->strm.avail_in = 0;
>>       workspace->strm.next_out = cpage_out;
>>       workspace->strm.avail_out = PAGE_SIZE;
>> -    workspace->strm.avail_in = min(len, PAGE_SIZE);
>>         while (workspace->strm.total_in < len) {
>> +        /* get next input pages and copy the contents to
>> +         * the workspace buffer if required
>> +         */
>> +        if (workspace->strm.avail_in == 0) {
>> +            bytes_left = len - workspace->strm.total_in;
>> +            in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
>> +                       workspace->buf_size / PAGE_SIZE);
>> +            if (in_buf_pages > 1) {
>> +                for (i = 0; i < in_buf_pages; i++) {
>> +                    if (in_page) {
>> +                        kunmap(in_page);
>> +                        put_page(in_page);
>> +                    }
>> +                    in_page = find_get_page(mapping,
>> +                                start >> PAGE_SHIFT);
>> +                    data_in = kmap(in_page);
>> +                    memcpy(workspace->buf + i*PAGE_SIZE,
>> +                           data_in, PAGE_SIZE);
>> +                    start += PAGE_SIZE;
>> +                }
>
> Is there a reason to leave the last in_page mapped here?  I realize we'll clean it up further down, but since we're copying everything into the buf anyway, why not just map->copy->unmap for everything?

The idea was not to copy the last data chunk if it fits in a single page, thus using the same code flow as for a PAGE_SIZE buffer (the ELSE branch below).

>
>> +                workspace->strm.next_in = workspace->buf;
>> +            } else {
>> +                if (in_page) {
>> +                    kunmap(in_page);
>> +                    put_page(in_page);
>> +                }
>> +                in_page = find_get_page(mapping,
>> +                            start >> PAGE_SHIFT);
>> +                data_in = kmap(in_page);
>> +                start += PAGE_SIZE;
>> +                workspace->strm.next_in = data_in;
>> +            }
>> +            workspace->strm.avail_in = min(bytes_left,
>> +                               workspace->buf_size);
>> +        }
>> +
>>           ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
>>           if (ret != Z_OK) {
>>               pr_debug("BTRFS: deflate in loop returned %d\n",
>> @@ -136,6 +184,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>               ret = -E2BIG;
>>               goto out;
>>           }
>> +
>
> Extra newline.  Thanks,
>
> Josef

2020-01-07 14:33:42

by David Sterba

Subject: Re: [PATCH v3 6/6] btrfs: Use larger zlib buffer for s390 hardware compression

On Tue, Jan 07, 2020 at 01:18:06AM +0100, Eduard Shishkin wrote:
> >> @@ -61,7 +64,17 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
> >> zlib_inflate_workspacesize());
> >> workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
> >> workspace->level = level;
> >> - workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> >> + workspace->buf = NULL;
> >> + if (zlib_deflate_dfltcc_enabled()) {
> >> + workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
> >> + __GFP_NOMEMALLOC | __GFP_NORETRY |
> >> + __GFP_NOWARN | GFP_NOIO);
> > Why do you use this wild GFP flag combination? I can understand NOWARN,
> > but why the others?
>
> This addresses the following complaint about an order-2 allocation with
> GFP_KERNEL:
> https://lkml.org/lkml/2019/11/26/417
>
> Below, a fallback to a single page is implemented, as was suggested
> there. So the initial (more costly) allocation should be made with
> minimal aggression, to allow the allocator to fail; otherwise that
> fallback simply doesn't make sense. Hence such a combination.

I see, please add a comment explaining that.

2020-01-08 12:49:34

by Zaslonko Mikhail

Subject: [PATCH v4] btrfs: Use larger zlib buffer for s390 hardware compression

In order to benefit from s390 zlib hardware compression support,
increase the btrfs zlib workspace buffer size from 1 to 4 pages (if
s390 zlib hardware support is enabled on the machine). This brings up
to 60% better performance in hardware on s390 compared to the PAGE_SIZE
buffer and much more compared to the software zlib processing in btrfs.
In case of memory pressure, fall back to a single page buffer during
workspace allocation.
The data compressed with larger input buffers still conforms to the zlib
standard and can therefore also be decompressed on systems that use only
a PAGE_SIZE buffer for btrfs zlib.

Signed-off-by: Mikhail Zaslonko <[email protected]>
---
fs/btrfs/compression.c | 2 +-
fs/btrfs/zlib.c | 135 ++++++++++++++++++++++++++++++-----------
2 files changed, 101 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ee834ef7beb4..6bd0e75a822c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1285,7 +1285,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
/* copy bytes from the working buffer into the pages */
while (working_bytes > 0) {
bytes = min_t(unsigned long, bvec.bv_len,
- PAGE_SIZE - buf_offset);
+ PAGE_SIZE - (buf_offset % PAGE_SIZE));
bytes = min(bytes, working_bytes);

kaddr = kmap_atomic(bvec.bv_page);
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index a6c90a003c12..05615a1099db 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -20,9 +20,13 @@
#include <linux/refcount.h>
#include "compression.h"

+/* workspace buffer size for s390 zlib hardware support */
+#define ZLIB_DFLTCC_BUF_SIZE (4 * PAGE_SIZE)
+
struct workspace {
z_stream strm;
char *buf;
+ unsigned int buf_size;
struct list_head list;
int level;
};
@@ -61,7 +65,21 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
zlib_inflate_workspacesize());
workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
workspace->level = level;
- workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ workspace->buf = NULL;
+ /*
+ * In case of s390 zlib hardware support, allocate larger workspace
+ * buffer. If the allocator fails, fall back to a single page buffer.
+ */
+ if (zlib_deflate_dfltcc_enabled()) {
+ workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
+ __GFP_NOMEMALLOC | __GFP_NORETRY |
+ __GFP_NOWARN | GFP_NOIO);
+ workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
+ }
+ if (!workspace->buf) {
+ workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ workspace->buf_size = PAGE_SIZE;
+ }
if (!workspace->strm.workspace || !workspace->buf)
goto fail;

@@ -85,6 +103,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
struct page *in_page = NULL;
struct page *out_page = NULL;
unsigned long bytes_left;
+ unsigned int in_buf_pages;
unsigned long len = *total_out;
unsigned long nr_dest_pages = *out_pages;
const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
@@ -102,9 +121,6 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
workspace->strm.total_in = 0;
workspace->strm.total_out = 0;

- in_page = find_get_page(mapping, start >> PAGE_SHIFT);
- data_in = kmap(in_page);
-
out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
if (out_page == NULL) {
ret = -ENOMEM;
@@ -114,12 +130,51 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
pages[0] = out_page;
nr_pages = 1;

- workspace->strm.next_in = data_in;
+ workspace->strm.next_in = workspace->buf;
+ workspace->strm.avail_in = 0;
workspace->strm.next_out = cpage_out;
workspace->strm.avail_out = PAGE_SIZE;
- workspace->strm.avail_in = min(len, PAGE_SIZE);

while (workspace->strm.total_in < len) {
+ /*
+ * Get next input pages and copy the contents to
+ * the workspace buffer if required.
+ */
+ if (workspace->strm.avail_in == 0) {
+ bytes_left = len - workspace->strm.total_in;
+ in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
+ workspace->buf_size / PAGE_SIZE);
+ if (in_buf_pages > 1) {
+ int i;
+
+ for (i = 0; i < in_buf_pages; i++) {
+ if (in_page) {
+ kunmap(in_page);
+ put_page(in_page);
+ }
+ in_page = find_get_page(mapping,
+ start >> PAGE_SHIFT);
+ data_in = kmap(in_page);
+ memcpy(workspace->buf + i * PAGE_SIZE,
+ data_in, PAGE_SIZE);
+ start += PAGE_SIZE;
+ }
+ workspace->strm.next_in = workspace->buf;
+ } else {
+ if (in_page) {
+ kunmap(in_page);
+ put_page(in_page);
+ }
+ in_page = find_get_page(mapping,
+ start >> PAGE_SHIFT);
+ data_in = kmap(in_page);
+ start += PAGE_SIZE;
+ workspace->strm.next_in = data_in;
+ }
+ workspace->strm.avail_in = min(bytes_left,
+ (unsigned long) workspace->buf_size);
+ }
+
ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
if (ret != Z_OK) {
pr_debug("BTRFS: deflate in loop returned %d\n",
@@ -161,33 +216,43 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
/* we're all done */
if (workspace->strm.total_in >= len)
break;
-
- /* we've read in a full page, get a new one */
- if (workspace->strm.avail_in == 0) {
- if (workspace->strm.total_out > max_out)
- break;
-
- bytes_left = len - workspace->strm.total_in;
- kunmap(in_page);
- put_page(in_page);
-
- start += PAGE_SIZE;
- in_page = find_get_page(mapping,
- start >> PAGE_SHIFT);
- data_in = kmap(in_page);
- workspace->strm.avail_in = min(bytes_left,
- PAGE_SIZE);
- workspace->strm.next_in = data_in;
- }
+ if (workspace->strm.total_out > max_out)
+ break;
}
workspace->strm.avail_in = 0;
- ret = zlib_deflate(&workspace->strm, Z_FINISH);
- zlib_deflateEnd(&workspace->strm);
-
- if (ret != Z_STREAM_END) {
- ret = -EIO;
- goto out;
+ /*
+ * Call deflate with Z_FINISH flush parameter providing more output
+ * space but no more input data, until it returns with Z_STREAM_END.
+ */
+ while (ret != Z_STREAM_END) {
+ ret = zlib_deflate(&workspace->strm, Z_FINISH);
+ if (ret == Z_STREAM_END)
+ break;
+ if (ret != Z_OK && ret != Z_BUF_ERROR) {
+ zlib_deflateEnd(&workspace->strm);
+ ret = -EIO;
+ goto out;
+ } else if (workspace->strm.avail_out == 0) {
+ /* get another page for the stream end */
+ kunmap(out_page);
+ if (nr_pages == nr_dest_pages) {
+ out_page = NULL;
+ ret = -E2BIG;
+ goto out;
+ }
+ out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+ if (out_page == NULL) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ cpage_out = kmap(out_page);
+ pages[nr_pages] = out_page;
+ nr_pages++;
+ workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.next_out = cpage_out;
+ }
}
+ zlib_deflateEnd(&workspace->strm);

if (workspace->strm.total_out >= workspace->strm.total_in) {
ret = -E2BIG;
@@ -231,7 +296,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)

workspace->strm.total_out = 0;
workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;

/* If it's deflate, and it's got no preset dictionary, then
we can tell zlib to skip the adler32 check. */
@@ -270,7 +335,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
}

workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;

if (workspace->strm.avail_in == 0) {
unsigned long tmp;
@@ -320,7 +385,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
workspace->strm.total_in = 0;

workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;
workspace->strm.total_out = 0;
/* If it's deflate, and it's got no preset dictionary, then
we can tell zlib to skip the adler32 check. */
@@ -364,7 +429,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
buf_offset = 0;

bytes = min(PAGE_SIZE - pg_offset,
- PAGE_SIZE - buf_offset);
+ PAGE_SIZE - (buf_offset % PAGE_SIZE));
bytes = min(bytes, bytes_left);

kaddr = kmap_atomic(dest_page);
@@ -375,7 +440,7 @@ int zlib_decompress(struct list_head *ws, unsigned char *data_in,
bytes_left -= bytes;
next:
workspace->strm.next_out = workspace->buf;
- workspace->strm.avail_out = PAGE_SIZE;
+ workspace->strm.avail_out = workspace->buf_size;
}

if (ret != Z_STREAM_END && bytes_left != 0)
--
2.17.1

2020-01-08 15:11:05

by Josef Bacik

Subject: Re: [PATCH v4] btrfs: Use larger zlib buffer for s390 hardware compression

On 1/8/20 5:51 AM, Mikhail Zaslonko wrote:
> In order to benefit from s390 zlib hardware compression support,
> increase the btrfs zlib workspace buffer size from 1 to 4 pages (if
> s390 zlib hardware support is enabled on the machine). This brings up
> to 60% better performance in hardware on s390 compared to the PAGE_SIZE
> buffer and much more compared to the software zlib processing in btrfs.
> In case of memory pressure, fall back to a single page buffer during
> workspace allocation.
> The data compressed with larger input buffers still conforms to the zlib
> standard and can therefore also be decompressed on systems that use only
> a PAGE_SIZE buffer for btrfs zlib.
>
> Signed-off-by: Mikhail Zaslonko <[email protected]>


> ---
> fs/btrfs/compression.c | 2 +-
> fs/btrfs/zlib.c | 135 ++++++++++++++++++++++++++++++-----------
> 2 files changed, 101 insertions(+), 36 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index ee834ef7beb4..6bd0e75a822c 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -1285,7 +1285,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
> /* copy bytes from the working buffer into the pages */
> while (working_bytes > 0) {
> bytes = min_t(unsigned long, bvec.bv_len,
> - PAGE_SIZE - buf_offset);
> + PAGE_SIZE - (buf_offset % PAGE_SIZE));
> bytes = min(bytes, working_bytes);
>
> kaddr = kmap_atomic(bvec.bv_page);
> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
> index a6c90a003c12..05615a1099db 100644
> --- a/fs/btrfs/zlib.c
> +++ b/fs/btrfs/zlib.c
> @@ -20,9 +20,13 @@
> #include <linux/refcount.h>
> #include "compression.h"
>
> +/* workspace buffer size for s390 zlib hardware support */
> +#define ZLIB_DFLTCC_BUF_SIZE (4 * PAGE_SIZE)
> +
> struct workspace {
> z_stream strm;
> char *buf;
> + unsigned int buf_size;
> struct list_head list;
> int level;
> };
> @@ -61,7 +65,21 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
> zlib_inflate_workspacesize());
> workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
> workspace->level = level;
> - workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> + workspace->buf = NULL;
> + /*
> + * In case of s390 zlib hardware support, allocate a larger workspace
> + * buffer. If the allocation fails, fall back to a single page buffer.
> + */
> + if (zlib_deflate_dfltcc_enabled()) {
> + workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
> + __GFP_NOMEMALLOC | __GFP_NORETRY |
> + __GFP_NOWARN | GFP_NOIO);
> + workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
> + }
> + if (!workspace->buf) {
> + workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> + workspace->buf_size = PAGE_SIZE;
> + }
> if (!workspace->strm.workspace || !workspace->buf)
> goto fail;
>
> @@ -85,6 +103,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> struct page *in_page = NULL;
> struct page *out_page = NULL;
> unsigned long bytes_left;
> + unsigned int in_buf_pages;
> unsigned long len = *total_out;
> unsigned long nr_dest_pages = *out_pages;
> const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
> @@ -102,9 +121,6 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> workspace->strm.total_in = 0;
> workspace->strm.total_out = 0;
>
> - in_page = find_get_page(mapping, start >> PAGE_SHIFT);
> - data_in = kmap(in_page);
> -
> out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> if (out_page == NULL) {
> ret = -ENOMEM;
> @@ -114,12 +130,51 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> pages[0] = out_page;
> nr_pages = 1;
>
> - workspace->strm.next_in = data_in;
> + workspace->strm.next_in = workspace->buf;
> + workspace->strm.avail_in = 0;
> workspace->strm.next_out = cpage_out;
> workspace->strm.avail_out = PAGE_SIZE;
> - workspace->strm.avail_in = min(len, PAGE_SIZE);
>
> while (workspace->strm.total_in < len) {
> + /*
> + * Get next input pages and copy the contents to
> + * the workspace buffer if required.
> + */
> + if (workspace->strm.avail_in == 0) {
> + bytes_left = len - workspace->strm.total_in;
> + in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
> + workspace->buf_size / PAGE_SIZE);
> + if (in_buf_pages > 1) {
> + int i;
> +
> + for (i = 0; i < in_buf_pages; i++) {
> + if (in_page) {
> + kunmap(in_page);
> + put_page(in_page);
> + }
> + in_page = find_get_page(mapping,
> + start >> PAGE_SHIFT);
> + data_in = kmap(in_page);
> + memcpy(workspace->buf + i * PAGE_SIZE,
> + data_in, PAGE_SIZE);
> + start += PAGE_SIZE;
> + }
> + workspace->strm.next_in = workspace->buf;
> + } else {
> + if (in_page) {
> + kunmap(in_page);
> + put_page(in_page);
> + }
> + in_page = find_get_page(mapping,
> + start >> PAGE_SHIFT);
> + data_in = kmap(in_page);
> + start += PAGE_SIZE;
> + workspace->strm.next_in = data_in;
> + }
> + workspace->strm.avail_in = min(bytes_left,
> + (unsigned long) workspace->buf_size);
> + }
> +
> ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
> if (ret != Z_OK) {
> pr_debug("BTRFS: deflate in loop returned %d\n",
> @@ -161,33 +216,43 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
> /* we're all done */
> if (workspace->strm.total_in >= len)
> break;
> -
> - /* we've read in a full page, get a new one */
> - if (workspace->strm.avail_in == 0) {
> - if (workspace->strm.total_out > max_out)
> - break;
> -
> - bytes_left = len - workspace->strm.total_in;
> - kunmap(in_page);
> - put_page(in_page);
> -
> - start += PAGE_SIZE;
> - in_page = find_get_page(mapping,
> - start >> PAGE_SHIFT);
> - data_in = kmap(in_page);
> - workspace->strm.avail_in = min(bytes_left,
> - PAGE_SIZE);
> - workspace->strm.next_in = data_in;
> - }
> + if (workspace->strm.total_out > max_out)
> + break;
> }
> workspace->strm.avail_in = 0;
> - ret = zlib_deflate(&workspace->strm, Z_FINISH);
> - zlib_deflateEnd(&workspace->strm);
> -
> - if (ret != Z_STREAM_END) {
> - ret = -EIO;
> - goto out;
> + /*
> + * Call deflate with Z_FINISH flush parameter providing more output
> + * space but no more input data, until it returns with Z_STREAM_END.
> + */
> + while (ret != Z_STREAM_END) {
> + ret = zlib_deflate(&workspace->strm, Z_FINISH);
> + if (ret == Z_STREAM_END)
> + break;
> + if (ret != Z_OK && ret != Z_BUF_ERROR) {
> + zlib_deflateEnd(&workspace->strm);
> + ret = -EIO;
> + goto out;
> + } else if (workspace->strm.avail_out == 0) {
> + /* get another page for the stream end */
> + kunmap(out_page);
> + if (nr_pages == nr_dest_pages) {
> + out_page = NULL;
> + ret = -E2BIG;
> + goto out;
> + }
> + out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> + if (out_page == NULL) {
> + ret = -ENOMEM;
> + goto out;
> + }

Do we need zlib_deflateEnd() for the above error cases? Thanks,

Josef
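
The new input-batching step in the hunk above (gather up to four input pages into the workspace buffer, then hand zlib one large chunk) boils down to two small computations. A userspace sketch, with hypothetical constants standing in for the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's constants. */
#define PAGE_SIZE 4096u
#define ZLIB_DFLTCC_BUF_SIZE (4u * PAGE_SIZE)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* How many whole input pages get copied into the workspace buffer. */
static unsigned int in_buf_pages(size_t bytes_left, size_t buf_size)
{
    return (unsigned int)min_sz(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
                                buf_size / PAGE_SIZE);
}

/* How much input zlib sees in one call: capped by the buffer size. */
static size_t avail_in_bytes(size_t bytes_left, size_t buf_size)
{
    return min_sz(bytes_left, buf_size);
}
```

With a 4-page buffer the deflate loop consumes up to `4 * PAGE_SIZE` bytes per call, which is what lets the DFLTCC hardware amortize its per-call overhead.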

2020-01-08 20:00:35

by Zaslonko Mikhail

[permalink] [raw]
Subject: Re: [PATCH v4] btrfs: Use larger zlib buffer for s390 hardware compression

Hello,

On 08.01.2020 16:08, Josef Bacik wrote:
> On 1/8/20 5:51 AM, Mikhail Zaslonko wrote:
>> In order to benefit from s390 zlib hardware compression support,
>> increase the btrfs zlib workspace buffer size from 1 to 4 pages (if
>> s390 zlib hardware support is enabled on the machine). This brings up
>> to 60% better performance in hardware on s390 compared to the PAGE_SIZE
>> buffer and much more compared to the software zlib processing in btrfs.
>> In case of memory pressure, fall back to a single page buffer during
>> workspace allocation.
>> The data compressed with larger input buffers will still conform to the
>> zlib standard and thus can also be decompressed on systems that use only
>> a PAGE_SIZE buffer for btrfs zlib.
>>
>> Signed-off-by: Mikhail Zaslonko <[email protected]>
>
>
>> ---
>>   fs/btrfs/compression.c |   2 +-
>>   fs/btrfs/zlib.c        | 135 ++++++++++++++++++++++++++++++-----------
>>   2 files changed, 101 insertions(+), 36 deletions(-)
>>
>> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
>> index ee834ef7beb4..6bd0e75a822c 100644
>> --- a/fs/btrfs/compression.c
>> +++ b/fs/btrfs/compression.c
>> @@ -1285,7 +1285,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
>>       /* copy bytes from the working buffer into the pages */
>>       while (working_bytes > 0) {
>>           bytes = min_t(unsigned long, bvec.bv_len,
>> -                PAGE_SIZE - buf_offset);
>> +                PAGE_SIZE - (buf_offset % PAGE_SIZE));
>>           bytes = min(bytes, working_bytes);
>>             kaddr = kmap_atomic(bvec.bv_page);
>> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
>> index a6c90a003c12..05615a1099db 100644
>> --- a/fs/btrfs/zlib.c
>> +++ b/fs/btrfs/zlib.c
>> @@ -20,9 +20,13 @@
>>   #include <linux/refcount.h>
>>   #include "compression.h"
>>   +/* workspace buffer size for s390 zlib hardware support */
>> +#define ZLIB_DFLTCC_BUF_SIZE    (4 * PAGE_SIZE)
>> +
>>   struct workspace {
>>       z_stream strm;
>>       char *buf;
>> +    unsigned int buf_size;
>>       struct list_head list;
>>       int level;
>>   };
>> @@ -61,7 +65,21 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
>>               zlib_inflate_workspacesize());
>>       workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
>>       workspace->level = level;
>> -    workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> +    workspace->buf = NULL;
>> +    /*
>> +     * In case of s390 zlib hardware support, allocate a larger workspace
>> +     * buffer. If the allocation fails, fall back to a single page buffer.
>> +     */
>> +    if (zlib_deflate_dfltcc_enabled()) {
>> +        workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
>> +                     __GFP_NOMEMALLOC | __GFP_NORETRY |
>> +                     __GFP_NOWARN | GFP_NOIO);
>> +        workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
>> +    }
>> +    if (!workspace->buf) {
>> +        workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> +        workspace->buf_size = PAGE_SIZE;
>> +    }
>>       if (!workspace->strm.workspace || !workspace->buf)
>>           goto fail;
>>   @@ -85,6 +103,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>       struct page *in_page = NULL;
>>       struct page *out_page = NULL;
>>       unsigned long bytes_left;
>> +    unsigned int in_buf_pages;
>>       unsigned long len = *total_out;
>>       unsigned long nr_dest_pages = *out_pages;
>>       const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
>> @@ -102,9 +121,6 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>       workspace->strm.total_in = 0;
>>       workspace->strm.total_out = 0;
>>   -    in_page = find_get_page(mapping, start >> PAGE_SHIFT);
>> -    data_in = kmap(in_page);
>> -
>>       out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>>       if (out_page == NULL) {
>>           ret = -ENOMEM;
>> @@ -114,12 +130,51 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>       pages[0] = out_page;
>>       nr_pages = 1;
>>   -    workspace->strm.next_in = data_in;
>> +    workspace->strm.next_in = workspace->buf;
>> +    workspace->strm.avail_in = 0;
>>       workspace->strm.next_out = cpage_out;
>>       workspace->strm.avail_out = PAGE_SIZE;
>> -    workspace->strm.avail_in = min(len, PAGE_SIZE);
>>         while (workspace->strm.total_in < len) {
>> +        /*
>> +         * Get next input pages and copy the contents to
>> +         * the workspace buffer if required.
>> +         */
>> +        if (workspace->strm.avail_in == 0) {
>> +            bytes_left = len - workspace->strm.total_in;
>> +            in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
>> +                       workspace->buf_size / PAGE_SIZE);
>> +            if (in_buf_pages > 1) {
>> +                int i;
>> +
>> +                for (i = 0; i < in_buf_pages; i++) {
>> +                    if (in_page) {
>> +                        kunmap(in_page);
>> +                        put_page(in_page);
>> +                    }
>> +                    in_page = find_get_page(mapping,
>> +                                start >> PAGE_SHIFT);
>> +                    data_in = kmap(in_page);
>> +                    memcpy(workspace->buf + i * PAGE_SIZE,
>> +                           data_in, PAGE_SIZE);
>> +                    start += PAGE_SIZE;
>> +                }
>> +                workspace->strm.next_in = workspace->buf;
>> +            } else {
>> +                if (in_page) {
>> +                    kunmap(in_page);
>> +                    put_page(in_page);
>> +                }
>> +                in_page = find_get_page(mapping,
>> +                            start >> PAGE_SHIFT);
>> +                data_in = kmap(in_page);
>> +                start += PAGE_SIZE;
>> +                workspace->strm.next_in = data_in;
>> +            }
>> +            workspace->strm.avail_in = min(bytes_left,
>> +                               (unsigned long) workspace->buf_size);
>> +        }
>> +
>>           ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
>>           if (ret != Z_OK) {
>>               pr_debug("BTRFS: deflate in loop returned %d\n",
>> @@ -161,33 +216,43 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>           /* we're all done */
>>           if (workspace->strm.total_in >= len)
>>               break;
>> -
>> -        /* we've read in a full page, get a new one */
>> -        if (workspace->strm.avail_in == 0) {
>> -            if (workspace->strm.total_out > max_out)
>> -                break;
>> -
>> -            bytes_left = len - workspace->strm.total_in;
>> -            kunmap(in_page);
>> -            put_page(in_page);
>> -
>> -            start += PAGE_SIZE;
>> -            in_page = find_get_page(mapping,
>> -                        start >> PAGE_SHIFT);
>> -            data_in = kmap(in_page);
>> -            workspace->strm.avail_in = min(bytes_left,
>> -                               PAGE_SIZE);
>> -            workspace->strm.next_in = data_in;
>> -        }
>> +        if (workspace->strm.total_out > max_out)
>> +            break;
>>       }
>>       workspace->strm.avail_in = 0;
>> -    ret = zlib_deflate(&workspace->strm, Z_FINISH);
>> -    zlib_deflateEnd(&workspace->strm);
>> -
>> -    if (ret != Z_STREAM_END) {
>> -        ret = -EIO;
>> -        goto out;
>> +    /*
>> +     * Call deflate with Z_FINISH flush parameter providing more output
>> +     * space but no more input data, until it returns with Z_STREAM_END.
>> +     */
>> +    while (ret != Z_STREAM_END) {
>> +        ret = zlib_deflate(&workspace->strm, Z_FINISH);
>> +        if (ret == Z_STREAM_END)
>> +            break;
>> +        if (ret != Z_OK && ret != Z_BUF_ERROR) {
>> +            zlib_deflateEnd(&workspace->strm);
>> +            ret = -EIO;
>> +            goto out;
>> +        } else if (workspace->strm.avail_out == 0) {
>> +            /* get another page for the stream end */
>> +            kunmap(out_page);
>> +            if (nr_pages == nr_dest_pages) {
>> +                out_page = NULL;
>> +                ret = -E2BIG;
>> +                goto out;
>> +            }
>> +            out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>> +            if (out_page == NULL) {
>> +                ret = -ENOMEM;
>> +                goto out;
>> +            }
>
> Do we need zlib_deflateEnd() for the above error cases?  Thanks,

The original btrfs code did not call zlib_deflateEnd() for -E2BIG and
-ENOMEM cases, so I stuck to the same logic.
Unlike userspace zlib where deflateEnd() frees all dynamically allocated
memory, in the kernel it doesn't do much apart from setting the return
code (since all the memory allocations for kernel zlib are performed in advance).

>
> Josef

2020-01-09 01:11:35

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH v4] btrfs: Use larger zlib buffer for s390 hardware compression

On Wed, Jan 08, 2020 at 07:48:31PM +0100, Zaslonko Mikhail wrote:
> >> +        } else if (workspace->strm.avail_out == 0) {
> >> +            /* get another page for the stream end */
> >> +            kunmap(out_page);
> >> +            if (nr_pages == nr_dest_pages) {
> >> +                out_page = NULL;
> >> +                ret = -E2BIG;
> >> +                goto out;
> >> +            }
> >> +            out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> >> +            if (out_page == NULL) {
> >> +                ret = -ENOMEM;
> >> +                goto out;
> >> +            }
> >
> > Do we need zlib_deflateEnd() for the above error cases?  Thanks,
>
> The original btrfs code did not call zlib_deflateEnd() for -E2BIG and
> -ENOMEM cases, so I stick to the same logic.
> Unlike userspace zlib where deflateEnd() frees all dynamically allocated
> memory, in the kernel it doesn't do much apart from setting the return
> code (since all the memory allocations for kernel zlib are performed in advance).

Agreed, deflateEnd is not necessary in the error cases.

2020-01-13 10:05:45

by Zaslonko Mikhail

[permalink] [raw]
Subject: Re: [PATCH v4] btrfs: Use larger zlib buffer for s390 hardware compression

Hello David,

On 09.01.2020 02:10, David Sterba wrote:
> On Wed, Jan 08, 2020 at 07:48:31PM +0100, Zaslonko Mikhail wrote:
>>>> +        } else if (workspace->strm.avail_out == 0) {
>>>> +            /* get another page for the stream end */
>>>> +            kunmap(out_page);
>>>> +            if (nr_pages == nr_dest_pages) {
>>>> +                out_page = NULL;
>>>> +                ret = -E2BIG;
>>>> +                goto out;
>>>> +            }
>>>> +            out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>>>> +            if (out_page == NULL) {
>>>> +                ret = -ENOMEM;
>>>> +                goto out;
>>>> +            }
>>>
>>> Do we need zlib_deflateEnd() for the above error cases?  Thanks,
>>
>> The original btrfs code did not call zlib_deflateEnd() for -E2BIG and
>> -ENOMEM cases, so I stick to the same logic.
>> Unlike userspace zlib where deflateEnd() frees all dynamically allocated
>> memory, in the kernel it doesn't do much apart from setting the return
>> code (since all the memory allocations for kernel zlib are performed in advance).
>
> Agreed, deflateEnd is not necessary in the error cases.

Can I consider this as 'Acked-by' from your side?
Are there any unanswered questions left on this patch?

>

Thanks,
Mikhail

2020-01-14 16:20:41

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH v4] btrfs: Use larger zlib buffer for s390 hardware compression

On Wed, Jan 08, 2020 at 11:51:03AM +0100, Mikhail Zaslonko wrote:
> In order to benefit from s390 zlib hardware compression support,
> increase the btrfs zlib workspace buffer size from 1 to 4 pages (if
> s390 zlib hardware support is enabled on the machine). This brings up
> to 60% better performance in hardware on s390 compared to the PAGE_SIZE
> buffer and much more compared to the software zlib processing in btrfs.
> In case of memory pressure, fall back to a single page buffer during
> workspace allocation.
> The data compressed with larger input buffers will still conform to the
> zlib standard and thus can also be decompressed on systems that use only
> a PAGE_SIZE buffer for btrfs zlib.
>
> Signed-off-by: Mikhail Zaslonko <[email protected]>

Reviewed-by: David Sterba <[email protected]>

2020-01-14 16:21:59

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH v4] btrfs: Use larger zlib buffer for s390 hardware compression

On Mon, Jan 13, 2020 at 11:03:29AM +0100, Zaslonko Mikhail wrote:
> Hello David,
>
> On 09.01.2020 02:10, David Sterba wrote:
> > On Wed, Jan 08, 2020 at 07:48:31PM +0100, Zaslonko Mikhail wrote:
> >>>> +        } else if (workspace->strm.avail_out == 0) {
> >>>> +            /* get another page for the stream end */
> >>>> +            kunmap(out_page);
> >>>> +            if (nr_pages == nr_dest_pages) {
> >>>> +                out_page = NULL;
> >>>> +                ret = -E2BIG;
> >>>> +                goto out;
> >>>> +            }
> >>>> +            out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> >>>> +            if (out_page == NULL) {
> >>>> +                ret = -ENOMEM;
> >>>> +                goto out;
> >>>> +            }
> >>>
> >>> Do we need zlib_deflateEnd() for the above error cases?  Thanks,
> >>
> >> The original btrfs code did not call zlib_deflateEnd() for -E2BIG and
> >> -ENOMEM cases, so I stick to the same logic.
> >> Unlike userspace zlib where deflateEnd() frees all dynamically allocated
> >> memory, in the kernel it doesn't do much apart from setting the return
> >> code (since all the memory allocations for kernel zlib are performed in advance).
> >
> > Agreed, deflateEnd is not necessary in the error cases.
>
> Can I consider this as 'Acked-by' from your side?

Yes, I also sent rev-by to the patch.

2020-01-15 09:56:18

by Zaslonko Mikhail

[permalink] [raw]
Subject: Fwd: Re: [PATCH v4] btrfs: Use larger zlib buffer for s390 hardware compression

Hello Andrew,

Could you please pick up this V4 btrfs patch and the other five patches from the
V3 patch set (https://lkml.org/lkml/2020/1/3/568).

Older patches can be dropped from linux-next:
* ac2aeaa0f4d3 btrfs: use larger zlib buffer for s390 hardware compression
* ac19f15aea33 lib/zlib: add zlib_deflate_dfltcc_enabled() function
* 3447c9a51196 s390/boot: add dfltcc= kernel command line parameter
* 890060df4401 lib/zlib: add s390 hardware support for kernel zlib_inflate
* 148f7d060405 s390/boot: rename HEAP_SIZE due to name collision
* bfbef17740fd lib/zlib: add s390 hardware support for kernel zlib_deflate

Thanks,
Mikhail.


-------- Forwarded Message --------
Subject: Re: [PATCH v4] btrfs: Use larger zlib buffer for s390 hardware compression
Date: Tue, 14 Jan 2020 17:18:57 +0100
From: David Sterba <[email protected]>
Reply-To: [email protected]
To: Mikhail Zaslonko <[email protected]>
CC: Andrew Morton <[email protected]>, Chris Mason <[email protected]>, Josef Bacik <[email protected]>, David Sterba <[email protected]>, Richard Purdie <[email protected]>, Heiko Carstens <[email protected]>, Vasily Gorbik <[email protected]>, Christian Borntraeger <[email protected]>, Eduard Shishkin <[email protected]>, Ilya Leoshkevich <[email protected]>, [email protected], [email protected]

On Wed, Jan 08, 2020 at 11:51:03AM +0100, Mikhail Zaslonko wrote:
> In order to benefit from s390 zlib hardware compression support,
> increase the btrfs zlib workspace buffer size from 1 to 4 pages (if
> s390 zlib hardware support is enabled on the machine). This brings up
> to 60% better performance in hardware on s390 compared to the PAGE_SIZE
> buffer and much more compared to the software zlib processing in btrfs.
> In case of memory pressure, fall back to a single page buffer during
> workspace allocation.
> The data compressed with larger input buffers will still conform to the
> zlib standard and thus can also be decompressed on systems that use only
> a PAGE_SIZE buffer for btrfs zlib.
>
> Signed-off-by: Mikhail Zaslonko <[email protected]>

Reviewed-by: David Sterba <[email protected]>
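
The workspace allocation strategy reviewed above — opportunistically try the 4-page buffer, fall back to a single page under memory pressure — can be sketched in userspace C. Here `malloc` stands in for `kmalloc`, and the hypothetical `try_large` flag simulates the kernel's fast-failing `__GFP_NORETRY | __GFP_NOWARN` attempt not succeeding:

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u                      /* hypothetical page size */
#define ZLIB_DFLTCC_BUF_SIZE (4u * PAGE_SIZE)

/*
 * Userspace analogue of the workspace setup: first try the large
 * 4-page buffer, then fall back to a single page if that attempt
 * fails (or if DFLTCC support is absent and try_large is 0).
 */
static char *alloc_workspace_buf(int try_large, unsigned int *buf_size)
{
    char *buf = NULL;

    if (try_large) {
        buf = malloc(ZLIB_DFLTCC_BUF_SIZE);
        *buf_size = ZLIB_DFLTCC_BUF_SIZE;
    }
    if (!buf) {
        buf = malloc(PAGE_SIZE);
        *buf_size = PAGE_SIZE;
    }
    return buf;
}
```

The rest of the deflate path then only consults `buf_size`, which is why the fallback needs no special-casing elsewhere.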