Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp19048027ybl; Fri, 3 Jan 2020 14:35:35 -0800 (PST) X-Google-Smtp-Source: APXvYqx4GdGVZGCnW3RCeURtv3TCT+RQv/paayaUKXkc+M5u6zF7a9MWtfgwBENkhFLDhPzIYG/b X-Received: by 2002:a9d:5c02:: with SMTP id o2mr93807087otk.176.1578090935407; Fri, 03 Jan 2020 14:35:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1578090935; cv=none; d=google.com; s=arc-20160816; b=ClrF2wja3yJEWNXdRK10ru7I80Y+L41P3xGLvkzQ/Hix0qEzNuv4JO90GmXZRnOhDm Rum+1dky8H/J+W3zitjRM8D3MbFBabgurelsJjxc3JJL6uB7C3v3hpbWJlFCNIuBNovK BqvM0pY0ZB4yQet7vpq6voQDP5RyYEnFmcXrvsfi+Kvt2inPMbpCM/WVwOlfAFbO1fCp 5F6Sgi8z9Gzw894pVYvVEpAqCL0GO1QXmikePBnQXpv3/BBuGmEjIntxi+tEQomFa7S8 pYcCcblZoxF7DxBIA0n2OonqZrG9bCHnHBGLbd2jxUmluG6FD4hkcTme5IkB3PNiInmN AI6w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:date:subject:cc:to:from; bh=1vNbMQ78Srn33VBpLpaTw/gTfhit8+oBK7zH8a+wfGU=; b=ttOTxUCSyHHjg84WvE/JJGGRkYMJ+qF5KvufUA3Kz9ad4AuO7dMVxMPX/FUn4EQTwI 3xnwiBWmdRWGYn99eeAEo+R23noBtXn+ieR+fWCx3itOCsv0Uq+r4fnvs334EIwsi/2n MBEUxZzXnnQm51gZ++4MnfEav7MpDsKXeHupB4o0uvBrjGsnF8hDa2rOZcq9lSOuItNK lAYsu3zmVz4zyoTfK1Xn03YqU83h0rberQEwm0g5hkwsRn8Cnck+HEfl4hukE3LzKzj9 G4OMeW8vUiYqqR2Tpuc6k3kNcFuYSApRPhVQzvJOmIM0a+ProboR3lo3xIq7929eO7QQ POjQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 6si19827812oin.93.2020.01.03.14.35.22; Fri, 03 Jan 2020 14:35:35 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728861AbgACWdo (ORCPT + 99 others); Fri, 3 Jan 2020 17:33:44 -0500 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:32980 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728730AbgACWdn (ORCPT ); Fri, 3 Jan 2020 17:33:43 -0500 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 003MHPdi080283 for ; Fri, 3 Jan 2020 17:33:42 -0500 Received: from e06smtp04.uk.ibm.com (e06smtp04.uk.ibm.com [195.75.94.100]) by mx0b-001b2d01.pphosted.com with ESMTP id 2xa48mg336-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Fri, 03 Jan 2020 17:33:42 -0500 Received: from localhost by e06smtp04.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 3 Jan 2020 22:33:40 -0000 Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195) by e06smtp04.uk.ibm.com (192.168.101.134) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Fri, 3 Jan 2020 22:33:36 -0000 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 003MXZXJ54853854 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 3 Jan 2020 22:33:35 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2836E5206D; Fri, 3 Jan 2020 22:33:35 +0000 (GMT) Received: from tuxmaker.boeblingen.de.ibm.com (unknown [9.152.85.9]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id C167852067; Fri, 3 Jan 2020 22:33:34 +0000 (GMT) From: Mikhail Zaslonko To: Andrew Morton , Chris Mason , Josef Bacik , David Sterba Cc: Richard Purdie , Heiko Carstens , Vasily Gorbik , Christian Borntraeger , Eduard Shishkin , Ilya Leoshkevich , linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 0/6] S390 hardware support for kernel zlib Date: Fri, 3 Jan 2020 23:33:28 +0100 X-Mailer: git-send-email 2.17.1 X-TM-AS-GCONF: 00 x-cbid: 20010322-0016-0000-0000-000002DA49EC X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 20010322-0017-0000-0000-0000333CB865 Message-Id: <20200103223334.20669-1-zaslonko@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.95,18.0.572 definitions=2020-01-03_06:2020-01-02,2020-01-03 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 malwarescore=0 spamscore=0 mlxscore=0 suspectscore=0 clxscore=1015 priorityscore=1501 lowpriorityscore=0 adultscore=0 phishscore=0 bulkscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-1910280000 definitions=main-2001030203 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org With IBM z15 mainframe the new DFLTCC instruction is available. It implements deflate algorithm in hardware (Nest Acceleration Unit - NXU) with estimated compression and decompression performance orders of magnitude faster than the current zlib. This patch-set adds s390 hardware compression support to kernel zlib. The code is based on the userspace zlib implementation: https://github.com/madler/zlib/pull/410 The coding style is also preserved for future maintainability. There is only limited set of userspace zlib functions represented in kernel. Apart from that, all the memory allocation should be performed in advance. Thus, the workarea structures are extended with the parameter lists required for the DEFLATE CONVENTION CALL instruction. Since kernel zlib itself does not support gzip headers, only Adler-32 checksum is processed (also can be produced by DFLTCC facility). Like it was implemented for userspace, kernel zlib will compress in hardware on level 1, and in software on all other levels. Decompression will always happen in hardware (when enabled). Two DFLTCC compression calls produce the same results only when they both are made on machines of the same generation, and when the respective buffers have the same offset relative to the start of the page. Therefore care should be taken when using hardware compression when reproducible results are desired. However it does always produce the standard conform output which can be inflated anyway. The new kernel command line parameter 'dfltcc' is introduced to configure s390 zlib hardware support: Format: { on | off | def_only | inf_only | always } on: s390 zlib hardware support for compression on level 1 and decompression (default) off: No s390 zlib hardware support def_only: s390 zlib hardware support for deflate only (compression on level 1) inf_only: s390 zlib hardware support for inflate only (decompression) always: Same as 'on' but ignores the selected compression level always using hardware support (used for debugging) The main purpose of the integration of the NXU support into the kernel zlib is the use of hardware deflate in btrfs filesystem with on-the-fly compression enabled. Apart from that, hardware support can also be used during boot for decompressing the kernel or the ramdisk image With the patch for btrfs expanding zlib buffer from 1 to 4 pages (patch 6) the following performance results have been achieved using the ramdisk with btrfs. These are relative numbers based on throughput rate and compression ratio for zlib level 1: Input data Deflate rate Inflate rate Compression ratio NXU/Software NXU/Software NXU/Software stream of zeroes 1.46 1.02 1.00 random ASCII data 10.44 3.00 0.96 ASCII text (dickens) 6,21 3.33 0.94 binary data (vmlinux) 8,37 3.90 1.02 This means that s390 hardware deflate can provide up to 10 times faster compression (on level 1) and up to 4 times faster decompression (refers to all compression levels) for btrfs zlib. Disclaimer: Performance results are based on IBM internal tests using DD command-line utility on btrfs on a Fedora 30 based internal driver in native LPAR on a z15 system. Results may vary based on individual workload, configuration and software levels. Changelog: v2 -> v3: - In btrfs zlib_compress_pages() do not copy input pages to the workspace buffer if no s390 hardware compression is enabled or the remaining input data fits in one page. - Leave just a single Kconfig option for s390 zlib hardware support (CONFIG_ZLIB_DFLTCC) - Fix the issues reported by kbuild test robot. v1 -> v2: - Added new external zlib function to check if s390 Deflate-Conversion facility is installed and enabled (see patch 5). - The larger buffer for btrfs zlib workspace is allocated only if s390 hardware compression is enabled. In case of failure to allocate 4-page buffer, we fall back to a PAGE_SIZE buffer, as proposed by Josef Bacik (see patch 6). Mikhail Zaslonko (6): lib/zlib: Add s390 hardware support for kernel zlib_deflate s390/boot: Rename HEAP_SIZE due to name collision lib/zlib: Add s390 hardware support for kernel zlib_inflate s390/boot: Add dfltcc= kernel command line parameter lib/zlib: Add zlib_deflate_dfltcc_enabled() function btrfs: Use larger zlib buffer for s390 hardware compression .../admin-guide/kernel-parameters.txt | 12 + arch/s390/boot/compressed/decompressor.c | 8 +- arch/s390/boot/ipl_parm.c | 14 + arch/s390/include/asm/setup.h | 7 + arch/s390/kernel/setup.c | 2 + fs/btrfs/compression.c | 2 +- fs/btrfs/zlib.c | 128 +++++--- include/linux/zlib.h | 6 + lib/Kconfig | 7 + lib/Makefile | 1 + lib/decompress_inflate.c | 13 + lib/zlib_deflate/deflate.c | 85 +++--- lib/zlib_deflate/deflate_syms.c | 1 + lib/zlib_deflate/deftree.c | 54 ---- lib/zlib_deflate/defutil.h | 134 ++++++++- lib/zlib_dfltcc/Makefile | 11 + lib/zlib_dfltcc/dfltcc.c | 55 ++++ lib/zlib_dfltcc/dfltcc.h | 155 ++++++++++ lib/zlib_dfltcc/dfltcc_deflate.c | 280 ++++++++++++++++++ lib/zlib_dfltcc/dfltcc_inflate.c | 149 ++++++++++ lib/zlib_dfltcc/dfltcc_syms.c | 17 ++ lib/zlib_dfltcc/dfltcc_util.h | 103 +++++++ lib/zlib_inflate/inflate.c | 32 +- lib/zlib_inflate/inflate.h | 8 + lib/zlib_inflate/infutil.h | 18 +- 25 files changed, 1150 insertions(+), 152 deletions(-) create mode 100644 lib/zlib_dfltcc/Makefile create mode 100644 lib/zlib_dfltcc/dfltcc.c create mode 100644 lib/zlib_dfltcc/dfltcc.h create mode 100644 lib/zlib_dfltcc/dfltcc_deflate.c create mode 100644 lib/zlib_dfltcc/dfltcc_inflate.c create mode 100644 lib/zlib_dfltcc/dfltcc_syms.c create mode 100644 lib/zlib_dfltcc/dfltcc_util.h -- 2.17.1