Received: by 10.213.65.16 with SMTP id m16csp24291imf; Sun, 11 Mar 2018 12:59:50 -0700 (PDT) X-Google-Smtp-Source: AG47ELvabvyWF2urjBN2zNJHBXVjCZo6g3/pZE77a33HYIPxQqY52lqKbgDC9BZO8/1ajwike8Jh X-Received: by 2002:a17:902:650c:: with SMTP id b12-v6mr5810494plk.147.1520798390860; Sun, 11 Mar 2018 12:59:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1520798390; cv=none; d=google.com; s=arc-20160816; b=VPXnnMS9sVCOvspyX+eKREfntJpg4ITPrf7Q4i2CCPmoy44gHVuTuk6SJ1l0IqPxH2 n8VrWg/COY0YzSqtWFb6EqxiK2M7vXzKQU1uX5FDAOn/Gm679mwCz5KQEULQLXApO2s+ uGP0FDMU7dLrU333u2U9FrphbYidmqgCMNLpyfJf2kKMBNPtt9GjgKw8WNMFbuh4UoGA f635avOSgWiS3utw0V+FKqdHfZ9TMk1G0VM1+AgwNu1daFA1LfbI4zrapxPGXup1AXug QGJCfb05ytz1JBSUJ1L0SQOCb3Z5ES5dMa9uH+WrRxC8f5R2ZvdpqwWW0XqYyk6GFE71 KFlA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature:arc-authentication-results; bh=Qh9GbPFYT55hIqh77CfhpFRgyf5ahqy6GDzAiHpCj0o=; b=o2ZqjkVrhuAaMKVKx96oOPKiuahSrCIvKXrMLjYoHYBfWc/Io9RNa0+0P3Hw0zCVeJ lEbr0NLCF0Uy//y1AicS4YbNUJB/OCcS8tgqjIwYFQqniyOW78XZbybJf2mDXOzq/lE5 ANkBDWIU0BP+XwhJY5gf8J4pq/sJ6vLAX9LrNTiiGoyDekh0exW8BYl9At5b9QMNONja z1wuB6v/7tuuh4au755L7LUniVK+N7QLcpOVV/4J2HNYqLdnF5OzZd/XNdSmi5xincQR VO1t2+l4S5kjz6xpdFt9EAsZqar0g6NoycVvl4UqdvhTeV0sWaxR1pwqDsaI+8d2hqEK IdPA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=q+PDRmD2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g1si4021648pgo.257.2018.03.11.12.59.36; Sun, 11 Mar 2018 12:59:50 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=q+PDRmD2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932219AbeCKT6l (ORCPT + 99 others); Sun, 11 Mar 2018 15:58:41 -0400 Received: from mail-pg0-f65.google.com ([74.125.83.65]:43852 "EHLO mail-pg0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932148AbeCKT6j (ORCPT ); Sun, 11 Mar 2018 15:58:39 -0400 Received: by mail-pg0-f65.google.com with SMTP id e9so5640196pgs.10; Sun, 11 Mar 2018 12:58:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=Qh9GbPFYT55hIqh77CfhpFRgyf5ahqy6GDzAiHpCj0o=; b=q+PDRmD2uxdZSItqQVIrp/zMJmUxTnnFE0hG/RH0yNpyii3f7GRBc8ZJvmBPI/9ucD n/pV0ZAkSrhXk0vVY9rnT0LObeLb2a2PHRqFfo6ihB4nwCEvRC8T97ED3kwAg6mPac+V tkyhYjt2NbBGhp1I+TuubofYq/fslHgym2UH4dkBBQoqyqZq8olDT3VpdRfGVc2Oy81U 3mndiIp6ostbZgsRtGHnTpi6CxexkBv/dkbQiJBw6Mt8yrovdu0u0Lsg0xkNxmJPbYBu Euoy89iI9GVZE/k9j2n7jkInAvN8UHT4iKUZSW12naQPPnotdriaoA5qV/cp8PT7WpU8 4Fgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=Qh9GbPFYT55hIqh77CfhpFRgyf5ahqy6GDzAiHpCj0o=; b=BeCn50iHKYq25Z7V/74JAkogPxo2b0mMPF1Jr0+0qQGglRfr02bcRN+EBI32poRUQP T4cJAsMqxl6TJPheHF1QogDAZs0tFgj0GqtNCacP4kB948eKoPJwZOL9jqc/hSb86CPx IpuXn3KfT9t+6W0l7fYZ8SI2fXBNwBfbHhzb8ljVBvaa3+gkErSMj9rQFn/Mfggnhf6a Eg548TyPmnTb42rDE9rkFZQ+QllJSnE34enqmYZRqGqXThLRZI4WzIbKJPWdB9oCsw0y hRSBi9KQ28EzTRM4zJqUHzRq6tPStjMRus72Wjpr9e5G8NBNtUyKGmIOmUYS1GT9h+qV wFpA== X-Gm-Message-State: AElRT7EtAgT+TXYcvO9y5UDWzCG8ZT9wi/1VdUEGP/XTVBHNFyvcPVUs mcqM9pTzVwvJrejZ3LeY0IHi/qWT X-Received: by 10.99.95.84 with SMTP id t81mr4451289pgb.400.1520798318701; Sun, 11 Mar 2018 12:58:38 -0700 (PDT) Received: from zzz.localdomain (c-67-185-97-198.hsd1.wa.comcast.net. [67.185.97.198]) by smtp.gmail.com with ESMTPSA id t20sm12333391pfh.182.2018.03.11.12.58.36 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 11 Mar 2018 12:58:37 -0700 (PDT) Date: Sun, 11 Mar 2018 12:58:35 -0700 From: Eric Biggers To: Benjamin Warnke <4bwarnke@informatik.uni-hamburg.de> Cc: Linux Crypto Mailing List , linux-kernel@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net, minchan@kernel.org, ngupta@vflare.org, sergey.senozhatsky.work@gmail.com Subject: Re: [RESEND PATCH v3] crypto: add zBeWalgo compression for zram Message-ID: <20180311195835.GB630@zzz.localdomain> References: <20180306221342.GA178079@gmail.com> <16D8CBDD-CE09-49A5-86A1-1176847EDD71@informatik.uni-hamburg.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16D8CBDD-CE09-49A5-86A1-1176847EDD71@informatik.uni-hamburg.de> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Benjamin, On Wed, Mar 07, 2018 at 12:50:08PM +0100, Benjamin Warnke wrote: > Hi Eric, > > > On 06.03.2018 at 23:13, Eric Biggers wrote: > > > > Hi Benjamin, > > > > On Tue, Mar 06, 2018 at 09:23:08PM +0100, Benjamin Warnke wrote: > >> Currently ZRAM uses compression-algorithms from the crypto-api. ZRAM > >> compresses each page individually. As a result the compression algorithm is > >> forced to use a very small sliding window. None of the available compression > >> algorithms is designed to achieve high compression ratios with small inputs. > >> > >> This patch adds a new compression algorithm 'zBeWalgo' to the crypto api. This > >> algorithm focusses on increasing the capacity of the compressed block-device > >> created by ZRAM. The choice of compression algorithms is always a tradeoff > >> between speed and compression ratio. > >> > > [...] > >> > >> Zstd is not in the list of compared algorithms, because it is not available > >> for Zram in the currently available kernel trees. > >> > > > > This still isn't a valid excuse for not comparing it to Zstandard. Zstandard is > > already in the kernel in lib/. It would only need glue code to wire it up as an > > scomp_alg to the crypto API. > > I'll add benchmarks with Zstandard. > > > In contrast you're proposing an entirely new > > algorithm. Ultimately, by proposing a new algorithm you're on the hook to > > demonstrate that existing algorithms aren't good enough. Note that many of the > > existing algorithms including Zstandard also have multiple compression levels > > available to choose from. > > The crypto api exposes each algorithm only once with its default compression level. > Currently there is no switch/option/mechanism/code or something to switch thoose levels. > Of course I could benchmark the algorithms with multiple compression levels. > How many / Which compression levels would you like to see to be compared with each other? > However many are needed to demonstrate that the new algorithm is needed. E.g. if your primary argument is that the new algorithm provides a better compression ratio, than you'd need to show that simply choosing a higher compression level with Zstandard or DEFLATE isn't good enough. > > It's also rare to add a compression or encryption algorithm to the Linux kernel > > that isn't already used somewhere else. > > The kernel is the only place where so many small pieces of data need to be compressed. > In user space nobody cares that the data is splitted into pages, and therefore compression usually applied to large datasets. > Not really true. There are other situations where people need near-random access to data so not too much can be compressed at once. Also there are applications where compression needs to be applied at the level of individual packets or messages because e.g. the messages need to be sent over the network right away to meet a time constraint. > > Have you at least written a formal > > specification of the 'zBeWalgo' data format? > > Not yet. > > > Zstandard, for example, had a > > well-tested and optimized userspace library and a specification of the data > > format before it was added to the kernel. And even before that it took a couple > > years of development to stabilize the Zstandard data format. > > > > Now, it's true that you don't strictly need a stable data format if the > > algorithm will only ever be used for RAM compression. But, if you want to take > > that shortcut, then you're on the hook to justify it, and perhaps to enlighten > > the crypto API about algorithms that don't have a stable data format (e.g. using > > a new CRYPTO_ALG_* flag) so that users have to more explicitly opt-in to using > > them. > > I could add CRYPTO_ALG_TYPE_COMPRESS_UNSTABLE as an alias for CRYPTO_ALG_TYPE_COMPRESS > A real flag would probably make more sense than an algorithm type. Note that it could also be meaningful for other algorithm types besides compression, e.g. encryption algorithms. > > You cannot just ignore fuzz-safety in the decompressor either. The existing > > decompressors exposed through the crypto API are fuzz-safe, so it's not valid to > > compare an unsafe decompressor to them. If you really do want to add an unsafe > > decompressor then you'd likely need to look into adding crypto API support for > > users to request unsafe decompression -- which could also be used to expose the > > unsafe version of the LZ4 decompressor (LZ4_decompress_fast() instead of > > LZ4_decompress_safe()). Or if you do make the decompressor safe, then of course > > you'd actually need to fuzz it; I suggest porting the code to userspace and > > running american fuzzy lop (afl-fuzz) on it: http://lcamtuf.coredump.cx/afl/ > > The current version of the decompressor is fuzz-safe. I tested it with lots of data and made sure, that each possible branch in the code is executed multiple times in different combinations. > Are you sure? Just taking a random example I found in 2 minutes, in decompress_rle() there is no validation that the destination buffer is not overflowed, even though each 2 source bytes can expand into 128 destination bytes. Branch coverage is not enough. You need to fuzz it with real fuzzers such as afl-fuzz and libFuzzer. It's 2018; there's no excuse for not doing so anymore for code that is easily plugged into a fuzzer, e.g. decompression functions. - Eric