Date: Fri, 25 Aug 2017 14:26:09 +0900
From: Sergey Senozhatsky
To: Nick Terrell
Cc: Sergey Senozhatsky, Joonsoo Kim, "linux-kernel@vger.kernel.org", "minchan@kernel.org", Yann Collet
Subject: Re: [PATCH] zram: add zstd to the supported algorithms list
Message-ID: <20170825052609.GC5876@jagdpanzerIV.localdomain>
In-Reply-To: <69F9B64F-B138-4D8B-9167-9B91855CFFA9@fb.com>

On (08/25/17 02:46), Nick Terrell wrote:
> On 8/24/17, 7:21 PM, "Sergey Senozhatsky" wrote:
> > not really familiar either... I was thinking about having "zstd" and
> > "zstd_dict" crypto_alg structs - one would be !dict, the other one would
> > allocate dict and pass it to compress/decompress zstd callbacks. "zstd"
> > version would invoke zstd_params() passing zeros as compress and dict
> > sizes to ZSTD_getParams(), while "zstd_dict" would invoke, let's say,
> > zstd_params_dict() passing PAGE_SIZE-s. hm... (0, PAGE_SIZE)? to
> > ZSTD_getParams(). just a rough idea...
>
> The way zstd dictionaries work is the user provides some data which gets
> "prepended" to the data that is about to be compressed, without actually
> writing it to output. That way zstd can find matches in the dictionary and
> represent them for "free". That means the user has to pass the same data to
> both the compressor and decompressor.

ah... I thought zstd would construct the dictionary for us based on the
data it compresses; and we just need to provide the buffer.

> We could build a dictionary, say every 20 minutes, by sampling 512 B chunks
> of the RAM and constructing a 16 KB dictionary. Then recompress all the
> compressed RAM with the new dictionary. This is just a simple example of a
> dictionary construction algorithm. You could imagine grouping pages by
> application, and building a dictionary per application, since those pages
> would likely be more similar.
>
> Regarding the crypto API, I think it would be possible to experiment by
> creating functions like
> `zstd_comp_add_dictionary(void *ctx, void *data, size_t size)'
> and `zstd_decomp_add_dictionary(void *ctx, void *data, size_t size)'
> in the crypto zstd implementation and declare them in `zcomp.c'. If the
> experiments prove that using zstd dictionaries (or LZ4 dictionaries) is
> worthwhile, then we can figure out how we can make it work for real.

	-ss
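
For anyone who wants to experiment, the "sample 512 B chunks into a 16 KB
dictionary" idea quoted above could be sketched roughly like this. It is a
plain userspace C illustration, not kernel code: build_dict_by_sampling() is
a made-up name, and it simply concatenates evenly spaced chunks. A real
dictionary trainer would probably pick content by how often it repeats, not
by where it sits in memory.

```c
#include <stddef.h>
#include <string.h>

#define SAMPLE_SIZE 512u           /* chunk size mentioned in the thread */
#define DICT_SIZE   (16u * 1024u)  /* dictionary size mentioned in the thread */

/*
 * Hypothetical helper, not a kernel or zstd API.
 *
 * Fill dict[] with evenly spaced SAMPLE_SIZE chunks taken from
 * mem[0..mem_size). Returns the number of bytes written to dict,
 * which is always a multiple of SAMPLE_SIZE (0 if mem is too small).
 */
static size_t build_dict_by_sampling(const unsigned char *mem, size_t mem_size,
                                     unsigned char *dict, size_t dict_cap)
{
    size_t want = dict_cap / SAMPLE_SIZE;  /* chunks that fit in dict */
    size_t have = mem_size / SAMPLE_SIZE;  /* whole chunks available  */
    size_t n = want < have ? want : have;
    size_t stride, i;

    if (n == 0)
        return 0;

    /* Spread the samples across the whole region; stride is in chunks. */
    stride = have / n;
    for (i = 0; i < n; i++)
        memcpy(dict + i * SAMPLE_SIZE,
               mem + i * stride * SAMPLE_SIZE,
               SAMPLE_SIZE);

    return n * SAMPLE_SIZE;
}
```

With a 64 KB input region and a DICT_SIZE buffer this takes every fourth
512 B chunk and returns 16384. The same bytes would then have to be handed
to both the compress and the decompress side, which is the part the
proposed zstd_comp_add_dictionary()/zstd_decomp_add_dictionary() hooks are
meant to cover.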