Date: Mon, 28 Aug 2017 15:52:38 +0900
From: Minchan Kim <minchan@kernel.org>
To: Nick Terrell <terrelln@fb.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
        "sergey.senozhatsky.work@gmail.com" 
        <sergey.senozhatsky.work@gmail.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Yann Collet <cyan@fb.com>
Subject: Re: [PATCH] zram: add zstd to the supported algorithms list
Message-ID: <20170828065238.GA6309@blaptop>
References: <20170824014936.4738-1-sergey.senozhatsky@gmail.com>
 <AA6434F2-6932-436E-B063-B6C0417A553C@contoso.com>
 <20170825004947.GE29701@js1304-P5Q-DELUXE>
 <27EDD68A-61DC-42F1-8422-16B9AB9F0EB3@fb.com>
 <20170825051925.GB26819@blaptop>
 <315465C2-671F-4165-970E-B74ACFB9398D@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <315465C2-671F-4165-970E-B74ACFB9398D@fb.com>
User-Agent: Mutt/1.8.3 (2017-05-23)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3472
Lines: 80

Hi Nick,

On Fri, Aug 25, 2017 at 07:31:14PM +0000, Nick Terrell wrote:
> On 8/24/17, 10:19 PM, "Minchan Kim" <minchan@kernel.org> wrote:
> > On Fri, Aug 25, 2017 at 01:35:35AM +0000, Nick Terrell wrote:
> [..]
> > > I think using dictionaries in zram could be very interesting. We could for
> > > example, take a random sample of the RAM and use that as the dictionary
> > > for compression. E.g. take 32 512B samples from RAM and build a 16 KB
> > > dictionary (sizes may vary).
> > 
> > For the static option, could we create the dictionary from data in zram
> > and dump it into a file? Then rebuilding zram or the kernel would
> > include the dictionary in the image.
> > 
> > For it, we would need some knob like
> > 
> >         cat /sys/block/zram/zstd_dict > dict.data
> > 
> >         CONFIG_ZSTD_DICT_DIR=
> >         CONFIG_ZSTD_DICT_FILE= 
> 
> My guess is that a static dictionary won't cut it, since different
> workloads will have drastically different RAM contents, so we won't be able
> to construct a single dictionary that works for them all. I'd love to be
> proven wrong though.

zram is popular as a swap device in the embedded world. On mobile phones
there would be varied workloads, as you said, but other devices such as
refrigerators and TVs run very specific workloads, so a static dictionary
would be great to have there.
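
To make the sampling idea quoted above concrete, here is a rough userspace
sketch, not kernel code. It copies 32 samples of 512 bytes from random,
sample-aligned offsets in a memory image into a 16 KB buffer that could then
be tried as a raw dictionary. The sizes come from the example in this thread;
the function names, the PRNG, and the idea of using an in-memory buffer as
the "RAM image" are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAMPLE_SIZE 512u                        /* bytes per sample */
#define NR_SAMPLES  32u                         /* samples per dictionary */
#define DICT_SIZE   (SAMPLE_SIZE * NR_SAMPLES)  /* 16 KB in total */

/* Tiny xorshift PRNG so the sketch has no external dependencies. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Hypothetical helper: fill dict[] with NR_SAMPLES chunks of SAMPLE_SIZE
 * bytes taken from random sample-aligned offsets in mem[].
 * Returns the number of bytes written, or 0 if the input is smaller than
 * one sample or the output buffer is smaller than DICT_SIZE.
 */
static size_t build_raw_dict(const uint8_t *mem, size_t mem_len,
			     uint8_t *dict, size_t dict_cap, uint32_t seed)
{
	size_t nr_slots = mem_len / SAMPLE_SIZE;
	uint32_t state = seed ? seed : 1;	/* xorshift must not start at 0 */
	size_t i;

	if (nr_slots == 0 || dict_cap < DICT_SIZE)
		return 0;

	for (i = 0; i < NR_SAMPLES; i++) {
		size_t slot = xorshift32(&state) % nr_slots;

		memcpy(dict + i * SAMPLE_SIZE, mem + slot * SAMPLE_SIZE,
		       SAMPLE_SIZE);
	}
	return DICT_SIZE;
}
```

For a fixed-function device the same buffer could be built once from a
representative dump and shipped with the image; whether random sampling
gives a useful dictionary at all is exactly what benchmarks would have to
show.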

> 
> > For dynamic option, could we make the dictionary with data
> > in zram dynamically? So, upcoming pages will use the newly
> > created dictionary but old compressed pages will use own dictionary.
> 
> Yeah, that's totally possible on the compression side, we would just need to
> save which pages were compressed with which dictionary somewhere.

Great. We have zram->table for per-object tracking and zspage for
page-unit tracking, so I expect it wouldn't be hard to implement.
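
As a minimal sketch of the bookkeeping this would need, assuming one small
dictionary id stored per compressed page: new pages take the current
dictionary, old pages keep the id they were compressed with, and an old
dictionary is retired once nothing references it. All struct and function
names below are hypothetical and do not mirror the real zram->table or
zspage layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DICTS 4u	/* hypothetical limit on live dictionaries */
#define NO_DICT   0xffu	/* page was compressed without a dictionary */

struct dict_slot {
	const uint8_t *data;	/* NULL means the slot is free */
	size_t len;
	unsigned int users;	/* compressed pages still using this dict */
};

struct dict_table {
	struct dict_slot slots[MAX_DICTS];
	uint8_t current;	/* dictionary used for upcoming pages */
};

struct page_entry {
	uint8_t dict_id;	/* dictionary this page was compressed with */
};

static void dict_table_init(struct dict_table *t)
{
	*t = (struct dict_table){ .current = NO_DICT };
}

/* Make a new dictionary current; returns its id, or -1 if no slot is free. */
static int dict_install(struct dict_table *t, const uint8_t *data, size_t len)
{
	uint8_t old = t->current;
	unsigned int id;

	/* The outgoing dictionary can go away at once if no page uses it. */
	if (old != NO_DICT && t->slots[old].users == 0)
		t->slots[old] = (struct dict_slot){ 0 };

	for (id = 0; id < MAX_DICTS; id++) {
		if (t->slots[id].data == NULL) {
			t->slots[id] = (struct dict_slot){ data, len, 0 };
			t->current = (uint8_t)id;
			return (int)id;
		}
	}
	return -1;
}

/* Record which dictionary a newly compressed page uses. */
static void page_store(struct dict_table *t, struct page_entry *e)
{
	e->dict_id = t->current;
	if (e->dict_id != NO_DICT)
		t->slots[e->dict_id].users++;
}

/* Look up the dictionary needed to decompress a page (NULL if none). */
static const struct dict_slot *page_dict(const struct dict_table *t,
					 const struct page_entry *e)
{
	return e->dict_id == NO_DICT ? NULL : &t->slots[e->dict_id];
}

/* Drop a page; retire its dictionary if it is old and now unused. */
static void page_free(struct dict_table *t, struct page_entry *e)
{
	struct dict_slot *s;

	if (e->dict_id == NO_DICT)
		return;
	s = &t->slots[e->dict_id];
	assert(s->users > 0);
	if (--s->users == 0 && e->dict_id != t->current)
		*s = (struct dict_slot){ 0 };
	e->dict_id = NO_DICT;
}
```

One byte per entry is only an assumption here; in practice the id might be
packed into existing flag bits or kept per zspage instead, and locking is
left out entirely.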

> 
> > I'm not sure it's possible, anyway, if predefined dict can help
> > comp ratio a lot in 4K data, I really love the feature and will support
> > to have it. ;)
> > 
> > > 
> > > I'm not sure how you would pass a dictionary into the crypto compression
> > > API, but I'm sure we can make something work if dictionary compression
> > > proves to be beneficial enough.
> > 
> > Yes, it would be better to integrate the feature into crypto, but please
> > don't tie it to the crypto API. If it's hard to support with the current
> > crypto API in a short time, I really want to support it with zcomp_zstd.c.
> > 
> > Please look at old zcomp model.
> > http://elixir.free-electrons.com/linux/v4.7/source/drivers/block/zram/zcomp_lz4.c
> 
> Thanks for the link, we could definitely make zcomp work with dictionaries.
> 
> > > What data have you, or anyone, used for benchmarking compression ratio and 
> > > speed for RAM? Since it is such a specialized application, the standard
> > > compression benchmarks aren't very applicable.
> > 
> > I have used my image dumped from desktop swap device.
> > Of course, it doesn't cover all of cases in the world but it would be better
> > to use IO benchmark buffer, IMHO. :)
> 
> Since adding dictionary support won't be quite as easy as adding zstd
> support, I think the first step is building a set of benchmarks that
> represent some common real world scenarios. We can easily test different
> dictionary construction algorithms in userspace, and determine if the work
> will pay off for some workloads. I'll collect some RAM samples from my
> device and run some preliminary tests.

Sweet. I am looking forward to seeing your results.
Thanks!