Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
Date:   Fri, 4 Nov 2022 12:48:42 +0900
From:   Sergey Senozhatsky <senozhatsky@chromium.org>
To:     Minchan Kim <minchan@kernel.org>
Cc:     Sergey Senozhatsky <senozhatsky@chromium.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Nitin Gupta <ngupta@vflare.org>, linux-kernel@vger.kernel.org,
        linux-mm@kvack.org
Subject: Re: [PATCHv4 4/9] zram: Introduce recompress sysfs knob
Message-ID: <Y2SLmuxLy8tf1X9m@google.com>
References: <20221018045533.2396670-1-senozhatsky@chromium.org>
 <20221018045533.2396670-5-senozhatsky@chromium.org>
 <Y2Lbxp6to4QNYyGe@google.com>
 <Y2M0t5etyJiUfeQi@google.com>
 <Y2PzseskzPelrZum@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Y2PzseskzPelrZum@google.com>
Precedence: bulk

On (22/11/03 10:00), Minchan Kim wrote:
[..]
> > As I understand it, this threshold can change quite often,
> > depending on memory pressure and so on. So we may force
> > user-space to issue more syscalls, without any gain in
> > simplicity.
> 
> Sorry, didn't understand your point. Let me clarify my idea.
> If we have a separate knob for the recompress threshold, it could
> work like this.
> 
> # recompress any compressed pages which are larger than 888 bytes.
> echo 888 > /sys/block/zram0/recompress_threshold
> 
> # try to recompress any pages larger than the threshold with the
> # following algorithms.
> 
> echo "type=lzo priority=1" > /sys/block/zram0/recompress_algo
> echo "type=zstd priority=2" > /sys/block/zram0/recompress_algo
> echo "type=deflate priority=3" > /sys/block/zram0/recompress_algo

OK. We can always add more sysfs knobs and make threshold a global
per-device value.

I think I prefer the approach where the threshold is part of the current
recompress context, not something derived from another context. That
is, all values (page type, threshold, possibly algorithm index)
are submitted by user-space for this particular recompression:

	echo "type=huge threshold=3000 ..." > recompress

If threshold is a global value that is applied to all recompress calls
then how does user-space say no-threshold? For instance, when it wants
to recompress only huge pages. It probably still needs to supply something
like threshold=0. So my personal preference for now - keep threshold
as a context dependent value.

Another thing that I like about threshold= being context-dependent
is that we then don't need to protect recompression against concurrent
modifications of a global threshold with a lock and so on. It keeps
things simpler.
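
To illustrate the per-call context, here is a minimal user-space sketch
of parsing such a request. It is only an illustration: struct
recompress_ctx and parse_recompress_request() are made-up names, not
code from the patch, and the only keys handled are the two shown above.
A missing threshold= simply stays 0, which is how "no threshold" would
be expressed for that call.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-call context; not a structure from the patch. */
struct recompress_ctx {
	char type[16];          /* e.g. "huge"; empty when not supplied */
	unsigned int threshold; /* 0 means "no threshold" for this call */
};

/*
 * Parse a "key=value key=value" request into ctx.
 * Returns 0 on success, -1 on an unknown key or a malformed token.
 */
static int parse_recompress_request(const char *req, struct recompress_ctx *ctx)
{
	char buf[128];
	char *p = buf;

	assert(req && ctx);
	memset(ctx, 0, sizeof(*ctx));
	if (strlen(req) >= sizeof(buf))
		return -1;
	strcpy(buf, req);

	while (*p) {
		char *tok, *val, *end;
		unsigned long v;

		p += strspn(p, " \n");		/* skip separators */
		if (!*p)
			break;
		tok = p;
		p += strcspn(p, " \n");		/* find the end of the token */
		if (*p)
			*p++ = '\0';

		val = strchr(tok, '=');
		if (!val || val[1] == '\0')
			return -1;
		*val++ = '\0';

		if (!strcmp(tok, "type")) {
			if (strlen(val) >= sizeof(ctx->type))
				return -1;
			strcpy(ctx->type, val);
		} else if (!strcmp(tok, "threshold")) {
			if (*val < '0' || *val > '9')
				return -1;
			v = strtoul(val, &end, 10);
			if (*end != '\0' || v > UINT_MAX)
				return -1;
			ctx->threshold = (unsigned int)v;
		} else {
			return -1;
		}
	}
	return 0;
}
```

Everything the call needs lives in ctx, so nothing here has to be
serialized against a concurrent writer of some other knob.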

[..]
> > > Let's squeeze the comp algo index into the meta area since we have
> > > some room for the bits. Then could we remove the two
> > > recomp-specific flags?
> > 
> > What is meta area?
> 
> zram->table[index].flags
> 
> If we squeeze the algorithm index, we could work like this
> without ZRAM_RECOMP_SKIP.

We still need ZRAM_RECOMP_SKIP. Recompression may fail to compress an
object further: sometimes we get a recompressed object that is larger
than the original one, sometimes one of the same size, sometimes one of
a smaller size that still belongs to the same size class, which doesn't
save us any memory. Without ZRAM_RECOMP_SKIP we will continue
recompressing objects that are incompressible (in a way that saves us
memory in zsmalloc) by any of ZRAM's algorithms.
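
A rough sketch of that decision, for illustration only. The fixed
16-byte class step is an assumption made to keep the example small
(real zsmalloc size classes are not this simple), and
judge_recompression() and the enum are hypothetical names, not code
from the patch.

```c
#include <assert.h>

/* Assumption for illustration: one size class per 16 bytes. */
#define CLASS_STEP 16u

enum recomp_result {
	RECOMP_KEEP_NEW,	/* new object lands in a smaller size class */
	RECOMP_MARK_SKIP,	/* no memory saved, do not try this object again */
};

/* Index of the (hypothetical) size class that holds an object of len bytes. */
static unsigned int size_class(unsigned int len)
{
	return (len + CLASS_STEP - 1) / CLASS_STEP;
}

/*
 * Compare the original and the recompressed length. Only a move to a
 * smaller size class counts as a win; larger, equal, or smaller but
 * same-class results all end up marked as "skip".
 */
static enum recomp_result judge_recompression(unsigned int old_len,
					      unsigned int new_len)
{
	assert(old_len != 0);	/* a stored object always has a length */

	if (size_class(new_len) < size_class(old_len))
		return RECOMP_KEEP_NEW;
	return RECOMP_MARK_SKIP;
}
```

With this model a 3000 byte object recompressed to 2993 bytes still
sits in the same class, so it is marked as skip even though it got
smaller.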

> read_block_state
>     zram_algo_idx(zram, index) > 0 ? 'r' : '.');
> 
> zram_read_from_zpool
>     if (zram_algo_idx(zram, idx) != 0)
>         idx = 1;

As an idea, maybe we can store everything re-compression related
in a dedicated meta field? SKIP flag, algorithm ID, etc.

We don't have many bits left in ->flags on 32-bit systems. We
currently need at least 3 bits: one for RECOMP_SKIP and at least 2
for the algorithm ID. With 2 bits the only possible IDs are 00, 01,
10 and 11, so we can have at most 3 recompression algorithms: 00 is
the primary one and the rest are alternative ones. A maximum of 3
recompression algorithms sounds like a reasonable limit to me. Yeah,
maybe we can use flags bits for it.
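
Something along these lines, as a sketch. The bit positions, macro
names and helpers are all hypothetical and chosen only for the
example; they are not the layout of zram->table[index].flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 is the skip flag, bits 1-2 the algorithm ID. */
#define RECOMP_SKIP_BIT		0
#define RECOMP_ALGO_SHIFT	1
#define RECOMP_ALGO_BITS	2
#define RECOMP_ALGO_MASK	((1u << RECOMP_ALGO_BITS) - 1)

/* Store a 2-bit algorithm ID (0 is the primary algorithm). */
static uint32_t flags_set_algo(uint32_t flags, unsigned int id)
{
	assert(id <= RECOMP_ALGO_MASK);
	flags &= ~((uint32_t)RECOMP_ALGO_MASK << RECOMP_ALGO_SHIFT);
	return flags | ((uint32_t)id << RECOMP_ALGO_SHIFT);
}

/* Read the algorithm ID back. */
static unsigned int flags_get_algo(uint32_t flags)
{
	return (flags >> RECOMP_ALGO_SHIFT) & RECOMP_ALGO_MASK;
}

/* Mark the object as not worth recompressing again. */
static uint32_t flags_set_skip(uint32_t flags)
{
	return flags | (1u << RECOMP_SKIP_BIT);
}

static bool flags_test_skip(uint32_t flags)
{
	return flags & (1u << RECOMP_SKIP_BIT);
}
```

The skip bit and the algorithm ID do not overlap, so setting one
leaves the other untouched.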

[..]
> > > zram_bvec_read:
> > >     algo_idx = zram_get_algo_idx(zram, index);
> > >     zstrm = zcomp_stream_get(zram, algo_idx);
> > >     zcomp_decompress(zstrm);
> > >     zcomp_stream_put(zram, algo_idx);
> > 
> > Hmm. This is something that should not be enabled by default.
> 
> Exactly. I don't mean to enable by default, either.

OK.

> > N compressions per every stored page is very very CPU and
> > power intensive. We definitely want a way to have recompression
> > as a user-space event, which gives all sorts of flexibility and
> > extensibility. ZRAM doesn't (and should not) know about too many
> > things, so ZRAM can't make good decisions (and probably should not
> > try). User-space can make good decisions on the other hand.
> > 
> > So recompression for us is not something that happens all the time,
> > unconditionally. It's something that happens sometimes, depending on
> > the situation on the host.
> 
> Totally agree. I am not saying we should enable the feature by default
> but at least consider it for the future. I have something in mind to
> be useful later.

OK.

> > [..]
> > > > +static int zram_recompress(struct zram *zram, u32 index, struct page *page,
> > > > +			   int size_watermark)
> > > > +{
> > > > +	unsigned long handle_prev;
> > > > +	unsigned long handle_next;
> > > > +	unsigned int comp_len_next;
> > > > +	unsigned int comp_len_prev;
> > > 
> > > How about orig_handle and new_handle with orig_comp_len and new_comp_len?
> > 
> > No opinion. Can we have prev and next? :)
> 
> prev and next give the impression of position, something like a list.
> orig and new give the impression of stale and fresh.
> 
> We are doing the latter here.

Yeah, like I said in the internal email, this will make rebasing harder on
my side, because it breaks a patch from Alexey and then breaks the
higher-order zspages patch series. It's a very old series and we already
have quite a few patches depending on it.