Date:   Sun, 26 Feb 2023 13:38:22 +0900
From:   Sergey Senozhatsky <senozhatsky@chromium.org>
To:     Minchan Kim <minchan@kernel.org>
Cc:     Sergey Senozhatsky <senozhatsky@chromium.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Yosry Ahmed <yosryahmed@google.com>,
        linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv2 3/6] zsmalloc: fine-grained inuse ratio based fullness
 grouping
Message-ID: <Y/riPlQ2UK00WirI@google.com>
References: <20230223030451.543162-1-senozhatsky@chromium.org>
 <20230223030451.543162-4-senozhatsky@chromium.org>
 <Y/f2WvhNlwhsf2Cz@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Y/f2WvhNlwhsf2Cz@google.com>
Precedence: bulk

On (23/02/23 15:27), Minchan Kim wrote:
> > + * Pages are distinguished by the ratio of used memory (that is the ratio
> > + * of ->inuse objects to all objects that page can store). For example,
> > + * INUSE_RATIO_30 means that the ratio of used objects is > 20% and <= 30%.
> > + *
> > + * The number of fullness groups is not random. It allows us to keep
> > + * diffeence between the least busy page in the group (minimum permitted
> > + * number of ->inuse objects) and the most busy page (maximum permitted
> > + * number of ->inuse objects) at a reasonable value.
> > + */
> > +#define ZS_INUSE_RATIO_0	0
> 
> How about keeping ZS_EMPTY and ZS_FULL since they are used
> multiple places in source code? It would have less churning.

I have to admit that I sort of like the unified naming
	"zspage inuse ratio goes from 0 to 100"

but I can keep ZS_EMPTY / ZS_FULL as two "special" inuse values.

> > +#define ZS_INUSE_RATIO_10	1
> > +#define ZS_INUSE_RATIO_20	2
> > +#define ZS_INUSE_RATIO_30	3
> > +#define ZS_INUSE_RATIO_40	4
> > +#define ZS_INUSE_RATIO_50	5
> > +#define ZS_INUSE_RATIO_60	6
> > +#define ZS_INUSE_RATIO_70	7
> > +#define ZS_INUSE_RATIO_80	8
> > +#define ZS_INUSE_RATIO_90	9
> > +#define ZS_INUSE_RATIO_99	10
> 
> Do we really need all the define macro for the range from 10 to 99?
> Can't we do this?
> 
> enum class_stat_type {
>     ZS_EMPTY,
>     /*
>      * There are fullness buckets between 10% - 99%.
>      */
>     ZS_FULL = 11
>     NR_ZS_FULLNESS,
>     ZS_OBJ_ALLOCATED = NR_ZS_FULLNESS,
>     ZS_OBJ_USED,
>     NR_ZS_STAT,
> }

This creates undocumented secret constats, which are being heavily
used (zspage fullness values, indeces in fullness lists arrays,
stats array offsets, etc.) but have no trace in the code. And this
also forces us to use magic number in the code. So should fullness
grouping change, things like, for instance, zs_stat_get(7), will
compile just fine yet will do something very different and we will
have someone to spot the regression.

So yes, it's 10 lines of defines, it's not even 10 lines of code, but
 1) it is documentation, we keep constats documented
 2) more importantly, it protects us from regressions and bugs

From maintinability point of view, having everything excpliticly
documented / spelled out is a win.

As of why I decided to go with defines, this is because zspage fullness
values and class stats are two conceptually different things, they don't
really fit in one single enum, unless enum's name is "zs_constants".
What do you think?

[..]
> >  	 * Size of objects stored in this class. Must be multiple
> >  	 * of ZS_ALIGN.
> > @@ -641,8 +644,23 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
> >  			continue;
> >  
> >  		spin_lock(&pool->lock);
> > -		class_almost_full = zs_stat_get(class, ZS_ALMOST_FULL);
> > -		class_almost_empty = zs_stat_get(class, ZS_ALMOST_EMPTY);
> > +
> > +		/*
> > +		 * Replecate old behaviour for almost_full and almost_empty
> > +		 * stats.
> > +		 */
> > +		class_almost_full = zs_stat_get(class, ZS_INUSE_RATIO_99);
> > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_90);
> > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_80);
> > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_70);
> 
> > +
> > +		class_almost_empty = zs_stat_get(class, ZS_INUSE_RATIO_60);
> > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_50);
> > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_40);
> > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_30);
> > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_20);
> > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_10);
> 
> I guess you can use just loop here from 1 to 6
> 
> And then from 7 to 10 for class_almost_full.

I can change it to

	for (r = ZS_INUSE_RATIO_10; r <= ZS_INUSE_RATIO_70; r++)
and
	for (r = ZS_INUSE_RATIO_80; r <= ZS_INUSE_RATIO_99; r++)

which would be safer than using hard-coded numbers.

Shall we actually instead report per inuse ratio stats instead? I sort
of don't see too many reasons to keep that below/above 3/4 thing.