Date: Tue, 28 Feb 2023 14:53:53 -0800
From: Minchan Kim
To: Sergey Senozhatsky
Cc: Andrew Morton, Yosry Ahmed, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv2 3/6] zsmalloc: fine-grained inuse ratio based fullness grouping
References: <20230223030451.543162-1-senozhatsky@chromium.org>
 <20230223030451.543162-4-senozhatsky@chromium.org>

On Sun, Feb 26, 2023 at 01:38:22PM +0900, Sergey Senozhatsky wrote:
> On (23/02/23 15:27), Minchan Kim wrote:
> > > + * Pages are distinguished by the ratio of used memory (that is, the ratio
> > > + * of ->inuse objects to all objects that the page can store). For example,
> > > + * INUSE_RATIO_30 means that the ratio of used objects is > 20% and <= 30%.
> > > + *
> > > + * The number of fullness groups is not random. It allows us to keep the
> > > + * difference between the least busy page in the group (minimum permitted
> > > + * number of ->inuse objects) and the most busy page (maximum permitted
> > > + * number of ->inuse objects) at a reasonable value.
> > > + */
> > > +#define ZS_INUSE_RATIO_0	0
> >
> > How about keeping ZS_EMPTY and ZS_FULL since they are used in
> > multiple places in the source code? It would cause less churn.
>
> I have to admit that I sort of like the unified naming
> 	"zspage inuse ratio goes from 0 to 100"
>
> but I can keep ZS_EMPTY / ZS_FULL as two "special" inuse values.
>
> > > +#define ZS_INUSE_RATIO_10	1
> > > +#define ZS_INUSE_RATIO_20	2
> > > +#define ZS_INUSE_RATIO_30	3
> > > +#define ZS_INUSE_RATIO_40	4
> > > +#define ZS_INUSE_RATIO_50	5
> > > +#define ZS_INUSE_RATIO_60	6
> > > +#define ZS_INUSE_RATIO_70	7
> > > +#define ZS_INUSE_RATIO_80	8
> > > +#define ZS_INUSE_RATIO_90	9
> > > +#define ZS_INUSE_RATIO_99	10
> >
> > Do we really need all the define macros for the range from 10 to 99?
> > Can't we do this?
> >
> > enum class_stat_type {
> > 	ZS_EMPTY,
> > 	/*
> > 	 * There are fullness buckets between 10% - 99%.
> > 	 */
> > 	ZS_FULL = 11,
> > 	NR_ZS_FULLNESS,
> > 	ZS_OBJ_ALLOCATED = NR_ZS_FULLNESS,
> > 	ZS_OBJ_USED,
> > 	NR_ZS_STAT,
> > };
>
> This creates undocumented secret constants, which are being heavily
> used (zspage fullness values, indices in fullness list arrays,
> stats array offsets, etc.) but have no trace in the code. And this
> also forces us to use magic numbers in the code. So should fullness
> grouping change, things like, for instance, zs_stat_get(7), will
> compile just fine yet will do something very different, and we will
> need someone to spot the regression.
>
> So yes, it's 10 lines of defines, it's not even 10 lines of code, but
> 1) it is documentation, we keep constants documented
> 2) more importantly, it protects us from regressions and bugs
>
> From a maintainability point of view, having everything explicitly
> documented / spelled out is a win.
>
> As for why I decided to go with defines, this is because zspage fullness
> values and class stats are two conceptually different things, they don't
> really fit in one single enum, unless the enum's name is "zs_constants".
> What do you think?

Agree. We don't need to combine them, then.

BTW, I still prefer an enum instead of 10 defines:

enum fullness_group {
	ZS_EMPTY,
	ZS_INUSE_RATIO_MIN,
	ZS_INUSE_RATIO_ALMOST_FULL = 7,
	ZS_INUSE_RATIO_MAX = 10,
	ZS_FULL,
	NR_ZS_FULLNESS,
};
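For reference, a minimal standalone sketch (plain userspace C) of the bucket
rule from the patch comment above and of how the constants double as list /
stat indices. It is not the zsmalloc code itself; bucket_for() is a
hypothetical helper, and since where a 100%-full zspage should land (a
separate ZS_FULL bucket vs. the 99% one) is exactly the open question here,
the sketch simply caps at the 99% bucket:

	/*
	 * Sketch only, not zsmalloc code: demonstrates the
	 * "> 20% and <= 30% maps to INUSE_RATIO_30" rule.
	 */
	#include <stdio.h>

	#define ZS_INUSE_RATIO_0	0	/* empty zspage */
	/* buckets 1..9 cover 10%..90%, as the defines in the patch do */
	#define ZS_INUSE_RATIO_99	10	/* busiest bucket in this sketch */

	static int bucket_for(int inuse, int total)
	{
		int ratio;

		if (inuse == 0)
			return ZS_INUSE_RATIO_0;

		ratio = 100 * inuse / total;
		/* round the percentage up to the next 10% step, cap at 99% */
		ratio = (ratio + 9) / 10;
		return ratio < ZS_INUSE_RATIO_99 ? ratio : ZS_INUSE_RATIO_99;
	}

	int main(void)
	{
		/* 3 of 10 objects in use -> 30% -> bucket 3 (ZS_INUSE_RATIO_30) */
		printf("bucket(3, 10)  = %d\n", bucket_for(3, 10));
		/* 10 of 10 objects in use -> capped at the ZS_INUSE_RATIO_99 bucket */
		printf("bucket(10, 10) = %d\n", bucket_for(10, 10));
		return 0;
	}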
> > [..]
> > >  	 * Size of objects stored in this class. Must be multiple
> > >  	 * of ZS_ALIGN.
> > > @@ -641,8 +644,23 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
> > >  			continue;
> > >
> > >  		spin_lock(&pool->lock);
> > > -		class_almost_full = zs_stat_get(class, ZS_ALMOST_FULL);
> > > -		class_almost_empty = zs_stat_get(class, ZS_ALMOST_EMPTY);
> > > +
> > > +		/*
> > > +		 * Replicate old behaviour for almost_full and almost_empty
> > > +		 * stats.
> > > +		 */
> > > +		class_almost_full = zs_stat_get(class, ZS_INUSE_RATIO_99);
> > > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_90);
> > > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_80);
> > > +		class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_70);
> > > +
> > > +		class_almost_empty = zs_stat_get(class, ZS_INUSE_RATIO_60);
> > > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_50);
> > > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_40);
> > > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_30);
> > > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_20);
> > > +		class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_10);
> >
> > I guess you can just use a loop here, from 1 to 6,
> > and then from 7 to 10 for class_almost_full.
>
> I can change it to
>
> 	for (r = ZS_INUSE_RATIO_10; r <= ZS_INUSE_RATIO_70; r++)
> and
> 	for (r = ZS_INUSE_RATIO_80; r <= ZS_INUSE_RATIO_99; r++)
>
> which would be safer than using hard-coded numbers.

I didn't mean to use hard-coded bounds either; I just wanted to show the
intention of using a loop.

> Shall we actually report per inuse ratio stats instead? I sort
> of don't see too many reasons to keep that below/above 3/4 thing.

Oh, yeah. Since it's debugfs, we would have an excuse to break the format.
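A rough illustration of that loop-based aggregation, as a self-contained
sketch rather than the actual patch: zs_stat_get() is stubbed out as an
array lookup and simplified to drop the class argument seen in the hunk
above, and the bucket split follows that hunk (10%..60% counted as
almost_empty, 70%..99% as almost_full):

	#include <stdio.h>

	#define ZS_INUSE_RATIO_10	1
	#define ZS_INUSE_RATIO_60	6
	#define ZS_INUSE_RATIO_70	7
	#define ZS_INUSE_RATIO_99	10

	/* Stand-in for the per-class fullness counters; values are made up. */
	static unsigned long stats[ZS_INUSE_RATIO_99 + 1] = {
		[ZS_INUSE_RATIO_10] = 2,
		[ZS_INUSE_RATIO_70] = 5,
		[ZS_INUSE_RATIO_99] = 1,
	};

	/* Hypothetical stub, taking only the fullness bucket index. */
	static unsigned long zs_stat_get(int fullness)
	{
		return stats[fullness];
	}

	int main(void)
	{
		unsigned long class_almost_empty = 0, class_almost_full = 0;
		int r;

		for (r = ZS_INUSE_RATIO_10; r <= ZS_INUSE_RATIO_60; r++)
			class_almost_empty += zs_stat_get(r);
		for (r = ZS_INUSE_RATIO_70; r <= ZS_INUSE_RATIO_99; r++)
			class_almost_full += zs_stat_get(r);

		printf("almost_empty=%lu almost_full=%lu\n",
		       class_almost_empty, class_almost_full);
		return 0;
	}

If the per-inuse-ratio reporting mentioned at the end wins out, the same
loop body could simply print each bucket's counter instead of summing them.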