2021-08-19 19:55:28

by Johannes Weiner

Subject: [PATCH 1/4] mm: Kconfig: move swap and slab config options to the MM section

These are currently under General Setup. MM seems like a better fit.

Signed-off-by: Johannes Weiner <[email protected]>
---
init/Kconfig | 120 ---------------------------------------------------
mm/Kconfig | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 120 insertions(+), 120 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index a61c92066c2e..a2358cd5498a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -331,23 +331,6 @@ config DEFAULT_HOSTNAME
but you may wish to use a different default here to make a minimal
system more usable with less configuration.

-#
-# For some reason microblaze and nios2 hard code SWAP=n. Hopefully we can
-# add proper SWAP support to them, in which case this can be remove.
-#
-config ARCH_NO_SWAP
- bool
-
-config SWAP
- bool "Support for paging of anonymous memory (swap)"
- depends on MMU && BLOCK && !ARCH_NO_SWAP
- default y
- help
- This option allows you to choose whether you want to have support
- for so called swap devices or swap files in your kernel that are
- used to provide more virtual memory than the actual RAM present
- in your computer. If unsure say Y.
-
config SYSVIPC
bool "System V IPC"
help
@@ -1862,109 +1845,6 @@ config COMPAT_BRK

On non-ancient distros (post-2000 ones) N is usually a safe choice.

-choice
- prompt "Choose SLAB allocator"
- default SLUB
- help
- This option allows to select a slab allocator.
-
-config SLAB
- bool "SLAB"
- select HAVE_HARDENED_USERCOPY_ALLOCATOR
- help
- The regular slab allocator that is established and known to work
- well in all environments. It organizes cache hot objects in
- per cpu and per node queues.
-
-config SLUB
- bool "SLUB (Unqueued Allocator)"
- select HAVE_HARDENED_USERCOPY_ALLOCATOR
- help
- SLUB is a slab allocator that minimizes cache line usage
- instead of managing queues of cached objects (SLAB approach).
- Per cpu caching is realized using slabs of objects instead
- of queues of objects. SLUB can use memory efficiently
- and has enhanced diagnostics. SLUB is the default choice for
- a slab allocator.
-
-config SLOB
- depends on EXPERT
- bool "SLOB (Simple Allocator)"
- help
- SLOB replaces the stock allocator with a drastically simpler
- allocator. SLOB is generally more space efficient but
- does not perform as well on large systems.
-
-endchoice
-
-config SLAB_MERGE_DEFAULT
- bool "Allow slab caches to be merged"
- default y
- help
- For reduced kernel memory fragmentation, slab caches can be
- merged when they share the same size and other characteristics.
- This carries a risk of kernel heap overflows being able to
- overwrite objects from merged caches (and more easily control
- cache layout), which makes such heap attacks easier to exploit
- by attackers. By keeping caches unmerged, these kinds of exploits
- can usually only damage objects in the same cache. To disable
- merging at runtime, "slab_nomerge" can be passed on the kernel
- command line.
-
-config SLAB_FREELIST_RANDOM
- bool "Randomize slab freelist"
- depends on SLAB || SLUB
- help
- Randomizes the freelist order used on creating new pages. This
- security feature reduces the predictability of the kernel slab
- allocator against heap overflows.
-
-config SLAB_FREELIST_HARDENED
- bool "Harden slab freelist metadata"
- depends on SLAB || SLUB
- help
- Many kernel heap attacks try to target slab cache metadata and
- other infrastructure. This options makes minor performance
- sacrifices to harden the kernel slab allocator against common
- freelist exploit methods. Some slab implementations have more
- sanity-checking than others. This option is most effective with
- CONFIG_SLUB.
-
-config SHUFFLE_PAGE_ALLOCATOR
- bool "Page allocator randomization"
- default SLAB_FREELIST_RANDOM && ACPI_NUMA
- help
- Randomization of the page allocator improves the average
- utilization of a direct-mapped memory-side-cache. See section
- 5.2.27 Heterogeneous Memory Attribute Table (HMAT) in the ACPI
- 6.2a specification for an example of how a platform advertises
- the presence of a memory-side-cache. There are also incidental
- security benefits as it reduces the predictability of page
- allocations to compliment SLAB_FREELIST_RANDOM, but the
- default granularity of shuffling on the "MAX_ORDER - 1" i.e,
- 10th order of pages is selected based on cache utilization
- benefits on x86.
-
- While the randomization improves cache utilization it may
- negatively impact workloads on platforms without a cache. For
- this reason, by default, the randomization is enabled only
- after runtime detection of a direct-mapped memory-side-cache.
- Otherwise, the randomization may be force enabled with the
- 'page_alloc.shuffle' kernel command line parameter.
-
- Say Y if unsure.
-
-config SLUB_CPU_PARTIAL
- default y
- depends on SLUB && SMP
- bool "SLUB per cpu partial cache"
- help
- Per cpu partial caches accelerate objects allocation and freeing
- that is local to a processor at the price of more indeterminism
- in the latency of the free. On overflow these caches will be cleared
- which requires the taking of locks that may cause latency spikes.
- Typically one would choose no for a realtime system.
-
config MMAP_ALLOW_UNINITIALIZED
bool "Allow mmapped anonymous memory to be uninitialized"
depends on EXPERT && !MMU
diff --git a/mm/Kconfig b/mm/Kconfig
index 02d44e3420f5..894858536e7f 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -2,6 +2,126 @@

menu "Memory Management options"

+#
+# For some reason microblaze and nios2 hard code SWAP=n. Hopefully we can
+# add proper SWAP support to them, in which case this can be remove.
+#
+config ARCH_NO_SWAP
+ bool
+
+config SWAP
+ bool "Support for paging of anonymous memory (swap)"
+ depends on MMU && BLOCK && !ARCH_NO_SWAP
+ default y
+ help
+ This option allows you to choose whether you want to have support
+ for so called swap devices or swap files in your kernel that are
+ used to provide more virtual memory than the actual RAM present
+ in your computer. If unsure say Y.
+
+choice
+ prompt "Choose SLAB allocator"
+ default SLUB
+ help
+ This option allows to select a slab allocator.
+
+config SLAB
+ bool "SLAB"
+ select HAVE_HARDENED_USERCOPY_ALLOCATOR
+ help
+ The regular slab allocator that is established and known to work
+ well in all environments. It organizes cache hot objects in
+ per cpu and per node queues.
+
+config SLUB
+ bool "SLUB (Unqueued Allocator)"
+ select HAVE_HARDENED_USERCOPY_ALLOCATOR
+ help
+ SLUB is a slab allocator that minimizes cache line usage
+ instead of managing queues of cached objects (SLAB approach).
+ Per cpu caching is realized using slabs of objects instead
+ of queues of objects. SLUB can use memory efficiently
+ and has enhanced diagnostics. SLUB is the default choice for
+ a slab allocator.
+
+config SLOB
+ depends on EXPERT
+ bool "SLOB (Simple Allocator)"
+ help
+ SLOB replaces the stock allocator with a drastically simpler
+ allocator. SLOB is generally more space efficient but
+ does not perform as well on large systems.
+
+endchoice
+
+config SLAB_MERGE_DEFAULT
+ bool "Allow slab caches to be merged"
+ default y
+ help
+ For reduced kernel memory fragmentation, slab caches can be
+ merged when they share the same size and other characteristics.
+ This carries a risk of kernel heap overflows being able to
+ overwrite objects from merged caches (and more easily control
+ cache layout), which makes such heap attacks easier to exploit
+ by attackers. By keeping caches unmerged, these kinds of exploits
+ can usually only damage objects in the same cache. To disable
+ merging at runtime, "slab_nomerge" can be passed on the kernel
+ command line.
+
+config SLAB_FREELIST_RANDOM
+ bool "Randomize slab freelist"
+ depends on SLAB || SLUB
+ help
+ Randomizes the freelist order used on creating new pages. This
+ security feature reduces the predictability of the kernel slab
+ allocator against heap overflows.
+
+config SLAB_FREELIST_HARDENED
+ bool "Harden slab freelist metadata"
+ depends on SLAB || SLUB
+ help
+ Many kernel heap attacks try to target slab cache metadata and
+ other infrastructure. This options makes minor performance
+ sacrifices to harden the kernel slab allocator against common
+ freelist exploit methods. Some slab implementations have more
+ sanity-checking than others. This option is most effective with
+ CONFIG_SLUB.
+
+config SHUFFLE_PAGE_ALLOCATOR
+ bool "Page allocator randomization"
+ default SLAB_FREELIST_RANDOM && ACPI_NUMA
+ help
+ Randomization of the page allocator improves the average
+ utilization of a direct-mapped memory-side-cache. See section
+ 5.2.27 Heterogeneous Memory Attribute Table (HMAT) in the ACPI
+ 6.2a specification for an example of how a platform advertises
+ the presence of a memory-side-cache. There are also incidental
+ security benefits as it reduces the predictability of page
+ allocations to compliment SLAB_FREELIST_RANDOM, but the
+ default granularity of shuffling on the "MAX_ORDER - 1" i.e,
+ 10th order of pages is selected based on cache utilization
+ benefits on x86.
+
+ While the randomization improves cache utilization it may
+ negatively impact workloads on platforms without a cache. For
+ this reason, by default, the randomization is enabled only
+ after runtime detection of a direct-mapped memory-side-cache.
+ Otherwise, the randomization may be force enabled with the
+ 'page_alloc.shuffle' kernel command line parameter.
+
+ Say Y if unsure.
+
+config SLUB_CPU_PARTIAL
+ default y
+ depends on SLUB && SMP
+ bool "SLUB per cpu partial cache"
+ help
+ Per cpu partial caches accelerate objects allocation and freeing
+ that is local to a processor at the price of more indeterminism
+ in the latency of the free. On overflow these caches will be cleared
+ which requires the taking of locks that may cause latency spikes.
+ Typically one would choose no for a realtime system.
+
config SELECT_MEMORY_MODEL
def_bool y
depends on ARCH_SELECT_MEMORY_MODEL
--
2.32.0


2021-08-19 19:55:36

by Johannes Weiner

Subject: [PATCH 3/4] mm: Kconfig: simplify zswap configuration

Clean up option ordering; make prompts and help text more concise and
actionable for non-developers; turn depends into selects where
possible, so that users can simply select the functionality they want
without having to chase down obscure code dependencies.
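
A minimal sketch of the difference (illustrative, condensed from the
diff below): with

	config ZSWAP
		bool "Compressed cache for swap pages (EXPERIMENTAL)"
		depends on FRONTSWAP && CRYPTO=y

ZSWAP stays invisible until the user has tracked down and enabled
FRONTSWAP and CRYPTO elsewhere in the config; with

	config ZSWAP
		bool "Compressed cache for swap pages (EXPERIMENTAL)"
		select FRONTSWAP
		select CRYPTO

enabling ZSWAP pulls those symbols in automatically.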

Signed-off-by: Johannes Weiner <[email protected]>
---
drivers/block/zram/Kconfig | 3 ++-
mm/Kconfig | 53 ++++++++++++++++++--------------------
2 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 668c6bf2554d..e4163d4b936b 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -1,8 +1,9 @@
# SPDX-License-Identifier: GPL-2.0
config ZRAM
tristate "Compressed RAM block device support"
- depends on BLOCK && SYSFS && ZSMALLOC && CRYPTO
+ depends on BLOCK && SYSFS
depends on CRYPTO_LZO || CRYPTO_ZSTD || CRYPTO_LZ4 || CRYPTO_LZ4HC || CRYPTO_842
+ select ZSMALLOC
help
Creates virtual block devices called /dev/zramX (X = 0, 1, ...).
Pages written to these disks are compressed and stored in memory
diff --git a/mm/Kconfig b/mm/Kconfig
index dbceaa2a04a4..62c6e6092a0a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -21,9 +21,13 @@ menuconfig SWAP

if SWAP

+config ZPOOL
+ bool
+
config ZSWAP
bool "Compressed cache for swap pages (EXPERIMENTAL)"
- depends on FRONTSWAP && CRYPTO=y
+ select FRONTSWAP
+ select CRYPTO
select ZPOOL
help
A lightweight compressed cache for swap pages. It takes
@@ -39,8 +43,18 @@ config ZSWAP
they have not be fully explored on the large set of potential
configurations and workloads that exist.

+config ZSWAP_DEFAULT_ON
+ bool "Enable the compressed cache for swap pages by default"
+ depends on ZSWAP
+ help
+ If selected, the compressed cache for swap pages will be enabled
+ at boot, otherwise it will be disabled.
+
+ The selection made here can be overridden by using the kernel
+ command line 'zswap.enabled=' option.
+
choice
- prompt "Compressed cache for swap pages default compressor"
+ prompt "Default compressor"
depends on ZSWAP
default ZSWAP_COMPRESSOR_DEFAULT_LZO
help
@@ -106,7 +120,7 @@ config ZSWAP_COMPRESSOR_DEFAULT
default ""

choice
- prompt "Compressed cache for swap pages default allocator"
+ prompt "Default allocator"
depends on ZSWAP
default ZSWAP_ZPOOL_DEFAULT_ZBUD
help
@@ -146,24 +160,9 @@ config ZSWAP_ZPOOL_DEFAULT
default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
default ""

-config ZSWAP_DEFAULT_ON
- bool "Enable the compressed cache for swap pages by default"
- depends on ZSWAP
- help
- If selected, the compressed cache for swap pages will be enabled
- at boot, otherwise it will be disabled.
-
- The selection made here can be overridden by using the kernel
- command line 'zswap.enabled=' option.
-
-config ZPOOL
- tristate "Common API for compressed memory storage"
- help
- Compressed memory storage API. This allows using either zbud or
- zsmalloc.
-
config ZBUD
- tristate "Low (Up to 2x) density storage for compressed pages"
+ tristate "2:1 compression allocator (zbud)"
+ depends on ZSWAP
help
A special purpose allocator for storing compressed pages.
It is designed to store up to two compressed pages per physical
@@ -172,8 +171,8 @@ config ZBUD
density approach when reclaim will be used.

config Z3FOLD
- tristate "Up to 3x density storage for compressed pages"
- depends on ZPOOL
+ tristate "3:1 compression allocator (z3fold)"
+ depends on ZSWAP
help
A special purpose allocator for storing compressed pages.
It is designed to store up to three compressed pages per physical
@@ -181,15 +180,13 @@ config Z3FOLD
still there.

config ZSMALLOC
- tristate "Memory allocator for compressed pages"
+ tristate
+ prompt "N:1 compression allocator (zsmalloc)" if ZSWAP
depends on MMU
help
zsmalloc is a slab-based memory allocator designed to store
- compressed RAM pages. zsmalloc uses virtual memory mapping
- in order to reduce fragmentation. However, this results in a
- non-standard allocator interface where a handle, not a pointer, is
- returned by an alloc(). This handle must be mapped in order to
- access the allocated space.
+ pages of various compression levels efficiently. It achieves
+ the highest storage density with the least amount of fragmentation.

config ZSMALLOC_STAT
bool "Export zsmalloc statistics"
--
2.32.0

2021-08-19 19:55:47

by Johannes Weiner

Subject: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

Currently it requires poking at debugfs to figure out the size of
the zswap cache on a host. There are no counters for reads and writes
against the cache. This makes it difficult to understand behavior on
production systems.

Print zswap memory consumption in /proc/meminfo, count zswapouts and
zswapins in /proc/vmstat.
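
With the patch applied and zswap active, the new fields would look
roughly like this (values illustrative):

	$ grep Zswap /proc/meminfo
	Zswap:              4096 kB
	$ grep zswp /proc/vmstat
	zswpin 1294
	zswpout 5438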

Signed-off-by: Johannes Weiner <[email protected]>
---
fs/proc/meminfo.c | 4 ++++
include/linux/swap.h | 4 ++++
include/linux/vm_event_item.h | 4 ++++
mm/vmstat.c | 4 ++++
mm/zswap.c | 11 +++++------
5 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..2dc474940691 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -86,6 +86,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)

show_val_kb(m, "SwapTotal: ", i.totalswap);
show_val_kb(m, "SwapFree: ", i.freeswap);
+#ifdef CONFIG_ZSWAP
+ seq_printf(m, "Zswap: %8lu kB\n",
+ (unsigned long)(zswap_pool_total_size >> 10));
+#endif
show_val_kb(m, "Dirty: ",
global_node_page_state(NR_FILE_DIRTY));
show_val_kb(m, "Writeback: ",
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 144727041e78..3b23c88b6a8d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -696,6 +696,10 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
}
#endif

+#ifdef CONFIG_ZSWAP
+extern u64 zswap_pool_total_size;
+#endif
+
#if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
extern void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask);
#else
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index ae0dd1948c2b..9dbebea09c69 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -125,6 +125,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
SWAP_RA,
SWAP_RA_HIT,
#endif
+#ifdef CONFIG_ZSWAP
+ ZSWPIN,
+ ZSWPOUT,
+#endif
#ifdef CONFIG_X86
DIRECT_MAP_LEVEL2_SPLIT,
DIRECT_MAP_LEVEL3_SPLIT,
diff --git a/mm/vmstat.c b/mm/vmstat.c
index cccee36b289c..31aada15c571 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1369,6 +1369,10 @@ const char * const vmstat_text[] = {
"swap_ra",
"swap_ra_hit",
#endif
+#ifdef CONFIG_ZSWAP
+ "zswpin",
+ "zswpout",
+#endif
#ifdef CONFIG_X86
"direct_map_level2_splits",
"direct_map_level3_splits",
diff --git a/mm/zswap.c b/mm/zswap.c
index 20763267a219..f93a7c715f76 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -40,7 +40,7 @@
* statistics
**********************************/
/* Total bytes used by the compressed storage */
-static u64 zswap_pool_total_size;
+u64 zswap_pool_total_size;
/* The number of compressed pages currently stored in zswap */
static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
/* The number of same-value filled pages currently stored in zswap */
@@ -1231,6 +1231,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
/* update stats */
atomic_inc(&zswap_stored_pages);
zswap_update_total_size();
+ count_vm_event(ZSWAPOUT);

return 0;

@@ -1273,11 +1274,10 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
zswap_fill_page(dst, entry->value);
kunmap_atomic(dst);
ret = 0;
- goto freeentry;
+ goto stats;
}

if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
-
tmp = kmalloc(entry->length, GFP_ATOMIC);
if (!tmp) {
ret = -ENOMEM;
@@ -1292,10 +1292,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
src += sizeof(struct zswap_header);

if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
-
memcpy(tmp, src, entry->length);
src = tmp;
-
zpool_unmap_handle(entry->pool->zpool, entry->handle);
}

@@ -1314,7 +1312,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
kfree(tmp);

BUG_ON(ret);
-
+stats:
+ count_vm_event(ZSWAPIN);
freeentry:
spin_lock(&tree->lock);
zswap_entry_put(tree, entry);
--
2.32.0

2021-08-19 19:56:06

by Johannes Weiner

Subject: [PATCH 2/4] mm: Kconfig: group swap, slab, hotplug and thp options into submenus

There are several clusters of related config options spread throughout
the mostly flat MM submenu. Group them together and put specialization
options into further subdirectories to make the MM section a bit more
organized and easier to navigate.
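
The Kconfig pattern used for this is a menuconfig symbol gating an
if/endif block; as a minimal sketch (illustrative, condensed from the
diff below):

	menuconfig SWAP
		bool "Support for paging of anonymous memory (swap)"

	if SWAP

	config ZSWAP
		bool "Compressed cache for swap pages (EXPERIMENTAL)"

	endif # SWAP

Options inside the if/endif block are only visible when the
menuconfig symbol is enabled, and the config tools present them as a
collapsible submenu.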

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/Kconfig | 428 +++++++++++++++++++++++++++--------------------------
1 file changed, 222 insertions(+), 206 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 894858536e7f..dbceaa2a04a4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -9,7 +9,7 @@ menu "Memory Management options"
config ARCH_NO_SWAP
bool

-config SWAP
+menuconfig SWAP
bool "Support for paging of anonymous memory (swap)"
depends on MMU && BLOCK && !ARCH_NO_SWAP
default y
@@ -19,6 +19,192 @@ config SWAP
used to provide more virtual memory than the actual RAM present
in your computer. If unsure say Y.

+if SWAP
+
+config ZSWAP
+ bool "Compressed cache for swap pages (EXPERIMENTAL)"
+ depends on FRONTSWAP && CRYPTO=y
+ select ZPOOL
+ help
+ A lightweight compressed cache for swap pages. It takes
+ pages that are in the process of being swapped out and attempts to
+ compress them into a dynamically allocated RAM-based memory pool.
+ This can result in a significant I/O reduction on swap device and,
+ in the case where decompressing from RAM is faster that swap device
+ reads, can also improve workload performance.
+
+ This is marked experimental because it is a new feature (as of
+ v3.11) that interacts heavily with memory reclaim. While these
+ interactions don't cause any known issues on simple memory setups,
+ they have not be fully explored on the large set of potential
+ configurations and workloads that exist.
+
+choice
+ prompt "Compressed cache for swap pages default compressor"
+ depends on ZSWAP
+ default ZSWAP_COMPRESSOR_DEFAULT_LZO
+ help
+ Selects the default compression algorithm for the compressed cache
+ for swap pages.
+
+ For an overview what kind of performance can be expected from
+ a particular compression algorithm please refer to the benchmarks
+ available at the following LWN page:
+ https://lwn.net/Articles/751795/
+
+ If in doubt, select 'LZO'.
+
+ The selection made here can be overridden by using the kernel
+ command line 'zswap.compressor=' option.
+
+config ZSWAP_COMPRESSOR_DEFAULT_DEFLATE
+ bool "Deflate"
+ select CRYPTO_DEFLATE
+ help
+ Use the Deflate algorithm as the default compression algorithm.
+
+config ZSWAP_COMPRESSOR_DEFAULT_LZO
+ bool "LZO"
+ select CRYPTO_LZO
+ help
+ Use the LZO algorithm as the default compression algorithm.
+
+config ZSWAP_COMPRESSOR_DEFAULT_842
+ bool "842"
+ select CRYPTO_842
+ help
+ Use the 842 algorithm as the default compression algorithm.
+
+config ZSWAP_COMPRESSOR_DEFAULT_LZ4
+ bool "LZ4"
+ select CRYPTO_LZ4
+ help
+ Use the LZ4 algorithm as the default compression algorithm.
+
+config ZSWAP_COMPRESSOR_DEFAULT_LZ4HC
+ bool "LZ4HC"
+ select CRYPTO_LZ4HC
+ help
+ Use the LZ4HC algorithm as the default compression algorithm.
+
+config ZSWAP_COMPRESSOR_DEFAULT_ZSTD
+ bool "zstd"
+ select CRYPTO_ZSTD
+ help
+ Use the zstd algorithm as the default compression algorithm.
+endchoice
+
+config ZSWAP_COMPRESSOR_DEFAULT
+ string
+ depends on ZSWAP
+ default "deflate" if ZSWAP_COMPRESSOR_DEFAULT_DEFLATE
+ default "lzo" if ZSWAP_COMPRESSOR_DEFAULT_LZO
+ default "842" if ZSWAP_COMPRESSOR_DEFAULT_842
+ default "lz4" if ZSWAP_COMPRESSOR_DEFAULT_LZ4
+ default "lz4hc" if ZSWAP_COMPRESSOR_DEFAULT_LZ4HC
+ default "zstd" if ZSWAP_COMPRESSOR_DEFAULT_ZSTD
+ default ""
+
+choice
+ prompt "Compressed cache for swap pages default allocator"
+ depends on ZSWAP
+ default ZSWAP_ZPOOL_DEFAULT_ZBUD
+ help
+ Selects the default allocator for the compressed cache for
+ swap pages.
+ The default is 'zbud' for compatibility, however please do
+ read the description of each of the allocators below before
+ making a right choice.
+
+ The selection made here can be overridden by using the kernel
+ command line 'zswap.zpool=' option.
+
+config ZSWAP_ZPOOL_DEFAULT_ZBUD
+ bool "zbud"
+ select ZBUD
+ help
+ Use the zbud allocator as the default allocator.
+
+config ZSWAP_ZPOOL_DEFAULT_Z3FOLD
+ bool "z3fold"
+ select Z3FOLD
+ help
+ Use the z3fold allocator as the default allocator.
+
+config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
+ bool "zsmalloc"
+ select ZSMALLOC
+ help
+ Use the zsmalloc allocator as the default allocator.
+endchoice
+
+config ZSWAP_ZPOOL_DEFAULT
+ string
+ depends on ZSWAP
+ default "zbud" if ZSWAP_ZPOOL_DEFAULT_ZBUD
+ default "z3fold" if ZSWAP_ZPOOL_DEFAULT_Z3FOLD
+ default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
+ default ""
+
+config ZSWAP_DEFAULT_ON
+ bool "Enable the compressed cache for swap pages by default"
+ depends on ZSWAP
+ help
+ If selected, the compressed cache for swap pages will be enabled
+ at boot, otherwise it will be disabled.
+
+ The selection made here can be overridden by using the kernel
+ command line 'zswap.enabled=' option.
+
+config ZPOOL
+ tristate "Common API for compressed memory storage"
+ help
+ Compressed memory storage API. This allows using either zbud or
+ zsmalloc.
+
+config ZBUD
+ tristate "Low (Up to 2x) density storage for compressed pages"
+ help
+ A special purpose allocator for storing compressed pages.
+ It is designed to store up to two compressed pages per physical
+ page. While this design limits storage density, it has simple and
+ deterministic reclaim properties that make it preferable to a higher
+ density approach when reclaim will be used.
+
+config Z3FOLD
+ tristate "Up to 3x density storage for compressed pages"
+ depends on ZPOOL
+ help
+ A special purpose allocator for storing compressed pages.
+ It is designed to store up to three compressed pages per physical
+ page. It is a ZBUD derivative so the simplicity and determinism are
+ still there.
+
+config ZSMALLOC
+ tristate "Memory allocator for compressed pages"
+ depends on MMU
+ help
+ zsmalloc is a slab-based memory allocator designed to store
+ compressed RAM pages. zsmalloc uses virtual memory mapping
+ in order to reduce fragmentation. However, this results in a
+ non-standard allocator interface where a handle, not a pointer, is
+ returned by an alloc(). This handle must be mapped in order to
+ access the allocated space.
+
+config ZSMALLOC_STAT
+ bool "Export zsmalloc statistics"
+ depends on ZSMALLOC
+ select DEBUG_FS
+ help
+ This option enables code in the zsmalloc to collect various
+ statistics about what's happening in zsmalloc and exports that
+ information to userspace via debugfs.
+ If unsure, say N.
+
+endif # SWAP
+
+menu "SLAB allocator options"
+
choice
prompt "Choose SLAB allocator"
default SLUB
@@ -87,6 +273,19 @@ config SLAB_FREELIST_HARDENED
sanity-checking than others. This option is most effective with
CONFIG_SLUB.

+config SLUB_CPU_PARTIAL
+ default y
+ depends on SLUB && SMP
+ bool "SLUB per cpu partial cache"
+ help
+ Per cpu partial caches accelerate objects allocation and freeing
+ that is local to a processor at the price of more indeterminism
+ in the latency of the free. On overflow these caches will be cleared
+ which requires the taking of locks that may cause latency spikes.
+ Typically one would choose no for a realtime system.
+
+endmenu # SLAB allocator options
+
config SHUFFLE_PAGE_ALLOCATOR
bool "Page allocator randomization"
default SLAB_FREELIST_RANDOM && ACPI_NUMA
@@ -111,17 +310,6 @@ config SHUFFLE_PAGE_ALLOCATOR

Say Y if unsure.

-config SLUB_CPU_PARTIAL
- default y
- depends on SLUB && SMP
- bool "SLUB per cpu partial cache"
- help
- Per cpu partial caches accelerate objects allocation and freeing
- that is local to a processor at the price of more indeterminism
- in the latency of the free. On overflow these caches will be cleared
- which requires the taking of locks that may cause latency spikes.
- Typically one would choose no for a realtime system.
-
config SELECT_MEMORY_MODEL
def_bool y
depends on ARCH_SELECT_MEMORY_MODEL
@@ -272,14 +460,16 @@ config ARCH_ENABLE_MEMORY_HOTPLUG
bool

# eventually, we can have this option just 'select SPARSEMEM'
-config MEMORY_HOTPLUG
- bool "Allow for memory hot-add"
+menuconfig MEMORY_HOTPLUG
+ bool "Memory hotplug"
select MEMORY_ISOLATION
depends on SPARSEMEM || X86_64_ACPI_NUMA
depends on ARCH_ENABLE_MEMORY_HOTPLUG
depends on 64BIT || BROKEN
select NUMA_KEEP_MEMINFO if NUMA

+if MEMORY_HOTPLUG
+
config MEMORY_HOTPLUG_SPARSE
def_bool y
depends on SPARSEMEM && MEMORY_HOTPLUG
@@ -313,6 +503,8 @@ config MHP_MEMMAP_ON_MEMORY
depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP
depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE

+endif # MEMORY_HOTPLUG
+
# Heavily threaded applications may benefit from splitting the mm-wide
# page_table_lock, so that faults on different parts of the user address
# space can be handled with less contention: split it at this NR_CPUS.
@@ -521,7 +713,7 @@ config NOMMU_INITIAL_TRIM_EXCESS

See Documentation/admin-guide/mm/nommu-mmap.rst for more information.

-config TRANSPARENT_HUGEPAGE
+menuconfig TRANSPARENT_HUGEPAGE
bool "Transparent Hugepage Support"
depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE
select COMPACTION
@@ -536,6 +728,8 @@ config TRANSPARENT_HUGEPAGE

If memory constrained on embedded, you may want to say N.

+if TRANSPARENT_HUGEPAGE
+
choice
prompt "Transparent Hugepage Support sysfs defaults"
depends on TRANSPARENT_HUGEPAGE
@@ -573,6 +767,19 @@ config THP_SWAP

For selection by architectures with reasonable THP sizes.

+config READ_ONLY_THP_FOR_FS
+ bool "Read-only THP for filesystems (EXPERIMENTAL)"
+ depends on TRANSPARENT_HUGEPAGE && SHMEM
+
+ help
+ Allow khugepaged to put read-only file-backed pages in THP.
+
+ This is marked experimental because it is a new feature. Write
+ support of file THPs will be developed in the next few release
+ cycles.
+
+endif # TRANSPARENT_HUGEPAGE
+
#
# UP and nommu archs use km based percpu allocator
#
@@ -680,186 +887,6 @@ config MEM_SOFT_DIRTY

See Documentation/admin-guide/mm/soft-dirty.rst for more details.

-config ZSWAP
- bool "Compressed cache for swap pages (EXPERIMENTAL)"
- depends on FRONTSWAP && CRYPTO=y
- select ZPOOL
- help
- A lightweight compressed cache for swap pages. It takes
- pages that are in the process of being swapped out and attempts to
- compress them into a dynamically allocated RAM-based memory pool.
- This can result in a significant I/O reduction on swap device and,
- in the case where decompressing from RAM is faster that swap device
- reads, can also improve workload performance.
-
- This is marked experimental because it is a new feature (as of
- v3.11) that interacts heavily with memory reclaim. While these
- interactions don't cause any known issues on simple memory setups,
- they have not be fully explored on the large set of potential
- configurations and workloads that exist.
-
-choice
- prompt "Compressed cache for swap pages default compressor"
- depends on ZSWAP
- default ZSWAP_COMPRESSOR_DEFAULT_LZO
- help
- Selects the default compression algorithm for the compressed cache
- for swap pages.
-
- For an overview what kind of performance can be expected from
- a particular compression algorithm please refer to the benchmarks
- available at the following LWN page:
- https://lwn.net/Articles/751795/
-
- If in doubt, select 'LZO'.
-
- The selection made here can be overridden by using the kernel
- command line 'zswap.compressor=' option.
-
-config ZSWAP_COMPRESSOR_DEFAULT_DEFLATE
- bool "Deflate"
- select CRYPTO_DEFLATE
- help
- Use the Deflate algorithm as the default compression algorithm.
-
-config ZSWAP_COMPRESSOR_DEFAULT_LZO
- bool "LZO"
- select CRYPTO_LZO
- help
- Use the LZO algorithm as the default compression algorithm.
-
-config ZSWAP_COMPRESSOR_DEFAULT_842
- bool "842"
- select CRYPTO_842
- help
- Use the 842 algorithm as the default compression algorithm.
-
-config ZSWAP_COMPRESSOR_DEFAULT_LZ4
- bool "LZ4"
- select CRYPTO_LZ4
- help
- Use the LZ4 algorithm as the default compression algorithm.
-
-config ZSWAP_COMPRESSOR_DEFAULT_LZ4HC
- bool "LZ4HC"
- select CRYPTO_LZ4HC
- help
- Use the LZ4HC algorithm as the default compression algorithm.
-
-config ZSWAP_COMPRESSOR_DEFAULT_ZSTD
- bool "zstd"
- select CRYPTO_ZSTD
- help
- Use the zstd algorithm as the default compression algorithm.
-endchoice
-
-config ZSWAP_COMPRESSOR_DEFAULT
- string
- depends on ZSWAP
- default "deflate" if ZSWAP_COMPRESSOR_DEFAULT_DEFLATE
- default "lzo" if ZSWAP_COMPRESSOR_DEFAULT_LZO
- default "842" if ZSWAP_COMPRESSOR_DEFAULT_842
- default "lz4" if ZSWAP_COMPRESSOR_DEFAULT_LZ4
- default "lz4hc" if ZSWAP_COMPRESSOR_DEFAULT_LZ4HC
- default "zstd" if ZSWAP_COMPRESSOR_DEFAULT_ZSTD
- default ""
-
-choice
- prompt "Compressed cache for swap pages default allocator"
- depends on ZSWAP
- default ZSWAP_ZPOOL_DEFAULT_ZBUD
- help
- Selects the default allocator for the compressed cache for
- swap pages.
- The default is 'zbud' for compatibility, however please do
- read the description of each of the allocators below before
- making a right choice.
-
- The selection made here can be overridden by using the kernel
- command line 'zswap.zpool=' option.
-
-config ZSWAP_ZPOOL_DEFAULT_ZBUD
- bool "zbud"
- select ZBUD
- help
- Use the zbud allocator as the default allocator.
-
-config ZSWAP_ZPOOL_DEFAULT_Z3FOLD
- bool "z3fold"
- select Z3FOLD
- help
- Use the z3fold allocator as the default allocator.
-
-config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
- bool "zsmalloc"
- select ZSMALLOC
- help
- Use the zsmalloc allocator as the default allocator.
-endchoice
-
-config ZSWAP_ZPOOL_DEFAULT
- string
- depends on ZSWAP
- default "zbud" if ZSWAP_ZPOOL_DEFAULT_ZBUD
- default "z3fold" if ZSWAP_ZPOOL_DEFAULT_Z3FOLD
- default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
- default ""
-
-config ZSWAP_DEFAULT_ON
- bool "Enable the compressed cache for swap pages by default"
- depends on ZSWAP
- help
- If selected, the compressed cache for swap pages will be enabled
- at boot, otherwise it will be disabled.
-
- The selection made here can be overridden by using the kernel
- command line 'zswap.enabled=' option.
-
-config ZPOOL
- tristate "Common API for compressed memory storage"
- help
- Compressed memory storage API. This allows using either zbud or
- zsmalloc.
-
-config ZBUD
- tristate "Low (Up to 2x) density storage for compressed pages"
- help
- A special purpose allocator for storing compressed pages.
- It is designed to store up to two compressed pages per physical
- page. While this design limits storage density, it has simple and
- deterministic reclaim properties that make it preferable to a higher
- density approach when reclaim will be used.
-
-config Z3FOLD
- tristate "Up to 3x density storage for compressed pages"
- depends on ZPOOL
- help
- A special purpose allocator for storing compressed pages.
- It is designed to store up to three compressed pages per physical
- page. It is a ZBUD derivative so the simplicity and determinism are
- still there.
-
-config ZSMALLOC
- tristate "Memory allocator for compressed pages"
- depends on MMU
- help
- zsmalloc is a slab-based memory allocator designed to store
- compressed RAM pages. zsmalloc uses virtual memory mapping
- in order to reduce fragmentation. However, this results in a
- non-standard allocator interface where a handle, not a pointer, is
- returned by an alloc(). This handle must be mapped in order to
- access the allocated space.
-
-config ZSMALLOC_STAT
- bool "Export zsmalloc statistics"
- depends on ZSMALLOC
- select DEBUG_FS
- help
- This option enables code in the zsmalloc to collect various
- statistics about what's happening in zsmalloc and exports that
- information to userspace via debugfs.
- If unsure, say N.
-
config GENERIC_EARLY_IOREMAP
bool

@@ -988,17 +1015,6 @@ comment "GUP_TEST needs to have DEBUG_FS enabled"
config GUP_GET_PTE_LOW_HIGH
bool

-config READ_ONLY_THP_FOR_FS
- bool "Read-only THP for filesystems (EXPERIMENTAL)"
- depends on TRANSPARENT_HUGEPAGE && SHMEM
-
- help
- Allow khugepaged to put read-only file-backed pages in THP.
-
- This is marked experimental because it is a new feature. Write
- support of file THPs will be developed in the next few release
- cycles.
-
config ARCH_HAS_PTE_SPECIAL
bool

--
2.32.0

2021-08-24 11:49:06

by Vlastimil Babka

Subject: Re: [PATCH 1/4] mm: Kconfig: move swap and slab config options to the MM section

On 8/19/21 21:55, Johannes Weiner wrote:
> These are currently under General Setup. MM seems like a better fit.

Right. I've also been wondering about that occasionally.

> Signed-off-by: Johannes Weiner <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>


2021-08-24 12:05:51

by Vlastimil Babka

Subject: Re: [PATCH 2/4] mm: Kconfig: group swap, slab, hotplug and thp options into submenus

On 8/19/21 21:55, Johannes Weiner wrote:
> There are several clusters of related config options spread throughout
> the mostly flat MM submenu. Group them together and put specialization
> options into further subdirectories to make the MM section a bit more
> organized and easier to navigate.
>
> Signed-off-by: Johannes Weiner <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

Note:

> -config ZBUD
> - tristate "Low (Up to 2x) density storage for compressed pages"
> - help
> - A special purpose allocator for storing compressed pages.
> - It is designed to store up to two compressed pages per physical
> - page. While this design limits storage density, it has simple and
> - deterministic reclaim properties that make it preferable to a higher
> - density approach when reclaim will be used.
> -

The whole large hunk with the deletion part of the block move will be rejected
in current trees, because this is apparently based on an older commit than
2a03085ce887 ("mm/zbud: don't export any zbud API"), which adds a "depends on
ZPOOL" to the above. The line is thus also missing from the add-hunk part of
the move, and if one is not careful when resolving the reject, the dependency
will end up missing from the result.
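
For reference, a correctly resolved result in a current tree would
presumably keep that line, i.e. roughly:

	config ZBUD
		tristate "Low (Up to 2x) density storage for compressed pages"
		depends on ZPOOL
		help
		  A special purpose allocator for storing compressed pages.
		  ...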

2021-08-24 12:07:25

by Vlastimil Babka

Subject: Re: [PATCH 3/4] mm: Kconfig: simplify zswap configuration

+CC zswap maintainers

On 8/19/21 21:55, Johannes Weiner wrote:
> Clean up option ordering; make prompts and help text more concise and
> actionable for non-developers; turn depends into selects where
> possible, so that users can simply select the functionality they want
> without having to chase down obscure code dependencies.
>
> Signed-off-by: Johannes Weiner <[email protected]>

2021-08-24 12:08:36

by Vlastimil Babka

Subject: Re: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

+CC zswap maintainers

On 8/19/21 21:55, Johannes Weiner wrote:
> Currently it requires poking at debugfs to figure out the size of
> the zswap cache on a host. There are no counters for reads and writes
> against the cache. This makes it difficult to understand behavior on
> production systems.
>
> Print zswap memory consumption in /proc/meminfo, count zswapouts and
> zswapins in /proc/vmstat.
>
> Signed-off-by: Johannes Weiner <[email protected]>

2021-08-24 14:56:20

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm: Kconfig: group swap, slab, hotplug and thp options into submenus

On Tue, Aug 24, 2021 at 02:03:43PM +0200, Vlastimil Babka wrote:
> On 8/19/21 21:55, Johannes Weiner wrote:
> > There are several clusters of related config options spread throughout
> > the mostly flat MM submenu. Group them together and put specialization
> > options into further subdirectories to make the MM section a bit more
> > organized and easier to navigate.
> >
> > Signed-off-by: Johannes Weiner <[email protected]>
>
> Acked-by: Vlastimil Babka <[email protected]>

Thanks, Vlastimil!

> Note:
>
> > -config ZBUD
> > - tristate "Low (Up to 2x) density storage for compressed pages"
> > - help
> > - A special purpose allocator for storing compressed pages.
> > - It is designed to store up to two compressed pages per physical
> > - page. While this design limits storage density, it has simple and
> > - deterministic reclaim properties that make it preferable to a higher
> > - density approach when reclaim will be used.
> > -
>
> The whole large hunk with the deletion part of the block move will be rejected
> in current trees, because this is apparently based on an older commit than
> 2a03085ce887 ("mm/zbud: don't export any zbud API"), which adds a "depends on
> ZPOOL" to the above. It's thus also missing from the add-hunk part of the move,
> and if one isn't careful when resolving the reject, the dependency will be
> missing from the result.

Thanks for the heads-up. Yeah, I forgot to rebase before sending from an
older branch; I'll be sure to do that (paying attention to the zpool
depends) before sending the next version.
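
For reference, a sketch of what the moved ZBUD entry would need to look
like once rebased past 2a03085ce887, per Vlastimil's note above — the
"depends on ZPOOL" line is the addition to watch for (help text copied
from the quoted hunk):

config ZBUD
	tristate "Low (Up to 2x) density storage for compressed pages"
	depends on ZPOOL
	help
	  A special purpose allocator for storing compressed pages.
	  It is designed to store up to two compressed pages per physical
	  page. While this design limits storage density, it has simple and
	  deterministic reclaim properties that make it preferable to a higher
	  density approach when reclaim will be used.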

2021-08-30 18:06:35

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

Hi Johannes,

I love your patch! Yet something to improve:

[auto build test ERROR on linux/master]
[cannot apply to hnaz-linux-mm/master block/for-next linus/master v5.14 next-20210830]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-Kconfig-move-swap-and-slab-config-options-to-the-MM-section/20210820-035613
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 349a2d52ffe59b7a0c5876fa7ee9f3eaf188b830
config: x86_64-rhel-8.3 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/216a0ba919927dccd2dd26d7af1e395f4360002f
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Johannes-Weiner/mm-Kconfig-move-swap-and-slab-config-options-to-the-MM-section/20210820-035613
git checkout 216a0ba919927dccd2dd26d7af1e395f4360002f
# save the attached .config to linux build tree
mkdir build_dir
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

mm/zswap.c: In function 'zswap_frontswap_store':
>> mm/zswap.c:1234:17: error: 'ZSWAPOUT' undeclared (first use in this function); did you mean 'ZSWPOUT'?
1234 | count_vm_event(ZSWAPOUT);
| ^~~~~~~~
| ZSWPOUT
mm/zswap.c:1234:17: note: each undeclared identifier is reported only once for each function it appears in
mm/zswap.c: In function 'zswap_frontswap_load':
>> mm/zswap.c:1316:17: error: 'ZSWAPIN' undeclared (first use in this function); did you mean 'ZSWPIN'?
1316 | count_vm_event(ZSWAPIN);
| ^~~~~~~
| ZSWPIN


vim +1234 mm/zswap.c

1080
1081 /*********************************
1082 * frontswap hooks
1083 **********************************/
1084 /* attempts to compress and store a single page */
1085 static int zswap_frontswap_store(unsigned type, pgoff_t offset,
1086 struct page *page)
1087 {
1088 struct zswap_tree *tree = zswap_trees[type];
1089 struct zswap_entry *entry, *dupentry;
1090 struct scatterlist input, output;
1091 struct crypto_acomp_ctx *acomp_ctx;
1092 int ret;
1093 unsigned int hlen, dlen = PAGE_SIZE;
1094 unsigned long handle, value;
1095 char *buf;
1096 u8 *src, *dst;
1097 struct zswap_header zhdr = { .swpentry = swp_entry(type, offset) };
1098 gfp_t gfp;
1099
1100 /* THP isn't supported */
1101 if (PageTransHuge(page)) {
1102 ret = -EINVAL;
1103 goto reject;
1104 }
1105
1106 if (!zswap_enabled || !tree) {
1107 ret = -ENODEV;
1108 goto reject;
1109 }
1110
1111 /* reclaim space if needed */
1112 if (zswap_is_full()) {
1113 struct zswap_pool *pool;
1114
1115 zswap_pool_limit_hit++;
1116 zswap_pool_reached_full = true;
1117 pool = zswap_pool_last_get();
1118 if (pool)
1119 queue_work(shrink_wq, &pool->shrink_work);
1120 ret = -ENOMEM;
1121 goto reject;
1122 }
1123
1124 if (zswap_pool_reached_full) {
1125 if (!zswap_can_accept()) {
1126 ret = -ENOMEM;
1127 goto reject;
1128 } else
1129 zswap_pool_reached_full = false;
1130 }
1131
1132 /* allocate entry */
1133 entry = zswap_entry_cache_alloc(GFP_KERNEL);
1134 if (!entry) {
1135 zswap_reject_kmemcache_fail++;
1136 ret = -ENOMEM;
1137 goto reject;
1138 }
1139
1140 if (zswap_same_filled_pages_enabled) {
1141 src = kmap_atomic(page);
1142 if (zswap_is_page_same_filled(src, &value)) {
1143 kunmap_atomic(src);
1144 entry->offset = offset;
1145 entry->length = 0;
1146 entry->value = value;
1147 atomic_inc(&zswap_same_filled_pages);
1148 goto insert_entry;
1149 }
1150 kunmap_atomic(src);
1151 }
1152
1153 /* if entry is successfully added, it keeps the reference */
1154 entry->pool = zswap_pool_current_get();
1155 if (!entry->pool) {
1156 ret = -EINVAL;
1157 goto freepage;
1158 }
1159
1160 /* compress */
1161 acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
1162
1163 mutex_lock(acomp_ctx->mutex);
1164
1165 dst = acomp_ctx->dstmem;
1166 sg_init_table(&input, 1);
1167 sg_set_page(&input, page, PAGE_SIZE, 0);
1168
1169 /* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
1170 sg_init_one(&output, dst, PAGE_SIZE * 2);
1171 acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
1172 /*
1173 * It may look a little silly that we send an asynchronous request
1174 * and then wait for its completion synchronously; the process is
1175 * in fact synchronous.
1176 * Theoretically, acomp lets users send multiple requests to one
1177 * acomp instance and have them completed simultaneously, but
1178 * frontswap stores and loads page by page, and there is no way
1179 * to send the second page before the first page is done in the
1180 * one thread doing frontswap.
1181 * Different threads on different cpus do have different acomp
1182 * instances, so multiple threads can (de)compress in parallel.
1183 */
1184 ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
1185 dlen = acomp_ctx->req->dlen;
1186
1187 if (ret) {
1188 ret = -EINVAL;
1189 goto put_dstmem;
1190 }
1191
1192 /* store */
1193 hlen = zpool_evictable(entry->pool->zpool) ? sizeof(zhdr) : 0;
1194 gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
1195 if (zpool_malloc_support_movable(entry->pool->zpool))
1196 gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
1197 ret = zpool_malloc(entry->pool->zpool, hlen + dlen, gfp, &handle);
1198 if (ret == -ENOSPC) {
1199 zswap_reject_compress_poor++;
1200 goto put_dstmem;
1201 }
1202 if (ret) {
1203 zswap_reject_alloc_fail++;
1204 goto put_dstmem;
1205 }
1206 buf = zpool_map_handle(entry->pool->zpool, handle, ZPOOL_MM_RW);
1207 memcpy(buf, &zhdr, hlen);
1208 memcpy(buf + hlen, dst, dlen);
1209 zpool_unmap_handle(entry->pool->zpool, handle);
1210 mutex_unlock(acomp_ctx->mutex);
1211
1212 /* populate entry */
1213 entry->offset = offset;
1214 entry->handle = handle;
1215 entry->length = dlen;
1216
1217 insert_entry:
1218 /* map */
1219 spin_lock(&tree->lock);
1220 do {
1221 ret = zswap_rb_insert(&tree->rbroot, entry, &dupentry);
1222 if (ret == -EEXIST) {
1223 zswap_duplicate_entry++;
1224 /* remove from rbtree */
1225 zswap_rb_erase(&tree->rbroot, dupentry);
1226 zswap_entry_put(tree, dupentry);
1227 }
1228 } while (ret == -EEXIST);
1229 spin_unlock(&tree->lock);
1230
1231 /* update stats */
1232 atomic_inc(&zswap_stored_pages);
1233 zswap_update_total_size();
> 1234 count_vm_event(ZSWAPOUT);
1235
1236 return 0;
1237
1238 put_dstmem:
1239 mutex_unlock(acomp_ctx->mutex);
1240 zswap_pool_put(entry->pool);
1241 freepage:
1242 zswap_entry_cache_free(entry);
1243 reject:
1244 return ret;
1245 }
1246
1247 /*
1248 * returns 0 if the page was successfully decompressed
1249 * returns -1 on entry not found or error
1250 */
1251 static int zswap_frontswap_load(unsigned type, pgoff_t offset,
1252 struct page *page)
1253 {
1254 struct zswap_tree *tree = zswap_trees[type];
1255 struct zswap_entry *entry;
1256 struct scatterlist input, output;
1257 struct crypto_acomp_ctx *acomp_ctx;
1258 u8 *src, *dst, *tmp;
1259 unsigned int dlen;
1260 int ret;
1261
1262 /* find */
1263 spin_lock(&tree->lock);
1264 entry = zswap_entry_find_get(&tree->rbroot, offset);
1265 if (!entry) {
1266 /* entry was written back */
1267 spin_unlock(&tree->lock);
1268 return -1;
1269 }
1270 spin_unlock(&tree->lock);
1271
1272 if (!entry->length) {
1273 dst = kmap_atomic(page);
1274 zswap_fill_page(dst, entry->value);
1275 kunmap_atomic(dst);
1276 ret = 0;
1277 goto stats;
1278 }
1279
1280 if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
1281 tmp = kmalloc(entry->length, GFP_ATOMIC);
1282 if (!tmp) {
1283 ret = -ENOMEM;
1284 goto freeentry;
1285 }
1286 }
1287
1288 /* decompress */
1289 dlen = PAGE_SIZE;
1290 src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
1291 if (zpool_evictable(entry->pool->zpool))
1292 src += sizeof(struct zswap_header);
1293
1294 if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
1295 memcpy(tmp, src, entry->length);
1296 src = tmp;
1297 zpool_unmap_handle(entry->pool->zpool, entry->handle);
1298 }
1299
1300 acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
1301 mutex_lock(acomp_ctx->mutex);
1302 sg_init_one(&input, src, entry->length);
1303 sg_init_table(&output, 1);
1304 sg_set_page(&output, page, PAGE_SIZE, 0);
1305 acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
1306 ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
1307 mutex_unlock(acomp_ctx->mutex);
1308
1309 if (zpool_can_sleep_mapped(entry->pool->zpool))
1310 zpool_unmap_handle(entry->pool->zpool, entry->handle);
1311 else
1312 kfree(tmp);
1313
1314 BUG_ON(ret);
1315 stats:
> 1316 count_vm_event(ZSWAPIN);
1317 freeentry:
1318 spin_lock(&tree->lock);
1319 zswap_entry_put(tree, entry);
1320 spin_unlock(&tree->lock);
1321
1322 return ret;
1323 }
1324

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
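
The fix implied by the robot's suggestions is a straight rename at the two
call sites so they match the ZSWPIN/ZSWPOUT names actually added to
include/linux/vm_event_item.h — a minimal sketch, in zswap_frontswap_store():

-	count_vm_event(ZSWAPOUT);
+	count_vm_event(ZSWPOUT);

and in zswap_frontswap_load():

-	count_vm_event(ZSWAPIN);
+	count_vm_event(ZSWPIN);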



2021-08-30 18:52:56

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

Hi Johannes,

On Thu, Aug 19, 2021 at 03:55:33PM -0400, Johannes Weiner wrote:
> Currently it requires poking at debugfs to figure out the size of the
> zswap cache on a host. There are no counters for reads and writes
> against the cache. This makes it difficult to understand behavior on
> production systems.
>
> Print zswap memory consumption in /proc/meminfo, count zswapouts and
> zswapins in /proc/vmstat.
>
> Signed-off-by: Johannes Weiner <[email protected]>
> ---
> fs/proc/meminfo.c | 4 ++++
> include/linux/swap.h | 4 ++++
> include/linux/vm_event_item.h | 4 ++++
> mm/vmstat.c | 4 ++++
> mm/zswap.c | 11 +++++------
> 5 files changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 6fa761c9cc78..2dc474940691 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -86,6 +86,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>
> show_val_kb(m, "SwapTotal: ", i.totalswap);
> show_val_kb(m, "SwapFree: ", i.freeswap);
> +#ifdef CONFIG_ZSWAP
> + seq_printf(m, "Zswap: %8lu kB\n",
> + (unsigned long)(zswap_pool_total_size >> 10));

Since we have zram as well as zswap, it would be great if
we could abstract both at once without introducing a separate
"Zram:" entry in meminfo. A note: zram can back a filesystem on
the zram block device as well as swap, so the term would better
be "compressed" rather than "swap".

How about this?

"Compressed: xx kB"

unsigned long total_compressed_memory(void)
{
	return zswap_compressed_mem() + zram_compressed_mem();
}

> +#endif
> show_val_kb(m, "Dirty: ",
> global_node_page_state(NR_FILE_DIRTY));
> show_val_kb(m, "Writeback: ",
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 144727041e78..3b23c88b6a8d 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -696,6 +696,10 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
> }
> #endif
>
> +#ifdef CONFIG_ZSWAP
> +extern u64 zswap_pool_total_size;
> +#endif
> +
> #if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
> extern void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask);
> #else
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index ae0dd1948c2b..9dbebea09c69 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -125,6 +125,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> SWAP_RA,
> SWAP_RA_HIT,
> #endif
> +#ifdef CONFIG_ZSWAP
> + ZSWPIN,
> + ZSWPOUT,

INMEM_SWP[IN|OUT] to represent both zram and zswap?
Feel free to suggest a better word.

> +#endif
> #ifdef CONFIG_X86
> DIRECT_MAP_LEVEL2_SPLIT,
> DIRECT_MAP_LEVEL3_SPLIT,
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index cccee36b289c..31aada15c571 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1369,6 +1369,10 @@ const char * const vmstat_text[] = {
> "swap_ra",
> "swap_ra_hit",
> #endif
> +#ifdef CONFIG_ZSWAP
> + "zswpin",
> + "zswpout",
> +#endif
> #ifdef CONFIG_X86
> "direct_map_level2_splits",
> "direct_map_level3_splits",
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 20763267a219..f93a7c715f76 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -40,7 +40,7 @@
> * statistics
> **********************************/
> /* Total bytes used by the compressed storage */
> -static u64 zswap_pool_total_size;
> +u64 zswap_pool_total_size;
> /* The number of compressed pages currently stored in zswap */
> static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
> /* The number of same-value filled pages currently stored in zswap */
> @@ -1231,6 +1231,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
> /* update stats */
> atomic_inc(&zswap_stored_pages);
> zswap_update_total_size();
> + count_vm_event(ZSWAPOUT);
>
> return 0;
>
> @@ -1273,11 +1274,10 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> zswap_fill_page(dst, entry->value);
> kunmap_atomic(dst);
> ret = 0;
> - goto freeentry;
> + goto stats;
> }
>
> if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
> -
> tmp = kmalloc(entry->length, GFP_ATOMIC);
> if (!tmp) {
> ret = -ENOMEM;
> @@ -1292,10 +1292,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> src += sizeof(struct zswap_header);
>
> if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
> -
> memcpy(tmp, src, entry->length);
> src = tmp;
> -
> zpool_unmap_handle(entry->pool->zpool, entry->handle);
> }
>
> @@ -1314,7 +1312,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> kfree(tmp);
>
> BUG_ON(ret);
> -
> +stats:
> + count_vm_event(ZSWAPIN);
> freeentry:
> spin_lock(&tree->lock);
> zswap_entry_put(tree, entry);
> --
> 2.32.0
>
>
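
A hypothetical sketch of Minchan's "Compressed:" suggestion as it might
land in meminfo_proc_show(); zswap_pool_total_size is the export from the
patch under discussion, while the zram helper is illustrative — no such
export exists today:

static u64 total_compressed_memory(void)
{
	u64 bytes = 0;

#ifdef CONFIG_ZSWAP
	bytes += zswap_pool_total_size;		/* exported by the patch above */
#endif
#ifdef CONFIG_ZRAM
	bytes += zram_compressed_mem();		/* illustrative placeholder */
#endif
	return bytes;
}

	/* in meminfo_proc_show(): */
	seq_printf(m, "Compressed:     %8lu kB\n",
		   (unsigned long)(total_compressed_memory() >> 10));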

2021-11-02 19:20:05

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

Hi Minchan,

Sorry about the delay, I'm just now getting back to these patches.

On Mon, Aug 30, 2021 at 11:49:59AM -0700, Minchan Kim wrote:
> Hi Johannes,
>
> On Thu, Aug 19, 2021 at 03:55:33PM -0400, Johannes Weiner wrote:
> > Currently it requires poking at debugfs to figure out the size of the
> > zswap cache on a host. There are no counters for reads and writes
> > against the cache. This makes it difficult to understand behavior on
> > production systems.
> >
> > Print zswap memory consumption in /proc/meminfo, count zswapouts and
> > zswapins in /proc/vmstat.
> >
> > Signed-off-by: Johannes Weiner <[email protected]>
> > ---
> > fs/proc/meminfo.c | 4 ++++
> > include/linux/swap.h | 4 ++++
> > include/linux/vm_event_item.h | 4 ++++
> > mm/vmstat.c | 4 ++++
> > mm/zswap.c | 11 +++++------
> > 5 files changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> > index 6fa761c9cc78..2dc474940691 100644
> > --- a/fs/proc/meminfo.c
> > +++ b/fs/proc/meminfo.c
> > @@ -86,6 +86,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> >
> > show_val_kb(m, "SwapTotal: ", i.totalswap);
> > show_val_kb(m, "SwapFree: ", i.freeswap);
> > +#ifdef CONFIG_ZSWAP
> > + seq_printf(m, "Zswap: %8lu kB\n",
> > + (unsigned long)(zswap_pool_total_size >> 10));
>
> Since we have zram as well as zswap, it would be great if
> we could abstract both at once without introducing a separate
> "Zram:" entry in meminfo. A note: zram can back a filesystem on
> the zram block device as well as swap, so the term would better
> be "compressed" rather than "swap".
>
> How about this?
>
> "Compressed: xx kB"

Wouldn't it make more sense to keep separate counters? Zswap and zram
are quite different from each other.

From an MM perspective, zram is an opaque storage backend. zswap OTOH
is an explicit MM cache stage which may in the future make different
decisions than zram, be integrated into vmscan's LRU hierarchy
etc. And in theory, you could put zswap with fast compression in front
of a zram device with denser compression, right?
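
An illustrative setup for the stacking just described, using the standard
zswap module parameters and zram sysfs attributes (the compressor names
depend on what the running kernel has enabled):

    # zswap as a fast front cache
    echo lz4 > /sys/module/zswap/parameters/compressor
    echo 1 > /sys/module/zswap/parameters/enabled

    # zram swap device with a denser compressor behind it
    modprobe zram
    echo zstd > /sys/block/zram0/comp_algorithm
    echo 4G > /sys/block/zram0/disksize
    mkswap /dev/zram0
    swapon /dev/zram0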

I agree zram should probably also have memory counters, but I think it
makes sense to recognize zswap as a unique MM layer.

2021-11-10 19:11:31

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

Hi Johannes,

On Tue, Nov 02, 2021 at 11:06:17AM -0400, Johannes Weiner wrote:
> Hi Minchan,
>
> Sorry about the delay, I'm just now getting back to these patches.
>
> On Mon, Aug 30, 2021 at 11:49:59AM -0700, Minchan Kim wrote:
> > Hi Johannes,
> >
> > On Thu, Aug 19, 2021 at 03:55:33PM -0400, Johannes Weiner wrote:
> > > Currently it requires poking at debugfs to figure out the size of the
> > > zswap cache on a host. There are no counters for reads and writes
> > > against the cache. This makes it difficult to understand behavior on
> > > production systems.
> > >
> > > Print zswap memory consumption in /proc/meminfo, count zswapouts and
> > > zswapins in /proc/vmstat.
> > >
> > > Signed-off-by: Johannes Weiner <[email protected]>
> > > ---
> > > fs/proc/meminfo.c | 4 ++++
> > > include/linux/swap.h | 4 ++++
> > > include/linux/vm_event_item.h | 4 ++++
> > > mm/vmstat.c | 4 ++++
> > > mm/zswap.c | 11 +++++------
> > > 5 files changed, 21 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> > > index 6fa761c9cc78..2dc474940691 100644
> > > --- a/fs/proc/meminfo.c
> > > +++ b/fs/proc/meminfo.c
> > > @@ -86,6 +86,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> > >
> > > show_val_kb(m, "SwapTotal: ", i.totalswap);
> > > show_val_kb(m, "SwapFree: ", i.freeswap);
> > > +#ifdef CONFIG_ZSWAP
> > > + seq_printf(m, "Zswap: %8lu kB\n",
> > > + (unsigned long)(zswap_pool_total_size >> 10));
> >
> > Since we have zram as well as zswap, it would be great if
> > we could abstract both at once without introducing a separate
> > "Zram:" entry in meminfo. A note: zram can back a filesystem on
> > the zram block device as well as swap, so the term would better
> > be "compressed" rather than "swap".
> >
> > How about this?
> >
> > "Compressed: xx kB"
>
> Wouldn't it make more sense to keep separate counters? Zswap and zram
> are quite different from each other.
>
> From an MM perspective, zram is an opaque storage backend. zswap OTOH
> is an explicit MM cache stage which may in the future make different
> decisions than zram, be integrated into vmscan's LRU hierarchy
> etc. And in theory, you could put zswap with fast compression in front
> of a zram device with denser compression, right?

My view is that the allocators aim to store compressed memory.
As with the slab allocator, we could use the allocator anywhere,
display the total memory usage from the allocator in meminfo
rather than per subsystem, and look at slabinfo if we need a
further breakdown. I think that could work for this case, too.
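
Concretely, the slab precedent: /proc/meminfo carries only the aggregate
"Slab:" figure, while /proc/slabinfo provides the per-cache breakdown.
Illustrative:

    $ grep '^Slab:' /proc/meminfo
    Slab:             123456 kB
    $ head -2 /proc/slabinfo
    slabinfo - version: 2.1
    # name            <active_objs> <num_objs> <objsize> ...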

>
> I agree zram should probably also have memory counters, but I think it
> makes sense to recognize zswap as a unique MM layer.

Under your view, I think it would introduce Zram-swap and Zram-block
entries as well as Zswap. If folks think that's the better idea, I am
fine with it and happy to post a patch to merge along with this one.