Received: by 2002:a05:6358:111d:b0:dc:6189:e246 with SMTP id f29csp1914924rwi; Thu, 3 Nov 2022 10:26:10 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4M13ksagfriGfJHleB37QSbfWwWr1fQZTnXaiKO+JuaIVh6zhAm4Uq533Rs7I04sRk+f+z X-Received: by 2002:a05:6402:1bda:b0:464:4eb2:8c25 with SMTP id ch26-20020a0564021bda00b004644eb28c25mr2093970edb.278.1667496370131; Thu, 03 Nov 2022 10:26:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667496370; cv=none; d=google.com; s=arc-20160816; b=YTouuLTKcS7TuPdj626tlCoXfgG2PjhfQbHkga2lHa12E6T/VysvqHRCskCh4+t8uB nJegNSwb1dRTVdGOmC4b+aYSKv7B+kBPg9FcTXnf7qevb934BjyGKonQaf7npdGC+57p A43eLDyBlFio8ZyXO6X9b8tT5t6wl41rAsOYX0zjRvDxm1c+Wq6RVQN3tPrG3+qaECgP 9BCMAdDL319gw1d9AZ0xApM11MP6OZyg+t6T1Y8jc19WeZKyobZcHwFgujAOKBTgT7Xu pgMBqk94Ajn+Tw5z838o2EtvxwRZZwoRaFGFUkokdnmU8InSAsmwp16GGHAnDBF9TGje gViA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:sender:dkim-signature; bh=CYg+m5oLj8gc4kPsOwH3pQu6lmRCj0M+7nnvmSQAEtk=; b=YzYrACpILmW3MXGCVsjA4N3d8mJRKekSQilAN55G7s9xlsk9qVoMtmj/uO+GmVnR/X if3TV08dFsQzLws+HlRk75UUdCAcfNUO3c8zTL5Vrbtw8FhuU9UWM9wrZW/yfQhSt9/R dB4N99rR3enqRoQjDjwszl27+mtS/MHisds+VUCXe8vPE319It9z6E2A/AGgFkvijwUA glchmVEwZUBm9BCpzVGQ63FO1WHJjrScllL9/wt9iyvInJTu5PBLmjG5KfeSNOZuWAqR /8Ro98EA/d5YvP47KDnit0u+8YJlZ51jHJ+FTDVQU0e6tDXw/Ietgtxa/iFHmNHFlHuJ 8h3Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=ek0Tnf71; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ga10-20020a1709070c0a00b00788a4c018b5si2299026ejc.806.2022.11.03.10.25.44; Thu, 03 Nov 2022 10:26:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=ek0Tnf71; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231583AbiKCRAk (ORCPT + 97 others); Thu, 3 Nov 2022 13:00:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57260 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231146AbiKCRAh (ORCPT ); Thu, 3 Nov 2022 13:00:37 -0400 Received: from mail-pf1-x42b.google.com (mail-pf1-x42b.google.com [IPv6:2607:f8b0:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7E6AC1143 for ; Thu, 3 Nov 2022 10:00:36 -0700 (PDT) Received: by mail-pf1-x42b.google.com with SMTP id q9so2186004pfg.5 for ; Thu, 03 Nov 2022 10:00:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:sender:from:to:cc:subject:date:message-id :reply-to; bh=CYg+m5oLj8gc4kPsOwH3pQu6lmRCj0M+7nnvmSQAEtk=; b=ek0Tnf71xlL+BT9a8ZACJoSOdDzV2J9rf1Ooq0ml9b5EbDqxl048g53h5TgacjwdaP zsOwcUq2uVEFaWBGra1dV/rvd5Nw3JSSfJsezjJMKbm5TrSNVf/BUgH3/mEu8XQZ7obG pLHcgHeNhxJ7Vi6R1IhLq1FMJ9hA+HHrPEVBwYQkH1HszDmKU/7pZ5bd1UPiRxwRy8Wd +IiXtL1uUij+BveVQkFmhEzPNqu0SaAawTIkm6xcENDyNGYJJPly3TWiapjhgxldwLQs ikcsm1A96ge1R8rWBxdrGT8F0uctLpNc4yw5y5OHweeSwtAR0kQer2GS2yCVB2xzz2sQ DzBQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:sender:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CYg+m5oLj8gc4kPsOwH3pQu6lmRCj0M+7nnvmSQAEtk=; b=dnbbKHraxnh54Rfeo/CM0jizugwyy1T8a9dfVoJxLc3oCIIo9lPwuG1wE8e2yOsBxw wi/TnMatkzkbJxosOfc+Khb05a4uTyEq7LAqC0IaPpN4ZAmklW0DCKFcEkb/+sUdIN97 1QMxIYRk1f311eIpbnbSrXTz1Ota10nTBBUS1xmn7jtVlNKvEhxhezigEGORTlKYXUt0 kW+rhT8nPv3FXOqeT+FG8u6A7sppwp+CViM8tadXMlpD3icgkBz7kGG1lx2OEGvdxv5C LzaCuPDfAS4kn204zZm6sVMbwTBkUSdEkWM+b6+IWi7LM70azP2BvVV7nM4BsiFFou/Q ipHQ== X-Gm-Message-State: ACrzQf04XTpxeHa+kAMRK5bajarG/4goVhVGoVFSgzjcz0rethoG63zB NSnYfZ95c2FoJIXQEhbHKPk= X-Received: by 2002:a63:90c7:0:b0:442:87:3a38 with SMTP id a190-20020a6390c7000000b0044200873a38mr27164332pge.216.1667494835812; Thu, 03 Nov 2022 10:00:35 -0700 (PDT) Received: from google.com ([2620:15c:211:201:3d65:7dc2:c62a:5d98]) by smtp.gmail.com with ESMTPSA id e11-20020a170902784b00b0016d72804664sm880085pln.205.2022.11.03.10.00.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Nov 2022 10:00:35 -0700 (PDT) Sender: Minchan Kim Date: Thu, 3 Nov 2022 10:00:33 -0700 From: Minchan Kim To: Sergey Senozhatsky Cc: Andrew Morton , Nitin Gupta , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCHv4 4/9] zram: Introduce recompress sysfs knob Message-ID: References: <20221018045533.2396670-1-senozhatsky@chromium.org> <20221018045533.2396670-5-senozhatsky@chromium.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, FSL_HELO_FAKE,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 03, 2022 at 12:25:43PM +0900, Sergey Senozhatsky wrote: > On (22/11/02 14:06), Minchan Kim wrote: > > On Tue, Oct 18, 2022 at 01:55:28PM +0900, Sergey Senozhatsky wrote: > > > Allow zram to recompress (using secondary compression streams) > > > pages. We support three modes: > > > > > > 1) IDLE pages recompression is activated by `idle` mode > > > > > > echo idle > /sys/block/zram0/recompress > > > > > > 2) Since there may be many idle pages user-space may pass a size > > > watermark value (in bytes) and we will recompress IDLE pages only > > > of equal or greater size: > > > > > > echo 888 > /sys/block/zram0/recompress > > > > Hmm, how about having seperate knob for threshold? > > Per-my understanding this threshold can change quite often, > depending on memory pressure and so on. So we may force > user-space to issues more syscalls, without any gain in > simplicity. Sorry, didn't understand your point. Let me clarify my idea. If we have separate knob for recompress thresh hold, we could work like this. # recompress any compressed pages which is greater than 888 bytes. echo 888 > /sys/block/zram0/recompress_threshold # try to compress any pages greather than threshold with following # algorithm. echo "type=lzo priority=1" > /sys/block/zram0/recompress_algo echo "type=zstd priority=2" > /sys/block/zram0/recompress_algo echo "type=deflate priority=3" > /sys/block/zram0/recompress_algo > > > recompress_threshold > > > > With that, we could make rescompress 888 and idle/huge > > as well as only 888. > > > > echo 888 > /sys/block/zram0/recompress_threshold > > echo 1 > /sys/block/zram0/recompress > > > > or > > > > echo 888 > /sys/block/zram0/recompress_threshold > > echo idle > /sys/block/zram0/recompress > > > > or we can introduce the threshold with action item. > > > > echo "idle 888" > /sys/block/zram0/recompress > > echo "huge 888" > /sys/block/zram0/recompress > > echo "normal 888" > /sys/block/zram0/recompress > > I like the latter one, when threshold is an optional argument. > I probably would even go a bit further and add keywords: > > type=STRING threshold=INT Yeah, kerwords would be better. Let's discuss we need a threshold optional argument for each algo or just have one threshold for every secondary algorithm or both. > > > > 3) HUGE pages recompression is activated by `huge` mode > > > > > > echo huge > /sys/block/zram0/recompress > > > > > > 4) HUGE_IDLE pages recompression is activated by `huge_idle` mode > > > > > > echo huge_idle > /sys/block/zram0/recompress > > > > > > Signed-off-by: Sergey Senozhatsky > > > --- > > > drivers/block/zram/Kconfig | 15 +++ > > > drivers/block/zram/zram_drv.c | 196 +++++++++++++++++++++++++++++++++- > > > drivers/block/zram/zram_drv.h | 2 + > > > 3 files changed, 210 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig > > > index d4100b0c083e..3e00656a6f8a 100644 > > > --- a/drivers/block/zram/Kconfig > > > +++ b/drivers/block/zram/Kconfig > > > @@ -78,3 +78,18 @@ config ZRAM_MEMORY_TRACKING > > > /sys/kernel/debug/zram/zramX/block_state. > > > > > > See Documentation/admin-guide/blockdev/zram.rst for more information. > > > + > > > +config ZRAM_MULTI_COMP > > > + bool "Enable multiple per-CPU compression streams" > > > > per-CPU is implementation detail. Let's do not mention it. > > OK. > > > > + depends on ZRAM > > > + help > > > + This will enable per-CPU multi-compression streams, so that ZRAM > > > > indentation > > OK. A question: does this matter? I don't see any problems in menuconfig. > > > > + can re-compress IDLE/huge pages, using a potentially slower but > > > + more effective compression algorithm. Note, that IDLE page support > > > + requires ZRAM_MEMORY_TRACKING. > > > + > > > + echo TIMEOUT > /sys/block/zramX/idle > > > + echo SIZE > /sys/block/zramX/recompress > > > + > > > + SIZE (in bytes) parameter sets the object size watermark: idle > > > + objects that are of a smaller size will not get recompressed. > > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > > > index 94c62d7ea818..da11560ecf70 100644 > > > --- a/drivers/block/zram/zram_drv.c > > > +++ b/drivers/block/zram/zram_drv.c > > > @@ -1282,6 +1282,12 @@ static void zram_free_page(struct zram *zram, size_t index) > > > atomic64_dec(&zram->stats.huge_pages); > > > } > > > > > > + if (zram_test_flag(zram, index, ZRAM_RECOMP)) > > > + zram_clear_flag(zram, index, ZRAM_RECOMP); > > > + > > > + if (zram_test_flag(zram, index, ZRAM_RECOMP_SKIP)) > > > + zram_clear_flag(zram, index, ZRAM_RECOMP_SKIP); > > > > Let's squeeze the comp algo index into meta area since we have > > some rooms for the bits. Then can we could remove the specific > > recomp two flags? > > What is meta area? zram->table[index].flags If we squeeze the algorithm index, we could work like this without ZRAM_RECOMP_SKIP. read_block_state zram_algo_idx(zram, index) > 0 ? 'r' : '.'); zram_read_from_zpool if (zram_algo_idx(zram, idx) != 0) idx = 1; zram_recompress .. we don't need to set ZRAM_RECOMP since every meta will have the algo idx. zram_free_page zram_clear_algo(zram, index); recompress_store int algo_idx = zram_algo_idx(zram, index); if (algo_idx == max_algo_idx) goto next > > > I am thinking the feature to work incoming pages on the fly, > > not only for recompress manual knob so it would be good > > if we could make the interface abstract to work something > > like this(I may miss something why we couldn't. I need more > > time to look into then) > > > > zram_bvec_write: > > > > for (i = 0; i < MAX_COMP_ALGO; i++) { > > zstrm = zcomp_stream_get(i); > > zcomp_compress(src, &comp_len); > > if (comp_len > threshold) { > > zcomp_stream_put(i); > > continue; > > } > > break; > > } > > > > zram_bvec_read: > > algo_idx = zram_get_algo_idx(zram, index); > > zstrm = zcomp_stream_get(zram, algo_idx); > > zcomp_decompress(zstrm); > > zcomp_stream_put(zram, algo_idx); > > Hmm. This is something that should not be enabled by default. Exactly. I don't mean to enable by default, either. > N compressions per every stored page is very very CPU and > power intensive. We definitely want a way to have recompression > as a user-space event, which gives all sorts of flexibility and > extensibility. ZRAM doesn't (and should not) know about too many > things, so ZRAM can't make good decisions (and probably should not > try). User-space can make good decisions on the other hand. > > So recompression for us is not something that happens all the time, > unconditionally. It's something that happens sometimes, depending on > the situation on the host. Totally agree. I am not saying we should enable the feature by default but at lesat consider it for the future. I have something in mind to be useful later. > > [..] > > > +static int zram_recompress(struct zram *zram, u32 index, struct page *page, > > > + int size_watermark) > > > +{ > > > + unsigned long handle_prev; > > > + unsigned long handle_next; > > > + unsigned int comp_len_next; > > > + unsigned int comp_len_prev; > > > > How about orig_handle and new_nandle with orig_comp_len and new_comp_len? > > No opinion. Can we have prev and next? :) prev and next gives the impression position something like list. orig and new gives the impression stale and fresh. We are doing latter here.