Message-ID: <1568289769.5576.138.camel@lca.pw>
Subject: Re: [PATCH] zswap: Add CONFIG_ZSWAP_IO_SWITCH
From: Qian Cai
To: Hui Zhu, sjenning@redhat.com, ddstreet@ieee.org, akpm@linux-foundation.org,
 mhocko@suse.com, willy@infradead.org, chris@chris-wilson.co.uk,
 hannes@cmpxchg.org, ziqian.lzq@antfin.com, osandov@fb.com,
 ying.huang@intel.com, aryabinin@virtuozzo.com, vovoy@chromium.org,
 richard.weiyang@gmail.com, jgg@ziepe.ca, dan.j.williams@intel.com,
 rppt@linux.ibm.com, jglisse@redhat.com, b.zolnierkie@samsung.com,
 axboe@kernel.dk, dennis@kernel.org, josef@toxicpanda.com, tj@kernel.org,
 oleg@redhat.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Date: Thu, 12 Sep 2019 08:02:49 -0400
In-Reply-To: <1568258490-25359-1-git-send-email-teawaterz@linux.alibaba.com>
References: <1568258490-25359-1-git-send-email-teawaterz@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2019-09-12 at 11:21 +0800, Hui Zhu wrote:
> I use zswap to handle the swap IO issue in a VM that uses a swap file.
> This VM has 4G memory and 2 CPUs. And I set up 4G swap in /swapfile.
> This is test script:
> cat 1.sh
> ./usemem --sleep 3600 -M -a -n 1 $((3 * 1024 * 1024 * 1024)) &
> sleep 10
> echo 1 > /proc/sys/vm/drop_caches
> ./usemem -S -f /test2 $((2 * 1024 * 1024 * 1024)) &
> while [ True ]; do ./usemem -a -n 1 $((1 * 1024 * 1024 * 1024)); done
>
> Without ZSWAP:
> echo 100 > /proc/sys/vm/swappiness
> swapon /swapfile
> sh 1.sh
> ...
> ...
> 1207959552 bytes / 2076479 usecs = 568100 KB/s
> 61088 usecs to free memory
> 1207959552 bytes / 2035439 usecs = 579554 KB/s
> 55073 usecs to free memory
> 2415919104 bytes / 24054408 usecs = 98081 KB/s
> 3741 usecs to free memory
> 1207959552 bytes / 1954371 usecs = 603594 KB/s
> 53161 usecs to free memory
> ...
> ...
>
> With ZSWAP:
> echo 100 > /proc/sys/vm/swappiness
> swapon /swapfile
> echo lz4 > /sys/module/zswap/parameters/compressor
> echo zsmalloc > /sys/module/zswap/parameters/zpool
> echo 0 > /sys/module/zswap/parameters/same_filled_pages_enabled
> echo 20 > /sys/module/zswap/parameters/max_pool_percent
> echo 1 > /sys/module/zswap/parameters/enabled
> sh 1.sh
> 1207959552 bytes / 3619283 usecs = 325934 KB/s
> 194825 usecs to free memory
> 1207959552 bytes / 3439563 usecs = 342964 KB/s
> 218419 usecs to free memory
> 2415919104 bytes / 19508762 usecs = 120935 KB/s
> 5632 usecs to free memory
> 1207959552 bytes / 3329369 usecs = 354315 KB/s
> 179764 usecs to free memory
>
> The normal io speed is increased from 98081 KB/s to 120935 KB/s.
> But I found 2 issues of zswap in this machine:
> 1. Because the disk of VM has the file cache in the host layer,
> so normal swap speed is higher than with zswap.
> 2. Because zswap need allocates memory to store the compressed pages,
> it will make memory capacity worse.
> For example:
> Command "./usemem -a -n 1 $((7 * 1024 * 1024 * 1024))" request 7G memory
> from this machine.
> It will work OK without zswap but got OOM when zswap is opened.
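
A note on reading the usemem figures: the KB/s value appears to be bytes * 10^6 / usecs / 1024, truncated to an integer. That is inferred from the quoted numbers only, not from usemem's source. usemem_kbps() and pct_change() below are hypothetical helpers for checking the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * KB/s figure as printed in the usemem lines quoted above:
 * bytes * 10^6 / usecs, then divided by 1024 (integer division truncates).
 * Assumption: inferred from the quoted output, not from usemem's source.
 */
uint64_t usemem_kbps(uint64_t bytes, uint64_t usecs)
{
	assert(usecs > 0);
	/* bytes * 10^6 stays far below 2^64 for the GB-sized runs here. */
	return bytes * 1000000ULL / usecs / 1024;
}

/* Relative change between two throughput figures, in percent. */
double pct_change(uint64_t before, uint64_t after)
{
	assert(before > 0);
	return 100.0 * ((double)after - (double)before) / (double)before;
}
```

Under that assumption, usemem_kbps(2415919104, 24054408) returns 98081 and usemem_kbps(2415919104, 19508762) returns 120935. These are the two figures behind the "increased from 98081 KB/s to 120935 KB/s" claim, and pct_change(98081, 120935) is about 23.3%. That gain applies only to the 2415919104-byte run. The 1207959552-byte lines in the same output are slower with zswap enabled.
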
>
> This commit adds CONFIG_ZSWAP_IO_SWITCH that try to handle the issues
> and let zswap keep save IO.
> It add two parameters read_in_flight_limit and write_in_flight_limit to
> zswap.
> In zswap_frontswap_store, pages will be stored to zswap only when
> the IO in flight number of swap device is bigger than
> zswap_read_in_flight_limit or zswap_write_in_flight_limit
> when zswap is enabled.
> Then the zswap just work when the IO in flight number of swap device
> is low.

There isn't sufficient information for users to decide when they should
enable this Kconfig option. Also, it describes your specific workload, but it
is not clear to me how this benefits other people's workloads in general.

>
> This is the test result:
> echo 100 > /proc/sys/vm/swappiness
> swapon /swapfile
> echo lz4 > /sys/module/zswap/parameters/compressor
> echo zsmalloc > /sys/module/zswap/parameters/zpool
> echo 0 > /sys/module/zswap/parameters/same_filled_pages_enabled
> echo 20 > /sys/module/zswap/parameters/max_pool_percent
> echo 1 > /sys/module/zswap/parameters/enabled
> echo 3 > /sys/module/zswap/parameters/read_in_flight_limit
> echo 50 > /sys/module/zswap/parameters/write_in_flight_limit
> sh 1.sh
> ...
> 1207959552 bytes / 2320861 usecs = 508280 KB/s
> 106164 usecs to free memory
> 1207959552 bytes / 2343916 usecs = 503280 KB/s
> 79386 usecs to free memory
> 2415919104 bytes / 20136015 usecs = 117167 KB/s
> 4411 usecs to free memory
> 1207959552 bytes / 1833403 usecs = 643419 KB/s
> 70452 usecs to free memory
> ...
> killall usemem
> ./usemem -a -n 1 $((7 * 1024 * 1024 * 1024))
> 8455716864 bytes / 14457505 usecs = 571159 KB/s
> 365961 usecs to free memory
>
> Signed-off-by: Hui Zhu
> ---
>  include/linux/swap.h |  3 +++
>  mm/Kconfig           | 11 +++++++++++
>  mm/page_io.c         | 16 +++++++++++++++
>  mm/zswap.c           | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 85 insertions(+)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index de2c67a..82b621f 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -389,6 +389,9 @@ extern void end_swap_bio_write(struct bio *bio);
>  extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
>  	bio_end_io_t end_write_func);
>  extern int swap_set_page_dirty(struct page *page);
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +extern void swap_io_in_flight(struct page *page, unsigned int inflight[2]);
> +#endif
>
>  int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
>  		unsigned long nr_pages, sector_t start_block);
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 56cec63..d077e51 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -546,6 +546,17 @@ config ZSWAP
>  	  they have not be fully explored on the large set of potential
>  	  configurations and workloads that exist.
>
> +config ZSWAP_IO_SWITCH
> +	bool "Compressed cache for swap pages according to the IO status"
> +	depends on ZSWAP
> +	def_bool n
> +	help
> +	  Add two parameters read_in_flight_limit and write_in_flight_limit to
> +	  ZSWAP. When ZSWAP is enabled, pages will be stored to zswap only
> +	  when the IO in flight number of swap device is bigger than
> +	  zswap_read_in_flight_limit or zswap_write_in_flight_limit.
> +	  If unsure, say "n".
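
The rule in this help text, as implemented in the zswap_frontswap_store() hunk of this patch, is easier to check in closed form. Below is a rough user-space model, not kernel code. It assumes the hunk is the whole gate, and hunk_rejects(), gate_stores() and check_forms_agree() are hypothetical names. As far as I can tell, a page passes the gate when at least one configured limit has been reached. The comparison is >= rather than the strict "bigger than" in the help text, and when both limits are set the two conditions are OR-ed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model of the store gate in the zswap_frontswap_store() hunk.
 * rd/wr stand for inflight[0]/inflight[1] from swap_io_in_flight();
 * rd_limit/wr_limit stand for zswap_read_in_flight_limit and
 * zswap_write_in_flight_limit (0 means "not set"). Names are hypothetical.
 */

/* Literal transcription of the hunk: true means the page is rejected
 * with -EIO and goes to the swap device instead of zswap. */
bool hunk_rejects(unsigned int rd, unsigned int wr,
		  unsigned int rd_limit, unsigned int wr_limit)
{
	bool should_swap = false;

	if (!rd_limit && !wr_limit)
		return false;	/* gate not configured */

	if (wr_limit && wr < wr_limit)
		should_swap = true;

	if (rd_limit && (should_swap || (!should_swap && !wr_limit)))
		should_swap = rd < rd_limit;

	return should_swap;
}

/* Closed form: the page passes the gate if no limit is configured, or if
 * at least one configured limit has been reached (>=, not strictly >). */
bool gate_stores(unsigned int rd, unsigned int wr,
		 unsigned int rd_limit, unsigned int wr_limit)
{
	if (!rd_limit && !wr_limit)
		return true;
	return (rd_limit && rd >= rd_limit) || (wr_limit && wr >= wr_limit);
}

/* Exhaustively check that both forms agree for all inputs <= max. */
void check_forms_agree(unsigned int max)
{
	for (unsigned int rl = 0; rl <= max; rl++)
		for (unsigned int wl = 0; wl <= max; wl++)
			for (unsigned int rd = 0; rd <= max; rd++)
				for (unsigned int wr = 0; wr <= max; wr++)
					assert(hunk_rejects(rd, wr, rl, wl) ==
					       !gate_stores(rd, wr, rl, wl));
}
```

If the model is right, the `(!should_swap && !zswap_write_in_flight_limit)` term in the hunk can be reduced to `!zswap_write_in_flight_limit`, because A || (!A && B) equals A || B.
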
> +
>  config ZPOOL
>  	tristate "Common API for compressed memory storage"
>  	help
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 24ee600..e66b050 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -434,3 +434,19 @@ int swap_set_page_dirty(struct page *page)
>  		return __set_page_dirty_no_writeback(page);
>  	}
>  }
> +
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +void swap_io_in_flight(struct page *page, unsigned int inflight[2])
> +{
> +	struct swap_info_struct *sis = page_swap_info(page);
> +
> +	if (!sis->bdev) {
> +		inflight[0] = 0;
> +		inflight[1] = 0;
> +		return;
> +	}
> +
> +	part_in_flight_rw(bdev_get_queue(sis->bdev), sis->bdev->bd_part,
> +			  inflight);
> +}
> +#endif
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 0e22744..1255645 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -62,6 +62,13 @@ static u64 zswap_reject_compress_poor;
>  static u64 zswap_reject_alloc_fail;
>  /* Store failed because the entry metadata could not be allocated (rare) */
>  static u64 zswap_reject_kmemcache_fail;
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +/* Store failed because zswap_read_in_flight_limit or
> + * zswap_write_in_flight_limit is bigger than IO in flight number of
> + * swap device
> + */
> +static u64 zswap_reject_io;
> +#endif
>  /* Duplicate store was encountered (rare) */
>  static u64 zswap_duplicate_entry;
>
> @@ -114,6 +121,22 @@ static bool zswap_same_filled_pages_enabled = true;
>  module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
>  		   bool, 0644);
>
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +/* zswap will not try to store the page if zswap_read_in_flight_limit is
> + * bigger than IO read in flight number of swap device
> + */
> +static unsigned int zswap_read_in_flight_limit;
> +module_param_named(read_in_flight_limit, zswap_read_in_flight_limit,
> +		   uint, 0644);
> +
> +/* zswap will not try to store the page if zswap_write_in_flight_limit is
> + * bigger than IO write in flight number of swap device
> + */
> +static unsigned int zswap_write_in_flight_limit;
> +module_param_named(write_in_flight_limit, zswap_write_in_flight_limit,
> +		   uint, 0644);
> +#endif
> +
>  /*********************************
>  * data structures
>  **********************************/
> @@ -1009,6 +1032,34 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
>  		goto reject;
>  	}
>
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +	if (zswap_read_in_flight_limit || zswap_write_in_flight_limit) {
> +		unsigned int inflight[2];
> +		bool should_swap = false;
> +
> +		swap_io_in_flight(page, inflight);
> +
> +		if (zswap_write_in_flight_limit &&
> +			inflight[1] < zswap_write_in_flight_limit)
> +			should_swap = true;
> +
> +		if (zswap_read_in_flight_limit &&
> +			(should_swap ||
> +			 (!should_swap && !zswap_write_in_flight_limit))) {
> +			if (inflight[0] < zswap_read_in_flight_limit)
> +				should_swap = true;
> +			else
> +				should_swap = false;
> +		}
> +
> +		if (should_swap) {
> +			zswap_reject_io++;
> +			ret = -EIO;
> +			goto reject;
> +		}
> +	}
> +#endif
> +
>  	/* reclaim space if needed */
>  	if (zswap_is_full()) {
>  		zswap_pool_limit_hit++;
> @@ -1264,6 +1315,10 @@ static int __init zswap_debugfs_init(void)
>  			zswap_debugfs_root, &zswap_reject_kmemcache_fail);
>  	debugfs_create_u64("reject_compress_poor", 0444,
>  			zswap_debugfs_root, &zswap_reject_compress_poor);
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +	debugfs_create_u64("reject_io", 0444,
> +			zswap_debugfs_root, &zswap_reject_io);
> +#endif
>  	debugfs_create_u64("written_back_pages", 0444,
>  			zswap_debugfs_root, &zswap_written_back_pages);
>  	debugfs_create_u64("duplicate_entry", 0444,