Received: by 2002:a25:824b:0:0:0:0:0 with SMTP id d11csp142412ybn; Tue, 24 Sep 2019 20:01:34 -0700 (PDT) X-Google-Smtp-Source: APXvYqy786Z7wTcerIYZs9PNvP2cv2/5efrENJZAB0hMbbHHSo6bg99NOxixkOCrurjayYTrk9E5 X-Received: by 2002:adf:fa10:: with SMTP id m16mr6223448wrr.322.1569380494015; Tue, 24 Sep 2019 20:01:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1569380494; cv=none; d=google.com; s=arc-20160816; b=KwBLChnF09VwEL+4EiE6+KrU8JORoAfh32nvE+A404Ihyq0L5JPKWIg3VXPKZrXceQ tZzOvEauKLCjqhTAUVzw9ncHASCplBm9WfpNUy2F3QDTfnR6gK0aAOqea30EKzzRl3y9 8Hdgdh57ZPI+O0kS0FPHE0mJFwQv3shG91V2OLt0pUGTik4POab99zSIZhUOSwnAOcnP pxz7EKGnaIF1HiomKAEFr5VtkhoLpRkO0/MWSNawg4NIkYCLPOLDf8/Zy+hxFswoItLY bh3EOSArFkZkPMFqfWKiHJW/me+uXq3hhRqHadDV+xlbaoiQhwEEpRuhefoHMwRwKPT+ Ly0w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:date:subject:cc:to:from; bh=rf/sz1zzBMroPpr3Ju8etg5ya6QVQ9JmNwES3KlQhEw=; b=jZuBjRsl0OqgcAwtS5qcqnFN5AxRvOSD7r1RXoZv+GBvrY7wRUFI/gpPHWOSThQ/ye yWb9jZdEssWmL2OnEGA5U1F3dIoP109rPhp71BkBWqCQQduXNPo2djcbHHc863LHvilZ fTZguhLzdJmcyi7lp2Id2+EF5tdzMhFCKYuMmN6H81CR4OM8op/ydWAaZF5CphN/+ZmX SGpzOuN7yvzX4dYoQi+9wezGDKOQuamrm9ItfeOCXz9o+9YzWs/+iHkqdHpKnrVO6vvl 9k/XmtqVB+CYYv4C1S3KlNAM60KhazEevKn87B/ODSgIACL7qGtb8z5n8REiDCBdYHGk Adxg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h48si2474757ede.31.2019.09.24.20.01.08; Tue, 24 Sep 2019 20:01:33 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405892AbfIWDb5 (ORCPT + 99 others); Sun, 22 Sep 2019 23:31:57 -0400 Received: from out30-54.freemail.mail.aliyun.com ([115.124.30.54]:50394 "EHLO out30-54.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404054AbfIWDb5 (ORCPT ); Sun, 22 Sep 2019 23:31:57 -0400 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R201e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04400;MF=teawaterz@linux.alibaba.com;NM=1;PH=DS;RN=27;SR=0;TI=SMTPD_---0Td5y2xK_1569209508; Received: from localhost(mailfrom:teawaterz@linux.alibaba.com fp:SMTPD_---0Td5y2xK_1569209508) by smtp.aliyun-inc.com(127.0.0.1); Mon, 23 Sep 2019 11:31:52 +0800 From: Hui Zhu To: sjenning@redhat.com, ddstreet@ieee.org, akpm@linux-foundation.org, mhocko@suse.com, willy@infradead.org, chris@chris-wilson.co.uk, hannes@cmpxchg.org, ziqian.lzq@antfin.com, osandov@fb.com, ying.huang@intel.com, aryabinin@virtuozzo.com, vovoy@chromium.org, richard.weiyang@gmail.com, jgg@ziepe.ca, dan.j.williams@intel.com, rppt@linux.ibm.com, jglisse@redhat.com, b.zolnierkie@samsung.com, axboe@kernel.dk, dennis@kernel.org, josef@toxicpanda.com, tj@kernel.org, oleg@redhat.com, rdunlap@infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Hui Zhu Subject: [RFC v3] zswap: Add CONFIG_ZSWAP_IO_SWITCH to handle swap IO issue Date: Mon, 23 Sep 2019 11:31:45 +0800 Message-Id: <1569209505-15801-1-git-send-email-teawaterz@linux.alibaba.com> X-Mailer: git-send-email 2.7.4 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is the third version of this patch. The first and second version is in [1] and [2]. This verion is updated according to the comments from Randy Dunlap in [3]. Currently, I use a VM that has 2 CPUs, 4G memory and 4G swap file. I found that swap will affect the IO performance when it is running. So I open zswap to handle it because it just use CPU cycles but not disk IO. It work OK but I found that zswap is slower than normal swap in this VM. zswap is about 300M/s and normal swap is about 500M/s. (The reason is disk inside VM has fscache in host machine.) So open zswap is make memory shrinker slower but good for IO performance in this VM. So I just want zswap work when the disk of the swap file is under high IO load. This commit is designed for this idea. It add two parameters read_in_flight_limit and write_in_flight_limit to zswap. In zswap_frontswap_store, pages will be stored to zswap only when the IO in flight number of swap device is bigger than zswap_read_in_flight_limit or zswap_write_in_flight_limit when zswap is enabled. Then the zswap just work when the IO in flight number of swap device is low. [1] https://lkml.org/lkml/2019/9/11/935 [2] https://lkml.org/lkml/2019/9/20/90 [3] https://lkml.org/lkml/2019/9/20/1076 Signed-off-by: Hui Zhu --- include/linux/swap.h | 3 +++ mm/Kconfig | 18 ++++++++++++++++ mm/page_io.c | 16 +++++++++++++++ mm/zswap.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 95 insertions(+) diff --git a/include/linux/swap.h b/include/linux/swap.h index de2c67a..82b621f 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -389,6 +389,9 @@ extern void end_swap_bio_write(struct bio *bio); extern int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func); extern int swap_set_page_dirty(struct page *page); +#ifdef CONFIG_ZSWAP_IO_SWITCH +extern void swap_io_in_flight(struct page *page, unsigned int inflight[2]); +#endif int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); diff --git a/mm/Kconfig b/mm/Kconfig index 56cec63..387c3b5 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -546,6 +546,24 @@ config ZSWAP they have not be fully explored on the large set of potential configurations and workloads that exist. +config ZSWAP_IO_SWITCH + bool "Compressed cache for swap pages according to the IO status" + depends on ZSWAP + help + This function helps the system that normal swap speed is higher + than zswap speed to handle the swap IO issue. + For example, a VM where the disk device is not set cache config or + set cache=writeback. + + This function makes zswap just work when the disk of the swap file + is under high IO load. + It add two parameters (read_in_flight_limit and + write_in_flight_limit) to zswap. When zswap is enabled, pages will + be stored to zswap only when the IO in flight number of swap device + is bigger than zswap_read_in_flight_limit or + zswap_write_in_flight_limit. + If unsure, say "n". + config ZPOOL tristate "Common API for compressed memory storage" help diff --git a/mm/page_io.c b/mm/page_io.c index 24ee600..e66b050 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -434,3 +434,19 @@ int swap_set_page_dirty(struct page *page) return __set_page_dirty_no_writeback(page); } } + +#ifdef CONFIG_ZSWAP_IO_SWITCH +void swap_io_in_flight(struct page *page, unsigned int inflight[2]) +{ + struct swap_info_struct *sis = page_swap_info(page); + + if (!sis->bdev) { + inflight[0] = 0; + inflight[1] = 0; + return; + } + + part_in_flight_rw(bdev_get_queue(sis->bdev), sis->bdev->bd_part, + inflight); +} +#endif diff --git a/mm/zswap.c b/mm/zswap.c index 0e22744..0190b2d 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -62,6 +62,14 @@ static u64 zswap_reject_compress_poor; static u64 zswap_reject_alloc_fail; /* Store failed because the entry metadata could not be allocated (rare) */ static u64 zswap_reject_kmemcache_fail; +#ifdef CONFIG_ZSWAP_IO_SWITCH +/* + * Store failed because zswap_read_in_flight_limit or + * zswap_write_in_flight_limit is bigger than IO in flight number of + * swap device + */ +static u64 zswap_reject_io; +#endif /* Duplicate store was encountered (rare) */ static u64 zswap_duplicate_entry; @@ -114,6 +122,24 @@ static bool zswap_same_filled_pages_enabled = true; module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled, bool, 0644); +#ifdef CONFIG_ZSWAP_IO_SWITCH +/* + * zswap will not try to store the page if zswap_read_in_flight_limit is + * bigger than IO read in flight number of swap device + */ +static unsigned int zswap_read_in_flight_limit; +module_param_named(read_in_flight_limit, zswap_read_in_flight_limit, + uint, 0644); + +/* + * zswap will not try to store the page if zswap_write_in_flight_limit is + * bigger than IO write in flight number of swap device + */ +static unsigned int zswap_write_in_flight_limit; +module_param_named(write_in_flight_limit, zswap_write_in_flight_limit, + uint, 0644); +#endif + /********************************* * data structures **********************************/ @@ -1009,6 +1035,34 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, goto reject; } +#ifdef CONFIG_ZSWAP_IO_SWITCH + if (zswap_read_in_flight_limit || zswap_write_in_flight_limit) { + unsigned int inflight[2]; + bool should_swap = false; + + swap_io_in_flight(page, inflight); + + if (zswap_write_in_flight_limit && + inflight[1] < zswap_write_in_flight_limit) + should_swap = true; + + if (zswap_read_in_flight_limit && + (should_swap || + (!should_swap && !zswap_write_in_flight_limit))) { + if (inflight[0] < zswap_read_in_flight_limit) + should_swap = true; + else + should_swap = false; + } + + if (should_swap) { + zswap_reject_io++; + ret = -EIO; + goto reject; + } + } +#endif + /* reclaim space if needed */ if (zswap_is_full()) { zswap_pool_limit_hit++; @@ -1264,6 +1318,10 @@ static int __init zswap_debugfs_init(void) zswap_debugfs_root, &zswap_reject_kmemcache_fail); debugfs_create_u64("reject_compress_poor", 0444, zswap_debugfs_root, &zswap_reject_compress_poor); +#ifdef CONFIG_ZSWAP_IO_SWITCH + debugfs_create_u64("reject_io", 0444, + zswap_debugfs_root, &zswap_reject_io); +#endif debugfs_create_u64("written_back_pages", 0444, zswap_debugfs_root, &zswap_written_back_pages); debugfs_create_u64("duplicate_entry", 0444, -- 2.7.4