From: Chris Li
Date: Wed, 31 Jan 2024 17:17:31 -0800
Subject: [PATCH v2] mm: swap: async free swap slot cache entries
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20240131-async-free-v2-1-525f03e07184@kernel.org>
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Wei Xu, Yu Zhao, Greg Thelen, Chun-Tse Shao, Suren Baghdasaryan, Yosry Ahmed, Brain Geffon, Minchan Kim, Michal Hocko, Mel Gorman, Huang Ying, Nhat Pham, Johannes Weiner, Kairui Song, Zhongkun He, Kemeng Shi, Barry Song, Chris Li
X-Mailer: b4 0.12.3

We discovered that 1% of swap page faults take 100us or more, while 50% of swap faults complete in under 20us. Further investigation shows that, in the long-tail case, a large portion of the time is spent in the free_swap_slots() function.

The percpu cache of swap slots is freed in a batch of 64 entries inside free_swap_slots(). These cache entries are accumulated from previous page faults, which may not be related to the current process. Doing the batch free in the page fault handler causes longer tail latencies and penalizes the current process.

Move free_swap_slots() outside of the swapin page fault handler into an async work queue to avoid such long tail latencies.

The batch free of swap slots typically takes on the order of 100us. Such a short time will not have a significant impact on CPU accounting. Notice that the previous swap slot batching behavior already performs a delayed batch free: it waits until 64 entries have accumulated. Adding the async scheduling time does not change the original free timing significantly.

Testing:

Chun-Tse ran some benchmarks on a Chromebook, showing that zram_wait_metrics improves by about 15% with 80% and 95% confidence. I recently ran some experiments on about 1000 Google production machines. They show that swapin latency in the long-tail 100us - 500us bucket drops dramatically.
platform   (100-500us)        (0-100us)
A          1.12% -> 0.36%     98.47% -> 99.22%
B          0.65% -> 0.15%     98.96% -> 99.46%
C          0.61% -> 0.23%     98.96% -> 99.38%

Signed-off-by: Chris Li
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Wei Xu
Cc: Yu Zhao
Cc: Greg Thelen
Cc: Chun-Tse Shao
Cc: Suren Baghdasaryan
Cc: Yosry Ahmed
Cc: Brain Geffon
Cc: Minchan Kim
Cc: Michal Hocko
Cc: Mel Gorman
Cc: Huang Ying
Cc: Nhat Pham
Cc: Johannes Weiner
Cc: Kairui Song
Cc: Zhongkun He
Cc: Kemeng Shi
Cc: Barry Song

remove create_work queue
remove another work queue usage
---
Changes in v2:
- Add description of the impact of the timing change, suggested by Ying.
- Remove create_workqueue() and use schedule_work().
- Link to v1: https://lore.kernel.org/r/20231221-async-free-v1-1-94b277992cb0@kernel.org
---
 include/linux/swap_slots.h |  1 +
 mm/swap_slots.c            | 29 +++++++++++++++++++++--------
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/include/linux/swap_slots.h b/include/linux/swap_slots.h
index 15adfb8c813a..67bc8fa30d63 100644
--- a/include/linux/swap_slots.h
+++ b/include/linux/swap_slots.h
@@ -19,6 +19,7 @@ struct swap_slots_cache {
 	spinlock_t	free_lock; /* protects slots_ret, n_ret */
 	swp_entry_t	*slots_ret;
 	int		n_ret;
+	struct work_struct async_free;
 };
 
 void disable_swap_slots_cache_lock(void);
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 0bec1f705f8e..71d344564e55 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -44,6 +44,7 @@ static DEFINE_MUTEX(swap_slots_cache_mutex);
 static DEFINE_MUTEX(swap_slots_cache_enable_mutex);
 
 static void __drain_swap_slots_cache(unsigned int type);
+static void swapcache_async_free_entries(struct work_struct *data);
 
 #define use_swap_slot_cache (swap_slot_cache_active && swap_slot_cache_enabled)
 #define SLOTS_CACHE 0x1
@@ -149,6 +150,7 @@ static int alloc_swap_slot_cache(unsigned int cpu)
 		spin_lock_init(&cache->free_lock);
 		cache->lock_initialized = true;
 	}
+	INIT_WORK(&cache->async_free, swapcache_async_free_entries);
 	cache->nr = 0;
 	cache->cur = 0;
 	cache->n_ret = 0;
@@ -269,6 +271,20 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache)
 	return cache->nr;
 }
 
+static void swapcache_async_free_entries(struct work_struct *data)
+{
+	struct swap_slots_cache *cache;
+
+	cache = container_of(data, struct swap_slots_cache, async_free);
+	spin_lock_irq(&cache->free_lock);
+	/* Swap slots cache may be deactivated before acquiring lock */
+	if (cache->slots_ret) {
+		swapcache_free_entries(cache->slots_ret, cache->n_ret);
+		cache->n_ret = 0;
+	}
+	spin_unlock_irq(&cache->free_lock);
+}
+
 void free_swap_slot(swp_entry_t entry)
 {
 	struct swap_slots_cache *cache;
@@ -282,17 +298,14 @@ void free_swap_slot(swp_entry_t entry)
 			goto direct_free;
 		}
 		if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE) {
-			/*
-			 * Return slots to global pool.
-			 * The current swap_map value is SWAP_HAS_CACHE.
-			 * Set it to 0 to indicate it is available for
-			 * allocation in global pool
-			 */
-			swapcache_free_entries(cache->slots_ret, cache->n_ret);
-			cache->n_ret = 0;
+			spin_unlock_irq(&cache->free_lock);
+			schedule_work(&cache->async_free);
+			goto direct_free;
 		}
 		cache->slots_ret[cache->n_ret++] = entry;
 		spin_unlock_irq(&cache->free_lock);
+		if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE)
+			schedule_work(&cache->async_free);
 	} else {
 direct_free:
 		swapcache_free_entries(&entry, 1);

---
base-commit: eacce8189e28717da6f44ee492b7404c636ae0de
change-id: 20231216-async-free-bef392015432

Best regards,
-- 
Chris Li
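[Editor's illustration, not part of the patch] For readers outside mm/, the deferral the patch implements (accumulate returned slots in a cache; when the batch is full, punt the expensive batch free to deferred work instead of doing it in the fault path) can be sketched as a userspace analogue. All names below are hypothetical; the kernel side uses schedule_work() on a work_struct embedded in the per-CPU slot cache, and the worker runs asynchronously rather than being invoked explicitly.

```c
#include <stdbool.h>

/* Userspace sketch of the deferred batch-free pattern (hypothetical names). */

#define BATCH 64

struct slot_cache {
	int  slots[BATCH];
	int  n;             /* entries accumulated so far */
	bool work_pending;  /* analogue of a scheduled work item */
};

static int total_freed; /* counts slots released back to the "global pool" */

/* Analogue of swapcache_free_entries(): the expensive batch operation. */
static void batch_free(struct slot_cache *c)
{
	total_freed += c->n;
	c->n = 0;
}

/* Fast path, analogue of free_swap_slot(): when the cache is full it
 * only marks work pending (as schedule_work() would) and falls back to
 * freeing the single entry directly, keeping the caller's latency low. */
static void free_slot(struct slot_cache *c, int entry)
{
	if (c->n >= BATCH) {
		c->work_pending = true; /* defer the batch to the worker */
		total_freed += 1;       /* direct free of this one entry */
		return;
	}
	c->slots[c->n++] = entry;
}

/* Analogue of the workqueue callback running later, off the fault path. */
static void run_pending_work(struct slot_cache *c)
{
	if (c->work_pending) {
		batch_free(c);
		c->work_pending = false;
	}
}

/* Helper: perform nfrees frees, then let the worker run; returns the
 * number of slots actually released. */
static int demo(int nfrees)
{
	struct slot_cache c = {0};
	int i;

	total_freed = 0;
	for (i = 0; i < nfrees; i++)
		free_slot(&c, i);
	run_pending_work(&c);
	return total_freed;
}
```

Until the batch fills, nothing is released (the entries merely sit in the cache, as in the existing code); once it fills, every entry is accounted for, just later and off the hot path.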