From: Chris Li
Date: Thu, 21 Dec 2023 22:25:39 -0800
Subject: [PATCH] mm: swap: async free swap slot cache entries
Message-Id: <20231221-async-free-v1-1-94b277992cb0@kernel.org>
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Wei Xu, Yu Zhao,
    Greg Thelen, Chun-Tse Shao, Suren Baghdasaryan, Yosry Ahmed,
    Brian Geffon, Minchan Kim, Michal Hocko, Mel Gorman, Huang Ying,
    Nhat Pham, Johannes Weiner, Kairui Song, Zhongkun He, Kemeng Shi,
    Barry Song, Chris Li

We discovered that 1% of swap page faults take 100us or longer, while
50% of swap faults finish in under 20us. Further investigation shows
that, in the long-tail case, a large portion of the time is spent in
free_swap_slot(). The per-CPU cache of swap slots is freed in a batch
of 64 entries inside free_swap_slot(). These cache entries were
accumulated by previous page faults, which may be unrelated to the
current process, so doing the batch free in the page fault handler
lengthens the tail latency and penalizes the current process. Move the
batch free out of the swapin page fault handler into an async work
queue to avoid such long tail latencies.

Testing:

Chun-Tse ran some benchmarks on a Chromebook, showing that
zram_wait_metrics improved by about 15% with 80% and 95% confidence.

I recently ran some experiments on about 1000 Google production
machines. They show that swapin latency in the long-tail 100us - 500us
bucket drops dramatically:

platform    (100-500us)          (0-100us)
A           1.12% -> 0.36%       98.47% -> 99.22%
B           0.65% -> 0.15%       98.96% -> 99.46%
C           0.61% -> 0.23%       98.96% -> 99.38%
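As an illustrative aside, not part of the patch: the pattern here is to
stash freed entries in the hot path and hand full batches to a
workqueue. A minimal, self-contained module sketch of that pattern,
with hypothetical names (demo_cache, demo_flush, demo_wq) standing in
for the swap-slots code, might look like this:

  /* Sketch only: hypothetical names, not the mm/swap_slots.c code. */
  #include <linux/module.h>
  #include <linux/workqueue.h>
  #include <linux/spinlock.h>

  #define DEMO_BATCH 64

  struct demo_cache {
  	spinlock_t lock;		/* protects entries[], n */
  	int entries[DEMO_BATCH];
  	int n;
  	struct work_struct flush_work;
  };

  static struct workqueue_struct *demo_wq;
  static struct demo_cache demo;

  /* Runs in workqueue context, off the fault path. */
  static void demo_flush(struct work_struct *work)
  {
  	struct demo_cache *c = container_of(work, struct demo_cache,
  					    flush_work);

  	spin_lock_irq(&c->lock);
  	/* ... batch-free c->entries[0..c->n) here ... */
  	c->n = 0;
  	spin_unlock_irq(&c->lock);
  }

  /* Hot path: stash the entry, defer the expensive batch free. */
  static void __maybe_unused demo_free_one(int entry)
  {
  	spin_lock_irq(&demo.lock);
  	if (demo.n < DEMO_BATCH)
  		demo.entries[demo.n++] = entry;
  	spin_unlock_irq(&demo.lock);

  	if (demo.n >= DEMO_BATCH)
  		queue_work(demo_wq, &demo.flush_work);
  }

  static int __init demo_init(void)
  {
  	spin_lock_init(&demo.lock);
  	INIT_WORK(&demo.flush_work, demo_flush);
  	demo_wq = alloc_workqueue("demo_flush", 0, 0);
  	return demo_wq ? 0 : -ENOMEM;
  }
  module_init(demo_init);

  static void __exit demo_exit(void)
  {
  	destroy_workqueue(demo_wq);
  }
  module_exit(demo_exit);
  MODULE_LICENSE("GPL");

In the patch below, struct swap_slots_cache plays the role of
demo_cache and swapcache_async_free_entries() the role of demo_flush().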
Signed-off-by: Chris Li
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Wei Xu
Cc: Yu Zhao
Cc: Greg Thelen
Cc: Chun-Tse Shao
Cc: Suren Baghdasaryan
Cc: Yosry Ahmed
Cc: Brian Geffon
Cc: Minchan Kim
Cc: Michal Hocko
Cc: Mel Gorman
Cc: Huang Ying
Cc: Nhat Pham
Cc: Johannes Weiner
Cc: Kairui Song
Cc: Zhongkun He
Cc: Kemeng Shi
Cc: Barry Song
---
 include/linux/swap_slots.h |  1 +
 mm/swap_slots.c            | 37 +++++++++++++++++++++++++++++--------
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/include/linux/swap_slots.h b/include/linux/swap_slots.h
index 15adfb8c813a..67bc8fa30d63 100644
--- a/include/linux/swap_slots.h
+++ b/include/linux/swap_slots.h
@@ -19,6 +19,7 @@ struct swap_slots_cache {
 	spinlock_t	free_lock; /* protects slots_ret, n_ret */
 	swp_entry_t	*slots_ret;
 	int		n_ret;
+	struct work_struct async_free;
 };
 
 void disable_swap_slots_cache_lock(void);

diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 0bec1f705f8e..a3b306550732 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -42,8 +42,10 @@ static bool swap_slot_cache_initialized;
 static DEFINE_MUTEX(swap_slots_cache_mutex);
 /* Serialize swap slots cache enable/disable operations */
 static DEFINE_MUTEX(swap_slots_cache_enable_mutex);
+static struct workqueue_struct *swap_free_queue;
 
 static void __drain_swap_slots_cache(unsigned int type);
+static void swapcache_async_free_entries(struct work_struct *data);
 
 #define use_swap_slot_cache (swap_slot_cache_active && swap_slot_cache_enabled)
 #define SLOTS_CACHE 0x1
@@ -149,6 +151,7 @@ static int alloc_swap_slot_cache(unsigned int cpu)
 		spin_lock_init(&cache->free_lock);
 		cache->lock_initialized = true;
 	}
+	INIT_WORK(&cache->async_free, swapcache_async_free_entries);
 	cache->nr = 0;
 	cache->cur = 0;
 	cache->n_ret = 0;
@@ -269,6 +272,20 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache)
 	return cache->nr;
 }
 
+static void swapcache_async_free_entries(struct work_struct *data)
+{
+	struct swap_slots_cache *cache;
+
+	cache = container_of(data, struct swap_slots_cache, async_free);
+	spin_lock_irq(&cache->free_lock);
+	/* Swap slots cache may be deactivated before acquiring lock */
+	if (cache->slots_ret) {
+		swapcache_free_entries(cache->slots_ret, cache->n_ret);
+		cache->n_ret = 0;
+	}
+	spin_unlock_irq(&cache->free_lock);
+}
+
 void free_swap_slot(swp_entry_t entry)
 {
 	struct swap_slots_cache *cache;
@@ -282,17 +299,14 @@ void free_swap_slot(swp_entry_t entry)
 			goto direct_free;
 		}
 		if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE) {
-			/*
-			 * Return slots to global pool.
-			 * The current swap_map value is SWAP_HAS_CACHE.
-			 * Set it to 0 to indicate it is available for
-			 * allocation in global pool
-			 */
-			swapcache_free_entries(cache->slots_ret, cache->n_ret);
-			cache->n_ret = 0;
+			spin_unlock_irq(&cache->free_lock);
+			queue_work(swap_free_queue, &cache->async_free);
+			goto direct_free;
 		}
 		cache->slots_ret[cache->n_ret++] = entry;
 		spin_unlock_irq(&cache->free_lock);
+		if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE)
+			queue_work(swap_free_queue, &cache->async_free);
 	} else {
 direct_free:
 		swapcache_free_entries(&entry, 1);
@@ -348,3 +362,10 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
 	}
 	return entry;
 }
+
+static int __init async_queue_init(void)
+{
+	swap_free_queue = create_workqueue("async swap cache");
+	return 0;
+}
+subsys_initcall(async_queue_init);

---
base-commit: eacce8189e28717da6f44ee492b7404c636ae0de
change-id: 20231216-async-free-bef392015432

Best regards,
-- 
Chris Li