In-Reply-To: <20231221-async-free-v1-1-94b277992cb0@kernel.org>
From: Nhat Pham
Date: Fri, 22 Dec 2023 17:44:19 -0800
Subject: Re: [PATCH] mm: swap: async free swap slot cache entries
To: Chris Li
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Wei Xu, Yu Zhao, Greg Thelen, Chun-Tse Shao, Suren Baghdasaryan,
    Yosry Ahmed, Brain Geffon, Minchan Kim, Michal Hocko, Mel Gorman,
    Huang Ying, Johannes Weiner, Kairui Song, Zhongkun He, Kemeng Shi,
    Barry Song

On Thu, Dec 21, 2023 at 10:25 PM Chris Li wrote:
>
> We discovered that 1% swap page fault is 100us+ while 50% of
> the swap fault is under 20us.
>
> Further investigation show that a large portion of the time
> spent in the free_swap_slots() function for the long tail case.
>
> The percpu cache of swap slots is freed in a batch of 64 entries
> inside free_swap_slots(). These cache entries are accumulated
> from previous page faults, which may not be related to the current
> process.
>
> Doing the batch free in the page fault handler causes longer
> tail latencies and penalizes the current process.
>
> Move free_swap_slots() outside of the swapin page fault handler into an
> async work queue to avoid such long tail latencies.
>
> Testing:
>
> Chun-Tse did some benchmark in chromebook, showing that
> zram_wait_metrics improve about 15% with 80% and 95% confidence.
>
> I recently ran some experiments on about 1000 Google production
> machines. It shows swapin latency drops in the long tail
> 100us - 500us bucket dramatically.
>
> platform        (100-500us)             (0-100us)
> A               1.12% -> 0.36%          98.47% -> 99.22%
> B               0.65% -> 0.15%          98.96% -> 99.46%
> C               0.61% -> 0.23%          98.96% -> 99.38%

Nice! Are these values for zram as well, or ordinary (SSD?) swap? I
imagine it will matter less for swap, right?

>
> Signed-off-by: Chris Li
> To: Andrew Morton
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: Wei Xu
> Cc: Yu Zhao
> Cc: Greg Thelen
> Cc: Chun-Tse Shao
> Cc: Suren Baghdasaryan
> Cc: Yosry Ahmed
> Cc: Brain Geffon
> Cc: Minchan Kim
> Cc: Michal Hocko
> Cc: Mel Gorman
> Cc: Huang Ying
> Cc: Nhat Pham
> Cc: Johannes Weiner
> Cc: Kairui Song
> Cc: Zhongkun He
> Cc: Kemeng Shi
> Cc: Barry Song
> ---
>  include/linux/swap_slots.h |  1 +
>  mm/swap_slots.c            | 37 +++++++++++++++++++++++++++++--------
>  2 files changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/swap_slots.h b/include/linux/swap_slots.h
> index 15adfb8c813a..67bc8fa30d63 100644
> --- a/include/linux/swap_slots.h
> +++ b/include/linux/swap_slots.h
> @@ -19,6 +19,7 @@ struct swap_slots_cache {
>         spinlock_t      free_lock;  /* protects slots_ret, n_ret */
>         swp_entry_t     *slots_ret;
>         int             n_ret;
> +       struct work_struct async_free;
>  };
>
>  void disable_swap_slots_cache_lock(void);
> diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> index 0bec1f705f8e..a3b306550732 100644
> --- a/mm/swap_slots.c
> +++ b/mm/swap_slots.c
> @@ -42,8 +42,10 @@ static bool swap_slot_cache_initialized;
>  static DEFINE_MUTEX(swap_slots_cache_mutex);
>  /* Serialize swap slots cache enable/disable operations */
>  static DEFINE_MUTEX(swap_slots_cache_enable_mutex);
> +static struct workqueue_struct *swap_free_queue;
>
>  static void __drain_swap_slots_cache(unsigned int type);
> +static void swapcache_async_free_entries(struct work_struct *data);
>
>  #define use_swap_slot_cache (swap_slot_cache_active && swap_slot_cache_enabled)
>  #define SLOTS_CACHE 0x1
> @@ -149,6 +151,7 @@ static int alloc_swap_slot_cache(unsigned int cpu)
>                 spin_lock_init(&cache->free_lock);
>                 cache->lock_initialized = true;
>         }
> +       INIT_WORK(&cache->async_free, swapcache_async_free_entries);
>         cache->nr = 0;
>         cache->cur = 0;
>         cache->n_ret = 0;
> @@ -269,6 +272,20 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache)
>         return cache->nr;
>  }
>
> +static void swapcache_async_free_entries(struct work_struct *data)
> +{
> +       struct swap_slots_cache *cache;
> +
> +       cache = container_of(data, struct swap_slots_cache, async_free);
> +       spin_lock_irq(&cache->free_lock);
> +       /* Swap slots cache may be deactivated before acquiring lock */
> +       if (cache->slots_ret) {
> +               swapcache_free_entries(cache->slots_ret, cache->n_ret);
> +               cache->n_ret = 0;
> +       }
> +       spin_unlock_irq(&cache->free_lock);
> +}
> +
>  void free_swap_slot(swp_entry_t entry)
>  {
>         struct swap_slots_cache *cache;
> @@ -282,17 +299,14 @@ void free_swap_slot(swp_entry_t entry)
>                         goto direct_free;
>                 }
>                 if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE) {
> -                       /*
> -                        * Return slots to global pool.
> -                        * The current swap_map value is SWAP_HAS_CACHE.
> -                        * Set it to 0 to indicate it is available for
> -                        * allocation in global pool
> -                        */
> -                       swapcache_free_entries(cache->slots_ret, cache->n_ret);
> -                       cache->n_ret = 0;
> +                       spin_unlock_irq(&cache->free_lock);
> +                       queue_work(swap_free_queue, &cache->async_free);
> +                       goto direct_free;
>                 }
>                 cache->slots_ret[cache->n_ret++] = entry;
>                 spin_unlock_irq(&cache->free_lock);
> +               if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE)
> +                       queue_work(swap_free_queue, &cache->async_free);
>         } else {
>  direct_free:
>                 swapcache_free_entries(&entry, 1);
> @@ -348,3 +362,10 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
>         }
>         return entry;
>  }
> +
> +static int __init async_queue_init(void)
> +{
> +       swap_free_queue = create_workqueue("async swap cache");

nit(?): isn't create_workqueue() deprecated? from:

https://www.kernel.org/doc/html/latest/core-api/workqueue.html#application-programming-interface-api

I think there's a zswap patch proposing fixing that on the zswap side.

> +       return 0;
> +}
> +subsys_initcall(async_queue_init);
>
> ---
> base-commit: eacce8189e28717da6f44ee492b7404c636ae0de
> change-id: 20231216-async-free-bef392015432
>
> Best regards,
> --
> Chris Li
>
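
For anyone following the thread, here is a rough userspace model of the
deferral described in the commit message. It is only a sketch, not the
kernel code: struct slot_cache, free_slot(), async_drain() and the two
counters are made-up names, there is no locking, and the "work item" is
a plain flag that the caller runs by hand instead of a real workqueue.
CACHE_SIZE is 64 to match the batch size quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SIZE 64 /* batch size quoted in the commit message */

/* Hypothetical stand-in for the return side of struct swap_slots_cache. */
struct slot_cache {
    unsigned long slots_ret[CACHE_SIZE];
    int n_ret;
    bool work_pending;   /* models a queued work item */
    size_t freed_inline; /* entries released on the caller's path */
    size_t freed_async;  /* entries released by the deferred drain */
};

/* Models the work function: release the whole batch off the hot path. */
void async_drain(struct slot_cache *c)
{
    if (!c->work_pending)
        return;
    c->freed_async += (size_t)c->n_ret;
    c->n_ret = 0;
    c->work_pending = false;
}

/* Models free_swap_slot() after the patch (single-threaded, no locks). */
void free_slot(struct slot_cache *c, unsigned long entry)
{
    assert(c->n_ret <= CACHE_SIZE);

    if (c->n_ret >= CACHE_SIZE) {
        /*
         * Cache is full and the drain has not run yet: flag the work
         * and release only this one entry directly, so the caller
         * never pays for the full batch.
         */
        c->work_pending = true;
        c->freed_inline += 1;
        return;
    }

    c->slots_ret[c->n_ret++] = entry;

    /* Cache just became full: flag the deferred drain. */
    if (c->n_ret >= CACHE_SIZE)
        c->work_pending = true;
}
```

Feeding 70 entries through free_slot() on a zeroed cache leaves 64
entries cached with the work flagged and 6 released directly on the
caller's path. Calling async_drain() afterwards releases the 64 as one
batch, which is the cost the patch moves out of the fault path.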