From: Yosry Ahmed
Date: Mon, 5 Jun 2023 18:56:35 -0700
Subject: Re: [PATCH] mm: zswap: multiple zpool support
To: Johannes Weiner
Cc: Sergey Senozhatsky, Minchan Kim, Konrad Rzeszutek Wilk, Andrew Morton,
    Seth Jennings, Dan Streetman, Vitaly Wool, Nhat Pham, Domenico Cerasuolo,
    Yu Zhao, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20230602202453.GA218605@cmpxchg.org>
References: <20230531022911.1168524-1-yosryahmed@google.com>
 <20230601155825.GF102494@cmpxchg.org>
 <20230602164942.GA215355@cmpxchg.org>
 <20230602183410.GB215355@cmpxchg.org>
 <20230602202453.GA218605@cmpxchg.org>

On Fri, Jun 2, 2023 at 1:24 PM Johannes Weiner wrote:
>
> On Fri, Jun 02, 2023 at 12:14:28PM -0700, Yosry Ahmed wrote:
> > On Fri, Jun 2, 2023 at 11:34 AM Johannes Weiner wrote:
> > >
> > > On Fri, Jun 02, 2023 at 09:59:20AM -0700, Yosry Ahmed wrote:
> > > > On Fri, Jun 2, 2023 at 9:49 AM Johannes Weiner wrote:
> > > > > Again, what about the zswap_tree.lock and swap_info_struct.lock?
> > > > > They're the same scope unless you use multiple swap files. Would it
> > > > > make sense to tie pools to trees, so that using multiple swapfiles for
> > > > > concurrency purposes also implies this optimization?
> > > >
> > > > Yeah, using multiple swapfiles helps with those locks, but it doesn't
> > > > help with the zpool lock.
> > > >
> > > > I am reluctant to take this path because I am trying to get rid of
> > > > zswap's dependency on swapfiles to begin with, and have it act as its
> > > > own standalone swapping backend. If I am successful, then having one
> > > > zpool per zswap_tree is just a temporary fix.
> > >
> > > What about making the pools per-cpu?
> > >
> > > This would scale nicely with the machine size. And we commonly deal
> > > with for_each_cpu() loops and per-cpu data structures, so have good
> > > developer intuition about what's reasonable to squeeze into those.
> > >
> > > It would eliminate the lock contention, for everybody, right away, and
> > > without asking questions.
> > >
> > > It would open the door to all kinds of locking optimizations on top.
> > The page can get swapped out on one cpu and swapped in on another, no?
> >
> > We will need to store which zpool the page is stored in in its zswap
> > entry, and potentially grab percpu locks from other cpus in the swap
> > in path. The lock contention would probably be less, but certainly not
> > eliminated.
> >
> > Did I misunderstand?
>
> Sorry, I should have been more precise.
>
> I'm saying that using NR_CPUS pools, and replacing the hash with
> smp_processor_id(), would accomplish your goal of pool concurrency.
> But it would do so with a broadly-used, well-understood scaling
> factor. We might not need a config option at all.
>
> The lock would still be there, but contention would be reduced fairly
> optimally (barring preemption) for store concurrency at least. Not
> fully eliminated due to frees and compaction, though, yes.

Yeah, I think we can do that. I looked at the size of the zsmalloc pool
as an example, and it seems to be less than 1K, so having one pool per
cpu seems okay.
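Just to make that concrete, here is a minimal sketch of the per-cpu
selection (illustrative only, not the actual patch; the zpools[] array
and the helper name are assumptions, not existing zswap code):

/*
 * Illustrative sketch only -- the field and helper names are made up
 * for the example, and the real struct zswap_pool has more fields.
 */
#include <linux/smp.h>
#include <linux/zpool.h>

struct zswap_pool {
	/* one zpool per possible CPU instead of a small hashed array */
	struct zpool *zpools[NR_CPUS];
	/* ... rest of the real struct zswap_pool ... */
};

static struct zpool *zswap_zpool_for_store(struct zswap_pool *pool)
{
	/*
	 * raw_smp_processor_id() instead of hashing the entry: any zpool
	 * can hold any entry, so migrating right after this is harmless;
	 * the point is only to spread store-side contention across CPUs.
	 */
	return pool->zpools[raw_smp_processor_id()];
}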
There are a few things that we will need to do:

- Rework zswap_update_total_size(). We don't want to loop through all
  cpus on each load/store. We can be smarter about it and inc/dec the
  total zswap pool size each time we allocate or free a page in the
  driver. This might need some plumbing from the drivers to zswap (or
  passing a callback from zswap to the drivers).

- Update zsmalloc such that all pools share kmem caches, instead of
  creating two kmem caches for zsmalloc per cpu. This was a follow-up
  I had in mind for multiple zpools support anyway, but I guess it's
  more significant if we have NR_CPUS pools.

I was nervous about increasing the size of struct zswap_entry to store
the cpu/zpool where the entry resides, but I realized we can replace
the pointer to zswap_pool in struct zswap_entry with a pointer to
zpool, and add a zswap_pool pointer in struct zpool. This will actually
trim down the common "entry->pool->zpool" to just "entry->zpool", and
then we can replace any "entry->pool" with "entry->zpool->pool".

@Yu Zhao, any thoughts on this? The multiple zpools support was
initially your idea (and you did the initial implementation) -- so your
input is very valuable here.

>
> I'm not proposing more than that at this point. I only wrote the last
> line because already having per-cpu data structures might help with
> fast path optimizations down the line, if contention is still an
> issue. But unlikely. So it's not so important. Let's forget it.
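For the zswap_entry change above, a rough sketch of the pointer shuffle
(illustrative only; the real structs live in mm/zswap.c and zpool.c and
have more fields than shown, and the back-pointer in struct zpool is
the proposal here, not an existing field):

struct zswap_pool;	/* defined in mm/zswap.c */

struct zpool {
	/* ... existing zpool fields ... */
	struct zswap_pool *pool;	/* proposed back-pointer to the owning zswap_pool */
};

struct zswap_entry {
	/* ... rbnode, offset, refcount, length, ... */
	struct zpool *zpool;	/* was: struct zswap_pool *pool */
};

/*
 * "entry->pool->zpool" then becomes "entry->zpool", and code that only
 * needs the zswap_pool uses "entry->zpool->pool" instead of
 * "entry->pool".
 */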