From: Yosry Ahmed
Date: Tue, 20 Jun 2023 12:48:23 -0700
Subject: Re: [PATCH] mm: zswap: multiple zpool support
To: Johannes Weiner
Cc: Sergey Senozhatsky, Minchan Kim, Konrad Rzeszutek Wilk, Andrew Morton,
    Seth Jennings, Dan Streetman, Vitaly Wool, Nhat Pham, Domenico Cerasuolo,
    Yu Zhao,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Wed, Jun 14, 2023 at 1:50 PM Yosry Ahmed wrote:
>
> On Wed, Jun 14, 2023 at 7:59 AM Johannes Weiner wrote:
> >
> > On Tue, Jun 13, 2023 at 01:13:59PM -0700, Yosry Ahmed wrote:
> > > On Mon, Jun 5, 2023 at 6:56 PM Yosry Ahmed wrote:
> > > >
> > > > On Fri, Jun 2, 2023 at 1:24 PM Johannes Weiner wrote:
> > > > > Sorry, I should have been more precise.
> > > > >
> > > > > I'm saying that using NR_CPUS pools, and replacing the hash with
> > > > > smp_processor_id(), would accomplish your goal of pool concurrency.
> > > > > But it would do so with a broadly-used, well-understood scaling
> > > > > factor. We might not need a config option at all.
> > > > >
> > > > > The lock would still be there, but contention would be reduced fairly
> > > > > optimally (barring preemption) for store concurrency at least. Not
> > > > > fully eliminated due to frees and compaction, though, yes.
> > > I thought about this again and had some internal discussions, and I am
> > > more unsure about it. Beyond the memory overhead, having too many
> > > zpools might result in higher fragmentation within the zpools. For
> > > zsmalloc, we do not compact across multiple zpools, for example.
> > >
> > > We have been using a specific number of zpools in our production for
> > > years now; we know it can be tuned to achieve performance gains. OTOH,
> > > percpu zpools (or NR_CPUS pools) seem like too big of a hammer,
> > > probably too many zpools in a lot of cases, and we wouldn't know how
> > > many zpools actually fit our workloads.
> >
> > Is it the same number across your entire fleet and all workloads?
>
> Yes.
> >
> > How large *is* the number in relation to CPUs?
>
> It differs based on the number of CPUs on the machine. We use 32
> zpools on all machines.
>
> >
> > > I see value in allowing the number of zpools to be directly
> > > configurable (it can always be left as 1), and am worried that with
> > > percpu we would be throwing away years of production testing for an
> > > unknown.
> > >
> > > I am obviously biased, but I don't think this adds significant
> > > complexity to the zswap code as-is (or as v2 is, to be precise).
> >
> > I had typed out this long list of reasons why I hate this change, and
> > then deleted it to suggest the per-cpu scaling factor.
> >
> > But to summarize my POV, I think a user-facing config option for this
> > is completely inappropriate. There are no limits, no guidance, no sane
> > default. And it's very selective about micro-optimizing this one lock
> > when there are several locks and data structures of the same scope in
> > the swap path. This isn't a reasonable question to ask people building
> > kernels. It's writing code through the Kconfig file.
>
> It's not just the swap path, it's any contention that happens within the
> zpool between its different operations (map, alloc, compaction, etc).
> My thought was that if a user observes high contention in any of the
> zpool operations, they can increase the number of zpools -- basically
> this should be empirically decided. If unsure, the user can just leave
> it as a single zpool.
> >
> > Data structure scalability should be solved in code, not with config
> > options.
>
> I agree, but until we have a more fundamental architectural solution,
> having multiple zpools to address scalability is a win. We can remove
> the config option later if needed.
> >
> > My vote on the patch as proposed is NAK.
>
> I hear the argument about the config option not being ideal here, but
> NR_CPUS is also not ideal.
>
> How about if we introduce it as a constant in the kernel? We have a
> lot of other constants around the kernel that do not scale with the
> machine size (e.g. SWAP_CLUSTER_MAX). We can start with 32, which is a
> value that we have tested in our data centers for many years now and
> know to work well. We can revisit later if needed.
>
> WDYT?

I sent v3 [1] with the proposed constant instead of a config option;
hopefully this is more acceptable.

[1] https://lore.kernel.org/lkml/20230620194644.3142384-1-yosryahmed@google.com/
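
For illustration, a minimal userspace sketch of the scheme discussed above
(a fixed, compile-time number of pools, with entries spread across them by a
pointer hash) might look like the following. NR_ZPOOLS, pick_pool(), and the
mixing constants are assumptions made for this example, not code from the
actual patch; the per-CPU alternative suggested earlier in the thread would
index by the current CPU instead of hashing the entry.

/* zpool_demo.c: build with `gcc -O2 -o zpool_demo zpool_demo.c` */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_ZPOOLS 32	/* fixed constant (power of two), not a Kconfig knob */

/*
 * Spread entries across pools by hashing the entry pointer, so that
 * concurrent operations mostly take different per-pool locks. The
 * constants below are a splitmix64-style finalizer chosen only to mix
 * the aligned pointer bits; they are illustrative, not canonical.
 */
static unsigned int pick_pool(const void *entry)
{
	uint64_t x = (uint64_t)(uintptr_t)entry;

	x ^= x >> 33;
	x *= 0x9e3779b97f4a7c15ULL;
	x ^= x >> 29;
	return (unsigned int)(x & (NR_ZPOOLS - 1));
}

int main(void)
{
	enum { NR_ENTRIES = 4096 };
	unsigned int histogram[NR_ZPOOLS] = { 0 };
	void *entries[NR_ENTRIES];

	/* Stand-ins for zswap entries; only their addresses matter here. */
	for (int i = 0; i < NR_ENTRIES; i++) {
		entries[i] = malloc(64);
		histogram[pick_pool(entries[i])]++;
	}

	/* Show how evenly the hash distributes entries over the pools. */
	for (unsigned int i = 0; i < NR_ZPOOLS; i++)
		printf("pool %2u: %u entries\n", i, histogram[i]);

	for (int i = 0; i < NR_ENTRIES; i++)
		free(entries[i]);
	return 0;
}

The point of keeping the count a small constant is that contention on any one
pool lock drops by roughly the pool count for hash-distributed operations,
while memory and fragmentation overhead stay bounded instead of scaling with
NR_CPUS.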