Subject: Re: [RFC 0/6] mm: improve page allocator scalability via splitting zones
From: Dave Hansen
Date: Thu, 11 May 2023 07:23:51 -0700
To: Huang Ying, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Arjan Van De Ven, Andrew Morton,
 Mel Gorman, Vlastimil Babka, David Hildenbrand, Johannes Weiner,
 Dave Hansen, Michal Hocko, Pavel Tatashin, Matthew Wilcox
References: <20230511065607.37407-1-ying.huang@intel.com>
In-Reply-To: <20230511065607.37407-1-ying.huang@intel.com>
On 5/10/23 23:56, Huang Ying wrote:
> To improve the scalability of the page allocation, in this series, we
> will create one zone instance for each about 256 GB memory of a zone
> type generally. That is, one large zone type will be split into
> multiple zone instances.

A few anecdotes for why I think _some_ people will like this:

Some Intel hardware has a "RAM" caching mechanism. It either caches
DRAM in High-Bandwidth Memory or Persistent Memory in DRAM. This
cache is direct-mapped and can have lots of collisions. One way to
prevent collisions is to chop up the physical memory into cache-sized
zones and let users choose to allocate from one zone. That fixes the
conflicts. (There's a toy model of the collision math at the end of
this mail.)

Some other Intel hardware has ways to chop a NUMA node representing a
single socket into slices. Usually one slice gets a memory controller
and its closest cores. Intel calls these approaches Cluster on Die or
Sub-NUMA Clustering, and users can select them from the BIOS.

In both of these cases, users have reported scalability improvements.
We've gone as far as to suggest the socket-splitting options to folks
today who are hitting zone scalability issues on that hardware.

That said, those _same_ users sometimes come back and say something
along the lines of: "So... we've got this app that allocates a big
hunk of memory. It's going slower than before." They're filling up
one of the chopped-up zones, hitting _some_ kind of undesirable
reclaim behavior, and they want their humpty-dumpty zones put back
together again ... without hurting scalability. Some people will
never be happy. :)

Anyway, _if_ you do this, you might also consider being able to
dynamically adjust a CPU's zonelists somehow (second sketch below).
That would relieve pressure on one zone for those uneven allocations.
That wasn't an option in the two cases above because users had
ulterior motives for sticking inside a single zone. But, in your
case, the zones really do have equivalent performance.
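
A footnote on the direct-mapped cache point above, since it's not
obvious why cache-sized zones help. This is a toy model, not the real
hardware hash -- the cache size, line size, and plain-modulo indexing
below are all made-up assumptions for illustration:

	#include <stdio.h>
	#include <stdint.h>

	#define CACHE_SIZE	(16ULL << 30)	/* assumed 16 GB direct-mapped cache */
	#define LINE_SIZE	64ULL		/* assumed 64-byte cache lines */
	#define NUM_LINES	(CACHE_SIZE / LINE_SIZE)

	/* Direct-mapped: each physical address gets exactly one slot. */
	static uint64_t cache_slot(uint64_t paddr)
	{
		return (paddr / LINE_SIZE) % NUM_LINES;
	}

	int main(void)
	{
		uint64_t a = 0x1000;
		uint64_t b = a + CACHE_SIZE;	/* exactly one cache-size apart */

		/* Both land in the same slot: a and b evict each other. */
		printf("slot(a)=%llu slot(b)=%llu\n",
		       (unsigned long long)cache_slot(a),
		       (unsigned long long)cache_slot(b));
		return 0;
	}

Any two physical addresses a multiple of CACHE_SIZE apart collide.
But within one contiguous, cache-sized chunk of physical memory,
every line maps to a distinct slot, so allocations confined to one
such zone can never evict each other. That's the whole trick.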
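
And to be clearer about what I mean by "dynamically adjust a CPU's
zonelists": today the fallback order is essentially fixed at boot
(see build_zonelists()). The sketch below is purely conceptual, not
kernel code -- the struct, the sort, and the most-free-first policy
are all assumptions for illustration:

	/* Conceptual model of a zonelist: fallback order for allocation. */
	struct zone_ref {
		int	zone_id;
		long	free_pages;
	};

	/*
	 * Re-sort so the least-loaded zone instance is tried first.
	 * A real implementation would need to be cheap and safe against
	 * concurrent allocators; the insertion sort here is only to
	 * show the idea.
	 */
	static void rebalance_zonelist(struct zone_ref *zl, int n)
	{
		for (int i = 1; i < n; i++) {
			struct zone_ref key = zl[i];
			int j = i - 1;

			while (j >= 0 && zl[j].free_pages < key.free_pages) {
				zl[j + 1] = zl[j];
				j--;
			}
			zl[j + 1] = key;
		}
	}

Something like that, run periodically or on zone-pressure events,
would let an uneven allocation spill into sibling zone instances
instead of pounding one zone into reclaim.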