Subject: Re: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
To: Jarkko Sakkinen, linux-sgx@vger.kernel.org
Cc: haitao.huang@intel.com, dan.j.williams@intel.com, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H. Peter Anvin", Dave Hansen, linux-kernel@vger.kernel.org
References: <20210221020631.171404-1-jarkko@kernel.org>
From: Dave Hansen
Message-ID: <7acc3c1c-373e-cfee-e838-2af170e87d98@intel.com>
Date: Sun, 21 Feb 2021 16:54:33 -0800
In-Reply-To: <20210221020631.171404-1-jarkko@kernel.org>

> +/* Nodes with one or more EPC sections. */
> +static nodemask_t sgx_numa_mask;

I'd also add that this is for optimization only.

> +/* Array of lists of EPC sections for each NUMA node. */
> +struct list_head *sgx_numa_nodes;

I'd much prefer:

/*
 * Array with one list_head for each possible NUMA node.  Each
 * list contains all the sgx_epc_section's which are on that
 * node.
 */

Otherwise, it's hard to imagine what this structure looks like.

>  /*
>   * These variables are part of the state of the reclaimer, and must be accessed
>   * with sgx_reclaimer_lock acquired.
> @@ -473,6 +479,26 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
>  	return page;
>  }
>
> +static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> +{
> +	struct sgx_epc_section *section;
> +	struct sgx_epc_page *page;
> +
> +	if (WARN_ON_ONCE(nid < 0 || nid >= MAX_NUMNODES))
> +		return NULL;
> +
> +	if (!node_isset(nid, sgx_numa_mask))
> +		return NULL;
> +
> +	list_for_each_entry(section, &sgx_numa_nodes[nid], section_list) {
> +		page = __sgx_alloc_epc_page_from_section(section);
> +		if (page)
> +			return page;
> +	}
> +
> +	return NULL;
> +}
> +
>  /**
>   * __sgx_alloc_epc_page() - Allocate an EPC page
>   *
> @@ -485,13 +511,17 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
>   */
>  struct sgx_epc_page *__sgx_alloc_epc_page(void)
>  {
> +	int current_nid = numa_node_id();
>  	struct sgx_epc_section *section;
>  	struct sgx_epc_page *page;
>  	int i;
>
> +	page = __sgx_alloc_epc_page_from_node(current_nid);
> +	if (page)
> +		return page;

Comments, please.

	/* Try to allocate EPC from the current node, first: */

then:

	/* Search all EPC sections, ignoring locality: */

>  	for (i = 0; i < sgx_nr_epc_sections; i++) {
>  		section = &sgx_epc_sections[i];
> -
>  		page = __sgx_alloc_epc_page_from_section(section);
>  		if (page)
>  			return page;

This still has the problem that it exerts too much pressure on the
low-numbered sgx_epc_sections[].  If a node's sections are full, it
always tries to go after sgx_epc_sections[0].

It can be in another patch, but I think the *minimal* thing we can do
here for a NUMA allocator is to try to at least balance the allocations.

Instead of having a for-each-section loop, I'd make it for-each-node ->
for-each-section.
Something like:

	for (i = 0; i < num_possible_nodes(); i++) {
		nid = (numa_node_id() + i) % num_possible_nodes();

		if (!node_isset(nid, sgx_numa_mask))
			continue;

		list_for_each_entry(section, &sgx_numa_nodes[nid],
				    section_list) {
			page = __sgx_alloc_epc_page_from_section(section);
			if (page)
				return page;
		}
	}

Then you have a single loop instead of a "try local, then fall back".

Also, that "node++" thing might be able to use next_online_node().

> @@ -665,8 +695,12 @@ static bool __init sgx_page_cache_init(void)
>  {
>  	u32 eax, ebx, ecx, edx, type;
>  	u64 pa, size;
> +	int nid;
>  	int i;
>
> +	nodes_clear(sgx_numa_mask);
> +	sgx_numa_nodes = kmalloc_array(MAX_NUMNODES, sizeof(*sgx_numa_nodes), GFP_KERNEL);

MAX_NUMNODES is always the compile-time maximum.  That's 4k, IIRC.
num_possible_nodes() might be as small as 1 if NUMA is off.

>  	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
>  		cpuid_count(SGX_CPUID, i + SGX_CPUID_EPC, &eax, &ebx, &ecx, &edx);
>
> @@ -690,6 +724,22 @@ static bool __init sgx_page_cache_init(void)
>  		}
>
>  		sgx_nr_epc_sections++;
> +
> +		nid = numa_map_to_online_node(phys_to_target_node(pa));
> +
> +		if (nid == NUMA_NO_NODE) {
> +			pr_err(FW_BUG "unable to map EPC section %d to online node.\n", nid);
> +			nid = 0;

Could we dump out the physical address there?  I think that's even more
informative than a section number.

> +		} else if (WARN_ON_ONCE(nid < 0 || nid >= MAX_NUMNODES)) {
> +			nid = 0;
> +		}

I'm not sure we really need to check for these.  If we're worried about
the firmware returning these, I'd expect numa_map_to_online_node() to
sanity check them for us.

> +		if (!node_isset(nid, sgx_numa_mask)) {
> +			INIT_LIST_HEAD(&sgx_numa_nodes[nid]);
> +			node_set(nid, sgx_numa_mask);
> +		}
> +
> +		list_add_tail(&sgx_epc_sections[i].section_list, &sgx_numa_nodes[nid]);
>  	}
>
>  	if (!sgx_nr_epc_sections) {
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 5fa42d143feb..4bc31bc4bacf 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -45,6 +45,7 @@ struct sgx_epc_section {
>  	spinlock_t lock;
>  	struct list_head page_list;
>  	unsigned long free_cnt;
> +	struct list_head section_list;

Maybe name this numa_section_list.
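
For illustration only, here is a rough userspace model of the
for-each-node -> for-each-section search discussed earlier in this
review.  It is a sketch under assumptions, not kernel code:
model_section, model_nodes[], model_add_section() and model_alloc_page()
are made-up names, node IDs are assumed to be dense
(0..MODEL_NR_NODES-1), there is no locking, and an empty per-node list
stands in for the sgx_numa_mask check.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_NR_NODES 4

struct model_section {
	int nid;			/* node this section lives on */
	unsigned long free_cnt;		/* free pages left in the section */
	struct model_section *next;	/* next section on the same node */
};

/* One list of sections per node; an empty list means "no EPC on this node". */
static struct model_section *model_nodes[MODEL_NR_NODES];

/* Put a section with 'pages' free pages at the head of its node's list. */
static void model_add_section(struct model_section *s, int nid,
			      unsigned long pages)
{
	assert(nid >= 0 && nid < MODEL_NR_NODES);

	s->nid = nid;
	s->free_cnt = pages;
	s->next = model_nodes[nid];
	model_nodes[nid] = s;
}

/*
 * Take one page, starting the search at start_nid and wrapping around
 * all nodes.  Returns the node the page came from, or -1 if every
 * section is exhausted.
 */
static int model_alloc_page(int start_nid)
{
	int i;

	assert(start_nid >= 0 && start_nid < MODEL_NR_NODES);

	for (i = 0; i < MODEL_NR_NODES; i++) {
		int nid = (start_nid + i) % MODEL_NR_NODES;
		struct model_section *s;

		for (s = model_nodes[nid]; s; s = s->next) {
			if (s->free_cnt) {
				s->free_cnt--;
				return nid;
			}
		}
	}

	return -1;
}
```

The point of the model is that the fallback starts at the caller's node
and walks outward by node number, rather than always restarting at
section 0.  A caller on node 1 with no local EPC lands on node 2 before
it ever touches node 0.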