From: Reinette Chatre <reinette.chatre@intel.com>
To: dave.hansen@linux.intel.com, jarkko@kernel.org, tglx@linutronix.de, bp@alien8.de, mingo@redhat.com, linux-sgx@vger.kernel.org, x86@kernel.org
Cc: seanjc@google.com, tony.luck@intel.com, hpa@zytor.com, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: [PATCH] x86/sgx: Fix free page accounting
Date: Thu, 4 Nov 2021 11:28:54 -0700
Message-Id: <373992d869cd356ce9e9afe43ef4934b70d604fd.1636049678.git.reinette.chatre@intel.com>
The SGX driver maintains a single global free page counter,
sgx_nr_free_pages, that reflects the number of free pages available
across all NUMA nodes. Correspondingly, a list of free pages is
associated with each NUMA node, and sgx_nr_free_pages is updated every
time a page is added to or removed from any of the free page lists.

sgx_nr_free_pages is used primarily by the reclaimer, which runs when
the total number of free pages falls below a watermark, ensuring that
some free pages are always available to, for example, support
efficient page faults.

sgx_nr_free_pages is accessed and modified from a few places, so it is
essential that these accesses are done safely, but this is not the
case. sgx_nr_free_pages is sometimes accessed without any protection,
and when it is protected this is done inconsistently, with whichever
of the per-NUMA-node spin locks happens to be held.

The consequence is that the value of sgx_nr_free_pages may not
accurately reflect the actual number of free pages on the system,
impacting the availability of free pages in many flows. The
problematic scenario is the reclaimer never running because it
believes there to be sufficient free pages, while every attempt to
allocate a page fails because none are actually available. The worst
case observed was a user space hang caused by repeated page faults
with no free pages ever made available.

Change the global free page counter to an atomic type that ensures
simultaneous updates are done safely. While doing so, move the update
of the variable outside of the spin lock critical section, to which it
does not belong.

Cc: stable@vger.kernel.org
Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
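To make the race concrete, below is a minimal user-space analogue
(illustration only, not part of the patch; the file name and all
identifiers are hypothetical, and it uses C11 atomics with pthreads in
place of the kernel's atomic_long_t). Two threads increment a shared
counter while each holds a *different* lock, mirroring
sgx_nr_free_pages being updated under whichever per-node spin lock was
taken; the plain increment loses updates, the atomic one does not.
Build with something like: cc -O2 -pthread counter_race.c

/*
 * counter_race.c - user-space sketch of the sgx_nr_free_pages race.
 * Hypothetical demo code, not kernel code.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define ITERS 1000000

static long plain_counter;          /* racy, like the old sgx_nr_free_pages */
static atomic_long atomic_counter;  /* safe, like the patched counter */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *bump(void *arg)
{
	pthread_mutex_t *lock = arg;  /* each thread uses its own lock */

	for (int i = 0; i < ITERS; i++) {
		pthread_mutex_lock(lock);
		/* Not protected: the other thread holds a different lock. */
		plain_counter++;
		pthread_mutex_unlock(lock);

		/* Atomic update needs no lock at all. */
		atomic_fetch_add(&atomic_counter, 1);
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, bump, &lock_a);
	pthread_create(&t2, NULL, bump, &lock_b);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* plain typically prints less than 2000000; atomic always prints 2000000 */
	printf("plain:  %ld\n", plain_counter);
	printf("atomic: %ld\n", atomic_load(&atomic_counter));
	return 0;
}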
 arch/x86/kernel/cpu/sgx/main.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..8558d7d5f3e7 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -28,8 +28,7 @@ static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
 static LIST_HEAD(sgx_active_page_list);
 static DEFINE_SPINLOCK(sgx_reclaimer_lock);
 
-/* The free page list lock protected variables prepend the lock. */
-static unsigned long sgx_nr_free_pages;
+atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
 
 /* Nodes with one or more EPC sections. */
 static nodemask_t sgx_numa_mask;
@@ -403,14 +402,15 @@ static void sgx_reclaim_pages(void)
 
 		spin_lock(&node->lock);
 		list_add_tail(&epc_page->list, &node->free_page_list);
-		sgx_nr_free_pages++;
 		spin_unlock(&node->lock);
+		atomic_long_inc(&sgx_nr_free_pages);
 	}
 }
 
 static bool sgx_should_reclaim(unsigned long watermark)
 {
-	return sgx_nr_free_pages < watermark && !list_empty(&sgx_active_page_list);
+	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
+	       !list_empty(&sgx_active_page_list);
 }
 
 static int ksgxd(void *p)
@@ -471,9 +471,9 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
-	sgx_nr_free_pages--;
 	spin_unlock(&node->lock);
+	atomic_long_dec(&sgx_nr_free_pages);
 
 	return page;
 }
@@ -625,9 +625,9 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 	list_add_tail(&page->list, &node->free_page_list);
-	sgx_nr_free_pages++;
 	spin_unlock(&node->lock);
+	atomic_long_inc(&sgx_nr_free_pages);
 }
 
 static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
-- 
2.25.1
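One design note on the hunks above (my reading, not spelled out in the
changelog beyond its last paragraph): sgx_nr_free_pages only feeds the
watermark heuristic in sgx_should_reclaim(), so a short window in
which the counter lags the per-node free lists is harmless. Performing
atomic_long_inc()/atomic_long_dec() outside the spin lock therefore
keeps the per-node critical sections minimal at the cost of that
transient, tolerable skew.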