To: "Jarkko Sakkinen", dave.hansen@linux.intel.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, "Thomas Gleixner", "Ingo Molnar", "Borislav Petkov", x86@kernel.org, "H. Peter Anvin", "Dave Hansen"
Cc: kai.huang@intel.com, reinette.chatre@intel.com, kristen@linux.intel.com, seanjc@google.com, stable@vger.kernel.org
Subject: Re: [PATCH] x86/sgx: fix a NULL pointer
References: <20230717202938.94989-1-haitao.huang@linux.intel.com>
Date: Tue, 18 Jul 2023 13:11:36 -0500
From: "Haitao Huang"
Organization: Intel

On Tue, 18 Jul 2023 09:27:49 -0500, Dave Hansen wrote:

> On 7/17/23 13:29, Haitao Huang wrote:
>> Under heavy load, the SGX EPC reclaimers (current ksgxd or future EPC
>> cgroup worker) may reclaim the SECS EPC page for an enclave and set
>> encl->secs.epc_page to NULL. But the SECS EPC page is used for EAUG in
>> the SGX #PF handler without checking for NULL and reloading.
>>
>> Fix this by checking if SECS is loaded before EAUG and load it if it was
>> reclaimed.
>
> It would be nice to see a _bit_ more theory of the bug in here.
>
> What is an SECS page and why is it special in a reclaim context? Why is
> this so hard to hit? What led you to discover this issue now? What is
> EAUG?

Let me know if this clarifies things.

The SECS page holds the global state of an enclave, and all reclaimable
pages tracked by the SGX EPC reclaimer (ksgxd) are considered 'child'
pages of the SECS page corresponding to that enclave. The reclaimer only
reclaims the SECS page once all of its children have been reclaimed. That
can happen on a system under high EPC pressure, where multiple large
enclaves demand far more EPC pages than are physically available.

In a rare case, the reclaimer may reclaim all EPC pages of an enclave and
its SECS page, setting encl->secs.epc_page to NULL, right before the #PF
handler gets the chance to handle a #PF for that enclave. If that #PF
happens to require the kernel to invoke the EAUG instruction to add a new
EPC page to the enclave, a NULL pointer dereference results, because the
current code does not check whether encl->secs.epc_page is NULL before
using it.

The bug is easier to reproduce with the EPC cgroup implementation when a
low EPC limit is set for a group of enclave-hosting processes. Without
the EPC cgroup, it is hard to get the reclaimer to reclaim all child
pages of an SECS page. It would also require a machine configured with
large RAM relative to EPC, so that the OOM killer is not triggered before
this happens.

Thanks
Haitao
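
For illustration only, the race described above can be modelled in plain
userspace C. This is a minimal sketch, not the kernel code or the patch
itself: struct model_encl and the model_*() helpers are hypothetical
names invented for this example, and malloc()/free() merely stand in for
loading and reclaiming an EPC page. It shows the reclaimer dropping the
SECS page only after the last child page is gone, and a fault path that
checks for a missing SECS page and reloads it before the step standing
in for EAUG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_epc_page {
    int id;
};

struct model_encl {
    struct model_epc_page *secs_page; /* NULL once the SECS page is reclaimed */
    int nr_child_pages;               /* child pages currently resident */
};

/*
 * Model of the reclaimer under heavy pressure: child pages go first,
 * and the SECS page is reclaimed only after no children remain.
 */
static void model_reclaim_all(struct model_encl *encl)
{
    while (encl->nr_child_pages > 0)
        encl->nr_child_pages--;

    free(encl->secs_page);
    encl->secs_page = NULL;
}

/* Reload the SECS page if it was reclaimed; returns NULL on failure. */
static struct model_epc_page *model_load_secs(struct model_encl *encl)
{
    if (!encl->secs_page) {
        encl->secs_page = malloc(sizeof(*encl->secs_page));
        if (encl->secs_page)
            encl->secs_page->id = 0;
    }
    return encl->secs_page;
}

/*
 * Model of the fault path that needs the SECS page to add a page.
 * Without the check-and-reload below, the access to secs->id would
 * dereference NULL after model_reclaim_all() has run.
 * Returns 0 on success, -1 if the SECS page could not be reloaded.
 */
static int model_eaug_fault(struct model_encl *encl)
{
    struct model_epc_page *secs = model_load_secs(encl);

    if (!secs)
        return -1;

    (void)secs->id;          /* stand-in for handing SECS to EAUG */
    encl->nr_child_pages++;  /* the newly added page becomes a child */
    return 0;
}
```

Calling model_eaug_fault(), then model_reclaim_all(), then
model_eaug_fault() again exercises the rare ordering from the
explanation above: the second fault finds secs_page == NULL and has to
reload it before it can proceed.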