Received: by 2002:a05:7412:8521:b0:e2:908c:2ebd with SMTP id t33csp2527257rdf; Mon, 6 Nov 2023 18:10:04 -0800 (PST) X-Google-Smtp-Source: AGHT+IHMe8/M0rSsW54yN5DSqCDHRnSuTQ1k1G2u6GBjgt29em+yXRxDYqlktNsq5TdQB9F5YEgh X-Received: by 2002:a17:902:f547:b0:1c9:ddd8:9950 with SMTP id h7-20020a170902f54700b001c9ddd89950mr1687734plf.21.1699323002902; Mon, 06 Nov 2023 18:10:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699323002; cv=none; d=google.com; s=arc-20160816; b=KshbaF+No9MDu2CBwpZNACWSACuWulQ7qW1IFDwrvx7pJJwbRKA/j0JoMz3R6lC8nC 7NabAwzpbBG87ey1GsJbgLp73yq8TcqTeV7j+PehZ4VbEz3fz9xUI9+vTuQKFVhkZ9Mg 74oZR7Cbs+UHB7qO7Af380WROeYrPiG2jHpo6QJEHISk/1hFoXOaKNebCQHfTIqakwph r2H7WwxSb9j5zz63ltbQoji0Tj9nbkEzj1GemAfayDpqIB6Ar92W97Kr383qZg6VORut c+iRYcz4bAU8nmPA6NfzqLGbO0S/EybKQ5sMO9p7VbznVbcudrvCUDfqlUYUoe9ow9KI Gzhw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:message-id:organization :from:content-transfer-encoding:mime-version:date:references:subject :cc:to:dkim-signature; bh=EipJ60rmOyjbBi4HMaUyAw42E2DFpeuvbRUSbd44zmo=; fh=RZ9d3SGDwORBxMlKl9Zivu4AQRZYUf+q95m79VW4E6s=; b=e6YpBtmLI39vvTpVWetXFjwG67ivqkI2P2WlKlYqm6vmAnUZOzF2Fj+lSCrzeh05oQ Ldtw2QBrrUYDY+0dvKuMenJMOmA/JvdI9dxiJrk8+OWX/vJEIcTUEhba8Eba6PVxrSpC vQQRev7iAcITVIMlekFvi9ZsWOmHr3c5YxpZwdLQgXe23TeJN15A0ukaTN6BZ/2kLL6A Aisdq5EA2UZQF8EtmqH3LKXvduxKFZZjJI8ijYFgD6eY/Yrsa073Zu0WBP8TsKt0xyZV MdrwhSeyCqn7gsLkNTTtraWNMHwb6Zi218sR6d3viqfC1Jpke65xkyMigPe/HzEqn+An s0xA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Q7Zux7xE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id s7-20020a170902ea0700b001cc3473657bsi10213123plg.215.2023.11.06.18.10.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 06 Nov 2023 18:10:02 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Q7Zux7xE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id B70C5810438B; Mon, 6 Nov 2023 18:10:01 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232539AbjKGCJ7 (ORCPT + 99 others); Mon, 6 Nov 2023 21:09:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38018 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231975AbjKGCJ6 (ORCPT ); Mon, 6 Nov 2023 21:09:58 -0500 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9877010A; Mon, 6 Nov 2023 18:09:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1699322991; x=1730858991; h=to:cc:subject:references:date:mime-version: content-transfer-encoding:from:message-id:in-reply-to; bh=ZHhrLsAbY+01aibkrj3PgqbaJGRt7ZXSmtweYn7WIYc=; b=Q7Zux7xEXzW/9zVxdKq/1PfgbTLRqa+utCLA87Yv5qrWCRgHHEoATaLU 5czn346oHl2vcKDJGQaZvNp1tjTBcAR0oMDsk5qUeYVid/zk4g4TzgBK4 7iO4YJj6aimdVjyXhHr5hdKvZzJG7I6eJd1aEZRlEqwuwG/xpmevs8gY+ eEUPZv4k1JRdJ36jXO1p+O/YOs099xW3s2AhpPE5BLWcLRjbm507FyNWA Qz3N5L6bm194Ksef+gaSuHqeDaqukkGac+npVa/nDGFKSzLdJirjq60ij GMbXc1yFPxihmUKAdCgkExPP961ch3aEpajoMyHsdXjPQuuJ0hFebjE1A Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10886"; a="475652929" X-IronPort-AV: E=Sophos;i="6.03,282,1694761200"; d="scan'208";a="475652929" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2023 18:08:46 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10886"; a="766148743" X-IronPort-AV: E=Sophos;i="6.03,282,1694761200"; d="scan'208";a="766148743" Received: from hhuan26-mobl.amr.corp.intel.com ([10.93.50.175]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-SHA; 06 Nov 2023 18:08:44 -0800 Content-Type: text/plain; charset=iso-8859-15; format=flowed; delsp=yes To: "mingo@redhat.com" , "jarkko@kernel.org" , "x86@kernel.org" , "dave.hansen@linux.intel.com" , "cgroups@vger.kernel.org" , "hpa@zytor.com" , "linux-kernel@vger.kernel.org" , "linux-sgx@vger.kernel.org" , "mkoutny@suse.com" , "tglx@linutronix.de" , "tj@kernel.org" , "Mehta, Sohil" , "bp@alien8.de" , "Huang, Kai" , "Haitao Huang" Cc: "mikko.ylinen@linux.intel.com" , "Christopherson,, Sean" , "Zhang, Bo" , "kristen@linux.intel.com" , "anakrish@microsoft.com" , "sean.j.christopherson@intel.com" , "Li, Zhiquan1" , "yangjie@microsoft.com" Subject: Re: [PATCH v6 04/12] x86/sgx: Implement basic EPC misc cgroup functionality References: <20231030182013.40086-1-haitao.huang@linux.intel.com> <20231030182013.40086-5-haitao.huang@linux.intel.com> <34a337b96a5a917612c4ec4eff2b5a378c21879b.camel@intel.com> Date: Mon, 06 Nov 2023 20:08:43 -0600 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Haitao Huang" Organization: Intel Message-ID: In-Reply-To: User-Agent: Opera Mail/1.0 (Win32) X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Mon, 06 Nov 2023 18:10:01 -0800 (PST) On Mon, 06 Nov 2023 19:16:30 -0600, Haitao Huang wrote: > On Mon, 06 Nov 2023 16:18:30 -0600, Huang, Kai > wrote: > >>> >>> > > +/** >>> > > + * sgx_epc_cgroup_try_charge() - hierarchically try to charge a >>> single >>> > > EPC page >>> > > + * >>> > > + * Returns EPC cgroup or NULL on success, -errno on failure. >>> > > + */ >>> > > +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void) >>> > > +{ >>> > > + struct sgx_epc_cgroup *epc_cg; >>> > > + int ret; >>> > > + >>> > > + if (sgx_epc_cgroup_disabled()) >>> > > + return NULL; >>> > > + >>> > > + epc_cg = sgx_epc_cgroup_from_misc_cg(get_current_misc_cg()); >>> > > + ret = misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, >>> PAGE_SIZE); >>> > > + >>> > > + if (!ret) { >>> > > + /* No epc_cg returned, release ref from get_current_misc_cg() */ >>> > > + put_misc_cg(epc_cg->cg); >>> > > + return ERR_PTR(-ENOMEM); >>> > >>> > misc_cg_try_charge() returns 0 when successfully charged, no? >>> >>> Right. I really made some mess in rebasing :-( >>> >>> > >>> > > + } >>> > > + >>> > > + /* Ref released in sgx_epc_cgroup_uncharge() */ >>> > > + return epc_cg; >>> > > +} >>> > >>> > IMHO the above _try_charge() returning a pointer of EPC cgroup is a >>> > little bit >>> > odd, because it doesn't match the existing misc_cg_try_charge() which >>> > returns >>> > whether the charge is successful or not. sev_misc_cg_try_charge() >>> > matches >>> > misc_cg_try_charge() too. >>> > >>> > I think it's better to split "getting EPC cgroup" part out as a >>> separate >>> > helper, >>> > and make this _try_charge() match existing pattern: >>> > >>> > struct sgx_epc_cgroup *sgx_get_current_epc_cg(void) >>> > { >>> > if (sgx_epc_cgroup_disabled()) >>> > return NULL; >>> > >>> > return sgx_epc_cgroup_from_misc_cg(get_current_misc_cg()); >>> > } >>> > >>> > int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg) >>> > { >>> > if (!epc_cg) >>> > return -EINVAL; >>> > >>> > return misc_cg_try_charge(epc_cg->cg); >>> > } >>> > >>> > Having sgx_get_current_epc_cg() also makes the caller easier to read, >>> > because we >>> > can immediately know we are going to charge the *current* EPC cgroup, >>> > but not >>> > some cgroup hidden within sgx_epc_cgroup_try_charge(). >>> > >>> >>> Actually, unlike other misc controllers, we need charge and get the >>> epc_cg >>> reference at the same time. >> >> Can you elaborate? >> >> And in practice you always call sgx_epc_cgroup_try_charge() right after >> sgx_get_current_epc_cg() anyway. The only difference is the whole >> thing is done >> in one function or in separate functions. >> >> [...] >> > > That's true. I was thinking no need to have them done in separate calls. > The caller has to check the return value for epc_cg instance first, then > check result of try_charge. But there is really only one caller, > sgx_alloc_epc_page() below, so I don't have strong opinions now. > > With them separate, the checks will look like this: > if (epc_cg = sgx_get_current_epc_cg()) // NULL means cgroup disabled, > should continue for allocation > { > if (ret = sgx_epc_cgroup_try_charge()) > return ret > } > // continue... > > I can go either way. > >> >>> > > struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) >>> > > { >>> > > struct sgx_epc_page *page; >>> > > + struct sgx_epc_cgroup *epc_cg; >>> > > + >>> > > + epc_cg = sgx_epc_cgroup_try_charge(); >>> > > + if (IS_ERR(epc_cg)) >>> > > + return ERR_CAST(epc_cg); >>> > > >>> > > for ( ; ; ) { >>> > > page = __sgx_alloc_epc_page(); >>> > > @@ -580,10 +587,21 @@ struct sgx_epc_page *sgx_alloc_epc_page(void >>> > > *owner, bool reclaim) >>> > > break; >>> > > } >>> > > >>> > > + /* >>> > > + * Need to do a global reclamation if cgroup was not full but >>> free >>> > > + * physical pages run out, causing __sgx_alloc_epc_page() to >>> fail. >>> > > + */ >>> > > sgx_reclaim_pages(); >>> > >>> > What's the final behaviour? IIUC it should be reclaiming from the >>> > *current* EPC >>> > cgroup? If so shouldn't we just pass the @epc_cg to it here? >>> > >>> > I think we can make this patch as "structure" patch w/o actually >>> having >>> > EPC >>> > cgroup enabled, i.e., sgx_get_current_epc_cg() always return NULL. >>> > >>> > And we can have one patch to change sgx_reclaim_pages() to take the >>> > 'struct >>> > sgx_epc_lru_list *' as argument: >>> > >>> > void sgx_reclaim_pages_lru(struct sgx_epc_lru_list * lru) >>> > { >>> > ... >>> > } >>> > >>> > Then here we can have something like: >>> > >>> > void sgx_reclaim_pages(struct sgx_epc_cg *epc_cg) >>> > { >>> > struct sgx_epc_lru_list *lru = epc_cg ? &epc_cg->lru : >>> > &sgx_global_lru; >>> > >>> > sgx_reclaim_pages_lru(lru); >>> > } >>> > >>> > Makes sense? >>> > >>> >>> This is purely global reclamation. No cgroup involved. >> >> Again why? Here you are allocating one EPC page for enclave in a >> particular EPC >> cgroup. When that fails, shouldn't you try only to reclaim from the >> *current* >> EPC cgroup? Or at least you should try to reclaim from the *current* >> EPC cgroup >> first? >> > > Later sgx_epc_cg_try_charge will take a 'reclaim' flag, if true, cgroup > reclaims synchronously, otherwise in background and returns -EBUSY in > that case. This function also returns if no valid epc_cg pointer > returned. > > All reclamation for *current* cgroup is done in sgx_epc_cg_try_charge(). > > So, by reaching to this point, a valid epc_cg pointer was returned, > that means allocation is allowed for the cgroup (it has reclaimed if > necessary, and its usage is not above limit after charging). > > But the system level free count may be low (e.g., limits of all cgroups > may add up to be more than capacity). so we need to do a global > reclamation here, which may involve reclaiming a few pages (from current > or other groups) so the system can be at a performant state with minimal > free count. (current behavior of ksgxd). > I should have sticked to the orignial comment added in code. Actually __sgx_alloc_epc_page() can fail if system runs out of EPC. That's the really reason for global reclaim. The free count enforcement is near the end of this method after should_reclaim() check. Haitao