Received: by 2002:ab2:6203:0:b0:1f5:f2ab:c469 with SMTP id o3csp839232lqt; Fri, 19 Apr 2024 11:55:47 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCUEzOrxQrN6eZHbjMYyXxIvKHxaS5gyzjt3q50qxhHAgPEK/aGSoKnbSHUtJTrNlm8fiwWTw5R8QYpShn2mLNiATnKni59YGuq8+zZdrA== X-Google-Smtp-Source: AGHT+IGnZuAQwA7swyrOo+4vmR7K+O3w+LAQRrc0wKNySJKVZMWPqdRrgdXf5nQ8dXuhG5PovHH9 X-Received: by 2002:a17:906:ca4b:b0:a52:1086:5c7e with SMTP id jx11-20020a170906ca4b00b00a5210865c7emr3133062ejb.24.1713552947407; Fri, 19 Apr 2024 11:55:47 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1713552947; cv=pass; d=google.com; s=arc-20160816; b=ryjR/x9KLRakS7+FWl5/4ETR3d498K42QnfXwC1F9rDOnvvRGkHWAnem5JohnStmj5 BqD4oXYsPRLExeTklW+fGBdm5rzF/2n6hgLKnsbMPwjDiF8a4jIwSRUzyS7bCfOLNO8W nqc8lxWsuThmLS3sFhCBm++8Uw2RCVbKLkjMdO/3CGYSObwaYuRigT3jsZ6phSRJBPFj 84ZxHdJnwmviU/2y7eReXjglOEoy6E3HKyDr6MpMOrR0/GlfJFCW0WNz6xu0ygt1N/VD LSGo15Wq4GjD2jSI6YGxHMr1spRIB+vxKDN5L/h5cFpfIg4zFNeMH3QzRnfaoyY8WeFO z/hg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=user-agent:in-reply-to:message-id:organization:from :content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:references:subject:cc:to :dkim-signature; bh=U/xP4SjoJvwP/zsavIVxHwWt/qnTlLsA3BN6fCjYYcQ=; fh=ZbN5g22Rz3Glz6OBxASnAsGoXmFQlpn/kT7CqJkS/gk=; b=w2qNGT4KkLR4TGW2AFXMn7DckVKfvXZV4ICMN+OjkDqwycAsk5H76FOPBDjhCwyYc1 OLeGLkuHCaCUa4xnD+EtGibUCWJykDE53MkMA6/HRtA5/b+1064uHClQLai2CicAu6FZ lqx1hK2PbQPh1/HuQgqDAHJRQ4Jy6HqHHdS48b/Y40JaTj8LpTnCDeOEjmiMvaxF/0bN opYtyMF+Fuqf9kT0I51d94m+YlB2IhAScKZ7Gf4aHo0r/SuKdzdvAsCtiRk7jCexBx+8 DGmlvbMVNtxK+CEhXs4IRTS09xkV8hP5+wQD++AiToKKSoswpRCCdxT6+B3Lb7mAnwzF Vf6A==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=fUrk96bD; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-151888-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-151888-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id s11-20020a170906bc4b00b00a55577bf9e9si2355928ejv.283.2024.04.19.11.55.47 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 19 Apr 2024 11:55:47 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-151888-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=fUrk96bD; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-151888-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-151888-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id D0A301F217F3 for ; Fri, 19 Apr 2024 18:55:46 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 9A34313AD08; Fri, 19 Apr 2024 18:55:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="fUrk96bD" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9CA921DA26; Fri, 19 Apr 2024 18:55:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713552938; cv=none; b=mY5X+bi7OA8sQmHdrriWTWhcgPISdQpGzJumYAZXU+R80Ga09dM2wYAUWXufyaZp4/J2vqJCF3WvZ4+rFtI9fmjX7cKy9TadO+LOf+77DIwTMaflM4sHgZ7R5Qs97YtHhKt5GVDOVBqW8ZvLp8C4Uqfe4rDaIZERyCrZeXnnzoc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713552938; c=relaxed/simple; bh=UsQKoIdsg1qzhVZohacdD4T7uZAx/ak/IJJH/5oMPrM=; h=Content-Type:To:Cc:Subject:References:Date:MIME-Version:From: Message-ID:In-Reply-To; b=lsIvRfMZ0jPAkAnGP/BF+7Dnh4uawqAn7XR4gY51PsZFrS7sRCHjQgrS47pV4MNpYeN+h+/q4Q/+3M/wlWyCE9QCqDsH1AtXJ0dd9M8i6HHQGSSxZkCZVTMqqoelykL40L2k+HGZ8s+ELYrTMqbt4BKTRxLEAoCYiQR2zhcuaJ8= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=fUrk96bD; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1713552937; x=1745088937; h=to:cc:subject:references:date:mime-version: content-transfer-encoding:from:message-id:in-reply-to; bh=UsQKoIdsg1qzhVZohacdD4T7uZAx/ak/IJJH/5oMPrM=; b=fUrk96bD7672VyhyfoHm8O/SSk0JIozx/A9xEyH/aBPodokyMIbaskE1 ufxGgnr02bi/c2mdOszPs3QXmtdAhtWQcSSh/1PLrpZw3wcHz+QgyghFb akyF1aSKpZ5lpR+Xr5sWj0vx+KLQ6cYg/k0ksosEyumA2XnjOmmvjTSCj nrD3hSav6urZJbcGh92vN9unkA67K/G1N4fXfwKCaQeOCKyrMK2iwy7f9 RI0y4YdeNVqDKkqFnp7fEgxjtslahRvJSOYBqFMVQtEaLvD6RLEk9obMJ IVcRF7qPZ7T4+KhC1eBHz6C7KkNhq2/uchBmJNLw/8XVb9wNF2tshr+v8 A==; X-CSE-ConnectionGUID: PaEPX/0tSPWXK10+Dw+agQ== X-CSE-MsgGUID: bWLb+TUzTjCSRew8m3xoug== X-IronPort-AV: E=McAfee;i="6600,9927,11049"; a="20309639" X-IronPort-AV: E=Sophos;i="6.07,214,1708416000"; d="scan'208";a="20309639" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2024 11:55:36 -0700 X-CSE-ConnectionGUID: E5Tbv0JuSraAgqoY7losDg== X-CSE-MsgGUID: DqVHhWN7QCGEpgauAFickQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.07,214,1708416000"; d="scan'208";a="28226862" Received: from hhuan26-mobl.amr.corp.intel.com ([10.92.17.168]) by orviesa004-auth.jf.intel.com with ESMTP/TLS/AES256-SHA; 19 Apr 2024 11:55:33 -0700 Content-Type: text/plain; charset=iso-8859-15; format=flowed; delsp=yes To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com, tim.c.chen@linux.intel.com, "Huang, Kai" Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, chrisyan@microsoft.com Subject: Re: [PATCH v12 09/14] x86/sgx: Implement async reclamation for cgroup References: <20240416032011.58578-1-haitao.huang@linux.intel.com> <20240416032011.58578-10-haitao.huang@linux.intel.com> <640866c5-9fe0-4f7b-a459-7a685dbe4092@intel.com> Date: Fri, 19 Apr 2024 13:55:31 -0500 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Haitao Huang" Organization: Intel Message-ID: In-Reply-To: <640866c5-9fe0-4f7b-a459-7a685dbe4092@intel.com> User-Agent: Opera Mail/1.0 (Win32) On Thu, 18 Apr 2024 20:32:14 -0500, Huang, Kai wrote: > > > On 16/04/2024 3:20 pm, Haitao Huang wrote: >> From: Kristen Carlson Accardi >> In cases EPC pages need be allocated during a page fault and the cgroup >> usage is near its limit, an asynchronous reclamation needs be triggered >> to avoid blocking the page fault handling. >> Create a workqueue, corresponding work item and function definitions >> for EPC cgroup to support the asynchronous reclamation. >> In case the workqueue allocation is failed during init, disable cgroup. > > It's fine and reasonable to disable (SGX EPC) cgroup. The problem is > "exactly what does this mean" isn't quite clear. > First, this is really some corner case most people don't care: during init, kernel can't even allocate a workqueue object. So I don't think we should write extra code to implement some sophisticated solution. Any solution we come up with may just not work as the way user want or solve the real issue due to the fact such allocation failure even happens at init time. So IMHO the current solution should be fine and I'll answer some of your detailed questions below. > Given SGX EPC is just one type of MISC cgroup resources, we cannot just > disable MISC cgroup as a whole. > > So, the first interpretation is we treat the entire MISC_CG_RES_SGX > resource type doesn't exist, that is, we just don't show control files > in the file system, and all EPC pages are tracked in the global list. > > But it might be not straightforward to implement in the SGX driver, > i.e., we might need to do more MISC cgroup core code change to make it > being able to support disable particular resource at runtime -- I need > to double check. > > So if that is not something worth to do, we will still need to live with > the fact that, the user is still able to create SGX cgroup in the > hierarchy and see those control files, and being able to read/write them. > Can not reliably predict what will happen. Most likely the ENOMEM will be returned by sgx_cgroup_alloc() if reached or other error in the stack if not reached to sgx_cgroup_alloc() and user fails on creating anything. But if they do end up creating some cgroups (sgx_cgroup_alloc() and everything else on the call stack passed without failure), everything still kind of works for the reason answered below. > The second interpretation I suppose is, although the SGX cgroup is still > seen as supported in userspace, in kernel we just treat it doesn't exist. > > Specifically, that means: 1) we always return the root SGX cgroup for > any EPC page when allocating a new one; 2) as a result, we still track > all EPC pages in a single global list. > Current code has similar behavior without extra code. > But from the code below ... > > >> static int __sgx_cgroup_try_charge(struct sgx_cgroup *epc_cg) >> { >> if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE)) >> @@ -117,19 +226,28 @@ int sgx_cgroup_try_charge(struct sgx_cgroup >> *sgx_cg, enum sgx_reclaim reclaim) >> { >> int ret; >> + /* cgroup disabled due to wq allocation failure during >> sgx_cgroup_init(). */ >> + if (!sgx_cg_wq) >> + return 0; >> + > > ..., IIUC you choose a (third) solution that is even one more step back: > > It just makes try_charge() always succeed, but EPC pages are still > managed in the "per-cgroup" list. > > But this solution, AFAICT, doesn't work. The reason is when you fail to > allocate EPC page you will do the global reclaim, but now the global > list is empty. > > Am I missing anything? But when cgroups enabled in config, global reclamation starts from root and reclaim from the whole hierarchy if user may still be able to create. Just that we don't have async/sync per-cgroup reclaim triggered. > > So my thinking is, we have two options: > > 1) Modify the MISC cgroup core code to allow the kernel to disable one > particular resource. It shouldn't be hard, e.g., we can add a > 'disabled' flag to the 'struct misc_res'. > > Hmm.. wait, after checking, the MISC cgroup won't show any control files > if the "capacity" of the resource is 0: > > " > * Miscellaneous resources capacity for the entire machine. 0 capacity > * means resource is not initialized or not present in the host. > " > > So I really suppose we should go with this route, i.e., by just setting > the EPC capacity to 0? > > Note misc_cg_try_charge() will fail if capacity is 0, but we can make it > return success by explicitly check whether SGX cgroup is disabled by > using a helper, e.g., sgx_cgroup_disabled(). > > And you always return the root SGX cgroup in sgx_get_current_cg() when > sgx_cgroup_disabled() is true. > > And in sgx_reclaim_pages_global(), you do something like: > > static void sgx_reclaim_pages_global(..) > { > #ifdef CONFIG_CGROUP_MISC > if (sgx_cgroup_disabled()) > sgx_reclaim_pages(&sgx_root_cg.lru); > else > sgx_cgroup_reclaim_pages(misc_cg_root()); > #else > sgx_reclaim_pages(&sgx_global_list); > #endif > } > > I am perhaps missing some other spots too but you got the idea. > > At last, after typing those, I believe we should have a separate patch > to handle disable SGX cgroup at initialization time. And you can even > put this patch _somewhere_ after the patch > > "x86/sgx: Implement basic EPC misc cgroup functionality" > > and before this patch. > > It makes sense to have such patch anyway, because with it we can easily > to add a kernel command line 'sgx_cgroup=disabled" if the user wants it > disabled (when someone has such requirement in the future). > I think we can add support for "sgx_cgroup=disabled" in future if indeed needed. But just for init failure, no? Thanks Haitao