Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7;
Content-Type: text/plain; charset=iso-8859-15; format=flowed; delsp=yes
To:     "hpa@zytor.com" <hpa@zytor.com>,
        "linux-sgx@vger.kernel.org" <linux-sgx@vger.kernel.org>,
        "x86@kernel.org" <x86@kernel.org>,
        "dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
        "cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
        "bp@alien8.de" <bp@alien8.de>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "jarkko@kernel.org" <jarkko@kernel.org>,
        "tglx@linutronix.de" <tglx@linutronix.de>,
        "Mehta, Sohil" <sohil.mehta@intel.com>,
        "tj@kernel.org" <tj@kernel.org>,
        "mingo@redhat.com" <mingo@redhat.com>,
        "Huang, Kai" <kai.huang@intel.com>
Cc:     "kristen@linux.intel.com" <kristen@linux.intel.com>,
        "yangjie@microsoft.com" <yangjie@microsoft.com>,
        "Li, Zhiquan1" <zhiquan1.li@intel.com>,
        "Christopherson,, Sean" <seanjc@google.com>,
        "mikko.ylinen@linux.intel.com" <mikko.ylinen@linux.intel.com>,
        "Zhang, Bo" <zhanb@microsoft.com>,
        "anakrish@microsoft.com" <anakrish@microsoft.com>
Subject: Re: [PATCH v5 16/18] x86/sgx: Limit process EPC usage with misc
 cgroup controller
References: <20230923030657.16148-1-haitao.huang@linux.intel.com>
 <20230923030657.16148-17-haitao.huang@linux.intel.com>
 <0005a998dab64c182c22abc436cbcd36de4240a1.camel@intel.com>
Date:   Sun, 22 Oct 2023 13:26:19 -0500
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From:   "Haitao Huang" <haitao.huang@linux.intel.com>
Organization: Intel
Message-ID: <op.2c8at5ggwjvjmi@hhuan26-mobl.amr.corp.intel.com>
In-Reply-To: <0005a998dab64c182c22abc436cbcd36de4240a1.camel@intel.com>
User-Agent: Opera Mail/1.0 (Win32)
Precedence: bulk

On Mon, 09 Oct 2023 19:26:01 -0500, Huang, Kai <kai.huang@intel.com> wrote:

>
>> @@ -332,6 +336,7 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru_lists  
>> *lru, size_t nr_to_scan,
>>   * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
>>   * @nr_to_scan:		 Number of EPC pages to scan for reclaim
>>   * @ignore_age:		 Reclaim a page even if it is young
>> + * @epc_cg:		 EPC cgroup from which to reclaim
>>   *
>>   * Take a fixed number of pages from the head of the active page pool  
>> and
>>   * reclaim them to the enclave's private shmem files. Skip the pages,  
>> which have
>> @@ -345,7 +350,8 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru_lists  
>> *lru, size_t nr_to_scan,
>>   * problematic as it would increase the lock contention too much,  
>> which would
>>   * halt forward progress.
>>   */
>> -size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)
>> +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age,
>> +			     struct sgx_epc_cgroup *epc_cg)
>>  {
>>  	struct sgx_backing backing[SGX_NR_TO_SCAN_MAX];
>>  	struct sgx_epc_page *epc_page, *tmp;
>> @@ -355,7 +361,15 @@ size_t sgx_reclaim_epc_pages(size_t nr_to_scan,  
>> bool ignore_age)
>>  	LIST_HEAD(iso);
>>  	size_t ret, i;
>>
>> -	sgx_isolate_epc_pages(&sgx_global_lru, nr_to_scan, &iso);
>> +	/*
>> +	 * If a specific cgroup is not being targeted, take from the global
>> +	 * list first, even when cgroups are enabled.  If there are
>> +	 * pages on the global LRU then they should get reclaimed asap.
>> +	 */

This is probably an obsolete comment that I should have removed. When  
cgroups are enabled, reclaimable pages will always be in a cgroup, the  
root by default. The (!epc_cg) condition is harmless but not needed,  
because the global list will be empty if cgroups are enabled.

>> +	if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC) || !epc_cg)
>> +		sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso);
>> +
>> +	sgx_epc_cgroup_isolate_pages(epc_cg, &nr_to_scan, &iso);
>

So it should have been:

+	if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC))
+		sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso);
+	else
+		sgx_epc_cgroup_isolate_pages(epc_cg, &nr_to_scan, &iso);

Or just encapsulate the difference in sgx_epc_cgroup_isolate_pages().
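
To illustrate the encapsulation option, here is a rough user-space sketch,  
not the actual patch code. Every mock_* name and the MOCK_CGROUP_SGX_EPC  
switch are hypothetical stand-ins (for struct sgx_epc_lru_lists, struct  
sgx_epc_cgroup and IS_ENABLED(CONFIG_CGROUP_SGX_EPC)), an LRU is reduced  
to a page counter, and the walk over descendant cgroups is left out. It  
assumes a NULL cgroup is resolved to the root inside the helper.

```c
#include <assert.h>
#include <stddef.h>

/* Set to 0 to model a build without CONFIG_CGROUP_SGX_EPC (assumption). */
#ifndef MOCK_CGROUP_SGX_EPC
#define MOCK_CGROUP_SGX_EPC 1
#endif

/* Hypothetical stand-in for struct sgx_epc_lru_lists: just a page count. */
struct mock_lru {
	size_t nr_pages;
};

/* Hypothetical stand-in for struct sgx_epc_cgroup. */
struct mock_epc_cgroup {
	struct mock_lru lru;
};

static struct mock_lru mock_global_lru;		/* models sgx_global_lru */
static struct mock_epc_cgroup mock_root_cg;	/* models the root EPC cgroup */

/*
 * Take up to *nr_to_scan pages off @lru and reduce *nr_to_scan by the
 * number taken.  Returns the number of pages isolated.
 */
static size_t mock_isolate_pages(struct mock_lru *lru, size_t *nr_to_scan)
{
	size_t taken;

	assert(lru && nr_to_scan);
	taken = lru->nr_pages < *nr_to_scan ? lru->nr_pages : *nr_to_scan;
	lru->nr_pages -= taken;
	*nr_to_scan -= taken;
	return taken;
}

/*
 * Single entry point for the reclaimer.  Without cgroup support the
 * pages live on the global LRU; with it, they always live in some
 * cgroup, so a NULL @epc_cg is treated as "start from the root".
 */
static size_t mock_epc_cgroup_isolate_pages(struct mock_epc_cgroup *epc_cg,
					    size_t *nr_to_scan)
{
	if (!MOCK_CGROUP_SGX_EPC)
		return mock_isolate_pages(&mock_global_lru, nr_to_scan);

	if (!epc_cg)
		epc_cg = &mock_root_cg;

	return mock_isolate_pages(&epc_cg->lru, nr_to_scan);
}
```

With something of this shape, sgx_reclaim_epc_pages() would make one  
unconditional call and the callers passing NULL would not need to know  
whether cgroups are enabled.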

> (I wish such code can be somehow moved to the earlier patches, so that  
> we can
> get early idea that how sgx_reclaim_epc_pages() is supposed to be used.)
>

I will try to restructure and split this patch. Now that we are not  
going to deal with unreclaimable pages, it will be simpler and easier to  
restructure.

> So here when we are not targeting a specific EPC cgroup, we always  
> reclaim from
> the global list first, ...
>
> [...]
>
>>
>>  	if (list_empty(&iso))
>>  		return 0;
>> @@ -423,7 +437,7 @@ static bool sgx_should_reclaim(unsigned long  
>> watermark)
>>  void sgx_reclaim_direct(void)
>>  {
>>  	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
>> -		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
>> +		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
>
> ... and we always try to reclaim the global list first when directly  
> reclaim is
> desired, even the enclave is within some EPC cgroup.  ...
>
>>  }
>>
>>  static int ksgxd(void *p)
>> @@ -446,7 +460,7 @@ static int ksgxd(void *p)
>>  				     sgx_should_reclaim(SGX_NR_HIGH_PAGES));
>>
>>  		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
>> -			sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
>> +			sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
>
> ... and in ksgxd() as well, which I guess is somehow acceptable.  ...
>
>>
>>  		cond_resched();
>>  	}
>> @@ -600,6 +614,11 @@ int sgx_drop_epc_page(struct sgx_epc_page *page)
>>  struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>>  {
>>  	struct sgx_epc_page *page;
>> +	struct sgx_epc_cgroup *epc_cg;
>> +
>> +	epc_cg = sgx_epc_cgroup_try_charge(reclaim);
>> +	if (IS_ERR(epc_cg))
>> +		return ERR_CAST(epc_cg);

I think I need to add comments to clarify that everything after this  
point is the global reclaimer, which runs only to keep the global free  
page watermark satisfied. So all reclaiming here is from the root cgroup  
if cgroups are enabled, otherwise from the global LRU (no change from the  
current implementation).

>>
>>  	for ( ; ; ) {
>>  		page = __sgx_alloc_epc_page();
>> @@ -608,8 +627,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void  
>> *owner, bool reclaim)
>>  			break;
>>  		}
>>
>> -		if (!sgx_can_reclaim())
>> -			return ERR_PTR(-ENOMEM);
>> +		if (!sgx_can_reclaim()) {
>> +			page = ERR_PTR(-ENOMEM);
>> +			break;
>> +		}
>>
>>  		if (!reclaim) {
>>  			page = ERR_PTR(-EBUSY);
>> @@ -621,10 +642,17 @@ struct sgx_epc_page *sgx_alloc_epc_page(void  
>> *owner, bool reclaim)
>>  			break;
>>  		}
>>
>> -		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
>> +		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
>
> ... and when an EPC page is allocated, no matter whether the EPC page  
> belongs to
> any cgroup or not.
>
> When we are allocating EPC page for one enclave, if that enclave belongs  
> to some
> cgroup, is it more reasonable to reclaim EPC pages from it's own group  
> (and the
> children under it)?
>
> You already got the current EPC cgroup at the beginning of  
> sgx_alloc_epc_page()
> when you want to charge the EPC allocation.
>
>>  		cond_resched();
>>  	}
>>

I hope the above comments make it clear that all these calls to  
sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL) reclaim from the  
global list if cgroups are not enabled, or from the root cgroup if they  
are enabled.

Thanks
Haitao