Message-ID: <17b1747a-8487-44d2-b79c-0da03b09c990@amd.com>
Date: Fri, 9 Feb 2024 13:52:21 -0600
User-Agent: Mozilla Thunderbird
Subject: [PATCH 1/2] x86/MCE: Extend size of the MCE Records pool
To: Sohil Mehta <sohil.mehta@intel.com>, x86@kernel.org,
 linux-edac@vger.kernel.org
Cc: bp@alien8.de, tony.luck@intel.com, linux-kernel@vger.kernel.org,
 yazen.ghannam@amd.com, Avadhut Naik <avadhut.naik@amd.com>
References: <20240207225632.159276-1-avadhut.naik@amd.com>
 <20240207225632.159276-2-avadhut.naik@amd.com>
 <75f48901-fbfa-4ef4-99b9-312800d20896@intel.com>
Content-Language: en-US
From: "Naik, Avadhut" <avadnaik@amd.com>
In-Reply-To: <75f48901-fbfa-4ef4-99b9-312800d20896@intel.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Precedence: bulk
MIME-Version: 1.0

Hi,

On 2/8/2024 15:09, Sohil Mehta wrote:
> On 2/7/2024 2:56 PM, Avadhut Naik wrote:
> 
>> Extend the size of MCE Records pool to better serve modern systems. The
>> increase in size depends on the CPU count of the system. Currently, since
>> size of struct mce is 124 bytes, each logical CPU of the system will have
>> space for at least 2 MCE records available in the pool. To get around the
>> allocation woes during early boot time, the same is undertaken using
>> late_initcall().
>>
> 
> I guess making this proportional to the number of CPUs is probably fine
> assuming CPUs and memory capacity *would* generally increase in sync.
> 
> But, is there some logic to having 2 MCE records per logical cpu or it
> is just a heuristic approach? In practice, the pool is shared amongst
> all MCE sources and can be filled by anyone, right?
> 
Yes, the pool is shared among all MCE sources. The logic behind 256 is
that the genpool was set to 2 pages, i.e. 8192 bytes, in 2015. Around that
time, AFAIK, the maximum number of logical CPUs on a system was 32. So, in
the maximum case, each CPU would have around 256 bytes (8192/32) in the
pool, which equates to approximately 2 MCE records, since
sizeof(struct mce) back then was 88 bytes.

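The arithmetic behind the 256-byte figure can be checked with a small userspace sketch. This is not kernel code: a 4096-byte page is assumed, and the helper names pool_bytes_per_cpu() and records_per_share() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4096-byte pages, so the 2-page pool is 8192 bytes. */
#define PAGE_SIZE_ASSUMED 4096u
#define MCE_POOLSZ        (2 * PAGE_SIZE_ASSUMED)

/* Hypothetical helper: even share of the pool for each logical CPU. */
static size_t pool_bytes_per_cpu(size_t pool_size, size_t num_cpus)
{
	assert(num_cpus > 0);	/* a share is undefined without CPUs */
	return pool_size / num_cpus;
}

/* Hypothetical helper: whole records of record_size that fit in a share. */
static size_t records_per_share(size_t share, size_t record_size)
{
	assert(record_size > 0);
	return share / record_size;
}
```

With these, pool_bytes_per_cpu(MCE_POOLSZ, 32) gives 256, and records_per_share(256, 88) and records_per_share(256, 124) both give 2 whole records, so the "2 records per CPU" estimate holds for both the old and the current size of struct mce.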
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> ---
>>  arch/x86/kernel/cpu/mce/core.c     |  3 +++
>>  arch/x86/kernel/cpu/mce/genpool.c  | 22 ++++++++++++++++++++++
>>  arch/x86/kernel/cpu/mce/internal.h |  1 +
>>  3 files changed, 26 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index b5cc557cfc37..5d6d7994d549 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -2901,6 +2901,9 @@ static int __init mcheck_late_init(void)
>>  	if (mca_cfg.recovery)
>>  		enable_copy_mc_fragile();
>>  
>> +	if (mce_gen_pool_extend())
>> +		pr_info("Couldn't extend MCE records pool!\n");
>> +
> 
> Why do this unconditionally? For a vast majority of low core-count, low
> memory systems the default 2 pages would be good enough.
> 
> Should there be a threshold beyond which the extension becomes active?
> Let's say, for example, a check for num_present_cpus() > 32 (Roughly
> based on 8Kb memory and 124b*2 estimate per logical CPU).
> 
> Whatever you choose, a comment above the code would be helpful
> describing when the extension is expected to be useful.
> 
I put it in unconditionally because, IMO, the increase in memory even for
low-core systems didn't seem substantial: just one additional page for
systems with fewer than 16 CPUs.

But I do get your point. Will add a check in mcheck_late_init() for the
number of CPUs present. Something like below:

@@ -2901,7 +2901,7 @@ static int __init mcheck_late_init(void)
    if (mca_cfg.recovery)
        enable_copy_mc_fragile();

-   if (mce_gen_pool_extend())
+   if ((num_present_cpus() > 32) && mce_gen_pool_extend())
        pr_info("Couldn't extend MCE records pool!\n");

Does this look good? The genpool extension will then be undertaken only for
systems with more than 32 CPUs. I will explain this in a comment.
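As a rough check on why 32 is the natural cut-off, the userspace sketch below compares the per-CPU budget against the default pool. It assumes 4096-byte pages; pool_needs_extension() is a hypothetical name used only for illustration and is not part of the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption: 4096-byte pages; the other values come from the patch. */
#define PAGE_SIZE_ASSUMED 4096u
#define MCE_POOLSZ        (2 * PAGE_SIZE_ASSUMED)
#define CPU_GEN_MEMSZ     256u

/*
 * Hypothetical helper: true when the default 2-page pool provides less
 * than CPU_GEN_MEMSZ bytes for each logical CPU.
 */
static bool pool_needs_extension(unsigned int num_cpus)
{
	/* Guard the multiplication below against unsigned overflow. */
	assert(num_cpus <= UINT_MAX / CPU_GEN_MEMSZ);
	return num_cpus * CPU_GEN_MEMSZ > MCE_POOLSZ;
}
```

pool_needs_extension(32) is false because 32 * 256 is exactly 8192 bytes, while pool_needs_extension(33) is true, which matches the num_present_cpus() > 32 condition.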

>>  	mcheck_debugfs_init();
>>  
>>  	/*
>> diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
>> index fbe8b61c3413..aed01612d342 100644
>> --- a/arch/x86/kernel/cpu/mce/genpool.c
>> +++ b/arch/x86/kernel/cpu/mce/genpool.c
>> @@ -20,6 +20,7 @@
>>   * 2 pages to save MCE events for now (~80 MCE records at most).
>>   */
>>  #define MCE_POOLSZ	(2 * PAGE_SIZE)
>> +#define CPU_GEN_MEMSZ	256
>>  
> 
> The comment above MCE_POOLSZ probably needs a complete re-write. Right
> now, it reads as follows:
> 
> * This memory pool is only to be used to save MCE records in MCE context.
> * MCE events are rare, so a fixed size memory pool should be enough. Use
> * 2 pages to save MCE events for now (~80 MCE records at most).
> 
> Apart from the numbers being incorrect since sizeof(struct mce) has
> increased, this patch is based on the assumption that the current MCE
> memory pool is no longer enough in certain cases.
> 
Yes, will change the comment to something like below:

 * This memory pool is only to be used to save MCE records in MCE context.
 * Though MCE events are rare, their frequency typically depends on the
 * system's memory and CPU count.
 * Allocate 2 pages to the MCE Records pool during early boot, with the
 * option to extend the pool, as needed, through the command line, for
 * systems with a CPU count of more than 32.
 * By default, each logical CPU can have around 2 MCE records in the pool
 * at the same time.

Sounds good?

>>  static struct gen_pool *mce_evt_pool;
>>  static LLIST_HEAD(mce_event_llist);
>> @@ -116,6 +117,27 @@ int mce_gen_pool_add(struct mce *mce)
>>  	return 0;
>>  }
>>  
>> +int mce_gen_pool_extend(void)
>> +{
>> +	unsigned long addr, len;
> 
> s/len/size/
> 
Noted.
>> +	int ret = -ENOMEM;
>> +	u32 num_threads;
>> +
>> +	num_threads = num_present_cpus();
>> +	len = PAGE_ALIGN(num_threads * CPU_GEN_MEMSZ);
> 
> Nit: Can the use of the num_threads variable be avoided?
> How about:
> 
> 	size = PAGE_ALIGN(num_present_cpus() * CPU_GEN_MEMSZ);
> 
Will do.
> 
> 
>> +	addr = (unsigned long)kzalloc(len, GFP_KERNEL);
> 
> Also, shouldn't the new allocation be incremental to the 2 pages already
> present?
> 
> Let's say, for example, that you have a 40-cpu system and the calculated
> size in this case comes out to 40 * 2 * 124b = 9920 bytes, i.e. 3 pages.
> You only need to allocate 1 additional page to add to mce_evt_pool
> instead of the 3 pages that the current code does.
> 
Will make the allocation incremental when the genpool extension is
undertaken through the default means. Something like below:

@@ -129,6 +134,7 @@ int mce_gen_pool_extend(void)
    } else {
        num_threads = num_present_cpus();
        len = PAGE_ALIGN(num_threads * CPU_GEN_MEMSZ);
+       len -= MCE_POOLSZ;

Does this sound good?
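To sanity-check the incremental sizing, here is a userspace sketch of the same calculation. It assumes 4096-byte pages; page_align() is a stand-in for the kernel's PAGE_ALIGN() and extension_size() is a hypothetical name, neither taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4096-byte pages; the other values come from the patch. */
#define PAGE_SIZE_ASSUMED 4096u
#define MCE_POOLSZ        (2 * PAGE_SIZE_ASSUMED)
#define CPU_GEN_MEMSZ     256u

/* Round x up to the next multiple of the assumed page size. */
static size_t page_align(size_t x)
{
	return (x + PAGE_SIZE_ASSUMED - 1) / PAGE_SIZE_ASSUMED * PAGE_SIZE_ASSUMED;
}

/* Hypothetical helper: extra bytes to add on top of the 2 default pages. */
static size_t extension_size(size_t num_cpus)
{
	size_t total = page_align(num_cpus * CPU_GEN_MEMSZ);

	/*
	 * Only meaningful above the 32-CPU threshold; at or below it the
	 * subtraction would wrap around because size_t is unsigned.
	 */
	assert(total > MCE_POOLSZ);
	return total - MCE_POOLSZ;
}
```

For the 40-CPU example, extension_size(40) is 4096, i.e. the single additional page, rather than the 3 pages the original code would allocate. The subtraction relies on the CPU-count check in mcheck_late_init(); without it, len could drop to zero or wrap.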

-- 
Thanks,
Avadhut Naik

> Sohil
> 
>> +
>> +	if (!addr)
>> +		goto out;
>> +
>> +	ret = gen_pool_add(mce_evt_pool, addr, len, -1);
>> +	if (ret)
>> +		kfree((void *)addr);
>> +
>> +out:
>> +	return ret;
>> +}
>> +
>>  static int mce_gen_pool_create(void)
>>  {
>>  	struct gen_pool *tmpp;
> 
>