On Mon, Jan 30, 2023 at 2:00 PM Dan Williams <[email protected]> wrote:
>
> Hi Shesha, Linux email expectations are to not top post, i.e. respond
> inline, like below:
>
> Shesha Sreenivasamurthy wrote:
>> The re-configuration does not reset the device. It does re-program the PCIe
>> DVSEC for CXL Device registers (Section 8.1.3, CXL 2.0 spec, pg. 258;
>> DVSEC vendor ID 0x1E98, DVSEC ID 0x0).
>> “So you need to dynamically recreate the region, especially if your step 10
>> above resets the device.”
>> Do you mean the DAX region ?
>
> No, I mean the CXL region.
>
>> If so, I can if the system stays up. After a few seconds the system
>> crashes. Can the crash be because of a mismatch between DVSEC
>> information with what kernel was informed by BIOS during boot (Some
>> ACPI tables ?)
>
> My concern is that the platform memory decode configuration is not
> prepared for the CXL device to claim more than what was originally
> programmed in the CXL DVSEC range registers. One of the platform
> firmware updates for CXL 2.0 was the creation of the CFMWS (CXL Fixed
> Memory Window Structure) in the ACPI CEDT (CXL Early Discovery Table).
> That structure indicates which platform address ranges decode to which
> CXL host bridges. Those windows are defined in platform specific
> registers (not enumerated to the OS). If the window is only 8GB then
> the endpoint device cannot decode more. You would need to reboot to get
> the BIOS to allocate more host address space for CXL.
>
> The expectation for newer platforms is that platform firmware define
> CFMWS such that there is spare capacity in the address map for the OS to
> dynamically map more CXL.
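
For reference, the CFMWS window size discussed above can be checked from
a running system. The following is a minimal sketch, assuming the kernel
lists each CFMWS range in /proc/iomem as "CXL Window N" (real addresses
are only shown to root). The helper names cxl_windows, fits and
read_windows are made up for illustration. It only compares the total
window size against a requested capacity; it does not account for space
already consumed by existing regions.

```python
import re

# Matches lines such as "1000000000-11ffffffff : CXL Window 0"
IOMEM_LINE = re.compile(
    r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (CXL Window \d+)\s*$"
)


def cxl_windows(iomem_text):
    """Return (name, start, size_in_bytes) for each CXL Window entry."""
    windows = []
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if m:
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            # /proc/iomem ranges are inclusive, hence the +1
            windows.append((m.group(3), start, end - start + 1))
    return windows


def fits(windows, requested_bytes):
    """True if any single window is at least requested_bytes in size."""
    return any(size >= requested_bytes for _, _, size in windows)


def read_windows(path="/proc/iomem"):
    """Parse the live resource map (run as root to see real addresses)."""
    with open(path) as f:
        return cxl_windows(f.read())
```

If the only window reported is 8GB, fits(read_windows(), 16 * 2**30)
returns False, which matches the case described above where a reboot is
needed for the BIOS to allocate more host address space.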

There seems to be some instability when using DAX. When the system is
given all the device memory using efi=nosoftreserve, stressapptest
(https://github.com/stressapptest/stressapptest) runs for an extended
period of time. However, when the system is booted without
efi=nosoftreserve and the special purpose memory is assigned to
system-ram using daxctl, the system crashes after some time (20-30
mins). Are there any known instabilities when using DAX?

Shesha Sreenivasamurthy wrote:
[..]
> There seems to be some instability in using DAX. When the system is
> given all the device memory using efi=nosoftreserve, the stressapptest
> (https://github.com/stressapptest/stressapptest) runs for an extended
> period of time. However, when the system is booted without
> efi=nosoftreserve, and assigned the special purpose memory to
> system-ram using daxctl, the system crashes after some time (20-30
> mins). Are there any known instabilities when using DAX?
One difference with late binding of memory is where kernel data
structures are allocated. So the stress profile can change based on
kernel activity. Otherwise there is no known instability with delaying
the online of memory.
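
One way to compare the two boot configurations is to look at which zone
each memory block ended up in, since memory onlined into the Movable
zone cannot hold unmovable kernel allocations. The following is a
minimal sketch, assuming the standard memory hotplug sysfs layout
(/sys/devices/system/memory/memoryN/state and valid_zones). The function
name summarize_memory_blocks is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def summarize_memory_blocks(root="/sys/devices/system/memory"):
    """Count memory blocks by (state, valid_zones)."""
    summary = Counter()
    for block in Path(root).glob("memory[0-9]*"):
        try:
            state = (block / "state").read_text().strip()
            zones = (block / "valid_zones").read_text().strip()
        except OSError:
            continue  # block went away or the attribute is not present
        summary[(state, zones)] += 1
    return summary


def print_summary(root="/sys/devices/system/memory"):
    """Print one line per (state, zones) combination."""
    for (state, zones), count in sorted(summarize_memory_blocks(root).items()):
        print(f"{count:6d} blocks  state={state}  zones={zones}")
```

Running print_summary() once with efi=nosoftreserve and once after the
daxctl reconfiguration should show whether the CXL capacity is in Normal
(usable for kernel data structures) or Movable in each case.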