2023-07-21 02:03:54

by Huang, Ying

Subject: [PATCH RESEND 0/4] memory tiering: calculate abstract distance based on ACPI HMAT

We have the explicit memory tiers framework to manage systems with
multiple types of memory, e.g., DRAM in DIMM slots and CXL memory
devices. In this framework, memory devices of the same kind are
grouped into memory types, which are then placed into memory tiers.
To describe the performance of a memory type, an abstract distance is
defined, which is directly proportional to memory latency and
inversely proportional to memory bandwidth. To keep the code as
simple as possible, a fixed abstract distance is used in dax/kmem to
describe slow memory such as Optane DCPMM.
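
Roughly speaking, and only as an illustration (the exact formula used
by the ACPI HMAT based algorithm in this series differs in its
details), the relationship can be sketched as:

	/*
	 * Illustrative sketch only, not the in-kernel formula.
	 * Performance is normalized against default DRAM, so that
	 * DRAM itself gets MEMTIER_ADISTANCE_DRAM and slower or
	 * narrower memory gets a larger abstract distance.
	 */
	static int example_adistance(unsigned int latency, unsigned int bandwidth,
				     unsigned int dram_latency, unsigned int dram_bandwidth)
	{
		return (MEMTIER_ADISTANCE_DRAM * latency / dram_latency +
			MEMTIER_ADISTANCE_DRAM * dram_bandwidth / bandwidth) / 2;
	}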

To support more memory types, this series adds an abstract distance
calculation algorithm management mechanism, provides an algorithm
implementation based on ACPI HMAT, and uses the general abstract
distance calculation interface in the dax/kmem driver. With this,
dax/kmem can support HBM (high bandwidth memory) in addition to the
original Optane DCPMM.
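
With the general interface, a driver's probe path boils down to the
following (sketch only; patch 4/4 below shows the real dax/kmem
changes, including the kmem_find_alloc_memory_type() helper it adds):

	int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;
	struct memory_dev_type *mtype;

	/*
	 * Ask the registered algorithms (e.g., the ACPI HMAT one) for
	 * a better value; if none claims this node, adist keeps the
	 * fallback default above.
	 */
	mt_calc_adistance(numa_node, &adist);
	mtype = kmem_find_alloc_memory_type(adist);
	if (IS_ERR(mtype))
		return PTR_ERR(mtype);
	init_node_memory_type(numa_node, mtype);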

Changelog:

V1 (from RFC):

- Added some code comments per Aneesh's review comments. Thanks!

Best Regards,
Huang, Ying


2023-07-21 02:21:00

by Huang, Ying

Subject: [PATCH RESEND 4/4] dax, kmem: calculate abstract distance with general interface

Previously, a fixed abstract distance, MEMTIER_DEFAULT_DAX_ADISTANCE,
was used for the slow memory type in the kmem driver. This limits the
usage of the kmem driver; for example, it cannot be used for HBM (high
bandwidth memory).

So, use the general abstract distance calculation mechanism in the
kmem driver to get a more accurate abstract distance on systems with
proper support. The original MEMTIER_DEFAULT_DAX_ADISTANCE is used
only as a fallback.

Now, multiple memory types may be managed by kmem. These memory types
are put into the "kmem_memory_types" list, which is protected by the
kmem_memory_type_lock mutex. In dev_dax_kmem_remove(), the memory type
bound to a node may not be known, so clear_node_memory_type() is
extended to accept a NULL memtype as a wildcard that matches whatever
type is currently bound to the node.

Signed-off-by: "Huang, Ying" <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Wei Xu <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Rafael J Wysocki <[email protected]>
---
drivers/dax/kmem.c | 54 +++++++++++++++++++++++++++---------
include/linux/memory-tiers.h | 2 ++
mm/memory-tiers.c | 2 +-
3 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 898ca9505754..837165037231 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -49,14 +49,40 @@ struct dax_kmem_data {
struct resource *res[];
};

-static struct memory_dev_type *dax_slowmem_type;
+static DEFINE_MUTEX(kmem_memory_type_lock);
+static LIST_HEAD(kmem_memory_types);
+
+static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
+{
+ bool found = false;
+ struct memory_dev_type *mtype;
+
+ mutex_lock(&kmem_memory_type_lock);
+ list_for_each_entry(mtype, &kmem_memory_types, list) {
+ if (mtype->adistance == adist) {
+ found = true;
+ break;
+ }
+ }
+ if (!found) {
+ mtype = alloc_memory_type(adist);
+ if (!IS_ERR(mtype))
+ list_add(&mtype->list, &kmem_memory_types);
+ }
+ mutex_unlock(&kmem_memory_type_lock);
+
+ return mtype;
+}
+
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
unsigned long total_len = 0;
struct dax_kmem_data *data;
+ struct memory_dev_type *mtype;
int i, rc, mapped = 0;
int numa_node;
+ int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;

/*
* Ensure good NUMA information for the persistent memory.
@@ -71,6 +97,11 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
return -EINVAL;
}

+ mt_calc_adistance(numa_node, &adist);
+ mtype = kmem_find_alloc_memory_type(adist);
+ if (IS_ERR(mtype))
+ return PTR_ERR(mtype);
+
for (i = 0; i < dev_dax->nr_range; i++) {
struct range range;

@@ -88,7 +119,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
return -EINVAL;
}

- init_node_memory_type(numa_node, dax_slowmem_type);
+ init_node_memory_type(numa_node, mtype);

rc = -ENOMEM;
data = kzalloc(struct_size(data, res, dev_dax->nr_range), GFP_KERNEL);
@@ -167,7 +198,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
err_res_name:
kfree(data);
err_dax_kmem_data:
- clear_node_memory_type(numa_node, dax_slowmem_type);
+ clear_node_memory_type(numa_node, mtype);
return rc;
}

@@ -219,7 +250,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
* for that. This implies this reference will be around
* till next reboot.
*/
- clear_node_memory_type(node, dax_slowmem_type);
+ clear_node_memory_type(node, NULL);
}
}
#else
@@ -251,12 +282,6 @@ static int __init dax_kmem_init(void)
if (!kmem_name)
return -ENOMEM;

- dax_slowmem_type = alloc_memory_type(MEMTIER_DEFAULT_DAX_ADISTANCE);
- if (IS_ERR(dax_slowmem_type)) {
- rc = PTR_ERR(dax_slowmem_type);
- goto err_dax_slowmem_type;
- }
-
rc = dax_driver_register(&device_dax_kmem_driver);
if (rc)
goto error_dax_driver;
@@ -264,18 +289,21 @@ static int __init dax_kmem_init(void)
return rc;

error_dax_driver:
- destroy_memory_type(dax_slowmem_type);
-err_dax_slowmem_type:
kfree_const(kmem_name);
return rc;
}

static void __exit dax_kmem_exit(void)
{
+ struct memory_dev_type *mtype, *mtn;
+
dax_driver_unregister(&device_dax_kmem_driver);
if (!any_hotremove_failed)
kfree_const(kmem_name);
- destroy_memory_type(dax_slowmem_type);
+ list_for_each_entry_safe(mtype, mtn, &kmem_memory_types, list) {
+ list_del(&mtype->list);
+ destroy_memory_type(mtype);
+ }
}

MODULE_AUTHOR("Intel Corporation");
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 9377239c8d34..aca22220cb5c 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -24,6 +24,8 @@ struct memory_tier;
struct memory_dev_type {
/* list of memory types that are part of same tier as this type */
struct list_head tier_sibiling;
+ /* list of memory types that are managed by one driver */
+ struct list_head list;
/* abstract distance for this specific memory type */
int adistance;
/* Nodes of same abstract distance */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 9a734ef2edfb..38005c60fa2d 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -581,7 +581,7 @@ EXPORT_SYMBOL_GPL(init_node_memory_type);
void clear_node_memory_type(int node, struct memory_dev_type *memtype)
{
mutex_lock(&memory_tier_lock);
- if (node_memory_types[node].memtype == memtype)
+ if (node_memory_types[node].memtype == memtype || !memtype)
node_memory_types[node].map_count--;
/*
* If we umapped all the attached devices to this node,
--
2.39.2


2023-07-21 04:51:59

by Alistair Popple

Subject: Re: [PATCH RESEND 0/4] memory tiering: calculate abstract distance based on ACPI HMAT


Thanks for this Huang, I had been hoping to take a look at it this week
but have run out of time. I'm keen to do some testing with it as well.

Hopefully next week...



2023-07-24 18:33:55

by Andrew Morton

Subject: Re: [PATCH RESEND 0/4] memory tiering: calculate abstract distance based on ACPI HMAT

On Fri, 21 Jul 2023 14:15:31 +1000 Alistair Popple <[email protected]> wrote:

> Thanks for this Huang, I had been hoping to take a look at it this week
> but have run out of time. I'm keen to do some testing with it as well.

Thanks. I'll queue this in mm-unstable for some testing. Detailed
review and testing would be appreciated.

I made some adjustments to handle the renaming of destroy_memory_type()
to put_memory_type()
(https://lkml.kernel.org/r/[email protected])
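
For example, with the rename the cleanup loop in dax_kmem_exit() now
reads roughly as follows (a sketch of the adjustment, not the exact
mm-unstable diff):

	list_for_each_entry_safe(mtype, mtn, &kmem_memory_types, list) {
		list_del(&mtype->list);
		put_memory_type(mtype);
	}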

2023-07-25 03:35:12

by Alistair Popple

Subject: Re: [PATCH RESEND 4/4] dax, kmem: calculate abstract distance with general interface


Huang Ying <[email protected]> writes:

> Previously, a fixed abstract distance, MEMTIER_DEFAULT_DAX_ADISTANCE,
> was used for the slow memory type in the kmem driver. This limits the
> usage of the kmem driver; for example, it cannot be used for HBM (high
> bandwidth memory).
>
> So, use the general abstract distance calculation mechanism in the
> kmem driver to get a more accurate abstract distance on systems with
> proper support. The original MEMTIER_DEFAULT_DAX_ADISTANCE is used
> only as a fallback.
>
> Now, multiple memory types may be managed by kmem. These memory types
> are put into the "kmem_memory_types" list, which is protected by the
> kmem_memory_type_lock mutex. In dev_dax_kmem_remove(), the memory type
> bound to a node may not be known, so clear_node_memory_type() is
> extended to accept a NULL memtype as a wildcard that matches whatever
> type is currently bound to the node.

See below but I wonder if kmem_memory_types could be a common helper
rather than kdax specific?

> [...]
> @@ -71,6 +97,11 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> return -EINVAL;
> }
>
> + mt_calc_adistance(numa_node, &adist);
> + mtype = kmem_find_alloc_memory_type(adist);
> + if (IS_ERR(mtype))
> + return PTR_ERR(mtype);
> +

I wrote my own quick and dirty module to test this and wrote basically
the same code sequence.

I notice you're using a list of memory types here though. I think it would
be nice to have a common helper that other users could call to do the
mt_calc_adistance() / kmem_find_alloc_memory_type() /
init_node_memory_type() sequence and cleanup as my naive approach would
result in a new memory_dev_type per device even though adist might be
the same. A common helper would make it easy to de-dup those.
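
Something along these lines is what I'm imagining (name and signature
purely illustrative, not an existing API):

	/*
	 * Hypothetical common helper in mm/memory-tiers.c: find a
	 * memory_dev_type with the given abstract distance on the
	 * driver's list, or allocate one and add it. Serialization of
	 * the list would be left to the caller.
	 */
	struct memory_dev_type *mt_find_alloc_memory_type(int adist,
							  struct list_head *memory_types)
	{
		struct memory_dev_type *mtype;

		list_for_each_entry(mtype, memory_types, list)
			if (mtype->adistance == adist)
				return mtype;

		mtype = alloc_memory_type(adist);
		if (!IS_ERR(mtype))
			list_add(&mtype->list, memory_types);

		return mtype;
	}

Then dax/kmem (and my test module) could just pass in their own list
and lock.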



2023-07-25 08:02:28

by Huang, Ying

Subject: Re: [PATCH RESEND 4/4] dax, kmem: calculate abstract distance with general interface

Alistair Popple <[email protected]> writes:

> Huang Ying <[email protected]> writes:
>
>> Previously, a fixed abstract distance, MEMTIER_DEFAULT_DAX_ADISTANCE,
>> was used for the slow memory type in the kmem driver. This limits the
>> usage of the kmem driver; for example, it cannot be used for HBM (high
>> bandwidth memory).
>>
>> So, use the general abstract distance calculation mechanism in the
>> kmem driver to get a more accurate abstract distance on systems with
>> proper support. The original MEMTIER_DEFAULT_DAX_ADISTANCE is used
>> only as a fallback.
>>
>> Now, multiple memory types may be managed by kmem. These memory types
>> are put into the "kmem_memory_types" list, which is protected by the
>> kmem_memory_type_lock mutex. In dev_dax_kmem_remove(), the memory type
>> bound to a node may not be known, so clear_node_memory_type() is
>> extended to accept a NULL memtype as a wildcard that matches whatever
>> type is currently bound to the node.
>
> See below but I wonder if kmem_memory_types could be a common helper
> rather than kdax specific?
>
>> [...]
>> @@ -71,6 +97,11 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>> return -EINVAL;
>> }
>>
>> + mt_calc_adistance(numa_node, &adist);
>> + mtype = kmem_find_alloc_memory_type(adist);
>> + if (IS_ERR(mtype))
>> + return PTR_ERR(mtype);
>> +
>
> I wrote my own quick and dirty module to test this and wrote basically
> the same code sequence.
>
> I notice you're using a list of memory types here though. I think it would
> be nice to have a common helper that other users could call to do the
> mt_calc_adistance() / kmem_find_alloc_memory_type() /
> init_node_memory_type() sequence and cleanup as my naive approach would
> result in a new memory_dev_type per device even though adist might be
> the same. A common helper would make it easy to de-dup those.

If it's useful, we can move kmem_find_alloc_memory_type() to
memory-tiers.c after some revision. But I would prefer to move it
after we have a second user. What do you think about that?
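
With such a helper, the kmem side of the probe path would reduce to
roughly this (sketch, reusing the illustrative name from your mail):

	mt_calc_adistance(numa_node, &adist);
	mutex_lock(&kmem_memory_type_lock);
	mtype = mt_find_alloc_memory_type(adist, &kmem_memory_types);
	mutex_unlock(&kmem_memory_type_lock);
	if (IS_ERR(mtype))
		return PTR_ERR(mtype);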

--
Best Regards,
Huang, Ying

2023-08-01 04:34:11

by Bharata B Rao

Subject: Re: [PATCH RESEND 0/4] memory tiering: calculate abstract distance based on ACPI HMAT

On 24-Jul-23 11:28 PM, Andrew Morton wrote:
> On Fri, 21 Jul 2023 14:15:31 +1000 Alistair Popple <[email protected]> wrote:
>
>> Thanks for this Huang, I had been hoping to take a look at it this week
>> but have run out of time. I'm keen to do some testing with it as well.
>
> Thanks. I'll queue this in mm-unstable for some testing. Detailed
> review and testing would be appreciated.

I gave this series a try on a 2P system with 2 CXL cards. I don't trust the
bandwidth and latency numbers reported by HMAT here, but FWIW, this patchset
puts the CXL nodes on a lower tier than DRAM nodes.

Regards,
Bharata.


2023-08-11 08:18:35

by Huang, Ying

Subject: Re: [PATCH RESEND 0/4] memory tiering: calculate abstract distance based on ACPI HMAT

Hi, Rao,

Bharata B Rao <[email protected]> writes:

> On 24-Jul-23 11:28 PM, Andrew Morton wrote:
>> On Fri, 21 Jul 2023 14:15:31 +1000 Alistair Popple <[email protected]> wrote:
>>
>>> Thanks for this Huang, I had been hoping to take a look at it this week
>>> but have run out of time. I'm keen to do some testing with it as well.
>>
>> Thanks. I'll queue this in mm-unstable for some testing. Detailed
>> review and testing would be appreciated.
>
> I gave this series a try on a 2P system with 2 CXL cards. I don't trust the
> bandwidth and latency numbers reported by HMAT here, but FWIW, this patchset
> puts the CXL nodes on a lower tier than DRAM nodes.

Thank you very much!

Can I add your "Tested-by" for the series?

--
Best Regards,
Huang, Ying

2023-08-11 08:33:27

by Bharata B Rao

Subject: Re: [PATCH RESEND 0/4] memory tiering: calculate abstract distance based on ACPI HMAT


On 11-Aug-23 11:56 AM, Huang, Ying wrote:
> Hi, Rao,
>
> Bharata B Rao <[email protected]> writes:
>
>> On 24-Jul-23 11:28 PM, Andrew Morton wrote:
>>> On Fri, 21 Jul 2023 14:15:31 +1000 Alistair Popple <[email protected]> wrote:
>>>
>>>> Thanks for this Huang, I had been hoping to take a look at it this week
>>>> but have run out of time. I'm keen to do some testing with it as well.
>>>
>>> Thanks. I'll queue this in mm-unstable for some testing. Detailed
>>> review and testing would be appreciated.
>>
>> I gave this series a try on a 2P system with 2 CXL cards. I don't trust the
>> bandwidth and latency numbers reported by HMAT here, but FWIW, this patchset
>> puts the CXL nodes on a lower tier than DRAM nodes.
>
> Thank you very much!
>
> Can I add your "Tested-by" for the series?

Yes, if the above test qualifies for it, please go ahead.

Regards,
Bharata.