2019-12-20 21:42:45

by Brice Goglin

Subject: Re: [PATCH V6 0/7] ACPI: Support Generic Initiator proximity domains

On 18/12/2019 at 15:50, Jonathan Cameron wrote:
> On Wed, 18 Dec 2019 12:32:06 +0100
> Brice Goglin <[email protected]> wrote:
>
>> On 16/12/2019 at 16:38, Jonathan Cameron wrote:
>>> Introduces a new type of NUMA node for cases where we want to represent
>>> the access characteristics of a non CPU initiator of memory requests,
>>> as these differ from all those for existing nodes containing CPUs and/or
>>> memory.
>>>
>>> These Generic Initiators are presented by the node access0 class in
>>> sysfs in the same way as a CPU. It seems likely that there will be
>>> usecases in which the best 'CPU' is desired and Generic Initiators
>>> should be ignored. The final few patches in this series introduced
>>> access1 which is a new performance class in the sysfs node description
>>> which presents only CPU to memory relationships. Test cases for this
>>> are described below.
>>
>> Hello Jonathan
>>
>> If I want to test this with a fake GI, what are the minimal set of
>> changes I should put in my ACPI tables? Can I just specify a dummy GI in
>> SRAT? What handle should I use there?
> Exactly that for a dummy GI. Also extend HMAT and SLIT for the extra
> proximity domain / initiator.
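For reference, "a dummy GI in SRAT" amounts to one extra fixed-size entry. A standalone sketch of the ACPI 6.3 Generic Initiator Affinity Structure layout (field order and sizes follow the spec; the struct and field names here are illustrative, not taken from any ACPICA header):

```c
#include <stdint.h>

/* Sketch of the ACPI 6.3 Generic Initiator Affinity Structure (SRAT type 5).
 * Field order and sizes follow the spec; names are illustrative. */
struct srat_generic_initiator {
	uint8_t  type;               /* 5 = Generic Initiator Affinity */
	uint8_t  length;             /* always 32 bytes */
	uint8_t  reserved;
	uint8_t  device_handle_type; /* 0 = ACPI device handle, 1 = PCI */
	uint32_t proximity_domain;   /* the new GI-only PXM */
	uint8_t  device_handle[16];  /* _HID/_UID, or PCI segment/BDF */
	uint32_t flags;              /* bit 0 = enabled */
	uint32_t reserved2;
} __attribute__((packed));
```

The new proximity domain then also needs a row/column in SLIT and initiator entries in HMAT, as Jonathan says above.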


I couldn't get this to work (your patches on top of 5.5-rc2). I added
the GI in SRAT, and extended HMAT and SLIT accordingly.

I don't know if that's expected but I get an additional node in sysfs,
with 0kB memory.

However the HMAT table gets ignored because find_mem_target() fails in
hmat_parse_proximity_domain(). The target should have been allocated in
alloc_memory_target() which is called in srat_parse_mem_affinity(), but
it seems to me that this function isn't called for GI nodes. Or should
SRAT also contain a normal Memory node with the same PXM as the GI?

Brice



2020-01-02 15:28:25

by Jonathan Cameron

Subject: Re: [PATCH V6 0/7] ACPI: Support Generic Initiator proximity domains

On Fri, 20 Dec 2019 22:40:18 +0100
Brice Goglin <[email protected]> wrote:

> On 18/12/2019 at 15:50, Jonathan Cameron wrote:
> > On Wed, 18 Dec 2019 12:32:06 +0100
> > Brice Goglin <[email protected]> wrote:
> >
> >> On 16/12/2019 at 16:38, Jonathan Cameron wrote:
> >>> Introduces a new type of NUMA node for cases where we want to represent
> >>> the access characteristics of a non CPU initiator of memory requests,
> >>> as these differ from all those for existing nodes containing CPUs and/or
> >>> memory.
> >>>
> >>> These Generic Initiators are presented by the node access0 class in
> >>> sysfs in the same way as a CPU. It seems likely that there will be
> >>> usecases in which the best 'CPU' is desired and Generic Initiators
> >>> should be ignored. The final few patches in this series introduced
> >>> access1 which is a new performance class in the sysfs node description
> >>> which presents only CPU to memory relationships. Test cases for this
> >>> are described below.
> >>
> >> Hello Jonathan
> >>
> >> If I want to test this with a fake GI, what are the minimal set of
> >> changes I should put in my ACPI tables? Can I just specify a dummy GI in
> >> SRAT? What handle should I use there?
> > Exactly that for a dummy GI. Also extend HMAT and SLIT for the extra
> > proximity domain / initiator.
>
>
> I couldn't get this to work (your patches on top of 5.5-rc2). I added
> the GI in SRAT, and extended HMAT and SLIT accordingly.
>
> I don't know if that's expected but I get an additional node in sysfs,
> with 0kB memory.
>
> However the HMAT table gets ignored because find_mem_target() fails in
> hmat_parse_proximity_domain(). The target should have been allocated in
> alloc_memory_target() which is called in srat_parse_mem_affinity(), but
> it seems to me that this function isn't called for GI nodes. Or should
> SRAT also contain a normal Memory node with the same PXM as the GI?
>
Hi Brice,

Yes you should see a node with 0kB memory. Same as you get for a processor
only node I believe.

srat_parse_mem_affinity shouldn't call alloc_memory_target for the GI nodes
as they don't have any memory. The HMAT table should only refer to
GI domains as initiators. Just to check, do you have them listed as
a target node? Or perhaps in some HMAT proximity entry as memory_PD?
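The failure Brice hit can be modelled with a toy version of the SRAT walk. The names alloc_memory_target()/find_mem_target() echo the kernel functions discussed above, but this is a simplified standalone model, not the kernel implementation:

```c
#include <stddef.h>

enum srat_type { SRAT_MEMORY, SRAT_GENERIC_INITIATOR };

#define MAX_PXM 8
static int have_target[MAX_PXM]; /* toy stand-in for the HMAT target list */

/* Only Memory Affinity Structures allocate a target, mirroring the fact
 * that srat_parse_mem_affinity() is never run for GI entries. */
static void alloc_memory_target(int pxm) { have_target[pxm] = 1; }
static int  find_mem_target(int pxm)     { return have_target[pxm]; }

static void parse_srat(const enum srat_type *types, const int *pxms, int n)
{
	for (int i = 0; i < n; i++)
		if (types[i] == SRAT_MEMORY)
			alloc_memory_target(pxms[i]);
}
```

So an HMAT entry that names a GI-only domain in its memory_PD field fails find_mem_target(), which matches the symptom Brice reported.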

To answer your question, SRAT should not contain a normal memory node
with the same PXM, as that would defeat the whole purpose: we would have
been able to have such a domain without Generic Initiators.

Also, just to check, x86 or arm64?

Thanks for testing this.

Jonathan


> Brice
>
>


2020-01-02 21:38:01

by Brice Goglin

Subject: Re: [PATCH V6 0/7] ACPI: Support Generic Initiator proximity domains

On 02/01/2020 at 16:27, Jonathan Cameron wrote:
>
>> However the HMAT table gets ignored because find_mem_target() fails in
>> hmat_parse_proximity_domain(). The target should have been allocated in
>> alloc_memory_target() which is called in srat_parse_mem_affinity(), but
>> it seems to me that this function isn't called for GI nodes. Or should
>> SRAT also contain a normal Memory node with the same PXM as the GI?
>>
> Hi Brice,
>
> Yes you should see a node with 0kB memory. Same as you get for a processor
> only node I believe.
>
> srat_parse_mem_affinity shouldn't call alloc_memory_target for the GI nodes
> as they don't have any memory. The hmat table should only refer to
> GI domains as initiators. Just to check, do you have them listed as
> a target node? Or perhaps in some hmat proximity entry as memory_PD?
>

Thanks, I finally got things to work. I am on x86. It's a dual-socket
machine with SubNUMA clusters (2 nodes per socket) and NVDIMMs (one
dax-kmem node per socket). Before adding a GI, initiators look like this:

node0 -> node0 and node4
node1 -> node1 and node5
node2 -> node2 and node4
node3 -> node3 and node5

I added a GI with faster access to node0, node2, node4 (first socket).

The GI node becomes an access0 initiator for node4, and node0 and node2
remain access1 initiators.

The GI node doesn't become an access0 initiator for node0 and node2, likely
because of this test:

	/*
	 * If the Address Range Structure provides a local processor pxm, link
	 * only that one. Otherwise, find the best performance attributes and
	 * register all initiators that match.
	 */
	if (target->processor_pxm != PXM_INVAL) {

I guess I should split node0-3 into separate CPU nodes and memory nodes
in SRAT?
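The effect of that test can be sketched in isolation. This is a simplified standalone model of the decision, not the actual hmat.c code; the helper name and data are made up:

```c
#define PXM_INVAL (-1)

struct target { int processor_pxm; };

/* If the Memory Proximity Domain Attributes Structure pinned a local
 * processor domain, that domain is the sole access0 initiator; otherwise
 * the best-performing initiator found in HMAT wins. Simplified model. */
static int pick_access0_initiator(const struct target *t, int best_perf_pxm)
{
	if (t->processor_pxm != PXM_INVAL)
		return t->processor_pxm; /* structure wins over perf search */
	return best_perf_pxm;
}
```

With node0 and node2 pinning processor_pxm to themselves, the GI can never win access0 for their memory, however good its HMAT numbers are.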

Brice




2020-01-03 10:10:24

by Jonathan Cameron

Subject: Re: [PATCH V6 0/7] ACPI: Support Generic Initiator proximity domains

On Thu, 2 Jan 2020 22:37:04 +0100
Brice Goglin <[email protected]> wrote:

> On 02/01/2020 at 16:27, Jonathan Cameron wrote:
> >
> >> However the HMAT table gets ignored because find_mem_target() fails in
> >> hmat_parse_proximity_domain(). The target should have been allocated in
> >> alloc_memory_target() which is called in srat_parse_mem_affinity(), but
> >> it seems to me that this function isn't called for GI nodes. Or should
> >> SRAT also contain a normal Memory node with the same PXM as the GI?
> >>
> > Hi Brice,
> >
> > Yes you should see a node with 0kB memory. Same as you get for a processor
> > only node I believe.
> >
> > srat_parse_mem_affinity shouldn't call alloc_memory_target for the GI nodes
> > as they don't have any memory. The hmat table should only refer to
> > GI domains as initiators. Just to check, do you have them listed as
> > a target node? Or perhaps in some hmat proximity entry as memory_PD?
> >
>
> Thanks, I finally got things to work. I am on x86. It's a dual-socket
> machine with SubNUMA clusters (2 nodes per socket) and NVDIMMs (one
> dax-kmem node per socket). Before adding a GI, initiators look like this:
>
> node0 -> node0 and node4
> node1 -> node1 and node5
> node2 -> node2 and node4
> node3 -> node3 and node5
>
> I added a GI with faster access to node0, node2, node4 (first socket).
>
> The GI node becomes an access0 initiator for node4, and node0 and node2
> remain access1 initiators.
>
> The GI node doesn't become access0 initiator for node0 and node2, likely
> because of this test :
>
> /*
> * If the Address Range Structure provides a local processor pxm, link
> * only that one. Otherwise, find the best performance attributes and
> * register all initiators that match.
> */
> if (target->processor_pxm != PXM_INVAL) {
>
> I guess I should split node0-3 into separate CPU nodes and memory nodes
> in SRAT?

It sounds like it's working as expected. There are a few assumptions made about
'sensible' HMAT configurations.

1) If the memory and processor are in the same domain, that should mean the
access characteristics within that domain are the best in the system.
It is possible to have a setup with very low latency access from a particular
processor but also low bandwidth, while another domain has high bandwidth but
long latency. Such systems may occur, but they are probably not going to be
for 'normal' memory the OS can just use.

2) If we have a relevant "Memory Proximity Domain Attributes Structure"
(these are entirely optional, btw; note the structure was renamed in ACPI 6.3
from "Address Range Structure" as it no longer carries any address ranges)
that indicates the memory controller for a given range of memory lies in the
proximity domain of the specified initiator, then we ignore cases where HMAT
says somewhere else is nearer via bandwidth and latency.

For case 1) I'm not sure we actually enforce it.
I think you've hit case 2).

Removing the Address Range Structures should work, or, as you say, you can
move that memory into separate memory nodes. It will be a bit of a strange
setup though. Assuming node4 is an NVDIMM, that would be closer to a
potential real system. With a suitable coherent bus (CCIX is the most familiar
to me and can do this) you might have

 _______      ________      _______
|       |    |        |    |       |
| Node0 |    | Node4  |----| Node6 |
| CPU   |----| Mem +  |----| GI    |
| Mem   |    | CCHome |----|       |
|_______|    |________|    |_______|
    |                          |
    |__________________________|

CCHome: cache-coherency directory location, to avoid the need for more
esoteric cache-coherency shortcut methods etc.

The idea being that the GI node is some big fat DB accelerator or similar
doing offloaded queries. It has a fat pipe to the NVDIMMs.

Let's ignore that. To actually justify the use of a GI-only node, you need
some more elements, as this situation could otherwise be represented by
fusing node4 and node6 and having asymmetric HMAT between Node0 and the
fused Node4.

So in conclusion, with your setup, only the NVDIMM nodes look like the
sort of memory that might be in a node nearer to a GI than the host.
>
> Brice

Thanks again for looking at this!

Jonathan


2020-01-03 12:21:34

by Brice Goglin

Subject: Re: [PATCH V6 0/7] ACPI: Support Generic Initiator proximity domains

On 03/01/2020 at 11:09, Jonathan Cameron wrote:
>
> 1) If the memory and processor are in the same domain, that should mean the
> access characteristics within that domain are the best in the system.
> It is possible to have a setup with very low latency access
> from a particular processor but also low bandwidth. Another domain may have
> high bandwidth but long latency. Such systems may occur, but they are probably
> going to not be for 'normal memory the OS can just use'.
>
> 2) If we have a relevant "Memory Proximity Domain Attributes Structure"
> Note this was renamed in acpi 6.3 from "Address Range Structure" as
> it no longer has any address ranges.
> (which are entirely optional btw) that indicates that the memory controller
> for a given memory lies in the proximity domain of the Initiator specified.
> If that happens we ignore cases where hmat says somewhere else is nearer
> via bandwidth and latency.
>
> For case 1) I'm not sure we actually enforce it.
> I think you've hit case 2).
>
> Removing the address range structures should work, or as you say you can
> move that memory into separate memory nodes.


I removed the "processor proximity domain valid" flag from the Address Range
Structure of node2, and the GI is now its access0 initiator instead of node2
itself. Looks like that confirms I was in case 2).
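For anyone reproducing this: the flag in question is bit 0 of the structure's Flags field, ACPI_HMAT_PROCESSOR_PD_VALID in the ACPICA headers. A simplified sketch of the parse-side effect of clearing it (standalone model, not the kernel code):

```c
#include <stdint.h>

#define PXM_INVAL                    (-1)
#define ACPI_HMAT_PROCESSOR_PD_VALID (1u << 0)

struct mem_target { int processor_pxm; };

/* With the flag clear, no processor domain is recorded for the target,
 * so access0 falls back to the best initiator in the HMAT perf data. */
static void parse_proximity_domain(struct mem_target *t, uint32_t flags,
				   int initiator_pxm)
{
	t->processor_pxm = (flags & ACPI_HMAT_PROCESSOR_PD_VALID)
				   ? initiator_pxm : PXM_INVAL;
}
```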

Thanks

Brice


2020-01-03 13:10:09

by Jonathan Cameron

Subject: Re: [PATCH V6 0/7] ACPI: Support Generic Initiator proximity domains

On Fri, 3 Jan 2020 13:18:59 +0100
Brice Goglin <[email protected]> wrote:

> On 03/01/2020 at 11:09, Jonathan Cameron wrote:
> >
> > 1) If the memory and processor are in the same domain, that should mean the
> > access characteristics within that domain are the best in the system.
> > It is possible to have a setup with very low latency access
> > from a particular processor but also low bandwidth. Another domain may have
> > high bandwidth but long latency. Such systems may occur, but they are probably
> > going to not be for 'normal memory the OS can just use'.
> >
> > 2) If we have a relevant "Memory Proximity Domain Attributes Structure"
> > Note this was renamed in acpi 6.3 from "Address Range Structure" as
> > it no longer has any address ranges.
> > (which are entirely optional btw) that indicates that the memory controller
> > for a given memory lies in the proximity domain of the Initiator specified.
> > If that happens we ignore cases where hmat says somewhere else is nearer
> > via bandwidth and latency.
> >
> > For case 1) I'm not sure we actually enforce it.
> > I think you've hit case 2).
> >
> > Removing the address range structures should work, or as you say you can
> > move that memory into separate memory nodes.
>
>
> I removed the "processor proximity domain valid" flag from the address
> range structure of node2, and the GI is now its access0 initiator
> instead of node2 itself. Looks like it confirms I was in case 2)
>
> Thanks
>
> Brice

Cool. I was wondering if that change would work fine.
It is a somewhat crazy setup so I didn't have an equivalent in my test set.

Sounds like all is working as expected.

Thanks,

Jonathan