2018-09-11 19:35:11

by Jeffrey Hugo

[permalink] [raw]
Subject: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

The type of a cache might not be specified by architectural mechanisms (ie
system registers), but its type might be specified in the PPTT. In this
case, following the PPTT specification, we should identify the cache as
the type specified by PPTT.

This fixes the following lscpu issue where only the cache type sysfs file
is missing which results in no output providing a poor user experience in
the above system configuration-
lscpu: cannot open /sys/devices/system/cpu/cpu0/cache/index3/type: No such
file or directory

Fixes: 2bd00bcd73e5 (ACPI/PPTT: Add Processor Properties Topology Table parsing)
Reported-by: Vijaya Kumar K <[email protected]>
Signed-off-by: Jeffrey Hugo <[email protected]>
---
drivers/acpi/pptt.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index d1e26cb..3c6db09 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -401,6 +401,21 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
break;
}
}
+ if ((this_leaf->type == CACHE_TYPE_NOCACHE) &&
+ (found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
+ switch (found_cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) {
+ case ACPI_PPTT_CACHE_TYPE_DATA:
+ this_leaf->type = CACHE_TYPE_DATA;
+ break;
+ case ACPI_PPTT_CACHE_TYPE_INSTR:
+ this_leaf->type = CACHE_TYPE_INST;
+ break;
+ case ACPI_PPTT_CACHE_TYPE_UNIFIED:
+ case ACPI_PPTT_CACHE_TYPE_UNIFIED_ALT:
+ this_leaf->type = CACHE_TYPE_UNIFIED;
+ break;
+ }
+ }
/*
* If the above flags are valid, and the cache type is NOCACHE
* update the cache type as well.
--
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.



2018-09-11 20:17:40

by Jeremy Linton

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

Hi Jeffrey,

(+Sudeep)

On 09/11/2018 02:32 PM, Jeffrey Hugo wrote:
> The type of a cache might not be specified by architectural mechanisms (ie
> system registers), but its type might be specified in the PPTT. In this
> case, following the PPTT specification, we should identify the cache as
> the type specified by PPTT.
>
> This fixes the following lscpu issue where only the cache type sysfs file
> is missing which results in no output providing a poor user experience in
> the above system configuration-
> lscpu: cannot open /sys/devices/system/cpu/cpu0/cache/index3/type: No such
> file or directory
>
> Fixes: 2bd00bcd73e5 (ACPI/PPTT: Add Processor Properties Topology Table parsing)
> Reported-by: Vijaya Kumar K <[email protected]>
> Signed-off-by: Jeffrey Hugo <[email protected]>
> ---
> drivers/acpi/pptt.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index d1e26cb..3c6db09 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -401,6 +401,21 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
> break;
> }
> }
> + if ((this_leaf->type == CACHE_TYPE_NOCACHE) &&
> + (found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
> + switch (found_cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) {
> + case ACPI_PPTT_CACHE_TYPE_DATA:
> + this_leaf->type = CACHE_TYPE_DATA;
> + break;
> + case ACPI_PPTT_CACHE_TYPE_INSTR:
> + this_leaf->type = CACHE_TYPE_INST;
> + break;
> + case ACPI_PPTT_CACHE_TYPE_UNIFIED:
> + case ACPI_PPTT_CACHE_TYPE_UNIFIED_ALT:
> + this_leaf->type = CACHE_TYPE_UNIFIED;
> + break;
> + }
> + }
> /*
> * If the above flags are valid, and the cache type is NOCACHE
> * update the cache type as well.
>

If you look at the next line of code following this comment its going to
update the cache type for fully populated PPTT nodes. Although with the
suggested change its only going to activate if someone completely fills
out the node and fails to set the valid flag on the cache type.

What I suspect is happening in the reported case is that the nodes in
the PPTT table are missing fields we consider to be important. Since
that data isn't being filled out anywhere else, so we leave the cache
type alone too. This has the effect of hiding sysfs nodes with
incomplete information.

Also, the lack of the DATA/INST fields is based on the assumption that
the only nodes which need their type field updated are outside of the
CPU core itself so they are pretty much guaranteed to be UNIFIED. Are
you hitting this case?


Thanks,

2018-09-11 20:39:15

by Jeffrey Hugo

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

On 9/11/2018 2:16 PM, Jeremy Linton wrote:
> Hi Jeffrey,
>
> (+Sudeep)
>
> On 09/11/2018 02:32 PM, Jeffrey Hugo wrote:
>> The type of a cache might not be specified by architectural mechanisms
>> (ie
>> system registers), but its type might be specified in the PPTT.  In this
>> case, following the PPTT specification, we should identify the cache as
>> the type specified by PPTT.
>>
>> This fixes the following lscpu issue where only the cache type sysfs file
>> is missing which results in no output providing a poor user experience in
>> the above system configuration-
>> lscpu: cannot open /sys/devices/system/cpu/cpu0/cache/index3/type: No
>> such
>> file or directory
>>
>> Fixes: 2bd00bcd73e5 (ACPI/PPTT: Add Processor Properties Topology
>> Table parsing)
>> Reported-by: Vijaya Kumar K <[email protected]>
>> Signed-off-by: Jeffrey Hugo <[email protected]>
>> ---
>>   drivers/acpi/pptt.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index d1e26cb..3c6db09 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -401,6 +401,21 @@ static void update_cache_properties(struct
>> cacheinfo *this_leaf,
>>               break;
>>           }
>>       }
>> +    if ((this_leaf->type == CACHE_TYPE_NOCACHE) &&
>> +        (found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
>> +        switch (found_cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) {
>> +        case ACPI_PPTT_CACHE_TYPE_DATA:
>> +            this_leaf->type = CACHE_TYPE_DATA;
>> +            break;
>> +        case ACPI_PPTT_CACHE_TYPE_INSTR:
>> +            this_leaf->type = CACHE_TYPE_INST;
>> +            break;
>> +        case ACPI_PPTT_CACHE_TYPE_UNIFIED:
>> +        case ACPI_PPTT_CACHE_TYPE_UNIFIED_ALT:
>> +            this_leaf->type = CACHE_TYPE_UNIFIED;
>> +            break;
>> +        }
>> +    }
>>       /*
>>        * If the above flags are valid, and the cache type is NOCACHE
>>        * update the cache type as well.
>>
>
> If you look at the next line of code following this comment its going to
> update the cache type for fully populated PPTT nodes. Although with the
> suggested change its only going to activate if someone completely fills
> out the node and fails to set the valid flag on the cache type.

Yes, however that case doesn't apply to the scenario we are concerned
about, doesn't seem to be fully following the PPTT spec, and seems odd
that Linux just assumes that a "fully specified" cache is unified.

> What I suspect is happening in the reported case is that the nodes in
> the PPTT table are missing fields we consider to be important. Since
> that data isn't being filled out anywhere else, so we leave the cache
> type alone too. This has the effect of hiding sysfs nodes with
> incomplete information.
>
> Also, the lack of the DATA/INST fields is based on the assumption that
> the only nodes which need their type field updated are outside of the
> CPU core itself so they are pretty much guaranteed to be UNIFIED. Are
> you hitting this case?
>

Yes. Without this change, we hit the lscpu error in the commit message,
and get zero output about the system. We don't even get information
about the caches which are architecturally specified or how many cpus
are present. With this change, we get what we expect out of lscpu (and
also lstopo) including the cache(s) which are not architecturally specified.

I guess I still don't understand why its important for PPTT to list, for
example, the sets/ways of a cache in all scenarios. In the case of a
"transparent" cache (implementation defined as not reported per section
D3.4.2 of the ARM ARM where the cache cannot be managed by SW), there
may not be valid values for sets/ways. I would argue its better to not
report that information, rather than provide bogus information just to
make Linux happy, which may break other OSes and provide means for which
a user to hang themselves.

However, in the case of a transparent cache, the size/type/level/write
policy/etc (whatever the firmware provider deems relevant) might be
valuable information for the OS to make scheduling decisions, and is
certainly valuable for user space utilities for cross-machine/data
center level job scheduling.

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

2018-09-11 21:26:39

by Jeremy Linton

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

Hi,

On 09/11/2018 03:38 PM, Jeffrey Hugo wrote:
> On 9/11/2018 2:16 PM, Jeremy Linton wrote:
>> Hi Jeffrey,
>>
>> (+Sudeep)
>>
>> On 09/11/2018 02:32 PM, Jeffrey Hugo wrote:
>>> The type of a cache might not be specified by architectural
>>> mechanisms (ie
>>> system registers), but its type might be specified in the PPTT.  In this
>>> case, following the PPTT specification, we should identify the cache as
>>> the type specified by PPTT.
>>>
>>> This fixes the following lscpu issue where only the cache type sysfs
>>> file
>>> is missing which results in no output providing a poor user
>>> experience in
>>> the above system configuration-
>>> lscpu: cannot open /sys/devices/system/cpu/cpu0/cache/index3/type: No
>>> such
>>> file or directory
>>>
>>> Fixes: 2bd00bcd73e5 (ACPI/PPTT: Add Processor Properties Topology
>>> Table parsing)
>>> Reported-by: Vijaya Kumar K <[email protected]>
>>> Signed-off-by: Jeffrey Hugo <[email protected]>
>>> ---
>>>   drivers/acpi/pptt.c | 15 +++++++++++++++
>>>   1 file changed, 15 insertions(+)
>>>
>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>> index d1e26cb..3c6db09 100644
>>> --- a/drivers/acpi/pptt.c
>>> +++ b/drivers/acpi/pptt.c
>>> @@ -401,6 +401,21 @@ static void update_cache_properties(struct
>>> cacheinfo *this_leaf,
>>>               break;
>>>           }
>>>       }
>>> +    if ((this_leaf->type == CACHE_TYPE_NOCACHE) &&
>>> +        (found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
>>> +        switch (found_cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) {
>>> +        case ACPI_PPTT_CACHE_TYPE_DATA:
>>> +            this_leaf->type = CACHE_TYPE_DATA;
>>> +            break;
>>> +        case ACPI_PPTT_CACHE_TYPE_INSTR:
>>> +            this_leaf->type = CACHE_TYPE_INST;
>>> +            break;
>>> +        case ACPI_PPTT_CACHE_TYPE_UNIFIED:
>>> +        case ACPI_PPTT_CACHE_TYPE_UNIFIED_ALT:
>>> +            this_leaf->type = CACHE_TYPE_UNIFIED;
>>> +            break;
>>> +        }
>>> +    }
>>>       /*
>>>        * If the above flags are valid, and the cache type is NOCACHE
>>>        * update the cache type as well.
>>>
>>
>> If you look at the next line of code following this comment its going
>> to update the cache type for fully populated PPTT nodes. Although with
>> the suggested change its only going to activate if someone completely
>> fills out the node and fails to set the valid flag on the cache type.
>
> Yes, however that case doesn't apply to the scenario we are concerned
> about, doesn't seem to be fully following the PPTT spec, and seems odd
> that Linux just assumes that a "fully specified" cache is unified.

Because, the architecturally specified ones won't be type NOCACHE?

>
>> What I suspect is happening in the reported case is that the nodes in
>> the PPTT table are missing fields we consider to be important. Since
>> that data isn't being filled out anywhere else, so we leave the cache
>> type alone too. This has the effect of hiding sysfs nodes with
>> incomplete information.
>>
>> Also, the lack of the DATA/INST fields is based on the assumption that
>> the only nodes which need their type field updated are outside of the
>> CPU core itself so they are pretty much guaranteed to be UNIFIED. Are
>> you hitting this case?
>>
>
> Yes.  Without this change, we hit the lscpu error in the commit message,
> and get zero output about the system.  We don't even get information
> about the caches which are architecturally specified or how many cpus
> are present.  With this change, we get what we expect out of lscpu (and
> also lstopo) including the cache(s) which are not architecturally
> specified.

I'm a bit surprised this changes the behavior of the architecturally
specified ones. As I mentioned above, those shouldn't be NOCACHE. We use
the level/type as a key for matching a PPTT node to an architecturally
described cache. If that is mismatched, something more fundamental is
happening. About the only case I can think of that can cause this is if
the CLIDR type fields are incorrect.

In which case I might suggest we move your switch() for the INST/DATA
into the check below that comment.

>
> I guess I still don't understand why its important for PPTT to list, for
> example, the sets/ways of a cache in all scenarios.  In the case of a
> "transparent" cache (implementation defined as not reported per section
> D3.4.2 of the ARM ARM where the cache cannot be managed by SW), there
> may not be valid values for sets/ways.  I would argue its better to not
> report that information, rather than provide bogus information just to
> make Linux happy, which may break other OSes and provide means for which
> a user to hang themselves.

This doesn't really have anything to do with the Arm/ARM's definition of
set/way as it pertains to cache maint. Its more to assist userspace
software with an understanding of the system cache architecture so it
can make intelligent decisions about loop tiling and the like.

What we want is a consistent dependable view for software to utilize. An
unreliable sysfs field is one that tends not to get used.

>
> However, in the case of a transparent cache, the size/type/level/write
> policy/etc (whatever the firmware provider deems relevant) might be
> valuable information for the OS to make scheduling decisions, and is
> certainly valuable for user space utilities for cross-machine/data ou
> center level job scheduling.
>

2018-09-12 10:38:18

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types



On 11/09/18 21:16, Jeremy Linton wrote:
> Hi Jeffrey,
>
> (+Sudeep)
>

Thanks for looping me in.

>
> If you look at the next line of code following this comment its going to
> update the cache type for fully populated PPTT nodes. Although with the
> suggested change its only going to activate if someone completely fills
> out the node and fails to set the valid flag on the cache type.
>
> What I suspect is happening in the reported case is that the nodes in
> the PPTT table are missing fields we consider to be important. Since
> that data isn't being filled out anywhere else, so we leave the cache
> type alone too. This has the effect of hiding sysfs nodes with
> incomplete information.
>
> Also, the lack of the DATA/INST fields is based on the assumption that
> the only nodes which need their type field updated are outside of the
> CPU core itself so they are pretty much guaranteed to be UNIFIED. Are
> you hitting this case?
>

Completely agree with you.

--
Regards,
Sudeep

2018-09-12 10:50:20

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types



On 11/09/18 21:38, Jeffrey Hugo wrote:
> On 9/11/2018 2:16 PM, Jeremy Linton wrote:
>> Hi Jeffrey,
>>
>> (+Sudeep)
>>

[..]

>>
>> If you look at the next line of code following this comment its going
>> to update the cache type for fully populated PPTT nodes. Although with
>> the suggested change its only going to activate if someone completely
>> fills out the node and fails to set the valid flag on the cache type.
>
> Yes, however that case doesn't apply to the scenario we are concerned
> about, doesn't seem to be fully following the PPTT spec, and seems odd
> that Linux just assumes that a "fully specified" cache is unified.
>

Agreed, but adding code that will never get used is also not so good.
Do you have systems where this is needed ?

>> What I suspect is happening in the reported case is that the nodes in
>> the PPTT table are missing fields we consider to be important. Since
>> that data isn't being filled out anywhere else, so we leave the cache
>> type alone too. This has the effect of hiding sysfs nodes with
>> incomplete information.
>>
>> Also, the lack of the DATA/INST fields is based on the assumption that
>> the only nodes which need their type field updated are outside of the
>> CPU core itself so they are pretty much guaranteed to be UNIFIED. Are
>> you hitting this case?
>>
>
> Yes.  Without this change, we hit the lscpu error in the commit message,
> and get zero output about the system.  We don't even get information
> about the caches which are architecturally specified or how many cpus
> are present.  With this change, we get what we expect out of lscpu (and
> also lstopo) including the cache(s) which are not architecturally
> specified.
>

lscpu and lstopo are so broken. They just assume everything on CPU0.
If you hotplug them out, you start seeing issues. So reading and file
that doesn't exist and then bail out on other essential info though they
are present, hmmm ...

> I guess I still don't understand why its important for PPTT to list, for
> example, the sets/ways of a cache in all scenarios.  In the case of a
> "transparent" cache (implementation defined as not reported per section
> D3.4.2 of the ARM ARM where the cache cannot be managed by SW), there
> may not be valid values for sets/ways.  I would argue its better to not
> report that information, rather than provide bogus information just to
> make Linux happy, which may break other OSes and provide means for which
> a user to hang themselves.
>

While I agree presenting wrong info is not good, but one of the reasons
the cache info is present is to provide those info for some applications
to do optimization based on that(I am told so and not sure if just type
and size will be good enough) and you seem to agree with that below.

> However, in the case of a transparent cache, the size/type/level/write
> policy/etc (whatever the firmware provider deems relevant) might be
> valuable information for the OS to make scheduling decisions, and is
> certainly valuable for user space utilities for cross-machine/data
> center level job scheduling.
>

--
Regards,
Sudeep

2018-09-12 14:43:47

by Jeffrey Hugo

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

On 9/11/2018 3:25 PM, Jeremy Linton wrote:
> Hi,
>
> On 09/11/2018 03:38 PM, Jeffrey Hugo wrote:
>> On 9/11/2018 2:16 PM, Jeremy Linton wrote:
>>> Hi Jeffrey,
>>>
>>> (+Sudeep)
>>>
>>> On 09/11/2018 02:32 PM, Jeffrey Hugo wrote:
>>>> The type of a cache might not be specified by architectural
>>>> mechanisms (ie
>>>> system registers), but its type might be specified in the PPTT.  In
>>>> this
>>>> case, following the PPTT specification, we should identify the cache as
>>>> the type specified by PPTT.
>>>>
>>>> This fixes the following lscpu issue where only the cache type sysfs
>>>> file
>>>> is missing which results in no output providing a poor user
>>>> experience in
>>>> the above system configuration-
>>>> lscpu: cannot open /sys/devices/system/cpu/cpu0/cache/index3/type:
>>>> No such
>>>> file or directory
>>>>
>>>> Fixes: 2bd00bcd73e5 (ACPI/PPTT: Add Processor Properties Topology
>>>> Table parsing)
>>>> Reported-by: Vijaya Kumar K <[email protected]>
>>>> Signed-off-by: Jeffrey Hugo <[email protected]>
>>>> ---
>>>>   drivers/acpi/pptt.c | 15 +++++++++++++++
>>>>   1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>>> index d1e26cb..3c6db09 100644
>>>> --- a/drivers/acpi/pptt.c
>>>> +++ b/drivers/acpi/pptt.c
>>>> @@ -401,6 +401,21 @@ static void update_cache_properties(struct
>>>> cacheinfo *this_leaf,
>>>>               break;
>>>>           }
>>>>       }
>>>> +    if ((this_leaf->type == CACHE_TYPE_NOCACHE) &&
>>>> +        (found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
>>>> +        switch (found_cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) {
>>>> +        case ACPI_PPTT_CACHE_TYPE_DATA:
>>>> +            this_leaf->type = CACHE_TYPE_DATA;
>>>> +            break;
>>>> +        case ACPI_PPTT_CACHE_TYPE_INSTR:
>>>> +            this_leaf->type = CACHE_TYPE_INST;
>>>> +            break;
>>>> +        case ACPI_PPTT_CACHE_TYPE_UNIFIED:
>>>> +        case ACPI_PPTT_CACHE_TYPE_UNIFIED_ALT:
>>>> +            this_leaf->type = CACHE_TYPE_UNIFIED;
>>>> +            break;
>>>> +        }
>>>> +    }
>>>>       /*
>>>>        * If the above flags are valid, and the cache type is NOCACHE
>>>>        * update the cache type as well.
>>>>
>>>
>>> If you look at the next line of code following this comment its going
>>> to update the cache type for fully populated PPTT nodes. Although
>>> with the suggested change its only going to activate if someone
>>> completely fills out the node and fails to set the valid flag on the
>>> cache type.
>>
>> Yes, however that case doesn't apply to the scenario we are concerned
>> about, doesn't seem to be fully following the PPTT spec, and seems odd
>> that Linux just assumes that a "fully specified" cache is unified.
>
> Because, the architecturally specified ones won't be type NOCACHE?

Correct. However, what if you have a NOCACHE (not architecturally
specified), that is fully described in PPTT, as a non-unified cache
(data only)? Unlikely? Maybe. Still seem possible though, therefore I
feel this assumption is suspect.

>
>>
>>> What I suspect is happening in the reported case is that the nodes in
>>> the PPTT table are missing fields we consider to be important. Since
>>> that data isn't being filled out anywhere else, so we leave the cache
>>> type alone too. This has the effect of hiding sysfs nodes with
>>> incomplete information.
>>>
>>> Also, the lack of the DATA/INST fields is based on the assumption
>>> that the only nodes which need their type field updated are outside
>>> of the CPU core itself so they are pretty much guaranteed to be
>>> UNIFIED. Are you hitting this case?
>>>
>>
>> Yes.  Without this change, we hit the lscpu error in the commit
>> message, and get zero output about the system.  We don't even get
>> information about the caches which are architecturally specified or
>> how many cpus are present.  With this change, we get what we expect
>> out of lscpu (and also lstopo) including the cache(s) which are not
>> architecturally specified.
>
> I'm a bit surprised this changes the behavior of the architecturally
> specified ones. As I mentioned above, those shouldn't be NOCACHE. We use
> the level/type as a key for matching a PPTT node to an architecturally
> described cache. If that is mismatched, something more fundamental is
> happening. About the only case I can think of that can cause this is if
> the CLIDR type fields are incorrect.
>
> In which case I might suggest we move your switch() for the INST/DATA
> into the check below that comment.

This change was not intended to impact architecturally specified ones,
however I can see how that is the case. That would be the case if the
architecturally specified type is X and PPTT indicates Y. According to
the PPTT spec, the cache should be treated as type Y then (ie the
contents of the PPTT override the architectural mechanisms).

I can move my change below the current NOCACHE handling, which would be
valid for my scenario, but it seems odd. If I do that, then the current
assumption will take priority. IE, if a cache is "fully specified" in
PPTT, but is NOCACHE, then it will be treated as unified, regardless of
what PPTT says (maybe instruction, or data only).

>
>>
>> I guess I still don't understand why its important for PPTT to list,
>> for example, the sets/ways of a cache in all scenarios.  In the case
>> of a "transparent" cache (implementation defined as not reported per
>> section D3.4.2 of the ARM ARM where the cache cannot be managed by
>> SW), there may not be valid values for sets/ways.  I would argue its
>> better to not report that information, rather than provide bogus
>> information just to make Linux happy, which may break other OSes and
>> provide means for which a user to hang themselves.
>
> This doesn't really have anything to do with the Arm/ARM's definition of
> set/way as it pertains to cache maint. Its more to assist userspace
> software with an understanding of the system cache architecture so it
> can make intelligent decisions about loop tiling and the like.
>
> What we want is a consistent dependable view for software to utilize. An
> unreliable sysfs field is one that tends not to get used.
I guess my argument is this -
The cache is not specified architecturally because it cannot be managed
by software (see the ARM ARM section I referenced).

However, its important that the user be able to "see" it, I'm not a
marketing guy, but I assume its going to be listed on the "data sheet".

Therefore, the firmware is providing some information about the cache
(via SMBIOS tables and PPTT).

The HW designers have indicated that there is no sane way to provide
sets/ways information to software, even on an informational basis (ie
not for cache maintenance, but for performance optimizations).
Therefore the firmware will not provide this information because it will
be wrong.

So, therefore, we should still be able to tell the user that a cache
exists at the relevant level, and what size it is. On the concerned
system, we cannot do that currently.

>
>>
>> However, in the case of a transparent cache, the size/type/level/write
>> policy/etc (whatever the firmware provider deems relevant) might be
>> valuable information for the OS to make scheduling decisions, and is
>> certainly valuable for user space utilities for cross-machine/data ou
>> center level job scheduling.
>>


--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

2018-09-12 14:50:07

by Jeffrey Hugo

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

On 9/12/2018 4:49 AM, Sudeep Holla wrote:
>
>
> On 11/09/18 21:38, Jeffrey Hugo wrote:
>> On 9/11/2018 2:16 PM, Jeremy Linton wrote:
>>> Hi Jeffrey,
>>>
>>> (+Sudeep)
>>>
>
> [..]
>
>>>
>>> If you look at the next line of code following this comment its going
>>> to update the cache type for fully populated PPTT nodes. Although with
>>> the suggested change its only going to activate if someone completely
>>> fills out the node and fails to set the valid flag on the cache type.
>>
>> Yes, however that case doesn't apply to the scenario we are concerned
>> about, doesn't seem to be fully following the PPTT spec, and seems odd
>> that Linux just assumes that a "fully specified" cache is unified.
>>
>
> Agreed, but adding code that will never get used is also not so good.
> Do you have systems where this is needed ?

Yes.

>
>>> What I suspect is happening in the reported case is that the nodes in
>>> the PPTT table are missing fields we consider to be important. Since
>>> that data isn't being filled out anywhere else, so we leave the cache
>>> type alone too. This has the effect of hiding sysfs nodes with
>>> incomplete information.
>>>
>>> Also, the lack of the DATA/INST fields is based on the assumption that
>>> the only nodes which need their type field updated are outside of the
>>> CPU core itself so they are pretty much guaranteed to be UNIFIED. Are
>>> you hitting this case?
>>>
>>
>> Yes.  Without this change, we hit the lscpu error in the commit message,
>> and get zero output about the system.  We don't even get information
>> about the caches which are architecturally specified or how many cpus
>> are present.  With this change, we get what we expect out of lscpu (and
>> also lstopo) including the cache(s) which are not architecturally
>> specified.
>>
>
> lscpu and lstopo are so broken. They just assume everything on CPU0.
> If you hotplug them out, you start seeing issues. So reading and file
> that doesn't exist and then bail out on other essential info though they
> are present, hmmm ...

lscpu/lstopo being broken seems orthogonal to me.
The kernel is not providing the type file because the kernel thinks the
type is NOCACHE, despite PPTT providing a type. In an ideal world, I
think the kernel should be fixed (this change), and any deficiencies or
bad assumptions in lscpu/lstopo should also be fixed.

>
>> I guess I still don't understand why its important for PPTT to list, for
>> example, the sets/ways of a cache in all scenarios.  In the case of a
>> "transparent" cache (implementation defined as not reported per section
>> D3.4.2 of the ARM ARM where the cache cannot be managed by SW), there
>> may not be valid values for sets/ways.  I would argue its better to not
>> report that information, rather than provide bogus information just to
>> make Linux happy, which may break other OSes and provide means for which
>> a user to hang themselves.
>>
>
> While I agree presenting wrong info is not good, but one of the reasons
> the cache info is present is to provide those info for some applications
> to do optimization based on that(I am told so and not sure if just type
> and size will be good enough) and you seem to agree with that below.

Yes, but if its not entirely clear, on my system, we have the
information but its not being presented. This change fixes that. I'm
willing to discuss an alternative fix, but to ignore the issue is not
viable in my opinion.

What do we need to move forward here?

>
>> However, in the case of a transparent cache, the size/type/level/write
>> policy/etc (whatever the firmware provider deems relevant) might be
>> valuable information for the OS to make scheduling decisions, and is
>> certainly valuable for user space utilities for cross-machine/data
>> center level job scheduling.
>>
>


--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

2018-09-12 15:30:35

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types



On 12/09/18 15:41, Jeffrey Hugo wrote:
> On 9/11/2018 3:25 PM, Jeremy Linton wrote:
>> Hi,
>>
>> On 09/11/2018 03:38 PM, Jeffrey Hugo wrote:
>>> On 9/11/2018 2:16 PM, Jeremy Linton wrote:
>>>> Hi Jeffrey,
>>>>
>>>> (+Sudeep)
>>>>
>>>> On 09/11/2018 02:32 PM, Jeffrey Hugo wrote:
>>>>> The type of a cache might not be specified by architectural
>>>>> mechanisms (ie
>>>>> system registers), but its type might be specified in the PPTT.  In
>>>>> this
>>>>> case, following the PPTT specification, we should identify the
>>>>> cache as
>>>>> the type specified by PPTT.
>>>>>
>>>>> This fixes the following lscpu issue where only the cache type
>>>>> sysfs file
>>>>> is missing which results in no output providing a poor user
>>>>> experience in
>>>>> the above system configuration-
>>>>> lscpu: cannot open /sys/devices/system/cpu/cpu0/cache/index3/type:
>>>>> No such
>>>>> file or directory
>>>>>
>>>>> Fixes: 2bd00bcd73e5 (ACPI/PPTT: Add Processor Properties Topology
>>>>> Table parsing)
>>>>> Reported-by: Vijaya Kumar K <[email protected]>
>>>>> Signed-off-by: Jeffrey Hugo <[email protected]>
>>>>> ---
>>>>>   drivers/acpi/pptt.c | 15 +++++++++++++++
>>>>>   1 file changed, 15 insertions(+)
>>>>>
>>>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>>>> index d1e26cb..3c6db09 100644
>>>>> --- a/drivers/acpi/pptt.c
>>>>> +++ b/drivers/acpi/pptt.c
>>>>> @@ -401,6 +401,21 @@ static void update_cache_properties(struct
>>>>> cacheinfo *this_leaf,
>>>>>               break;
>>>>>           }
>>>>>       }
>>>>> +    if ((this_leaf->type == CACHE_TYPE_NOCACHE) &&
>>>>> +        (found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
>>>>> +        switch (found_cache->attributes &
>>>>> ACPI_PPTT_MASK_CACHE_TYPE) {
>>>>> +        case ACPI_PPTT_CACHE_TYPE_DATA:
>>>>> +            this_leaf->type = CACHE_TYPE_DATA;
>>>>> +            break;
>>>>> +        case ACPI_PPTT_CACHE_TYPE_INSTR:
>>>>> +            this_leaf->type = CACHE_TYPE_INST;
>>>>> +            break;
>>>>> +        case ACPI_PPTT_CACHE_TYPE_UNIFIED:
>>>>> +        case ACPI_PPTT_CACHE_TYPE_UNIFIED_ALT:
>>>>> +            this_leaf->type = CACHE_TYPE_UNIFIED;
>>>>> +            break;
>>>>> +        }
>>>>> +    }
>>>>>       /*
>>>>>        * If the above flags are valid, and the cache type is NOCACHE
>>>>>        * update the cache type as well.
>>>>>
>>>>
>>>> If you look at the next line of code following this comment its
>>>> going to update the cache type for fully populated PPTT nodes.
>>>> Although with the suggested change its only going to activate if
>>>> someone completely fills out the node and fails to set the valid
>>>> flag on the cache type.
>>>
>>> Yes, however that case doesn't apply to the scenario we are concerned
>>> about, doesn't seem to be fully following the PPTT spec, and seems
>>> odd that Linux just assumes that a "fully specified" cache is unified.
>>
>> Because, the architecturally specified ones won't be type NOCACHE?
>
> Correct.  However, what if you have a NOCACHE (not architecturally
> specified), that is fully described in PPTT, as a non-unified cache
> (data only)?  Unlikely?  Maybe.  Still seem possible though, therefore I
> feel this assumption is suspect.
>

Yes, we have other issues if the architecturally not specified cache is
not unified irrespective of what PPTT says. So we may need to review and
see if that assumption is removed everywhere.

Until then why can't a simple change fix the issue you have:

-->8

diff --git i/drivers/acpi/pptt.c w/drivers/acpi/pptt.c
index d1e26cb599bf..f74131201f5e 100644
--- i/drivers/acpi/pptt.c
+++ w/drivers/acpi/pptt.c
@@ -406,7 +406,8 @@ static void update_cache_properties(struct cacheinfo
*this_leaf,
* update the cache type as well.
*/
if (this_leaf->type == CACHE_TYPE_NOCACHE &&
- valid_flags == PPTT_CHECKED_ATTRIBUTES)
+ (valid_flags == PPTT_CHECKED_ATTRIBUTES ||
+ found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID))
this_leaf->type = CACHE_TYPE_UNIFIED;
}

[...]

>>>
>>> Yes.  Without this change, we hit the lscpu error in the commit
>>> message, and get zero output about the system.  We don't even get
>>> information about the caches which are architecturally specified or
>>> how many cpus are present.

The above issues is purely lscpu to blame and nothing to do with kernel
change. If kernel doesn't report level 3 cache info and lscpu fails
to report level 1/2 cache info and kernel is to blame here ? That's unfair.

>>> With this change, we get what we expect
>>> out of lscpu (and also lstopo) including the cache(s) which are not
>>> architecturally specified.

Ofcourse, but we can't attribute that as kernel bug. I agree we should
not show incorrect info for cache type and that needs to be fixed but
disagree with lscpu issue as kernel issue. Kernel is not breaking any
ABI as cacheinfo itself is optional and the absence of it needs to be
dealt with.

>> I'm a bit surprised this changes the behavior of the architecturally
>> specified ones. As I mentioned above, those shouldn't be NOCACHE. We
>> use the level/type as a key for matching a PPTT node to an
>> architecturally described cache. If that is mismatched, something more
>> fundamental is happening. About the only case I can think of that can
>> cause this is if the CLIDR type fields are incorrect.
>>
>> In which case I might suggest we move your switch() for the INST/DATA
>> into the check below that comment.
>
> This change was not intended to impact architecturally specified ones,
> however I can see how that is the case.  That would be the case if the
> architecturally specified type is X and PPTT indicates Y.  According to
> the PPTT spec, the cache should be treated as type Y then (ie the
> contents of the PPTT override the architectural mechanisms).
>
> I can move my change below the current NOCACHE handling, which would be
> valid for my scenario, but it seems odd.  If I do that, then the current
> assumption will take priority.  IE, if a cache is "fully specified" in
> PPTT, but is NOCACHE, then it will be treated as unified, regardless of
> what PPTT says (maybe instruction, or data only).
>

Yes it should be that way for above mentioned reasons. You need to fix
other things if you want to support separate data and inst cache for
architecturally unspecified cache.


[...]

>
> However, its important that the user be able to "see" it, I'm not a
> marketing guy, but I assume its going to be listed on the "data sheet".
>
> Therefore, the firmware is providing some information about the cache
> (via SMBIOS tables and PPTT).
>
> The HW designers have indicated that there is no sane way to provide
> sets/ways information to software, even on an informational basis (ie
> not for cache maintenance, but for performance optimizations). Therefore
> the firmware will not provide this information because it will be wrong.
>

Fair enough.

> So, therefore, we should still be able to tell the user that a cache
> exists at the relevant level, and what size it is.  On the concerned
> system, we cannot do that currently.
>

Again agreed.

--
Regards,
Sudeep

2018-09-12 15:33:33

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types



On 12/09/18 15:48, Jeffrey Hugo wrote:
> On 9/12/2018 4:49 AM, Sudeep Holla wrote:
>>
>>
>> On 11/09/18 21:38, Jeffrey Hugo wrote:
>>> On 9/11/2018 2:16 PM, Jeremy Linton wrote:
>>>> Hi Jeffrey,
>>>>
>>>> (+Sudeep)
>>>>
>>
>> [..]
>>
>>>>
>>>> If you look at the next line of code following this comment its going
>>>> to update the cache type for fully populated PPTT nodes. Although with
>>>> the suggested change its only going to activate if someone completely
>>>> fills out the node and fails to set the valid flag on the cache type.
>>>
>>> Yes, however that case doesn't apply to the scenario we are concerned
>>> about, doesn't seem to be fully following the PPTT spec, and seems odd
>>> that Linux just assumes that a "fully specified" cache is unified.
>>>
>>
>> Agreed, but adding code that will never get used is also not so good.
>> Do you have systems where this is needed ?
>
> Yes.
>

Not exactly IMO. I was referring to architecturally not specified
separate data/inst cache.

>>
>>>> What I suspect is happening in the reported case is that the nodes in
>>>> the PPTT table are missing fields we consider to be important. Since
>>>> that data isn't being filled out anywhere else, so we leave the cache
>>>> type alone too. This has the effect of hiding sysfs nodes with
>>>> incomplete information.
>>>>
>>>> Also, the lack of the DATA/INST fields is based on the assumption that
>>>> the only nodes which need their type field updated are outside of the
>>>> CPU core itself so they are pretty much guaranteed to be UNIFIED. Are
>>>> you hitting this case?
>>>>
>>>
>>> Yes.  Without this change, we hit the lscpu error in the commit message,
>>> and get zero output about the system.  We don't even get information
>>> about the caches which are architecturally specified or how many cpus
>>> are present.  With this change, we get what we expect out of lscpu (and
>>> also lstopo) including the cache(s) which are not architecturally
>>> specified.
>>>
>>
>> lscpu and lstopo are so broken. They just assume everything on CPU0.
>> If you hotplug them out, you start seeing issues. So reading and file
>> that doesn't exist and then bail out on other essential info though they
>> are present, hmmm ...
>
> lscpu/lstopo being broken seems orthogonal to me.

Agreed.

> The kernel is not providing the type file because the kernel thinks the
> type is NOCACHE, despite PPTT providing a type.  In an ideal world, I
> think the kernel should be fixed (this change), and any deficiencies or
> bad assumptions in lscpu/lstopo should also be fixed.
>

Again agreed as mentioned in the other mail. I just didn't like the
attribution of that issue to this kernel bug.

>>
>>> I guess I still don't understand why its important for PPTT to list, for
>>> example, the sets/ways of a cache in all scenarios.  In the case of a
>>> "transparent" cache (implementation defined as not reported per section
>>> D3.4.2 of the ARM ARM where the cache cannot be managed by SW), there
>>> may not be valid values for sets/ways.  I would argue its better to not
>>> report that information, rather than provide bogus information just to
>>> make Linux happy, which may break other OSes and provide means for which
>>> a user to hang themselves.
>>>
>>
>> While I agree presenting wrong info is not good, but one of the reasons
>> the cache info is present is to provide those info for some applications
>> to do optimization based on that(I am told so and not sure if just type
>> and size will be good enough) and you seem to agree with that below.
>
> Yes, but if its not entirely clear, on my system, we have the
> information but its not being presented.  This change fixes that.  I'm
> willing to discuss an alternative fix, but to ignore the issue is not
> viable in my opinion.
>

Agreed.

> What do we need to move forward here?
>

Will the simple change I posted address the issue ? I don't want to give
an illusion that separate data/inst cache is support for architecturally
not specified caches. If it needs to be supported, then it needs to be
fixed correctly everywhere not just here.

--
Regards,
Sudeep

2018-09-12 15:40:05

by Jeremy Linton

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

Hi,


On 09/12/2018 09:41 AM, Jeffrey Hugo wrote:
> The HW designers have indicated that there is no sane way to provide
> sets/ways information to software, even on an informational basis (ie
> not for cache maintenance, but for performance optimizations). Therefore
> the firmware will not provide this information because it will be wrong.
>
> So, therefore, we should still be able to tell the user that a cache
> exists at the relevant level, and what size it is.  On the concerned
> system, we cannot do that currently.

Ok, so set the fields to zero in firmware node, and mark them valid.

That logically says that there isn't any set/way information for the
cache (which implies direct mapped).


2018-09-12 15:41:14

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types



On 12/09/18 16:27, Sudeep Holla wrote:
>
>
> On 12/09/18 15:41, Jeffrey Hugo wrote:

[...]

>>
>> Correct.  However, what if you have a NOCACHE (not architecturally
>> specified), that is fully described in PPTT, as a non-unified cache
>> (data only)?  Unlikely?  Maybe.  Still seem possible though, therefore I
>> feel this assumption is suspect.
>>
>
> Yes, we have other issues if the architecturally not specified cache is
> not unified irrespective of what PPTT says. So we may need to review and
> see if that assumption is removed everywhere.
>
> Until then why can't a simple change fix the issue you have:
>
> -->8
>
> diff --git i/drivers/acpi/pptt.c w/drivers/acpi/pptt.c
> index d1e26cb599bf..f74131201f5e 100644
> --- i/drivers/acpi/pptt.c
> +++ w/drivers/acpi/pptt.c
> @@ -406,7 +406,8 @@ static void update_cache_properties(struct cacheinfo
> *this_leaf,
> * update the cache type as well.
> */
> if (this_leaf->type == CACHE_TYPE_NOCACHE &&
> - valid_flags == PPTT_CHECKED_ATTRIBUTES)
> + (valid_flags == PPTT_CHECKED_ATTRIBUTES ||
> + found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID))

Looking at this again, if we are supporting just presence of cache type
and size(may be), then we can drop the whole valid_flags thing here.

> this_leaf->type = CACHE_TYPE_UNIFIED;
> }
>
--
Regards,
Sudeep

2018-09-12 15:58:40

by Jeffrey Hugo

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

On 9/12/2018 9:38 AM, Sudeep Holla wrote:
>
>
> On 12/09/18 16:27, Sudeep Holla wrote:
>>
>>
>> On 12/09/18 15:41, Jeffrey Hugo wrote:
>
> [...]
>
>>>
>>> Correct.  However, what if you have a NOCACHE (not architecturally
>>> specified), that is fully described in PPTT, as a non-unified cache
>>> (data only)?  Unlikely?  Maybe.  Still seem possible though, therefore I
>>> feel this assumption is suspect.
>>>
>>
>> Yes, we have other issues if the architecturally not specified cache is
>> not unified irrespective of what PPTT says. So we may need to review and
>> see if that assumption is removed everywhere.
>>
>> Until then why can't a simple change fix the issue you have:
>>
>> -->8
>>
>> diff --git i/drivers/acpi/pptt.c w/drivers/acpi/pptt.c
>> index d1e26cb599bf..f74131201f5e 100644
>> --- i/drivers/acpi/pptt.c
>> +++ w/drivers/acpi/pptt.c
>> @@ -406,7 +406,8 @@ static void update_cache_properties(struct cacheinfo
>> *this_leaf,
>> * update the cache type as well.
>> */
>> if (this_leaf->type == CACHE_TYPE_NOCACHE &&
>> - valid_flags == PPTT_CHECKED_ATTRIBUTES)
>> + (valid_flags == PPTT_CHECKED_ATTRIBUTES ||
>> + found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID))
>
> Looking at this again, if we are supporting just presence of cache type
> and size(may be), then we can drop the whole valid_flags thing here.
>
>> this_leaf->type = CACHE_TYPE_UNIFIED;
>> }
>>

Yes, this change fixes my usecase. I think it invalidates the comment,
and really, the comment should probably mention that we assume unified
type because there are other issues in supporting architecturally not
specified inst/data only caches.

Do you want a V2 with this? If so, do you want the fixes tag removed
since you seem to view this as not a bug?

I don't think I clearly understand the purpose of the valid flags,
therefore I feel as though I'm not sure if it can be dropped or not. Is
it fair to say that what the valid flags is accomplishing is identifying
if we have a sufficient level of information to support this cache? If
not, then should the cacheinfo driver not expose any sysfs information
about the cache?

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

2018-09-12 16:12:24

by Jeffrey Hugo

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

On 9/12/2018 9:39 AM, Jeremy Linton wrote:
> Hi,
>
>
> On 09/12/2018 09:41 AM, Jeffrey Hugo wrote:
>> The HW designers have indicated that there is no sane way to provide
>> sets/ways information to software, even on an informational basis (ie
>> not for cache maintenance, but for performance optimizations).
>> Therefore the firmware will not provide this information because it
>> will be wrong.
>>
>> So, therefore, we should still be able to tell the user that a cache
>> exists at the relevant level, and what size it is.  On the concerned
>> system, we cannot do that currently.
>
> Ok, so set the fields to zero in firmware node, and mark them valid.

Is that what the PPTT spec says to do?

> That logically says that there isn't any set/way information for the
> cache (which implies direct mapped).

Making inferences such as that have gotten folks into trouble when
interpreting other specs. Unfortunately I am not allowed to detail more
about this specific system, however implying that the affected cache(s)
are direct mapped is incorrect. Officially, what you have is a cache or
multiple caches at certain levels that have a specified size. You can
make no inferences about the exact behavior or implementation of the
cache beyond what FW explicitly provides. I'm not particularly a fan of
it, but its what I have to deal with.

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

2018-09-12 16:15:49

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

On Wed, Sep 12, 2018 at 09:57:14AM -0600, Jeffrey Hugo wrote:
> On 9/12/2018 9:38 AM, Sudeep Holla wrote:
> >
> >
> >On 12/09/18 16:27, Sudeep Holla wrote:
> >>
> >>
> >>On 12/09/18 15:41, Jeffrey Hugo wrote:
> >
> >[...]
> >
> >>>
> >>>Correct.? However, what if you have a NOCACHE (not architecturally
> >>>specified), that is fully described in PPTT, as a non-unified cache
> >>>(data only)?? Unlikely?? Maybe.? Still seem possible though, therefore I
> >>>feel this assumption is suspect.
> >>>
> >>
> >>Yes, we have other issues if the architecturally not specified cache is
> >>not unified irrespective of what PPTT says. So we may need to review and
> >>see if that assumption is removed everywhere.
> >>
> >>Until then why can't a simple change fix the issue you have:
> >>
> >>-->8
> >>
> >>diff --git i/drivers/acpi/pptt.c w/drivers/acpi/pptt.c
> >>index d1e26cb599bf..f74131201f5e 100644
> >>--- i/drivers/acpi/pptt.c
> >>+++ w/drivers/acpi/pptt.c
> >>@@ -406,7 +406,8 @@ static void update_cache_properties(struct cacheinfo
> >>*this_leaf,
> >> * update the cache type as well.
> >> */
> >> if (this_leaf->type == CACHE_TYPE_NOCACHE &&
> >>- valid_flags == PPTT_CHECKED_ATTRIBUTES)
> >>+ (valid_flags == PPTT_CHECKED_ATTRIBUTES ||
> >>+ found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID))
> >
> >Looking at this again, if we are supporting just presence of cache type
> >and size(may be), then we can drop the whole valid_flags thing here.
> >
> >> this_leaf->type = CACHE_TYPE_UNIFIED;
> >> }
> >>
>
> Yes, this change fixes my usecase. I think it invalidates the comment, and
> really, the comment should probably mention that we assume unified type
> because there are other issues in supporting architecturally not specified
> inst/data only caches.
>

Agreed.

> Do you want a V2 with this? If so, do you want the fixes tag removed since
> you seem to view this as not a bug?
>

Yes please, I am fine to retain fixes tag but that's my opinion.

> I don't think I clearly understand the purpose of the valid flags, therefore
> I feel as though I'm not sure if it can be dropped or not. Is it fair to
> say that what the valid flags is accomplishing is identifying if we have a
> sufficient level of information to support this cache? If not, then should
> the cacheinfo driver not expose any sysfs information about the cache?
>

I don't see the use of the flag if we *have to* support the case where
all the cache geometry is not present but just cache type (and maybe
size?) is present. If that's the case we can drop valid flags entirely.
I really don't like the idea of supporting that, but I don't have strong
reasons to defend my idea, so I am fine with that.

Further, I think in your case with NOCACHE type set, sysfs dir shouldn't
have been created at the first place ideally. I think we need something
like below to fix that.

-->8

diff --git i/drivers/base/cacheinfo.c w/drivers/base/cacheinfo.c
index 5d5b5988e88b..cf78fa6d470d 100644
--- i/drivers/base/cacheinfo.c
+++ w/drivers/base/cacheinfo.c
@@ -615,6 +615,8 @@ static int cache_add_dev(unsigned int cpu)
this_leaf = this_cpu_ci->info_list + i;
if (this_leaf->disable_sysfs)
continue;
+ if (this_leaf->type == CACHE_TYPE_NOCACHE)
+ break;
cache_groups = cache_get_attribute_groups(this_leaf);
ci_dev = cpu_device_create(parent, this_leaf, cache_groups,
"index%1u", i);

2018-09-12 16:27:41

by Jeffrey Hugo

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

On 9/12/2018 10:15 AM, Sudeep Holla wrote:
> On Wed, Sep 12, 2018 at 09:57:14AM -0600, Jeffrey Hugo wrote:
>> On 9/12/2018 9:38 AM, Sudeep Holla wrote:
>>>
>>>
>>> On 12/09/18 16:27, Sudeep Holla wrote:
>>>>
>>>>
>>>> On 12/09/18 15:41, Jeffrey Hugo wrote:
>>>
>>> [...]
>>>
>>>>>
>>>>> Correct.  However, what if you have a NOCACHE (not architecturally
>>>>> specified), that is fully described in PPTT, as a non-unified cache
>>>>> (data only)?  Unlikely?  Maybe.  Still seem possible though, therefore I
>>>>> feel this assumption is suspect.
>>>>>
>>>>
>>>> Yes, we have other issues if the architecturally not specified cache is
>>>> not unified irrespective of what PPTT says. So we may need to review and
>>>> see if that assumption is removed everywhere.
>>>>
>>>> Until then why can't a simple change fix the issue you have:
>>>>
>>>> -->8
>>>>
>>>> diff --git i/drivers/acpi/pptt.c w/drivers/acpi/pptt.c
>>>> index d1e26cb599bf..f74131201f5e 100644
>>>> --- i/drivers/acpi/pptt.c
>>>> +++ w/drivers/acpi/pptt.c
>>>> @@ -406,7 +406,8 @@ static void update_cache_properties(struct cacheinfo
>>>> *this_leaf,
>>>> * update the cache type as well.
>>>> */
>>>> if (this_leaf->type == CACHE_TYPE_NOCACHE &&
>>>> - valid_flags == PPTT_CHECKED_ATTRIBUTES)
>>>> + (valid_flags == PPTT_CHECKED_ATTRIBUTES ||
>>>> + found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID))
>>>
>>> Looking at this again, if we are supporting just presence of cache type
>>> and size(may be), then we can drop the whole valid_flags thing here.
>>>
>>>> this_leaf->type = CACHE_TYPE_UNIFIED;
>>>> }
>>>>
>>
>> Yes, this change fixes my usecase. I think it invalidates the comment, and
>> really, the comment should probably mention that we assume unified type
>> because there are other issues in supporting architecturally not specified
>> inst/data only caches.
>>
>
> Agreed.
>
>> Do you want a V2 with this? If so, do you want the fixes tag removed since
>> you seem to view this as not a bug?
>>
>
> Yes please, I am fine to retain fixes tag but that's my opinion.
>
>> I don't think I clearly understand the purpose of the valid flags, therefore
>> I feel as though I'm not sure if it can be dropped or not. Is it fair to
>> say that what the valid flags is accomplishing is identifying if we have a
>> sufficient level of information to support this cache? If not, then should
>> the cacheinfo driver not expose any sysfs information about the cache?
>>
>
> I don't see the use of the flag if we *have to* support the case where
> all the cache geometry is not present but just cache type (and maybe
> size?) is present. If that's the case we can drop valid flags entirely.
> I really don't like the idea of supporting that, but I don't have strong
> reasons to defend my idea, so I am fine with that.
>
> Further, I think in your case with NOCACHE type set, sysfs dir shouldn't
> have been created at the first place ideally. I think we need something
> like below to fix that.
>
> -->8
>
> diff --git i/drivers/base/cacheinfo.c w/drivers/base/cacheinfo.c
> index 5d5b5988e88b..cf78fa6d470d 100644
> --- i/drivers/base/cacheinfo.c
> +++ w/drivers/base/cacheinfo.c
> @@ -615,6 +615,8 @@ static int cache_add_dev(unsigned int cpu)
> this_leaf = this_cpu_ci->info_list + i;
> if (this_leaf->disable_sysfs)
> continue;
> + if (this_leaf->type == CACHE_TYPE_NOCACHE)
> + break;
> cache_groups = cache_get_attribute_groups(this_leaf);
> ci_dev = cpu_device_create(parent, this_leaf, cache_groups,
> "index%1u", i);
>

Ok, let me test this out, and send out a v2.

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

2018-09-13 05:52:31

by Brice Goglin

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

Le 12/09/2018 à 11:49, Sudeep Holla a écrit :
>
>> Yes.  Without this change, we hit the lscpu error in the commit message,
>> and get zero output about the system.  We don't even get information
>> about the caches which are architecturally specified or how many cpus
>> are present.  With this change, we get what we expect out of lscpu (and
>> also lstopo) including the cache(s) which are not architecturally
>> specified.
>>
> lscpu and lstopo are so broken. They just assume everything on CPU0.
> If you hotplug them out, you start seeing issues. So reading and file
> that doesn't exist and then bail out on other essential info though they
> are present, hmmm ...

Can you elaborate?

I am not sure cpu0 is supposed to be offlineable on Linux. There's no
"online" file in /sys/devices/system/cpu/cpu0. That's why former lstopo
doesn't like CPU0 being hotplugged out. We are actually making that case
work for another non-standard corner case. But offlining "cpu0" this is
considered "normal", somebody must add that missing "online" sysfs
attribute for "cpu0" (change
https://elixir.bootlin.com/linux/latest/source/drivers/base/cpu.c#L375).

By the way, did anybody actually see an error with lstopo when there's
no "type" attribute for L3? I can't reproduce any issue, we just skip
that specific cache entirely, but everything else appears. If you guys
want to make that "no_cache" cache appear, I'll make it a Unified cache
unless you tell me what to show :)

Brice

2018-09-13 09:39:47

by James Morse

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

Hi Brice,

On 13/09/18 06:51, Brice Goglin wrote:
> Le 12/09/2018 à 11:49, Sudeep Holla a écrit :
>>> Yes.  Without this change, we hit the lscpu error in the commit message,
>>> and get zero output about the system.  We don't even get information
>>> about the caches which are architecturally specified or how many cpus
>>> are present.  With this change, we get what we expect out of lscpu (and
>>> also lstopo) including the cache(s) which are not architecturally
>>> specified.
>>>
>> lscpu and lstopo are so broken. They just assume everything on CPU0.
>> If you hotplug them out, you start seeing issues. So reading and file
>> that doesn't exist and then bail out on other essential info though they
>> are present, hmmm ...
>
> Can you elaborate?
>
> I am not sure cpu0 is supposed to be offlineable on Linux. There's no
> "online" file in /sys/devices/system/cpu/cpu0. That's why former lstopo
> doesn't like CPU0 being hotplugged out. We are actually making that case
> work for another non-standard corner case. But offlining "cpu0" this is
> considered "normal", somebody must add that missing "online" sysfs
> attribute for "cpu0" (change
> https://elixir.bootlin.com/linux/latest/source/drivers/base/cpu.c#L375).

On x86 you can't normally offline CPU0, its something to do with certain
interrupts always being routed to CPU0, (oh, and hibernate).
You should be able to enable this behaviour with 'cpu0_hotplug' on the kernel
command line.

(Kconfig's CONFIG_BOOTPARAM_HOTPLUG_CPU0 and CONFIG_DEBUG_HOTPLUG_CPU0 are also
worth a look)


On arm64 at least, cpu0 is just like the others, and can be offlined.


Thanks,

James


> By the way, did anybody actually see an error with lstopo when there's
> no "type" attribute for L3? I can't reproduce any issue, we just skip
> that specific cache entirely, but everything else appears. If you guys
> want to make that "no_cache" cache appear, I'll make it a Unified cache
> unless you tell me what to show :)

2018-09-13 10:35:56

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

On Thu, Sep 13, 2018 at 10:39:10AM +0100, James Morse wrote:
> Hi Brice,
>
> On 13/09/18 06:51, Brice Goglin wrote:
> > Le 12/09/2018 ? 11:49, Sudeep Holla a ?crit?:
> >>> Yes.? Without this change, we hit the lscpu error in the commit message,
> >>> and get zero output about the system.? We don't even get information
> >>> about the caches which are architecturally specified or how many cpus
> >>> are present.? With this change, we get what we expect out of lscpu (and
> >>> also lstopo) including the cache(s) which are not architecturally
> >>> specified.
> >>>
> >> lscpu and lstopo are so broken. They just assume everything on CPU0.
> >> If you hotplug them out, you start seeing issues. So reading and file
> >> that doesn't exist and then bail out on other essential info though they
> >> are present, hmmm ...
> >
> > Can you elaborate?
> >
> > I am not sure cpu0 is supposed to be offlineable on Linux. There's no
> > "online" file in /sys/devices/system/cpu/cpu0. That's why former lstopo
> > doesn't like CPU0 being hotplugged out. We are actually making that case
> > work for another non-standard corner case. But offlining "cpu0" this is
> > considered "normal", somebody must add that missing "online" sysfs
> > attribute for "cpu0" (change
> > https://elixir.bootlin.com/linux/latest/source/drivers/base/cpu.c#L375).
>
> On x86 you can't normally offline CPU0, its something to do with certain
> interrupts always being routed to CPU0, (oh, and hibernate).
> You should be able to enable this behaviour with 'cpu0_hotplug' on the kernel
> command line.
>
> (Kconfig's CONFIG_BOOTPARAM_HOTPLUG_CPU0 and CONFIG_DEBUG_HOTPLUG_CPU0 are also
> worth a look)
>
> On arm64 at least, cpu0 is just like the others, and can be offlined.
>

Thanks James, for providing all the details.

To add to the issues I spotted with lscpu/lstopo around topology, it ignores
the updates to topology sibling masks when CPUs are hotplugged in and out.

We have following in lscpu:
add_summary_n(tb, _("Core(s) per socket:"),
cores_per_socket ?: desc->ncores / desc->nsockets);

Now when cores_per_socket = 1, (i.e when we don't have procfs entry),
if ncores = (ncores_max - few_cpus_hotplugged_out), core(s) per socket
will get computed as less than the actual number.

IMO lscpu should be used only when all CPUs are online and it should have
a warning when all cores are not online.

> > By the way, did anybody actually see an error with lstopo when there's
> > no "type" attribute for L3? I can't reproduce any issue, we just skip
> > that specific cache entirely, but everything else appears. If you guys
> > want to make that "no_cache" cache appear, I'll make it a Unified cache
> > unless you tell me what to show :)

IIUC, Jeffrey Hugo did see error as per his initial message:
"
This fixes the following lscpu issue where only the cache type sysfs file
is missing which results in no output providing a poor user experience in
the above system configuration.
lscpu: cannot open /sys/devices/system/cpu/cpu0/cache/index3/type: No such
file or directory
"

--
Regards,
Sudeep

[1] https://www.spinics.net/lists/arm-kernel/msg661101.html

2018-09-13 11:53:50

by Brice Goglin

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

Le 13/09/2018 à 11:35, Sudeep Holla a écrit :
> On Thu, Sep 13, 2018 at 10:39:10AM +0100, James Morse wrote:
>> Hi Brice,
>>
>> On 13/09/18 06:51, Brice Goglin wrote:
>>> Le 12/09/2018 à 11:49, Sudeep Holla a écrit :
>>>>> Yes.  Without this change, we hit the lscpu error in the commit message,
>>>>> and get zero output about the system.  We don't even get information
>>>>> about the caches which are architecturally specified or how many cpus
>>>>> are present.  With this change, we get what we expect out of lscpu (and
>>>>> also lstopo) including the cache(s) which are not architecturally
>>>>> specified.
>>>>>
>>>> lscpu and lstopo are so broken. They just assume everything on CPU0.
>>>> If you hotplug them out, you start seeing issues. So reading and file
>>>> that doesn't exist and then bail out on other essential info though they
>>>> are present, hmmm ...
>>> Can you elaborate?
>>>
>>> I am not sure cpu0 is supposed to be offlineable on Linux. There's no
>>> "online" file in /sys/devices/system/cpu/cpu0. That's why former lstopo
>>> doesn't like CPU0 being hotplugged out. We are actually making that case
>>> work for another non-standard corner case. But offlining "cpu0" this is
>>> considered "normal", somebody must add that missing "online" sysfs
>>> attribute for "cpu0" (change
>>> https://elixir.bootlin.com/linux/latest/source/drivers/base/cpu.c#L375).
>> On x86 you can't normally offline CPU0, its something to do with certain
>> interrupts always being routed to CPU0, (oh, and hibernate).
>> You should be able to enable this behaviour with 'cpu0_hotplug' on the kernel
>> command line.
>>
>> (Kconfig's CONFIG_BOOTPARAM_HOTPLUG_CPU0 and CONFIG_DEBUG_HOTPLUG_CPU0 are also
>> worth a look)
>>
>> On arm64 at least, cpu0 is just like the others, and can be offlined.
>>
> Thanks James, for providing all the details.
>
> To add to the issues I spotted with lscpu/lstopo around topology, it ignores
> the updates to topology sibling masks when CPUs are hotplugged in and out.
>
> We have following in lscpu:
> add_summary_n(tb, _("Core(s) per socket:"),
> cores_per_socket ?: desc->ncores / desc->nsockets);
>
> Now when cores_per_socket = 1, (i.e when we don't have procfs entry),
> if ncores = (ncores_max - few_cpus_hotplugged_out), core(s) per socket
> will get computed as less than the actual number.
>
> IMO lscpu should be used only when all CPUs are online and it should have
> a warning when all cores are not online.
>
>>> By the way, did anybody actually see an error with lstopo when there's
>>> no "type" attribute for L3? I can't reproduce any issue, we just skip
>>> that specific cache entirely, but everything else appears. If you guys
>>> want to make that "no_cache" cache appear, I'll make it a Unified cache
>>> unless you tell me what to show :)
> IIUC, Jeffrey Hugo did see error as per his initial message:
> "
> This fixes the following lscpu issue where only the cache type sysfs file
> is missing which results in no output providing a poor user experience in
> the above system configuration.
> lscpu: cannot open /sys/devices/system/cpu/cpu0/cache/index3/type: No such
> file or directory
> "
>

I don't know about lscpu (it's a different project), but lstopo
shouldn't have any such problem.

If you see an issue with lstopo, I'd be interesting in getting the
tarball generated by hwloc-gather-topology (it dumps useful files from
procfs and sysfs so that we may debug offline).

Brice


2018-09-13 15:11:53

by Jeffrey Hugo

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types

On 9/13/2018 5:53 AM, Brice Goglin wrote:
> Le 13/09/2018 à 11:35, Sudeep Holla a écrit :
>> On Thu, Sep 13, 2018 at 10:39:10AM +0100, James Morse wrote:
>>> Hi Brice,
>>>
>>> On 13/09/18 06:51, Brice Goglin wrote:
>>>> Le 12/09/2018 à 11:49, Sudeep Holla a écrit :
>>>>>> Yes.  Without this change, we hit the lscpu error in the commit message,
>>>>>> and get zero output about the system.  We don't even get information
>>>>>> about the caches which are architecturally specified or how many cpus
>>>>>> are present.  With this change, we get what we expect out of lscpu (and
>>>>>> also lstopo) including the cache(s) which are not architecturally
>>>>>> specified.
>>>>>>
>>>>> lscpu and lstopo are so broken. They just assume everything on CPU0.
>>>>> If you hotplug them out, you start seeing issues. So reading and file
>>>>> that doesn't exist and then bail out on other essential info though they
>>>>> are present, hmmm ...
>>>> Can you elaborate?
>>>>
>>>> I am not sure cpu0 is supposed to be offlineable on Linux. There's no
>>>> "online" file in /sys/devices/system/cpu/cpu0. That's why former lstopo
>>>> doesn't like CPU0 being hotplugged out. We are actually making that case
>>>> work for another non-standard corner case. But offlining "cpu0" this is
>>>> considered "normal", somebody must add that missing "online" sysfs
>>>> attribute for "cpu0" (change
>>>> https://elixir.bootlin.com/linux/latest/source/drivers/base/cpu.c#L375).
>>> On x86 you can't normally offline CPU0, its something to do with certain
>>> interrupts always being routed to CPU0, (oh, and hibernate).
>>> You should be able to enable this behaviour with 'cpu0_hotplug' on the kernel
>>> command line.
>>>
>>> (Kconfig's CONFIG_BOOTPARAM_HOTPLUG_CPU0 and CONFIG_DEBUG_HOTPLUG_CPU0 are also
>>> worth a look)
>>>
>>> On arm64 at least, cpu0 is just like the others, and can be offlined.
>>>
>> Thanks James, for providing all the details.
>>
>> To add to the issues I spotted with lscpu/lstopo around topology, it ignores
>> the updates to topology sibling masks when CPUs are hotplugged in and out.
>>
>> We have following in lscpu:
>> add_summary_n(tb, _("Core(s) per socket:"),
>> cores_per_socket ?: desc->ncores / desc->nsockets);
>>
>> Now when cores_per_socket = 1, (i.e when we don't have procfs entry),
>> if ncores = (ncores_max - few_cpus_hotplugged_out), core(s) per socket
>> will get computed as less than the actual number.
>>
>> IMO lscpu should be used only when all CPUs are online and it should have
>> a warning when all cores are not online.
>>
>>>> By the way, did anybody actually see an error with lstopo when there's
>>>> no "type" attribute for L3? I can't reproduce any issue, we just skip
>>>> that specific cache entirely, but everything else appears. If you guys
>>>> want to make that "no_cache" cache appear, I'll make it a Unified cache
>>>> unless you tell me what to show :)
>> IIUC, Jeffrey Hugo did see error as per his initial message:
>> "
>> This fixes the following lscpu issue where only the cache type sysfs file
>> is missing which results in no output providing a poor user experience in
>> the above system configuration.
>> lscpu: cannot open /sys/devices/system/cpu/cpu0/cache/index3/type: No such
>> file or directory
>> "
>>
>
> I don't know about lscpu (it's a different project), but lstopo
> shouldn't have any such problem.
>
> If you see an issue with lstopo, I'd be interesting in getting the
> tarball generated by hwloc-gather-topology (it dumps useful files from
> procfs and sysfs so that we may debug offline).

No error was reported with lstopo, but we don't see the cache as
expected. Fixing the type results in the expected lstopo output. This
seems consistent with your expectations.

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

2018-09-13 15:17:58

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH] ACPI/PPTT: Handle architecturally unknown cache types



On 13/09/18 12:53, Brice Goglin wrote:
> Le 13/09/2018 à 11:35, Sudeep Holla a écrit :


>>> On 13/09/18 06:51, Brice Goglin wrote:

[...]

>>>> By the way, did anybody actually see an error with lstopo when there's
>>>> no "type" attribute for L3? I can't reproduce any issue, we just skip
>>>> that specific cache entirely, but everything else appears. If you guys
>>>> want to make that "no_cache" cache appear, I'll make it a Unified cache
>>>> unless you tell me what to show :)
>> IIUC, Jeffrey Hugo did see error as per his initial message:
>> "
>> This fixes the following lscpu issue where only the cache type sysfs file
>> is missing which results in no output providing a poor user experience in
>> the above system configuration.
>> lscpu: cannot open /sys/devices/system/cpu/cpu0/cache/index3/type: No such
>> file or directory
>> "
>>
>
> I don't know about lscpu (it's a different project), but lstopo
> shouldn't have any such problem.
>

Ah OK, I have only looked at lscpu source. Sorry for that, I assumed
they are based on same/similar code base. Thanks for letting me know
they are not.

> If you see an issue with lstopo, I'd be interesting in getting the
> tarball generated by hwloc-gather-topology (it dumps useful files from
> procfs and sysfs so that we may debug offline).

Thanks for the information.

--
Regards,
Sudeep