2024-04-28 11:41:34

by Dragan Simic

Subject: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for A64

Add the missing cache information to the Allwinner A64 SoC dtsi, so that
userspace tools such as lscpu(1), which read the virtual files the kernel
provides under the /sys/devices/system/cpu directory, can display the
proper A64 cache information.
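
As a rough illustration of what such userspace tools do, the sketch below
reads the per-CPU cache attributes from sysfs in Python. It is not how
lscpu(1) is implemented. The function name read_cpu_caches and its
sysfs_root parameter are made up for this example; the attribute file names
are the ones the kernel exposes under cpuN/cache/indexM/.

```python
from pathlib import Path

# Attribute files exposed by the kernel for every cache index of a CPU.
FIELDS = (
    "level",
    "type",
    "size",
    "coherency_line_size",
    "number_of_sets",
    "ways_of_associativity",
)


def read_cpu_caches(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return one dict per cache index found for the given CPU.

    sysfs_root is a parameter only so that the function can be pointed
    at a test directory; it is an assumption of this sketch.
    """
    cache_dir = Path(sysfs_root) / f"cpu{cpu}" / "cache"
    if not cache_dir.is_dir():
        return []
    caches = []
    for index_dir in sorted(cache_dir.glob("index[0-9]*")):
        entry = {}
        for field in FIELDS:
            path = index_dir / field
            # Fields that the kernel could not determine are simply absent.
            if path.is_file():
                entry[field] = path.read_text().strip()
        caches.append(entry)
    return caches
```

Without the cache properties in the dtsi, fields such as size and
number_of_sets would be expected to be missing from the returned entries.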

While there, use a more self-descriptive label for the L2 cache node, which
also makes it more consistent with other SoC dtsi files.

The cache parameters for the A64 dtsi were obtained and partially derived
by hand from the cache size and layout specifications found in the following
datasheets and technical reference manuals:

- Allwinner A64 datasheet, version 1.1
- ARM Cortex-A53 revision r0p3 TRM, version E

For future reference, here's a brief summary of the documentation:

- All caches employ the 64-byte cache line length
- Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
cache and 32 KB of L1 4-way, set-associative data cache
- The entire SoC has 512 KB of unified L2 16-way, set-associative cache
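
The set counts used in the patch follow from these figures as
sets = size / (line length * ways). A minimal Python sketch of that
arithmetic, where cache_sets is a hypothetical helper name chosen for this
example:

```python
def cache_sets(size_bytes, line_size, ways):
    """Number of sets in a set-associative cache."""
    sets, remainder = divmod(size_bytes, line_size * ways)
    if remainder:
        raise ValueError("size is not a multiple of line_size * ways")
    return sets


KB = 1024

# Values taken from the summary above.
l1i_sets = cache_sets(32 * KB, 64, 2)    # i-cache-sets
l1d_sets = cache_sets(32 * KB, 64, 4)    # d-cache-sets
l2_sets = cache_sets(512 * KB, 64, 16)   # cache-sets of the L2 node

print(l1i_sets, l1d_sets, l2_sets)  # prints: 256 128 512
```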

Signed-off-by: Dragan Simic <[email protected]>
---
arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 37 ++++++++++++++++---
1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
index 57ac18738c99..86074d03afa9 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
@@ -51,49 +51,76 @@ cpu0: cpu@0 {
 			device_type = "cpu";
 			reg = <0>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};

 		cpu1: cpu@1 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <1>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};

 		cpu2: cpu@2 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <2>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};

 		cpu3: cpu@3 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <3>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};

-		L2: l2-cache {
+		l2_cache: l2-cache {
 			compatible = "cache";
 			cache-level = <2>;
 			cache-unified;
+			cache-size = <0x80000>;
+			cache-line-size = <64>;
+			cache-sets = <512>;
 		};
 	};



2024-04-28 16:19:28

by Jernej Škrabec

Subject: Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for A64

On Sunday, 28 April 2024 at 13:40:35 GMT+2, Dragan Simic wrote:
> Add missing cache information to the Allwinner A64 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper A64 cache information.
>
> While there, use a more self-descriptive label for the L2 cache node, which
> also makes it more consistent with other SoC dtsi files.
>
> The cache parameters for the A64 dtsi were obtained and partially derived
> by hand from the cache size and layout specifications found in the following
> datasheets and technical reference manuals:
>
> - Allwinner A64 datasheet, version 1.1
> - ARM Cortex-A53 revision r0p3 TRM, version E
>
> For future reference, here's a brief summary of the documentation:
>
> - All caches employ the 64-byte cache line length
> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
> cache and 32 KB of L1 4-way, set-associative data cache
> - The entire SoC has 512 KB of unified L2 16-way, set-associative cache
>
> Signed-off-by: Dragan Simic <[email protected]>

Reviewed-by: Jernej Skrabec <[email protected]>

Best regards,
Jernej



2024-04-29 10:33:38

by Andre Przywara

Subject: Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for A64

On Sun, 28 Apr 2024 13:40:35 +0200
Dragan Simic <[email protected]> wrote:

Hi,

thanks for taking care of this!

> Add missing cache information to the Allwinner A64 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper A64 cache information.
>
> While there, use a more self-descriptive label for the L2 cache node, which
> also makes it more consistent with other SoC dtsi files.
>
> The cache parameters for the A64 dtsi were obtained and partially derived
> by hand from the cache size and layout specifications found in the following
> datasheets and technical reference manuals:
>
> - Allwinner A64 datasheet, version 1.1
> - ARM Cortex-A53 revision r0p3 TRM, version E
>
> For future reference, here's a brief summary of the documentation:
>
> - All caches employ the 64-byte cache line length
> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
> cache and 32 KB of L1 4-way, set-associative data cache
> - The entire SoC has 512 KB of unified L2 16-way, set-associative cache

So that looks correct when checking the manuals, and the per-CPU
entries below are consistent with each other and with the description
above.
However, I have some level of distrust towards the Allwinner manuals
regarding the cache sizes (which are chosen by Allwinner).
So while I haven't measured this myself, nor checked the cache type
registers, tinymembench's memory latency test suggests that those sizes
are correct:
https://github.com/ssvb/tinymembench/wiki/PINE64-(Allwinner-A64)

> Signed-off-by: Dragan Simic <[email protected]>

Reviewed-by: Andre Przywara <[email protected]>

Cheers,
Andre



2024-04-29 13:53:48

by Dragan Simic

Subject: Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for A64

Hello Andre,

On 2024-04-29 12:33, Andre Przywara wrote:
> On Sun, 28 Apr 2024 13:40:35 +0200
> Dragan Simic <[email protected]> wrote:
> thanks for taking care of this!

Thank you for reviewing my patch!

>> Add missing cache information to the Allwinner A64 SoC dtsi, to allow
>> the userspace, which includes lscpu(1) that uses the virtual files provided
>> by the kernel under the /sys/devices/system/cpu directory, to display the
>> proper A64 cache information.
>>
>> While there, use a more self-descriptive label for the L2 cache node, which
>> also makes it more consistent with other SoC dtsi files.
>>
>> The cache parameters for the A64 dtsi were obtained and partially derived
>> by hand from the cache size and layout specifications found in the following
>> datasheets and technical reference manuals:
>>
>> - Allwinner A64 datasheet, version 1.1
>> - ARM Cortex-A53 revision r0p3 TRM, version E
>>
>> For future reference, here's a brief summary of the documentation:
>>
>> - All caches employ the 64-byte cache line length
>> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
>> cache and 32 KB of L1 4-way, set-associative data cache
>> - The entire SoC has 512 KB of unified L2 16-way, set-associative cache
>
> So that looks correct when checking the manuals, and the per-CPU
> entries below match both between themselves and with that description
> above.
> However I have some level of distrust towards the Allwinner manuals,
> regarding the cache sizes (which are chosen by Allwinner).

Quite frankly, I was a bit surprised to see that the A64 contains
512 KB of L2 cache. IMHO, that's quite a lot for an SoC that was
advertised primarily as a cost-effective solution.

> So while I haven't measured this myself, nor checked the cache type
> registers, tinymembench's memory latency test supports those sizes are
> correct:
> https://github.com/ssvb/tinymembench/wiki/PINE64-(Allwinner-A64)

Ah, that's a nice benchmark report. Let me copy & paste the most
relevant part of that report below, just for future reference in
case that web page becomes inaccessible at some point:

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
1024 : 0.0 ns / 0.0 ns
2048 : 0.0 ns / 0.0 ns
4096 : 0.0 ns / 0.0 ns
8192 : 0.0 ns / 0.0 ns
16384 : 0.0 ns / 0.0 ns
32768 : 0.0 ns / 0.0 ns
65536 : 5.9 ns / 10.0 ns
131072 : 9.1 ns / 14.0 ns
262144 : 10.7 ns / 15.5 ns
524288 : 12.7 ns / 17.7 ns
1048576 : 92.8 ns / 143.2 ns
2097152 : 134.9 ns / 184.4 ns
4194304 : 163.5 ns / 207.1 ns
8388608 : 178.6 ns / 217.6 ns
16777216 : 187.5 ns / 223.7 ns
33554432 : 192.8 ns / 228.0 ns
67108864 : 195.8 ns / 230.7 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
1024 : 0.0 ns / 0.0 ns
2048 : 0.0 ns / 0.0 ns
4096 : 0.0 ns / 0.0 ns
8192 : 0.0 ns / 0.0 ns
16384 : 0.0 ns / 0.0 ns
32768 : 0.0 ns / 0.0 ns
65536 : 5.9 ns / 10.0 ns
131072 : 9.1 ns / 14.0 ns
262144 : 10.7 ns / 15.6 ns
524288 : 12.6 ns / 17.8 ns
1048576 : 92.7 ns / 142.6 ns
2097152 : 134.7 ns / 184.3 ns
4194304 : 155.8 ns / 198.4 ns
8388608 : 166.4 ns / 203.8 ns
16777216 : 171.6 ns / 206.0 ns
33554432 : 174.2 ns / 206.9 ns
67108864 : 175.4 ns / 207.4 ns
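
To make the reading of these numbers explicit, here is a small Python
sketch that looks for the two latency steps in the single random read
column of the MADV_NOHUGEPAGE table above. The helper names l1_boundary and
largest_jump are invented for this example, and since the buffer sizes are
powers of two, the results only bracket the real cache sizes rather than
measure them exactly.

```python
# (buffer size in bytes, extra single-random-read latency in ns),
# copied from the MADV_NOHUGEPAGE table above.
LATENCY_NS = [
    (1024, 0.0), (2048, 0.0), (4096, 0.0), (8192, 0.0),
    (16384, 0.0), (32768, 0.0), (65536, 5.9), (131072, 9.1),
    (262144, 10.7), (524288, 12.7), (1048576, 92.8),
    (2097152, 134.9), (4194304, 163.5), (8388608, 178.6),
    (16777216, 187.5), (33554432, 192.8), (67108864, 195.8),
]


def l1_boundary(samples):
    """Largest buffer that shows no extra latency over an L1 hit."""
    return max(size for size, ns in samples if ns == 0.0)


def largest_jump(samples):
    """Buffer size right before the biggest relative latency increase,
    considering only buffers that already miss in L1."""
    nonzero = [(size, ns) for size, ns in samples if ns > 0.0]
    before, _after = max(
        zip(nonzero, nonzero[1:]),
        key=lambda pair: pair[1][1] / pair[0][1],
    )
    return before[0]


print(l1_boundary(LATENCY_NS))   # prints: 32768
print(largest_jump(LATENCY_NS))  # prints: 524288
```

Both steps line up with the 32 KB L1 data cache and the 512 KB L2 cache
described in the patch.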

>> Signed-off-by: Dragan Simic <[email protected]>
>
> Reviewed-by: Andre Przywara <[email protected]>

Thanks!
