2024-04-28 11:40:59

by Dragan Simic

Subject: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6

Add missing cache information to the Allwinner H6 SoC dtsi, so that
userspace tools such as lscpu(1), which read the virtual files provided
by the kernel under the /sys/devices/system/cpu directory, can display the
proper H6 cache information.
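
As a rough illustration of the sysfs interface mentioned above, the following minimal sketch reads the per-cache attribute files for one CPU. The function name `read_cpu_caches` and its `root` parameter are made up for this example; the attribute file names are the ones the kernel's cacheinfo code exposes under `cpuN/cache/indexM/`, and any of them may be missing when the kernel has no data for it.

```python
from pathlib import Path

# Attribute files found in each cpuN/cache/indexM directory.
FIELDS = (
    "level",
    "type",
    "size",
    "coherency_line_size",
    "number_of_sets",
    "ways_of_associativity",
)


def read_cpu_caches(cpu=0, root="/sys/devices/system/cpu"):
    """Return one dict per cache that sysfs reports for the given CPU.

    `root` is a parameter only so the function can be pointed at a test
    directory; it is an assumption of this sketch, not a kernel interface.
    """
    caches = []
    cache_dir = Path(root) / f"cpu{cpu}" / "cache"
    for index_dir in sorted(cache_dir.glob("index[0-9]*")):
        entry = {}
        for field in FIELDS:
            path = index_dir / field
            # Missing files are reported as None instead of raising.
            entry[field] = path.read_text().strip() if path.is_file() else None
        caches.append(entry)
    return caches
```

Calling `read_cpu_caches(0)` on a kernel without cache information in the DT would be expected to return an empty list or entries with missing fields, which is the situation this patch addresses.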

Adding the cache information to the H6 SoC dtsi also makes the following
warning message in the kernel log go away:

cacheinfo: Unable to detect cache hierarchy for CPU 0

The cache parameters for the H6 dtsi were obtained and partially derived
by hand from the cache size and layout specifications found in the following
datasheets and technical reference manuals:

- Allwinner H6 V200 datasheet, version 1.1
- ARM Cortex-A53 revision r0p3 TRM, version E

For future reference, here's a brief summary of the documentation:

- All caches employ the 64-byte cache line length
- Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
cache and 32 KB of L1 4-way, set-associative data cache
- The entire SoC has 512 KB of unified L2 16-way, set-associative cache
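
The "partially derived by hand" step can be reproduced with a short calculation: the number of sets is the cache size divided by the product of associativity and line length. The sketch below is only an illustration of that arithmetic; the helper name `cache_sets` and the `H6_CACHES` table are invented for this example, and the input figures are the ones from the summary above.

```python
def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    bytes_per_set = ways * line_bytes
    if size_bytes % bytes_per_set:
        raise ValueError("cache size is not a multiple of ways * line size")
    return size_bytes // bytes_per_set


# (size in bytes, associativity, line length in bytes), from the summary above.
H6_CACHES = {
    "L1I": (32 * 1024, 2, 64),
    "L1D": (32 * 1024, 4, 64),
    "L2": (512 * 1024, 16, 64),
}

for name, (size, ways, line) in H6_CACHES.items():
    print(f"{name}: size={size:#x} sets={cache_sets(size, ways, line)}")
```

The printed sizes and set counts correspond to the `*-cache-size` and `*-cache-sets` values used in the diff below (0x8000 with 256 and 128 sets for L1, 0x80000 with 512 sets for L2).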

Signed-off-by: Dragan Simic <[email protected]>
---
arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 37 ++++++++++++++++++++
1 file changed, 37 insertions(+)

diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
index d11e5041bae9..1a63066396e8 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
@@ -29,36 +29,73 @@ cpu0: cpu@0 {
clocks = <&ccu CLK_CPUX>;
clock-latency-ns = <244144>; /* 8 32k periods */
#cooling-cells = <2>;
+ i-cache-size = <0x8000>;
+ i-cache-line-size = <64>;
+ i-cache-sets = <256>;
+ d-cache-size = <0x8000>;
+ d-cache-line-size = <64>;
+ d-cache-sets = <128>;
+ next-level-cache = <&l2_cache>;
};

cpu1: cpu@1 {
compatible = "arm,cortex-a53";
device_type = "cpu";
reg = <1>;
enable-method = "psci";
clocks = <&ccu CLK_CPUX>;
clock-latency-ns = <244144>; /* 8 32k periods */
#cooling-cells = <2>;
+ i-cache-size = <0x8000>;
+ i-cache-line-size = <64>;
+ i-cache-sets = <256>;
+ d-cache-size = <0x8000>;
+ d-cache-line-size = <64>;
+ d-cache-sets = <128>;
+ next-level-cache = <&l2_cache>;
};

cpu2: cpu@2 {
compatible = "arm,cortex-a53";
device_type = "cpu";
reg = <2>;
enable-method = "psci";
clocks = <&ccu CLK_CPUX>;
clock-latency-ns = <244144>; /* 8 32k periods */
#cooling-cells = <2>;
+ i-cache-size = <0x8000>;
+ i-cache-line-size = <64>;
+ i-cache-sets = <256>;
+ d-cache-size = <0x8000>;
+ d-cache-line-size = <64>;
+ d-cache-sets = <128>;
+ next-level-cache = <&l2_cache>;
};

cpu3: cpu@3 {
compatible = "arm,cortex-a53";
device_type = "cpu";
reg = <3>;
enable-method = "psci";
clocks = <&ccu CLK_CPUX>;
clock-latency-ns = <244144>; /* 8 32k periods */
#cooling-cells = <2>;
+ i-cache-size = <0x8000>;
+ i-cache-line-size = <64>;
+ i-cache-sets = <256>;
+ d-cache-size = <0x8000>;
+ d-cache-line-size = <64>;
+ d-cache-sets = <128>;
+ next-level-cache = <&l2_cache>;
+ };
+
+ l2_cache: l2-cache {
+ compatible = "cache";
+ cache-level = <2>;
+ cache-unified;
+ cache-size = <0x80000>;
+ cache-line-size = <64>;
+ cache-sets = <512>;
};
};



2024-04-28 16:21:40

by Jernej Škrabec

Subject: Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6

On Sunday, 28 April 2024 at 13:40:36 GMT +2, Dragan Simic wrote:
> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper H6 cache information.
>
> Adding the cache information to the H6 SoC dtsi also makes the following
> warning message in the kernel log go away:
>
> cacheinfo: Unable to detect cache hierarchy for CPU 0
>
> The cache parameters for the H6 dtsi were obtained and partially derived
> by hand from the cache size and layout specifications found in the following
> datasheets and technical reference manuals:
>
> - Allwinner H6 V200 datasheet, version 1.1
> - ARM Cortex-A53 revision r0p3 TRM, version E
>
> For future reference, here's a brief summary of the documentation:
>
> - All caches employ the 64-byte cache line length
> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
> cache and 32 KB of L1 4-way, set-associative data cache
> - The entire SoC has 512 KB of unified L2 16-way, set-associative cache
>
> Signed-off-by: Dragan Simic <[email protected]>

Reviewed-by: Jernej Skrabec <[email protected]>

Best regards,
Jernej



2024-04-29 23:10:48

by Andre Przywara

Subject: Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6

On Sun, 28 Apr 2024 13:40:36 +0200
Dragan Simic <[email protected]> wrote:

> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper H6 cache information.
>
> Adding the cache information to the H6 SoC dtsi also makes the following
> warning message in the kernel log go away:
>
> cacheinfo: Unable to detect cache hierarchy for CPU 0
>
> The cache parameters for the H6 dtsi were obtained and partially derived
> by hand from the cache size and layout specifications found in the following
> datasheets and technical reference manuals:
>
> - Allwinner H6 V200 datasheet, version 1.1
> - ARM Cortex-A53 revision r0p3 TRM, version E
>
> For future reference, here's a brief summary of the documentation:
>
> - All caches employ the 64-byte cache line length
> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
> cache and 32 KB of L1 4-way, set-associative data cache
> - The entire SoC has 512 KB of unified L2 16-way, set-associative cache
>
> Signed-off-by: Dragan Simic <[email protected]>

I can confirm that the data below matches not only the manuals, but also
the decoding of the architectural cache type registers (CCSIDR_EL1):
L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line
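
For readers who want to repeat this kind of check, here is a minimal sketch of decoding a CCSIDR_EL1 value. It assumes the 32-bit field layout used by cores without the extended cache index feature (line size in bits [2:0], associativity in bits [12:3], number of sets in bits [27:13]). The function name and the sample values are hypothetical: the values were constructed to be consistent with the geometry listed above and are not register readouts taken from this thread.

```python
def decode_ccsidr(value):
    """Decode the geometry fields of a CCSIDR_EL1 value (32-bit layout)."""
    line_bytes = 1 << ((value & 0x7) + 4)   # LineSize = log2(bytes) - 4
    ways = ((value >> 3) & 0x3FF) + 1       # Associativity - 1
    sets = ((value >> 13) & 0x7FFF) + 1     # NumSets - 1
    return {
        "line_bytes": line_bytes,
        "ways": ways,
        "sets": sets,
        "size_bytes": line_bytes * ways * sets,
    }


# Hypothetical inputs with only the geometry fields populated.
SAMPLES = {"L1D": 0x0FE01A, "L1I": 0x1FE00A, "L2": 0x3FE07A}

for name, raw in SAMPLES.items():
    geo = decode_ccsidr(raw)
    print(f"{name}: {geo['size_bytes'] // 1024} KB: {geo['sets']} sets, "
          f"{geo['ways']} way associative, {geo['line_bytes']} bytes/line")
```

Which cache a readout describes depends on the selection made in CSSELR_EL1 beforehand, so a real check reads the register once per cache level and type.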

tinymembench results for the H6 are available here:
https://github.com/ThomasKaiser/sbc-bench/blob/master/results/26Ph.txt
and confirm the theory. Also ran it locally with similar results.

Reviewed-by: Andre Przywara <[email protected]>

Thanks,
Andre

> ---
> arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 37 ++++++++++++++++++++
> 1 file changed, 37 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
> index d11e5041bae9..1a63066396e8 100644
> --- a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
> +++ b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
> @@ -29,36 +29,73 @@ cpu0: cpu@0 {
> clocks = <&ccu CLK_CPUX>;
> clock-latency-ns = <244144>; /* 8 32k periods */
> #cooling-cells = <2>;
> + i-cache-size = <0x8000>;
> + i-cache-line-size = <64>;
> + i-cache-sets = <256>;
> + d-cache-size = <0x8000>;
> + d-cache-line-size = <64>;
> + d-cache-sets = <128>;
> + next-level-cache = <&l2_cache>;
> };
>
> cpu1: cpu@1 {
> compatible = "arm,cortex-a53";
> device_type = "cpu";
> reg = <1>;
> enable-method = "psci";
> clocks = <&ccu CLK_CPUX>;
> clock-latency-ns = <244144>; /* 8 32k periods */
> #cooling-cells = <2>;
> + i-cache-size = <0x8000>;
> + i-cache-line-size = <64>;
> + i-cache-sets = <256>;
> + d-cache-size = <0x8000>;
> + d-cache-line-size = <64>;
> + d-cache-sets = <128>;
> + next-level-cache = <&l2_cache>;
> };
>
> cpu2: cpu@2 {
> compatible = "arm,cortex-a53";
> device_type = "cpu";
> reg = <2>;
> enable-method = "psci";
> clocks = <&ccu CLK_CPUX>;
> clock-latency-ns = <244144>; /* 8 32k periods */
> #cooling-cells = <2>;
> + i-cache-size = <0x8000>;
> + i-cache-line-size = <64>;
> + i-cache-sets = <256>;
> + d-cache-size = <0x8000>;
> + d-cache-line-size = <64>;
> + d-cache-sets = <128>;
> + next-level-cache = <&l2_cache>;
> };
>
> cpu3: cpu@3 {
> compatible = "arm,cortex-a53";
> device_type = "cpu";
> reg = <3>;
> enable-method = "psci";
> clocks = <&ccu CLK_CPUX>;
> clock-latency-ns = <244144>; /* 8 32k periods */
> #cooling-cells = <2>;
> + i-cache-size = <0x8000>;
> + i-cache-line-size = <64>;
> + i-cache-sets = <256>;
> + d-cache-size = <0x8000>;
> + d-cache-line-size = <64>;
> + d-cache-sets = <128>;
> + next-level-cache = <&l2_cache>;
> + };
> +
> + l2_cache: l2-cache {
> + compatible = "cache";
> + cache-level = <2>;
> + cache-unified;
> + cache-size = <0x80000>;
> + cache-line-size = <64>;
> + cache-sets = <512>;
> };
> };
>
>


2024-04-30 00:02:01

by Dragan Simic

Subject: Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6

Hello Andre,

On 2024-04-30 01:10, Andre Przywara wrote:
> On Sun, 28 Apr 2024 13:40:36 +0200
> Dragan Simic <[email protected]> wrote:
>
>> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
>> the userspace, which includes lscpu(1) that uses the virtual files
>> provided
>> by the kernel under the /sys/devices/system/cpu directory, to display
>> the
>> proper H6 cache information.
>>
>> Adding the cache information to the H6 SoC dtsi also makes the
>> following
>> warning message in the kernel log go away:
>>
>> cacheinfo: Unable to detect cache hierarchy for CPU 0
>>
>> The cache parameters for the H6 dtsi were obtained and partially
>> derived
>> by hand from the cache size and layout specifications found in the
>> following
>> datasheets and technical reference manuals:
>>
>> - Allwinner H6 V200 datasheet, version 1.1
>> - ARM Cortex-A53 revision r0p3 TRM, version E
>>
>> For future reference, here's a brief summary of the documentation:
>>
>> - All caches employ the 64-byte cache line length
>> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative
>> instruction
>> cache and 32 KB of L1 4-way, set-associative data cache
>> - The entire SoC has 512 KB of unified L2 16-way, set-associative
>> cache
>>
>> Signed-off-by: Dragan Simic <[email protected]>
>
> I can confirm that the data below matches the manuals, but also the
> decoding of the architectural cache type registers (CCSIDR_EL1):
> L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
> L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
> L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line

Thank you very much for reviewing my patch in such a detailed way!
It's good to know that the values in the Allwinner datasheets match
with the observed reality, so to speak. :)

> tinymembench results for the H6 are available here:
> https://github.com/ThomasKaiser/sbc-bench/blob/master/results/26Ph.txt
> and confirm the theory. Also ran it locally with similar results.

Here's a quick copy & paste of the most important benchmark results
from the link above, as a quick reference for anyone reading this
thread in the future, or as a data source in case the link above
becomes inaccessible at some point in the future:

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
1024 : 0.0 ns / 0.0 ns
2048 : 0.0 ns / 0.0 ns
4096 : 0.0 ns / 0.0 ns
8192 : 0.0 ns / 0.0 ns
16384 : 0.0 ns / 0.0 ns
32768 : 0.0 ns / 0.0 ns
65536 : 3.8 ns / 6.5 ns
131072 : 5.8 ns / 9.1 ns
262144 : 6.9 ns / 10.2 ns
524288 : 7.8 ns / 11.2 ns
1048576 : 74.3 ns / 114.5 ns
2097152 : 110.5 ns / 148.1 ns
4194304 : 132.6 ns / 164.5 ns
8388608 : 144.0 ns / 172.3 ns
16777216 : 151.5 ns / 177.3 ns
33554432 : 156.3 ns / 180.7 ns
67108864 : 158.7 ns / 182.9 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
1024 : 0.0 ns / 0.0 ns
2048 : 0.0 ns / 0.0 ns
4096 : 0.0 ns / 0.0 ns
8192 : 0.0 ns / 0.0 ns
16384 : 0.0 ns / 0.0 ns
32768 : 0.0 ns / 0.0 ns
65536 : 3.8 ns / 6.5 ns
131072 : 5.8 ns / 9.1 ns
262144 : 6.9 ns / 10.2 ns
524288 : 7.8 ns / 11.2 ns
1048576 : 74.3 ns / 114.5 ns
2097152 : 110.0 ns / 147.5 ns
4194304 : 127.6 ns / 158.3 ns
8388608 : 136.4 ns / 162.2 ns
16777216 : 141.2 ns / 165.6 ns
33554432 : 143.7 ns / 168.4 ns
67108864 : 144.9 ns / 168.9 ns
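
To make explicit how these numbers "confirm the theory", the following rough sketch picks the cache boundaries out of the single random read column of the MADV_NOHUGEPAGE table above. The heuristic (largest block with no extra latency for L1D, block just before the largest latency jump for L2) and the function name are assumptions of this example, not part of tinymembench.

```python
# (block size in bytes, extra latency in ns), single random read,
# copied from the MADV_NOHUGEPAGE table above, in ascending size order.
SINGLE_READ_NS = [
    (1024, 0.0), (2048, 0.0), (4096, 0.0), (8192, 0.0),
    (16384, 0.0), (32768, 0.0), (65536, 3.8), (131072, 5.8),
    (262144, 6.9), (524288, 7.8), (1048576, 74.3), (2097152, 110.5),
    (4194304, 132.6), (8388608, 144.0), (16777216, 151.5),
    (33554432, 156.3), (67108864, 158.7),
]


def infer_cache_sizes(samples):
    """Estimate (L1D, L2) sizes from latency samples sorted by block size."""
    # L1D: the largest block that still shows no latency beyond L1.
    l1_size = max(size for size, ns in samples if ns == 0.0)
    # L2: the block size right before the biggest latency increase.
    beyond_l1 = [(size, ns) for size, ns in samples if size > l1_size]
    jump_at = max(
        range(len(beyond_l1) - 1),
        key=lambda i: beyond_l1[i + 1][1] - beyond_l1[i][1],
    )
    return l1_size, beyond_l1[jump_at][0]


l1d, l2 = infer_cache_sizes(SINGLE_READ_NS)
print(f"L1D ~ {l1d // 1024} KB, L2 ~ {l2 // 1024} KB")
```

With the data above this yields 32 KB and 512 KB, in line with the values in the patch. The heuristic is deliberately crude and would need care on systems with more cache levels or noisier measurements.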

> Reviewed-by: Andre Przywara <[email protected]>

Thanks!

>> ---
>> arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 37
>> ++++++++++++++++++++
>> 1 file changed, 37 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
>> b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
>> index d11e5041bae9..1a63066396e8 100644
>> --- a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
>> +++ b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
>> @@ -29,36 +29,73 @@ cpu0: cpu@0 {
>> clocks = <&ccu CLK_CPUX>;
>> clock-latency-ns = <244144>; /* 8 32k periods */
>> #cooling-cells = <2>;
>> + i-cache-size = <0x8000>;
>> + i-cache-line-size = <64>;
>> + i-cache-sets = <256>;
>> + d-cache-size = <0x8000>;
>> + d-cache-line-size = <64>;
>> + d-cache-sets = <128>;
>> + next-level-cache = <&l2_cache>;
>> };
>>
>> cpu1: cpu@1 {
>> compatible = "arm,cortex-a53";
>> device_type = "cpu";
>> reg = <1>;
>> enable-method = "psci";
>> clocks = <&ccu CLK_CPUX>;
>> clock-latency-ns = <244144>; /* 8 32k periods */
>> #cooling-cells = <2>;
>> + i-cache-size = <0x8000>;
>> + i-cache-line-size = <64>;
>> + i-cache-sets = <256>;
>> + d-cache-size = <0x8000>;
>> + d-cache-line-size = <64>;
>> + d-cache-sets = <128>;
>> + next-level-cache = <&l2_cache>;
>> };
>>
>> cpu2: cpu@2 {
>> compatible = "arm,cortex-a53";
>> device_type = "cpu";
>> reg = <2>;
>> enable-method = "psci";
>> clocks = <&ccu CLK_CPUX>;
>> clock-latency-ns = <244144>; /* 8 32k periods */
>> #cooling-cells = <2>;
>> + i-cache-size = <0x8000>;
>> + i-cache-line-size = <64>;
>> + i-cache-sets = <256>;
>> + d-cache-size = <0x8000>;
>> + d-cache-line-size = <64>;
>> + d-cache-sets = <128>;
>> + next-level-cache = <&l2_cache>;
>> };
>>
>> cpu3: cpu@3 {
>> compatible = "arm,cortex-a53";
>> device_type = "cpu";
>> reg = <3>;
>> enable-method = "psci";
>> clocks = <&ccu CLK_CPUX>;
>> clock-latency-ns = <244144>; /* 8 32k periods */
>> #cooling-cells = <2>;
>> + i-cache-size = <0x8000>;
>> + i-cache-line-size = <64>;
>> + i-cache-sets = <256>;
>> + d-cache-size = <0x8000>;
>> + d-cache-line-size = <64>;
>> + d-cache-sets = <128>;
>> + next-level-cache = <&l2_cache>;
>> + };
>> +
>> + l2_cache: l2-cache {
>> + compatible = "cache";
>> + cache-level = <2>;
>> + cache-unified;
>> + cache-size = <0x80000>;
>> + cache-line-size = <64>;
>> + cache-sets = <512>;
>> };
>> };

2024-04-30 10:46:45

by Andre Przywara

Subject: Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6

On Tue, 30 Apr 2024 02:01:42 +0200
Dragan Simic <[email protected]> wrote:

Hi Dragan,

> Hello Andre,
>
> On 2024-04-30 01:10, Andre Przywara wrote:
> > On Sun, 28 Apr 2024 13:40:36 +0200
> > Dragan Simic <[email protected]> wrote:
> >
> >> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> >> the userspace, which includes lscpu(1) that uses the virtual files
> >> provided
> >> by the kernel under the /sys/devices/system/cpu directory, to display
> >> the
> >> proper H6 cache information.
> >>
> >> Adding the cache information to the H6 SoC dtsi also makes the
> >> following
> >> warning message in the kernel log go away:
> >>
> >> cacheinfo: Unable to detect cache hierarchy for CPU 0
> >>
> >> The cache parameters for the H6 dtsi were obtained and partially
> >> derived
> >> by hand from the cache size and layout specifications found in the
> >> following
> >> datasheets and technical reference manuals:
> >>
> >> - Allwinner H6 V200 datasheet, version 1.1
> >> - ARM Cortex-A53 revision r0p3 TRM, version E
> >>
> >> For future reference, here's a brief summary of the documentation:
> >>
> >> - All caches employ the 64-byte cache line length
> >> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative
> >> instruction
> >> cache and 32 KB of L1 4-way, set-associative data cache
> >> - The entire SoC has 512 KB of unified L2 16-way, set-associative
> >> cache
> >>
> >> Signed-off-by: Dragan Simic <[email protected]>
> >
> > I can confirm that the data below matches the manuals, but also the
> > decoding of the architectural cache type registers (CCSIDR_EL1):
> > L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
> > L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
> > L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line
>
> Thank you very much for reviewing my patch in such a detailed way!
> It's good to know that the values in the Allwinner datasheets match
> with the observed reality, so to speak. :)

YW, and yes, I like to double check things when it comes to Allwinner
documentation ;-) And it was comparatively easy for this problem.

Out of curiosity: what triggered that patch? Trying to get rid of false
warning/error messages?
And do you plan to address the H616 as well? It's a bit more tricky there,
since there are two die revisions out: one with 256(?)KB of L2, one with
1MB(!). We know how to tell them apart, so I could provide some TF-A code
to patch that up in the DT. The kernel DT copy could go with 256KB then.

Cheers,
Andre.

2024-04-30 11:10:57

by Dragan Simic

Subject: Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6

Hello Andre,

On 2024-04-30 12:46, Andre Przywara wrote:
> On Tue, 30 Apr 2024 02:01:42 +0200
> Dragan Simic <[email protected]> wrote:
>> On 2024-04-30 01:10, Andre Przywara wrote:
>> > On Sun, 28 Apr 2024 13:40:36 +0200
>> > Dragan Simic <[email protected]> wrote:
>> >
>> >> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
>> >> the userspace, which includes lscpu(1) that uses the virtual files
>> >> provided
>> >> by the kernel under the /sys/devices/system/cpu directory, to display
>> >> the
>> >> proper H6 cache information.
>> >>
>> >> Adding the cache information to the H6 SoC dtsi also makes the
>> >> following
>> >> warning message in the kernel log go away:
>> >>
>> >> cacheinfo: Unable to detect cache hierarchy for CPU 0
>> >>
>> >> The cache parameters for the H6 dtsi were obtained and partially
>> >> derived
>> >> by hand from the cache size and layout specifications found in the
>> >> following
>> >> datasheets and technical reference manuals:
>> >>
>> >> - Allwinner H6 V200 datasheet, version 1.1
>> >> - ARM Cortex-A53 revision r0p3 TRM, version E
>> >>
>> >> For future reference, here's a brief summary of the documentation:
>> >>
>> >> - All caches employ the 64-byte cache line length
>> >> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative
>> >> instruction
>> >> cache and 32 KB of L1 4-way, set-associative data cache
>> >> - The entire SoC has 512 KB of unified L2 16-way, set-associative
>> >> cache
>> >>
>> >> Signed-off-by: Dragan Simic <[email protected]>
>> >
>> > I can confirm that the data below matches the manuals, but also the
>> > decoding of the architectural cache type registers (CCSIDR_EL1):
>> > L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
>> > L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
>> > L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line
>>
>> Thank you very much for reviewing my patch in such a detailed way!
>> It's good to know that the values in the Allwinner datasheets match
>> with the observed reality, so to speak. :)
>
> YW, and yes, I like to double check things when it comes to Allwinner
> documentation ;-) And it was comparably easy for this problem.

Double checking is always good, IMHO. :)

> Out of curiosity: what triggered that patch? Trying to get rid of false
> warning/error messages?

Yes, one of the motivators was to get rid of the false kernel warning,
and the other was to have the cache information nicely available through
lscpu(1). I already did the same for a few Rockchip SoCs, [1][2][3] so
a couple of Allwinner SoCs were the next on my mental TODO list. :)

> And do you plan to address the H616 as well? It's a bit more tricky
> there,
> since there are two die revisions out: one with 256(?)KB of L2, one
> with
> 1MB(!). We know how to tell them apart, so I could provide some TF-A
> code
> to patch that up in the DT. The kernel DT copy could go with 256KB
> then.

I have no boards based on the Allwinner H616, so it wasn't on my radar.
Though, I'd be happy to prepare and submit a similar kernel patch for
the H616, if you'd then take it further and submit a TF-A patch that
fixes the DT according to the detected die revision? Did I understand
the plan right?

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=67a6a98575974416834c2294853b3814376a7ce7
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8612169a05c5e979af033868b7a9b177e0f9fcdf
[3]
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b72633ba5cfa932405832de25d0f0a11716903b4

2024-05-01 09:35:46

by Andre Przywara

Subject: Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6

On Tue, 30 Apr 2024 13:10:41 +0200
Dragan Simic <[email protected]> wrote:

> Hello Andre,
>
> On 2024-04-30 12:46, Andre Przywara wrote:
> > On Tue, 30 Apr 2024 02:01:42 +0200
> > Dragan Simic <[email protected]> wrote:
> >> On 2024-04-30 01:10, Andre Przywara wrote:
> >> > On Sun, 28 Apr 2024 13:40:36 +0200
> >> > Dragan Simic <[email protected]> wrote:
> >> >
> >> >> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> >> >> the userspace, which includes lscpu(1) that uses the virtual files
> >> >> provided
> >> >> by the kernel under the /sys/devices/system/cpu directory, to display
> >> >> the
> >> >> proper H6 cache information.
> >> >>
> >> >> Adding the cache information to the H6 SoC dtsi also makes the
> >> >> following
> >> >> warning message in the kernel log go away:
> >> >>
> >> >> cacheinfo: Unable to detect cache hierarchy for CPU 0
> >> >>
> >> >> The cache parameters for the H6 dtsi were obtained and partially
> >> >> derived
> >> >> by hand from the cache size and layout specifications found in the
> >> >> following
> >> >> datasheets and technical reference manuals:
> >> >>
> >> >> - Allwinner H6 V200 datasheet, version 1.1
> >> >> - ARM Cortex-A53 revision r0p3 TRM, version E
> >> >>
> >> >> For future reference, here's a brief summary of the documentation:
> >> >>
> >> >> - All caches employ the 64-byte cache line length
> >> >> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative
> >> >> instruction
> >> >> cache and 32 KB of L1 4-way, set-associative data cache
> >> >> - The entire SoC has 512 KB of unified L2 16-way, set-associative
> >> >> cache
> >> >>
> >> >> Signed-off-by: Dragan Simic <[email protected]>
> >> >
> >> > I can confirm that the data below matches the manuals, but also the
> >> > decoding of the architectural cache type registers (CCSIDR_EL1):
> >> > L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
> >> > L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
> >> > L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line
> >>
> >> Thank you very much for reviewing my patch in such a detailed way!
> >> It's good to know that the values in the Allwinner datasheets match
> >> with the observed reality, so to speak. :)
> >
> > YW, and yes, I like to double check things when it comes to Allwinner
> > documentation ;-) And it was comparably easy for this problem.
>
> Double checking is always good, IMHO. :)
>
> > Out of curiosity: what triggered that patch? Trying to get rid of false
> > warning/error messages?
>
> Yes, one of the motivators was to get rid of the false kernel warning,
> and the other was to have the cache information nicely available through
> lscpu(1). I already did the same for a few Rockchip SoCs, [1][2][3] so
> a couple of Allwinner SoCs were the next on my mental TODO list. :)

Thanks for doing this!

> > And do you plan to address the H616 as well? It's a bit more tricky
> > there,
> > since there are two die revisions out: one with 256(?)KB of L2, one
> > with
> > 1MB(!). We know how to tell them apart, so I could provide some TF-A
> > code
> > to patch that up in the DT. The kernel DT copy could go with 256KB
> > then.
>
> I have no boards based on the Allwinner H616, so it wasn't on my radar.
> Though, I'd be happy to prepare and submit a similar kernel patch for
> the H616, if you'd then take it further and submit a TF-A patch that
> fixes the DT according to the detected die revision? Did I understand
> the plan right?

Yes, that was the idea. I have a working version of that TF-A patch now,
just need to figure out some details about the best way to only build this
for the H616 port.

Neither the data sheet nor the user manual mentions the cache sizes for the
H616, but I checked the CCSIDR_EL1 register readouts on both an old H616
and a new H618, and they confirm that the former has 256 KB of L2, and the
latter 1 MB. I also ran tinymembench on two boards to confirm this;
community benchmark results are available here:
https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md
The OrangePi Zero2 and OrangePi Zero3 are good examples, respectively.
Associativity and cache line size are dictated by the Arm Cortex cores,
and the L1I & L1D sizes are the same as in the other SoCs.

Cheers,
Andre

> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=67a6a98575974416834c2294853b3814376a7ce7
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8612169a05c5e979af033868b7a9b177e0f9fcdf
> [3]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b72633ba5cfa932405832de25d0f0a11716903b4


2024-05-03 09:15:41

by Dragan Simic

Subject: Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6

Hello Andre,

On 2024-05-01 11:30, Andre Przywara wrote:
> On Tue, 30 Apr 2024 13:10:41 +0200
> Dragan Simic <[email protected]> wrote:
>> On 2024-04-30 12:46, Andre Przywara wrote:
>> > On Tue, 30 Apr 2024 02:01:42 +0200
>> > Dragan Simic <[email protected]> wrote:
>> >> Thank you very much for reviewing my patch in such a detailed way!
>> >> It's good to know that the values in the Allwinner datasheets match
>> >> with the observed reality, so to speak. :)
>> >
>> > YW, and yes, I like to double check things when it comes to Allwinner
>> > documentation ;-) And it was comparably easy for this problem.
>>
>> Double checking is always good, IMHO. :)
>>
>> > Out of curiosity: what triggered that patch? Trying to get rid of false
>> > warning/error messages?
>>
>> Yes, one of the motivators was to get rid of the false kernel warning,
>> and the other was to have the cache information nicely available
>> through
>> lscpu(1). I already did the same for a few Rockchip SoCs, [1][2][3]
>> so
>> a couple of Allwinner SoCs were the next on my mental TODO list. :)
>
> Thanks for doing this!

I'm glad that you like all these patches. :)

>>> And do you plan to address the H616 as well? It's a bit more tricky
>>> there,
>>> since there are two die revisions out: one with 256(?)KB of L2, one
>>> with
>>> 1MB(!). We know how to tell them apart, so I could provide some TF-A
>>> code
>>> to patch that up in the DT. The kernel DT copy could go with 256KB
>>> then.
>>
>> I have no boards based on the Allwinner H616, so it wasn't on my
>> radar.
>> Though, I'd be happy to prepare and submit a similar kernel patch for
>> the H616, if you'd then take it further and submit a TF-A patch that
>> fixes the DT according to the detected die revision? Did I understand
>> the plan right?
>
> Yes, that was the idea. I have a working version of that TF-A patch
> now,
> just need to figure out some details about the best way to only build
> this
> for the H616 port.

Nice, the kernel patch for the H616 SoC dtsi is now on the list, [4]
please have a look. Please let me know when your follow-up TF-A patch
gets submitted upstream, so I can watch it.

> Neither the data sheet nor the user manual mention the cache sizes for
> the
> H616, but I checked the CCSIDR_EL1 register readouts on both an old
> H616
> and a new H618, and they confirm that the former has 256 KB L2, and the
> latter 1MB.

Oh wow, 1 MB of L2 cache is quite a lot for such an SoC, which is
actually very nice to see. Thumbs up for Allwinner not skimping on
the L2 cache in that H616 die revision. :)

> Also I ran tinymembench on two boards to confirm this,
> community benchmarks results are available here:
> https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md
> The OrangePi Zero2 and OrangePi Zero3 are good examples, respectively.
> Associativity and cache line size are dictated by the Arm Cortex cores,
> and the L1I & L1D sizes are the same as in the other SoCs.

I've included the most important benchmark results in the H616 SoC
dtsi patch, [4] which actually now serves as an additional reference
for the cache sizes.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=67a6a98575974416834c2294853b3814376a7ce7
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8612169a05c5e979af033868b7a9b177e0f9fcdf
[3]
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b72633ba5cfa932405832de25d0f0a11716903b4
[4]
https://lore.kernel.org/linux-sunxi/9d52e6d338a059618d894abb0764015043330c2b.1714727227.git.dsimic@manjaro.org/