2018-11-05 11:58:31

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH 0/7] arm64: capabilities: Optimize checking and enabling

We maintain two separate tables (i.e, arm64_features and arm64_errata) of
struct arm64_cpu_capabilities which decide the capabilities of the system.
We iterate over the two tables for detecting/verifying/enabling the capabilities.
e.g, this_cpu_has_cap() needs to iterate over the two tables one by one to
find the "capability" structure corresponding to the cap number and then
check it on the CPU.

Also, we enable all the non-boot scoped capabilities after all the SMP cpus
are brought up by the kernel, using stop_machine() for each available
capability. We could batch all the "enabling" activity to a single
stop_machine() callback. But that implies you need a way to map
a given capability number to the corresponding capability entry
to finish the operation quickly.

So we need a quicker way to access the entry for a given capability.
We have two choices :

1) Unify both the tables to a static/dynamic sorted entry based on
the capability number. This has the following drawbacks :
a) The entries must be unique. i.e, no duplicate entries for a
capability.
b) Looses the separation of "features" vs. "errata" classification
c) Statically sorting the list is error prone. Runtime sorting the
array means more time for booting.

2) Keep an array of pointers to the capability sorted at boot time
based on the capability.
a) As for (1), the entries must be unique for a capability.

This series implements (2) and uses the new list for optimizing the
operations on the entries. As a prepatory step, we remove the
duplicate entries for the same capabilities (Patch 1-3).


Suzuki K Poulose (7):
arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE
arm64: capabilities: Merge duplicate Cavium erratum entries
arm64: capabilities: Merge duplicate entries for Qualcomm erratum 1003
arm64: capabilities: Speed up capability lookup
arm64: capabilities: Optimize this_cpu_has_cap
arm64: capabilities: Use linear array for detection and verification
arm64: capabilities: Batch cpu_enable callbacks

arch/arm64/include/asm/cpufeature.h | 3 +
arch/arm64/include/asm/cputype.h | 2 +
arch/arm64/kernel/cpu_errata.c | 94 ++++++++++----------
arch/arm64/kernel/cpufeature.c | 165 ++++++++++++++++++++----------------
4 files changed, 145 insertions(+), 119 deletions(-)

--
2.7.4



2018-11-05 11:56:35

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH 7/7] arm64: capabilities: Batch cpu_enable callbacks

We use a stop_machine call for each available capability to
enable it on all the CPUs available at boot time. Instead
we could batch the cpu_enable callbacks to a single stop_machine()
call to save us some time.

Cc: Vladimir Murzin <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/cpufeature.h | 3 ++
arch/arm64/kernel/cpufeature.c | 70 +++++++++++++++++++++++--------------
2 files changed, 47 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 7e2ec64..0a15e2c 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -357,6 +357,9 @@ extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
extern struct static_key_false arm64_const_caps_ready;

+#define for_each_available_cap(cap) \
+ for_each_set_bit(cap, cpu_hwcaps, ARM64_NCAPS)
+
bool this_cpu_has_cap(unsigned int cap);

static inline bool cpu_have_feature(unsigned int num)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index fff430d..9a1cccc 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1527,11 +1527,27 @@ static void update_cpu_capabilities(u16 scope_mask)
}
}

-static int __enable_cpu_capability(void *arg)
+/*
+ * Enable all the available capabilities on this CPU. The capabilities
+ * with BOOT_CPU scope are handled separately and hence skipped here.
+ */
+static int cpu_enable_non_boot_scope_capabilities(void *__unused)
{
- const struct arm64_cpu_capabilities *cap = arg;
+ int i;
+ u16 non_boot_scope = SCOPE_ALL & ~SCOPE_BOOT_CPU;

- cap->cpu_enable(cap);
+ for_each_available_cap(i) {
+ const struct arm64_cpu_capabilities *cap = cpu_hwcaps_ptrs[i];
+
+ if (WARN_ON(!cap))
+ continue;
+
+ if (!(cap->type & non_boot_scope))
+ continue;
+
+ if (cap->cpu_enable)
+ cap->cpu_enable(cap);
+ }
return 0;
}

@@ -1539,21 +1555,29 @@ static int __enable_cpu_capability(void *arg)
* Run through the enabled capabilities and enable() it on all active
* CPUs
*/
-static void __init
-__enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
- u16 scope_mask)
+static void __init enable_cpu_capabilities(u16 scope_mask)
{
+ int i;
+ const struct arm64_cpu_capabilities *caps;
+ bool boot_scope;
+
scope_mask &= ARM64_CPUCAP_SCOPE_MASK;
- for (; caps->matches; caps++) {
- unsigned int num = caps->capability;
+ boot_scope = !!(scope_mask & SCOPE_BOOT_CPU);

- if (!(caps->type & scope_mask) || !cpus_have_cap(num))
+ for (i = 0; i < ARM64_NCAPS; i++) {
+ unsigned int num;
+
+ caps = cpu_hwcaps_ptrs[i];
+ if (!caps || !(caps->type & scope_mask))
+ continue;
+ num = caps->capability;
+ if (!cpus_have_cap(num))
continue;

/* Ensure cpus_have_const_cap(num) works */
static_branch_enable(&cpu_hwcap_keys[num]);

- if (caps->cpu_enable) {
+ if (boot_scope && caps->cpu_enable)
/*
* Capabilities with SCOPE_BOOT_CPU scope are finalised
* before any secondary CPU boots. Thus, each secondary
@@ -1562,25 +1586,19 @@ __enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
* the boot CPU, for which the capability must be
* enabled here. This approach avoids costly
* stop_machine() calls for this case.
- *
- * Otherwise, use stop_machine() as it schedules the
- * work allowing us to modify PSTATE, instead of
- * on_each_cpu() which uses an IPI, giving us a PSTATE
- * that disappears when we return.
*/
- if (scope_mask & SCOPE_BOOT_CPU)
- caps->cpu_enable(caps);
- else
- stop_machine(__enable_cpu_capability,
- (void *)caps, cpu_online_mask);
- }
+ caps->cpu_enable(caps);
}
-}

-static void __init enable_cpu_capabilities(u16 scope_mask)
-{
- __enable_cpu_capabilities(arm64_errata, scope_mask);
- __enable_cpu_capabilities(arm64_features, scope_mask);
+ /*
+ * For all non-boot scope capabilities, use stop_machine()
+ * as it schedules the work allowing us to modify PSTATE,
+ * instead of on_each_cpu() which uses an IPI, giving us a
+ * PSTATE that disappears when we return.
+ */
+ if (!boot_scope)
+ stop_machine(cpu_enable_non_boot_scope_capabilities,
+ NULL, cpu_online_mask);
}

/*
--
2.7.4


2018-11-05 11:56:41

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH 6/7] arm64: capabilities: Use linear array for detection and verification

Use the sorted list of capability entries for the detection and
verification.

Cc: Vladimir Murzin <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/kernel/cpufeature.c | 42 +++++++++++++++++-------------------------
1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 96e687e..fff430d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1508,28 +1508,25 @@ static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
cap_set_elf_hwcap(hwcaps);
}

-static void __update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
- u16 scope_mask, const char *info)
+static void update_cpu_capabilities(u16 scope_mask)
{
+ int i;
+ const struct arm64_cpu_capabilities *caps;
+
scope_mask &= ARM64_CPUCAP_SCOPE_MASK;
- for (; caps->matches; caps++) {
- if (!(caps->type & scope_mask) ||
+ for (i = 0; i < ARM64_NCAPS; i++) {
+ caps = cpu_hwcaps_ptrs[i];
+ if (!caps || !(caps->type & scope_mask) ||
+ cpus_have_cap(caps->capability) ||
!caps->matches(caps, cpucap_default_scope(caps)))
continue;

- if (!cpus_have_cap(caps->capability) && caps->desc)
- pr_info("%s %s\n", info, caps->desc);
+ if (caps->desc)
+ pr_info("detected: %s\n", caps->desc);
cpus_set_cap(caps->capability);
}
}

-static void update_cpu_capabilities(u16 scope_mask)
-{
- __update_cpu_capabilities(arm64_errata, scope_mask,
- "enabling workaround for");
- __update_cpu_capabilities(arm64_features, scope_mask, "detected:");
-}
-
static int __enable_cpu_capability(void *arg)
{
const struct arm64_cpu_capabilities *cap = arg;
@@ -1593,16 +1590,17 @@ static void __init enable_cpu_capabilities(u16 scope_mask)
*
* Returns "false" on conflicts.
*/
-static bool
-__verify_local_cpu_caps(const struct arm64_cpu_capabilities *caps,
- u16 scope_mask)
+static bool verify_local_cpu_caps(u16 scope_mask)
{
+ int i;
bool cpu_has_cap, system_has_cap;
+ const struct arm64_cpu_capabilities *caps;

scope_mask &= ARM64_CPUCAP_SCOPE_MASK;

- for (; caps->matches; caps++) {
- if (!(caps->type & scope_mask))
+ for (i = 0; i < ARM64_NCAPS; i++) {
+ caps = cpu_hwcaps_ptrs[i];
+ if (!caps || !(caps->type & scope_mask))
continue;

cpu_has_cap = caps->matches(caps, SCOPE_LOCAL_CPU);
@@ -1633,7 +1631,7 @@ __verify_local_cpu_caps(const struct arm64_cpu_capabilities *caps,
}
}

- if (caps->matches) {
+ if (i < ARM64_NCAPS) {
pr_crit("CPU%d: Detected conflict for capability %d (%s), System: %d, CPU: %d\n",
smp_processor_id(), caps->capability,
caps->desc, system_has_cap, cpu_has_cap);
@@ -1643,12 +1641,6 @@ __verify_local_cpu_caps(const struct arm64_cpu_capabilities *caps,
return true;
}

-static bool verify_local_cpu_caps(u16 scope_mask)
-{
- return __verify_local_cpu_caps(arm64_errata, scope_mask) &&
- __verify_local_cpu_caps(arm64_features, scope_mask);
-}
-
/*
* Check for CPU features that are used in early boot
* based on the Boot CPU value.
--
2.7.4


2018-11-05 11:56:47

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH 5/7] arm64: capabilities: Optimize this_cpu_has_cap

Make use of the sorted capability list to access the capability
entry in this_cpu_has_cap() to avoid iterating over the two
tables.

Cc: Vladimir Murzin <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/kernel/cpufeature.c | 31 +++++++++----------------------
1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 7e9ebbe..96e687e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1508,25 +1508,6 @@ static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
cap_set_elf_hwcap(hwcaps);
}

-/*
- * Check if the current CPU has a given feature capability.
- * Should be called from non-preemptible context.
- */
-static bool __this_cpu_has_cap(const struct arm64_cpu_capabilities *cap_array,
- unsigned int cap)
-{
- const struct arm64_cpu_capabilities *caps;
-
- if (WARN_ON(preemptible()))
- return false;
-
- for (caps = cap_array; caps->matches; caps++)
- if (caps->capability == cap)
- return caps->matches(caps, SCOPE_LOCAL_CPU);
-
- return false;
-}
-
static void __update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
u16 scope_mask, const char *info)
{
@@ -1776,10 +1757,16 @@ static void __init mark_const_caps_ready(void)
static_branch_enable(&arm64_const_caps_ready);
}

-bool this_cpu_has_cap(unsigned int cap)
+bool this_cpu_has_cap(unsigned int n)
{
- return (__this_cpu_has_cap(arm64_features, cap) ||
- __this_cpu_has_cap(arm64_errata, cap));
+ if (!WARN_ON(preemptible()) && n < ARM64_NCAPS) {
+ const struct arm64_cpu_capabilities *cap = cpu_hwcaps_ptrs[n];
+
+ if (cap)
+ return cap->matches(cap, SCOPE_LOCAL_CPU);
+ }
+
+ return false;
}

static void __init setup_system_capabilities(void)
--
2.7.4


2018-11-05 11:56:52

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH 4/7] arm64: capabilities: Speed up capability lookup

We maintain two separate tables of capabilities, errata and features,
which decide the system capabilities. We iterate over each of these
tables for various operations (e.g, detection, verification etc.).
We do not have a way to map a system "capability" to its entry,
(i.e, cap -> struct arm64_cpu_capabilities) which is needed for
this_cpu_has_cap(). So we iterate over the table one by one to
find the entry and then do the operation. Also, this prevents
us from optimizing the way we "enable" the capabilities on the
CPUs, where we now issue a stop_machine() for each available
capability.

One solution is to merge the two tables into a single table,
sorted by the capability. But this is has the following
disadvantages:
- We loose the "classification" of an errata vs. feature
- It is quite easy to make a mistake when adding an entry,
unless we sort the table at runtime.

So we maintain a list of pointers to the capability entry, sorted
by the "cap number" in a separate array, initialized at boot time.
The only restriction is that we can have one "entry" per capability.
While at it, remove the duplicate declaration of arm64_errata table.

Cc: Vladimir Murzin <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/kernel/cpufeature.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index af50064..7e9ebbe 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -52,6 +52,7 @@ unsigned int compat_elf_hwcap2 __read_mostly;

DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
EXPORT_SYMBOL(cpu_hwcaps);
+static struct arm64_cpu_capabilities const __ro_after_init *cpu_hwcaps_ptrs[ARM64_NCAPS];

/*
* Flag to indicate if we have computed the system wide
@@ -518,6 +519,25 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new)
}

extern const struct arm64_cpu_capabilities arm64_errata[];
+static const struct arm64_cpu_capabilities arm64_features[];
+
+static void __init
+init_cpu_hwcaps_indirect_list_from_array(const struct arm64_cpu_capabilities *caps)
+{
+ for (; caps->matches; caps++) {
+ if (WARN_ON(caps->capability > ARM64_NCAPS ||
+ cpu_hwcaps_ptrs[caps->capability]))
+ continue;
+ cpu_hwcaps_ptrs[caps->capability] = caps;
+ }
+}
+
+static void __init init_cpu_hwcaps_indirect_list(void)
+{
+ init_cpu_hwcaps_indirect_list_from_array(arm64_features);
+ init_cpu_hwcaps_indirect_list_from_array(arm64_errata);
+}
+
static void __init setup_boot_cpu_capabilities(void);

void __init init_cpu_features(struct cpuinfo_arm64 *info)
@@ -564,6 +584,12 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
}

/*
+ * Initialize the indirect array of CPU hwcaps capabilities pointers
+ * before we handle the boot CPU below.
+ */
+ init_cpu_hwcaps_indirect_list();
+
+ /*
* Detect and enable early CPU capabilities based on the boot CPU,
* after we have initialised the CPU feature infrastructure.
*/
@@ -1750,8 +1776,6 @@ static void __init mark_const_caps_ready(void)
static_branch_enable(&arm64_const_caps_ready);
}

-extern const struct arm64_cpu_capabilities arm64_errata[];
-
bool this_cpu_has_cap(unsigned int cap)
{
return (__this_cpu_has_cap(arm64_features, cap) ||
--
2.7.4


2018-11-05 11:57:43

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH 2/7] arm64: capabilities: Merge duplicate Cavium erratum entries

Merge duplicate entries for a single capability using the midr
range list for Cavium errata 30115 and 27456.

Cc: Andrew Pinski <[email protected]>
Cc: David Daney <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/cputype.h | 2 ++
arch/arm64/kernel/cpu_errata.c | 50 +++++++++++++++++++---------------------
2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 12f93e4d..5de4f6a 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -151,6 +151,8 @@ struct midr_range {
.rv_max = MIDR_CPU_VAR_REV(v_max, r_max), \
}

+#define MIDR_REV_RANGE(m, v, r_min, r_max) MIDR_RANGE(m, v, r_min, v, r_max)
+#define MIDR_REV(m, v, r) MIDR_RANGE(m, v, r, v, r)
#define MIDR_ALL_VERSIONS(m) MIDR_RANGE(m, 0, 0, 0xf, 0xf)

static inline bool is_midr_in_range(u32 midr, struct midr_range const *range)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index c825bc0..7242e4c 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -570,6 +570,28 @@ static const struct midr_range arm64_harden_el2_vectors[] = {

#endif

+#ifdef CONFIG_CAVIUM_ERRATUM_27456
+static const struct midr_range cavium_erratum_27456_cpus[] = {
+ /* Cavium ThunderX, T88 pass 1.x - 2.1 */
+ MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 1),
+ /* Cavium ThunderX, T81 pass 1.0 */
+ MIDR_REV(MIDR_THUNDERX_81XX, 0, 0),
+ {},
+};
+#endif
+
+#ifdef CONFIG_CAVIUM_ERRATUM_30115
+static const struct midr_range cavium_erratum_30115_cpus[] = {
+ /* Cavium ThunderX, T88 pass 1.x - 2.2 */
+ MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 2),
+ /* Cavium ThunderX, T81 pass 1.0 - 1.2 */
+ MIDR_REV_RANGE(MIDR_THUNDERX_81XX, 0, 0, 2),
+ /* Cavium ThunderX, T83 pass 1.0 */
+ MIDR_REV(MIDR_THUNDERX_83XX, 0, 0),
+ {},
+};
+#endif
+
const struct arm64_cpu_capabilities arm64_errata[] = {
#if defined(CONFIG_ARM64_ERRATUM_826319) || \
defined(CONFIG_ARM64_ERRATUM_827319) || \
@@ -633,40 +655,16 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
#endif
#ifdef CONFIG_CAVIUM_ERRATUM_27456
{
- /* Cavium ThunderX, T88 pass 1.x - 2.1 */
.desc = "Cavium erratum 27456",
.capability = ARM64_WORKAROUND_CAVIUM_27456,
- ERRATA_MIDR_RANGE(MIDR_THUNDERX,
- 0, 0,
- 1, 1),
- },
- {
- /* Cavium ThunderX, T81 pass 1.0 */
- .desc = "Cavium erratum 27456",
- .capability = ARM64_WORKAROUND_CAVIUM_27456,
- ERRATA_MIDR_REV(MIDR_THUNDERX_81XX, 0, 0),
+ ERRATA_MIDR_RANGE_LIST(cavium_erratum_27456_cpus),
},
#endif
#ifdef CONFIG_CAVIUM_ERRATUM_30115
{
- /* Cavium ThunderX, T88 pass 1.x - 2.2 */
.desc = "Cavium erratum 30115",
.capability = ARM64_WORKAROUND_CAVIUM_30115,
- ERRATA_MIDR_RANGE(MIDR_THUNDERX,
- 0, 0,
- 1, 2),
- },
- {
- /* Cavium ThunderX, T81 pass 1.0 - 1.2 */
- .desc = "Cavium erratum 30115",
- .capability = ARM64_WORKAROUND_CAVIUM_30115,
- ERRATA_MIDR_REV_RANGE(MIDR_THUNDERX_81XX, 0, 0, 2),
- },
- {
- /* Cavium ThunderX, T83 pass 1.0 */
- .desc = "Cavium erratum 30115",
- .capability = ARM64_WORKAROUND_CAVIUM_30115,
- ERRATA_MIDR_REV(MIDR_THUNDERX_83XX, 0, 0),
+ ERRATA_MIDR_RANGE_LIST(cavium_erratum_30115_cpus),
},
#endif
{
--
2.7.4


2018-11-05 11:57:51

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH 1/7] arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE

We have two entries for ARM64_WORKAROUND_CLEAN_CACHE capability :

1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012]
2) ARM Errata 819472 on A53 r0p[01]

Both have the same work around. Merge these entries to avoid
duplicate entries for a single capability.

Cc: Will Deacon <[email protected]>
Cc: Andre Przywara <[email protected]>
Cc: Mark Rutland <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/kernel/cpu_errata.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index a509e351..c825bc0 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -573,24 +573,19 @@ static const struct midr_range arm64_harden_el2_vectors[] = {
const struct arm64_cpu_capabilities arm64_errata[] = {
#if defined(CONFIG_ARM64_ERRATUM_826319) || \
defined(CONFIG_ARM64_ERRATUM_827319) || \
- defined(CONFIG_ARM64_ERRATUM_824069)
+ defined(CONFIG_ARM64_ERRATUM_824069) || \
+ defined(CONFIG_ARM64_ERRATUM_819472)
{
- /* Cortex-A53 r0p[012] */
- .desc = "ARM errata 826319, 827319, 824069",
+ /*
+ * Cortex-A53 r0p[012]: ARM errata 826319, 827319, 824069
+ * Cortex-A53 r0p[01] : ARM errata 819472
+ */
+ .desc = "ARM errata 826319, 827319, 824069, 819472",
.capability = ARM64_WORKAROUND_CLEAN_CACHE,
ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 2),
.cpu_enable = cpu_enable_cache_maint_trap,
},
#endif
-#ifdef CONFIG_ARM64_ERRATUM_819472
- {
- /* Cortex-A53 r0p[01] */
- .desc = "ARM errata 819472",
- .capability = ARM64_WORKAROUND_CLEAN_CACHE,
- ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 1),
- .cpu_enable = cpu_enable_cache_maint_trap,
- },
-#endif
#ifdef CONFIG_ARM64_ERRATUM_832075
{
/* Cortex-A57 r0p0 - r1p2 */
--
2.7.4


2018-11-05 11:58:37

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH 3/7] arm64: capabilities: Merge duplicate entries for Qualcomm erratum 1003

Remove duplicate entries for Qualcomm erratum 1003. Since the entries
are not purely based on generic MIDR checks, use the multi_cap_entry
type to merge the entries.

Cc: Christopher Covington <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/kernel/cpu_errata.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 7242e4c..64525b2 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -592,6 +592,19 @@ static const struct midr_range cavium_erratum_30115_cpus[] = {
};
#endif

+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
+static const struct arm64_cpu_capabilities qcom_erratum_1003_list[] = {
+ {
+ ERRATA_MIDR_REV(MIDR_QCOM_FALKOR_V1, 0, 0),
+ },
+ {
+ .midr_range.model = MIDR_QCOM_KRYO,
+ .matches = is_kryo_midr,
+ },
+ {},
+};
+#endif
+
const struct arm64_cpu_capabilities arm64_errata[] = {
#if defined(CONFIG_ARM64_ERRATUM_826319) || \
defined(CONFIG_ARM64_ERRATUM_827319) || \
@@ -676,16 +689,10 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
},
#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
{
- .desc = "Qualcomm Technologies Falkor erratum 1003",
+ .desc = "Qualcomm Technologies Falkor/Kryo erratum 1003",
.capability = ARM64_WORKAROUND_QCOM_FALKOR_E1003,
- ERRATA_MIDR_REV(MIDR_QCOM_FALKOR_V1, 0, 0),
- },
- {
- .desc = "Qualcomm Technologies Kryo erratum 1003",
- .capability = ARM64_WORKAROUND_QCOM_FALKOR_E1003,
- .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
- .midr_range.model = MIDR_QCOM_KRYO,
- .matches = is_kryo_midr,
+ .matches = multi_entry_cap_matches,
+ .match_list = qcom_erratum_1003_list,
},
#endif
#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1009
--
2.7.4


2018-11-05 12:15:29

by Vladimir Murzin

[permalink] [raw]
Subject: Re: [PATCH 0/7] arm64: capabilities: Optimize checking and enabling

Hi Suzuki,

On 05/11/18 11:55, Suzuki K Poulose wrote:
> We maintain two separate tables (i.e, arm64_features and arm64_errata) of
> struct arm64_cpu_capabilities which decide the capabilities of the system.
> We iterate over the two tables for detecting/verifying/enabling the capabilities.
> e.g, this_cpu_has_cap() needs to iterate over the two tables one by one to
> find the "capability" structure corresponding to the cap number and then
> check it on the CPU.
>
> Also, we enable all the non-boot scoped capabilities after all the SMP cpus
> are brought up by the kernel, using stop_machine() for each available
> capability. We could batch all the "enabling" activity to a single
> stop_machine() callback. But that implies you need a way to map
> a given capability number to the corresponding capability entry
> to finish the operation quickly.
>
> So we need a quicker way to access the entry for a given capability.
> We have two choices :
>
> 1) Unify both the tables to a static/dynamic sorted entry based on
> the capability number. This has the following drawbacks :
> a) The entries must be unique. i.e, no duplicate entries for a
> capability.
> b) Looses the separation of "features" vs. "errata" classification
> c) Statically sorting the list is error prone. Runtime sorting the
> array means more time for booting.
>
> 2) Keep an array of pointers to the capability sorted at boot time
> based on the capability.
> a) As for (1), the entries must be unique for a capability.
>
> This series implements (2) and uses the new list for optimizing the
> operations on the entries. As a prepatory step, we remove the
> duplicate entries for the same capabilities (Patch 1-3).
>

Thanks a lot for getting it sorted out! In case it'd help:

Reviewed-by: Vladimir Murzin <[email protected]>
Tested-by: Vladimir Murzin <[email protected]>

Cheers
Vladimir

>
> Suzuki K Poulose (7):
> arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE
> arm64: capabilities: Merge duplicate Cavium erratum entries
> arm64: capabilities: Merge duplicate entries for Qualcomm erratum 1003
> arm64: capabilities: Speed up capability lookup
> arm64: capabilities: Optimize this_cpu_has_cap
> arm64: capabilities: Use linear array for detection and verification
> arm64: capabilities: Batch cpu_enable callbacks
>
> arch/arm64/include/asm/cpufeature.h | 3 +
> arch/arm64/include/asm/cputype.h | 2 +
> arch/arm64/kernel/cpu_errata.c | 94 ++++++++++----------
> arch/arm64/kernel/cpufeature.c | 165 ++++++++++++++++++++----------------
> 4 files changed, 145 insertions(+), 119 deletions(-)
>


2018-11-26 14:10:10

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 1/7] arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE

On Mon, Nov 05, 2018 at 11:55:11AM +0000, Suzuki K Poulose wrote:
> We have two entries for ARM64_WORKAROUND_CLEAN_CACHE capability :
>
> 1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012]
> 2) ARM Errata 819472 on A53 r0p[01]
>
> Both have the same work around. Merge these entries to avoid
> duplicate entries for a single capability.
>
> Cc: Will Deacon <[email protected]>
> Cc: Andre Przywara <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> arch/arm64/kernel/cpu_errata.c | 19 +++++++------------
> 1 file changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index a509e351..c825bc0 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -573,24 +573,19 @@ static const struct midr_range arm64_harden_el2_vectors[] = {
> const struct arm64_cpu_capabilities arm64_errata[] = {
> #if defined(CONFIG_ARM64_ERRATUM_826319) || \
> defined(CONFIG_ARM64_ERRATUM_827319) || \
> - defined(CONFIG_ARM64_ERRATUM_824069)
> + defined(CONFIG_ARM64_ERRATUM_824069) || \
> + defined(CONFIG_ARM64_ERRATUM_819472)
> {
> - /* Cortex-A53 r0p[012] */
> - .desc = "ARM errata 826319, 827319, 824069",
> + /*
> + * Cortex-A53 r0p[012]: ARM errata 826319, 827319, 824069
> + * Cortex-A53 r0p[01] : ARM errata 819472
> + */
> + .desc = "ARM errata 826319, 827319, 824069, 819472",
> .capability = ARM64_WORKAROUND_CLEAN_CACHE,
> ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 2),
> .cpu_enable = cpu_enable_cache_maint_trap,

Isn't this a semantic change wrt the Kconfig options? After this change,
if I /only/ set CONFIG_ARM64_ERRATUM_819472=y, then I still get the
workaround applied for CPUs > r0[p01] which isn't what I asked for.

Will

2018-11-26 15:13:15

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH 1/7] arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE

On Mon, Nov 26, 2018 at 02:06:04PM +0000, Will Deacon wrote:
> On Mon, Nov 05, 2018 at 11:55:11AM +0000, Suzuki K Poulose wrote:
> > We have two entries for ARM64_WORKAROUND_CLEAN_CACHE capability :
> >
> > 1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012]
> > 2) ARM Errata 819472 on A53 r0p[01]
> >
> > Both have the same work around. Merge these entries to avoid
> > duplicate entries for a single capability.
> >
> > Cc: Will Deacon <[email protected]>
> > Cc: Andre Przywara <[email protected]>
> > Cc: Mark Rutland <[email protected]>
> > Signed-off-by: Suzuki K Poulose <[email protected]>
> > ---
> > arch/arm64/kernel/cpu_errata.c | 19 +++++++------------
> > 1 file changed, 7 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> > index a509e351..c825bc0 100644
> > --- a/arch/arm64/kernel/cpu_errata.c
> > +++ b/arch/arm64/kernel/cpu_errata.c
> > @@ -573,24 +573,19 @@ static const struct midr_range arm64_harden_el2_vectors[] = {
> > const struct arm64_cpu_capabilities arm64_errata[] = {
> > #if defined(CONFIG_ARM64_ERRATUM_826319) || \
> > defined(CONFIG_ARM64_ERRATUM_827319) || \
> > - defined(CONFIG_ARM64_ERRATUM_824069)
> > + defined(CONFIG_ARM64_ERRATUM_824069) || \
> > + defined(CONFIG_ARM64_ERRATUM_819472)
> > {
> > - /* Cortex-A53 r0p[012] */
> > - .desc = "ARM errata 826319, 827319, 824069",
> > + /*
> > + * Cortex-A53 r0p[012]: ARM errata 826319, 827319, 824069
> > + * Cortex-A53 r0p[01] : ARM errata 819472
> > + */
> > + .desc = "ARM errata 826319, 827319, 824069, 819472",
> > .capability = ARM64_WORKAROUND_CLEAN_CACHE,
> > ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 2),
> > .cpu_enable = cpu_enable_cache_maint_trap,
>
> Isn't this a semantic change wrt the Kconfig options? After this change,
> if I /only/ set CONFIG_ARM64_ERRATUM_819472=y, then I still get the
> workaround applied for CPUs > r0[p01] which isn't what I asked for.

You're right. I could change this to :


diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 787d785..56a921f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -313,6 +313,11 @@ menu "Kernel Features"

menu "ARM errata workarounds via the alternatives framework"

+config ARM64_WORKAROUND_CLEAN_CACHE
+ default y
+ depends on ARM64_ERRATUM_826319 || ARM64_ERRATUM_827319 ||
+ ARM64_ERRATUM_824069 || ARM64_ERRATUM_819472
+
config ARM64_ERRATUM_826319
bool "Cortex-A53: 826319: System might deadlock if a write cannot complete until read data is accepted"
default y
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index a509e351..80f2e66 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -570,24 +570,28 @@ static const struct midr_range arm64_harden_el2_vectors[] = {

#endif

-const struct arm64_cpu_capabilities arm64_errata[] = {
+#ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
+static const struct midr_range workaround_clean_cache[] = {
#if defined(CONFIG_ARM64_ERRATUM_826319) || \
defined(CONFIG_ARM64_ERRATUM_827319) || \
defined(CONFIG_ARM64_ERRATUM_824069)
- {
- /* Cortex-A53 r0p[012] */
- .desc = "ARM errata 826319, 827319, 824069",
- .capability = ARM64_WORKAROUND_CLEAN_CACHE,
- ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 2),
- .cpu_enable = cpu_enable_cache_maint_trap,
- },
+ /* Cortex-A53 r0p[012]: ARM errata 826319, 827319, 824069 */
+ MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 2),
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_819472
+ /* Cortex-A53 r0p[01] : ARM errata 819472 */
+ MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 1),
+#endif
+ {},
+};
#endif
-#ifdef CONFIG_ARM64_ERRATUM_819472
+
+const struct arm64_cpu_capabilities arm64_errata[] = {
+#ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
{
- /* Cortex-A53 r0p[01] */
- .desc = "ARM errata 819472",
+ .desc = "ARM errata 826319, 827319, 824069, 819472",
.capability = ARM64_WORKAROUND_CLEAN_CACHE,
- ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 1),
+ ERRATA_MIDR_REV_LIST(workaround_clean_cache),
.cpu_enable = cpu_enable_cache_maint_trap,
},
#endif

---

Cheers
Suzuki

2018-11-27 19:33:10

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 1/7] arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE

On Mon, Nov 26, 2018 at 03:09:56PM +0000, Suzuki K Poulose wrote:
> On Mon, Nov 26, 2018 at 02:06:04PM +0000, Will Deacon wrote:
> > On Mon, Nov 05, 2018 at 11:55:11AM +0000, Suzuki K Poulose wrote:
> > > We have two entries for ARM64_WORKAROUND_CLEAN_CACHE capability :
> > >
> > > 1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012]
> > > 2) ARM Errata 819472 on A53 r0p[01]
> > >
> > > Both have the same work around. Merge these entries to avoid
> > > duplicate entries for a single capability.
> > >
> > > Cc: Will Deacon <[email protected]>
> > > Cc: Andre Przywara <[email protected]>
> > > Cc: Mark Rutland <[email protected]>
> > > Signed-off-by: Suzuki K Poulose <[email protected]>
> > > ---
> > > arch/arm64/kernel/cpu_errata.c | 19 +++++++------------
> > > 1 file changed, 7 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> > > index a509e351..c825bc0 100644
> > > --- a/arch/arm64/kernel/cpu_errata.c
> > > +++ b/arch/arm64/kernel/cpu_errata.c
> > > @@ -573,24 +573,19 @@ static const struct midr_range arm64_harden_el2_vectors[] = {
> > > const struct arm64_cpu_capabilities arm64_errata[] = {
> > > #if defined(CONFIG_ARM64_ERRATUM_826319) || \
> > > defined(CONFIG_ARM64_ERRATUM_827319) || \
> > > - defined(CONFIG_ARM64_ERRATUM_824069)
> > > + defined(CONFIG_ARM64_ERRATUM_824069) || \
> > > + defined(CONFIG_ARM64_ERRATUM_819472)
> > > {
> > > - /* Cortex-A53 r0p[012] */
> > > - .desc = "ARM errata 826319, 827319, 824069",
> > > + /*
> > > + * Cortex-A53 r0p[012]: ARM errata 826319, 827319, 824069
> > > + * Cortex-A53 r0p[01] : ARM errata 819472
> > > + */
> > > + .desc = "ARM errata 826319, 827319, 824069, 819472",
> > > .capability = ARM64_WORKAROUND_CLEAN_CACHE,
> > > ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 2),
> > > .cpu_enable = cpu_enable_cache_maint_trap,
> >
> > Isn't this a semantic change wrt the Kconfig options? After this change,
> > if I /only/ set CONFIG_ARM64_ERRATUM_819472=y, then I still get the
> > workaround applied for CPUs > r0[p01] which isn't what I asked for.
>
> You're right. I could change this to :

Indeed, and you'll probably need something similar for the other entries
that you're merging.

Will