On some 32-bit architectures, 64-bit accesses are atomic when certain
conditions are satisfied.
For example, on LPAE (Large Physical Address Extension) enabled ARM
architectures, the 'ldrd/strd' (load/store doubleword) instructions are
64-bit atomic as long as the address is 64-bit aligned. This feature
exists to guarantee atomic accesses to the 64-bit wide descriptors newly
introduced in the translation tables, and as a result 's64/u64' variables
can be accessed atomically when they are 8-byte aligned on LPAE-enabled
ARM machines.
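As a minimal sketch (illustration only, not part of this series; the
variable and function names are hypothetical), the point is that a plain
access to an 8-byte-aligned u64 can then compile to a single ldrd/strd
and needs no extra locking:

	/* assumes an LPAE-enabled ARMv7 machine */
	static u64 shared_stamp __attribute__((aligned(8)));

	static u64 read_stamp(void)
	{
		/* expected to be a single-copy-atomic ldrd */
		return READ_ONCE(shared_stamp);
	}

	static void write_stamp(u64 v)
	{
		/* expected to be a single-copy-atomic strd */
		WRITE_ONCE(shared_stamp, v);
	}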
By introducing 64BIT_ATOMIC_ACCESS and 64BIT_ATOMIC_ALIGNED_ACCESS, which
can be true on 32-bit architectures as well as 64-bit architectures, we
can optimize some kernel code that uses seqlock (timekeeping) or a mimic
of it (like in sched/cfs) simply to read or write 64-bit variables.
The existing code depends on CONFIG_64BIT to determine whether 64-bit
variables can be accessed directly or need additional synchronization
primitives like seqlock. CONFIG_64BIT_ATOMIC_ACCESS can be used instead of
CONFIG_64BIT in those cases.
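For instance, a seqlock-protected 64-bit read of that kind looks roughly
like this (a generic sketch; 'lock' and 'shared64' are placeholder names,
not taken from any particular call site):

	#include <linux/types.h>
	#include <linux/seqlock.h>

	static DEFINE_SEQLOCK(lock);
	static u64 shared64;

	static u64 read_shared64(void)
	{
		unsigned int seq;
		u64 val;

		/* retry until no writer raced with the read */
		do {
			seq = read_seqbegin(&lock);
			val = shared64;
		} while (read_seqretry(&lock, seq));

		return val;
	}

With CONFIG_64BIT_ATOMIC_ACCESS, the retry loop could collapse to a plain
load of 'shared64'.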
64BIT_ATOMIC_ALIGNED_ACCESS can be used in variable declarations, via
#ifdef, to convey the alignment requirement
(__attribute__((aligned(8)))) to the compiler.
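Concretely, the declaration pattern looks roughly like the sketch below
('struct example' and its fields are placeholders, not names from the
patches):

	struct example {
	#ifdef CONFIG_64BIT_ATOMIC_ALIGNED_ACCESS
		u64 example_val __attribute__((aligned(sizeof(u64))));
	#else
		u64 example_val;
	#endif
	#ifndef CONFIG_64BIT_ATOMIC_ACCESS
		u64 example_val_copy;	/* seqlock-mimic copy still needed */
	#endif
	};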
The last patch "sched: depend on 64BIT_ATOMIC_ACCESS to determine if to
use min_vruntime_copy" is an example of this approach.
I'd like to know what the architecture maintainers and kernel maintainers
think about this. I can provide more examples (mostly removing seqlock
around 64-bit variable accesses on such machines) if this approach is
accepted.
Hoeun Ryu (3):
arch: add 64BIT_ATOMIC_ACCESS to support 64bit atomic access on 32bit
machines
arm: enable 64BIT_ATOMIC(_ALIGNED)_ACCESS on LPAE enabled machines
sched: depend on 64BIT_ATOMIC_ACCESS to determine if to use
min_vruntime_copy
arch/Kconfig | 20 ++++++++++++++++++++
arch/arm/mm/Kconfig | 2 ++
kernel/sched/fair.c | 6 +++---
kernel/sched/sched.h | 6 +++++-
4 files changed, 30 insertions(+), 4 deletions(-)
--
2.7.4
On some 32-bit architectures, 64-bit accesses are atomic when certain
conditions are satisfied.
For example, on LPAE (Large Physical Address Extension) enabled ARM
architectures, the 'ldrd/strd' (load/store doubleword) instructions are
64-bit atomic as long as the address is 64-bit aligned. This feature
exists to guarantee atomic accesses to the 64-bit wide descriptors newly
introduced in the translation tables, and as a result 's64/u64' variables
can be accessed atomically when they are 8-byte aligned on LPAE-enabled
ARM machines.
By introducing 64BIT_ATOMIC_ACCESS and 64BIT_ATOMIC_ALIGNED_ACCESS, which
can be true on 32-bit architectures as well as 64-bit architectures, we
can optimize some kernel code that uses seqlock (timekeeping) or a mimic
of it (like in sched/cfs) simply to read or write 64-bit variables.
The existing code depends on CONFIG_64BIT to determine whether 64-bit
variables can be accessed directly or need additional synchronization
primitives like seqlock. CONFIG_64BIT_ATOMIC_ACCESS can be used instead of
CONFIG_64BIT in those cases.
64BIT_ATOMIC_ALIGNED_ACCESS can be used in variable declarations, via
#ifdef, to convey the alignment requirement
(__attribute__((aligned(8)))) to the compiler.
Signed-off-by: Hoeun Ryu <[email protected]>
---
arch/Kconfig | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089..1def331 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -115,6 +115,26 @@ config UPROBES
managed by the kernel and kept transparent to the probed
application. )
+config 64BIT_ATOMIC_ACCESS
+ def_bool 64BIT
+ help
+ On some 32-bit architectures as well as 64-bit architectures,
+ 64-bit accesses are atomic when certain conditions are satisfied.
+
+ This symbol should be selected by an architecture if 64-bit
+ accesses can be atomic.
+
+config 64BIT_ATOMIC_ALIGNED_ACCESS
+ def_bool n
+ depends on 64BIT_ATOMIC_ACCESS
+ help
+ On a 64BIT_ATOMIC_ACCESS enabled system, an address must be
+ 8-byte aligned to guarantee that accesses to it are atomic.
+
+ This symbol should be selected by an architecture if 64-bit
+ accesses are required to be 64-bit aligned to guarantee that
+ they are atomic.
+
config HAVE_64BIT_ALIGNED_ACCESS
def_bool 64BIT && !HAVE_EFFICIENT_UNALIGNED_ACCESS
help
--
2.7.4
The 'ldrd/strd' (load/store doubleword) instructions are 64-bit atomic as
long as the address is 64-bit aligned on LPAE (Large Physical Address
Extension) enabled ARM architectures. This feature exists to guarantee
atomic accesses to the 64-bit wide descriptors newly introduced in the
translation tables.
Make 64BIT_ATOMIC_ACCESS true so that kernel code accessing 64-bit
variables can be optimized by omitting the seqlock or its mimic.
Also make 64BIT_ATOMIC_ALIGNED_ACCESS true, because the 64-bit atomic
access is guaranteed only when the address is 64-bit aligned.
Signed-off-by: Hoeun Ryu <[email protected]>
---
arch/arm/mm/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 60cdfdc..3142572 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -660,6 +660,8 @@ config ARM_LPAE
bool "Support for the Large Physical Address Extension"
depends on MMU && CPU_32v7 && !CPU_32v6 && !CPU_32v5 && \
!CPU_32v4 && !CPU_32v3
+ select 64BIT_ATOMIC_ACCESS
+ select 64BIT_ATOMIC_ALIGNED_ACCESS
help
Say Y if you have an ARMv7 processor supporting the LPAE page
table format and you would like to access memory beyond the
--
2.7.4
'min_vruntime_copy' is updated whenever 'min_vruntime' is updated for a
cfs_rq, and it is used on the reader side to check whether the update of
'min_vruntime' has completed. Because 'min_vruntime' is a 64-bit variable,
a mimic of seqlock is needed on 32-bit machines to check that the variable
is not being updated.
On 64BIT_ATOMIC_ACCESS enabled machines, 64-bit accesses are atomic even
though the machines are 32-bit, so 'min_vruntime' can be accessed directly
on those architectures.
Depend on CONFIG_64BIT_ATOMIC_ACCESS instead of CONFIG_64BIT to determine
whether the 'min_vruntime_copy' variable is used for synchronization.
Also, align 'min_vruntime' to 8 bytes if 64BIT_ATOMIC_ALIGNED_ACCESS is
true, because such a system can access the variable atomically only when
it is aligned.
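For reference, the reader side in migrate_task_rq_fair() currently looks
roughly like the sketch below; with CONFIG_64BIT_ATOMIC_ACCESS the retry
loop reduces to the plain read in the #else branch:

	#ifndef CONFIG_64BIT
		u64 min_vruntime_copy;

		/* retry until the writer's paired update is observed */
		do {
			min_vruntime_copy = cfs_rq->min_vruntime_copy;
			smp_rmb();
			min_vruntime = cfs_rq->min_vruntime;
		} while (min_vruntime != min_vruntime_copy);
	#else
		min_vruntime = cfs_rq->min_vruntime;
	#endif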
Signed-off-by: Hoeun Ryu <[email protected]>
---
kernel/sched/fair.c | 6 +++---
kernel/sched/sched.h | 6 +++++-
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c95880e..840658f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -536,7 +536,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
/* ensure we never gain time by being placed backwards. */
cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
-#ifndef CONFIG_64BIT
+#ifndef CONFIG_64BIT_ATOMIC_ACCESS
smp_wmb();
cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
#endif
@@ -5975,7 +5975,7 @@ static void migrate_task_rq_fair(struct task_struct *p)
struct cfs_rq *cfs_rq = cfs_rq_of(se);
u64 min_vruntime;
-#ifndef CONFIG_64BIT
+#ifndef CONFIG_64BIT_ATOMIC_ACCESS
u64 min_vruntime_copy;
do {
@@ -9173,7 +9173,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
{
cfs_rq->tasks_timeline = RB_ROOT;
cfs_rq->min_vruntime = (u64)(-(1LL << 20));
-#ifndef CONFIG_64BIT
+#ifndef CONFIG_64BIT_ATOMIC_ACCESS
cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
#endif
#ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index eeef1a3..870010b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -421,8 +421,12 @@ struct cfs_rq {
unsigned int nr_running, h_nr_running;
u64 exec_clock;
+#ifndef CONFIG_64BIT_ATOMIC_ALIGNED_ACCESS
u64 min_vruntime;
-#ifndef CONFIG_64BIT
+#else
+ u64 min_vruntime __attribute__((aligned(sizeof(u64))));
+#endif
+#ifndef CONFIG_64BIT_ATOMIC_ACCESS
u64 min_vruntime_copy;
#endif
--
2.7.4
On Thu, Aug 24, 2017 at 02:42:57PM +0900, Hoeun Ryu wrote:
> +#ifndef CONFIG_64BIT_ATOMIC_ALIGNED_ACCESS
> u64 min_vruntime;
> -#ifndef CONFIG_64BIT
> +#else
> + u64 min_vruntime __attribute__((aligned(sizeof(u64))));
> +#endif
That's stupid, just make sure your platform defines u64 as naturally
aligned when you have this 64BIT_ATOMIC foo.
Also, please try and dig out more 32bit archs that can use this and make
sure to include performance numbers to justify this extra cruft.