2009-06-30 21:25:41

by David Howells

Subject: [PATCH] FRV: Wire up new syscalls

Wire up new syscalls rt_tgsigqueueinfo and perf_counter_open.

Signed-off-by: David Howells <[email protected]>
---

arch/frv/include/asm/unistd.h | 4 +++-
arch/frv/kernel/entry.S | 2 ++
2 files changed, 5 insertions(+), 1 deletions(-)


diff --git a/arch/frv/include/asm/unistd.h b/arch/frv/include/asm/unistd.h
index 96d78d5..4a8fb42 100644
--- a/arch/frv/include/asm/unistd.h
+++ b/arch/frv/include/asm/unistd.h
@@ -341,10 +341,12 @@
#define __NR_inotify_init1 332
#define __NR_preadv 333
#define __NR_pwritev 334
+#define __NR_rt_tgsigqueueinfo 335
+#define __NR_perf_counter_open 336

#ifdef __KERNEL__

-#define NR_syscalls 335
+#define NR_syscalls 337

#define __ARCH_WANT_IPC_PARSE_VERSION
/* #define __ARCH_WANT_OLD_READDIR */
diff --git a/arch/frv/kernel/entry.S b/arch/frv/kernel/entry.S
index 356e0e3..fde1e44 100644
--- a/arch/frv/kernel/entry.S
+++ b/arch/frv/kernel/entry.S
@@ -1524,5 +1524,7 @@ sys_call_table:
.long sys_inotify_init1
.long sys_preadv
.long sys_pwritev
+ .long sys_rt_tgsigqueueinfo /* 335 */
+ .long sys_perf_counter_open

syscall_table_size = (. - sys_call_table)


2009-06-30 21:34:36

by Ingo Molnar

Subject: Re: [PATCH] FRV: Wire up new syscalls


* David Howells <[email protected]> wrote:

> Wire up new syscalls rt_tgsigqueueinfo and perf_counter_open.

> + .long sys_perf_counter_open

Hm, I suspect this should be sys_ni because FRV does not set
HAVE_PERF_COUNTERS, so perfcounters cannot be enabled. (Or do you
have patches in the pipeline that enable perfcounters for FRV?)

Ingo

2009-06-30 21:42:37

by Arnd Bergmann

Subject: Re: [PATCH] FRV: Wire up new syscalls

On Tuesday 30 June 2009, Ingo Molnar wrote:
> * David Howells <[email protected]> wrote:
>
> > Wire up new syscalls rt_tgsigqueueinfo and perf_counter_open.
>
> > + .long sys_perf_counter_open
>
> Hm, I suspect this should be sys_ni because FRV does not set
> HAVE_PERF_COUNTERS, so perfcounters cannot be enabled. (Or do you
> have patches in the pipeline that enable perfcounters for FRV?)

It doesn't matter because of the cond_syscall() in kernel/sys_ni.c.
Right now, scripts/checksyscalls.sh causes a warning on each architecture
that does not assign a system call number to sys_perf_counter_open,
so they might as well make it point to sys_ni through cond_syscall().
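
(For reference, cond_syscall() marks the syscall's symbol weak and
aliases it to sys_ni_syscall unless a real implementation is linked
in, so a compiled-out syscall simply returns -ENOSYS. A minimal
sketch of the mechanism - not the verbatim kernel source:

	/* include/linux/linkage.h: weak-alias the symbol to
	 * sys_ni_syscall unless a real definition exists */
	#define cond_syscall(x) \
		asm(".weak\t" #x "\n\t.set\t" #x ",sys_ni_syscall")

	/* kernel/sys_ni.c: the catch-all for unimplemented syscalls */
	asmlinkage long sys_ni_syscall(void)
	{
		return -ENOSYS;
	}

	cond_syscall(sys_perf_counter_open);

A syscall table entry pointing at sys_perf_counter_open is therefore
safe even when perfcounters are compiled out.)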

Arnd <><

2009-06-30 21:54:51

by Ingo Molnar

Subject: Re: [PATCH] FRV: Wire up new syscalls


* Arnd Bergmann <[email protected]> wrote:

> On Tuesday 30 June 2009, Ingo Molnar wrote:
> > * David Howells <[email protected]> wrote:
> >
> > > Wire up new syscalls rt_tgsigqueueinfo and perf_counter_open.
> >
> > > + .long sys_perf_counter_open
> >
> > Hm, I suspect this should be sys_ni because FRV does not set
> > HAVE_PERF_COUNTERS, so perfcounters cannot be enabled. (Or do you
> > have patches in the pipeline that enable perfcounters for FRV?)
>
> It doesn't matter because of the cond_syscall() in
> kernel/sys_ni.c. Right now, scripts/checksyscalls.sh causes a
> warning on each architecture that does not assign a system call
> number to sys_perf_counter_open, so they might as well make it
> point to sys_ni through cond_syscall().

It would make sense to wire it up for real as it's really easy: just
set HAVE_PERF_COUNTERS in arch/frv/Kconfig, add an empty
arch/frv/include/asm/perf_counter.h file. (Optional: double check
that tools/perf/ builds and works fine on FRV ;-)

Ingo

2009-07-01 11:28:34

by David Howells

Subject: Re: [PATCH] FRV: Wire up new syscalls

Ingo Molnar <[email protected]> wrote:

> It would make sense to wire it up for real as it's really easy: just
> set HAVE_PERF_COUNTERS in arch/frv/Kconfig, add an empty
> arch/frv/include/asm/perf_counter.h file. (Optional: double check
> that tools/perf/ builds and works fine on FRV ;-)

Are these for hardware performance counters?

David

2009-07-01 11:54:40

by Ingo Molnar

Subject: Re: [PATCH] FRV: Wire up new syscalls


* David Howells <[email protected]> wrote:

> Ingo Molnar <[email protected]> wrote:
>
> > It would make sense to wire it up for real as it's really easy:
> > just set HAVE_PERF_COUNTERS in arch/frv/Kconfig, add an empty
> > arch/frv/include/asm/perf_counter.h file. (Optional: double
> > check that tools/perf/ builds and works fine on FRV ;-)
>
> Are these for hardware performance counters?

No. PMU support is not necessary in the first step; you'll still get
plenty of features from enabling perfcounters on FRV:

- hrtimer driven recording/profiling

- various software counters

- working perf top / perf stat / perf record / perf report

The tools will all work out of the box, and there's a transparent
fall-back path in the tools if the hardware counters are not
available.

It's much easier to add than basic Oprofile support, and you'll get
a lot more features in exchange as well ;)

[ There might be unanticipated problems of course - please report to
us any problems (tools/perf/ does not build cleanly, features don't
work as expected, etc.). ]

Ingo

2009-07-01 12:20:48

by David Howells

Subject: Re: [PATCH] FRV: Wire up new syscalls

Ingo Molnar <[email protected]> wrote:

> [ There might be unanticipated problems of course - please report to
> us any problems (tools/perf/ does not build cleanly, features don't
> work as expected, etc.). ]

It requires atomic64_t. Is that mandatory on all arches now?

David

2009-07-01 12:37:06

by Paul Mackerras

Subject: Re: [PATCH] FRV: Wire up new syscalls

David Howells writes:

> It requires atomic64_t. Is that mandatory on all arches now?

I did a generic implementation using irq disable on UP and hashed
spin-locks on SMP. See 09d4e0ed for the implementation and c2e95c6d
for an example of how to wire it up.
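
For reference, the shape of that generic implementation (a sketch,
not the verbatim lib/atomic64.c): each atomic64_t is guarded by one
of a small array of spinlocks selected by hashing the variable's
address, and spin_lock_irqsave() reduces to a plain irq-disable on UP
builds.

	#include <linux/spinlock.h>
	#include <linux/cache.h>

	#define NR_LOCKS 16
	static spinlock_t atomic64_lock[NR_LOCKS];

	/* pick a lock based on the variable's address */
	static spinlock_t *lock_addr(const atomic64_t *v)
	{
		unsigned long addr = (unsigned long)v;

		/* discard cache-line offset bits, fold into the table */
		addr >>= L1_CACHE_SHIFT;
		return &atomic64_lock[(addr ^ (addr >> 8)) & (NR_LOCKS - 1)];
	}

	long long atomic64_add_return(long long a, atomic64_t *v)
	{
		unsigned long flags;
		spinlock_t *lock = lock_addr(v);
		long long val;

		spin_lock_irqsave(lock, flags);
		val = v->counter += a;
		spin_unlock_irqrestore(lock, flags);
		return val;
	}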

Paul.

2009-07-01 12:42:18

by David Howells

Subject: Re: [PATCH] FRV: Wire up new syscalls

Paul Mackerras <[email protected]> wrote:

> I did a generic implementation using irq disable on UP and hashed
> spin-locks on SMP. See 09d4e0ed for the implementation

I can do better than that on FRV. FRV has load/store-double instructions that
I can use to atomically load/store 64-bit memory locations by using paired
registers for the data.

David

2009-07-01 13:13:45

by Ingo Molnar

Subject: Re: [PATCH] FRV: Wire up new syscalls


* David Howells <[email protected]> wrote:

> Paul Mackerras <[email protected]> wrote:
>
> > I did a generic implementation using irq disable on UP and
> > hashed spin-locks on SMP. See 09d4e0ed for the implementation
>
> I can do better than that on FRV. FRV has load/store-double
> instructions that I can use to atomically load/store 64-bit memory
> locations by using paired registers for the data.

Yeah - still I'd suggest just going with the generic code initially,
as then 'perf' output itself can be used to verify whether the FRV
atomic64_t implementation is correct.

Ingo

2009-07-01 14:10:36

by David Howells

Subject: Re: [PATCH] FRV: Wire up new syscalls

Ingo Molnar <[email protected]> wrote:

> Yeah - still I'd suggest just going with the generic code initially,

Nah, that's no fun! Besides, the code is very similar to the 32-bit code, and
I'm fairly certain it'll work. I can also check it very easily by dumping
some temp test code in setup.c.

Anyway, I see the following errors:

kernel/perf_counter.c: In function 'perf_counter_index':
kernel/perf_counter.c:1889: error: 'PERF_COUNTER_INDEX_OFFSET' undeclared (first use in this function)
kernel/perf_counter.c:1889: error: (Each undeclared identifier is reported only once
kernel/perf_counter.c:1889: error: for each function it appears in.)
CC block/as-iosched.o
kernel/perf_counter.c:1890: warning: control reaches end of non-void function

But this symbol appears to be undocumented:

warthog>grep -r PERF_COUNTER_INDEX_OFFSET Documentation/
warthog1>grep -r INDEX_OFFSET Documentation/
warthog1>

David

2009-07-01 14:49:29

by Ingo Molnar

Subject: Re: [PATCH] FRV: Wire up new syscalls


* David Howells <[email protected]> wrote:

> Ingo Molnar <[email protected]> wrote:
>
> > Yeah - still I'd suggest just going with the generic code initially,
>
> Nah, that's no fun! Besides, the code is very similar to the 32-bit code, and
> I'm fairly certain it'll work. I can also check it very easily by dumping
> some temp test code in setup.c.
>
> Anyway, I see the following errors:
>
> kernel/perf_counter.c: In function 'perf_counter_index':
> kernel/perf_counter.c:1889: error: 'PERF_COUNTER_INDEX_OFFSET' undeclared (first use in this function)
> kernel/perf_counter.c:1889: error: (Each undeclared identifier is reported only once
> kernel/perf_counter.c:1889: error: for each function it appears in.)
> CC block/as-iosched.o
> kernel/perf_counter.c:1890: warning: control reaches end of non-void function

Ok, indeed - you can define it to 0 for now - FRV won't set hw
counters so it doesn't matter. It is used for the hw index in the
mmap head for 'direct' access to the PMU, using special instructions
like RDPMC (on x86). Probably not relevant to FRV.
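
For context, the core code uses it to compute the counter index
published in the counter's mmap()ed control page - roughly (a sketch
of the kernel/perf_counter.c code of the time, not verbatim):

	static u32 perf_counter_index(struct perf_counter *counter)
	{
		if (counter->state != PERF_COUNTER_STATE_ACTIVE)
			return 0;

		/* index userspace uses for direct PMU reads (e.g. RDPMC) */
		return counter->hw.idx + 1 - PERF_COUNTER_INDEX_OFFSET;
	}

With no hardware counters ever active on FRV, a 0 offset only has to
keep the compiler happy.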

> But this symbol appears to be undocumented:
>
> warthog>grep -r PERF_COUNTER_INDEX_OFFSET Documentation/
> warthog1>grep -r INDEX_OFFSET Documentation/
> warthog1>

You can use the power of the Git space:

git log -SPERF_COUNTER_INDEX_OFFSET

:-)

Ingo

2009-07-01 15:30:06

by David Howells

Subject: Re: [PATCH] FRV: Wire up new syscalls

Ingo Molnar <[email protected]> wrote:

> Ok, indeed - you can define it to 0 for now - FRV won't set hw counters so it
> doesn't matter.

Okay. FRV does have some h/w perf counters for measuring various things about
userspace and kernel execution. I'll have to look at how to use them with
this.

David

2009-07-01 16:47:27

by David Howells

Subject: [PATCH 2/2] FRV: Add basic performance counter support

Add basic performance counter support to the FRV arch.

Signed-off-by: David Howells <[email protected]>
---

arch/frv/Kconfig | 1 +
arch/frv/include/asm/perf_counter.h | 17 +++++++++++++++++
arch/frv/lib/Makefile | 2 +-
arch/frv/lib/perf_counter.c | 19 +++++++++++++++++++
4 files changed, 38 insertions(+), 1 deletions(-)
create mode 100644 arch/frv/include/asm/perf_counter.h
create mode 100644 arch/frv/lib/perf_counter.c


diff --git a/arch/frv/Kconfig b/arch/frv/Kconfig
index 8a5bd7a..b86e19c 100644
--- a/arch/frv/Kconfig
+++ b/arch/frv/Kconfig
@@ -7,6 +7,7 @@ config FRV
default y
select HAVE_IDE
select HAVE_ARCH_TRACEHOOK
+ select HAVE_PERF_COUNTERS

config ZONE_DMA
bool
diff --git a/arch/frv/include/asm/perf_counter.h b/arch/frv/include/asm/perf_counter.h
new file mode 100644
index 0000000..ccf726e
--- /dev/null
+++ b/arch/frv/include/asm/perf_counter.h
@@ -0,0 +1,17 @@
+/* FRV performance counter support
+ *
+ * Copyright (C) 2009 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#ifndef _ASM_PERF_COUNTER_H
+#define _ASM_PERF_COUNTER_H
+
+#define PERF_COUNTER_INDEX_OFFSET 0
+
+#endif /* _ASM_PERF_COUNTER_H */
diff --git a/arch/frv/lib/Makefile b/arch/frv/lib/Makefile
index 4ff2fb1..0a37721 100644
--- a/arch/frv/lib/Makefile
+++ b/arch/frv/lib/Makefile
@@ -5,4 +5,4 @@
lib-y := \
__ashldi3.o __lshrdi3.o __muldi3.o __ashrdi3.o __negdi2.o __ucmpdi2.o \
checksum.o memcpy.o memset.o atomic-ops.o atomic64-ops.o \
- outsl_ns.o outsl_sw.o insl_ns.o insl_sw.o cache.o
+ outsl_ns.o outsl_sw.o insl_ns.o insl_sw.o cache.o perf_counter.o
diff --git a/arch/frv/lib/perf_counter.c b/arch/frv/lib/perf_counter.c
new file mode 100644
index 0000000..2000fee
--- /dev/null
+++ b/arch/frv/lib/perf_counter.c
@@ -0,0 +1,19 @@
+/* Performance counter handling
+ *
+ * Copyright (C) 2009 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/perf_counter.h>
+
+/*
+ * mark the performance counter as pending
+ */
+void set_perf_counter_pending(void)
+{
+}

2009-07-01 16:47:39

by David Howells

Subject: [PATCH 1/2] FRV: Implement atomic64_t

Implement atomic64_t and its ops for FRV. Tested with the following patch:
diff --git a/arch/frv/kernel/setup.c b/arch/frv/kernel/setup.c
index 55e4fab..086d50d 100644
--- a/arch/frv/kernel/setup.c
+++ b/arch/frv/kernel/setup.c
@@ -746,6 +746,52 @@ static void __init parse_cmdline_early(char *cmdline)

} /* end parse_cmdline_early() */

+static atomic64_t xxx;
+
+static void test_atomic64(void)
+{
+ atomic64_set(&xxx, 0x12300000023LL);
+
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x12300000023LL);
+ mb();
+ if (atomic64_inc_return(&xxx) != 0x12300000024LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x12300000024LL);
+ mb();
+ if (atomic64_sub_return(0x36900000050LL, &xxx) != -0x2460000002cLL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != -0x2460000002cLL);
+ mb();
+ if (atomic64_dec_return(&xxx) != -0x2460000002dLL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != -0x2460000002dLL);
+ mb();
+ if (atomic64_add_return(0x36800000001LL, &xxx) != 0x121ffffffd4LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x121ffffffd4LL);
+ mb();
+ if (atomic64_cmpxchg(&xxx, 0x123456789abcdefLL, 0x121ffffffd4LL) != 0x121ffffffd4LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x121ffffffd4LL);
+ mb();
+ if (atomic64_cmpxchg(&xxx, 0x121ffffffd4LL, 0x123456789abcdefLL) != 0x121ffffffd4LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x123456789abcdefLL);
+ mb();
+ if (atomic64_xchg(&xxx, 0xabcdef123456789LL) != 0x123456789abcdefLL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0xabcdef123456789LL);
+ mb();
+}
+
/*****************************************************************************/
/*
*
@@ -845,6 +891,8 @@ void __init setup_arch(char **cmdline_p)
// asm volatile("movgs %0,timerd" :: "r"(10000000));
// __set_HSR(0, __get_HSR(0) | HSR0_ETMD);

+ test_atomic64();
+
} /* end setup_arch() */

#if 0

Note that this doesn't cover all the trivial wrappers, but does cover all the
substantial implementations.

Signed-off-by: David Howells <[email protected]>
---

arch/frv/include/asm/atomic.h | 53 +++++++++++++
arch/frv/include/asm/system.h | 2 +
arch/frv/kernel/frv_ksyms.c | 4 +
arch/frv/lib/Makefile | 2 -
arch/frv/lib/atomic-ops.S | 3 -
arch/frv/lib/atomic64-ops.S | 162 +++++++++++++++++++++++++++++++++++++++++
6 files changed, 221 insertions(+), 5 deletions(-)
create mode 100644 arch/frv/lib/atomic64-ops.S


diff --git a/arch/frv/include/asm/atomic.h b/arch/frv/include/asm/atomic.h
index 0409d98..cf483d1 100644
--- a/arch/frv/include/asm/atomic.h
+++ b/arch/frv/include/asm/atomic.h
@@ -121,10 +121,57 @@ static inline void atomic_dec(atomic_t *v)
#define atomic_dec_and_test(v) (atomic_sub_return(1, (v)) == 0)
#define atomic_inc_and_test(v) (atomic_add_return(1, (v)) == 0)

+/*
+ * 64-bit atomic ops
+ */
+typedef struct {
+ volatile long long counter;
+} atomic64_t;
+
+#define ATOMIC64_INIT(i) { (i) }
+#define atomic64_read(v) ((v)->counter)
+#define atomic64_set(v, i) (((v)->counter) = (i))
+
+extern long long atomic64_inc_return(atomic64_t *v);
+extern long long atomic64_dec_return(atomic64_t *v);
+extern long long atomic64_add_return(long long i, atomic64_t *v);
+extern long long atomic64_sub_return(long long i, atomic64_t *v);
+
+static inline long long atomic64_add_negative(long long i, atomic64_t *v)
+{
+ return atomic64_add_return(i, v) < 0;
+}
+
+static inline void atomic64_add(long long i, atomic64_t *v)
+{
+ atomic64_add_return(i, v);
+}
+
+static inline void atomic64_sub(long long i, atomic64_t *v)
+{
+ atomic64_sub_return(i, v);
+}
+
+static inline void atomic64_inc(atomic64_t *v)
+{
+ atomic64_inc_return(v);
+}
+
+static inline void atomic64_dec(atomic64_t *v)
+{
+ atomic64_dec_return(v);
+}
+
+#define atomic64_sub_and_test(i,v) (atomic64_sub_return((i), (v)) == 0)
+#define atomic64_dec_and_test(v) (atomic64_dec_return((v)) == 0)
+#define atomic64_inc_and_test(v) (atomic64_inc_return((v)) == 0)
+
/*****************************************************************************/
/*
* exchange value with memory
*/
+extern uint64_t __xchg_64(uint64_t i, volatile void *v);
+
#ifndef CONFIG_FRV_OUTOFLINE_ATOMIC_OPS

#define xchg(ptr, x) \
@@ -174,8 +221,10 @@ extern uint32_t __xchg_32(uint32_t i, volatile void *v);

#define tas(ptr) (xchg((ptr), 1))

-#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), old, new))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
+#define atomic_cmpxchg(v, old, new) (cmpxchg(&(v)->counter, old, new))
+#define atomic_xchg(v, new) (xchg(&(v)->counter, new))
+#define atomic64_cmpxchg(v, old, new) (__cmpxchg_64(old, new, &(v)->counter))
+#define atomic64_xchg(v, new) (__xchg_64(new, &(v)->counter))

static __inline__ int atomic_add_unless(atomic_t *v, int a, int u)
{
diff --git a/arch/frv/include/asm/system.h b/arch/frv/include/asm/system.h
index 7742ec0..efd22d9 100644
--- a/arch/frv/include/asm/system.h
+++ b/arch/frv/include/asm/system.h
@@ -208,6 +208,8 @@ extern void free_initmem(void);
* - if (*ptr == test) then orig = *ptr; *ptr = test;
* - if (*ptr != test) then orig = *ptr;
*/
+extern uint64_t __cmpxchg_64(uint64_t test, uint64_t new, volatile uint64_t *v);
+
#ifndef CONFIG_FRV_OUTOFLINE_ATOMIC_OPS

#define cmpxchg(ptr, test, new) \
diff --git a/arch/frv/kernel/frv_ksyms.c b/arch/frv/kernel/frv_ksyms.c
index 0316b3c..a89803b 100644
--- a/arch/frv/kernel/frv_ksyms.c
+++ b/arch/frv/kernel/frv_ksyms.c
@@ -67,6 +67,10 @@ EXPORT_SYMBOL(atomic_sub_return);
EXPORT_SYMBOL(__xchg_32);
EXPORT_SYMBOL(__cmpxchg_32);
#endif
+EXPORT_SYMBOL(atomic64_add_return);
+EXPORT_SYMBOL(atomic64_sub_return);
+EXPORT_SYMBOL(__xchg_64);
+EXPORT_SYMBOL(__cmpxchg_64);

EXPORT_SYMBOL(__debug_bug_printk);
EXPORT_SYMBOL(__delay_loops_MHz);
diff --git a/arch/frv/lib/Makefile b/arch/frv/lib/Makefile
index 08be305..4ff2fb1 100644
--- a/arch/frv/lib/Makefile
+++ b/arch/frv/lib/Makefile
@@ -4,5 +4,5 @@

lib-y := \
__ashldi3.o __lshrdi3.o __muldi3.o __ashrdi3.o __negdi2.o __ucmpdi2.o \
- checksum.o memcpy.o memset.o atomic-ops.o \
+ checksum.o memcpy.o memset.o atomic-ops.o atomic64-ops.o \
outsl_ns.o outsl_sw.o insl_ns.o insl_sw.o cache.o
diff --git a/arch/frv/lib/atomic-ops.S b/arch/frv/lib/atomic-ops.S
index ee0ac90..5e9e6ab 100644
--- a/arch/frv/lib/atomic-ops.S
+++ b/arch/frv/lib/atomic-ops.S
@@ -163,11 +163,10 @@ __cmpxchg_32:
ld.p @(gr11,gr0),gr8
orcr cc7,cc7,cc3
subcc gr8,gr9,gr7,icc0
- bne icc0,#0,1f
+ bnelr icc0,#0
cst.p gr10,@(gr11,gr0) ,cc3,#1
corcc gr29,gr29,gr0 ,cc3,#1
beq icc3,#0,0b
-1:
bralr

.size __cmpxchg_32, .-__cmpxchg_32
diff --git a/arch/frv/lib/atomic64-ops.S b/arch/frv/lib/atomic64-ops.S
new file mode 100644
index 0000000..b6194ee
--- /dev/null
+++ b/arch/frv/lib/atomic64-ops.S
@@ -0,0 +1,162 @@
+/* kernel atomic64 operations
+ *
+ * For an explanation of how atomic ops work in this arch, see:
+ * Documentation/frv/atomic-ops.txt
+ *
+ * Copyright (C) 2009 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/spr-regs.h>
+
+ .text
+ .balign 4
+
+
+###############################################################################
+#
+# long long atomic64_inc_return(atomic64_t *v)
+#
+###############################################################################
+ .globl atomic64_inc_return
+ .type atomic64_inc_return,@function
+atomic64_inc_return:
+ or.p gr8,gr8,gr10
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr10,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3 /* set CC3 to true */
+ addicc gr9,#1,gr9,icc0
+ addxi gr8,#0,gr8,icc0
+ cstd.p gr8,@(gr10,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size atomic64_inc_return, .-atomic64_inc_return
+
+###############################################################################
+#
+# long long atomic64_dec_return(atomic64_t *v)
+#
+###############################################################################
+ .globl atomic64_dec_return
+ .type atomic64_dec_return,@function
+atomic64_dec_return:
+ or.p gr8,gr8,gr10
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr10,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3 /* set CC3 to true */
+ subicc gr9,#1,gr9,icc0
+ subxi gr8,#0,gr8,icc0
+ cstd.p gr8,@(gr10,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size atomic64_dec_return, .-atomic64_dec_return
+
+###############################################################################
+#
+# long long atomic64_add_return(long long i, atomic64_t *v)
+#
+###############################################################################
+ .globl atomic64_add_return
+ .type atomic64_add_return,@function
+atomic64_add_return:
+ or.p gr8,gr8,gr4
+ or gr9,gr9,gr5
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr10,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3 /* set CC3 to true */
+ addcc gr9,gr5,gr9,icc0
+ addx gr8,gr4,gr8,icc0
+ cstd.p gr8,@(gr10,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size atomic64_add_return, .-atomic64_add_return
+
+###############################################################################
+#
+# long long atomic64_sub_return(long long i, atomic64_t *v)
+#
+###############################################################################
+ .globl atomic64_sub_return
+ .type atomic64_sub_return,@function
+atomic64_sub_return:
+ or.p gr8,gr8,gr4
+ or gr9,gr9,gr5
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr10,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3 /* set CC3 to true */
+ subcc gr9,gr5,gr9,icc0
+ subx gr8,gr4,gr8,icc0
+ cstd.p gr8,@(gr10,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size atomic64_sub_return, .-atomic64_sub_return
+
+###############################################################################
+#
+# uint64_t __xchg_64(uint64_t i, uint64_t *v)
+#
+###############################################################################
+ .globl __xchg_64
+ .type __xchg_64,@function
+__xchg_64:
+ or.p gr8,gr8,gr4
+ or gr9,gr9,gr5
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr10,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3 /* set CC3 to true */
+ cstd.p gr4,@(gr10,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size __xchg_64, .-__xchg_64
+
+###############################################################################
+#
+# uint64_t __cmpxchg_64(uint64_t test, uint64_t new, uint64_t *v)
+#
+###############################################################################
+ .globl __cmpxchg_64
+ .type __cmpxchg_64,@function
+__cmpxchg_64:
+ or.p gr8,gr8,gr4
+ or gr9,gr9,gr5
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr12,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3
+ subcc gr8,gr4,gr0,icc0
+ subcc.p gr9,gr5,gr0,icc1
+ bnelr icc0,#0
+ bnelr icc1,#0
+ cstd.p gr10,@(gr12,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size __cmpxchg_64, .-__cmpxchg_64
+

2009-07-01 17:21:18

by Linus Torvalds

Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t



On Wed, 1 Jul 2009, David Howells wrote:
> +
> +#define ATOMIC64_INIT(i) { (i) }
> +#define atomic64_read(v) ((v)->counter)
> +#define atomic64_set(v, i) (((v)->counter) = (i))

These seem to be buggy.

At least "atomic64_read()" needs to make sure to actually read it
atomically - otherwise you'll do two 32-bit reads, and that just gets
crap. Imagine if somebody is adding 1 to 0x00000000ffffffff, and then
"atomic64_read()" reads it as two accesses in the wrong place, and gets
either 0, or 0x00000001ffffffff, both of which are totally incorrect.

The case of 'atomic64_set()' is _slightly_ less clear, because I think we
use it mostly for initializers, so atomicity is often not strictly
required. But at least on x86, we do guarantee that it sets it atomically
too.

Btw, Ingo: I looked at the x86-32 versions to be sure, and noticed a
couple of buglets:

- atomic64_xchg uses "atomic_read()". Sure, it happens to work, since
the "atomic_read()" is not type-safe, and gets a non-atomic 64-bit
read, but that looks really really bogus.

It _should_ use __atomic64_read(), and the 64-bit versions should use a
different counter name ("counter64"?) or we should use an inline
function for atomic_read(), so that the type safety issue gets fixed.

- atomic64_read() is being stupid with the whole loop thing. It _should_
just do

	static inline unsigned long long atomic64_read(atomic64_t *ptr)
	{
		unsigned long long old = __atomic64_read(ptr);
		return cmpxchg8b(ptr, old, old);
	}

and that's it. No loop. cmpxchg8b() will return the right thing.

- Similarly, atomic64_add_return() is bogus for the same reasons: using
the wrong 'atomic_read()', and unnecessarily ignoring the returned old
value. It probably should do

	static inline unsigned long long
	atomic64_add_return(unsigned long long delta, atomic64_t *ptr)
	{
		unsigned long long old;

		old = __atomic64_read(ptr);
		for (;;) {
			unsigned long long tmp, new;

			new = old + delta;
			tmp = atomic64_cmpxchg(ptr, old, new);
			if (tmp == old)
				return new;
			old = tmp;
		}
	}

or something. NOTE NOTE NOTE! Not tested!

Those functions also almost certainly should _not_ be inlined. They need
so many registers that inlining them is crazy.

Linus

2009-07-01 17:33:42

by David Howells

Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t

Linus Torvalds <[email protected]> wrote:

> > +#define atomic64_read(v) ((v)->counter)
> > +#define atomic64_set(v, i) (((v)->counter) = (i))
>
> These seem to be buggy.

Good point. I should be doing inline asm with LDD and STD instructions.

David

2009-07-01 21:10:44

by Ingo Molnar

Subject: Re: [PATCH 2/2] FRV: Add basic performance counter support


* David Howells <[email protected]> wrote:

> Add basic performance counter support to the FRV arch.
>

that was fast :)

Looks good. There are two areas where we can improve the core
perfcounters code:

> Signed-off-by: David Howells <[email protected]>
> ---
>
> arch/frv/Kconfig | 1 +
> arch/frv/include/asm/perf_counter.h | 17 +++++++++++++++++
> arch/frv/lib/Makefile | 2 +-
> arch/frv/lib/perf_counter.c | 19 +++++++++++++++++++
> 4 files changed, 38 insertions(+), 1 deletions(-)
> create mode 100644 arch/frv/include/asm/perf_counter.h
> create mode 100644 arch/frv/lib/perf_counter.c
>
>
> diff --git a/arch/frv/Kconfig b/arch/frv/Kconfig
> index 8a5bd7a..b86e19c 100644
> --- a/arch/frv/Kconfig
> +++ b/arch/frv/Kconfig
> @@ -7,6 +7,7 @@ config FRV
> default y
> select HAVE_IDE
> select HAVE_ARCH_TRACEHOOK
> + select HAVE_PERF_COUNTERS
>
> config ZONE_DMA
> bool
> diff --git a/arch/frv/include/asm/perf_counter.h b/arch/frv/include/asm/perf_counter.h
> new file mode 100644
> index 0000000..ccf726e
> --- /dev/null
> +++ b/arch/frv/include/asm/perf_counter.h
> @@ -0,0 +1,17 @@
> +/* FRV performance counter support
> + *
> + * Copyright (C) 2009 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells ([email protected])
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public Licence
> + * as published by the Free Software Foundation; either version
> + * 2 of the Licence, or (at your option) any later version.
> + */
> +
> +#ifndef _ASM_PERF_COUNTER_H
> +#define _ASM_PERF_COUNTER_H
> +
> +#define PERF_COUNTER_INDEX_OFFSET 0

We shouldn't require this of architectures that don't offer hw
perfcounters yet.


> +
> +#endif /* _ASM_PERF_COUNTER_H */
> diff --git a/arch/frv/lib/Makefile b/arch/frv/lib/Makefile
> index 4ff2fb1..0a37721 100644
> --- a/arch/frv/lib/Makefile
> +++ b/arch/frv/lib/Makefile
> @@ -5,4 +5,4 @@
> lib-y := \
> __ashldi3.o __lshrdi3.o __muldi3.o __ashrdi3.o __negdi2.o __ucmpdi2.o \
> checksum.o memcpy.o memset.o atomic-ops.o atomic64-ops.o \
> - outsl_ns.o outsl_sw.o insl_ns.o insl_sw.o cache.o
> + outsl_ns.o outsl_sw.o insl_ns.o insl_sw.o cache.o perf_counter.o
> diff --git a/arch/frv/lib/perf_counter.c b/arch/frv/lib/perf_counter.c
> new file mode 100644
> index 0000000..2000fee
> --- /dev/null
> +++ b/arch/frv/lib/perf_counter.c
> @@ -0,0 +1,19 @@
> +/* Performance counter handling
> + *
> + * Copyright (C) 2009 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells ([email protected])
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public Licence
> + * as published by the Free Software Foundation; either version
> + * 2 of the Licence, or (at your option) any later version.
> + */
> +
> +#include <linux/perf_counter.h>
> +
> +/*
> + * mark the performance counter as pending
> + */
> +void set_perf_counter_pending(void)
> +{
> +}

This could be avoided as well via a generic weak alias.
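
Something like this in the core code would let architectures without
a PMU drop the stub entirely (a hypothetical sketch):

	/* kernel/perf_counter.c: default no-op, overridden by any
	 * architecture that has a real PMU implementation */
	void __weak set_perf_counter_pending(void)
	{
	}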

Ingo

2009-07-01 21:11:33

by Ingo Molnar

Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t


* Linus Torvalds <[email protected]> wrote:

> On Wed, 1 Jul 2009, David Howells wrote:
> > +
> > +#define ATOMIC64_INIT(i) { (i) }
> > +#define atomic64_read(v) ((v)->counter)
> > +#define atomic64_set(v, i) (((v)->counter) = (i))
>
> These seem to be buggy.
>
> At least "atomic64_read()" needs to make sure to actually read it
> atomically - otherwise you'll do two 32-bit reads, and that just
> gets crap. Imagine if somebody is adding 1 to 0x00000000ffffffff,
> and then "atomic64_read()" reads it as two accesses in the wrong
> place, and gets either 0, or 0x00000001ffffffff, both of which are
> totally incorrect.
>
> The case of 'atomic64_set()' is _slightly_ less clear, because I
> think we use it mostly for initializers, so atomicity is often not
> strictly required. But at least on x86, we do guarantee that it
> sets it atomically too.
>
> Btw, Ingo: I looked at the x86-32 versions to be sure, and noticed
> a couple of buglets:

thanks - we'll fix these!

Ingo

2009-07-01 22:57:30

by Paul Mackerras

Subject: [PATCH] x86: Code atomic(64)_read and atomic(64)_set in C not CPP [was Re: FRV: Implement atomic64_t]

Occasionally we get bugs where atomic_read or atomic_set are used on
atomic64_t variables or vice versa. These bugs don't generate
warnings on x86 because atomic_read and atomic_set are coded as macros
rather than C functions, so we don't get any type-checking on their
arguments; similarly for atomic64_read and atomic64_set in 64-bit
kernels.

This converts them to C functions so that the arguments are
type-checked and bugs like this will get caught more easily.
It also converts atomic_cmpxchg and atomic_xchg, and atomic64_cmpxchg
and atomic64_xchg on 64-bit, so we get type-checking on their
arguments too.

Compiling a typical 64-bit x86 config, this generates no new warnings,
and the vmlinux text is 86 bytes smaller.

Signed-off-by: Paul Mackerras <[email protected]>
---
Linus Torvalds writes:

> Btw, Ingo: I looked at the x86-32 versions to be sure, and noticed a
> couple of buglets:
>
> - atomic64_xchg uses "atomic_read()". Sure, it happens to work, since
> the "atomic_read()" is not type-safe, and gets a non-atomic 64-bit
> read, but that looks really really bogus.
>
> It _should_ use __atomic64_read(), and the 64-bit versions should use a
> different counter name ("counter64"?) or we should use an inline
> function for atomic_read(), so that the type safety issue gets fixed.

I did this patch a few weeks ago (before the merge window) and sent it
to Ingo, Thomas & Peter, but it seems to have got lost.

arch/x86/include/asm/atomic_32.h | 25 +++++++++++++++++-----
arch/x86/include/asm/atomic_64.h | 42 ++++++++++++++++++++++++++++++-------
2 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index aff9f1f..0c466cb 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -19,7 +19,10 @@
*
* Atomically reads the value of @v.
*/
-#define atomic_read(v) ((v)->counter)
+static inline int atomic_read(const atomic_t *v)
+{
+ return v->counter;
+}

/**
* atomic_set - set atomic variable
@@ -28,7 +31,10 @@
*
* Atomically sets the value of @v to @i.
*/
-#define atomic_set(v, i) (((v)->counter) = (i))
+static inline void atomic_set(atomic_t *v, int i)
+{
+ v->counter = i;
+}

/**
* atomic_add - add integer to atomic variable
@@ -200,8 +206,15 @@ static inline int atomic_sub_return(int i, atomic_t *v)
return atomic_add_return(-i, v);
}

-#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
+static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline int atomic_xchg(atomic_t *v, int new)
+{
+ return xchg(&v->counter, new);
+}

/**
* atomic_add_unless - add unless the number is already a given value
@@ -306,7 +319,7 @@ atomic64_xchg(atomic64_t *ptr, unsigned long long new_val)
unsigned long long old_val;

do {
- old_val = atomic_read(ptr);
+ old_val = __atomic64_read(ptr);
} while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);

return old_val;
@@ -354,7 +367,7 @@ atomic64_add_return(unsigned long long delta, atomic64_t *ptr)
unsigned long long old_val, new_val;

do {
- old_val = atomic_read(ptr);
+ old_val = __atomic64_read(ptr);
new_val = old_val + delta;

} while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
diff --git a/arch/x86/include/asm/atomic_64.h b/arch/x86/include/asm/atomic_64.h
index 8c21731..7d5282e 100644
--- a/arch/x86/include/asm/atomic_64.h
+++ b/arch/x86/include/asm/atomic_64.h
@@ -18,7 +18,10 @@
*
* Atomically reads the value of @v.
*/
-#define atomic_read(v) ((v)->counter)
+static inline int atomic_read(const atomic_t *v)
+{
+ return v->counter;
+}

/**
* atomic_set - set atomic variable
@@ -27,7 +30,10 @@
*
* Atomically sets the value of @v to @i.
*/
-#define atomic_set(v, i) (((v)->counter) = (i))
+static inline void atomic_set(atomic_t *v, int i)
+{
+ v->counter = i;
+}

/**
* atomic_add - add integer to atomic variable
@@ -192,7 +198,10 @@ static inline int atomic_sub_return(int i, atomic_t *v)
* Atomically reads the value of @v.
* Doesn't imply a read memory barrier.
*/
-#define atomic64_read(v) ((v)->counter)
+static inline long atomic64_read(const atomic64_t *v)
+{
+ return v->counter;
+}

/**
* atomic64_set - set atomic64 variable
@@ -201,7 +210,10 @@ static inline int atomic_sub_return(int i, atomic_t *v)
*
* Atomically sets the value of @v to @i.
*/
-#define atomic64_set(v, i) (((v)->counter) = (i))
+static inline void atomic64_set(atomic64_t *v, long i)
+{
+ v->counter = i;
+}

/**
* atomic64_add - add integer to atomic64 variable
@@ -355,11 +367,25 @@ static inline long atomic64_sub_return(long i, atomic64_t *v)
#define atomic64_inc_return(v) (atomic64_add_return(1, (v)))
#define atomic64_dec_return(v) (atomic64_sub_return(1, (v)))

-#define atomic64_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
+static inline long atomic64_cmpxchg(atomic64_t *v, long old, long new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline long atomic64_xchg(atomic64_t *v, long new)
+{
+ return xchg(&v->counter, new);
+}

-#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
+static inline long atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline long atomic_xchg(atomic_t *v, int new)
+{
+ return xchg(&v->counter, new);
+}

/**
* atomic_add_unless - add unless the number is a given value
--
1.6.0.4

2009-07-01 23:46:37

by David Howells

Subject: [PATCH 1/2] FRV: Implement atomic64_t [ver #2]

Implement atomic64_t and its ops for FRV. Tested with the following patch:

diff --git a/arch/frv/kernel/setup.c b/arch/frv/kernel/setup.c
index 55e4fab..086d50d 100644
--- a/arch/frv/kernel/setup.c
+++ b/arch/frv/kernel/setup.c
@@ -746,6 +746,52 @@ static void __init parse_cmdline_early(char *cmdline)

} /* end parse_cmdline_early() */

+static atomic64_t xxx;
+
+static void test_atomic64(void)
+{
+ atomic64_set(&xxx, 0x12300000023LL);
+
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x12300000023LL);
+ mb();
+ if (atomic64_inc_return(&xxx) != 0x12300000024LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x12300000024LL);
+ mb();
+ if (atomic64_sub_return(0x36900000050LL, &xxx) != -0x2460000002cLL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != -0x2460000002cLL);
+ mb();
+ if (atomic64_dec_return(&xxx) != -0x2460000002dLL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != -0x2460000002dLL);
+ mb();
+ if (atomic64_add_return(0x36800000001LL, &xxx) != 0x121ffffffd4LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x121ffffffd4LL);
+ mb();
+ if (atomic64_cmpxchg(&xxx, 0x123456789abcdefLL, 0x121ffffffd4LL) != 0x121ffffffd4LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x121ffffffd4LL);
+ mb();
+ if (atomic64_cmpxchg(&xxx, 0x121ffffffd4LL, 0x123456789abcdefLL) != 0x121ffffffd4LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x123456789abcdefLL);
+ mb();
+ if (atomic64_xchg(&xxx, 0xabcdef123456789LL) != 0x123456789abcdefLL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0xabcdef123456789LL);
+ mb();
+}
+
/*****************************************************************************/
/*
*
@@ -845,6 +891,8 @@ void __init setup_arch(char **cmdline_p)
// asm volatile("movgs %0,timerd" :: "r"(10000000));
// __set_HSR(0, __get_HSR(0) | HSR0_ETMD);

+ test_atomic64();
+
} /* end setup_arch() */

#if 0

Note that this doesn't cover all the trivial wrappers, but does cover all the
substantial implementations.

Signed-off-by: David Howells <[email protected]>
---

arch/frv/include/asm/atomic.h | 68 +++++++++++++++++
arch/frv/include/asm/system.h | 2 +
arch/frv/kernel/frv_ksyms.c | 4 +
arch/frv/lib/Makefile | 2 -
arch/frv/lib/atomic-ops.S | 3 -
arch/frv/lib/atomic64-ops.S | 162 +++++++++++++++++++++++++++++++++++++++++
6 files changed, 236 insertions(+), 5 deletions(-)
create mode 100644 arch/frv/lib/atomic64-ops.S


diff --git a/arch/frv/include/asm/atomic.h b/arch/frv/include/asm/atomic.h
index 0409d98..00a57af 100644
--- a/arch/frv/include/asm/atomic.h
+++ b/arch/frv/include/asm/atomic.h
@@ -121,10 +121,72 @@ static inline void atomic_dec(atomic_t *v)
#define atomic_dec_and_test(v) (atomic_sub_return(1, (v)) == 0)
#define atomic_inc_and_test(v) (atomic_add_return(1, (v)) == 0)

+/*
+ * 64-bit atomic ops
+ */
+typedef struct {
+ volatile long long counter;
+} atomic64_t;
+
+#define ATOMIC64_INIT(i) { (i) }
+
+static inline long long atomic64_read(atomic64_t *v)
+{
+ long long counter;
+
+ asm("ldd%I1 %M1,%0"
+ : "=e"(counter)
+ : "m"(v->counter));
+ return counter;
+}
+
+static inline void atomic64_set(atomic64_t *v, long long i)
+{
+ asm volatile("std%I0 %1,%M0"
+ : "=m"(v->counter)
+ : "e"(i));
+}
+
+extern long long atomic64_inc_return(atomic64_t *v);
+extern long long atomic64_dec_return(atomic64_t *v);
+extern long long atomic64_add_return(long long i, atomic64_t *v);
+extern long long atomic64_sub_return(long long i, atomic64_t *v);
+
+static inline long long atomic64_add_negative(long long i, atomic64_t *v)
+{
+ return atomic64_add_return(i, v) < 0;
+}
+
+static inline void atomic64_add(long long i, atomic64_t *v)
+{
+ atomic64_add_return(i, v);
+}
+
+static inline void atomic64_sub(long long i, atomic64_t *v)
+{
+ atomic64_sub_return(i, v);
+}
+
+static inline void atomic64_inc(atomic64_t *v)
+{
+ atomic64_inc_return(v);
+}
+
+static inline void atomic64_dec(atomic64_t *v)
+{
+ atomic64_dec_return(v);
+}
+
+#define atomic64_sub_and_test(i,v) (atomic64_sub_return((i), (v)) == 0)
+#define atomic64_dec_and_test(v) (atomic64_dec_return((v)) == 0)
+#define atomic64_inc_and_test(v) (atomic64_inc_return((v)) == 0)
+
/*****************************************************************************/
/*
* exchange value with memory
*/
+extern uint64_t __xchg_64(uint64_t i, volatile void *v);
+
#ifndef CONFIG_FRV_OUTOFLINE_ATOMIC_OPS

#define xchg(ptr, x) \
@@ -174,8 +236,10 @@ extern uint32_t __xchg_32(uint32_t i, volatile void *v);

#define tas(ptr) (xchg((ptr), 1))

-#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), old, new))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
+#define atomic_cmpxchg(v, old, new) (cmpxchg(&(v)->counter, old, new))
+#define atomic_xchg(v, new) (xchg(&(v)->counter, new))
+#define atomic64_cmpxchg(v, old, new) (__cmpxchg_64(old, new, &(v)->counter))
+#define atomic64_xchg(v, new) (__xchg_64(new, &(v)->counter))

static __inline__ int atomic_add_unless(atomic_t *v, int a, int u)
{
diff --git a/arch/frv/include/asm/system.h b/arch/frv/include/asm/system.h
index 7742ec0..efd22d9 100644
--- a/arch/frv/include/asm/system.h
+++ b/arch/frv/include/asm/system.h
@@ -208,6 +208,8 @@ extern void free_initmem(void);
* - if (*ptr == test) then orig = *ptr; *ptr = test;
* - if (*ptr != test) then orig = *ptr;
*/
+extern uint64_t __cmpxchg_64(uint64_t test, uint64_t new, volatile uint64_t *v);
+
#ifndef CONFIG_FRV_OUTOFLINE_ATOMIC_OPS

#define cmpxchg(ptr, test, new) \
diff --git a/arch/frv/kernel/frv_ksyms.c b/arch/frv/kernel/frv_ksyms.c
index 0316b3c..a89803b 100644
--- a/arch/frv/kernel/frv_ksyms.c
+++ b/arch/frv/kernel/frv_ksyms.c
@@ -67,6 +67,10 @@ EXPORT_SYMBOL(atomic_sub_return);
EXPORT_SYMBOL(__xchg_32);
EXPORT_SYMBOL(__cmpxchg_32);
#endif
+EXPORT_SYMBOL(atomic64_add_return);
+EXPORT_SYMBOL(atomic64_sub_return);
+EXPORT_SYMBOL(__xchg_64);
+EXPORT_SYMBOL(__cmpxchg_64);

EXPORT_SYMBOL(__debug_bug_printk);
EXPORT_SYMBOL(__delay_loops_MHz);
diff --git a/arch/frv/lib/Makefile b/arch/frv/lib/Makefile
index 08be305..4ff2fb1 100644
--- a/arch/frv/lib/Makefile
+++ b/arch/frv/lib/Makefile
@@ -4,5 +4,5 @@

lib-y := \
__ashldi3.o __lshrdi3.o __muldi3.o __ashrdi3.o __negdi2.o __ucmpdi2.o \
- checksum.o memcpy.o memset.o atomic-ops.o \
+ checksum.o memcpy.o memset.o atomic-ops.o atomic64-ops.o \
outsl_ns.o outsl_sw.o insl_ns.o insl_sw.o cache.o
diff --git a/arch/frv/lib/atomic-ops.S b/arch/frv/lib/atomic-ops.S
index ee0ac90..5e9e6ab 100644
--- a/arch/frv/lib/atomic-ops.S
+++ b/arch/frv/lib/atomic-ops.S
@@ -163,11 +163,10 @@ __cmpxchg_32:
ld.p @(gr11,gr0),gr8
orcr cc7,cc7,cc3
subcc gr8,gr9,gr7,icc0
- bne icc0,#0,1f
+ bnelr icc0,#0
cst.p gr10,@(gr11,gr0) ,cc3,#1
corcc gr29,gr29,gr0 ,cc3,#1
beq icc3,#0,0b
-1:
bralr

.size __cmpxchg_32, .-__cmpxchg_32
diff --git a/arch/frv/lib/atomic64-ops.S b/arch/frv/lib/atomic64-ops.S
new file mode 100644
index 0000000..b6194ee
--- /dev/null
+++ b/arch/frv/lib/atomic64-ops.S
@@ -0,0 +1,162 @@
+/* kernel atomic64 operations
+ *
+ * For an explanation of how atomic ops work in this arch, see:
+ * Documentation/frv/atomic-ops.txt
+ *
+ * Copyright (C) 2009 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/spr-regs.h>
+
+ .text
+ .balign 4
+
+
+###############################################################################
+#
+# long long atomic64_inc_return(atomic64_t *v)
+#
+###############################################################################
+ .globl atomic64_inc_return
+ .type atomic64_inc_return,@function
+atomic64_inc_return:
+ or.p gr8,gr8,gr10
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr10,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3 /* set CC3 to true */
+ addicc gr9,#1,gr9,icc0
+ addxi gr8,#0,gr8,icc0
+ cstd.p gr8,@(gr10,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size atomic64_inc_return, .-atomic64_inc_return
+
+###############################################################################
+#
+# long long atomic64_dec_return(atomic64_t *v)
+#
+###############################################################################
+ .globl atomic64_dec_return
+ .type atomic64_dec_return,@function
+atomic64_dec_return:
+ or.p gr8,gr8,gr10
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr10,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3 /* set CC3 to true */
+ subicc gr9,#1,gr9,icc0
+ subxi gr8,#0,gr8,icc0
+ cstd.p gr8,@(gr10,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size atomic64_dec_return, .-atomic64_dec_return
+
+###############################################################################
+#
+# long long atomic64_add_return(long long i, atomic64_t *v)
+#
+###############################################################################
+ .globl atomic64_add_return
+ .type atomic64_add_return,@function
+atomic64_add_return:
+ or.p gr8,gr8,gr4
+ or gr9,gr9,gr5
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr10,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3 /* set CC3 to true */
+ addcc gr9,gr5,gr9,icc0
+ addx gr8,gr4,gr8,icc0
+ cstd.p gr8,@(gr10,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size atomic64_add_return, .-atomic64_add_return
+
+###############################################################################
+#
+# long long atomic64_sub_return(long long i, atomic64_t *v)
+#
+###############################################################################
+ .globl atomic64_sub_return
+ .type atomic64_sub_return,@function
+atomic64_sub_return:
+ or.p gr8,gr8,gr4
+ or gr9,gr9,gr5
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr10,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3 /* set CC3 to true */
+ subcc gr9,gr5,gr9,icc0
+ subx gr8,gr4,gr8,icc0
+ cstd.p gr8,@(gr10,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size atomic64_sub_return, .-atomic64_sub_return
+
+###############################################################################
+#
+# uint64_t __xchg_64(uint64_t i, uint64_t *v)
+#
+###############################################################################
+ .globl __xchg_64
+ .type __xchg_64,@function
+__xchg_64:
+ or.p gr8,gr8,gr4
+ or gr9,gr9,gr5
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr10,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3 /* set CC3 to true */
+ cstd.p gr4,@(gr10,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size __xchg_64, .-__xchg_64
+
+###############################################################################
+#
+# uint64_t __cmpxchg_64(uint64_t test, uint64_t new, uint64_t *v)
+#
+###############################################################################
+ .globl __cmpxchg_64
+ .type __cmpxchg_64,@function
+__cmpxchg_64:
+ or.p gr8,gr8,gr4
+ or gr9,gr9,gr5
+0:
+ orcc gr0,gr0,gr0,icc3 /* set ICC3.Z */
+ ckeq icc3,cc7
+ ldd.p @(gr12,gr0),gr8 /* LDD.P/ORCR must be atomic */
+ orcr cc7,cc7,cc3
+ subcc gr8,gr4,gr0,icc0
+ subcc.p gr9,gr5,gr0,icc1
+ bnelr icc0,#0
+ bnelr icc1,#0
+ cstd.p gr10,@(gr12,gr0) ,cc3,#1
+ corcc gr29,gr29,gr0 ,cc3,#1 /* clear ICC3.Z if store happens */
+ beq icc3,#0,0b
+ bralr
+
+ .size __cmpxchg_64, .-__cmpxchg_64
+

2009-07-01 23:46:49

by David Howells

Subject: [PATCH 2/2] FRV: Add basic performance counter support [ver #2]

Add basic performance counter support to the FRV arch.

Signed-off-by: David Howells <[email protected]>
---

arch/frv/Kconfig | 1 +
arch/frv/include/asm/perf_counter.h | 17 +++++++++++++++++
arch/frv/lib/Makefile | 2 +-
arch/frv/lib/perf_counter.c | 19 +++++++++++++++++++
4 files changed, 38 insertions(+), 1 deletions(-)
create mode 100644 arch/frv/include/asm/perf_counter.h
create mode 100644 arch/frv/lib/perf_counter.c


diff --git a/arch/frv/Kconfig b/arch/frv/Kconfig
index 8a5bd7a..b86e19c 100644
--- a/arch/frv/Kconfig
+++ b/arch/frv/Kconfig
@@ -7,6 +7,7 @@ config FRV
default y
select HAVE_IDE
select HAVE_ARCH_TRACEHOOK
+ select HAVE_PERF_COUNTERS

config ZONE_DMA
bool
diff --git a/arch/frv/include/asm/perf_counter.h b/arch/frv/include/asm/perf_counter.h
new file mode 100644
index 0000000..ccf726e
--- /dev/null
+++ b/arch/frv/include/asm/perf_counter.h
@@ -0,0 +1,17 @@
+/* FRV performance counter support
+ *
+ * Copyright (C) 2009 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#ifndef _ASM_PERF_COUNTER_H
+#define _ASM_PERF_COUNTER_H
+
+#define PERF_COUNTER_INDEX_OFFSET 0
+
+#endif /* _ASM_PERF_COUNTER_H */
diff --git a/arch/frv/lib/Makefile b/arch/frv/lib/Makefile
index 4ff2fb1..0a37721 100644
--- a/arch/frv/lib/Makefile
+++ b/arch/frv/lib/Makefile
@@ -5,4 +5,4 @@
lib-y := \
__ashldi3.o __lshrdi3.o __muldi3.o __ashrdi3.o __negdi2.o __ucmpdi2.o \
checksum.o memcpy.o memset.o atomic-ops.o atomic64-ops.o \
- outsl_ns.o outsl_sw.o insl_ns.o insl_sw.o cache.o
+ outsl_ns.o outsl_sw.o insl_ns.o insl_sw.o cache.o perf_counter.o
diff --git a/arch/frv/lib/perf_counter.c b/arch/frv/lib/perf_counter.c
new file mode 100644
index 0000000..2000fee
--- /dev/null
+++ b/arch/frv/lib/perf_counter.c
@@ -0,0 +1,19 @@
+/* Performance counter handling
+ *
+ * Copyright (C) 2009 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/perf_counter.h>
+
+/*
+ * mark the performance counter as pending
+ */
+void set_perf_counter_pending(void)
+{
+}

2009-07-01 23:48:58

by David Howells

Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t

Linus Torvalds <[email protected]> wrote:

> > +#define atomic64_read(v) ((v)->counter)
> > +#define atomic64_set(v, i) (((v)->counter) = (i))

This is now:

	static inline long long atomic64_read(atomic64_t *v)
	{
		long long counter;

		asm("ldd%I1 %M1,%0"
		    : "=e"(counter)
		    : "m"(v->counter));
		return counter;
	}

	static inline void atomic64_set(atomic64_t *v, long long i)
	{
		asm volatile("std%I0 %1,%M0"
			     : "=m"(v->counter)
			     : "e"(i));
	}

which causes the CPU to do 64-bit loads and stores to register pairs.

David

2009-07-02 07:22:32

by Ingo Molnar

Subject: Re: [PATCH] x86: Code atomic(64)_read and atomic(64)_set in C not CPP [was Re: FRV: Implement atomic64_t]


* Paul Mackerras <[email protected]> wrote:

> Occasionally we get bugs where atomic_read or atomic_set are used
> on atomic64_t variables or vice versa. These bugs don't generate
> warnings on x86 because atomic_read and atomic_set are coded as
> macros rather than C functions, so we don't get any type-checking
> on their arguments; similarly for atomic64_read and atomic64_set
> in 64-bit kernels.
>
> This converts them to C functions so that the arguments are
> type-checked and bugs like this will get caught more easily. It
> also converts atomic_cmpxchg and atomic_xchg, and atomic64_cmpxchg
> and atomic64_xchg on 64-bit, so we get type-checking on their
> arguments too.
>
> Compiling a typical 64-bit x86 config, this generates no new
> warnings, and the vmlinux text is 86 bytes smaller.
>
> Signed-off-by: Paul Mackerras <[email protected]>

Thanks Paul!

> ---
> Linus Torvalds writes:
>
> > Btw, Ingo: I looked at the x86-32 versions to be sure, and noticed a
> > couple of buglets:
> >
> > - atomic64_xchg uses "atomic_read()". Sure, it happens to work, since
> > the "atomic_read()" is not type-safe, and gets a non-atomic 64-bit
> > read, but that looks really really bogus.
> >
> > It _should_ use __atomic64_read(), and the 64-bit versions should use a
> > different counter name ("counter64"?) or we should use an inline
> > function for atomic_read(), so that the type safety issue gets fixed.
>
> I did this patch a few weeks ago (before the merge window) and
> sent it to Ingo, Thomas & Peter, but it seems to have got lost.

Yeah, as I noted back then off-list I didn't take it due to it
causing a criss-cross merge:

| > Acked-by: Peter Zijlstra <[email protected]>
|
| Nice - could someone please remind us later in the merge window to
| have a look at this again? Right now this needs to go into
| perfcounters/core - but I'd like to avoid having to do too many
| cross-changes there.

Linus reminded us ;-)

Ingo

2009-07-02 07:22:45

by Paul Mackerras

Subject: [tip:x86/urgent] x86: Code atomic(64)_read and atomic(64)_set in C not CPP

Commit-ID: 584de8a67dcc5f70bd4fdb25ceaf9f840d57937b
Gitweb: http://git.kernel.org/tip/584de8a67dcc5f70bd4fdb25ceaf9f840d57937b
Author: Paul Mackerras <[email protected]>
AuthorDate: Thu, 2 Jul 2009 08:57:12 +1000
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 2 Jul 2009 09:19:11 +0200

x86: Code atomic(64)_read and atomic(64)_set in C not CPP

Occasionally we get bugs where atomic_read or atomic_set are
used on atomic64_t variables or vice versa. These bugs don't
generate warnings on x86 because atomic_read and atomic_set are
coded as macros rather than C functions, so we don't get any
type-checking on their arguments; similarly for atomic64_read
and atomic64_set in 64-bit kernels.

This converts them to C functions so that the arguments are
type-checked and bugs like this will get caught more easily. It
also converts atomic_cmpxchg and atomic_xchg, and
atomic64_cmpxchg and atomic64_xchg on 64-bit, so we get
type-checking on their arguments too.

Compiling a typical 64-bit x86 config, this generates no new
warnings, and the vmlinux text is 86 bytes smaller.

Signed-off-by: Paul Mackerras <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/atomic_32.h | 25 +++++++++++++++++-----
arch/x86/include/asm/atomic_64.h | 42 ++++++++++++++++++++++++++++++-------
2 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index 2503d4e..6e26c23 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -19,7 +19,10 @@
*
* Atomically reads the value of @v.
*/
-#define atomic_read(v) ((v)->counter)
+static inline int atomic_read(const atomic_t *v)
+{
+ return v->counter;
+}

/**
* atomic_set - set atomic variable
@@ -28,7 +31,10 @@
*
* Atomically sets the value of @v to @i.
*/
-#define atomic_set(v, i) (((v)->counter) = (i))
+static inline void atomic_set(atomic_t *v, int i)
+{
+ v->counter = i;
+}

/**
* atomic_add - add integer to atomic variable
@@ -200,8 +206,15 @@ static inline int atomic_sub_return(int i, atomic_t *v)
return atomic_add_return(-i, v);
}

-#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
+static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline int atomic_xchg(atomic_t *v, int new)
+{
+ return xchg(&v->counter, new);
+}

/**
* atomic_add_unless - add unless the number is already a given value
@@ -305,7 +318,7 @@ atomic64_xchg(atomic64_t *ptr, unsigned long long new_val)
unsigned long long old_val;

do {
- old_val = atomic_read(ptr);
+ old_val = __atomic64_read(ptr);
} while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);

return old_val;
@@ -353,7 +366,7 @@ atomic64_add_return(unsigned long long delta, atomic64_t *ptr)
unsigned long long old_val, new_val;

do {
- old_val = atomic_read(ptr);
+ old_val = __atomic64_read(ptr);
new_val = old_val + delta;

} while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
diff --git a/arch/x86/include/asm/atomic_64.h b/arch/x86/include/asm/atomic_64.h
index 0d63602..d605dc2 100644
--- a/arch/x86/include/asm/atomic_64.h
+++ b/arch/x86/include/asm/atomic_64.h
@@ -18,7 +18,10 @@
*
* Atomically reads the value of @v.
*/
-#define atomic_read(v) ((v)->counter)
+static inline int atomic_read(const atomic_t *v)
+{
+ return v->counter;
+}

/**
* atomic_set - set atomic variable
@@ -27,7 +30,10 @@
*
* Atomically sets the value of @v to @i.
*/
-#define atomic_set(v, i) (((v)->counter) = (i))
+static inline void atomic_set(atomic_t *v, int i)
+{
+ v->counter = i;
+}

/**
* atomic_add - add integer to atomic variable
@@ -192,7 +198,10 @@ static inline int atomic_sub_return(int i, atomic_t *v)
* Atomically reads the value of @v.
* Doesn't imply a read memory barrier.
*/
-#define atomic64_read(v) ((v)->counter)
+static inline long atomic64_read(const atomic64_t *v)
+{
+ return v->counter;
+}

/**
* atomic64_set - set atomic64 variable
@@ -201,7 +210,10 @@ static inline int atomic_sub_return(int i, atomic_t *v)
*
* Atomically sets the value of @v to @i.
*/
-#define atomic64_set(v, i) (((v)->counter) = (i))
+static inline void atomic64_set(atomic64_t *v, long i)
+{
+ v->counter = i;
+}

/**
* atomic64_add - add integer to atomic64 variable
@@ -355,11 +367,25 @@ static inline long atomic64_sub_return(long i, atomic64_t *v)
#define atomic64_inc_return(v) (atomic64_add_return(1, (v)))
#define atomic64_dec_return(v) (atomic64_sub_return(1, (v)))

-#define atomic64_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
+static inline long atomic64_cmpxchg(atomic64_t *v, long old, long new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline long atomic64_xchg(atomic64_t *v, long new)
+{
+ return xchg(&v->counter, new);
+}

-#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
+static inline long atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline long atomic_xchg(atomic_t *v, int new)
+{
+ return xchg(&v->counter, new);
+}

/**
* atomic_add_unless - add unless the number is a given value

2009-07-02 21:13:30

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t

Linus Torvalds wrote:
>
> On Wed, 1 Jul 2009, David Howells wrote:
>> +
>> +#define ATOMIC64_INIT(i) { (i) }
>> +#define atomic64_read(v) ((v)->counter)
>> +#define atomic64_set(v, i) (((v)->counter) = (i))
>
> These seem to be buggy.
>
> At least "atomic64_read()" needs to make sure to actually read it
> atomically - otherwise you'll do two 32-bit reads, and that just gets
> crap. Imagine if somebody is adding 1 to 0x00000000ffffffff, and then
> "atomic64_read()" reads it as two accesses in the wrong place, and gets
> either 0, or 0x00000001ffffffff, both of which are totally incorrect.
>
> The case of 'atomic64_set()' is _slightly_ less clear, because I think we
> use it mostly for initializers, so atomicity is often not strictly
> required. But at least on x86, we do guarantee that it sets it atomically
> too.
>
> Btw, Ingo: I looked at the x86-32 versions to be sure, and noticed a
> couple of buglets:
>
> - atomic64_xchg uses "atomic_read()". Sure, it happens to work, since
> the "atomic_read()" is not type-safe, and gets a non-atomic 64-bit
> read, but that looks really really bogus.
>
> It _should_ use __atomic64_read(), and the 64-bit versions should use a
> different counter name ("counter64"?) or we should use an inline
> function for atomic_read(), so that the type safety issue gets fixed.
>
> - atomic64_read() is being stupid with the whole loop thing. It _should_
> just do
>
> static inline unsigned long long atomic64_read(atomic64_t *ptr)
> {
> 	unsigned long long old = __atomic64_read(ptr);
> 	return cmpxchg8b(ptr, old, old);
> }
>
> and that's it. No loop. cmpxchg8b() will return the right thing.

Using a fixed initial value (instead of __atomic64_read()) is even faster;
it apparently permits the CPU to use an appropriate bus transaction.

static inline unsigned long long atomic64_read(atomic64_t *ptr)
{
	unsigned long long old = 0LL;

	return cmpxchg8b(&ptr->counter, old, old);
}

I also rewrote cmpxchg8b() to use a generic "+m" constraint instead of the %edi register.

static inline unsigned long long
cmpxchg8b(unsigned long long *ptr, unsigned long long old, unsigned long long new)
{
	unsigned long low = new;
	unsigned long high = new >> 32;

	asm volatile(
		LOCK_PREFIX "cmpxchg8b %1\n"
		: "+A" (old), "+m" (*ptr)
		: "b" (low), "c" (high)
	);
	return old;
}



I got a 4x speedup on a dual quad core (Intel E5450) machine if all CPUs try
to *read* the same atomic64 location.

I tried various init values and got an additional 5% speedup by choosing a
value *most probably* different from the actual atomic64 one,
like (1LL << 32), with nice asm output...

static inline unsigned long long atomic64_read(atomic64_t *ptr)
{
	unsigned long long old = (1LL << 32);

	return cmpxchg8b(&ptr->counter, old, old);
}
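
[ The test program itself is not posted in the thread; below is a minimal
userspace sketch of the kind of benchmark described - assumptions: pthreads,
8 reader threads, the lock; cmpxchg8b read from above, built with
gcc -m32 -O2 -pthread (add -fno-pie on toolchains that default to PIE,
since the "b" constraint needs %ebx). Names are illustrative. ]

#include <pthread.h>
#include <stdio.h>

typedef struct {
	unsigned long long __attribute__((aligned(8))) counter;
} atomic64_t;

static atomic64_t myvar = { 42ULL };
static volatile unsigned long long sink;

static unsigned long long atomic64_read(atomic64_t *ptr)
{
	/* Improbable guess: the compare fails, edx:eax gets the real value. */
	unsigned long long old = 1ULL << 32;
	unsigned long low = (unsigned long)old;
	unsigned long high = (unsigned long)(old >> 32);

	asm volatile("lock; cmpxchg8b %1"
		     : "+A" (old), "+m" (ptr->counter)
		     : "b" (low), "c" (high));
	return old;
}

static void *reader(void *arg)
{
	unsigned long long sum = 0;
	int i;

	for (i = 0; i < 10000000; i++)
		sum += atomic64_read(&myvar);
	sink = sum;	/* keep the loop alive */
	return NULL;
}

int main(void)
{
	pthread_t tid[8];
	int i;

	for (i = 0; i < 8; i++)
		pthread_create(&tid[i], NULL, reader, NULL);
	for (i = 0; i < 8; i++)
		pthread_join(tid[i], NULL);
	printf("done\n");
	return 0;
}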

2009-07-02 21:29:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t



On Thu, 2 Jul 2009, Eric Dumazet wrote:
>
> Using a fixed initial value (instead of __atomic64_read()) is even faster;
> it apparently permits the CPU to use an appropriate bus transaction.

Yeah, I guess it does a "read-for-write-ownership" and allows the thing to
be done as a single cache transaction.

If we read it first, it will first get the cacheline for shared-read, and
then the cmpxchg8b will need to turn it from shared to exclusive.

Of course, the _optimal_ situation would be if the cmpxchg8b didn't
actually do the write at all when the value matches (and all cores could
just keep it shared), but I guess that's not going to happen.

Too bad there is no pure 8-byte read op. Using MMX has too many downsides.

Btw, your numbers imply that for the atomic64_add_return(), we really
would be much better off not reading the original value at all. Again, in
that case, we really do want the "read-for-write-ownership" cache
transaction, not a read.

Linus

2009-07-02 22:09:09

by Eric Dumazet

[permalink] [raw]
Subject: [PATCH] x86: atomic64_t should be 8 bytes aligned

Linus Torvalds wrote:
>
> On Thu, 2 Jul 2009, Eric Dumazet wrote:
>> Using a fixed initial value (instead of __atomic64_read()) is even faster;
>> it apparently permits the CPU to use an appropriate bus transaction.
>
> Yeah, I guess it does a "read-for-write-ownership" and allows the thing to
> be done as a single cache transaction.
>
> If we read it first, it will first get the cacheline for shared-read, and
> then the cmpxchg8b will need to turn it from shared to exclusive.
>
> Of course, the _optimal_ situation would be if the cmpxchg8b didn't
> actually do the write at all when the value matches (and all cores could
> just keep it shared), but I guess that's not going to happen.
>
> Too bad there is no pure 8-byte read op. Using MMX has too many downsides.
>
> Btw, your numbers imply that for the atomic64_add_return(), we really
> would be much better off not reading the original value at all. Again, in
> that case, we really do want the "read-for-write-ownership" cache
> transaction, not a read.
>

I forgot to mention that if atomic64_t spans two cache lines, my test
program is 10x slower.

So we probably need an __attribute__((aligned(8))) on the atomic64_t definition
as well, since alignof(atomic64_t) is 4, not 8 (!!!)

#include <stdio.h>

typedef struct {
	unsigned long long counter;
} atomic64_t;

typedef struct {
	unsigned long long __attribute__((aligned(8))) counter;
} atomic64_ta;

struct {
	int a;
	atomic64_t counter;
} s __attribute__((aligned(64)));

struct {
	int a;
	atomic64_ta counter;
} sa __attribute__((aligned(64)));

int main()
{
	printf("alignof(atomic64_t)=%d\n", __alignof__(atomic64_t));
	printf("alignof(atomic64_ta)=%d\n", __alignof__(atomic64_ta));
	printf("alignof(unsigned long long)=%d\n", __alignof__(unsigned long long));
	printf("&s.counter=%p\n", &s.counter);
	printf("&sa.counter=%p\n", &sa.counter);
	return 0;
}

$ gcc -O2 -o test test.c ; ./test
alignof(atomic64_t)=4
alignof(atomic64_ta)=8
alignof(unsigned long long)=8
&s.counter=0x80496c4
&sa.counter=0x8049708
$ gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../gcc-4.3.3/configure --enable-languages=c,c++ --prefix=/usr
Thread model: posix
gcc version 4.3.3 (GCC)


[PATCH] x86: atomic64_t should be 8 bytes aligned

LOCKED instructions on two cache lines are painful.
Make sure an atomic64_t is 8 bytes aligned.

Signed-off-by: Eric Dumazet <[email protected]>

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index 2503d4e..1927a56 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -250,7 +250,7 @@ static inline int atomic_add_unless(atomic_t *v, int a, int u)
/* An 64bit atomic type */

typedef struct {
- unsigned long long counter;
+ unsigned long long __attribute__((__aligned__(8))) counter;
} atomic64_t;

#define ATOMIC64_INIT(val) { (val) }

2009-07-02 23:54:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86: atomic64_t should be 8 bytes aligned



On Fri, 3 Jul 2009, Eric Dumazet wrote:
>
> > So we probably need an __attribute__((aligned(8))) on the atomic64_t definition
> > as well, since alignof(atomic64_t) is 4, not 8 (!!!)

Ack. Ingo, Peter, I'm hoping you're collecting these things?

Linus

2009-07-03 06:06:32

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t

Eric Dumazet wrote:
> I got a 4x speedup on a dual quad core (Intel E5450) machine if all CPUs try
> to *read* the same atomic64 location.
>
> I tried various init values and got an additional 5% speedup by choosing a
> value *most probably* different from the actual atomic64 one,
> like (1LL << 32), with nice asm output...
>
> static inline unsigned long long atomic64_read(atomic64_t *ptr)
> {
> 	unsigned long long old = (1LL << 32);
>
> 	return cmpxchg8b(&ptr->counter, old, old);
> }
>

My last suggestion would be:

static inline unsigned long long atomic64_read(const atomic64_t *ptr)
{
	unsigned long long res;

	asm volatile(
		"mov %%ebx, %%eax\n\t"
		"mov %%ecx, %%edx\n\t"
		LOCK_PREFIX "cmpxchg8b %1\n"
		: "=A" (res)
		: "m" (*ptr)
	);
	return res;
}

Since ebx/ecx are only read and their values can be anything, they are
not even mentioned in the asm constraints, so gcc is allowed to keep
useful values in these registers.

So the following (stupid) example

	for (i = 0; i < 10000000; i++) {
		res += atomic64_read(&myvar);
	}

gives:

	xorl	%esi, %esi
.L2:
	mov	%ebx, %eax
	mov	%ecx, %edx
	lock;cmpxchg8b myvar
	addl	%eax, %ecx
	adcl	%edx, %ebx
	addl	$1, %esi
	cmpl	$10000000, %esi
	jne	.L2

2009-07-03 06:14:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: atomic64_t should be 8 bytes aligned


* Linus Torvalds <[email protected]> wrote:

> On Fri, 3 Jul 2009, Eric Dumazet wrote:
> >
> > So we probably need an __attribute__((aligned(8)) on atomic64_t
> > definition as well, since alignof(atomic64_t) is 4, not 8 (!!!)
>
> Ack. Ingo, Peter, I'm hoping you're collecting these things?

Yeah, of course.

Ingo

2009-07-03 11:18:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t


* Eric Dumazet <[email protected]> wrote:

> Linus Torvalds wrote:
> >
> > On Wed, 1 Jul 2009, David Howells wrote:
> >> +
> >> +#define ATOMIC64_INIT(i) { (i) }
> >> +#define atomic64_read(v) ((v)->counter)
> >> +#define atomic64_set(v, i) (((v)->counter) = (i))
> >
> > These seem to be buggy.
> >
> > At least "atomic64_read()" needs to make sure to actually read it
> > atomically - otherwise you'll do two 32-bit reads, and that just gets
> > crap. Imagine if somebody is adding 1 to 0x00000000ffffffff, and then
> > "atomic64_read()" reads it as two accesses in the wrong place, and gets
> > either 0, or 0x00000001ffffffff, both of which are totally incorrect.
> >
> > The case of 'atomic64_set()' is _slightly_ less clear, because I think we
> > use it mostly for initializers, so atomicity is often not strictly
> > required. But at least on x86, we do guarantee that it sets it atomically
> > too.
> >
> > Btw, Ingo: I looked at the x86-32 versions to be sure, and noticed a
> > couple of buglets:
> >
> > - atomic64_xchg uses "atomic_read()". Sure, it happens to work, since
> > the "atomic_read()" is not type-safe, and gets a non-atomic 64-bit
> > read, but that looks really really bogus.
> >
> > It _should_ use __atomic64_read(), and the 64-bit versions should use a
> > different counter name ("counter64"?) or we should use an inline
> > function for atomic_read(), so that the type safety issue gets fixed.
> >
> > - atomic64_read() is being stupid with the whole loop thing. It _should_
> > just do
> >
> > static inline unsigned long long atomic64_read(atomic64_t *ptr)
> > {
> > 	unsigned long long old = __atomic64_read(ptr);
> > 	return cmpxchg8b(ptr, old, old);
> > }
> >
> > and that's it. No loop. cmpxchg8b() will return the right thing.
>
> Using a fixed initial value (instead of __atomic64_read()) is even faster;
> it apparently permits the CPU to use an appropriate bus transaction.
>
> static inline unsigned long long atomic64_read(atomic64_t *ptr)
> {
> 	unsigned long long old = 0LL;
>
> 	return cmpxchg8b(&ptr->counter, old, old);
> }

Good point. I've done a simple:

u64 atomic64_read(atomic64_t *ptr)
{
	return cmpxchg8b(&ptr->counter, 0, 0);
}

Ingo

2009-07-03 11:26:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t


* Ingo Molnar <[email protected]> wrote:

> Good point. I've done a simple:
>
> u64 atomic64_read(atomic64_t *ptr)
> {
> 	return cmpxchg8b(&ptr->counter, 0, 0);
> }

-1 is better I guess, as that way we can avoid dirtying the
cacheline. Zero is way too common for atomics.

Ingo
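
[ In code, the suggestion amounts to something like this sketch - not from
the mail, reusing the cmpxchg8b() helper from earlier in the thread: ]

u64 atomic64_read(atomic64_t *ptr)
{
	/*
	 * -1 is rarely the stored value, so the compare usually fails
	 * and cmpxchg8b hands back the current value without storing
	 * anything - the cacheline can stay clean.
	 */
	return cmpxchg8b(&ptr->counter, -1ULL, -1ULL);
}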

2009-07-03 12:02:20

by Ingo Molnar

[permalink] [raw]
Subject: [patch] x86: atomic64_t: Improve atomic64_add_return()


* Linus Torvalds <[email protected]> wrote:

> On Thu, 2 Jul 2009, Eric Dumazet wrote:
> >
> > Using a fixed initial value (instead of __atomic64_read()) is even faster;
> > it apparently permits the CPU to use an appropriate bus transaction.
>
> Yeah, I guess it does a "read-for-write-ownership" and allows the
> thing to be done as a single cache transaction.
>
> If we read it first, it will first get the cacheline for
> shared-read, and then the cmpxchg8b will need to turn it from
> shared to exclusive.
>
> Of course, the _optimal_ situation would be if the cmpxchg8b
> didn't actually do the write at all when the value matches (and
> all cores could just keep it shared), but I guess that's not going
> to happen.
>
> Too bad there is no pure 8-byte read op. Using MMX has too many
> downsides.
>
> Btw, your numbers imply that for the atomic64_add_return(), we
> really would be much better off not reading the original value at
> all. Again, in that case, we really do want the
> "read-for-write-ownership" cache transaction, not a read.

Something like the patch below?

Please review it carefully, as the perfcounter exposure to the
conditional-arithmetics atomic64_t APIs is very low:

earth4:~/tip> for N in $(git grep atomic64_ | grep perf_ |
sed 's/(/ /g'); do echo $N; done | grep ^atomic64_ | sort | uniq -c | sort -n

1 atomic64_add_negative
1 atomic64_inc_return
2 atomic64_xchg
3 atomic64_cmpxchg
3 atomic64_sub
7 atomic64_t
11 atomic64_add
21 atomic64_set
22 atomic64_read

So while I have tested it on a 32-bit box, it's only lightly tested
(and possibly broken) due to the low exposure of the API.

Thanks,

Ingo

----------------------->
Subject: x86: atomic64_t: Improve atomic64_add_return()
From: Ingo Molnar <[email protected]>
Date: Fri Jul 03 12:39:07 CEST 2009

Linus noted (based on Eric Dumazet's numbers) that we would
probably be better off not trying an atomic_read() in
atomic64_add_return() but instead intentionally let the first
cmpxchg8b fail - to get a cache-friendly 'give me ownership
of this cacheline' transaction. That can then be followed
by the real cmpxchg8b which sets the value local to the CPU.

Reported-by: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/lib/atomic64_32.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

Index: linux/arch/x86/lib/atomic64_32.c
===================================================================
--- linux.orig/arch/x86/lib/atomic64_32.c
+++ linux/arch/x86/lib/atomic64_32.c
@@ -76,13 +76,22 @@ u64 atomic64_read(atomic64_t *ptr)
*/
u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
{
- u64 old_val, new_val;
+ /*
+ * Try first with a (probably incorrect) assumption about
+ * what we have there. We'll do two loops most likely,
+ * but we'll get an ownership MESI transaction straight away
+ * instead of a read transaction followed by a
+ * flush-for-ownership transaction:
+ */
+ u64 old_val, new_val, real_val = 1ULL << 32;

do {
- old_val = atomic_read(ptr);
+ old_val = real_val;
new_val = old_val + delta;

- } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
+ real_val = atomic64_cmpxchg(ptr, old_val, new_val);
+
+ } while (real_val != old_val);

return new_val;
}

2009-07-03 12:27:21

by Eric Dumazet

[permalink] [raw]
Subject: [PATCH] x86: atomic64_t: _cmpxchg() & _read() optimizations

Ingo Molnar wrote:
> * Linus Torvalds <[email protected]> wrote:
>
>> On Thu, 2 Jul 2009, Eric Dumazet wrote:
>>> Using a fixed initial value (instead of __atomic64_read()) is even faster;
>>> it apparently permits the CPU to use an appropriate bus transaction.
>> Yeah, I guess it does a "read-for-write-ownership" and allows the
>> thing to be done as a single cache transaction.
>>
>> If we read it first, it will first get the cacheline for
>> shared-read, and then the cmpxchg8b will need to turn it from
>> shared to exclusive.
>>
>> Of course, the _optimal_ situation would be if the cmpxchg8b
>> didn't actually do the write at all when the value matches (and
>> all cores could just keep it shared), but I guess that's not going
>> to happen.
>>
>> Too bad there is no pure 8-byte read op. Using MMX has too many
>> downsides.
>>
>> Btw, your numbers imply that for the atomic64_add_return(), we
>> really would be much better off not reading the original value at
>> all. Again, in that case, we really do want the
>> "read-for-write-ownership" cache transaction, not a read.
>
> Something like the patch below?
>
> Please review it carefully, as the perfcounter exposure to the
> conditional-arithmetics atomic64_t APIs is very low:
>
> earth4:~/tip> for N in $(git grep atomic64_ | grep perf_ |
> sed 's/(/ /g'); do echo $N; done | grep ^atomic64_ | sort | uniq -c | sort -n
>
> 1 atomic64_add_negative
> 1 atomic64_inc_return
> 2 atomic64_xchg
> 3 atomic64_cmpxchg
> 3 atomic64_sub
> 7 atomic64_t
> 11 atomic64_add
> 21 atomic64_set
> 22 atomic64_read
>
> So while I have tested it on a 32-bit box, it's only lightly tested
> (and possibly broken) due to the low exposure of the API.
>
> Thanks,
>
> Ingo
>
> ----------------------->
> Subject: x86: atomic64_t: Improve atomic64_add_return()
> From: Ingo Molnar <[email protected]>
> Date: Fri Jul 03 12:39:07 CEST 2009
>
> Linus noted (based on Eric Dumazet's numbers) that we would
> probably be better off not trying an atomic_read() in
> atomic64_add_return() but instead intentionally let the first
> cmpxchg8b fail - to get a cache-friendly 'give me ownership
> of this cacheline' transaction. That can then be followed
> by the real cmpxchg8b which sets the value local to the CPU.
>
> Reported-by: Linus Torvalds <[email protected]>
> Cc: Eric Dumazet <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: David Howells <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/x86/lib/atomic64_32.c | 15 ++++++++++++---
> 1 file changed, 12 insertions(+), 3 deletions(-)
>
> Index: linux/arch/x86/lib/atomic64_32.c
> ===================================================================
> --- linux.orig/arch/x86/lib/atomic64_32.c
> +++ linux/arch/x86/lib/atomic64_32.c
> @@ -76,13 +76,22 @@ u64 atomic64_read(atomic64_t *ptr)
> */
> u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
> {
> - u64 old_val, new_val;
> + /*
> + * Try first with a (probably incorrect) assumption about
> + * what we have there. We'll do two loops most likely,
> + * but we'll get an ownership MESI transaction straight away
> + * instead of a read transaction followed by a
> + * flush-for-ownership transaction:
> + */
> + u64 old_val, new_val, real_val = 1ULL << 32;
>
> do {
> - old_val = atomic_read(ptr);
> + old_val = real_val;
> new_val = old_val + delta;
>
> - } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
> + real_val = atomic64_cmpxchg(ptr, old_val, new_val);
> +
> + } while (real_val != old_val);
>
> return new_val;
> }

Seems cool, I am going to apply it and test it too.

I cooked up the following patch, to address _cmpxchg() and _read().

[PATCH] x86: atomic64_t: _cmpxchg() & _read() optimizations

atomic64_cmpxchg() can use the existing optimized __cmpxchg64() function;
there is no need to redefine another cmpxchg8b() variant (BTW, this variant
had aliasing problems).

As suggested by Linus, atomic64_read() doesn't have to loop.

We can implement it with a single cmpxchg8b instruction, and avoid the
initial read of the value so that the CPU can use a "read-for-write-ownership"
bus transaction.

We do not call __cmpxchg64() here because ebx/ecx values are not changed,
lowering register pressure for the compiler.

This saves 1008 bytes of code on vmlinux (CONFIG_PERF_COUNTERS=y)

Performance results (with a user mode program doing 10,000,000 atomic64_read()
calls on 2x4 CPUs, all fighting over the same atomic64_t location):

before:

Performance counter stats for './atomic64_slow':

62897.582084 task-clock-msecs # 7.670 CPUs
186 context-switches # 0.000 M/sec
4 CPU-migrations # 0.000 M/sec
132 page-faults # 0.000 M/sec
188218718344 cycles # 2992.463 M/sec
1017372948 instructions # 0.005 IPC
132425024 cache-references # 2.105 M/sec
109006338 cache-misses # 1.733 M/sec

8.200718697 seconds time elapsed

(2352 cycles per atomic64_read())

after:

Performance counter stats for './atomic64_fast':

10912.935654 task-clock-msecs # 6.663 CPUs
41 context-switches # 0.000 M/sec
4 CPU-migrations # 0.000 M/sec
132 page-faults # 0.000 M/sec
32657177400 cycles # 2992.520 M/sec
838338807 instructions # 0.026 IPC
41855076 cache-references # 3.835 M/sec
36402586 cache-misses # 3.336 M/sec

1.637767671 seconds time elapsed

(408 cycles per atomic64_read())

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
---
arch/x86/include/asm/atomic_32.h | 35 +++++++++--------------------
1 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index 2503d4e..14c9e9c 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -265,29 +265,10 @@ typedef struct {
#define __atomic64_read(ptr) ((ptr)->counter)

static inline unsigned long long
-cmpxchg8b(unsigned long long *ptr, unsigned long long old, unsigned long long new)
-{
- asm volatile(
-
- LOCK_PREFIX "cmpxchg8b (%[ptr])\n"
-
- : "=A" (old)
-
- : [ptr] "D" (ptr),
- "A" (old),
- "b" (ll_low(new)),
- "c" (ll_high(new))
-
- : "memory");
-
- return old;
-}
-
-static inline unsigned long long
atomic64_cmpxchg(atomic64_t *ptr, unsigned long long old_val,
unsigned long long new_val)
{
- return cmpxchg8b(&ptr->counter, old_val, new_val);
+ return __cmpxchg64(&ptr->counter, old_val, new_val);
}

/**
@@ -333,9 +314,17 @@ static inline unsigned long long atomic64_read(atomic64_t *ptr)
{
unsigned long long curr_val;

- do {
- curr_val = __atomic64_read(ptr);
- } while (atomic64_cmpxchg(ptr, curr_val, curr_val) != curr_val);
+ /*
+ * "+A" constraint to make sure gcc wont use eax or edx
+ * as a base or offset register for address computation
+ */
+ asm volatile(
+ "mov\t%%ebx, %%eax\n\t"
+ "mov\t%%ecx, %%edx\n\t"
+ LOCK_PREFIX "cmpxchg8b %1\n"
+ : "+A" (curr_val)
+ : "m" (*__xg(ptr))
+ );

return curr_val;
}

2009-07-03 12:27:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t


* Eric Dumazet <[email protected]> wrote:

> My last suggestion would be:
>
> static inline unsigned long long atomic64_read(const atomic64_t *ptr)
> {
> 	unsigned long long res;
>
> 	asm volatile(
> 		"mov %%ebx, %%eax\n\t"
> 		"mov %%ecx, %%edx\n\t"
> 		LOCK_PREFIX "cmpxchg8b %1\n"
> 		: "=A" (res)
> 		: "m" (*ptr)
> 	);
> 	return res;
> }
>
> Since ebx/ecx are only read and their values can be anything, they are
> not even mentioned in the asm constraints, so gcc is allowed to keep
> useful values in these registers.
>
> So the following (stupid) example
>
> 	for (i = 0; i < 10000000; i++) {
> 		res += atomic64_read(&myvar);
> 	}
>
> gives:
>
> 	xorl	%esi, %esi
> .L2:
> 	mov	%ebx, %eax
> 	mov	%ecx, %edx
> 	lock;cmpxchg8b myvar
> 	addl	%eax, %ecx
> 	adcl	%edx, %ebx
> 	addl	$1, %esi
> 	cmpl	$10000000, %esi
> 	jne	.L2

Ok, agreed. We don't want to inline it - cmpxchg8b is way too fat -
but your code above is a valid optimization for the out-of-line
variant as well. So I have applied it as such, and will post the whole
atomic64_t series soon, after some testing.

Ingo

2009-07-03 12:41:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: atomic64_t: _cmpxchg() & _read() optimizations


* Eric Dumazet <[email protected]> wrote:

> Seems cool, I am going to apply it and test it too.
>
> I cooked up the following patch, to address _cmpxchg() and _read().

OK, lemme push out the series; I already picked up many of your
suggestions. You should get them as tip-bot commit notifications in
this discussion, threaded around a single mail, with you Cc:-ed.

Ingo

2009-07-03 12:41:29

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t

Ingo Molnar wrote:
> * Eric Dumazet <[email protected]> wrote:
>
>> My last suggestion would be:
>>
>> static inline unsigned long long atomic64_read(const atomic64_t *ptr)
>> {
>> 	unsigned long long res;
>>
>> 	asm volatile(
>> 		"mov %%ebx, %%eax\n\t"
>> 		"mov %%ecx, %%edx\n\t"
>> 		LOCK_PREFIX "cmpxchg8b %1\n"
>> 		: "=A" (res)
>> 		: "m" (*ptr)
>> 	);
>> 	return res;
>> }
>>
>> Since ebx/ecx are only read and their values can be anything, they are
>> not even mentioned in the asm constraints, so gcc is allowed to keep
>> useful values in these registers.
>>
>> So the following (stupid) example
>>
>> 	for (i = 0; i < 10000000; i++) {
>> 		res += atomic64_read(&myvar);
>> 	}
>>
>> gives:
>>
>> 	xorl	%esi, %esi
>> .L2:
>> 	mov	%ebx, %eax
>> 	mov	%ecx, %edx
>> 	lock;cmpxchg8b myvar
>> 	addl	%eax, %ecx
>> 	adcl	%edx, %ebx
>> 	addl	$1, %esi
>> 	cmpl	$10000000, %esi
>> 	jne	.L2
>
> Ok, agreed. We don't want to inline it - cmpxchg8b is way too fat -
> but your code above is a valid optimization for the out-of-line
> variant as well. So I have applied it as such, and will post the whole
> atomic64_t series soon, after some testing.

I just changed the "=A" constraint to "+A", otherwise gcc could use %edx/%eax
in the "ptr" address computation. I discovered this after crashing my box :)
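
[ To spell out the constraint issue - an illustrative sketch, not from the
mail, with res and ptr as in the atomic64_read() above: with an output-only
"=A", gcc assumes %eax/%edx become live only when the asm writes them, so it
may use one of them as a base register when forming the address of the "m"
operand - and the two movs clobber it before cmpxchg8b dereferences the
address. "+A" marks the pair as inputs too, forbidding that reuse: ]

/* Buggy: gcc may compute the address of *ptr in %eax or %edx,
 * which the movs overwrite before the memory access. */
asm volatile(
	"mov %%ebx, %%eax\n\t"
	"mov %%ecx, %%edx\n\t"
	LOCK_PREFIX "cmpxchg8b %1\n"
	: "=A" (res)
	: "m" (*ptr)
);

/* Fixed: "+A" tells gcc %eax/%edx are live on entry, so they
 * cannot also be used in the address computation for %1. */
asm volatile(
	"mov %%ebx, %%eax\n\t"
	"mov %%ecx, %%edx\n\t"
	LOCK_PREFIX "cmpxchg8b %1\n"
	: "+A" (res)
	: "m" (*ptr)
);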

2009-07-03 12:44:19

by Eric Dumazet

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too

Commit-ID: bbf2a330d92c5afccfd17592ba9ccd50f41cf748
Gitweb: http://git.kernel.org/tip/bbf2a330d92c5afccfd17592ba9ccd50f41cf748
Author: Eric Dumazet <[email protected]>
AuthorDate: Fri, 3 Jul 2009 00:08:26 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 13:26:38 +0200

x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too

Locked instructions on two cache lines at once are painful. If
atomic64_t uses two cache lines, my test program is 10x slower.

The chance for that is significant: 4/32 or 12.5%.

Make sure an atomic64_t is 8 bytes aligned.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
[ changed it to __aligned(8) as per Andrew's suggestion ]
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/atomic_32.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index 2503d4e..ae0fbb5 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -250,7 +250,7 @@ static inline int atomic_add_unless(atomic_t *v, int a, int u)
/* An 64bit atomic type */

typedef struct {
- unsigned long long counter;
+ unsigned long long __aligned(8) counter;
} atomic64_t;

#define ATOMIC64_INIT(val) { (val) }

2009-07-03 12:44:33

by Ingo Molnar

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Improve atomic64_add_return()

Commit-ID: 824975ef190e7dcb77718d1cc2cb53769b16d918
Gitweb: http://git.kernel.org/tip/824975ef190e7dcb77718d1cc2cb53769b16d918
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 3 Jul 2009 12:39:07 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 13:26:42 +0200

x86: atomic64: Improve atomic64_add_return()

Linus noted (based on Eric Dumazet's numbers) that we would
probably be better off not trying an atomic_read() in
atomic64_add_return() but instead intentionally let the first
cmpxchg8b fail - to get a cache-friendly 'give me ownership
of this cacheline' transaction. That can then be followed
by the real cmpxchg8b which sets the value local to the CPU.

Reported-by: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/lib/atomic64_32.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index 5fc1e2c..6195962 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -76,13 +76,22 @@ u64 atomic64_read(atomic64_t *ptr)
*/
u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
{
- u64 old_val, new_val;
+ /*
+ * Try first with a (probably incorrect) assumption about
+ * what we have there. We'll do two loops most likely,
+ * but we'll get an ownership MESI transaction straight away
+ * instead of a read transaction followed by a
+ * flush-for-ownership transaction:
+ */
+ u64 old_val, new_val, real_val = 1ULL << 32;

do {
- old_val = atomic_read(ptr);
+ old_val = real_val;
new_val = old_val + delta;

- } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
+ real_val = atomic64_cmpxchg(ptr, old_val, new_val);
+
+ } while (real_val != old_val);

return new_val;
}

2009-07-03 12:44:46

by Ingo Molnar

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file

Commit-ID: b7882b7c65abb00194bdb3d4a22d27d70fcc59ba
Gitweb: http://git.kernel.org/tip/b7882b7c65abb00194bdb3d4a22d27d70fcc59ba
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 3 Jul 2009 13:26:39 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 13:26:39 +0200

x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file

Linus noted that the atomic64_t primitives are currently all
inlines, which is crazy because these functions have a large
register footprint anyway.

Move them to a separate file: arch/x86/lib/atomic64_32.c

Also, while at it, rename all uses of 'unsigned long long' to
the much shorter u64.

This makes the appearance of the prototypes a lot nicer - and
it also uncovered a few bugs where (yet unused) API variants
had 'long' as their return type instead of u64.

[ More intrusive changes are not yet done in this patch. ]

Reported-by: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/atomic_32.h | 137 ++++--------------------
arch/x86/lib/Makefile | 1 +
arch/x86/lib/atomic64_32.c | 216 ++++++++++++++++++++++++++++++++++++++
3 files changed, 237 insertions(+), 117 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index ae0fbb5..311a43e 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -250,7 +250,7 @@ static inline int atomic_add_unless(atomic_t *v, int a, int u)
/* An 64bit atomic type */

typedef struct {
- unsigned long long __aligned(8) counter;
+ u64 __aligned(8) counter;
} atomic64_t;

#define ATOMIC64_INIT(val) { (val) }
@@ -264,31 +264,7 @@ typedef struct {
*/
#define __atomic64_read(ptr) ((ptr)->counter)

-static inline unsigned long long
-cmpxchg8b(unsigned long long *ptr, unsigned long long old, unsigned long long new)
-{
- asm volatile(
-
- LOCK_PREFIX "cmpxchg8b (%[ptr])\n"
-
- : "=A" (old)
-
- : [ptr] "D" (ptr),
- "A" (old),
- "b" (ll_low(new)),
- "c" (ll_high(new))
-
- : "memory");
-
- return old;
-}
-
-static inline unsigned long long
-atomic64_cmpxchg(atomic64_t *ptr, unsigned long long old_val,
- unsigned long long new_val)
-{
- return cmpxchg8b(&ptr->counter, old_val, new_val);
-}
+extern u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old_val, u64 new_val);

/**
* atomic64_xchg - xchg atomic64 variable
@@ -298,18 +274,7 @@ atomic64_cmpxchg(atomic64_t *ptr, unsigned long long old_val,
* Atomically xchgs the value of @ptr to @new_val and returns
* the old value.
*/
-
-static inline unsigned long long
-atomic64_xchg(atomic64_t *ptr, unsigned long long new_val)
-{
- unsigned long long old_val;
-
- do {
- old_val = atomic_read(ptr);
- } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
-
- return old_val;
-}
+extern u64 atomic64_xchg(atomic64_t *ptr, u64 new_val);

/**
* atomic64_set - set atomic64 variable
@@ -318,10 +283,7 @@ atomic64_xchg(atomic64_t *ptr, unsigned long long new_val)
*
* Atomically sets the value of @ptr to @new_val.
*/
-static inline void atomic64_set(atomic64_t *ptr, unsigned long long new_val)
-{
- atomic64_xchg(ptr, new_val);
-}
+extern void atomic64_set(atomic64_t *ptr, u64 new_val);

/**
* atomic64_read - read atomic64 variable
@@ -329,16 +291,7 @@ static inline void atomic64_set(atomic64_t *ptr, unsigned long long new_val)
*
* Atomically reads the value of @ptr and returns it.
*/
-static inline unsigned long long atomic64_read(atomic64_t *ptr)
-{
- unsigned long long curr_val;
-
- do {
- curr_val = __atomic64_read(ptr);
- } while (atomic64_cmpxchg(ptr, curr_val, curr_val) != curr_val);
-
- return curr_val;
-}
+extern u64 atomic64_read(atomic64_t *ptr);

/**
* atomic64_add_return - add and return
@@ -347,34 +300,14 @@ static inline unsigned long long atomic64_read(atomic64_t *ptr)
*
* Atomically adds @delta to @ptr and returns @delta + *@ptr
*/
-static inline unsigned long long
-atomic64_add_return(unsigned long long delta, atomic64_t *ptr)
-{
- unsigned long long old_val, new_val;
-
- do {
- old_val = atomic_read(ptr);
- new_val = old_val + delta;
-
- } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
-
- return new_val;
-}
-
-static inline long atomic64_sub_return(unsigned long long delta, atomic64_t *ptr)
-{
- return atomic64_add_return(-delta, ptr);
-}
+extern u64 atomic64_add_return(u64 delta, atomic64_t *ptr);

-static inline long atomic64_inc_return(atomic64_t *ptr)
-{
- return atomic64_add_return(1, ptr);
-}
-
-static inline long atomic64_dec_return(atomic64_t *ptr)
-{
- return atomic64_sub_return(1, ptr);
-}
+/*
+ * Other variants with different arithmetic operators:
+ */
+extern u64 atomic64_sub_return(u64 delta, atomic64_t *ptr);
+extern u64 atomic64_inc_return(atomic64_t *ptr);
+extern u64 atomic64_dec_return(atomic64_t *ptr);

/**
* atomic64_add - add integer to atomic64 variable
@@ -383,10 +316,7 @@ static inline long atomic64_dec_return(atomic64_t *ptr)
*
* Atomically adds @delta to @ptr.
*/
-static inline void atomic64_add(unsigned long long delta, atomic64_t *ptr)
-{
- atomic64_add_return(delta, ptr);
-}
+extern void atomic64_add(u64 delta, atomic64_t *ptr);

/**
* atomic64_sub - subtract the atomic64 variable
@@ -395,10 +325,7 @@ static inline void atomic64_add(unsigned long long delta, atomic64_t *ptr)
*
* Atomically subtracts @delta from @ptr.
*/
-static inline void atomic64_sub(unsigned long long delta, atomic64_t *ptr)
-{
- atomic64_add(-delta, ptr);
-}
+extern void atomic64_sub(u64 delta, atomic64_t *ptr);

/**
* atomic64_sub_and_test - subtract value from variable and test result
@@ -409,13 +336,7 @@ static inline void atomic64_sub(unsigned long long delta, atomic64_t *ptr)
* true if the result is zero, or false for all
* other cases.
*/
-static inline int
-atomic64_sub_and_test(unsigned long long delta, atomic64_t *ptr)
-{
- unsigned long long old_val = atomic64_sub_return(delta, ptr);
-
- return old_val == 0;
-}
+extern int atomic64_sub_and_test(u64 delta, atomic64_t *ptr);

/**
* atomic64_inc - increment atomic64 variable
@@ -423,10 +344,7 @@ atomic64_sub_and_test(unsigned long long delta, atomic64_t *ptr)
*
* Atomically increments @ptr by 1.
*/
-static inline void atomic64_inc(atomic64_t *ptr)
-{
- atomic64_add(1, ptr);
-}
+extern void atomic64_inc(atomic64_t *ptr);

/**
* atomic64_dec - decrement atomic64 variable
@@ -434,10 +352,7 @@ static inline void atomic64_inc(atomic64_t *ptr)
*
* Atomically decrements @ptr by 1.
*/
-static inline void atomic64_dec(atomic64_t *ptr)
-{
- atomic64_sub(1, ptr);
-}
+extern void atomic64_dec(atomic64_t *ptr);

/**
* atomic64_dec_and_test - decrement and test
@@ -447,10 +362,7 @@ static inline void atomic64_dec(atomic64_t *ptr)
* returns true if the result is 0, or false for all other
* cases.
*/
-static inline int atomic64_dec_and_test(atomic64_t *ptr)
-{
- return atomic64_sub_and_test(1, ptr);
-}
+extern int atomic64_dec_and_test(atomic64_t *ptr);

/**
* atomic64_inc_and_test - increment and test
@@ -460,10 +372,7 @@ static inline int atomic64_dec_and_test(atomic64_t *ptr)
* and returns true if the result is zero, or false for all
* other cases.
*/
-static inline int atomic64_inc_and_test(atomic64_t *ptr)
-{
- return atomic64_sub_and_test(-1, ptr);
-}
+extern int atomic64_inc_and_test(atomic64_t *ptr);

/**
* atomic64_add_negative - add and test if negative
@@ -474,13 +383,7 @@ static inline int atomic64_inc_and_test(atomic64_t *ptr)
* if the result is negative, or false when
* result is greater than or equal to zero.
*/
-static inline int
-atomic64_add_negative(unsigned long long delta, atomic64_t *ptr)
-{
- long long old_val = atomic64_add_return(delta, ptr);
-
- return old_val < 0;
-}
+extern int atomic64_add_negative(u64 delta, atomic64_t *ptr);

#include <asm-generic/atomic-long.h>
#endif /* _ASM_X86_ATOMIC_32_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index f9d3563..c3c657c 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -10,6 +10,7 @@ lib-y += usercopy_$(BITS).o getuser.o putuser.o
lib-y += memcpy_$(BITS).o

ifeq ($(CONFIG_X86_32),y)
+ lib-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
lib-y += semaphore_32.o string_32.o
diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
new file mode 100644
index 0000000..d21e725
--- /dev/null
+++ b/arch/x86/lib/atomic64_32.c
@@ -0,0 +1,216 @@
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <asm/processor.h>
+#include <asm/cmpxchg.h>
+#include <asm/atomic.h>
+
+static inline u64 cmpxchg8b(u64 *ptr, u64 old, u64 new)
+{
+ asm volatile(
+
+ LOCK_PREFIX "cmpxchg8b (%[ptr])\n"
+
+ : "=A" (old)
+
+ : [ptr] "D" (ptr),
+ "A" (old),
+ "b" (ll_low(new)),
+ "c" (ll_high(new))
+
+ : "memory");
+
+ return old;
+}
+
+u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old_val, u64 new_val)
+{
+ return cmpxchg8b(&ptr->counter, old_val, new_val);
+}
+
+/**
+ * atomic64_xchg - xchg atomic64 variable
+ * @ptr: pointer to type atomic64_t
+ * @new_val: value to assign
+ *
+ * Atomically xchgs the value of @ptr to @new_val and returns
+ * the old value.
+ */
+
+u64 atomic64_xchg(atomic64_t *ptr, u64 new_val)
+{
+ u64 old_val;
+
+ do {
+ old_val = atomic_read(ptr);
+ } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
+
+ return old_val;
+}
+
+/**
+ * atomic64_set - set atomic64 variable
+ * @ptr: pointer to type atomic64_t
+ * @new_val: value to assign
+ *
+ * Atomically sets the value of @ptr to @new_val.
+ */
+void atomic64_set(atomic64_t *ptr, u64 new_val)
+{
+ atomic64_xchg(ptr, new_val);
+}
+
+/**
+ * atomic64_read - read atomic64 variable
+ * @ptr: pointer to type atomic64_t
+ *
+ * Atomically reads the value of @ptr and returns it.
+ */
+u64 atomic64_read(atomic64_t *ptr)
+{
+ u64 curr_val;
+
+ do {
+ curr_val = __atomic64_read(ptr);
+ } while (atomic64_cmpxchg(ptr, curr_val, curr_val) != curr_val);
+
+ return curr_val;
+}
+
+/**
+ * atomic64_add_return - add and return
+ * @delta: integer value to add
+ * @ptr: pointer to type atomic64_t
+ *
+ * Atomically adds @delta to @ptr and returns @delta + *@ptr
+ */
+u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
+{
+ u64 old_val, new_val;
+
+ do {
+ old_val = atomic_read(ptr);
+ new_val = old_val + delta;
+
+ } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
+
+ return new_val;
+}
+
+u64 atomic64_sub_return(u64 delta, atomic64_t *ptr)
+{
+ return atomic64_add_return(-delta, ptr);
+}
+
+u64 atomic64_inc_return(atomic64_t *ptr)
+{
+ return atomic64_add_return(1, ptr);
+}
+
+u64 atomic64_dec_return(atomic64_t *ptr)
+{
+ return atomic64_sub_return(1, ptr);
+}
+
+/**
+ * atomic64_add - add integer to atomic64 variable
+ * @delta: integer value to add
+ * @ptr: pointer to type atomic64_t
+ *
+ * Atomically adds @delta to @ptr.
+ */
+void atomic64_add(u64 delta, atomic64_t *ptr)
+{
+ atomic64_add_return(delta, ptr);
+}
+
+/**
+ * atomic64_sub - subtract the atomic64 variable
+ * @delta: integer value to subtract
+ * @ptr: pointer to type atomic64_t
+ *
+ * Atomically subtracts @delta from @ptr.
+ */
+void atomic64_sub(u64 delta, atomic64_t *ptr)
+{
+ atomic64_add(-delta, ptr);
+}
+
+/**
+ * atomic64_sub_and_test - subtract value from variable and test result
+ * @delta: integer value to subtract
+ * @ptr: pointer to type atomic64_t
+ *
+ * Atomically subtracts @delta from @ptr and returns
+ * true if the result is zero, or false for all
+ * other cases.
+ */
+int atomic64_sub_and_test(u64 delta, atomic64_t *ptr)
+{
+ u64 old_val = atomic64_sub_return(delta, ptr);
+
+ return old_val == 0;
+}
+
+/**
+ * atomic64_inc - increment atomic64 variable
+ * @ptr: pointer to type atomic64_t
+ *
+ * Atomically increments @ptr by 1.
+ */
+void atomic64_inc(atomic64_t *ptr)
+{
+ atomic64_add(1, ptr);
+}
+
+/**
+ * atomic64_dec - decrement atomic64 variable
+ * @ptr: pointer to type atomic64_t
+ *
+ * Atomically decrements @ptr by 1.
+ */
+void atomic64_dec(atomic64_t *ptr)
+{
+ atomic64_sub(1, ptr);
+}
+
+/**
+ * atomic64_dec_and_test - decrement and test
+ * @ptr: pointer to type atomic64_t
+ *
+ * Atomically decrements @ptr by 1 and
+ * returns true if the result is 0, or false for all other
+ * cases.
+ */
+int atomic64_dec_and_test(atomic64_t *ptr)
+{
+ return atomic64_sub_and_test(1, ptr);
+}
+
+/**
+ * atomic64_inc_and_test - increment and test
+ * @ptr: pointer to type atomic64_t
+ *
+ * Atomically increments @ptr by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */
+int atomic64_inc_and_test(atomic64_t *ptr)
+{
+ return atomic64_sub_and_test(-1, ptr);
+}
+
+/**
+ * atomic64_add_negative - add and test if negative
+ * @delta: integer value to add
+ * @ptr: pointer to type atomic64_t
+ *
+ * Atomically adds @delta to @ptr and returns true
+ * if the result is negative, or false when
+ * result is greater than or equal to zero.
+ */
+int atomic64_add_negative(u64 delta, atomic64_t *ptr)
+{
+ long long old_val = atomic64_add_return(delta, ptr);
+
+ return old_val < 0;
+}

2009-07-03 12:44:58

by Eric Dumazet

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Improve atomic64_read()

Commit-ID: aacf682fd8c66b57383c407eecd9d4a28264ee91
Gitweb: http://git.kernel.org/tip/aacf682fd8c66b57383c407eecd9d4a28264ee91
Author: Eric Dumazet <[email protected]>
AuthorDate: Fri, 3 Jul 2009 12:14:27 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 13:26:40 +0200

x86: atomic64: Improve atomic64_read()

Linus noticed that the 32-bit version of atomic64_read() was
being overly complex with re-reading the value and doing a
retry loop over that.

Instead we can just rely on cmpxchg8b returning either the new
value or the current value.

We can use any 'old' value, which will be faster as it can be
loaded via immediates. Using a value that is probably not equal
to the real value in memory makes the instruction even faster.

This also has the advantage that the CPU could avoid dirtying
the cacheline.

Reported-by: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/lib/atomic64_32.c | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index d21e725..afa5d44 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -67,13 +67,9 @@ void atomic64_set(atomic64_t *ptr, u64 new_val)
*/
u64 atomic64_read(atomic64_t *ptr)
{
- u64 curr_val;
+ u64 old = 1LL << 32;

- do {
- curr_val = __atomic64_read(ptr);
- } while (atomic64_cmpxchg(ptr, curr_val, curr_val) != curr_val);
-
- return curr_val;
+ return cmpxchg8b(&ptr->counter, old, old);
}

/**

2009-07-03 12:45:24

by Eric Dumazet

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Improve cmpxchg8b()

Commit-ID: 69237f94e65d3d7f539f1adb98ef68685c595004
Gitweb: http://git.kernel.org/tip/69237f94e65d3d7f539f1adb98ef68685c595004
Author: Eric Dumazet <[email protected]>
AuthorDate: Fri, 3 Jul 2009 13:26:41 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 13:26:41 +0200

x86: atomic64: Improve cmpxchg8b()

Rewrite cmpxchg8b() to not use %edi register but a generic "+m"
constraint, to increase compiler freedom in code generation and
possibly better code.

Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/lib/atomic64_32.c | 19 +++++++------------
1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index afa5d44..5fc1e2c 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -6,19 +6,14 @@

static inline u64 cmpxchg8b(u64 *ptr, u64 old, u64 new)
{
- asm volatile(
-
- LOCK_PREFIX "cmpxchg8b (%[ptr])\n"
-
- : "=A" (old)
-
- : [ptr] "D" (ptr),
- "A" (old),
- "b" (ll_low(new)),
- "c" (ll_high(new))
-
- : "memory");
+ u32 low = new;
+ u32 high = new >> 32;

+ asm volatile(
+ LOCK_PREFIX "cmpxchg8b %1\n"
+ : "+A" (old), "+m" (*ptr)
+ : "b" (low), "c" (high)
+ );
return old;
}

2009-07-03 12:45:46

by Ingo Molnar

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Reduce size of functions

Commit-ID: 3ac805d2afd3fa4a07cb5bcf352fd7fa83f28935
Gitweb: http://git.kernel.org/tip/3ac805d2afd3fa4a07cb5bcf352fd7fa83f28935
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 3 Jul 2009 12:51:19 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 13:26:43 +0200

x86: atomic64: Reduce size of functions

cmpxchg8b is a huge instruction in terms of register footprint;
we almost never want to inline it, not even within the same
code module.

GCC 4.3 still messes up for two functions, under-judging the
true cost of this instruction - so annotate two key functions
to reduce the bloat:

arch/x86/lib/atomic64_32.o:

text data bss dec hex filename
1763 0 0 1763 6e3 atomic64_32.o.before
435 0 0 435 1b3 atomic64_32.o.after

Cc: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/lib/atomic64_32.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index 6195962..a910238 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -4,7 +4,7 @@
#include <asm/cmpxchg.h>
#include <asm/atomic.h>

-static inline u64 cmpxchg8b(u64 *ptr, u64 old, u64 new)
+static noinline u64 cmpxchg8b(u64 *ptr, u64 old, u64 new)
{
u32 low = new;
u32 high = new >> 32;
@@ -74,7 +74,7 @@ u64 atomic64_read(atomic64_t *ptr)
*
* Atomically adds @delta to @ptr and returns @delta + *@ptr
*/
-u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
+noinline u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
{
/*
* Try first with a (probably incorrect) assumption about

2009-07-03 12:46:21

by Ingo Molnar

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Fix unclean type use in atomic64_xchg()

Commit-ID: 199e23780a7e75c63a9e3d1108804e3af450ea3e
Gitweb: http://git.kernel.org/tip/199e23780a7e75c63a9e3d1108804e3af450ea3e
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 3 Jul 2009 13:02:39 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 13:26:46 +0200

x86: atomic64: Fix unclean type use in atomic64_xchg()

Linus noticed that atomic64_xchg() uses atomic_read(), which
happens to work because atomic_read() is a macro so the
.counter value gets u64-read on 32-bit too - but this is really
bogus and serious bugs are waiting to happen.

Fix atomic64_xchg() to use __atomic64_read() instead.

No code changed:

arch/x86/lib/atomic64_32.o:

text data bss dec hex filename
435 0 0 435 1b3 atomic64_32.o.before
435 0 0 435 1b3 atomic64_32.o.after

md5:
bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.before.asm
bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.after.asm

Reported-by: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/lib/atomic64_32.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index a910238..fd28fd3 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -36,7 +36,7 @@ u64 atomic64_xchg(atomic64_t *ptr, u64 new_val)
u64 old_val;

do {
- old_val = atomic_read(ptr);
+ old_val = __atomic64_read(ptr);
} while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);

return old_val;

2009-07-03 12:46:47

by Eric Dumazet

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Improve atomic64_read()

Commit-ID: 9103cab145a93cef13378632b332a824758a2f6f
Gitweb: http://git.kernel.org/tip/9103cab145a93cef13378632b332a824758a2f6f
Author: Eric Dumazet <[email protected]>
AuthorDate: Fri, 3 Jul 2009 13:23:02 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 13:26:48 +0200

x86: atomic64: Improve atomic64_read()

Optimize atomic64_read() as a special open-coded
cmpxchg8b variant. This generates nicer code:

arch/x86/lib/atomic64_32.o:

text data bss dec hex filename
435 0 0 435 1b3 atomic64_32.o.before
431 0 0 431 1af atomic64_32.o.after

md5:
bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.before.asm
2bdfd4bd1f6b7b61b7fc127aef90ce3b atomic64_32.o.after.asm

Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/lib/atomic64_32.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index fd28fd3..079ac8e 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -62,9 +62,17 @@ void atomic64_set(atomic64_t *ptr, u64 new_val)
*/
u64 atomic64_read(atomic64_t *ptr)
{
- u64 old = 1LL << 32;
+ u64 res;

- return cmpxchg8b(&ptr->counter, old, old);
+ asm volatile(
+ "mov %%ebx, %%eax\n\t"
+ "mov %%ecx, %%edx\n\t"
+ LOCK_PREFIX "cmpxchg8b %1\n"
+ : "=A" (res)
+ : "m" (*ptr)
+ );
+
+ return res;
}

/**

2009-07-03 12:46:58

by Paul Mackerras

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP

Commit-ID: c6f91967a6015bbc270236aeb76a36771b195ebc
Gitweb: http://git.kernel.org/tip/c6f91967a6015bbc270236aeb76a36771b195ebc
Author: Paul Mackerras <[email protected]>
AuthorDate: Thu, 2 Jul 2009 08:57:12 +1000
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 14:32:21 +0200

x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP

Occasionally we get bugs where atomic_read or atomic_set are
used on atomic64_t variables or vice versa. These bugs don't
generate warnings on x86 because atomic_read and atomic_set are
coded as macros rather than C functions, so we don't get any
type-checking on their arguments; similarly for atomic64_read
and atomic64_set in 64-bit kernels.

This converts them to C functions so that the arguments are
type-checked and bugs like this will get caught more easily. It
also converts atomic_cmpxchg and atomic_xchg, and
atomic64_cmpxchg and atomic64_xchg on 64-bit, so we get
type-checking on their arguments too.

Compiling a typical 64-bit x86 config, this generates no new
warnings, and the vmlinux text is 86 bytes smaller.

Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/atomic_32.h | 16 +++++++++++--
arch/x86/include/asm/atomic_64.h | 42 ++++++++++++++++++++++++++++++-------
2 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index b551bb1..aa045de 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -31,7 +31,10 @@ static inline int atomic_read(const atomic_t *v)
*
* Atomically sets the value of @v to @i.
*/
-#define atomic_set(v, i) (((v)->counter) = (i))
+static inline void atomic_set(atomic_t *v, int i)
+{
+ v->counter = i;
+}

/**
* atomic_add - add integer to atomic variable
@@ -203,8 +206,15 @@ static inline int atomic_sub_return(int i, atomic_t *v)
return atomic_add_return(-i, v);
}

-#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
+static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline int atomic_xchg(atomic_t *v, int new)
+{
+ return xchg(&v->counter, new);
+}

/**
* atomic_add_unless - add unless the number is already a given value
diff --git a/arch/x86/include/asm/atomic_64.h b/arch/x86/include/asm/atomic_64.h
index 0d63602..d605dc2 100644
--- a/arch/x86/include/asm/atomic_64.h
+++ b/arch/x86/include/asm/atomic_64.h
@@ -18,7 +18,10 @@
*
* Atomically reads the value of @v.
*/
-#define atomic_read(v) ((v)->counter)
+static inline int atomic_read(const atomic_t *v)
+{
+ return v->counter;
+}

/**
* atomic_set - set atomic variable
@@ -27,7 +30,10 @@
*
* Atomically sets the value of @v to @i.
*/
-#define atomic_set(v, i) (((v)->counter) = (i))
+static inline void atomic_set(atomic_t *v, int i)
+{
+ v->counter = i;
+}

/**
* atomic_add - add integer to atomic variable
@@ -192,7 +198,10 @@ static inline int atomic_sub_return(int i, atomic_t *v)
* Atomically reads the value of @v.
* Doesn't imply a read memory barrier.
*/
-#define atomic64_read(v) ((v)->counter)
+static inline long atomic64_read(const atomic64_t *v)
+{
+ return v->counter;
+}

/**
* atomic64_set - set atomic64 variable
@@ -201,7 +210,10 @@ static inline int atomic_sub_return(int i, atomic_t *v)
*
* Atomically sets the value of @v to @i.
*/
-#define atomic64_set(v, i) (((v)->counter) = (i))
+static inline void atomic64_set(atomic64_t *v, long i)
+{
+ v->counter = i;
+}

/**
* atomic64_add - add integer to atomic64 variable
@@ -355,11 +367,25 @@ static inline long atomic64_sub_return(long i, atomic64_t *v)
#define atomic64_inc_return(v) (atomic64_add_return(1, (v)))
#define atomic64_dec_return(v) (atomic64_sub_return(1, (v)))

-#define atomic64_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
+static inline long atomic64_cmpxchg(atomic64_t *v, long old, long new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline long atomic64_xchg(atomic64_t *v, long new)
+{
+ return xchg(&v->counter, new);
+}

-#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
+static inline long atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline long atomic_xchg(atomic_t *v, int new)
+{
+ return xchg(&v->counter, new);
+}

/**
* atomic_add_unless - add unless the number is a given value

2009-07-03 12:49:55

by Paul Mackerras

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP

Commit-ID: 8e049ef054f1cc765f05f13e1396bb9a17c19e66
Gitweb: http://git.kernel.org/tip/8e049ef054f1cc765f05f13e1396bb9a17c19e66
Author: Paul Mackerras <[email protected]>
AuthorDate: Thu, 2 Jul 2009 08:57:12 +1000
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 14:42:39 +0200

x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP

Occasionally we get bugs where atomic_read or atomic_set are
used on atomic64_t variables or vice versa. These bugs don't
generate warnings on x86 because atomic_read and atomic_set are
coded as macros rather than C functions, so we don't get any
type-checking on their arguments; similarly for atomic64_read
and atomic64_set in 64-bit kernels.

This converts them to C functions so that the arguments are
type-checked and bugs like this will get caught more easily. It
also converts atomic_cmpxchg and atomic_xchg, and
atomic64_cmpxchg and atomic64_xchg on 64-bit, so we get
type-checking on their arguments too.

Compiling a typical 64-bit x86 config, this generates no new
warnings, and the vmlinux text is 86 bytes smaller.

Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/atomic_32.h | 16 +++++++++++--
arch/x86/include/asm/atomic_64.h | 42 ++++++++++++++++++++++++++++++-------
2 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index b551bb1..aa045de 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -31,7 +31,10 @@ static inline int atomic_read(const atomic_t *v)
*
* Atomically sets the value of @v to @i.
*/
-#define atomic_set(v, i) (((v)->counter) = (i))
+static inline void atomic_set(atomic_t *v, int i)
+{
+ v->counter = i;
+}

/**
* atomic_add - add integer to atomic variable
@@ -203,8 +206,15 @@ static inline int atomic_sub_return(int i, atomic_t *v)
return atomic_add_return(-i, v);
}

-#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
+static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline int atomic_xchg(atomic_t *v, int new)
+{
+ return xchg(&v->counter, new);
+}

/**
* atomic_add_unless - add unless the number is already a given value
diff --git a/arch/x86/include/asm/atomic_64.h b/arch/x86/include/asm/atomic_64.h
index 0d63602..d605dc2 100644
--- a/arch/x86/include/asm/atomic_64.h
+++ b/arch/x86/include/asm/atomic_64.h
@@ -18,7 +18,10 @@
*
* Atomically reads the value of @v.
*/
-#define atomic_read(v) ((v)->counter)
+static inline int atomic_read(const atomic_t *v)
+{
+ return v->counter;
+}

/**
* atomic_set - set atomic variable
@@ -27,7 +30,10 @@
*
* Atomically sets the value of @v to @i.
*/
-#define atomic_set(v, i) (((v)->counter) = (i))
+static inline void atomic_set(atomic_t *v, int i)
+{
+ v->counter = i;
+}

/**
* atomic_add - add integer to atomic variable
@@ -192,7 +198,10 @@ static inline int atomic_sub_return(int i, atomic_t *v)
* Atomically reads the value of @v.
* Doesn't imply a read memory barrier.
*/
-#define atomic64_read(v) ((v)->counter)
+static inline long atomic64_read(const atomic64_t *v)
+{
+ return v->counter;
+}

/**
* atomic64_set - set atomic64 variable
@@ -201,7 +210,10 @@ static inline int atomic_sub_return(int i, atomic_t *v)
*
* Atomically sets the value of @v to @i.
*/
-#define atomic64_set(v, i) (((v)->counter) = (i))
+static inline void atomic64_set(atomic64_t *v, long i)
+{
+ v->counter = i;
+}

/**
* atomic64_add - add integer to atomic64 variable
@@ -355,11 +367,25 @@ static inline long atomic64_sub_return(long i, atomic64_t *v)
#define atomic64_inc_return(v) (atomic64_add_return(1, (v)))
#define atomic64_dec_return(v) (atomic64_sub_return(1, (v)))

-#define atomic64_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
+static inline long atomic64_cmpxchg(atomic64_t *v, long old, long new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline long atomic64_xchg(atomic64_t *v, long new)
+{
+ return xchg(&v->counter, new);
+}

-#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
+static inline long atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+ return cmpxchg(&v->counter, old, new);
+}
+
+static inline long atomic_xchg(atomic_t *v, int new)
+{
+ return xchg(&v->counter, new);
+}

/**
* atomic_add_unless - add unless the number is a given value

2009-07-03 12:50:14

by Eric Dumazet

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Improve atomic64_read()

Commit-ID: 67d7178f8fc64b7f68d7dd8a1b21dfa0d42c220c
Gitweb: http://git.kernel.org/tip/67d7178f8fc64b7f68d7dd8a1b21dfa0d42c220c
Author: Eric Dumazet <[email protected]>
AuthorDate: Fri, 3 Jul 2009 13:23:02 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 14:42:59 +0200

x86: atomic64: Improve atomic64_read()

Optimize atomic64_read() as a special open-coded
cmpxchg8b variant. This generates nicer code:

arch/x86/lib/atomic64_32.o:

text data bss dec hex filename
435 0 0 435 1b3 atomic64_32.o.before
431 0 0 431 1af atomic64_32.o.after

md5:
bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.before.asm
2bdfd4bd1f6b7b61b7fc127aef90ce3b atomic64_32.o.after.asm

Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/lib/atomic64_32.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index fd28fd3..cd11803 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -62,9 +62,17 @@ void atomic64_set(atomic64_t *ptr, u64 new_val)
*/
u64 atomic64_read(atomic64_t *ptr)
{
- u64 old = 1LL << 32;
+ u64 res;

- return cmpxchg8b(&ptr->counter, old, old);
+ asm volatile(
+ "mov %%ebx, %%eax\n\t"
+ "mov %%ecx, %%edx\n\t"
+ LOCK_PREFIX "cmpxchg8b %1\n"
+ : "+A" (res)
+ : "m" (*ptr)
+ );
+
+ return res;
}

/**

2009-07-03 14:50:29

by Eric Dumazet

[permalink] [raw]
Subject: [PATCH -tip] x86: atomic64: inline atomic64_read()

Now that atomic64_read() is lightweight (no register pressure and
a small icache footprint), we can inline it again.

Also use the "=&A" constraint instead of "+A" to avoid a warning
about the uninitialized 'res' variable. (gcc had to force 0 in eax/edx)


$ size vmlinux.prev vmlinux.after
text data bss dec hex filename
4908667 451676 1684868 7045211 6b805b vmlinux.prev
4908651 451676 1684868 7045195 6b804b vmlinux.after

Signed-off-by: Eric Dumazet <[email protected]>
---
arch/x86/include/asm/atomic_32.h | 15 +++++++++++++++
arch/x86/lib/atomic64_32.c | 21 ---------------------
2 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index aa045de..0332041 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -304,6 +304,21 @@ extern void atomic64_set(atomic64_t *ptr, u64 new_val);
*
* Atomically reads the value of @ptr and returns it.
*/
+static inline u64 atomic64_read(atomic64_t *ptr)
+{
+ u64 res;
+
+ asm volatile(
+ "mov %%ebx, %%eax\n\t"
+ "mov %%ecx, %%edx\n\t"
+ LOCK_PREFIX "cmpxchg8b %1\n"
+ : "=&A" (res)
+ : "m" (*ptr)
+ );
+
+ return res;
+}
+
extern u64 atomic64_read(atomic64_t *ptr);

/**
diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index cd11803..cb51226 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -55,27 +55,6 @@ void atomic64_set(atomic64_t *ptr, u64 new_val)
}

/**
- * atomic64_read - read atomic64 variable
- * @ptr: pointer to type atomic64_t
- *
- * Atomically reads the value of @ptr and returns it.
- */
-u64 atomic64_read(atomic64_t *ptr)
-{
- u64 res;
-
- asm volatile(
- "mov %%ebx, %%eax\n\t"
- "mov %%ecx, %%edx\n\t"
- LOCK_PREFIX "cmpxchg8b %1\n"
- : "+A" (res)
- : "m" (*ptr)
- );
-
- return res;
-}
-
-/**
* atomic64_add_return - add and return
* @delta: integer value to add
* @ptr: pointer to type atomic64_t

2009-07-03 15:34:36

by Ingo Molnar

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Export APIs to modules

Commit-ID: 05118ab8859492ac9ddda0154cf90e37b0a4a0b0
Gitweb: http://git.kernel.org/tip/05118ab8859492ac9ddda0154cf90e37b0a4a0b0
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 3 Jul 2009 17:28:57 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 17:28:57 +0200

x86: atomic64: Export APIs to modules

atomic64_t primitives are used by a handful of drivers,
so export the APIs consistently. These were inlined
before.

Cc: Eric Dumazet <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/lib/atomic64_32.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index fd28fd3..cd11803 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -62,9 +62,17 @@ void atomic64_set(atomic64_t *ptr, u64 new_val)
*/
u64 atomic64_read(atomic64_t *ptr)
{
- u64 old = 1LL << 32;
+ u64 res;

- return cmpxchg8b(&ptr->counter, old, old);
+ asm volatile(
+ "mov %%ebx, %%eax\n\t"
+ "mov %%ecx, %%edx\n\t"
+ LOCK_PREFIX "cmpxchg8b %1\n"
+ : "+A" (res)
+ : "m" (*ptr)
+ );
+
+ return res;
}

/**


---
arch/x86/lib/atomic64_32.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index cd11803..aab898c 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -21,6 +21,7 @@ u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old_val, u64 new_val)
{
return cmpxchg8b(&ptr->counter, old_val, new_val);
}
+EXPORT_SYMBOL(atomic64_cmpxchg);

/**
* atomic64_xchg - xchg atomic64 variable
@@ -41,6 +42,7 @@ u64 atomic64_xchg(atomic64_t *ptr, u64 new_val)

return old_val;
}
+EXPORT_SYMBOL(atomic64_xchg);

/**
* atomic64_set - set atomic64 variable
@@ -53,6 +55,7 @@ void atomic64_set(atomic64_t *ptr, u64 new_val)
{
atomic64_xchg(ptr, new_val);
}
+EXPORT_SYMBOL(atomic64_read);

/**
* atomic64_read - read atomic64 variable
@@ -74,6 +77,7 @@ u64 atomic64_read(atomic64_t *ptr)

return res;
}
+EXPORT_SYMBOL(atomic64_read);

/**
* atomic64_add_return - add and return
@@ -103,21 +107,25 @@ noinline u64 atomic64_add_return(u64 delta, atomic64_t *ptr)

return new_val;
}
+EXPORT_SYMBOL(atomic64_add_return);

u64 atomic64_sub_return(u64 delta, atomic64_t *ptr)
{
return atomic64_add_return(-delta, ptr);
}
+EXPORT_SYMBOL(atomic64_sub_return);

u64 atomic64_inc_return(atomic64_t *ptr)
{
return atomic64_add_return(1, ptr);
}
+EXPORT_SYMBOL(atomic64_inc_return);

u64 atomic64_dec_return(atomic64_t *ptr)
{
return atomic64_sub_return(1, ptr);
}
+EXPORT_SYMBOL(atomic64_dec_return);

/**
* atomic64_add - add integer to atomic64 variable
@@ -130,6 +138,7 @@ void atomic64_add(u64 delta, atomic64_t *ptr)
{
atomic64_add_return(delta, ptr);
}
+EXPORT_SYMBOL(atomic64_add);

/**
* atomic64_sub - subtract the atomic64 variable
@@ -142,6 +151,7 @@ void atomic64_sub(u64 delta, atomic64_t *ptr)
{
atomic64_add(-delta, ptr);
}
+EXPORT_SYMBOL(atomic64_sub);

/**
* atomic64_sub_and_test - subtract value from variable and test result
@@ -158,6 +168,7 @@ int atomic64_sub_and_test(u64 delta, atomic64_t *ptr)

return old_val == 0;
}
+EXPORT_SYMBOL(atomic64_sub_and_test);

/**
* atomic64_inc - increment atomic64 variable
@@ -169,6 +180,7 @@ void atomic64_inc(atomic64_t *ptr)
{
atomic64_add(1, ptr);
}
+EXPORT_SYMBOL(atomic64_inc);

/**
* atomic64_dec - decrement atomic64 variable
@@ -180,6 +192,7 @@ void atomic64_dec(atomic64_t *ptr)
{
atomic64_sub(1, ptr);
}
+EXPORT_SYMBOL(atomic64_dec);

/**
* atomic64_dec_and_test - decrement and test
@@ -193,6 +206,7 @@ int atomic64_dec_and_test(atomic64_t *ptr)
{
return atomic64_sub_and_test(1, ptr);
}
+EXPORT_SYMBOL(atomic64_dec_and_test);

/**
* atomic64_inc_and_test - increment and test
@@ -206,6 +220,7 @@ int atomic64_inc_and_test(atomic64_t *ptr)
{
return atomic64_sub_and_test(-1, ptr);
}
+EXPORT_SYMBOL(atomic64_inc_and_test);

/**
* atomic64_add_negative - add and test if negative
@@ -222,3 +237,4 @@ int atomic64_add_negative(u64 delta, atomic64_t *ptr)

return old_val < 0;
}
+EXPORT_SYMBOL(atomic64_add_negative);

2009-07-03 16:48:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: [tip:perfcounters/urgent] x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file



On Fri, 3 Jul 2009, tip-bot for Ingo Molnar wrote:
> +int atomic64_add_negative(u64 delta, atomic64_t *ptr)
> +{
> + long long old_val = atomic64_add_return(delta, ptr);
> +
> + return old_val < 0;
> +}

Can we please fix this horribly mis-named 'old_val' variable?

It's not 'old_val'. It should be 'new_val' or 'result'.

As it is, the above looks very wrong, and made me think that you had done
the wrong semantics (ie "xadd" like semantics that literally return the
pre-add 'old' value).

But on closer inspection, it looks like the code is correct, but the
naming is just totally wrong.

Linus

2009-07-03 16:59:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: [tip:perfcounters/urgent] x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too



On Fri, 3 Jul 2009, tip-bot for Eric Dumazet wrote:
>
> x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
>
> Locked instructions on two cache lines at once are painful. If
> atomic64_t uses two cache lines, my test program is 10x slower.
>
> The chance for that is significant: 4/32 or 12.5%.

Btw, the comments here are not strictly correct.

It's not necessarily even about "two cachelines". It's true that crossing
cachelines is extra painful, but from a CPU core angle, there's another
access width that matters almost as much, namely the width of the bus
between the core and the L1 cache. If it's not aligned to that, the core
needs to do each 8-byte read/write as two accesses, even if it's to the
same cacheline, and that complicates things.

The cacheline itself is generally larger than the cache access width. I
could easily see a 64B cacheline, but a 256b (32B) bus between the cache
and the core.

Making the atomics be naturally aligned means that you never cross either
one, of course.
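
[ Editor's note: for illustration, a minimal sketch of how natural
8-byte alignment can be requested on a 32-bit build. It assumes GCC
attribute syntax; the names are hypothetical, not the kernel's: ]

typedef struct {
	u64 counter;
} __attribute__((aligned(8))) atomic64_aligned_sketch_t;

static inline int atomic64_sketch_is_aligned(const atomic64_aligned_sketch_t *p)
{
	/* Naturally aligned: the low three address bits are zero, so an
	 * 8-byte access never straddles a cacheline or the narrower
	 * core-to-L1 access width discussed above. */
	return ((unsigned long)p & 7) == 0;
}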

Linus

2009-07-03 17:03:14

by Linus Torvalds

[permalink] [raw]
Subject: Re: [tip:perfcounters/urgent] x86: atomic64: Fix unclean type use in atomic64_xchg()



On Fri, 3 Jul 2009, tip-bot for Ingo Molnar wrote:
>
> Fix atomic64_xchg() to use __atomic64_read() instead.

Hmm. The whole __atomic64_read() thing should be dropped. Are there any
users? None of them should be valid, as Eric's numbers showed. It's
apparently better to start out with a random value rather than actually
trying to read it.

Linus

2009-07-03 17:38:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch] x86: atomic64_t: Improve atomic64_add_return()



On Fri, 3 Jul 2009, Ingo Molnar wrote:
> u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
> {
> + /*
> + * Try first with a (probably incorrect) assumption about
> + * what we have there. We'll do two loops most likely,
> + * but we'll get an ownership MESI transaction straight away
> + * instead of a read transaction followed by a
> + * flush-for-ownership transaction:
> + */
> + u64 old_val, new_val, real_val = 1ULL << 32;
>
> do {
> + old_val = real_val;
> new_val = old_val + delta;
>
> + real_val = atomic64_cmpxchg(ptr, old_val, new_val);
> +
> + } while (real_val != old_val);

For the case where we actually have to keep looping until we succeed,
we're probably better off starting with a good guess rather than a bad
one.

So I'd suggest using 'real_val = 0', which gives smaller code, rather than
the 1ull << 32 that is actively trying to be a bad guess.

It won't matter all that much (since 0 really isn't a much better guess
than 1ull<<32), but I'd rather have a simple constant that doesn't matter,
over an odd constant that makes no sense.

Linus

2009-07-03 18:01:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:perfcounters/urgent] x86: atomic64: Fix unclean type use in atomic64_xchg()


* Linus Torvalds <[email protected]> wrote:

> On Fri, 3 Jul 2009, tip-bot for Ingo Molnar wrote:
> >
> > Fix atomic64_xchg() to use __atomic64_read() instead.
>
> Hmm. The whole __atomic64_read() thing should be dropped. Are
> there any users? None of them should be valid, as Eric's numbers
> showed. It's apparently better to start out with a random value
> rather than actually trying to read it.

ah, yes.

We still have this use:

u64 atomic64_xchg(atomic64_t *ptr, u64 new_val)
{
u64 old_val;

do {
old_val = __atomic64_read(ptr);
} while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);

return old_val;
}
EXPORT_SYMBOL(atomic64_xchg);

I'm testing the commit below which improves this loop and also
removes __atomic64_read().

Ingo

---------------->
From 2f4f497dfb708daa52392db98b4bb3d6378be3d3 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Fri, 3 Jul 2009 19:56:36 +0200
Subject: [PATCH] x86: atomic64: Improve atomic64_xchg()

Remove the read-first logic from atomic64_xchg() and simplify
the loop.

This function was the last user of __atomic64_read() - remove it.

Also, change the 'real_val' assumption from the somewhat quirky
1ULL << 32 value to the (just as arbitrary, but simpler) value
of 0.

Reported-by: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/atomic_32.h | 9 ---------
arch/x86/lib/atomic64_32.c | 21 +++++++++++++++------
2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index aa045de..d7c8849 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -268,15 +268,6 @@ typedef struct {

#define ATOMIC64_INIT(val) { (val) }

-/**
- * atomic64_read - read atomic64 variable
- * @ptr: pointer of type atomic64_t
- *
- * Atomically reads the value of @v.
- * Doesn't imply a read memory barrier.
- */
-#define __atomic64_read(ptr) ((ptr)->counter)
-
extern u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old_val, u64 new_val);

/**
diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index aab898c..0fac67b 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -31,14 +31,23 @@ EXPORT_SYMBOL(atomic64_cmpxchg);
* Atomically xchgs the value of @ptr to @new_val and returns
* the old value.
*/
-
u64 atomic64_xchg(atomic64_t *ptr, u64 new_val)
{
- u64 old_val;
+ /*
+ * Try first with a (possibly incorrect) assumption about
+ * what we have there. We'll do two loops most likely,
+ * but we'll get an ownership MESI transaction straight away
+ * instead of a read transaction followed by a
+ * flush-for-ownership transaction:
+ */
+ u64 old_val, real_val = 0;

do {
- old_val = __atomic64_read(ptr);
- } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
+ old_val = real_val;
+
+ real_val = atomic64_cmpxchg(ptr, old_val, new_val);
+
+ } while (real_val != old_val);

return old_val;
}
@@ -89,13 +98,13 @@ EXPORT_SYMBOL(atomic64_read);
noinline u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
{
/*
- * Try first with a (probably incorrect) assumption about
+ * Try first with a (possibly incorrect) assumption about
* what we have there. We'll do two loops most likely,
* but we'll get an ownership MESI transaction straight away
* instead of a read transaction followed by a
* flush-for-ownership transaction:
*/
- u64 old_val, new_val, real_val = 1ULL << 32;
+ u64 old_val, new_val, real_val = 0;

do {
old_val = real_val;

2009-07-03 18:05:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip] x86: atomic64: inline atomic64_read()


* Eric Dumazet <[email protected]> wrote:

> Now that atomic64_read() is lightweight (no register pressure and
> a small icache footprint), we can inline it again.
>
> Also use the "=&A" constraint instead of "+A" to avoid a warning
> about the uninitialized 'res' variable. (gcc had to force 0 in eax/edx)
>
>
> $ size vmlinux.prev vmlinux.after
> text data bss dec hex filename
> 4908667 451676 1684868 7045211 6b805b vmlinux.prev
> 4908651 451676 1684868 7045195 6b804b vmlinux.after

ok, but:

> +static inline u64 atomic64_read(atomic64_t *ptr)
> +{
> + u64 res;
> +
> + asm volatile(
> + "mov %%ebx, %%eax\n\t"
> + "mov %%ecx, %%edx\n\t"
> + LOCK_PREFIX "cmpxchg8b %1\n"
> + : "=&A" (res)
> + : "m" (*ptr)
> + );

That clobbers or affects eax, ebx, ecx, edx. The only registers it
does not clobber are esi, edi and ebp.

So it's not quite lightweight. But your patch reduced the kernel
image size by 16 bytes so i guess it counts as an improvement - and
we also avoid function calls.

Linus, what's your preference in this case?

Ingo

2009-07-03 18:09:42

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH -tip] x86: atomic64: inline atomic64_read()

On Fri, 3 Jul 2009 20:04:50 +0200
Ingo Molnar <[email protected]> wrote:

> > +static inline u64 atomic64_read(atomic64_t *ptr)
> > +{
> > + u64 res;
> > +
> > + asm volatile(
> > + "mov %%ebx, %%eax\n\t"
> > + "mov %%ecx, %%edx\n\t"
> > + LOCK_PREFIX "cmpxchg8b %1\n"

also afaik cmpxchg implies the lock prefix, so no need to make an
explicit one... it just adds code size and alternatives-work

2009-07-03 18:18:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip] x86: atomic64: inline atomic64_read()


* Arjan van de Ven <[email protected]> wrote:

> On Fri, 3 Jul 2009 20:04:50 +0200
> Ingo Molnar <[email protected]> wrote:
>
> > > +static inline u64 atomic64_read(atomic64_t *ptr)
> > > +{
> > > + u64 res;
> > > +
> > > + asm volatile(
> > > + "mov %%ebx, %%eax\n\t"
> > > + "mov %%ecx, %%edx\n\t"
> > > + LOCK_PREFIX "cmpxchg8b %1\n"
>
> also afaik cmpxchg implies the lock prefix, so no need to make an
> explicit one... it just adds code size and alternatives-work

good point!

Is the same true of XADD as well? If yes then we can remove the lock
prefix from those as well - there are a few uses of that on 32-bit and
64-bit as well.

Ingo

2009-07-03 18:24:27

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH -tip] x86: atomic64: inline atomic64_read()

Arjan van de Ven <[email protected]> writes:

> On Fri, 3 Jul 2009 20:04:50 +0200
> Ingo Molnar <[email protected]> wrote:
>
>> > +static inline u64 atomic64_read(atomic64_t *ptr)
>> > +{
>> > + u64 res;
>> > +
>> > + asm volatile(
>> > + "mov %%ebx, %%eax\n\t"
>> > + "mov %%ecx, %%edx\n\t"
>> > + LOCK_PREFIX "cmpxchg8b %1\n"
>
> also afaik cmpxchg implies the lock prefix, so no need to make an
> explicit one... it just adds code size and alternatives-work

No it doesn't; LOCK is implied for "xchg", but not for cmpxchg.

-Andi

--
[email protected] -- Speaking for myself only.

2009-07-03 18:25:49

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH -tip] x86: atomic64: inline atomic64_read()

Ingo Molnar <[email protected]> writes:
>
> Is the same true of XADD as well? If yes then we can remove the lock
> prefix from those as well - there are a few uses of that on 32-bit and
> 64-bit as well.

XADD also has no implicit LOCK. AFAIK the implicit semantics are
only true for XCHG.
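
[ Editor's note: a hedged sketch of the point above - on SMP, XADD is
atomic only with an explicit LOCK prefix. The function name is made
up, but the pattern follows the usual 32-bit atomic_add_return(): ]

static inline int atomic_add_return_sketch(int i, atomic_t *v)
{
	int ret = i;

	/* The "lock" must be spelled out; xadd alone is not SMP-atomic. */
	asm volatile("lock; xaddl %0, %1"
		     : "+r" (ret), "+m" (v->counter)
		     : : "memory");

	/* xadd leaves the old value in ret; add i for the new value. */
	return ret + i;
}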

-Andi

--
[email protected] -- Speaking for myself only.

2009-07-03 18:30:29

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH -tip] x86: atomic64: inline atomic64_read()

On Fri, 3 Jul 2009 20:18:15 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Arjan van de Ven <[email protected]> wrote:
>
> > On Fri, 3 Jul 2009 20:04:50 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > > > +static inline u64 atomic64_read(atomic64_t *ptr)
> > > > +{
> > > > + u64 res;
> > > > +
> > > > + asm volatile(
> > > > + "mov %%ebx, %%eax\n\t"
> > > > + "mov %%ecx, %%edx\n\t"
> > > > + LOCK_PREFIX "cmpxchg8b %1\n"
> >
> > also afaik cmpxchg implies the lock prefix, so no need to make an
> > explicit one... it just adds code size and alternatives-work
>
> good point!
>
> Is the same true of XADD as well? If yes then we can remove the lock
> > prefix from those as well - there are a few uses of that on 32-bit and
> 64-bit as well.
>

ok never mind I was wrong,
it seems cmpxchg does NOT imply a lock prefix, nor does xadd.

> Ingo


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-07-03 18:31:30

by Ingo Molnar

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Export APIs to modules

Commit-ID: 1fde902d52ee13ab9fab155bbae757fdf7daf0c1
Gitweb: http://git.kernel.org/tip/1fde902d52ee13ab9fab155bbae757fdf7daf0c1
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 3 Jul 2009 17:28:57 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 20:23:52 +0200

x86: atomic64: Export APIs to modules

atomic64_t primitives are used by a handful of drivers,
so export the APIs consistently. These were inlined
before.

Also mark atomic64_32.o a core object, so that the symbols
are available even if not linked to core kernel pieces.

Cc: Eric Dumazet <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/atomic64_32.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index c3c657c..07c3189 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -10,7 +10,7 @@ lib-y += usercopy_$(BITS).o getuser.o putuser.o
lib-y += memcpy_$(BITS).o

ifeq ($(CONFIG_X86_32),y)
- lib-y += atomic64_32.o
+ obj-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
lib-y += semaphore_32.o string_32.o
diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index cd11803..6722a09 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -1,5 +1,7 @@
#include <linux/compiler.h>
+#include <linux/module.h>
#include <linux/types.h>
+
#include <asm/processor.h>
#include <asm/cmpxchg.h>
#include <asm/atomic.h>
@@ -21,6 +23,7 @@ u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old_val, u64 new_val)
{
return cmpxchg8b(&ptr->counter, old_val, new_val);
}
+EXPORT_SYMBOL(atomic64_cmpxchg);

/**
* atomic64_xchg - xchg atomic64 variable
@@ -41,6 +44,7 @@ u64 atomic64_xchg(atomic64_t *ptr, u64 new_val)

return old_val;
}
+EXPORT_SYMBOL(atomic64_xchg);

/**
* atomic64_set - set atomic64 variable
@@ -53,6 +57,7 @@ void atomic64_set(atomic64_t *ptr, u64 new_val)
{
atomic64_xchg(ptr, new_val);
}
+EXPORT_SYMBOL(atomic64_read);

/**
* atomic64_read - read atomic64 variable
@@ -74,6 +79,7 @@ u64 atomic64_read(atomic64_t *ptr)

return res;
}
+EXPORT_SYMBOL(atomic64_read);

/**
* atomic64_add_return - add and return
@@ -103,21 +109,25 @@ noinline u64 atomic64_add_return(u64 delta, atomic64_t *ptr)

return new_val;
}
+EXPORT_SYMBOL(atomic64_add_return);

u64 atomic64_sub_return(u64 delta, atomic64_t *ptr)
{
return atomic64_add_return(-delta, ptr);
}
+EXPORT_SYMBOL(atomic64_sub_return);

u64 atomic64_inc_return(atomic64_t *ptr)
{
return atomic64_add_return(1, ptr);
}
+EXPORT_SYMBOL(atomic64_inc_return);

u64 atomic64_dec_return(atomic64_t *ptr)
{
return atomic64_sub_return(1, ptr);
}
+EXPORT_SYMBOL(atomic64_dec_return);

/**
* atomic64_add - add integer to atomic64 variable
@@ -130,6 +140,7 @@ void atomic64_add(u64 delta, atomic64_t *ptr)
{
atomic64_add_return(delta, ptr);
}
+EXPORT_SYMBOL(atomic64_add);

/**
* atomic64_sub - subtract the atomic64 variable
@@ -142,6 +153,7 @@ void atomic64_sub(u64 delta, atomic64_t *ptr)
{
atomic64_add(-delta, ptr);
}
+EXPORT_SYMBOL(atomic64_sub);

/**
* atomic64_sub_and_test - subtract value from variable and test result
@@ -158,6 +170,7 @@ int atomic64_sub_and_test(u64 delta, atomic64_t *ptr)

return old_val == 0;
}
+EXPORT_SYMBOL(atomic64_sub_and_test);

/**
* atomic64_inc - increment atomic64 variable
@@ -169,6 +182,7 @@ void atomic64_inc(atomic64_t *ptr)
{
atomic64_add(1, ptr);
}
+EXPORT_SYMBOL(atomic64_inc);

/**
* atomic64_dec - decrement atomic64 variable
@@ -180,6 +194,7 @@ void atomic64_dec(atomic64_t *ptr)
{
atomic64_sub(1, ptr);
}
+EXPORT_SYMBOL(atomic64_dec);

/**
* atomic64_dec_and_test - decrement and test
@@ -193,6 +208,7 @@ int atomic64_dec_and_test(atomic64_t *ptr)
{
return atomic64_sub_and_test(1, ptr);
}
+EXPORT_SYMBOL(atomic64_dec_and_test);

/**
* atomic64_inc_and_test - increment and test
@@ -206,6 +222,7 @@ int atomic64_inc_and_test(atomic64_t *ptr)
{
return atomic64_sub_and_test(-1, ptr);
}
+EXPORT_SYMBOL(atomic64_inc_and_test);

/**
* atomic64_add_negative - add and test if negative
@@ -222,3 +239,4 @@ int atomic64_add_negative(u64 delta, atomic64_t *ptr)

return old_val < 0;
}
+EXPORT_SYMBOL(atomic64_add_negative);

2009-07-03 18:31:42

by Ingo Molnar

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Improve atomic64_xchg()

Commit-ID: 3a8d1788b37435baf6c296f4ea8beb4fa4955f44
Gitweb: http://git.kernel.org/tip/3a8d1788b37435baf6c296f4ea8beb4fa4955f44
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 3 Jul 2009 19:56:36 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 20:23:55 +0200

x86: atomic64: Improve atomic64_xchg()

Remove the read-first logic from atomic64_xchg() and simplify
the loop.

This function was the last user of __atomic64_read() - remove it.

Also, change the 'real_val' assumption from the somewhat quirky
1ULL << 32 value to the (just as arbitrary, but simpler) value
of 0.

Reported-by: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/atomic_32.h | 9 ---------
arch/x86/lib/atomic64_32.c | 21 +++++++++++++++------
2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index aa045de..d7c8849 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -268,15 +268,6 @@ typedef struct {

#define ATOMIC64_INIT(val) { (val) }

-/**
- * atomic64_read - read atomic64 variable
- * @ptr: pointer of type atomic64_t
- *
- * Atomically reads the value of @v.
- * Doesn't imply a read memory barrier.
- */
-#define __atomic64_read(ptr) ((ptr)->counter)
-
extern u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old_val, u64 new_val);

/**
diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index 6722a09..a804f96 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -33,14 +33,23 @@ EXPORT_SYMBOL(atomic64_cmpxchg);
* Atomically xchgs the value of @ptr to @new_val and returns
* the old value.
*/
-
u64 atomic64_xchg(atomic64_t *ptr, u64 new_val)
{
- u64 old_val;
+ /*
+ * Try first with a (possibly incorrect) assumption about
+ * what we have there. We'll do two loops most likely,
+ * but we'll get an ownership MESI transaction straight away
+ * instead of a read transaction followed by a
+ * flush-for-ownership transaction:
+ */
+ u64 old_val, real_val = 0;

do {
- old_val = __atomic64_read(ptr);
- } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
+ old_val = real_val;
+
+ real_val = atomic64_cmpxchg(ptr, old_val, new_val);
+
+ } while (real_val != old_val);

return old_val;
}
@@ -91,13 +100,13 @@ EXPORT_SYMBOL(atomic64_read);
noinline u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
{
/*
- * Try first with a (probably incorrect) assumption about
+ * Try first with a (possibly incorrect) assumption about
* what we have there. We'll do two loops most likely,
* but we'll get an ownership MESI transaction straight away
* instead of a read transaction followed by a
* flush-for-ownership transaction:
*/
- u64 old_val, new_val, real_val = 1ULL << 32;
+ u64 old_val, new_val, real_val = 0;

do {
old_val = real_val;

2009-07-03 18:32:21

by Eric Dumazet

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Inline atomic64_read() again

Commit-ID: 441b0586e234383898615f1367a960128a54079a
Gitweb: http://git.kernel.org/tip/441b0586e234383898615f1367a960128a54079a
Author: Eric Dumazet <[email protected]>
AuthorDate: Fri, 3 Jul 2009 16:50:10 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 20:23:57 +0200

x86: atomic64: Inline atomic64_read() again

Now that atomic64_read() is lightweight (no register pressure and
a small icache footprint), we can inline it again.

Also use the "=&A" constraint instead of "+A" to avoid a warning
about the uninitialized 'res' variable. (gcc had to force 0 in eax/edx)

$ size vmlinux.prev vmlinux.after
text data bss dec hex filename
4908667 451676 1684868 7045211 6b805b vmlinux.prev
4908651 451676 1684868 7045195 6b804b vmlinux.after

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/atomic_32.h | 15 +++++++++++++++
arch/x86/lib/atomic64_32.c | 21 ---------------------
2 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index d7c8849..de89959 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -295,6 +295,21 @@ extern void atomic64_set(atomic64_t *ptr, u64 new_val);
*
* Atomically reads the value of @ptr and returns it.
*/
+static inline u64 atomic64_read(atomic64_t *ptr)
+{
+ u64 res;
+
+ asm volatile(
+ "mov %%ebx, %%eax\n\t"
+ "mov %%ecx, %%edx\n\t"
+ LOCK_PREFIX "cmpxchg8b %1\n"
+ : "=&A" (res)
+ : "m" (*ptr)
+ );
+
+ return res;
+}
+
extern u64 atomic64_read(atomic64_t *ptr);

/**
diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index a804f96..a2dd9a9 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -69,28 +69,7 @@ void atomic64_set(atomic64_t *ptr, u64 new_val)
EXPORT_SYMBOL(atomic64_read);

/**
- * atomic64_read - read atomic64 variable
- * @ptr: pointer to type atomic64_t
- *
- * Atomically reads the value of @ptr and returns it.
- */
-u64 atomic64_read(atomic64_t *ptr)
-{
- u64 res;
-
- asm volatile(
- "mov %%ebx, %%eax\n\t"
- "mov %%ecx, %%edx\n\t"
- LOCK_PREFIX "cmpxchg8b %1\n"
- : "+A" (res)
- : "m" (*ptr)
- );
-
- return res;
-}
EXPORT_SYMBOL(atomic64_read);
-
-/**
* atomic64_add_return - add and return
* @delta: integer value to add
* @ptr: pointer to type atomic64_t

2009-07-03 18:32:37

by Ingo Molnar

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()

Commit-ID: 0054e925a298829ade45734087f7475e05c2ea1e
Gitweb: http://git.kernel.org/tip/0054e925a298829ade45734087f7475e05c2ea1e
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 3 Jul 2009 20:11:30 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 20:23:58 +0200

x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()

Linus noticed that the variable name 'old_val' is
confusingly named in these functions - the correct
naming is 'new_val'.

Reported-by: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/lib/atomic64_32.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index a2dd9a9..d79f4ac 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -154,9 +154,9 @@ EXPORT_SYMBOL(atomic64_sub);
*/
int atomic64_sub_and_test(u64 delta, atomic64_t *ptr)
{
- u64 old_val = atomic64_sub_return(delta, ptr);
+ u64 new_val = atomic64_sub_return(delta, ptr);

- return old_val == 0;
+ return new_val == 0;
}
EXPORT_SYMBOL(atomic64_sub_and_test);

@@ -223,8 +223,8 @@ EXPORT_SYMBOL(atomic64_inc_and_test);
*/
int atomic64_add_negative(u64 delta, atomic64_t *ptr)
{
- long long old_val = atomic64_add_return(delta, ptr);
+ s64 new_val = atomic64_add_return(delta, ptr);

- return old_val < 0;
+ return new_val < 0;
}
EXPORT_SYMBOL(atomic64_add_negative);

2009-07-03 18:32:48

by Ingo Molnar

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Optimize CMPXCHG8B sequences to not use the LOCK prefix

Commit-ID: 07aa81b2169a2cb5e62ad239fd8055ef484e4f79
Gitweb: http://git.kernel.org/tip/07aa81b2169a2cb5e62ad239fd8055ef484e4f79
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 3 Jul 2009 20:15:58 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 20:24:00 +0200

x86: atomic64: Optimize CMPXCHG8B sequences to not use the LOCK prefix

Arjan noted that the CMPXCHG8B instruction has an implicit lock
prefix, so there's no need to add the prefix explicitly.

This gives us smaller code and smaller alternative-instructions
tables.

Cc: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/atomic_32.h | 2 +-
arch/x86/lib/atomic64_32.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index de89959..130f4ab 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -302,7 +302,7 @@ static inline u64 atomic64_read(atomic64_t *ptr)
asm volatile(
"mov %%ebx, %%eax\n\t"
"mov %%ecx, %%edx\n\t"
- LOCK_PREFIX "cmpxchg8b %1\n"
+ "cmpxchg8b %1\n"
: "=&A" (res)
: "m" (*ptr)
);
diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index d79f4ac..e4a6671 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -12,7 +12,7 @@ static noinline u64 cmpxchg8b(u64 *ptr, u64 old, u64 new)
u32 high = new >> 32;

asm volatile(
- LOCK_PREFIX "cmpxchg8b %1\n"
+ "cmpxchg8b %1\n"
: "+A" (old), "+m" (*ptr)
: "b" (low), "c" (high)
);

2009-07-03 18:44:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip] x86: atomic64: inline atomic64_read()


* Arjan van de Ven <[email protected]> wrote:

> On Fri, 3 Jul 2009 20:18:15 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Arjan van de Ven <[email protected]> wrote:
> >
> > > On Fri, 3 Jul 2009 20:04:50 +0200
> > > Ingo Molnar <[email protected]> wrote:
> > >
> > > > > +static inline u64 atomic64_read(atomic64_t *ptr)
> > > > > +{
> > > > > + u64 res;
> > > > > +
> > > > > + asm volatile(
> > > > > + "mov %%ebx, %%eax\n\t"
> > > > > + "mov %%ecx, %%edx\n\t"
> > > > > + LOCK_PREFIX "cmpxchg8b %1\n"
> > >
> > > also afaik cmpxchg implies the lock prefix, so no need to make an
> > > explicit one... it just adds code size and alternatives-work
> >
> > good point!
> >
> > Is the same true of XADD as well? If yes then we can remove the lock
> > prefix from those as well - there are a few uses of that on 32-bit and
> > 64-bit as well.
> >
>
> ok never mind I was wrong,
> it seems cmpxchg does NOT imply a lock prefix, nor does xadd.

Indeed, i just checked the SDM and the lock prefix is not implicit.

Ignore the patch i sent out ;-)

Ingo

2009-07-03 18:45:27

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:perfcounters/urgent] x86: atomic64: Optimize CMPXCHG8B sequences to not use the LOCK prefix


* tip-bot for Ingo Molnar <[email protected]> wrote:

> x86: atomic64: Optimize CMPXCHG8B sequences to not use the LOCK prefix

ok, that's wrong so i dropped it.

Ingo

2009-07-03 19:11:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH -tip] x86: atomic64: inline atomic64_read()



On Fri, 3 Jul 2009, Ingo Molnar wrote:
> > +
> > + asm volatile(
> > + "mov %%ebx, %%eax\n\t"
> > + "mov %%ecx, %%edx\n\t"
> > + LOCK_PREFIX "cmpxchg8b %1\n"
> > + : "=&A" (res)
> > + : "m" (*ptr)
> > + );
>
> That clobbers or affects eax, ebx, ecx, edx. The only registers it
> does not clobber are esi, edi and ebp.

No, it only clobbers eax/edx. Yes, it touches ebx/ecx, but that's kind of
incidental, and read-only, and doesn't matter.

> Linus, what's your preference in this case?

I think it's kind of subtle to do that thing, but it _is_ very much a
special case. It's odd to use a cmpxchg8b to just do a read in the first
place, so the fact that we then play other games with it is kind of
immaterial.

Adding a comment about the magic "we don't care about the initial values
in eax/edx and ebx/ecx, but they have to match in case we end up doing a
writeback" might be a good idea.

Linus

2009-07-03 19:17:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip] x86: atomic64: inline atomic64_read()


* Linus Torvalds <[email protected]> wrote:

> On Fri, 3 Jul 2009, Ingo Molnar wrote:
> > > +
> > > + asm volatile(
> > > + "mov %%ebx, %%eax\n\t"
> > > + "mov %%ecx, %%edx\n\t"
> > > + LOCK_PREFIX "cmpxchg8b %1\n"
> > > + : "=&A" (res)
> > > + : "m" (*ptr)
> > > + );
> >
> > That clobbers or affects eax, ebx, ecx, edx. The only registers
> > it does not clobber are esi, edi and ebp.
>
> No, it only clobbers eax/edx. Yes, it touches ebx/ecx, but that's
> kind of incidental, and read-only, and doesn't matter.
>
> > Linus, what's your preference in this case?
>
> I think it's kind of subtle to do that thing, but it _is_ very
> much a special case. It's odd to use a cmpxchg8b to just do a read
> in the first place, so the fact that we then play other games with
> it is kind of immaterial.
>
> Adding a comment about the magic "we don't care about the initial
> values in eax/edx and ebx/ecx, but they have to match in case we
> end up doing a writeback" might be a good idea.

Ok - i've applied it and added a comment as well about the reasons
and subtleties.

Ingo

2009-07-03 19:19:29

by Ingo Molnar

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()

Commit-ID: ddf9a003d32f720805ac30bcc15755e9289073de
Gitweb: http://git.kernel.org/tip/ddf9a003d32f720805ac30bcc15755e9289073de
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 3 Jul 2009 20:11:30 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 21:15:08 +0200

x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()

Linus noticed that the variable name 'old_val' is
confusingly named in these functions - the correct
naming is 'new_val'.

Reported-by: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/lib/atomic64_32.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index a804f96..1d98c9e 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -175,9 +175,9 @@ EXPORT_SYMBOL(atomic64_sub);
*/
int atomic64_sub_and_test(u64 delta, atomic64_t *ptr)
{
- u64 old_val = atomic64_sub_return(delta, ptr);
+ u64 new_val = atomic64_sub_return(delta, ptr);

- return old_val == 0;
+ return new_val == 0;
}
EXPORT_SYMBOL(atomic64_sub_and_test);

@@ -244,8 +244,8 @@ EXPORT_SYMBOL(atomic64_inc_and_test);
*/
int atomic64_add_negative(u64 delta, atomic64_t *ptr)
{
- long long old_val = atomic64_add_return(delta, ptr);
+ s64 new_val = atomic64_add_return(delta, ptr);

- return old_val < 0;
+ return new_val < 0;
}
EXPORT_SYMBOL(atomic64_add_negative);

2009-07-03 19:19:42

by Eric Dumazet

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Inline atomic64_read() again

Commit-ID: 5fc6406e15d35b99cd19ec5f1da5b3cc326db2e3
Gitweb: http://git.kernel.org/tip/5fc6406e15d35b99cd19ec5f1da5b3cc326db2e3
Author: Eric Dumazet <[email protected]>
AuthorDate: Fri, 3 Jul 2009 16:50:10 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 3 Jul 2009 21:16:48 +0200

x86: atomic64: Inline atomic64_read() again

Now that atomic64_read() is lightweight (no register pressure and
a small icache footprint), we can inline it again.

Also use the "=&A" constraint instead of "+A" to avoid a warning
about the uninitialized 'res' variable. (gcc had to force 0 in eax/edx)

$ size vmlinux.prev vmlinux.after
text data bss dec hex filename
4908667 451676 1684868 7045211 6b805b vmlinux.prev
4908651 451676 1684868 7045195 6b804b vmlinux.after

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/atomic_32.h | 22 ++++++++++++++++++++++
arch/x86/lib/atomic64_32.c | 21 ---------------------
2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index d7c8849..dc5a667 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -295,6 +295,28 @@ extern void atomic64_set(atomic64_t *ptr, u64 new_val);
*
* Atomically reads the value of @ptr and returns it.
*/
+static inline u64 atomic64_read(atomic64_t *ptr)
+{
+ u64 res;
+
+ /*
+ * Note, we inline this atomic64_t primitive because
+ * it only clobbers EAX/EDX and leaves the others
+ * untouched. We also (somewhat subtly) rely on the
+ * fact that cmpxchg8b returns the current 64-bit value
+ * of the memory location we are touching:
+ */
+ asm volatile(
+ "mov %%ebx, %%eax\n\t"
+ "mov %%ecx, %%edx\n\t"
+ LOCK_PREFIX "cmpxchg8b %1\n"
+ : "=&A" (res)
+ : "m" (*ptr)
+ );
+
+ return res;
+}
+
extern u64 atomic64_read(atomic64_t *ptr);

/**
diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index 1d98c9e..d79f4ac 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -69,28 +69,7 @@ void atomic64_set(atomic64_t *ptr, u64 new_val)
EXPORT_SYMBOL(atomic64_read);

/**
- * atomic64_read - read atomic64 variable
- * @ptr: pointer to type atomic64_t
- *
- * Atomically reads the value of @ptr and returns it.
- */
-u64 atomic64_read(atomic64_t *ptr)
-{
- u64 res;
-
- asm volatile(
- "mov %%ebx, %%eax\n\t"
- "mov %%ecx, %%edx\n\t"
- LOCK_PREFIX "cmpxchg8b %1\n"
- : "+A" (res)
- : "m" (*ptr)
- );
-
- return res;
-}
EXPORT_SYMBOL(atomic64_read);
-
-/**
* atomic64_add_return - add and return
* @delta: integer value to add
* @ptr: pointer to type atomic64_t

2009-07-03 19:28:03

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:perfcounters/urgent] x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too

Linus Torvalds wrote:
>
> It's not necessarily even about "two cachelines". It's true that crossing
> cachelines is extra painful, but from a CPU core angle, there's another
> access width that matters almost as much, namely the width of the bus
> between the core and the L1 cache. If it's not aligned to that, the core
> needs to do each 8-byte read/write as two accesses, even if it's to the
> same cacheline, and that complicates things.
>
> The cacheline itself is generally larger than the cache access width. I
> could easily see a 64B cacheline, but a 256b (32B) bus between the cache
> and the core.
>
> Making the atomics be naturally aligned means that you never cross either
> one, of course.
>

Sometimes, of course, that down-mux is where alignment happens, too.
Either which way, I don't think there is any doubt that unaligned
atomics are just asking for trouble.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-07-03 19:38:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH -tip] x86: atomic64: inline atomic64_read()



Btw, it's entirely possible that we could have a faster "atomic64_read()"
if we have some guarantees about the behavior of the counter.

For example, let's assume that the counter is known to be monotonic: in
that case, we could do a 64-bit read with something like

u64 atomic64_read_monotonic(atomic64_t *p)
{
unsigned int low, high, last = read_high_word(p);
do {
lfence;
low = read_low_word(p);
high = last;
lfence;
last = read_high_word(p);
} while (last != high);
return ((u64)high << 32) | low;
}

which is not necessarily all that much faster than the cmpxchg8b (the two
lfence's aren't going to be cheap), but keeping the cacheline in a shared
state might be a win.

Here, the "monotonic" part is only important because the above would not
work in case the counter switches back and forth, ie if the value ever
does an atomic increment and then an atomic decrement like this:

0x0ffffffff -> 0x100000000 -> 0x0ffffffff

then the above read logic might see a "stable" high word of 0 (before and
after), and a low word of 0 (in the middle), and think that the counter
really was 0 at one point.

But if it's a strictly monotonic counter, or has some other stability
guarantees (the way we have certain stability guarantees on PTE's, for
example: we know that the high bits can only change if the low bits
changed the present bit), you can sometimes do tricks like the above.
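
For concreteness, a compilable rendering of the sketch above could
look like this (little-endian x86 assumed; the helper names follow
the pseudocode and are not existing kernel API, and the lfences
double as compiler barriers):

#include <stdint.h>

static inline uint32_t read_low_word(const volatile uint64_t *p)
{
	return ((const volatile uint32_t *)p)[0]; /* little-endian */
}

static inline uint32_t read_high_word(const volatile uint64_t *p)
{
	return ((const volatile uint32_t *)p)[1];
}

/* Only valid if the counter is monotonic, per the argument above. */
uint64_t atomic64_read_monotonic(const volatile uint64_t *p)
{
	uint32_t low, high;
	uint32_t last = read_high_word(p);

	do {
		asm volatile("lfence" ::: "memory");
		low = read_low_word(p);
		high = last;
		asm volatile("lfence" ::: "memory");
		last = read_high_word(p);
	} while (last != high);

	return ((uint64_t)high << 32) | low;
}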

Do we actually _have_ any performance-critical 64-bit counters that have
monotonicity guarantees? I have no idea. I'm just throwing out the notion.

Linus

2009-07-03 21:41:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip] x86: atomic64: inline atomic64_read()


* Linus Torvalds <[email protected]> wrote:

> Do we actually _have_ any performance-critical 64-bit counters
> that have monotonicity guarantees? I have no idea. I'm just
> throwing out the notion.

Due to the 'counter value flipping' we do on perfcounters for
certain workloads (attr.inherit_stat=1 counters - inherited
counters that nevertheless provide precise per-thread readouts,
not just precise summaries), the monotonicity assumption is not
generally true anymore.

But it would be nice if Eric could measure your suggested
primitive, just so we can see how many cycles we are talking
about. I suspect the biggest difference would show up in a
many-readers testcase - but that won't normally happen on
perfcounters, as the counters typically get read only from a
single thread of execution.
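
A rough userspace harness for such a measurement might look like
the sketch below (rdtsc based, single reader; it assumes the
atomic64_read_monotonic() sketch from earlier in the thread is
linked in, and the cycle counts are only approximate):

#include <stdio.h>
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}

extern uint64_t atomic64_read_monotonic(const volatile uint64_t *p);

int main(void)
{
	volatile uint64_t counter = 0x0123456789abcdefULL;
	enum { LOOPS = 1 << 20 };
	uint64_t start, stop, sum = 0;
	int i;

	start = rdtsc();
	for (i = 0; i < LOOPS; i++)
		sum += atomic64_read_monotonic(&counter);
	stop = rdtsc();

	printf("~%.1f cycles/read (checksum %llu)\n",
	       (double)(stop - start) / LOOPS,
	       (unsigned long long)sum);
	return 0;
}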

Ingo

2009-07-04 00:05:37

by Paul Mackerras

[permalink] [raw]
Subject: Re: [tip:perfcounters/urgent] x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file

tip-bot for Ingo Molnar writes:

> Also, while at it, rename all uses of 'unsigned long long' to
> the much shorter u64.
>
> This makes the appearance of the prototypes a lot nicer - and
> it also uncovered a few bugs where (yet unused) API variants
> had 'long' as their return type instead of u64.

Did you consider making atomic64_t a signed type on 32-bit x86 like it
is on all 64-bit platforms, and like atomic_t is everywhere?

Paul.

2009-07-04 09:50:33

by Eric Dumazet

[permalink] [raw]
Subject: [tip:perfcounters/urgent] x86: atomic64: Inline atomic64_read() again

Commit-ID: a79f0da80a508448434476b77f9d3d1a469eab67
Gitweb: http://git.kernel.org/tip/a79f0da80a508448434476b77f9d3d1a469eab67
Author: Eric Dumazet <[email protected]>
AuthorDate: Fri, 3 Jul 2009 16:50:10 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 4 Jul 2009 11:45:00 +0200

x86: atomic64: Inline atomic64_read() again

Now that atomic64_read() is lightweight (no register pressure
and a small icache footprint), we can inline it again.

Also use the "=&A" constraint instead of "+A" to avoid a warning
about the uninitialized 'res' variable. (With "+A", gcc had to
force 0 into eax/edx first.)

$ size vmlinux.prev vmlinux.after
   text    data     bss     dec     hex filename
4908667  451676 1684868 7045211  6b805b vmlinux.prev
4908651  451676 1684868 7045195  6b804b vmlinux.after

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
LKML-Reference: <[email protected]>
[ Also fix typo in atomic64_set() export ]
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/atomic_32.h | 22 ++++++++++++++++++++++
arch/x86/lib/atomic64_32.c | 23 +----------------------
2 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index d7c8849..dc5a667 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -295,6 +295,28 @@ extern void atomic64_set(atomic64_t *ptr, u64 new_val);
*
* Atomically reads the value of @ptr and returns it.
*/
+static inline u64 atomic64_read(atomic64_t *ptr)
+{
+ u64 res;
+
+ /*
+ * Note, we inline this atomic64_t primitive because
+ * it only clobbers EAX/EDX and leaves the others
+ * untouched. We also (somewhat subtly) rely on the
+ * fact that cmpxchg8b returns the current 64-bit value
+ * of the memory location we are touching:
+ */
+ asm volatile(
+ "mov %%ebx, %%eax\n\t"
+ "mov %%ecx, %%edx\n\t"
+ LOCK_PREFIX "cmpxchg8b %1\n"
+ : "=&A" (res)
+ : "m" (*ptr)
+ );
+
+ return res;
+}
+
extern u64 atomic64_read(atomic64_t *ptr);

/**
diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index 1d98c9e..824fa0b 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -66,31 +66,10 @@ void atomic64_set(atomic64_t *ptr, u64 new_val)
{
atomic64_xchg(ptr, new_val);
}
-EXPORT_SYMBOL(atomic64_read);
+EXPORT_SYMBOL(atomic64_set);

/**
- * atomic64_read - read atomic64 variable
- * @ptr: pointer to type atomic64_t
- *
- * Atomically reads the value of @ptr and returns it.
- */
-u64 atomic64_read(atomic64_t *ptr)
-{
- u64 res;
-
- asm volatile(
- "mov %%ebx, %%eax\n\t"
- "mov %%ecx, %%edx\n\t"
- LOCK_PREFIX "cmpxchg8b %1\n"
- : "+A" (res)
- : "m" (*ptr)
- );
-
- return res;
-}
EXPORT_SYMBOL(atomic64_read);
-
-/**
* atomic64_add_return - add and return
* @delta: integer value to add
* @ptr: pointer to type atomic64_t

2009-07-05 11:26:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:perfcounters/urgent] x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file


* Paul Mackerras <[email protected]> wrote:

> tip-bot for Ingo Molnar writes:
>
> > Also, while at it, rename all uses of 'unsigned long long' to
> > the much shorter u64.
> >
> > This makes the appearance of the prototypes a lot nicer - and
> > it also uncovered a few bugs where (yet unused) API variants
> > had 'long' as their return type instead of u64.
>
> Did you consider making atomic64_t a signed type on 32-bit x86
> like it is on all 64-bit platforms, and like atomic_t is
> everywhere?

That did not occur to me and it makes sense - this would clean up a
few explicit s64 casts we do there currently.
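
Hypothetically, the signed 32-bit definition could simply mirror
the 64-bit one - a sketch, not the actual patch:

/* Sketch only: signed counter, matching atomic_t's signedness. */
typedef struct {
	s64 counter;
} __attribute__((aligned(8))) atomic64_t;

extern s64 atomic64_read(atomic64_t *ptr);
extern void atomic64_set(atomic64_t *ptr, s64 new_val);
extern s64 atomic64_add_return(s64 delta, atomic64_t *ptr);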

Ingo