2021-08-09 11:16:14

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 0/9] perf: Refine barriers for AUX ring buffer

This patch series is to refine the memory barriers for AUX ring buffer.

Patches 01 ~ 04 to address the barriers usage in the kernel. The first
patch is to make clear comment for how to use the barriers between the
data store and aux_head store, this asks the driver to make sure the
data is visible. Patches 02 ~ 04 is to refine the drivers for barriers
after the data store.

Patch 05 is to use WRITE_ONCE() for updating aux_tail.

Patches 06 ~ 09 is to drop the legacy __sync functions, and polish for
duplicate code and cleanup the build and feature test after
SYNC_COMPARE_AND_SWAP is not used.

For easier review and more clear patch organization, comparing against
to the previous patch series, the patches for support compat mode for
AUX trace have been left out and will be sent out as a separate patch
set.

This patch set have been tested on Arm64 Juno platform.

Changes from v4:
- Refined comment for CoreSight ETR/ETF drivers (Suzuki/Peter);
- Changed to use compiler barrier for BTS (mentioned by Peter, but have
not received response from Intel developer);
- Refined the coding style for patch 07 (Adrian).

Changes from v3:
- Removed the inapprocate paragraph in the commit log for patch "perf
auxtrace: Drop legacy __sync functions" (Adrian);
- Added new patch to remove feature-sync-compare-and-swap test (Adrian);
- Th patch for "perf auxtrace: Use WRITE_ONCE() for updating aux_tail",
is a standlone and simple change, so moved it ahead in the patch set
for better ordering;
- Minor improvement for commit logs in the last two patches.

Changes from v2:
- Removed auxtrace_mmap__read_snapshot_head(), which has the duplicated
code with auxtrace_mmap__read_head();
- Cleanuped the build for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT (Adrian);
- Added global variable "kernel_is_64_bit" (Adrian);
- Added compat variants compat_auxtrace_mmap__{read_head|write_tail}
(Adrian).


Leo Yan (9):
perf/ring_buffer: Add comment for barriers on AUX ring buffer
coresight: tmc-etr: Add barrier after updating AUX ring buffer
coresight: tmc-etf: Add comment for store ordering
perf/x86: Add compiler barrier after updating BTS
perf auxtrace: Use WRITE_ONCE() for updating aux_tail
perf auxtrace: Drop legacy __sync functions
perf auxtrace: Remove auxtrace_mmap__read_snapshot_head()
perf: Cleanup for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
tools: Remove feature-sync-compare-and-swap feature detection

arch/x86/events/intel/bts.c | 6 ++++
.../hwtracing/coresight/coresight-tmc-etf.c | 5 +++
.../hwtracing/coresight/coresight-tmc-etr.c | 8 +++++
kernel/events/ring_buffer.c | 9 ++++++
tools/build/Makefile.feature | 1 -
tools/build/feature/Makefile | 4 ---
tools/build/feature/test-all.c | 4 ---
.../feature/test-sync-compare-and-swap.c | 15 ---------
tools/perf/Makefile.config | 4 ---
tools/perf/util/auxtrace.c | 18 +++--------
tools/perf/util/auxtrace.h | 31 +------------------
11 files changed, 34 insertions(+), 71 deletions(-)
delete mode 100644 tools/build/feature/test-sync-compare-and-swap.c

--
2.25.1


2021-08-09 11:17:13

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 2/9] coresight: tmc-etr: Add barrier after updating AUX ring buffer

Since a memory barrier is required between AUX trace data store and
aux_head store, and the AUX trace data is filled with memcpy(), it's
sufficient to use smp_wmb() so can ensure the trace data is visible
prior to updating aux_head.

Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
---
drivers/hwtracing/coresight/coresight-tmc-etr.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index acdb59e0e661..13fd1fc730ed 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1563,6 +1563,14 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
*/
if (etr_perf->snapshot)
handle->head += size;
+
+ /*
+ * Ensure that the AUX trace data is visible before the aux_head
+ * is updated via perf_aux_output_end(), as expected by the
+ * perf ring buffer.
+ */
+ smp_wmb();
+
out:
/*
* Don't set the TRUNCATED flag in snapshot mode because 1) the
--
2.25.1

2021-08-09 11:17:19

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 3/9] coresight: tmc-etf: Add comment for store ordering

Since the function CS_LOCK() has contained memory barrier mb(), it
ensures the visibility of the AUX trace data before updating the
aux_head, thus it's needless to add any explicit barrier anymore.

Add comment to make clear for the barrier usage for ETF.

Signed-off-by: Leo Yan <[email protected]>
---
drivers/hwtracing/coresight/coresight-tmc-etf.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index cd0fb7bfba68..8debd4f40f06 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -553,6 +553,11 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev,
if (buf->snapshot)
handle->head += to_read;

+ /*
+ * CS_LOCK() contains mb() so it can ensure visibility of the AUX trace
+ * data before the aux_head is updated via perf_aux_output_end(), which
+ * is expected by the perf ring buffer.
+ */
CS_LOCK(drvdata->base);
out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
--
2.25.1

2021-08-09 11:17:29

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 4/9] perf/x86: Add compiler barrier after updating BTS

Since BTS is coherent, simply add a compiler barrier to separate the BTS
update and aux_head store.

Signed-off-by: Leo Yan <[email protected]>
---
arch/x86/events/intel/bts.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 6320d2cfd9d3..974e917e65b2 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -209,6 +209,12 @@ static void bts_update(struct bts_ctx *bts)
} else {
local_set(&buf->data_size, head);
}
+
+ /*
+ * Since BTS is coherent, just add compiler barrier to ensure
+ * BTS updating is ordered against bts::handle::event.
+ */
+ barrier();
}

static int
--
2.25.1

2021-08-09 11:17:32

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 5/9] perf auxtrace: Use WRITE_ONCE() for updating aux_tail

Use WRITE_ONCE() for updating aux_tail, so can avoid unexpected memory
behaviour.

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
---
tools/perf/util/auxtrace.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index cc1c1b9cec9c..79227b8864cd 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -480,7 +480,7 @@ static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
/* Ensure all reads are done before we write the tail out */
smp_mb();
#if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
- pc->aux_tail = tail;
+ WRITE_ONCE(pc->aux_tail, tail);
#else
do {
old_tail = __sync_val_compare_and_swap(&pc->aux_tail, 0, 0);
--
2.25.1

2021-08-09 11:17:59

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 6/9] perf auxtrace: Drop legacy __sync functions

The main purpose for using __sync built-in functions is to support
compat mode for 32-bit perf with 64-bit kernel. But using these
built-in functions might cause potential issues.

__sync functions originally support Intel Itanium processoer [1]
but it cannot promise to support all 32-bit archs. Now these
functions have become the legacy functions.

Considering __sync functions cannot really fix the 64-bit value
atomicity on 32-bit archs, thus this patch drops __sync functions.

Credits to Peter for detailed analysis.

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html#g_t_005f_005fsync-Builtins

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/auxtrace.h | 19 -------------------
1 file changed, 19 deletions(-)

diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 79227b8864cd..4f9176368134 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -440,12 +440,6 @@ struct auxtrace_cache;

#ifdef HAVE_AUXTRACE_SUPPORT

-/*
- * In snapshot mode the mmapped page is read-only which makes using
- * __sync_val_compare_and_swap() problematic. However, snapshot mode expects
- * the buffer is not updated while the snapshot is made (e.g. Intel PT disables
- * the event) so there is not a race anyway.
- */
static inline u64 auxtrace_mmap__read_snapshot_head(struct auxtrace_mmap *mm)
{
struct perf_event_mmap_page *pc = mm->userpg;
@@ -459,11 +453,7 @@ static inline u64 auxtrace_mmap__read_snapshot_head(struct auxtrace_mmap *mm)
static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
{
struct perf_event_mmap_page *pc = mm->userpg;
-#if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
u64 head = READ_ONCE(pc->aux_head);
-#else
- u64 head = __sync_val_compare_and_swap(&pc->aux_head, 0, 0);
-#endif

/* Ensure all reads are done after we read the head */
smp_rmb();
@@ -473,19 +463,10 @@ static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
{
struct perf_event_mmap_page *pc = mm->userpg;
-#if BITS_PER_LONG != 64 && defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
- u64 old_tail;
-#endif

/* Ensure all reads are done before we write the tail out */
smp_mb();
-#if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
WRITE_ONCE(pc->aux_tail, tail);
-#else
- do {
- old_tail = __sync_val_compare_and_swap(&pc->aux_tail, 0, 0);
- } while (!__sync_bool_compare_and_swap(&pc->aux_tail, old_tail, tail));
-#endif
}

int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
--
2.25.1

2021-08-09 11:20:09

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 8/9] perf: Cleanup for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT

Since the __sync functions have been dropped, This patch removes unused
build and checking for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT in perf tool.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/Makefile.config | 4 ----
tools/perf/util/auxtrace.c | 5 -----
2 files changed, 9 deletions(-)

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index eb8e487ef90b..4a0d9a6defc7 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -349,10 +349,6 @@ CXXFLAGS += $(INC_FLAGS)

LIBPERF_CFLAGS := $(CORE_CFLAGS) $(EXTRA_CFLAGS)

-ifeq ($(feature-sync-compare-and-swap), 1)
- CFLAGS += -DHAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
-endif
-
ifeq ($(feature-pthread-attr-setaffinity-np), 1)
CFLAGS += -DHAVE_PTHREAD_ATTR_SETAFFINITY_NP
endif
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 2dcf3d12ba32..f33f09b8b535 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -130,11 +130,6 @@ int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
return 0;
}

-#if BITS_PER_LONG != 64 && !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
- pr_err("Cannot use AUX area tracing mmaps\n");
- return -1;
-#endif
-
pc->aux_offset = mp->offset;
pc->aux_size = mp->len;

--
2.25.1

2021-08-09 11:20:27

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 9/9] tools: Remove feature-sync-compare-and-swap feature detection

Since the __sync functions have been removed from perf, it's needless
for perf tool to test the feature sync-compare-and-swap.

The feature test is not used by any other components, remove it.

Signed-off-by: Leo Yan <[email protected]>
---
tools/build/Makefile.feature | 1 -
tools/build/feature/Makefile | 4 ----
tools/build/feature/test-all.c | 4 ----
tools/build/feature/test-sync-compare-and-swap.c | 15 ---------------
4 files changed, 24 deletions(-)
delete mode 100644 tools/build/feature/test-sync-compare-and-swap.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 04a8e3db8a54..3dd2f68366f9 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -34,7 +34,6 @@ FEATURE_TESTS_BASIC := \
dwarf_getlocations \
eventfd \
fortify-source \
- sync-compare-and-swap \
get_current_dir_name \
gettid \
glibc \
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index ec203e28407f..eff55d287db1 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -9,7 +9,6 @@ FILES= \
test-dwarf_getlocations.bin \
test-eventfd.bin \
test-fortify-source.bin \
- test-sync-compare-and-swap.bin \
test-get_current_dir_name.bin \
test-glibc.bin \
test-gtk2.bin \
@@ -260,9 +259,6 @@ $(OUTPUT)test-libdw-dwarf-unwind.bin:
$(OUTPUT)test-libbabeltrace.bin:
$(BUILD) # -lbabeltrace provided by $(FEATURE_CHECK_LDFLAGS-libbabeltrace)

-$(OUTPUT)test-sync-compare-and-swap.bin:
- $(BUILD)
-
$(OUTPUT)test-compile-32.bin:
$(CC) -m32 -o $@ test-compile.c

diff --git a/tools/build/feature/test-all.c b/tools/build/feature/test-all.c
index 464873883396..920439527291 100644
--- a/tools/build/feature/test-all.c
+++ b/tools/build/feature/test-all.c
@@ -106,10 +106,6 @@
# include "test-libdw-dwarf-unwind.c"
#undef main

-#define main main_test_sync_compare_and_swap
-# include "test-sync-compare-and-swap.c"
-#undef main
-
#define main main_test_zlib
# include "test-zlib.c"
#undef main
diff --git a/tools/build/feature/test-sync-compare-and-swap.c b/tools/build/feature/test-sync-compare-and-swap.c
deleted file mode 100644
index 3bc6b0768a53..000000000000
--- a/tools/build/feature/test-sync-compare-and-swap.c
+++ /dev/null
@@ -1,15 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <stdint.h>
-
-volatile uint64_t x;
-
-int main(int argc, char *argv[])
-{
- uint64_t old, new = argc;
-
- (void)argv;
- do {
- old = __sync_val_compare_and_swap(&x, 0, 0);
- } while (!__sync_bool_compare_and_swap(&x, old, new));
- return old == new;
-}
--
2.25.1

2021-08-09 12:38:42

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 1/9] perf/ring_buffer: Add comment for barriers on AUX ring buffer

AUX ring buffer applies almost the same barriers as perf ring buffer,
but there has an exception for ordering between writing the AUX trace
data and updating user_page::aux_head.

This patch adds comment for how to use the barriers on AUX ring buffer,
and gives comment to ask the drivers to flush the trace data into AUX
ring buffer prior to updating user_page::aux_head.

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/events/ring_buffer.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 52868716ec35..5cf6579be05e 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -509,6 +509,15 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
perf_event_aux_event(handle->event, aux_head, size,
handle->aux_flags);

+ /*
+ * See perf_output_put_handle(), AUX ring buffer applies the same
+ * barrier pairing as the perf ring buffer; except for B, since
+ * AUX ring buffer is written by hardware trace, we cannot simply
+ * use the generic memory barrier (like smp_wmb()) prior to update
+ * user_page::aux_head, the hardware trace driver takes the
+ * responsibility to ensure the trace data has been flushed into
+ * the AUX buffer before calling perf_aux_output_end().
+ */
WRITE_ONCE(rb->user_page->aux_head, rb->aux_head);
if (rb_need_aux_wakeup(rb))
wakeup = true;
--
2.25.1

2021-08-09 12:38:43

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 7/9] perf auxtrace: Remove auxtrace_mmap__read_snapshot_head()

Since the function auxtrace_mmap__read_snapshot_head() is exactly same
with auxtrace_mmap__read_head(), whether the session is in snapshot mode
or not, it's unified to use function auxtrace_mmap__read_head() for
reading AUX buffer head.

And the function auxtrace_mmap__read_snapshot_head() is unused so this
patch removes it.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/auxtrace.c | 13 +++++--------
tools/perf/util/auxtrace.h | 10 ----------
2 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index cb19669d2a5b..2dcf3d12ba32 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1686,14 +1686,11 @@ static int __auxtrace_mmap__read(struct mmap *map,
union perf_event ev;
void *data1, *data2;

- if (snapshot) {
- head = auxtrace_mmap__read_snapshot_head(mm);
- if (auxtrace_record__find_snapshot(itr, mm->idx, mm, data,
- &head, &old))
- return -1;
- } else {
- head = auxtrace_mmap__read_head(mm);
- }
+ head = auxtrace_mmap__read_head(mm);
+
+ if (snapshot &&
+ auxtrace_record__find_snapshot(itr, mm->idx, mm, data, &head, &old))
+ return -1;

if (old == head)
return 0;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 4f9176368134..d68a5e80b217 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -440,16 +440,6 @@ struct auxtrace_cache;

#ifdef HAVE_AUXTRACE_SUPPORT

-static inline u64 auxtrace_mmap__read_snapshot_head(struct auxtrace_mmap *mm)
-{
- struct perf_event_mmap_page *pc = mm->userpg;
- u64 head = READ_ONCE(pc->aux_head);
-
- /* Ensure all reads are done after we read the head */
- smp_rmb();
- return head;
-}
-
static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
{
struct perf_event_mmap_page *pc = mm->userpg;
--
2.25.1

2021-08-09 21:43:30

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v5 7/9] perf auxtrace: Remove auxtrace_mmap__read_snapshot_head()

Em Mon, Aug 09, 2021 at 07:14:05PM +0800, Leo Yan escreveu:
> Since the function auxtrace_mmap__read_snapshot_head() is exactly same
> with auxtrace_mmap__read_head(), whether the session is in snapshot mode
> or not, it's unified to use function auxtrace_mmap__read_head() for
> reading AUX buffer head.
>
> And the function auxtrace_mmap__read_snapshot_head() is unused so this
> patch removes it.

Thanks, applied to perf/core.

- Arnaldo


> Signed-off-by: Leo Yan <[email protected]>
> ---
> tools/perf/util/auxtrace.c | 13 +++++--------
> tools/perf/util/auxtrace.h | 10 ----------
> 2 files changed, 5 insertions(+), 18 deletions(-)
>
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index cb19669d2a5b..2dcf3d12ba32 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -1686,14 +1686,11 @@ static int __auxtrace_mmap__read(struct mmap *map,
> union perf_event ev;
> void *data1, *data2;
>
> - if (snapshot) {
> - head = auxtrace_mmap__read_snapshot_head(mm);
> - if (auxtrace_record__find_snapshot(itr, mm->idx, mm, data,
> - &head, &old))
> - return -1;
> - } else {
> - head = auxtrace_mmap__read_head(mm);
> - }
> + head = auxtrace_mmap__read_head(mm);
> +
> + if (snapshot &&
> + auxtrace_record__find_snapshot(itr, mm->idx, mm, data, &head, &old))
> + return -1;
>
> if (old == head)
> return 0;
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index 4f9176368134..d68a5e80b217 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -440,16 +440,6 @@ struct auxtrace_cache;
>
> #ifdef HAVE_AUXTRACE_SUPPORT
>
> -static inline u64 auxtrace_mmap__read_snapshot_head(struct auxtrace_mmap *mm)
> -{
> - struct perf_event_mmap_page *pc = mm->userpg;
> - u64 head = READ_ONCE(pc->aux_head);
> -
> - /* Ensure all reads are done after we read the head */
> - smp_rmb();
> - return head;
> -}
> -
> static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> {
> struct perf_event_mmap_page *pc = mm->userpg;
> --
> 2.25.1
>

--

- Arnaldo

2021-08-09 22:04:32

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v5 6/9] perf auxtrace: Drop legacy __sync functions

Em Mon, Aug 09, 2021 at 07:14:04PM +0800, Leo Yan escreveu:
> The main purpose for using __sync built-in functions is to support
> compat mode for 32-bit perf with 64-bit kernel. But using these
> built-in functions might cause potential issues.
>
> __sync functions originally support Intel Itanium processoer [1]
> but it cannot promise to support all 32-bit archs. Now these
> functions have become the legacy functions.
>
> Considering __sync functions cannot really fix the 64-bit value
> atomicity on 32-bit archs, thus this patch drops __sync functions.
>
> Credits to Peter for detailed analysis.

Thanks, applied to perf/core.

- Arnaldo


> [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html#g_t_005f_005fsync-Builtins
>
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Leo Yan <[email protected]>
> ---
> tools/perf/util/auxtrace.h | 19 -------------------
> 1 file changed, 19 deletions(-)
>
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index 79227b8864cd..4f9176368134 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -440,12 +440,6 @@ struct auxtrace_cache;
>
> #ifdef HAVE_AUXTRACE_SUPPORT
>
> -/*
> - * In snapshot mode the mmapped page is read-only which makes using
> - * __sync_val_compare_and_swap() problematic. However, snapshot mode expects
> - * the buffer is not updated while the snapshot is made (e.g. Intel PT disables
> - * the event) so there is not a race anyway.
> - */
> static inline u64 auxtrace_mmap__read_snapshot_head(struct auxtrace_mmap *mm)
> {
> struct perf_event_mmap_page *pc = mm->userpg;
> @@ -459,11 +453,7 @@ static inline u64 auxtrace_mmap__read_snapshot_head(struct auxtrace_mmap *mm)
> static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> {
> struct perf_event_mmap_page *pc = mm->userpg;
> -#if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
> u64 head = READ_ONCE(pc->aux_head);
> -#else
> - u64 head = __sync_val_compare_and_swap(&pc->aux_head, 0, 0);
> -#endif
>
> /* Ensure all reads are done after we read the head */
> smp_rmb();
> @@ -473,19 +463,10 @@ static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
> {
> struct perf_event_mmap_page *pc = mm->userpg;
> -#if BITS_PER_LONG != 64 && defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
> - u64 old_tail;
> -#endif
>
> /* Ensure all reads are done before we write the tail out */
> smp_mb();
> -#if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
> WRITE_ONCE(pc->aux_tail, tail);
> -#else
> - do {
> - old_tail = __sync_val_compare_and_swap(&pc->aux_tail, 0, 0);
> - } while (!__sync_bool_compare_and_swap(&pc->aux_tail, old_tail, tail));
> -#endif
> }
>
> int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
> --
> 2.25.1
>

--

- Arnaldo

2021-08-09 22:04:32

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v5 5/9] perf auxtrace: Use WRITE_ONCE() for updating aux_tail

Em Mon, Aug 09, 2021 at 07:14:03PM +0800, Leo Yan escreveu:
> Use WRITE_ONCE() for updating aux_tail, so can avoid unexpected memory
> behaviour.

Thanks, applied to perf/core.

- Arnaldo


> Signed-off-by: Leo Yan <[email protected]>
> Acked-by: Adrian Hunter <[email protected]>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> tools/perf/util/auxtrace.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index cc1c1b9cec9c..79227b8864cd 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -480,7 +480,7 @@ static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
> /* Ensure all reads are done before we write the tail out */
> smp_mb();
> #if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
> - pc->aux_tail = tail;
> + WRITE_ONCE(pc->aux_tail, tail);
> #else
> do {
> old_tail = __sync_val_compare_and_swap(&pc->aux_tail, 0, 0);
> --
> 2.25.1
>

--

- Arnaldo

2021-08-09 22:15:26

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v5 8/9] perf: Cleanup for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT

Em Mon, Aug 09, 2021 at 07:14:06PM +0800, Leo Yan escreveu:
> Since the __sync functions have been dropped, This patch removes unused
> build and checking for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT in perf tool.


Thanks, applied to perf/core.

- Arnaldo

> Signed-off-by: Leo Yan <[email protected]>
> ---
> tools/perf/Makefile.config | 4 ----
> tools/perf/util/auxtrace.c | 5 -----
> 2 files changed, 9 deletions(-)
>
> diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
> index eb8e487ef90b..4a0d9a6defc7 100644
> --- a/tools/perf/Makefile.config
> +++ b/tools/perf/Makefile.config
> @@ -349,10 +349,6 @@ CXXFLAGS += $(INC_FLAGS)
>
> LIBPERF_CFLAGS := $(CORE_CFLAGS) $(EXTRA_CFLAGS)
>
> -ifeq ($(feature-sync-compare-and-swap), 1)
> - CFLAGS += -DHAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
> -endif
> -
> ifeq ($(feature-pthread-attr-setaffinity-np), 1)
> CFLAGS += -DHAVE_PTHREAD_ATTR_SETAFFINITY_NP
> endif
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index 2dcf3d12ba32..f33f09b8b535 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -130,11 +130,6 @@ int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
> return 0;
> }
>
> -#if BITS_PER_LONG != 64 && !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
> - pr_err("Cannot use AUX area tracing mmaps\n");
> - return -1;
> -#endif
> -
> pc->aux_offset = mp->offset;
> pc->aux_size = mp->len;
>
> --
> 2.25.1
>

--

- Arnaldo

2021-08-09 22:17:33

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v5 9/9] tools: Remove feature-sync-compare-and-swap feature detection

Em Mon, Aug 09, 2021 at 07:14:07PM +0800, Leo Yan escreveu:
> Since the __sync functions have been removed from perf, it's needless
> for perf tool to test the feature sync-compare-and-swap.
>
> The feature test is not used by any other components, remove it.

Thanks, applied to perf/core.

- Arnaldo


> Signed-off-by: Leo Yan <[email protected]>
> ---
> tools/build/Makefile.feature | 1 -
> tools/build/feature/Makefile | 4 ----
> tools/build/feature/test-all.c | 4 ----
> tools/build/feature/test-sync-compare-and-swap.c | 15 ---------------
> 4 files changed, 24 deletions(-)
> delete mode 100644 tools/build/feature/test-sync-compare-and-swap.c
>
> diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
> index 04a8e3db8a54..3dd2f68366f9 100644
> --- a/tools/build/Makefile.feature
> +++ b/tools/build/Makefile.feature
> @@ -34,7 +34,6 @@ FEATURE_TESTS_BASIC := \
> dwarf_getlocations \
> eventfd \
> fortify-source \
> - sync-compare-and-swap \
> get_current_dir_name \
> gettid \
> glibc \
> diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
> index ec203e28407f..eff55d287db1 100644
> --- a/tools/build/feature/Makefile
> +++ b/tools/build/feature/Makefile
> @@ -9,7 +9,6 @@ FILES= \
> test-dwarf_getlocations.bin \
> test-eventfd.bin \
> test-fortify-source.bin \
> - test-sync-compare-and-swap.bin \
> test-get_current_dir_name.bin \
> test-glibc.bin \
> test-gtk2.bin \
> @@ -260,9 +259,6 @@ $(OUTPUT)test-libdw-dwarf-unwind.bin:
> $(OUTPUT)test-libbabeltrace.bin:
> $(BUILD) # -lbabeltrace provided by $(FEATURE_CHECK_LDFLAGS-libbabeltrace)
>
> -$(OUTPUT)test-sync-compare-and-swap.bin:
> - $(BUILD)
> -
> $(OUTPUT)test-compile-32.bin:
> $(CC) -m32 -o $@ test-compile.c
>
> diff --git a/tools/build/feature/test-all.c b/tools/build/feature/test-all.c
> index 464873883396..920439527291 100644
> --- a/tools/build/feature/test-all.c
> +++ b/tools/build/feature/test-all.c
> @@ -106,10 +106,6 @@
> # include "test-libdw-dwarf-unwind.c"
> #undef main
>
> -#define main main_test_sync_compare_and_swap
> -# include "test-sync-compare-and-swap.c"
> -#undef main
> -
> #define main main_test_zlib
> # include "test-zlib.c"
> #undef main
> diff --git a/tools/build/feature/test-sync-compare-and-swap.c b/tools/build/feature/test-sync-compare-and-swap.c
> deleted file mode 100644
> index 3bc6b0768a53..000000000000
> --- a/tools/build/feature/test-sync-compare-and-swap.c
> +++ /dev/null
> @@ -1,15 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -#include <stdint.h>
> -
> -volatile uint64_t x;
> -
> -int main(int argc, char *argv[])
> -{
> - uint64_t old, new = argc;
> -
> - (void)argv;
> - do {
> - old = __sync_val_compare_and_swap(&x, 0, 0);
> - } while (!__sync_bool_compare_and_swap(&x, old, new));
> - return old == new;
> -}
> --
> 2.25.1
>

--

- Arnaldo

2021-08-29 10:54:31

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH v5 1/9] perf/ring_buffer: Add comment for barriers on AUX ring buffer

Hi Peter,

On Mon, Aug 09, 2021 at 07:13:59PM +0800, Leo Yan wrote:
> AUX ring buffer applies almost the same barriers as perf ring buffer,
> but there has an exception for ordering between writing the AUX trace
> data and updating user_page::aux_head.
>
> This patch adds comment for how to use the barriers on AUX ring buffer,
> and gives comment to ask the drivers to flush the trace data into AUX
> ring buffer prior to updating user_page::aux_head.
>
> Signed-off-by: Leo Yan <[email protected]>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>

You have given the ACK tag before, could you pick up this patch?

Thanks,
Leo

> ---
> kernel/events/ring_buffer.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 52868716ec35..5cf6579be05e 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -509,6 +509,15 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
> perf_event_aux_event(handle->event, aux_head, size,
> handle->aux_flags);
>
> + /*
> + * See perf_output_put_handle(), AUX ring buffer applies the same
> + * barrier pairing as the perf ring buffer; except for B, since
> + * AUX ring buffer is written by hardware trace, we cannot simply
> + * use the generic memory barrier (like smp_wmb()) prior to update
> + * user_page::aux_head, the hardware trace driver takes the
> + * responsibility to ensure the trace data has been flushed into
> + * the AUX buffer before calling perf_aux_output_end().
> + */
> WRITE_ONCE(rb->user_page->aux_head, rb->aux_head);
> if (rb_need_aux_wakeup(rb))
> wakeup = true;
> --
> 2.25.1
>

2021-08-29 10:57:26

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH v5 2/9] coresight: tmc-etr: Add barrier after updating AUX ring buffer

Hi Mathieu, Suzuki,

On Mon, Aug 09, 2021 at 07:14:00PM +0800, Leo Yan wrote:
> Since a memory barrier is required between AUX trace data store and
> aux_head store, and the AUX trace data is filled with memcpy(), it's
> sufficient to use smp_wmb() so can ensure the trace data is visible
> prior to updating aux_head.
>
> Signed-off-by: Leo Yan <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>

Could you pick up patches 02 and 03 in this series? Please note,
patch 02 has the review tag from Suzuki, but I didn't receive the
review tag for patch 03.

If anything need to follow up, just let me know. Thanks!

> ---
> drivers/hwtracing/coresight/coresight-tmc-etr.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index acdb59e0e661..13fd1fc730ed 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -1563,6 +1563,14 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
> */
> if (etr_perf->snapshot)
> handle->head += size;
> +
> + /*
> + * Ensure that the AUX trace data is visible before the aux_head
> + * is updated via perf_aux_output_end(), as expected by the
> + * perf ring buffer.
> + */
> + smp_wmb();
> +
> out:
> /*
> * Don't set the TRUNCATED flag in snapshot mode because 1) the
> --
> 2.25.1
>

2021-08-29 10:59:10

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH v5 4/9] perf/x86: Add compiler barrier after updating BTS

Hi Peter, or any x86 maintainer,

On Mon, Aug 09, 2021 at 07:14:02PM +0800, Leo Yan wrote:
> Since BTS is coherent, simply add a compiler barrier to separate the BTS
> update and aux_head store.

Could you reivew this patch and check if BTS needs the comipler
barrier in this case? Thanks.

> Signed-off-by: Leo Yan <[email protected]>
> ---
> arch/x86/events/intel/bts.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
> index 6320d2cfd9d3..974e917e65b2 100644
> --- a/arch/x86/events/intel/bts.c
> +++ b/arch/x86/events/intel/bts.c
> @@ -209,6 +209,12 @@ static void bts_update(struct bts_ctx *bts)
> } else {
> local_set(&buf->data_size, head);
> }
> +
> + /*
> + * Since BTS is coherent, just add compiler barrier to ensure
> + * BTS updating is ordered against bts::handle::event.
> + */
> + barrier();
> }
>
> static int
> --
> 2.25.1
>

2021-09-14 08:26:12

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v5 3/9] coresight: tmc-etf: Add comment for store ordering

On 09/08/2021 12:14, Leo Yan wrote:
> Since the function CS_LOCK() has contained memory barrier mb(), it
> ensures the visibility of the AUX trace data before updating the
> aux_head, thus it's needless to add any explicit barrier anymore.
>
> Add comment to make clear for the barrier usage for ETF.
>
> Signed-off-by: Leo Yan <[email protected]>
> ---
> drivers/hwtracing/coresight/coresight-tmc-etf.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> index cd0fb7bfba68..8debd4f40f06 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> @@ -553,6 +553,11 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev,
> if (buf->snapshot)
> handle->head += to_read;
>
> + /*
> + * CS_LOCK() contains mb() so it can ensure visibility of the AUX trace
> + * data before the aux_head is updated via perf_aux_output_end(), which
> + * is expected by the perf ring buffer.
> + */
> CS_LOCK(drvdata->base);
> out:
> spin_unlock_irqrestore(&drvdata->spinlock, flags);
>

I will queue this.

Thanks
Suzuki

2021-09-14 09:09:46

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v5 2/9] coresight: tmc-etr: Add barrier after updating AUX ring buffer

On 29/08/2021 11:55, Leo Yan wrote:
> Hi Mathieu, Suzuki,
>
> On Mon, Aug 09, 2021 at 07:14:00PM +0800, Leo Yan wrote:
>> Since a memory barrier is required between AUX trace data store and
>> aux_head store, and the AUX trace data is filled with memcpy(), it's
>> sufficient to use smp_wmb() so can ensure the trace data is visible
>> prior to updating aux_head.
>>
>> Signed-off-by: Leo Yan <[email protected]>
>> Reviewed-by: Suzuki K Poulose <[email protected]>
>
> Could you pick up patches 02 and 03 in this series? Please note,
> patch 02 has the review tag from Suzuki, but I didn't receive the
> review tag for patch 03.
>
> If anything need to follow up, just let me know. Thanks!

I have picked up both the patches.

Thanks
Suzuki

>
>> ---
>> drivers/hwtracing/coresight/coresight-tmc-etr.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> index acdb59e0e661..13fd1fc730ed 100644
>> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> @@ -1563,6 +1563,14 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
>> */
>> if (etr_perf->snapshot)
>> handle->head += size;
>> +
>> + /*
>> + * Ensure that the AUX trace data is visible before the aux_head
>> + * is updated via perf_aux_output_end(), as expected by the
>> + * perf ring buffer.
>> + */
>> + smp_wmb();
>> +
>> out:
>> /*
>> * Don't set the TRUNCATED flag in snapshot mode because 1) the
>> --
>> 2.25.1
>>

2021-09-14 09:58:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 4/9] perf/x86: Add compiler barrier after updating BTS

On Sun, Aug 29, 2021 at 06:56:57PM +0800, Leo Yan wrote:
> Hi Peter, or any x86 maintainer,
>
> On Mon, Aug 09, 2021 at 07:14:02PM +0800, Leo Yan wrote:
> > Since BTS is coherent, simply add a compiler barrier to separate the BTS
> > update and aux_head store.
>
> Could you reivew this patch and check if BTS needs the comipler
> barrier in this case? Thanks.

Yes, a compiler barrier is sufficient.

You want me to pick it up?

> > Signed-off-by: Leo Yan <[email protected]>
> > ---
> > arch/x86/events/intel/bts.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
> > index 6320d2cfd9d3..974e917e65b2 100644
> > --- a/arch/x86/events/intel/bts.c
> > +++ b/arch/x86/events/intel/bts.c
> > @@ -209,6 +209,12 @@ static void bts_update(struct bts_ctx *bts)
> > } else {
> > local_set(&buf->data_size, head);
> > }
> > +
> > + /*
> > + * Since BTS is coherent, just add compiler barrier to ensure
> > + * BTS updating is ordered against bts::handle::event.
> > + */
> > + barrier();
> > }
> >
> > static int
> > --
> > 2.25.1
> >

2021-09-14 10:07:36

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH v5 4/9] perf/x86: Add compiler barrier after updating BTS

Hi Peter,

On Tue, Sep 14, 2021 at 11:51:36AM +0200, Peter Zijlstra wrote:
> On Sun, Aug 29, 2021 at 06:56:57PM +0800, Leo Yan wrote:
> > Hi Peter, or any x86 maintainer,
> >
> > On Mon, Aug 09, 2021 at 07:14:02PM +0800, Leo Yan wrote:
> > > Since BTS is coherent, simply add a compiler barrier to separate the BTS
> > > update and aux_head store.
> >
> > Could you reivew this patch and check if BTS needs the comipler
> > barrier in this case? Thanks.
>
> Yes, a compiler barrier is sufficient.
>
> You want me to pick it up?

Maybe other maintainers are more suitable than me to answer this :)

Yeah, I think it's great if you could pick it.

Thanks,
Leo

2021-09-14 11:59:15

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 4/9] perf/x86: Add compiler barrier after updating BTS

On Tue, Sep 14, 2021 at 06:05:17PM +0800, Leo Yan wrote:
> Hi Peter,
>
> On Tue, Sep 14, 2021 at 11:51:36AM +0200, Peter Zijlstra wrote:
> > On Sun, Aug 29, 2021 at 06:56:57PM +0800, Leo Yan wrote:
> > > Hi Peter, or any x86 maintainer,
> > >
> > > On Mon, Aug 09, 2021 at 07:14:02PM +0800, Leo Yan wrote:
> > > > Since BTS is coherent, simply add a compiler barrier to separate the BTS
> > > > update and aux_head store.
> > >
> > > Could you reivew this patch and check if BTS needs the comipler
> > > barrier in this case? Thanks.
> >
> > Yes, a compiler barrier is sufficient.
> >
> > You want me to pick it up?
>
> Maybe other maintainers are more suitable than me to answer this :)
>
> Yeah, I think it's great if you could pick it.

OK, done.

Subject: [tip: perf/core] perf/x86: Add compiler barrier after updating BTS

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 41100833cdd8b1bef363b81a6482d74711c116ad
Gitweb: https://git.kernel.org/tip/41100833cdd8b1bef363b81a6482d74711c116ad
Author: Leo Yan <[email protected]>
AuthorDate: Mon, 09 Aug 2021 19:14:02 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 17 Sep 2021 15:08:38 +02:00

perf/x86: Add compiler barrier after updating BTS

Since BTS is coherent, simply add a compiler barrier to separate the BTS
update and aux_head store.

Signed-off-by: Leo Yan <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/events/intel/bts.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 6320d2c..974e917 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -209,6 +209,12 @@ static void bts_update(struct bts_ctx *bts)
} else {
local_set(&buf->data_size, head);
}
+
+ /*
+ * Since BTS is coherent, just add compiler barrier to ensure
+ * BTS updating is ordered against bts::handle::event.
+ */
+ barrier();
}

static int