2013-05-20 16:15:02

by Vince Weaver

Subject: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c

Hello

on 3.10-rc1 with the trinity fuzzer patched to exercise the
perf_event_open() syscall I am triggering this WARN_ONCE:

[ 75.864822] ------------[ cut here ]------------
[ 75.864830] WARNING: at arch/x86/kernel/hw_breakpoint.c:121 arch_install_hw_breakpoint+0x5b/0xcb()
[ 75.864832] Can't find any breakpoint slot
[ 75.864833] Modules linked in: dn_rtmsg can_raw nfnetlink can_bcm can xfrm_user xfrm_algo nfc rfkill ax25 scsi_transport_iscsi atm ipt_ULOG x_tables ipx p8023 p8022 irda crc_ccitt appletalk psnap llc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd dns_resolver fscache sunrpc loop fuse snd_hda_codec_hdmi snd_hda_codec_realtek coretemp kvm_intel kvm evdev nouveau mxm_wmi ttm drm_kms_helper video wmi drm i2c_algo_bit microcode snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq snd_seq_device snd_timer snd acpi_cpufreq mperf processor thermal_sys psmouse serio_raw pcspkr button soundcore shpchp i2c_nforce2 i2c_core ext4 crc16 jbd2 mbcache sg sd_mod crc_t10dif ata_generic ahci libahci ehci_pci ohci_hcd ehci_hcd libata r8169 mii scsi_mod usbcore usb_common
[ 75.864890] CPU: 0 PID: 3136 Comm: trinity-child0 Not tainted 3.10.0-rc1 #1
[ 75.864892] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015 10/19/2012
[ 75.864894] 0000000000000000 ffffffff8102e205 ffff880119b01c98 ffff880119abac00
[ 75.864897] ffff880119b01ce8 ffff88011fc15b28 0000000ed19e90a3 ffffffff8102e2b0
[ 75.864900] ffffffff814db382 0000000200000018 ffff880119b01cf8 ffff880119b01cb8
[ 75.864903] Call Trace:
[ 75.864907] [<ffffffff8102e205>] ? warn_slowpath_common+0x5b/0x70
[ 75.864910] [<ffffffff8102e2b0>] ? warn_slowpath_fmt+0x47/0x49
[ 75.864913] [<ffffffff81055d08>] ? sched_clock_local+0xc/0x6d
[ 75.864916] [<ffffffff81006fff>] ? arch_install_hw_breakpoint+0x5b/0xcb
[ 75.864919] [<ffffffff810ab5a1>] ? event_sched_in+0x68/0x11c
[ 75.864921] [<ffffffff810ab69b>] ? group_sched_in+0x46/0x120
[ 75.864923] [<ffffffff810ab898>] ? ctx_sched_in+0x123/0x141
[ 75.864926] [<ffffffff810abdf1>] ? __perf_install_in_context+0xcb/0xea
[ 75.864929] [<ffffffff810a88b2>] ? perf_swevent_add+0xb8/0xb8
[ 75.864932] [<ffffffff810a88c5>] ? remote_function+0x13/0x3b
[ 75.864934] [<ffffffff8106fd87>] ? smp_call_function_single+0x76/0xf1
[ 75.864937] [<ffffffff810a803a>] ? task_function_call+0x42/0x4c
[ 75.864939] [<ffffffff810abd26>] ? perf_event_sched_in+0x69/0x69
[ 75.864942] [<ffffffff810aa018>] ? perf_install_in_context+0x5b/0x97
[ 75.864945] [<ffffffff810aecf0>] ? SYSC_perf_event_open+0x65f/0x7d4
[ 75.864949] [<ffffffff81369792>] ? system_call_fastpath+0x16/0x1b
[ 75.864951] ---[ end trace 250def16d8853b8c ]---

Is this condition really drastic enough to deserve a WARNing?

Vince Weaver
[email protected]
http://www.eece.maine.edu/~vweaver/


2013-05-28 17:04:45

by Oleg Nesterov

Subject: Re: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c

Well. I am not familiar with this code, and when I tried to read it
I felt I would never be able to understand it ;)

On 05/20, Vince Weaver wrote:
>
> on 3.10-rc1 with the trinity fuzzer patched to exercise the
> perf_event_open() syscall I am triggering this WARN_ONCE:
>
> [ 75.864822] ------------[ cut here ]------------
> [ 75.864830] WARNING: at arch/x86/kernel/hw_breakpoint.c:121 arch_install_hw_breakpoint+0x5b/0xcb()
...
> [ 75.864916] [<ffffffff81006fff>] ? arch_install_hw_breakpoint+0x5b/0xcb
> [ 75.864919] [<ffffffff810ab5a1>] ? event_sched_in+0x68/0x11c

I am wondering if we should check attr->pinned before WARN_ONCE...

But it seems that hw_breakpoint.c is buggy anyway.

Suppose that attr.task != NULL and event->cpu = -1.

__reserve_bp_slot() tries to calculate slots.pinned and calls
fetch_bp_busy_slots().

In this case fetch_bp_busy_slots() does

for_each_online_cpu(cpu)
...
nr += task_bp_pinned(cpu, bp, type);

And task_bp_pinned() (in particular) checks cpu == event->cpu,
which will never be true in this case.

IOW, it seems that __reserve_bp_slot(task, cpu => -1) always
succeeds because task_bp_pinned() returns 0, and thus we can
create more than HBP_NUM breakpoints. Much more ;)
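
For reference, a minimal sketch of the logic in question as it stands in
3.10-rc1 (reconstructed from the hunks below, slightly simplified):

	static int
	task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
	{
		struct task_struct *tsk = bp->hw.bp_target;
		struct perf_event *iter;
		int count = 0;

		list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
			if (iter->hw.bp_target == tsk &&
			    find_slot_idx(iter) == type &&
			    cpu == iter->cpu)	/* never matches iter->cpu == -1 */
				count += hw_breakpoint_weight(iter);
		}

		return count;
	}

so the already-registered "all cpus" breakpoints of this task are simply
not counted when we reserve the slot for the next one.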

As for _create, I guess we probably need something like

--- x/kernel/events/hw_breakpoint.c
+++ x/kernel/events/hw_breakpoint.c
@@ -156,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots
if (!tsk)
nr += max_task_bp_pinned(cpu, type);
else
- nr += task_bp_pinned(cpu, bp, type);
+ nr += task_bp_pinned(-1, bp, type);

if (nr > slots->pinned)
slots->pinned = nr;

But I simply can't understand toggle_bp_task_slot()->task_bp_pinned().

Oleg.

2013-05-28 17:32:27

by Oleg Nesterov

Subject: Re: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c

OK, Jiri explained to me how I can use perf to install the hwbp ;)

And indeed,

# perl -e 'sleep 1 while 1' &
[1] 507
# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 -p `pidof perl`

triggers the same warn/problem.

Interestingly,

perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 true

correctly fails with ENOSPC; this is because perf installs NR_CPUS counters,
one for each cpu, and so the accounting works.

IIRC, I already tried to complain that perf could be smarter in this case
and install a single counter with event->cpu = -1, but this is offtopic.
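
(Off-topic or not, just to illustrate the difference at the syscall level --
a minimal sketch, where open_bp() is only a helper for this example and the
attr only roughly mimics "mem:0x10":)

	#include <sys/types.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <linux/perf_event.h>
	#include <linux/hw_breakpoint.h>

	static int open_bp(pid_t pid, int cpu)
	{
		struct perf_event_attr attr = {
			.type		= PERF_TYPE_BREAKPOINT,
			.size		= sizeof(attr),
			.bp_type	= HW_BREAKPOINT_W,
			.bp_addr	= 0x10,
			.bp_len		= HW_BREAKPOINT_LEN_4,
		};

		/* perf_event_open(attr, pid, cpu, group_fd, flags) */
		return syscall(__NR_perf_event_open, &attr, pid, cpu, -1, 0);
	}

	/* one counter per cpu, all bound to the same task */
	static void open_per_cpu(pid_t pid, int nr_cpus, int *fds)
	{
		int cpu;

		for (cpu = 0; cpu < nr_cpus; cpu++)
			fds[cpu] = open_bp(pid, cpu);
	}

	/* versus a single counter with event->cpu == -1 */
	static int open_task_wide(pid_t pid)
	{
		return open_bp(pid, -1);
	}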

Oleg.

On 05/28, Oleg Nesterov wrote:
>
> Well. I am not familiar with this code, and when I tried to read it
> I felt I would never be able to understand it ;)
>
> On 05/20, Vince Weaver wrote:
> >
> > on 3.10-rc1 with the trinity fuzzer patched to exercise the
> > perf_event_open() syscall I am triggering this WARN_ONCE:
> >
> > [ 75.864822] ------------[ cut here ]------------
> > [ 75.864830] WARNING: at arch/x86/kernel/hw_breakpoint.c:121 arch_install_hw_breakpoint+0x5b/0xcb()
> ...
> > [ 75.864916] [<ffffffff81006fff>] ? arch_install_hw_breakpoint+0x5b/0xcb
> > [ 75.864919] [<ffffffff810ab5a1>] ? event_sched_in+0x68/0x11c
>
> I am wondering if we should check attr->pinned before WARN_ONCE...
>
> But it seems that hw_breakpoint.c is buggy anyway.
>
> Suppose that attr.task != NULL and event->cpu = -1.
>
> __reserve_bp_slot() tries to calculate slots.pinned and calls
> fetch_bp_busy_slots().
>
> In this case fetch_bp_busy_slots() does
>
> for_each_online_cpu(cpu)
> ...
> nr += task_bp_pinned(cpu, bp, type);
>
> And task_bp_pinned() (in particular) checks cpu == event->cpu,
> which will never be true in this case.
>
> IOW, it seems that __reserve_bp_slot(task, cpu => -1) always
> succeeds because task_bp_pinned() returns 0, and thus we can
> create more than HBP_NUM breakpoints. Much more ;)
>
> As for _create, I guess we probably need something like
>
> --- x/kernel/events/hw_breakpoint.c
> +++ x/kernel/events/hw_breakpoint.c
> @@ -156,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots
> if (!tsk)
> nr += max_task_bp_pinned(cpu, type);
> else
> - nr += task_bp_pinned(cpu, bp, type);
> + nr += task_bp_pinned(-1, bp, type);
>
> if (nr > slots->pinned)
> slots->pinned = nr;
>
> But I simply can't understand toggle_bp_task_slot()->task_bp_pinned().
>
> Oleg.

2013-05-28 18:51:01

by Oleg Nesterov

Subject: Re: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c

On 05/28, Oleg Nesterov wrote:
>
> IIRC, I already tried to complain that perf could be smarter in this case
> and install a single counter with event->cpu = -1, but this is offtopic.

Just in case, please see my old email

http://marc.info/?l=linux-kernel&m=136017983113299

I didn't check whether perf has changed since then. And I have forgotten
everything (not too much ;) that I learned when I briefly looked into
tools/perf/.

But, again, this is off-topic.

Oleg.

2013-05-29 16:37:04

by Oleg Nesterov

Subject: [MAYBEPATCH] : WARN_ONCE in arch/x86/kernel/hw_breakpoint.c

OK, I seem to understand what toggle_bp_task_slot() does, and
it looks equally wrong.

I think we need something like the patch below... I'll try to
recheck/test tomorrow, but I would not mind if someone who knows
this code makes the authoritative fix.

Even if this patch is right, I think this all needs more cleanups,
at least. For example, every DEFINE_PER_CPU() looks bogus. This
is not pcpu memory.
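
For reference, the definitions I am talking about currently look like this
in kernel/events/hw_breakpoint.c:

	/* Number of pinned cpu breakpoints in a cpu */
	static DEFINE_PER_CPU(unsigned int, nr_cpu_bp_pinned[TYPE_MAX]);

	/* Number of pinned task breakpoints in a cpu */
	static DEFINE_PER_CPU(unsigned int *, nr_task_bp_pinned[TYPE_MAX]);

	/* Number of non-pinned cpu/task breakpoints in a cpu */
	static DEFINE_PER_CPU(unsigned int, nr_bp_flexible[TYPE_MAX]);

and they are only ever read/written via per_cpu(..., cpu) from the
slot-accounting paths, never "locally" by the cpu which owns them.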

Oleg.

--- x/kernel/events/hw_breakpoint.c
+++ x/kernel/events/hw_breakpoint.c
@@ -111,7 +111,7 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
* Count the number of breakpoints of the same type and same task.
* The given event must be not on the list.
*/
-static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
+static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
{
struct task_struct *tsk = bp->hw.bp_target;
struct perf_event *iter;
@@ -120,7 +120,7 @@ static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
if (iter->hw.bp_target == tsk &&
find_slot_idx(iter) == type &&
- cpu == iter->cpu)
+ bp->cpu == iter->cpu)
count += hw_breakpoint_weight(iter);
}

@@ -137,13 +137,17 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
{
int cpu = bp->cpu;
struct task_struct *tsk = bp->hw.bp_target;
+ int task_pinned;
+
+ if (tsk)
+ task_pinned = task_bp_pinned(bp, type);

if (cpu >= 0) {
slots->pinned = per_cpu(nr_cpu_bp_pinned[type], cpu);
if (!tsk)
slots->pinned += max_task_bp_pinned(cpu, type);
else
- slots->pinned += task_bp_pinned(cpu, bp, type);
+ slots->pinned += task_pinned;
slots->flexible = per_cpu(nr_bp_flexible[type], cpu);

return;
@@ -156,7 +160,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
if (!tsk)
nr += max_task_bp_pinned(cpu, type);
else
- nr += task_bp_pinned(cpu, bp, type);
+ nr += task_pinned;

if (nr > slots->pinned)
slots->pinned = nr;
@@ -182,15 +186,13 @@ fetch_this_slot(struct bp_busy_slots *slots, int weight)
/*
* Add a pinned breakpoint for the given task in our constraint table
*/
-static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
+static void toggle_bp_task_slot(int old_count, int cpu, bool enable,
enum bp_type_idx type, int weight)
{
unsigned int *tsk_pinned;
- int old_count = 0;
int old_idx = 0;
int idx = 0;

- old_count = task_bp_pinned(cpu, bp, type);
old_idx = old_count - 1;
idx = old_idx + weight;

@@ -216,6 +218,7 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
{
int cpu = bp->cpu;
struct task_struct *tsk = bp->hw.bp_target;
+ int task_pinned;

/* Pinned counter cpu profiling */
if (!tsk) {
@@ -232,11 +235,13 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
if (!enable)
list_del(&bp->hw.bp_list);

+ task_pinned = task_bp_pinned(bp, type);
+
if (cpu >= 0) {
- toggle_bp_task_slot(bp, cpu, enable, type, weight);
+ toggle_bp_task_slot(task_pinned, cpu, enable, type, weight);
} else {
for_each_online_cpu(cpu)
- toggle_bp_task_slot(bp, cpu, enable, type, weight);
+ toggle_bp_task_slot(task_pinned, cpu, enable, type, weight);
}

if (enable)

2013-06-01 18:25:08

by Oleg Nesterov

Subject: [PATCH 0/2]: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c

On 05/20, Vince Weaver wrote:
>
> on 3.10-rc1 with the trinity fuzzer patched to exercise the
> perf_event_open() syscall I am triggering this WARN_ONCE:
>
> [ 75.864822] ------------[ cut here ]------------
> [ 75.864830] WARNING: at arch/x86/kernel/hw_breakpoint.c:121 arch_install_hw_breakpoint+0x5b/0xcb()

Ingo,

I am not sure about -stable, but probably this is 3.10 material.

Oleg.

2013-06-01 18:25:45

by Oleg Nesterov

Subject: [PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)

The trinity fuzzer triggered WARN_ONCE("Can't find any breakpoint slot")
in arch_install_hw_breakpoint(), but the problem is not arch-specific.

The problem is that task_bp_pinned(cpu) checks "cpu == iter->cpu", but
this doesn't account for the "all cpus" events with iter->cpu < 0.

This means that, say, register_user_hw_breakpoint(tsk) can happily
create an arbitrary number (> HBP_NUM) of breakpoints which cannot
be activated. toggle_bp_task_slot() is equally wrong for the same
reason, and nr_task_bp_pinned[] can have negative entries.

Simple test:

# perl -e 'sleep 1 while 1' &
# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 -p `pidof perl`

Before this patch this triggers the same problem/WARN_ON(), after
the patch it correctly fails with -ENOSPC.

Reported-by: Vince Weaver <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Cc: <[email protected]>
---
kernel/events/hw_breakpoint.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index ed1c897..29d3abe 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -120,7 +120,7 @@ static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
if (iter->hw.bp_target == tsk &&
find_slot_idx(iter) == type &&
- cpu == iter->cpu)
+ (iter->cpu < 0 || cpu == iter->cpu))
count += hw_breakpoint_weight(iter);
}

--
1.5.5.1

2013-06-01 18:25:54

by Oleg Nesterov

Subject: [PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()

fetch_bp_busy_slots() and toggle_bp_slot() use for_each_online_cpu(),
which is obviously wrong wrt cpu_up() or cpu_down(): we can over- or
under-account the per-cpu numbers.

For example:

# echo 0 >> /sys/devices/system/cpu/cpu1/online
# perf record -e mem:0x10 -p 1 &
# echo 1 >> /sys/devices/system/cpu/cpu1/online
# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10 -C1 -a &
# taskset -p 0x2 1

triggers the same WARN_ONCE("Can't find any breakpoint slot") in
arch_install_hw_breakpoint().

Signed-off-by: Oleg Nesterov <[email protected]>
Cc: <[email protected]>
---
kernel/events/hw_breakpoint.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 29d3abe..4407e43 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -149,7 +149,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
return;
}

- for_each_online_cpu(cpu) {
+ for_each_possible_cpu(cpu) {
unsigned int nr;

nr = per_cpu(nr_cpu_bp_pinned[type], cpu);
@@ -235,7 +235,7 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
if (cpu >= 0) {
toggle_bp_task_slot(bp, cpu, enable, type, weight);
} else {
- for_each_online_cpu(cpu)
+ for_each_possible_cpu(cpu)
toggle_bp_task_slot(bp, cpu, enable, type, weight);
}

--
1.5.5.1

2013-06-01 19:49:46

by Oleg Nesterov

Subject: [PATCH 0/3] hw_breakpoint: cleanups

Hello.

Cleanups, on top of

[PATCH 0/2]: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c

series.

Oleg.

kernel/events/hw_breakpoint.c | 91 ++++++++++++++++-------------------------
1 files changed, 35 insertions(+), 56 deletions(-)

2013-06-01 19:50:04

by Oleg Nesterov

Subject: [PATCH 1/3] hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths

The enable/disable logic in toggle_bp_slot() is not symmetrical
and imho very confusing. "old_count" in toggle_bp_task_slot() is
actually new_count because this bp was already removed from the
list.

Change toggle_bp_slot() to always call list_add/list_del after
toggle_bp_task_slot(). This way old_idx is task_bp_pinned() and
this entry should be decremented; new_idx is "old_idx +/- weight"
and we need to increment this element. The code/logic looks obvious.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/hw_breakpoint.c | 40 ++++++++++++++++------------------------
1 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index ed31fe1..21a3c88 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -185,26 +185,20 @@ fetch_this_slot(struct bp_busy_slots *slots, int weight)
static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
enum bp_type_idx type, int weight)
{
- unsigned int *tsk_pinned;
- int old_count = 0;
- int old_idx = 0;
- int idx = 0;
-
- old_count = task_bp_pinned(cpu, bp, type);
- old_idx = old_count - 1;
- idx = old_idx + weight;
-
- /* tsk_pinned[n] is the number of tasks having n breakpoints */
- tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);
- if (enable) {
- tsk_pinned[idx]++;
- if (old_count > 0)
- tsk_pinned[old_idx]--;
- } else {
- tsk_pinned[idx]--;
- if (old_count > 0)
- tsk_pinned[old_idx]++;
- }
+ /* tsk_pinned[n-1] is the number of tasks having n>0 breakpoints */
+ unsigned int *tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);
+ int old_idx, new_idx;
+
+ old_idx = task_bp_pinned(cpu, bp, type) - 1;
+ if (enable)
+ new_idx = old_idx + weight;
+ else
+ new_idx = old_idx - weight;
+
+ if (old_idx >= 0)
+ tsk_pinned[old_idx]--;
+ if (new_idx >= 0)
+ tsk_pinned[new_idx]++;
}

/*
@@ -228,10 +222,6 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
}

/* Pinned counter task profiling */
-
- if (!enable)
- list_del(&bp->hw.bp_list);
-
if (cpu >= 0) {
toggle_bp_task_slot(bp, cpu, enable, type, weight);
} else {
@@ -241,6 +231,8 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,

if (enable)
list_add_tail(&bp->hw.bp_list, &bp_task_head);
+ else
+ list_del(&bp->hw.bp_list);
}

/*
--
1.5.5.1

2013-06-01 19:50:18

by Oleg Nesterov

Subject: [PATCH 2/3] hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths

Change toggle_bp_slot() to make "weight" negative if !enable. This
way we can always use "+ weight" without additional "if (enable)"
check and toggle_bp_task_slot() no longer needs this arg.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/hw_breakpoint.c | 20 ++++++++------------
1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 21a3c88..2f4d7c4 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -182,7 +182,7 @@ fetch_this_slot(struct bp_busy_slots *slots, int weight)
/*
* Add a pinned breakpoint for the given task in our constraint table
*/
-static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
+static void toggle_bp_task_slot(struct perf_event *bp, int cpu,
enum bp_type_idx type, int weight)
{
/* tsk_pinned[n-1] is the number of tasks having n>0 breakpoints */
@@ -190,10 +190,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
int old_idx, new_idx;

old_idx = task_bp_pinned(cpu, bp, type) - 1;
- if (enable)
- new_idx = old_idx + weight;
- else
- new_idx = old_idx - weight;
+ new_idx = old_idx + weight;

if (old_idx >= 0)
tsk_pinned[old_idx]--;
@@ -211,22 +208,21 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
int cpu = bp->cpu;
struct task_struct *tsk = bp->hw.bp_target;

+ if (!enable)
+ weight = -weight;
+
/* Pinned counter cpu profiling */
if (!tsk) {
-
- if (enable)
- per_cpu(nr_cpu_bp_pinned[type], bp->cpu) += weight;
- else
- per_cpu(nr_cpu_bp_pinned[type], bp->cpu) -= weight;
+ per_cpu(nr_cpu_bp_pinned[type], cpu) += weight;
return;
}

/* Pinned counter task profiling */
if (cpu >= 0) {
- toggle_bp_task_slot(bp, cpu, enable, type, weight);
+ toggle_bp_task_slot(bp, cpu, type, weight);
} else {
for_each_possible_cpu(cpu)
- toggle_bp_task_slot(bp, cpu, enable, type, weight);
+ toggle_bp_task_slot(bp, cpu, type, weight);
}

if (enable)
--
1.5.5.1

2013-06-01 19:50:33

by Oleg Nesterov

Subject: [PATCH 3/3] hw_breakpoint: Introduce cpumask_of_bp()

Add the trivial helper which simply returns cpumask_of() or
cpu_possible_mask depending on bp->cpu.

Change fetch_bp_busy_slots() and toggle_bp_slot() to always do
for_each_cpu(cpumask_of_bp) to simplify the code and avoid the
code duplication.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/hw_breakpoint.c | 43 ++++++++++++++++------------------------
1 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 2f4d7c4..57efe5d 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -127,6 +127,13 @@ static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
return count;
}

+static const struct cpumask *cpumask_of_bp(struct perf_event *bp)
+{
+ if (bp->cpu >= 0)
+ return cpumask_of(bp->cpu);
+ return cpu_possible_mask;
+}
+
/*
* Report the number of pinned/un-pinned breakpoints we have in
* a given cpu (cpu > -1) or in all of them (cpu = -1).
@@ -135,25 +142,13 @@ static void
fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
enum bp_type_idx type)
{
- int cpu = bp->cpu;
- struct task_struct *tsk = bp->hw.bp_target;
-
- if (cpu >= 0) {
- slots->pinned = per_cpu(nr_cpu_bp_pinned[type], cpu);
- if (!tsk)
- slots->pinned += max_task_bp_pinned(cpu, type);
- else
- slots->pinned += task_bp_pinned(cpu, bp, type);
- slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
-
- return;
- }
+ const struct cpumask *cpumask = cpumask_of_bp(bp);
+ int cpu;

- for_each_possible_cpu(cpu) {
- unsigned int nr;
+ for_each_cpu(cpu, cpumask) {
+ unsigned int nr = per_cpu(nr_cpu_bp_pinned[type], cpu);

- nr = per_cpu(nr_cpu_bp_pinned[type], cpu);
- if (!tsk)
+ if (!bp->hw.bp_target)
nr += max_task_bp_pinned(cpu, type);
else
nr += task_bp_pinned(cpu, bp, type);
@@ -205,25 +200,21 @@ static void
toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
int weight)
{
- int cpu = bp->cpu;
- struct task_struct *tsk = bp->hw.bp_target;
+ const struct cpumask *cpumask = cpumask_of_bp(bp);
+ int cpu;

if (!enable)
weight = -weight;

/* Pinned counter cpu profiling */
- if (!tsk) {
- per_cpu(nr_cpu_bp_pinned[type], cpu) += weight;
+ if (!bp->hw.bp_target) {
+ per_cpu(nr_cpu_bp_pinned[type], bp->cpu) += weight;
return;
}

/* Pinned counter task profiling */
- if (cpu >= 0) {
+ for_each_cpu(cpu, cpumask)
toggle_bp_task_slot(bp, cpu, type, weight);
- } else {
- for_each_possible_cpu(cpu)
- toggle_bp_task_slot(bp, cpu, type, weight);
- }

if (enable)
list_add_tail(&bp->hw.bp_list, &bp_task_head);
--
1.5.5.1

2013-06-02 19:54:26

by Oleg Nesterov

Subject: [PATCH 0/2] hw_breakpoint: more cleanups

Hello.

More cleanups, on top of "[PATCH 0/3] hw_breakpoint: cleanups".

Oleg.

kernel/events/hw_breakpoint.c | 103 +++++++++++++++++++----------------------
1 files changed, 47 insertions(+), 56 deletions(-)

2013-06-02 19:55:08

by Oleg Nesterov

Subject: [PATCH 1/2] hw_breakpoint: Simplify *register_wide_hw_breakpoint()

1. register_wide_hw_breakpoint() can use unregister_ on failure,
no need to duplicate the code.

2. "struct perf_event **pevent" adds an unnecessary level of
indirection and complication; use per_cpu(*cpu_events, cpu).

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/hw_breakpoint.c | 34 +++++++++++-----------------------
1 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 74d739b..17d8093 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -497,8 +497,8 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,
perf_overflow_handler_t triggered,
void *context)
{
- struct perf_event * __percpu *cpu_events, **pevent, *bp;
- long err;
+ struct perf_event * __percpu *cpu_events, *bp;
+ long err = 0;
int cpu;

cpu_events = alloc_percpu(typeof(*cpu_events));
@@ -507,31 +507,21 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,

get_online_cpus();
for_each_online_cpu(cpu) {
- pevent = per_cpu_ptr(cpu_events, cpu);
bp = perf_event_create_kernel_counter(attr, cpu, NULL,
triggered, context);
-
- *pevent = bp;
-
if (IS_ERR(bp)) {
err = PTR_ERR(bp);
- goto fail;
+ break;
}
- }
- put_online_cpus();
-
- return cpu_events;

-fail:
- for_each_online_cpu(cpu) {
- pevent = per_cpu_ptr(cpu_events, cpu);
- if (IS_ERR(*pevent))
- break;
- unregister_hw_breakpoint(*pevent);
+ per_cpu(*cpu_events, cpu) = bp;
}
put_online_cpus();

- free_percpu(cpu_events);
+ if (likely(!err))
+ return cpu_events;
+
+ unregister_wide_hw_breakpoint(cpu_events);
return (void __percpu __force *)ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(register_wide_hw_breakpoint);
@@ -543,12 +533,10 @@ EXPORT_SYMBOL_GPL(register_wide_hw_breakpoint);
void unregister_wide_hw_breakpoint(struct perf_event * __percpu *cpu_events)
{
int cpu;
- struct perf_event **pevent;

- for_each_possible_cpu(cpu) {
- pevent = per_cpu_ptr(cpu_events, cpu);
- unregister_hw_breakpoint(*pevent);
- }
+ for_each_possible_cpu(cpu)
+ unregister_hw_breakpoint(per_cpu(*cpu_events, cpu));
+
free_percpu(cpu_events);
}
EXPORT_SYMBOL_GPL(unregister_wide_hw_breakpoint);
--
1.5.5.1

2013-06-02 19:55:56

by Oleg Nesterov

Subject: [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo"

This patch simply moves all per-cpu variables into the new single
per-cpu "struct bp_cpuinfo".

To me this looks more logical and clean, but this can also simplify
the further potential changes. In particular, I do not think this
memory should be per-cpu, it is never used "locally". After this
change it is trivial to turn it into, say, bootmem[nr_cpu_ids].

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/hw_breakpoint.c | 69 +++++++++++++++++++++-------------------
1 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 17d8093..42c47a8 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -46,23 +46,26 @@
#include <linux/smp.h>

#include <linux/hw_breakpoint.h>
-
-
/*
* Constraints data
*/
+struct bp_cpuinfo {
+ /* Number of pinned cpu breakpoints in a cpu */
+ unsigned int cpu_pinned;
+ /* tsk_pinned[n] is the number of tasks having n+1 breakpoints */
+ unsigned int *tsk_pinned;
+ /* Number of non-pinned cpu/task breakpoints in a cpu */
+ unsigned int flexible; /* XXX: placeholder, see fetch_this_slot() */
+};

-/* Number of pinned cpu breakpoints in a cpu */
-static DEFINE_PER_CPU(unsigned int, nr_cpu_bp_pinned[TYPE_MAX]);
-
-/* Number of pinned task breakpoints in a cpu */
-static DEFINE_PER_CPU(unsigned int *, nr_task_bp_pinned[TYPE_MAX]);
-
-/* Number of non-pinned cpu/task breakpoints in a cpu */
-static DEFINE_PER_CPU(unsigned int, nr_bp_flexible[TYPE_MAX]);
-
+static DEFINE_PER_CPU(struct bp_cpuinfo, bp_cpuinfo[TYPE_MAX]);
static int nr_slots[TYPE_MAX];

+static struct bp_cpuinfo *get_bp_info(int cpu, enum bp_type_idx type)
+{
+ return per_cpu_ptr(bp_cpuinfo + type, cpu);
+}
+
/* Keep track of the breakpoints attached to tasks */
static LIST_HEAD(bp_task_head);

@@ -96,8 +99,8 @@ static inline enum bp_type_idx find_slot_idx(struct perf_event *bp)
*/
static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
{
+ unsigned int *tsk_pinned = get_bp_info(cpu, type)->tsk_pinned;
int i;
- unsigned int *tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);

for (i = nr_slots[type] - 1; i >= 0; i--) {
if (tsk_pinned[i] > 0)
@@ -146,8 +149,10 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
int cpu;

for_each_cpu(cpu, cpumask) {
- unsigned int nr = per_cpu(nr_cpu_bp_pinned[type], cpu);
+ struct bp_cpuinfo *info = get_bp_info(cpu, type);
+ int nr;

+ nr = info->cpu_pinned;
if (!bp->hw.bp_target)
nr += max_task_bp_pinned(cpu, type);
else
@@ -156,8 +161,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
if (nr > slots->pinned)
slots->pinned = nr;

- nr = per_cpu(nr_bp_flexible[type], cpu);
-
+ nr = info->flexible;
if (nr > slots->flexible)
slots->flexible = nr;
}
@@ -180,8 +184,7 @@ fetch_this_slot(struct bp_busy_slots *slots, int weight)
static void toggle_bp_task_slot(struct perf_event *bp, int cpu,
enum bp_type_idx type, int weight)
{
- /* tsk_pinned[n-1] is the number of tasks having n>0 breakpoints */
- unsigned int *tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);
+ unsigned int *tsk_pinned = get_bp_info(cpu, type)->tsk_pinned;
int old_idx, new_idx;

old_idx = task_bp_pinned(cpu, bp, type) - 1;
@@ -208,7 +211,7 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,

/* Pinned counter cpu profiling */
if (!bp->hw.bp_target) {
- per_cpu(nr_cpu_bp_pinned[type], bp->cpu) += weight;
+ get_bp_info(bp->cpu, type)->cpu_pinned += weight;
return;
}

@@ -240,8 +243,8 @@ __weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
*
* - If attached to a single cpu, check:
*
- * (per_cpu(nr_bp_flexible, cpu) || (per_cpu(nr_cpu_bp_pinned, cpu)
- * + max(per_cpu(nr_task_bp_pinned, cpu)))) < HBP_NUM
+ * (per_cpu(info->flexible, cpu) || (per_cpu(info->cpu_pinned, cpu)
+ * + max(per_cpu(info->tsk_pinned, cpu)))) < HBP_NUM
*
* -> If there are already non-pinned counters in this cpu, it means
* there is already a free slot for them.
@@ -251,8 +254,8 @@ __weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
*
* - If attached to every cpus, check:
*
- * (per_cpu(nr_bp_flexible, *) || (max(per_cpu(nr_cpu_bp_pinned, *))
- * + max(per_cpu(nr_task_bp_pinned, *)))) < HBP_NUM
+ * (per_cpu(info->flexible, *) || (max(per_cpu(info->cpu_pinned, *))
+ * + max(per_cpu(info->tsk_pinned, *)))) < HBP_NUM
*
* -> This is roughly the same, except we check the number of per cpu
* bp for every cpu and we keep the max one. Same for the per tasks
@@ -263,16 +266,16 @@ __weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
*
* - If attached to a single cpu, check:
*
- * ((per_cpu(nr_bp_flexible, cpu) > 1) + per_cpu(nr_cpu_bp_pinned, cpu)
- * + max(per_cpu(nr_task_bp_pinned, cpu))) < HBP_NUM
+ * ((per_cpu(info->flexible, cpu) > 1) + per_cpu(info->cpu_pinned, cpu)
+ * + max(per_cpu(info->tsk_pinned, cpu))) < HBP_NUM
*
- * -> Same checks as before. But now the nr_bp_flexible, if any, must keep
+ * -> Same checks as before. But now the info->flexible, if any, must keep
* one register at least (or they will never be fed).
*
* - If attached to every cpus, check:
*
- * ((per_cpu(nr_bp_flexible, *) > 1) + max(per_cpu(nr_cpu_bp_pinned, *))
- * + max(per_cpu(nr_task_bp_pinned, *))) < HBP_NUM
+ * ((per_cpu(info->flexible, *) > 1) + max(per_cpu(info->cpu_pinned, *))
+ * + max(per_cpu(info->tsk_pinned, *))) < HBP_NUM
*/
static int __reserve_bp_slot(struct perf_event *bp)
{
@@ -617,7 +620,6 @@ static struct pmu perf_breakpoint = {

int __init init_hw_breakpoint(void)
{
- unsigned int **task_bp_pinned;
int cpu, err_cpu;
int i;

@@ -626,10 +628,11 @@ int __init init_hw_breakpoint(void)

for_each_possible_cpu(cpu) {
for (i = 0; i < TYPE_MAX; i++) {
- task_bp_pinned = &per_cpu(nr_task_bp_pinned[i], cpu);
- *task_bp_pinned = kzalloc(sizeof(int) * nr_slots[i],
- GFP_KERNEL);
- if (!*task_bp_pinned)
+ struct bp_cpuinfo *info = get_bp_info(cpu, i);
+
+ info->tsk_pinned = kcalloc(nr_slots[i], sizeof(int),
+ GFP_KERNEL);
+ if (!info->tsk_pinned)
goto err_alloc;
}
}
@@ -643,7 +646,7 @@ int __init init_hw_breakpoint(void)
err_alloc:
for_each_possible_cpu(err_cpu) {
for (i = 0; i < TYPE_MAX; i++)
- kfree(per_cpu(nr_task_bp_pinned[i], err_cpu));
+ kfree(get_bp_info(err_cpu, i)->tsk_pinned);
if (err_cpu == cpu)
break;
}
--
1.5.5.1

2013-06-13 14:01:19

by Frederic Weisbecker

Subject: Re: [PATCH 0/3] hw_breakpoint: cleanups

On Sat, Jun 01, 2013 at 09:45:26PM +0200, Oleg Nesterov wrote:
> Hello.
>
> Cleanups, on top of
>
> [PATCH 0/2]: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c

So this series doesn't have the fix for the warning?

>
> series.
>
> Oleg.
>
> kernel/events/hw_breakpoint.c | 91 ++++++++++++++++-------------------------
> 1 files changed, 35 insertions(+), 56 deletions(-)
>

2013-06-13 14:20:35

by Frederic Weisbecker

Subject: Re: [PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)

On Sat, Jun 01, 2013 at 08:21:20PM +0200, Oleg Nesterov wrote:
> The trinity fuzzer triggered WARN_ONCE("Can't find any breakpoint slot")
> in arch_install_hw_breakpoint(), but the problem is not arch-specific.
>
> The problem is that task_bp_pinned(cpu) checks "cpu == iter->cpu", but
> this doesn't account for the "all cpus" events with iter->cpu < 0.
>
> This means that, say, register_user_hw_breakpoint(tsk) can happily
> create an arbitrary number (> HBP_NUM) of breakpoints which cannot
> be activated. toggle_bp_task_slot() is equally wrong for the same
> reason, and nr_task_bp_pinned[] can have negative entries.
>
> Simple test:
>
> # perl -e 'sleep 1 while 1' &
> # perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 -p `pidof perl`
>
> Before this patch this triggers the same problem/WARN_ON(), after
> the patch it correctly fails with -ENOSPC.
>
> Reported-by: Vince Weaver <[email protected]>
> Signed-off-by: Oleg Nesterov <[email protected]>
> Cc: <[email protected]>

Looks good, thanks!

Acked-by: Frederic Weisbecker <[email protected]>


> ---
> kernel/events/hw_breakpoint.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
> index ed1c897..29d3abe 100644
> --- a/kernel/events/hw_breakpoint.c
> +++ b/kernel/events/hw_breakpoint.c
> @@ -120,7 +120,7 @@ static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
> list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
> if (iter->hw.bp_target == tsk &&
> find_slot_idx(iter) == type &&
> - cpu == iter->cpu)
> + (iter->cpu < 0 || cpu == iter->cpu))
> count += hw_breakpoint_weight(iter);
> }
>
> --
> 1.5.5.1
>
>

2013-06-13 15:20:33

by Oleg Nesterov

Subject: Re: [PATCH 0/3] hw_breakpoint: cleanups

On 06/13, Frederic Weisbecker wrote:
>
> On Sat, Jun 01, 2013 at 09:45:26PM +0200, Oleg Nesterov wrote:
> > Hello.
> >
> > Cleanups, on top of
> >
> > [PATCH 0/2]: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c
>
> So this series doesn't have the fix for the warning?

I don't understand the question ;)

The previous series

[PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
[PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()

tried to fix the bugs which led to this (correct) warning.

This series and the next one try to clean up the code.

Oleg.

2013-06-13 15:24:18

by Frederic Weisbecker

Subject: Re: [PATCH 0/3] hw_breakpoint: cleanups

2013/6/13 Oleg Nesterov <[email protected]>:
> On 06/13, Frederic Weisbecker wrote:
>>
>> On Sat, Jun 01, 2013 at 09:45:26PM +0200, Oleg Nesterov wrote:
>> > Hello.
>> >
>> > Cleanups, on top of
>> >
>> > [PATCH 0/2]: WARN_ONCE in arch/x86/kernel/hw_breakpoint.c
>>
>> So this series doesn't have the fix for the warning?
>
> I don't understand the question ;)
>
> The previous series
>
> [PATCH 1/2] hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
> [PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()
>
> tried to fix the bugs which led to this (correct) warning.
>
> This series and the next one try to clean up the code.

Ah ok, there are two series (/me confused as usual :)

2013-06-15 12:46:46

by Frederic Weisbecker

Subject: Re: [PATCH 2/2] hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()

On Sat, Jun 01, 2013 at 08:21:39PM +0200, Oleg Nesterov wrote:
> fetch_bp_busy_slots() and toggle_bp_slot() use for_each_online_cpu(),
> which is obviously wrong wrt cpu_up() or cpu_down(): we can over- or
> under-account the per-cpu numbers.
>
> For example:
>
> # echo 0 >> /sys/devices/system/cpu/cpu1/online
> # perf record -e mem:0x10 -p 1 &
> # echo 1 >> /sys/devices/system/cpu/cpu1/online
> # perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10 -C1 -a &
> # taskset -p 0x2 1
>
> triggers the same WARN_ONCE("Can't find any breakpoint slot") in
> arch_install_hw_breakpoint().
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> Cc: <[email protected]>

Acked-by: Frederic Weisbecker <[email protected]>

2013-06-15 12:59:33

by Frederic Weisbecker

Subject: Re: [PATCH 1/3] hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths

On Sat, Jun 01, 2013 at 09:45:48PM +0200, Oleg Nesterov wrote:
> The enable/disable logic in toggle_bp_slot() is not symmetrical
> and imho very confusing. "old_count" in toggle_bp_task_slot() is
> actually new_count because this bp was already removed from the
> list.
>
> Change toggle_bp_slot() to always call list_add/list_del after
> toggle_bp_task_slot(). This way old_idx is task_bp_pinned() and
> this entry should be decremented; new_idx is "old_idx +/- weight"
> and we need to increment this element. The code/logic looks obvious.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Nice :-)

Acked-by: Frederic Weisbecker <[email protected]>

2013-06-15 13:14:40

by Frederic Weisbecker

Subject: Re: [PATCH 2/3] hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths

On Sat, Jun 01, 2013 at 09:46:06PM +0200, Oleg Nesterov wrote:
> Change toggle_bp_slot() to make "weight" negative if !enable. This
> way we can always use "+ weight" without additional "if (enable)"
> check and toggle_bp_task_slot() no longer needs this arg.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Frederic Weisbecker <[email protected]>

2013-06-15 13:29:33

by Frederic Weisbecker

Subject: Re: [PATCH 3/3] hw_breakpoint: Introduce cpumask_of_bp()

On Sat, Jun 01, 2013 at 09:46:26PM +0200, Oleg Nesterov wrote:
> Add the trivial helper which simply returns cpumask_of() or
> cpu_possible_mask depending on bp->cpu.
>
> Change fetch_bp_busy_slots() and toggle_bp_slot() to always do
> for_each_cpu(cpumask_of_bp) to simplify the code and avoid the
> code duplication.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Frederic Weisbecker <[email protected]>

2013-06-18 00:12:21

by Frederic Weisbecker

Subject: Re: [PATCH 1/2] hw_breakpoint: Simplify *register_wide_hw_breakpoint()

On Sun, Jun 02, 2013 at 09:50:11PM +0200, Oleg Nesterov wrote:
> 1. register_wide_hw_breakpoint() can use unregister_ on failure,
> no need to duplicate the code.
>
> 2. "struct perf_event **pevent" adds an unnecessary level of
> indirection and complication; use per_cpu(*cpu_events, cpu).
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Frederic Weisbecker <[email protected]>

2013-06-18 12:37:49

by Frederic Weisbecker

Subject: Re: [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo"

On Sun, Jun 02, 2013 at 09:50:57PM +0200, Oleg Nesterov wrote:
> This patch simply moves all per-cpu variables into the new single
> per-cpu "struct bp_cpuinfo".
>
> To me this looks more logical and clean, but this can also simplify
> the further potential changes. In particular, I do not think this
> memory should be per-cpu, it is never used "locally". After this
> change it is trivial to turn it into, say, bootmem[nr_cpu_ids].
>
> Signed-off-by: Oleg Nesterov <[email protected]>

I'm ok with the patch because it's indeed more logical and clean to pack the info
to a single struct.

But I'm not sure why you think using per-cpu is a problem. It's not only
deemed for optimized local uses, it's also convenient for allocations and
de-allocation, or static definitions. I'm not sure why bootmem would make
more sense.

Other than this in the changelog, the patch is nice, thanks!

Acked-by: Frederic Weisbecker <[email protected]>

2013-06-18 14:47:04

by Oleg Nesterov

Subject: Re: [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo"

On 06/18, Frederic Weisbecker wrote:
>
> On Sun, Jun 02, 2013 at 09:50:57PM +0200, Oleg Nesterov wrote:
> > This patch simply moves all per-cpu variables into the new single
> > per-cpu "struct bp_cpuinfo".
> >
> > To me this looks more logical and clean, but this can also simplify
> > the further potential changes. In particular, I do not think this
> > memory should be per-cpu, it is never used "locally". After this
> > change it is trivial to turn it into, say, bootmem[nr_cpu_ids].
> >
> > Signed-off-by: Oleg Nesterov <[email protected]>
>
> I'm ok with the patch because it's indeed more logical and clean to pack the info
> to a single struct.

Great,

> But I'm not sure why you think using per-cpu is a problem. It's not only
> deemed for optimized local uses,

But it is.

Simplest example,

for_each_possible_cpu(cpu)
total_count += per_cpu(per_cpu_count, cpu);

Every per_cpu() likely means a cache miss. Not to mention we need the
additional math to calculate the address of the local counter.

for_each_possible_cpu(cpu)
total_count += bootmem_or_kmalloc_array[cpu];

is much better in this respect.

And note also that per_cpu_count above can share the cacheline with
another "hot" per-cpu variable.

> it's also convenient for allocations and
> de-allocation, or static definitions.

Yes, this is an advantage. But afaics the only one.

> I'm not sure why bootmem would make
> more sense.

Or kcalloc(nr_cpu_ids), I didn't really mean that alloc_bootmem() is
necessarily the best option.

> Other than this in the changelog, the patch is nice, thanks!
>
> Acked-by: Frederic Weisbecker <[email protected]>

Thanks ;)

Oleg.

2013-06-18 17:01:54

by Frederic Weisbecker

Subject: Re: [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo"

On Tue, Jun 18, 2013 at 04:42:25PM +0200, Oleg Nesterov wrote:
> On 06/18, Frederic Weisbecker wrote:
> >
> > On Sun, Jun 02, 2013 at 09:50:57PM +0200, Oleg Nesterov wrote:
> > > This patch simply moves all per-cpu variables into the new single
> > > per-cpu "struct bp_cpuinfo".
> > >
> > > To me this looks more logical and clean, but this can also simplify
> > > the further potential changes. In particular, I do not think this
> > > memory should be per-cpu, it is never used "locally". After this
> > > change it is trivial to turn it into, say, bootmem[nr_cpu_ids].
> > >
> > > Signed-off-by: Oleg Nesterov <[email protected]>
> >
> > I'm ok with the patch because it's indeed more logical and clean to pack the info
> > to a single struct.
>
> Great,
>
> > But I'm not sure why you think using per-cpu is a problem. It's not only
> > deemed for optimized local uses,
>
> But it is.
>
> Simplest example,
>
> for_each_possible_cpu(cpu)
> total_count += per_cpu(per_cpu_count, cpu);
>
> Every per_cpu() likely means a cache miss. Not to mention we need the
> additional math to calculate the address of the local counter.
>
> for_each_possible_cpu(cpu)
> total_count += bootmem_or_kmalloc_array[cpu];
>
> is much better in this respect.
>
> And note also that per_cpu_count above can share the cacheline with
> another "hot" per-cpu variable.

Ah I see, that's good to know.

But these variables are supposed to only be touched from slow path
(perf events syscall, ptrace breakpoints creation, etc...), right?
So this is probably not a problem?

>
> > it's also convenient for allocations and
> > de-allocation, or static definitions.
>
> Yes, this is an advantage. But afaics the only one.
>
> > I'm not sure why bootmem would make
> > more sense.
>
> Or kcalloc(nr_cpu_ids), I didn't really mean that alloc_bootmem() is
> necessarily the best option.

Ok.

Well, if there is any real performance issue I don't mind using arrays,
of course.

>
> > Other than this in the changelog, the patch is nice, thanks!
> >
> > Acked-by: Frederic Weisbecker <[email protected]>
>
> Thanks ;)
>
> Oleg.
>

2013-06-19 15:58:42

by Oleg Nesterov

Subject: Re: [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo"

On 06/18, Frederic Weisbecker wrote:
>
> On Tue, Jun 18, 2013 at 04:42:25PM +0200, Oleg Nesterov wrote:
> >
> > Simplest example,
> >
> > for_each_possible_cpu(cpu)
> > total_count += per_cpu(per_cpu_count, cpu);
> >
> > Every per_cpu() likely means a cache miss. Not to mention we need the
> > additional math to calculate the address of the local counter.
> >
> > for_each_possible_cpu(cpu)
> > total_count += bootmem_or_kmalloc_array[cpu];
> >
> > is much better in this respect.
> >
> > And note also that per_cpu_count above can share the cacheline with
> > another "hot" per-cpu variable.
>
> Ah I see, that's good to know.
>
> But these variables are supposed to only be touched from slow path
> (perf events syscall, ptrace breakpoints creation, etc...), right?
> So this is probably not a problem?

Yes, sure. But please note that this can also penalize other CPUs.
For example, toggle_bp_slot() writes to per_cpu(nr_cpu_bp_pinned),
and this invalidates the cacheline, which can contain another per-cpu
variable.

But let me clarify. I agree, this all is minor; I am not trying to
say this change can actually improve the performance.

The main point of this patch is to make the code look a bit better,
and you seem to agree. The changelog mentions s/percpu/array/ only
as a potential change which obviously needs more discussion; I didn't
mean that we should necessarily do this.

Although yes, personally I really dislike per-cpu in this case, but
of course this is subjective and I won't argue ;)

Oleg.