2013-10-04 21:39:55

by Andi Kleen

[permalink] [raw]
Subject: Allow disabling perf on x86

This patchkit allows to disable perf events on x86. Requires
various changes to avoid dependencies.

-Andi


2013-10-04 21:39:59

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 2/6] perf, x86: Make perf amd ibs code depend on PERF_EVENTS and SUP_AMD

From: Andi Kleen <[email protected]>

Now that oprofile doesn't need the AMD IBS perf code anymore, we can
make that code dependent on PERF_EVENTS and AMD CPU support,
instead of unconditionally compiling it in.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/Makefile | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 47b56a7..a735914 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -37,13 +37,16 @@ endif
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_p6.o perf_event_knc.o perf_event_p4.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_uncore.o
+ifdef CONFIG_X86_LOCAL_APIC
+obj-$(CONFIG_CPU_SUP_AMD) += perf_event_amd_ibs.o
+endif
endif


obj-$(CONFIG_X86_MCE) += mcheck/
obj-$(CONFIG_MTRR) += mtrr/

-obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o perf_event_amd_ibs.o
+obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o

obj-$(CONFIG_HYPERVISOR_GUEST) += vmware.o hypervisor.o mshyperv.o

--
1.8.3.1

2013-10-04 21:39:58

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 3/6] x86, ptrace: Ifdef HW_BREAKPOINTS code in ptrace

From: Andi Kleen <[email protected]>

Add ifdefs on CONFIG_HW_BREAKPOINTS to the hardware
break point code in x86 ptrace.c. This prepares
this file for being able to disable HW_BREAKPOINTS.

The only debug register that needs to be handled
without break points is DR6, so do that in a simple
separate function without breakpoints.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/ptrace.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 7461f50..dda433f 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -561,6 +561,8 @@ static int genregs_set(struct task_struct *target,
return ret;
}

+#ifdef CONFIG_HW_BREAKPOINTS
+
static void ptrace_triggered(struct perf_event *bp,
struct perf_sample_data *data,
struct pt_regs *regs)
@@ -778,6 +780,34 @@ static int ptrace_set_debugreg(struct task_struct *tsk, int n,
return rc;
}

+#else
+
+/* Without breakpoints only handle DR6 */
+static unsigned long ptrace_get_debugreg(struct task_struct *tsk, int n)
+{
+ struct thread_struct *thread = &tsk->thread;
+
+ if (n == 6)
+ return thread->debugreg6;
+ return 0;
+}
+
+/* Without breakpoints only handle DR6 */
+static int ptrace_set_debugreg(struct task_struct *tsk, int n,
+ unsigned long val)
+{
+ struct thread_struct *thread = &tsk->thread;
+ int rc = -EIO;
+
+ if (n == 6) {
+ thread->debugreg6 = val;
+ rc = 0;
+ }
+ return rc;
+}
+
+#endif
+
/*
* These access the current or another (stopped) task's io permission
* bitmap for debugging or core dump.
--
1.8.3.1

2013-10-04 21:39:56

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

From: Andi Kleen <[email protected]>

As suggested by Ingo.

Make HW_BREAKPOINTS a config option. HW_BREAKPOINTS depends
on perf. This allows disabling PERF_EVENTS for systems that
don't need it (e.g. anything not used for development)

Disabling PERF_EVENTS saves over 700k of kernel text (~5% of my config)
and significant data/bss:

text data bss dec hex filename
13692640 1922416 1478656 17093712 104d450 obj/vmlinux
12980092 1787544 1470464 16238100 f7c614 obj-noperf/vmlinux

I didn't make it depend on CONFIG_EXPERT for now, as the system
should be very usable even without it. To actually disable perf
a couple of options depending on it need to be disabled, including
KVM and HW_BREAKPOINTS.

Longer term it would be probably nice to have modular perf.

Cc: [email protected]
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/Kconfig | 11 ++++++++---
arch/x86/kernel/Makefile | 3 ++-
arch/x86/kvm/Kconfig | 1 +
3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c9d2b81..a8418ed 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -67,9 +67,6 @@ config X86
select HAVE_KERNEL_XZ
select HAVE_KERNEL_LZO
select HAVE_KERNEL_LZ4
- select HAVE_HW_BREAKPOINT
- select HAVE_MIXED_BREAKPOINTS_REGS
- select PERF_EVENTS
select HAVE_PERF_EVENTS_NMI
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
@@ -602,6 +599,14 @@ config SCHED_OMIT_FRAME_POINTER

If in doubt, say "Y".

+config HW_BREAKPOINTS
+ bool "Enable hardware breakpoints support"
+ select HAVE_HW_BREAKPOINT
+ select HAVE_MIXED_BREAKPOINTS_REGS
+ ---help---
+ Enable support for x86 hardware breakpoints for debuggers
+ and perf. This will implicitly enable perf-events.
+
menuconfig HYPERVISOR_GUEST
bool "Linux guest support"
---help---
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a5408b9..1b09567 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -31,7 +31,7 @@ obj-$(CONFIG_X86_64) += vsyscall_64.o
obj-$(CONFIG_X86_64) += vsyscall_emu_64.o
obj-y += bootflag.o e820.o
obj-y += pci-dma.o quirks.o topology.o kdebugfs.o
-obj-y += alternative.o i8253.o pci-nommu.o hw_breakpoint.o
+obj-y += alternative.o i8253.o pci-nommu.o
obj-y += tsc.o io_delay.o rtc.o
obj-y += pci-iommu_table.o
obj-y += resource.o
@@ -73,6 +73,7 @@ obj-$(CONFIG_DOUBLEFAULT) += doublefault.o
obj-$(CONFIG_KGDB) += kgdb.o
obj-$(CONFIG_VM86) += vm86_32.o
obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
+obj-$(CONFIG_HW_BREAKPOINTS) += hw_breakpoint.o

obj-$(CONFIG_HPET_TIMER) += hpet.o
obj-$(CONFIG_APB_TIMER) += apb_timer.o
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index a47a3e5..de4190a 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM
select TASKSTATS
select TASK_DELAY_ACCT
select PERF_EVENTS
+ select HW_BREAKPOINTS
select HAVE_KVM_MSI
select HAVE_KVM_CPU_RELAX_INTERCEPT
---help---
--
1.8.3.1

2013-10-04 21:40:53

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 5/6] x86, kgdb: Support compiling without hardware break points

From: Andi Kleen <[email protected]>

Add some ifdefs to support compiling without CONFIG_HW_BREAKPOINTS.
HW_BREAKPOINTS pulls in all of perf, and presumably there are
some use cases where people want to debug without perf.

The standard software breakpoints still work of course.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/Makefile | 2 +-
arch/x86/kernel/kgdb.c | 12 ++++++++++++
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index a735914..e125307 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -46,7 +46,7 @@ endif
obj-$(CONFIG_X86_MCE) += mcheck/
obj-$(CONFIG_MTRR) += mtrr/

-obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o
+obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o

obj-$(CONFIG_HYPERVISOR_GUEST) += vmware.o hypervisor.o mshyperv.o

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 836f832..f3efeb9 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -193,6 +193,8 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
gdb_regs[GDB_SP] = p->thread.sp;
}

+#ifdef CONFIG_HW_BREAKPOINTS
+
static struct hw_breakpoint {
unsigned enabled;
unsigned long addr;
@@ -419,6 +421,8 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
}
}

+#endif
+
#ifdef CONFIG_SMP
/**
* kgdb_roundup_cpus - Get other CPUs into a holding pattern
@@ -639,6 +643,8 @@ out:
return retval;
}

+#ifdef CONFIG_HW_BREAKPOINTS
+
static void kgdb_hw_overflow_handler(struct perf_event *event,
struct perf_sample_data *data, struct pt_regs *regs)
{
@@ -689,6 +695,8 @@ void kgdb_arch_late(void)
}
}

+#endif
+
/**
* kgdb_arch_exit - Perform any architecture specific uninitalization.
*
@@ -697,6 +705,7 @@ void kgdb_arch_late(void)
*/
void kgdb_arch_exit(void)
{
+#ifdef CONFIG_HW_BREAKPOINTS
int i;
for (i = 0; i < 4; i++) {
if (breakinfo[i].pev) {
@@ -704,6 +713,7 @@ void kgdb_arch_exit(void)
breakinfo[i].pev = NULL;
}
}
+#endif
unregister_nmi_handler(NMI_UNKNOWN, "kgdb");
unregister_nmi_handler(NMI_LOCAL, "kgdb");
unregister_die_notifier(&kgdb_notifier);
@@ -807,9 +817,11 @@ struct kgdb_arch arch_kgdb_ops = {
/* Breakpoint instruction: */
.gdb_bpt_instr = { 0xcc },
.flags = KGDB_HW_BREAKPOINT,
+#ifdef CONFIG_HW_BREAKPOINTS
.set_hw_breakpoint = kgdb_set_hw_break,
.remove_hw_breakpoint = kgdb_remove_hw_break,
.disable_hw_break = kgdb_disable_hw_debug,
.remove_all_hw_break = kgdb_remove_all_hw_break,
.correct_hw_break = kgdb_correct_hw_break,
+#endif
};
--
1.8.3.1

2013-10-04 21:41:09

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 4/6] trace: Make UPROBES depend on PERF_EVENTS

From: Andi Kleen <[email protected]>

UPROBES need the perf events code, so add a dependency
from PERF_EVENTS to UPROBES.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
kernel/trace/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 015f85a..8639819 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -424,6 +424,7 @@ config UPROBE_EVENT
bool "Enable uprobes-based dynamic events"
depends on ARCH_SUPPORTS_UPROBES
depends on MMU
+ depends on PERF_EVENTS
select UPROBES
select PROBE_EVENTS
select TRACING
--
1.8.3.1

2013-10-04 21:41:33

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 1/6] perf, x86, amd: Move __get_ibs_caps into common amd CPU file

From: Andi Kleen <[email protected]>

__get_ibs_caps is used by both oprofile and perf events.
This causes oprofile to be dependent on perf.

__get_ibs_caps is actually only a few lines, so we can
easily move that to the standard AMD CPU initialization
file.

It will be always compiled in this way, but it's far
better than requiring perf always to be compiled in for this.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/perf_event.h | 2 ++
arch/x86/kernel/cpu/amd.c | 27 +++++++++++++++++++++++++++
arch/x86/kernel/cpu/perf_event_amd_ibs.c | 23 -----------------------
arch/x86/oprofile/op_model_amd.c | 4 +++-
4 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8249df4..866b0f3 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -198,6 +198,8 @@ struct x86_pmu_capability {
#define IBS_OP_MAX_CNT_EXT 0x007FFFFFULL /* not a register bit mask */
#define IBS_RIP_INVALID (1ULL<<38)

+extern u32 __get_ibs_caps(void);
+
#ifdef CONFIG_X86_LOCAL_APIC
extern u32 get_ibs_caps(void);
#else
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 903a264..887ad1e 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -16,6 +16,8 @@
# include <asm/cacheflush.h>
#endif

+#include <asm/perf_event.h>
+
#include "cpu.h"

static inline int rdmsrl_amd_safe(unsigned msr, unsigned long long *p)
@@ -909,3 +911,28 @@ static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)

return false;
}
+
+
+/* IBS - apic initialization, for perf and oprofile */
+
+u32 __get_ibs_caps(void)
+{
+ u32 caps;
+ unsigned int max_level;
+
+ if (!boot_cpu_has(X86_FEATURE_IBS))
+ return 0;
+
+ /* check IBS cpuid feature flags */
+ max_level = cpuid_eax(0x80000000);
+ if (max_level < IBS_CPUID_FEATURES)
+ return IBS_CAPS_DEFAULT;
+
+ caps = cpuid_eax(IBS_CPUID_FEATURES);
+ if (!(caps & IBS_CAPS_AVAIL))
+ /* cpuid flags not valid */
+ return IBS_CAPS_DEFAULT;
+
+ return caps;
+}
+EXPORT_SYMBOL(__get_ibs_caps);
diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index e09f0bf..c4fed71 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -664,29 +664,6 @@ static __init int perf_event_ibs_init(void) { return 0; }

#endif

-/* IBS - apic initialization, for perf and oprofile */
-
-static __init u32 __get_ibs_caps(void)
-{
- u32 caps;
- unsigned int max_level;
-
- if (!boot_cpu_has(X86_FEATURE_IBS))
- return 0;
-
- /* check IBS cpuid feature flags */
- max_level = cpuid_eax(0x80000000);
- if (max_level < IBS_CPUID_FEATURES)
- return IBS_CAPS_DEFAULT;
-
- caps = cpuid_eax(IBS_CPUID_FEATURES);
- if (!(caps & IBS_CAPS_AVAIL))
- /* cpuid flags not valid */
- return IBS_CAPS_DEFAULT;
-
- return caps;
-}
-
u32 get_ibs_caps(void)
{
return ibs_caps;
diff --git a/arch/x86/oprofile/op_model_amd.c b/arch/x86/oprofile/op_model_amd.c
index 50d86c0..964171f 100644
--- a/arch/x86/oprofile/op_model_amd.c
+++ b/arch/x86/oprofile/op_model_amd.c
@@ -26,6 +26,8 @@
#include <asm/processor.h>
#include <asm/cpufeature.h>

+#include <asm/perf_event.h> /* for __get_ibs_caps */
+
#include "op_x86_model.h"
#include "op_counter.h"

@@ -446,7 +448,7 @@ static void op_amd_stop(struct op_msrs const * const msrs)

static void init_ibs(void)
{
- ibs_caps = get_ibs_caps();
+ ibs_caps = __get_ibs_caps();

if (!ibs_caps)
return;
--
1.8.3.1

2013-10-04 22:27:45

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On Fri, Oct 04, 2013 at 02:39:48PM -0700, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> As suggested by Ingo.
>
> Make HW_BREAKPOINTS a config option. HW_BREAKPOINTS depends
> on perf. This allows disabling PERF_EVENTS for systems that
> don't need it (e.g. anything not used for development)
>
> Disabling PERF_EVENTS saves over 700k of kernel text (~5% of my config)
> and significant data/bss:
>
> text data bss dec hex filename
> 13692640 1922416 1478656 17093712 104d450 obj/vmlinux
> 12980092 1787544 1470464 16238100 f7c614 obj-noperf/vmlinux
>
> I didn't make it depend on CONFIG_EXPERT for now, as the system
> should be very usable even without it. To actually disable perf
> a couple of options depending on it need to be disabled, including
> KVM and HW_BREAKPOINTS.
>
> Longer term it would be probably nice to have modular perf.

I'm personally not against the idea of allowing hw breakpoint support
to be disabled but hpa did not like much the idea. I must say he had
valid concerns though, see the thread of this patchset: https://lkml.org/lkml/2011/7/14/163

IMHO, a better solution would be to make the hw breakpoint subsystem work
without using perf.

I mean, perf should use the hw breakpoint, but hw breakpoint shouldn't use perf,
(in fact that's what we agreed on with hpa IIRC).
But right now there is a kind of circular dependency between both that makes
perf always built-in in x86. I'm all for solving that by making hw breakpoint independant
from perf, but I think Ingo had mixed feelings about doing it that way. And he had valid
concerns on that as well.

So we are a bit stuck :)

Also some comments below:

>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> arch/x86/Kconfig | 11 ++++++++---
> arch/x86/kernel/Makefile | 3 ++-
> arch/x86/kvm/Kconfig | 1 +
> 3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index c9d2b81..a8418ed 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -67,9 +67,6 @@ config X86
> select HAVE_KERNEL_XZ
> select HAVE_KERNEL_LZO
> select HAVE_KERNEL_LZ4
> - select HAVE_HW_BREAKPOINT
> - select HAVE_MIXED_BREAKPOINTS_REGS

I really wish we stick the HAVE_*
Kconfig symbols to the simple role of expressing a
constant arch ability, not a variable user choice...

The user choice itself is CONFIG_HW_BREAKPOINT, the frontend,
which depends on the arch ability that is CONFIG_HAVE_HW_BREAKPOINT,
the backend.

ie, that's what I did in this patchset
https://lkml.org/lkml/2011/7/14/163

Thanks.


> - select PERF_EVENTS
> select HAVE_PERF_EVENTS_NMI
> select HAVE_PERF_REGS
> select HAVE_PERF_USER_STACK_DUMP
> @@ -602,6 +599,14 @@ config SCHED_OMIT_FRAME_POINTER
>
> If in doubt, say "Y".
>
> +config HW_BREAKPOINTS
> + bool "Enable hardware breakpoints support"
> + select HAVE_HW_BREAKPOINT
> + select HAVE_MIXED_BREAKPOINTS_REGS
> + ---help---
> + Enable support for x86 hardware breakpoints for debuggers
> + and perf. This will implicitly enable perf-events.
> +
> menuconfig HYPERVISOR_GUEST
> bool "Linux guest support"
> ---help---
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index a5408b9..1b09567 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -31,7 +31,7 @@ obj-$(CONFIG_X86_64) += vsyscall_64.o
> obj-$(CONFIG_X86_64) += vsyscall_emu_64.o
> obj-y += bootflag.o e820.o
> obj-y += pci-dma.o quirks.o topology.o kdebugfs.o
> -obj-y += alternative.o i8253.o pci-nommu.o hw_breakpoint.o
> +obj-y += alternative.o i8253.o pci-nommu.o
> obj-y += tsc.o io_delay.o rtc.o
> obj-y += pci-iommu_table.o
> obj-y += resource.o
> @@ -73,6 +73,7 @@ obj-$(CONFIG_DOUBLEFAULT) += doublefault.o
> obj-$(CONFIG_KGDB) += kgdb.o
> obj-$(CONFIG_VM86) += vm86_32.o
> obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
> +obj-$(CONFIG_HW_BREAKPOINTS) += hw_breakpoint.o
>
> obj-$(CONFIG_HPET_TIMER) += hpet.o
> obj-$(CONFIG_APB_TIMER) += apb_timer.o
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index a47a3e5..de4190a 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -36,6 +36,7 @@ config KVM
> select TASKSTATS
> select TASK_DELAY_ACCT
> select PERF_EVENTS
> + select HW_BREAKPOINTS
> select HAVE_KVM_MSI
> select HAVE_KVM_CPU_RELAX_INTERCEPT
> ---help---
> --
> 1.8.3.1
>

2013-10-05 00:52:58

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 4/6] trace: Make UPROBES depend on PERF_EVENTS

On Fri, 4 Oct 2013 14:39:46 -0700
Andi Kleen <[email protected]> wrote:

> From: Andi Kleen <[email protected]>
>
> UPROBES need the perf events code, so add a dependency
> from PERF_EVENTS to UPROBES.

Can you please be a bit more specific on what uprobes requires of perf?
That is, what part of the perf events code is needed, and why?

-- Steve

>
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> kernel/trace/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 015f85a..8639819 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -424,6 +424,7 @@ config UPROBE_EVENT
> bool "Enable uprobes-based dynamic events"
> depends on ARCH_SUPPORTS_UPROBES
> depends on MMU
> + depends on PERF_EVENTS
> select UPROBES
> select PROBE_EVENTS
> select TRACING

2013-10-05 03:25:57

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 4/6] trace: Make UPROBES depend on PERF_EVENTS

On Fri, Oct 04, 2013 at 08:52:53PM -0400, Steven Rostedt wrote:
> On Fri, 4 Oct 2013 14:39:46 -0700
> Andi Kleen <[email protected]> wrote:
>
> > From: Andi Kleen <[email protected]>
> >
> > UPROBES need the perf events code, so add a dependency
> > from PERF_EVENTS to UPROBES.
>
> Can you please be a bit more specific on what uprobes requires of perf?
> That is, what part of the perf events code is needed, and why?

Without this dependency, when UPROBE_EVENT is enabled and PERF_EVENTS disabled
I first get

warning: (UPROBE_EVENT) selects UPROBES which has unmet direct dependencies (UPROBE_EVENT && PERF_EVENTS)
warning: (UPROBE_EVENT) selects UPROBES which has unmet direct dependencies (UPROBE_EVENT && PERF_EVENTS)

and then later lots of errors like

arch/x86/built-in.o: In function `do_notify_resume':
/home/ak/lsrc/git/linux-2.6/arch/x86/kernel/signal.c:743: undefined reference to `uprobe_notify_resume'
arch/x86/built-in.o: In function `arch_uprobe_exception_notify':
/home/ak/lsrc/git/linux-2.6/arch/x86/kernel/uprobes.c:637: undefined reference to `uprobe_pre_sstep_notifier'
/home/ak/lsrc/git/linux-2.6/arch/x86/kernel/uprobes.c:643: undefined reference to `uprobe_post_sstep_notifier'
kernel/built-in.o: In function `mmput':
/home/ak/lsrc/git/linux-2.6/kernel/fork.c:608: undefined reference to `uprobe_clear_state'

I'm not fully sure why UPROBES has the && PERF_EVENTS; I didn't add that.

-Andi

2013-10-05 07:08:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS


* Andi Kleen <[email protected]> wrote:

> From: Andi Kleen <[email protected]>
>
> As suggested by Ingo.

No, you haven't read my suggestion carefully enough so NAK.

Thanks,

Ingo

2013-10-05 17:05:56

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On Sat, Oct 05, 2013 at 09:08:06AM +0200, Ingo Molnar wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
> > From: Andi Kleen <[email protected]>
> >
> > As suggested by Ingo.
>
> No, you haven't read my suggestion carefully enough so NAK.

Ok I trust you will do a better solution then to save the 700+k text.
Ball is in your court.

-Andi

2013-10-06 16:49:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS


* Andi Kleen <[email protected]> wrote:

> From: Andi Kleen <[email protected]>
>
> As suggested by Ingo.
>
> Make HW_BREAKPOINTS a config option. HW_BREAKPOINTS depends
> on perf. This allows disabling PERF_EVENTS for systems that
> don't need it (e.g. anything not used for development)
>
> Disabling PERF_EVENTS saves over 700k of kernel text (~5% of my config)
> and significant data/bss:
>
> text data bss dec hex filename
> 13692640 1922416 1478656 17093712 104d450 obj/vmlinux
> 12980092 1787544 1470464 16238100 f7c614 obj-noperf/vmlinux

Btw., could you send me your .config so I can investigate this?

Thanks,

Ingo

2013-10-08 06:59:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS


* Ingo Molnar <[email protected]> wrote:

> Andi Kleen <[email protected]> wrote:
>
> > text data bss dec hex filename
> > 13692640 1922416 1478656 17093712 104d450 obj/vmlinux
> > 12980092 1787544 1470464 16238100 f7c614 obj-noperf/vmlinux

[...]

> > Ok I trust you will do a better solution then to save the 700+k text.
> > Ball is in your court.
>
> Btw., could you send me your .config so I can investigate this?

Ok, you sent me your .config off-list - I won't re-post the full config,
but it can be reproduced by others fairly accurately by appending the
attached list of config symbols to a 'make defconfig' x86-64 .config and
running 'make oldconfig'.

So I test-built a config close to your config with both tracing and perf
on and off (note, I had OPROFILE and KVM in a module), and got the
following kernel sizes:

text data bss dec hex filename
====================================================================================
12081246 1279656 1003520 14364422 db2f06 vmlinux.perf-OFF.tracing-OFF
12979238 1785192 1150976 15915406 f2d98e vmlinux.perf-OFF.tracing-ON
12232758 1325424 1003520 14561702 de31a6 vmlinux.perf-ON.tracing-OFF
13223817 1877872 1150976 16252665 f7fef9 vmlinux.perf-ON.tracing-ON

Firstly, your 'perf increases vmlinux size by 700k' result does not seem
to reproduce, at all. [*]

[ Are you sure you made the measurement correctly? Please send me _both_
configs you used for the test, one with perf enabled, and the other one
with perf disabled, so I can double check it. I suppose you tested a
recent Linus kernel with your hardware-breakpoint patches applied. ]

Secondly and more importantly, visualized with relative sizes, in a
feature matrix, makes it clearer what's going on with vmlinux .text:

perf-OFF perf-ON

ftrace-OFF 0 +151k

ftrace-ON +897k +1142k

So basically ftrace is causing a big chunk of the instrumentation size
increase. With tons of tracers and lots of kernel subsystems built into
your .config that's a lot of nice instrumentation functionality and it's
thus also a natural end result IMO.

The base cost of perf is +151k - not small but a far cry from your claimed
+700k ...

In any case, on one hand I obviously welcome genuine kernel size
reductions, on the other hand what you did in these patches wasn't really
size reduction usable to too many people: all distros enable perf and
tracing and very few kernel developers are left who are still using
oprofile. You fought the symptom, not the reason.

So my NAK against your series stands. To sum up the technical reasons:

- you blaming +700k on perf alone is wrong, or at most a red herring

- your patches might break apps/ABI

- your patch-set unnecessarily complicates things, making the kernel
less maintainable

- details of your patch-set are broken as well, see the discussions

( There might be more reasons, this is just off the top of my head.
Any of these reasons would be enough to not apply your patches.)

You might want to concentrate your efforts from fighting perf
functionality towards decreasing per tracepoint overhead instead,
without hurting kernel functionality and maintainability.

Thanks,

Ingo

=================>
CONFIG_LOCALVERSION_AUTO=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_EXPERT=y
CONFIG_KALLSYMS_ALL=y
CONFIG_COMPAT_BRK=y
CONFIG_OPROFILE=y
CONFIG_OPROFILE_EVENT_MULTIPLEX=y
CONFIG_JUMP_LABEL=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_X86_MCE_INJECT=y
CONFIG_NUMA_EMU=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_MMU_NOTIFIER=y
CONFIG_MEMORY_FAILURE=y
CONFIG_HWPOISON_INJECT=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
CONFIG_EFI_STUB=y
CONFIG_HZ_250=y
CONFIG_COMPAT_VDSO=y
CONFIG_PM_RUNTIME=y
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_X86_POWERNOW_K8=y
CONFIG_INTEL_IDLE=y
CONFIG_PCIEAER_INJECT=y
CONFIG_PCIE_PME=y
CONFIG_PCI_IOAPIC=y
CONFIG_IA32_AOUT=y
CONFIG_COMPAT_NETLINK_MESSAGES=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_INET_UDP_DIAG=y
CONFIG_BPF_JIT=y
CONFIG_WEXT_CORE=y
CONFIG_WEXT_PROC=y
CONFIG_CFG80211_WEXT=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_BLK_DEV_CRYPTOLOOP=y
CONFIG_BLK_DEV_RAM=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_LOWLEVEL=y
CONFIG_MEGARAID_SAS=y
CONFIG_SATA_NV=y
CONFIG_SATA_SIL=y
CONFIG_SATA_SVW=y
CONFIG_PATA_ATIIXP=y
CONFIG_PATA_PDC2027X=y
CONFIG_PATA_MPIIX=y
CONFIG_PATA_ACPI=y
CONFIG_ATA_GENERIC=y
CONFIG_DM_CRYPT=y
CONFIG_FIREWIRE=y
CONFIG_FIREWIRE_OHCI=y
CONFIG_TUN=y
CONFIG_MDIO=y
CONFIG_VORTEX=y
CONFIG_B44=y
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_BNX2=y
CONFIG_TULIP=y
CONFIG_E1000E=y
CONFIG_IGB=y
CONFIG_IGB_HWMON=y
CONFIG_IXGB=y
CONFIG_IXGBE=y
CONFIG_IXGBE_HWMON=y
CONFIG_R8169=y
CONFIG_BROADCOM_PHY=y
CONFIG_USB_USBNET=y
CONFIG_USB_NET_AX8817X=y
CONFIG_USB_NET_AX88179_178A=y
CONFIG_USB_NET_CDC_NCM=y
CONFIG_USB_NET_MCS7830=y
CONFIG_USB_ALI_M5632=y
CONFIG_USB_AN2720=y
CONFIG_USB_BELKIN=y
CONFIG_USB_ARMLINUX=y
CONFIG_USB_EPSON2888=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_LEGACY_PTYS=y
CONFIG_SERIAL_KGDB_NMI=y
CONFIG_CONSOLE_POLL=y
CONFIG_HW_RANDOM_INTEL=y
CONFIG_HW_RANDOM_AMD=y
CONFIG_RAW_DRIVER=y
CONFIG_HPET_MMAP=y
CONFIG_SSB=y
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y
CONFIG_DRM_TTM=y
CONFIG_DRM_RADEON=y
CONFIG_FB_DDC=y
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_RADEON=y
CONFIG_FB_RADEON_I2C=y
CONFIG_FB_RADEON_BACKLIGHT=y
CONFIG_SOUND_PRIME=y
CONFIG_HID_BATTERY_STRENGTH=y
CONFIG_HID_DRAGONRISE=y
CONFIG_HID_KYE=y
CONFIG_HID_TWINHAN=y
CONFIG_HID_ORTEK=y
CONFIG_HID_GREENASIA=y
CONFIG_HID_SMARTJOYPLUS=y
CONFIG_HID_THRUSTMASTER=y
CONFIG_HID_ZEROPLUS=y
CONFIG_USB_STORAGE_REALTEK=y
CONFIG_REALTEK_AUTOPM=y
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_USBAT=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_USB_STORAGE_ALAUDA=y
CONFIG_USB_STORAGE_ONETOUCH=y
CONFIG_USB_STORAGE_KARMA=y
CONFIG_USB_STORAGE_CYPRESS_ATACB=y
CONFIG_USB_STORAGE_ENE_UB6250=y
CONFIG_EFI_VARS_PSTORE=y
CONFIG_REISERFS_FS=y
CONFIG_REISERFS_FS_XATTR=y
CONFIG_REISERFS_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y
CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_FTRACE=y
CONFIG_PSTORE_RAM=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_CIFS=y
CONFIG_CIFS_DEBUG=y
CONFIG_NLS_ISO8859_15=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=y
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_READABLE_ASM=y
CONFIG_UNUSED_SYMBOLS=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_SCHED_DEBUG=y
CONFIG_LATENCYTOP=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_SCHED_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FIREWIRE_OHCI_REMOTE_DMA=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_KDB=y
CONFIG_KDB_KEYBOARD=y
CONFIG_X86_PTDUMP=y
CONFIG_IO_DELAY_0XED=y
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_CCM=y
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=y
CONFIG_CRYPTO_PCBC=y
CONFIG_CRYPTO_CMAC=y
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_MD4=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_AES_X86_64=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM=y
CONFIG_KVM_INTEL=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=y
CONFIG_LIBCRC32C=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_REED_SOLOMON=y
CONFIG_REED_SOLOMON_ENC8=y
CONFIG_REED_SOLOMON_DEC8=y

2013-10-08 15:35:26

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On Tue, Oct 08, 2013 at 08:59:38AM +0200, Ingo Molnar wrote:
>
> Secondly and more importantly, visualized with relative sizes, in a
> feature matrix, makes it clearer what's going on with vmlinux .text:
>
> perf-OFF perf-ON
>
> ftrace-OFF 0 +151k
>
> ftrace-ON +897k +1142k
>
> So basically ftrace is causing a big chunk of the instrumentation size
> increase. With tons of tracers and lots of kernel subsystems built into
> your .config that's a lot of nice instrumentation functionality and it's
> thus also a natural end result IMO.

I'm curious what you turned on for between the "ftrace-OFF" and "ftrace-ON"?

Technically, ftrace is just the infrastructure for the /debug/tracing directory
and the function tracer. And there's lots of options to turn on or off:

CONFIG_TRACER_MAX_TRACE=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_SCHED_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_FTRACE_MCOUNT_RECORD=y

The one thing that you didn't post was CONFIG_EVENT_TRACING. This is where
a lot of bloat can come from, as this enables the infrastructure for
tracepoints. This is also needed by perf to trace most sw events.

The trace event infrastructure that both perf and ftrace uses (as well
as other tracers and tools) is well known to cause bloat in the kernel.
There's patches to help make it more bearable, but there's more work to
be done.

Currently, ftrace selects events, but that was only because you suggested
to do so as events are so valuable in tracing.

-- Steve

2013-10-08 15:40:30

by Vince Weaver

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On Tue, 8 Oct 2013, Ingo Molnar wrote:

> The base cost of perf is +151k - not small but a far cry from your claimed
> +700k ...

Things might get that bad if we enable full event lists in the kernel for
all of x86, as Power is starting to do.

The patch to add all Power7 events cfe0d8ba14a1d98245b371e486c68f37eba1ca52
increased kernel size by like 50k. Imagine 10 different x86 event tables
times 50k each and you're talking a big chunk of memory.

Vince

2013-10-08 15:52:55

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On Tue, Oct 08, 2013 at 08:59:38AM +0200, Ingo Molnar wrote:

> You might want to concentrate your efforts from fighting perf
> functionality towards decreasing per tracepoint overhead instead,
> without hurting kernel functionality and maintainability.

Making it easier to disable perf entirely would be desirable for one use case.
I can't do a trinity run for more than a few hours for the last few months
without hitting perf/ftrace bugs that no-one seems to be able to get their
heads around.

It would be useful if I could in the meantime find bugs in other parts of the
kernel by disabling all the affected code.

Dave

2013-10-08 16:22:35

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On 10/8/13 9:51 AM, Dave Jones wrote:
> On Tue, Oct 08, 2013 at 08:59:38AM +0200, Ingo Molnar wrote:
>
> > You might want to concentrate your efforts from fighting perf
> > functionality towards decreasing per tracepoint overhead instead,
> > without hurting kernel functionality and maintainability.
>
> Making it easier to disable perf entirely would be desirable for one use case.
> I can't do a trinity run for more than a few hours for the last few months
> without hitting perf/ftrace bugs that no-one seems to be able to get their
> heads around.

Looks like trinity has an exclude syscall option. Seems like that option
can be used to avoid perf_event_open (haven't tried though).

David

2013-10-08 16:26:01

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On Tue, Oct 08, 2013 at 10:22:23AM -0600, David Ahern wrote:
> On 10/8/13 9:51 AM, Dave Jones wrote:
> > On Tue, Oct 08, 2013 at 08:59:38AM +0200, Ingo Molnar wrote:
> >
> > > You might want to concentrate your efforts from fighting perf
> > > functionality towards decreasing per tracepoint overhead instead,
> > > without hurting kernel functionality and maintainability.
> >
> > Making it easier to disable perf entirely would be desirable for one use case.
> > I can't do a trinity run for more than a few hours for the last few months
> > without hitting perf/ftrace bugs that no-one seems to be able to get their
> > heads around.
>
> Looks like trinity has an exclude syscall option. Seems like that option
> can be used to avoid perf_event_open (haven't tried though).

You'd think that, but for whatever reason, ftrace/perf oopses still happen.

Dave

2013-10-08 19:36:21

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS


* Dave Jones <[email protected]> wrote:

> On Tue, Oct 08, 2013 at 10:22:23AM -0600, David Ahern wrote:
> > On 10/8/13 9:51 AM, Dave Jones wrote:
> > > On Tue, Oct 08, 2013 at 08:59:38AM +0200, Ingo Molnar wrote:
> > >
> > > > You might want to concentrate your efforts from fighting perf
> > > > functionality towards decreasing per tracepoint overhead instead,
> > > > without hurting kernel functionality and maintainability.
> > >
> > > Making it easier to disable perf entirely would be desirable for one use case.
> > > I can't do a trinity run for more than a few hours for the last few months
> > > without hitting perf/ftrace bugs that no-one seems to be able to get their
> > > heads around.
> >
> > Looks like trinity has an exclude syscall option. Seems like that option
> > can be used to avoid perf_event_open (haven't tried though).
>
> You'd think that, but for whatever reason, ftrace/perf oopses still happen.

Peter is working on it - but it's slow. Could you try to disable
sys_perf_open, ptrace and the NMI watchdog? No perf functionality should
be used in that case. If you disable CONFIG_FTRACE then no ftrace
functionality should be used.

Thanks,

Ingo

2013-10-08 19:39:35

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS


* Steven Rostedt <[email protected]> wrote:

> On Tue, Oct 08, 2013 at 08:59:38AM +0200, Ingo Molnar wrote:
> >
> > Secondly and more importantly, visualized with relative sizes, in a
> > feature matrix, makes it clearer what's going on with vmlinux .text:
> >
> > perf-OFF perf-ON
> >
> > ftrace-OFF 0 +151k
> >
> > ftrace-ON +897k +1142k
> >
> > So basically ftrace is causing a big chunk of the instrumentation size
> > increase. With tons of tracers and lots of kernel subsystems built into
> > your .config that's a lot of nice instrumentation functionality and it's
> > thus also a natural end result IMO.
>
> I'm curious what you turned on for between the "ftrace-OFF" and "ftrace-ON"?

ON: all the tracing bits in Andi's original config - which had most
tracing options enabled AFAICS.

OFF: all of it turned off.

I don't think there's anything inherently wrong with 3000+ tracepoints
built into the kernel using up a fair amount of cold-path space. We should
try to compress it all, but it's a hell of a lot real functionality.

> The one thing that you didn't post was CONFIG_EVENT_TRACING. This is
> where a lot of bloat can come from, as this enables the infrastructure
> for tracepoints. This is also needed by perf to trace most sw events.

Correct.

EVENT_TRACING was on - Andi did not post his .config publicly, but you can
get something fairly close to his config by doing this:

make defconfig
cat config-options.txt >> .config
make oldconfig

where 'config-options.txt' is the short list of options I appended to my
mail.

> The trace event infrastructure that both perf and ftrace uses (as well
> as other tracers and tools) is well known to cause bloat in the kernel.
> There's patches to help make it more bearable, but there's more work to
> be done.

It's 3000+ very nice tracepoints - which is a huge multiplier, so we
should not expect any miracles - it's a whole lot of good functionality.
Subsystems can decide whether they want to expose their tracepoints or
not.

Thanks,

Ingo

2013-10-08 19:56:09

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On Sat, Oct 05, 2013 at 07:05:52PM +0200, Andi Kleen wrote:
> On Sat, Oct 05, 2013 at 09:08:06AM +0200, Ingo Molnar wrote:
> >
> > * Andi Kleen <[email protected]> wrote:
> >
> > > From: Andi Kleen <[email protected]>
> > >
> > > As suggested by Ingo.
> >
> > No, you haven't read my suggestion carefully enough so NAK.
>
> Ok I trust you will do a better solution then to save the 700+k text.
> Ball is in your court.

Ok I have two choices in mind.

1) make breakpoints independant from perf. The drawback is that we must then
add seperate hooks on context switch for ptrace breakpoints. OTOH we get
rid of the perf -> breakpoint -> perf circular dependency, which is the very
controversial thing.


2) Build breakpoints conditionally (depend on CONFIG_EXPERT). hpa didn't like
much that solution because breakpoints is an elementary debugging feature that
is too useful to be unbuild.

Both ways can allow us to disable perf on x86.

We just need to agree on a solution and I'll happily work on it.

Ingo, others?

2013-10-08 20:05:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On Tue, Oct 08, 2013 at 09:55:59PM +0200, Frederic Weisbecker wrote:
> 1) make breakpoints independant from perf. The drawback is that we must then
> add seperate hooks on context switch for ptrace breakpoints. OTOH we get
> rid of the perf -> breakpoint -> perf circular dependency, which is the very
> controversial thing.

we used to have this in __switch_to_xtra(), right?

2013-10-08 20:20:07

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On Tue, Oct 08, 2013 at 09:36:13PM +0200, Ingo Molnar wrote:
>
> Peter is working on it - but it's slow. Could you try to disable
> sys_perf_open, ptrace and the NMI watchdog? No perf functionality should
> be used in that case. If you disable CONFIG_FTRACE then no ftrace
> functionality should be used.
>

Dave,

The ftrace oops you've been seeing is from the function tracer, you don't
need to disable all of ftrace, just CONFIG_FUNCTION_TRACER (and anything
that selects it: CONFIG_STACK_TRACER).

-- Steve

2013-10-08 20:35:07

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On Tue, Oct 08, 2013 at 10:05:21PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 08, 2013 at 09:55:59PM +0200, Frederic Weisbecker wrote:
> > 1) make breakpoints independant from perf. The drawback is that we must then
> > add seperate hooks on context switch for ptrace breakpoints. OTOH we get
> > rid of the perf -> breakpoint -> perf circular dependency, which is the very
> > controversial thing.
>
> we used to have this in __switch_to_xtra(), right?

Right.

2013-10-08 20:35:35

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

On Tue, Oct 08, 2013 at 10:05:21PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 08, 2013 at 09:55:59PM +0200, Frederic Weisbecker wrote:
> > 1) make breakpoints independant from perf. The drawback is that we must then
> > add seperate hooks on context switch for ptrace breakpoints. OTOH we get
> > rid of the perf -> breakpoint -> perf circular dependency, which is the very
> > controversial thing.
>
> we used to have this in __switch_to_xtra(), right?

Right.

And then it got replaced by perf events scheduling.

2013-10-09 03:21:46

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

> You'd think that, but for whatever reason, ftrace/perf oopses still happen.

Hiding bugs seems like a poor use of the CONFIG option. It would
be better to figure out a way to catch them earlier. Perhaps
trinity needs to run more often? any chance of a fengguang style nightly
service for mainline?

-Andi

--
[email protected] -- Speaking for myself only.

2013-10-09 03:24:41

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

> So I test-built a config close to your config with both tracing and perf
> on and off (note, I had OPROFILE and KVM in a module), and got the
> following kernel sizes:

Yes I mistakenly included KVM (I think that was the difference)
Without KVM it's ~272k text, 96k BSS data delta.

Still big, but not quite as bad as I made it out to be.

Thanks for catching.

-Andi

2013-10-09 03:39:49

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

Some more comments.

> - your patches might break apps/ABI

Can you please explain that a bit more. We have a lot of CONFIG options
that disable syscalls, /sys, lots of stuff. Whoever uses them needs
to know what they are doing. I thought it was pretty
much consensus that Linux is supposed to be a very configurable OS,
which people can taylor from small embedded to large kitchen sink
included using CONFIG_*.

Are you saying that Linux should not be configurable small to big? (I would
find that hard to believe). If that's really your standpoint I would
like to see some confirmation on this, as it would seem a big
departure from traditional practice.

Or is the concern that you want it default y or EXPERT, so what
the defaults are? That sounds reasonable.

Or should it be more modular like Peter pointed out (that
would seem like a good solution for generic distros, but not so
good for deeply embedded like running on Quark)

BTW afaik pretty much every other architecture still allows to disable
it, just x86 has this dependency loop problem.

>
> - your patch-set unnecessarily complicates things, making the kernel
> less maintainable

I actually simplified some things, like unnecessary
dependencies between perf and profile.

These should be applied in any case as they are independent.
I can repost them.

Given some of the ifdefs/configs were not nice, perhaps there's a better
solution for this from Frederic.

-Andi

2013-10-09 06:02:35

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS


* Frederic Weisbecker <[email protected]> wrote:

> On Tue, Oct 08, 2013 at 10:05:21PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 08, 2013 at 09:55:59PM +0200, Frederic Weisbecker wrote:
> > > 1) make breakpoints independant from perf. The drawback is that we must then
> > > add seperate hooks on context switch for ptrace breakpoints. OTOH we get
> > > rid of the perf -> breakpoint -> perf circular dependency, which is the very
> > > controversial thing.
> >
> > we used to have this in __switch_to_xtra(), right?
>
> Right.
>
> And then it got replaced by perf events scheduling.

Which we have anyway - so this moved complexity out the contex switch
path, for the common case where no hardware-breakpoints are used.

Thanks,

Ingo