2012-02-08 12:47:35

by Dmitry Antipov

Subject: [PATCH] sched: generalize CONFIG_IRQ_TIME_ACCOUNTING for X86 and ARM

Generalize CONFIG_IRQ_TIME_ACCOUNTING between X86 and
ARM, move "noirqtime=" option to common debugging code.
For a bit of backward compatibility, "tsc=noirqtime"
is preserved, but issues a warning.

Suggested-by: Venki Pallipadi <[email protected]>
Signed-off-by: Dmitry Antipov <[email protected]>
---
arch/arm/kernel/sched_clock.c | 3 +++
arch/x86/Kconfig | 11 -----------
arch/x86/kernel/tsc.c | 7 ++++---
include/linux/sched.h | 2 ++
lib/Kconfig.debug | 12 ++++++++++++
lib/Makefile | 2 ++
lib/irqtime.c | 12 ++++++++++++
7 files changed, 35 insertions(+), 14 deletions(-)
create mode 100644 lib/irqtime.c

diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
index 5416c7c..56d2a9d 100644
--- a/arch/arm/kernel/sched_clock.c
+++ b/arch/arm/kernel/sched_clock.c
@@ -162,5 +162,8 @@ void __init sched_clock_postinit(void)
if (read_sched_clock == jiffy_sched_clock_read)
setup_sched_clock(jiffy_sched_clock_read, 32, HZ);

+ if (!no_sched_irq_time)
+ enable_sched_clock_irqtime();
+
sched_clock_poll(sched_clock_timer.data);
}
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5bed94e..4759676 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -805,17 +805,6 @@ config SCHED_MC
making when dealing with multi-core CPU chips at a cost of slightly
increased overhead in some places. If unsure say N here.

-config IRQ_TIME_ACCOUNTING
- bool "Fine granularity task level IRQ time accounting"
- default n
- ---help---
- Select this option to enable fine granularity task irq time
- accounting. This is done by reading a timestamp on each
- transitions between softirq and hardirq state, so there can be a
- small performance impact.
-
- If in doubt, say N here.
-
source "kernel/Kconfig.preempt"

config X86_UP_APIC
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index a62c201..70510a3 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -103,14 +103,15 @@ int __init notsc_setup(char *str)

__setup("notsc", notsc_setup);

-static int no_sched_irq_time;
-
static int __init tsc_setup(char *str)
{
if (!strcmp(str, "reliable"))
tsc_clocksource_reliable = 1;
- if (!strncmp(str, "noirqtime", 9))
+ if (!strncmp(str, "noirqtime", 9)) {
+ printk(KERN_WARNING "tsc: tsc=noirqtime is "
+ "obsolete, use noirqtime instead\n");
no_sched_irq_time = 1;
+ }
return 1;
}

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7d379a6..b3575b5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1966,9 +1966,11 @@ extern void sched_clock_idle_wakeup_event(u64 delta_ns);
* The reason for this explicit opt-in is not to have perf penalty with
* slow sched_clocks.
*/
+extern int no_sched_irq_time;
extern void enable_sched_clock_irqtime(void);
extern void disable_sched_clock_irqtime(void);
#else
+#define no_sched_irq_time 1
static inline void enable_sched_clock_irqtime(void) {}
static inline void disable_sched_clock_irqtime(void) {}
#endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8745ac7..48be210 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -299,6 +299,18 @@ config SCHEDSTATS
application, you can say N to avoid the very slight overhead
this adds.

+config IRQ_TIME_ACCOUNTING
+ bool "Fine granularity task level IRQ time accounting"
+ depends on (X86 || (ARM && HAVE_SCHED_CLOCK))
+ default n
+ ---help---
+ Select this option to enable fine granularity task irq time
+ accounting. This is done by reading a timestamp on each
+ transitions between softirq and hardirq state, so there can be a
+ small performance impact.
+
+ If in doubt, say N here.
+
config TIMER_STATS
bool "Collect kernel timers statistics"
depends on DEBUG_KERNEL && PROC_FS
diff --git a/lib/Makefile b/lib/Makefile
index 18515f0..44d67d4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -49,6 +49,8 @@ obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
obj-$(CONFIG_DEBUG_LIST) += list_debug.o
obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o

+obj-$(CONFIG_IRQ_TIME_ACCOUNTING) += irqtime.o
+
ifneq ($(CONFIG_HAVE_DEC_LOCK),y)
lib-y += dec_and_lock.o
endif
diff --git a/lib/irqtime.c b/lib/irqtime.c
new file mode 100644
index 0000000..10d440d
--- /dev/null
+++ b/lib/irqtime.c
@@ -0,0 +1,12 @@
+#include <linux/kernel.h>
+#include <linux/sched.h>
+
+int no_sched_irq_time;
+
+static int __init irqtime_setup(char *str)
+{
+ no_sched_irq_time = 1;
+ return 1;
+}
+
+__setup("noirqtime", irqtime_setup);
--
1.7.7.6
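
The boot-parameter handling in the patch above can be sketched as a small userspace model. This is only an illustration, not kernel code: parse_boot_param() is a hypothetical dispatcher standing in for the kernel's __setup() machinery, the "reliable" branch of tsc_setup() is left out, and the warning goes to stderr instead of printk().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the flag the patch moves into lib/irqtime.c. */
int no_sched_irq_time;

/* Stand-in for the handler registered with __setup("noirqtime", ...). */
static int irqtime_setup(const char *str)
{
	(void)str;
	no_sched_irq_time = 1;
	return 1;
}

/* Stand-in for the "tsc=" handler; str is the text after '='. */
static int tsc_setup(const char *str)
{
	if (!strncmp(str, "noirqtime", 9)) {
		/* Old spelling still works, but is flagged as obsolete. */
		fprintf(stderr, "tsc: tsc=noirqtime is obsolete, use noirqtime instead\n");
		no_sched_irq_time = 1;
	}
	return 1;
}

/*
 * Hypothetical dispatcher (an assumption of this sketch): route one
 * command-line token to a handler. Returns 1 if handled, 0 otherwise.
 */
int parse_boot_param(const char *tok)
{
	assert(tok != NULL);
	if (!strncmp(tok, "tsc=", 4))
		return tsc_setup(tok + 4);
	if (!strncmp(tok, "noirqtime", 9))
		return irqtime_setup(tok + 9);
	return 0;
}
```

In this model both "noirqtime" and the older "tsc=noirqtime" set the same flag; only the older spelling prints the warning.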


2012-02-08 13:19:04

by Russell King - ARM Linux

Subject: Re: [PATCH] sched: generalize CONFIG_IRQ_TIME_ACCOUNTING for X86 and ARM

On Wed, Feb 08, 2012 at 04:48:34AM -0800, Dmitry Antipov wrote:
> Generalize CONFIG_IRQ_TIME_ACCOUNTING between X86 and
> ARM, move "noirqtime=" option to common debugging code.
> For a bit of backward compatibility, "tsc=noirqtime"
> is preserved, but issues a warning.
>
> Suggested-by: Venki Pallipadi <[email protected]>
> Signed-off-by: Dmitry Antipov <[email protected]>
> ---
> arch/arm/kernel/sched_clock.c | 3 +++
> arch/x86/Kconfig | 11 -----------
> arch/x86/kernel/tsc.c | 7 ++++---
> include/linux/sched.h | 2 ++
> lib/Kconfig.debug | 12 ++++++++++++
> lib/Makefile | 2 ++
> lib/irqtime.c | 12 ++++++++++++
> 7 files changed, 35 insertions(+), 14 deletions(-)
> create mode 100644 lib/irqtime.c
>
> diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
> index 5416c7c..56d2a9d 100644
> --- a/arch/arm/kernel/sched_clock.c
> +++ b/arch/arm/kernel/sched_clock.c
> @@ -162,5 +162,8 @@ void __init sched_clock_postinit(void)
> if (read_sched_clock == jiffy_sched_clock_read)
> setup_sched_clock(jiffy_sched_clock_read, 32, HZ);
>
> + if (!no_sched_irq_time)
> + enable_sched_clock_irqtime();

Why are you placing this here? sched_clock is available from the point
that it's registered, which should be before the first sched_clock()
call.

> +config IRQ_TIME_ACCOUNTING
> + bool "Fine granularity task level IRQ time accounting"
> + depends on (X86 || (ARM && HAVE_SCHED_CLOCK))

Even though it's not bad here, please get out of the habit of throwing
unnecessary parens into the mix. It can make stuff more difficult to
read and therefore confirm correctness. (I've spent many a time
rewriting if() statements because of paren overuse.)

This could have been written:

depends on X86 || (ARM && HAVE_SCHED_CLOCK)

However, ARM will always have HAVE_SCHED_CLOCK after the next merge window,
so this can become a much simpler:

depends on X86 || ARM

Apart from these two points, the rest of the patch looks fine to me but
the ultimate decision about its acceptability is up to other people.

2012-02-08 15:14:21

by Dmitry Antipov

Subject: Re: [PATCH] sched: generalize CONFIG_IRQ_TIME_ACCOUNTING for X86 and ARM

On 02/08/2012 05:18 AM, Russell King - ARM Linux wrote:

>> diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
>> index 5416c7c..56d2a9d 100644
>> --- a/arch/arm/kernel/sched_clock.c
>> +++ b/arch/arm/kernel/sched_clock.c
>> @@ -162,5 +162,8 @@ void __init sched_clock_postinit(void)
>> if (read_sched_clock == jiffy_sched_clock_read)
>> setup_sched_clock(jiffy_sched_clock_read, 32, HZ);
>>
>> + if (!no_sched_irq_time)
>> + enable_sched_clock_irqtime();
>
> Why are you placing this here? sched_clock is available from the point
> that it's registered, which should be before the first sched_clock()
> call.

I placed it there because I have this variant in mind:

if (read_sched_clock == jiffy_sched_clock_read)
setup_sched_clock(jiffy_sched_clock_read, 32, HZ);
else if (!no_sched_irq_time)
enable_sched_clock_irqtime();

I suppose that "fine granularity task irq time accounting"
makes no sense if sched_clock() granularity is poor.

> This could have been written:
>
> depends on X86 || (ARM && HAVE_SCHED_CLOCK)
>
> However, ARM will always have HAVE_SCHED_CLOCK after the next merge window,
> so this can become a much simpler:
>
> depends on X86 || ARM

OK.

Dmitry
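
The else-if variant described in the message above can be modelled in userspace as follows. This is a sketch under stated assumptions: the function and variable names follow the patch, but the bodies are simplified stand-ins, HZ is an arbitrary value, and fast_clock_read() is a hypothetical platform counter that does not appear in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HZ 100	/* assumed tick rate, for illustration only */

typedef uint32_t (*clock_read_fn)(void);

int no_sched_irq_time;			/* set by the "noirqtime" boot option */
static int irqtime_enabled;		/* models the effect of the enable call */
static uint32_t fake_jiffies;
static uint32_t fake_counter;
static unsigned long sched_clock_rate;

/* Coarse fallback reader, one tick per jiffy. */
static uint32_t jiffy_sched_clock_read(void) { return fake_jiffies; }

/* Hypothetical fine-grained platform counter. */
uint32_t fast_clock_read(void) { return ++fake_counter; }

static clock_read_fn read_sched_clock = jiffy_sched_clock_read;

void setup_sched_clock(clock_read_fn read, int bits, unsigned long rate)
{
	assert(read != NULL);
	(void)bits;
	read_sched_clock = read;
	sched_clock_rate = rate;
}

static void enable_sched_clock_irqtime(void) { irqtime_enabled = 1; }

void sched_clock_postinit(void)
{
	/*
	 * If no platform registered a clock, fall back to jiffies and
	 * leave IRQ time accounting off: the clock is too coarse.
	 */
	if (read_sched_clock == jiffy_sched_clock_read)
		setup_sched_clock(jiffy_sched_clock_read, 32, HZ);
	else if (!no_sched_irq_time)
		enable_sched_clock_irqtime();
}
```

In this model IRQ time accounting is switched on only when a reader other than the jiffy fallback has been registered and "noirqtime" was not given.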

2012-02-08 15:25:16

by Russell King - ARM Linux

Subject: Re: [PATCH] sched: generalize CONFIG_IRQ_TIME_ACCOUNTING for X86 and ARM

On Wed, Feb 08, 2012 at 07:15:26AM -0800, Dmitry Antipov wrote:
> On 02/08/2012 05:18 AM, Russell King - ARM Linux wrote:
>> Why are you placing this here? sched_clock is available from the point
>> that it's registered, which should be before the first sched_clock()
>> call.
>
> This is just because I'm thinking about:
>
> if (read_sched_clock == jiffy_sched_clock_read)
> setup_sched_clock(jiffy_sched_clock_read, 32, HZ);
> else if (!no_sched_irq_time)
> enable_sched_clock_irqtime();
>
> I suppose that "fine granularity task irq time accounting"
> makes no sense if sched_clock() granularity is poor.

Let me put it a different way - is there a reason not to do this in
setup_sched_clock() so that it becomes available as soon as sched_clock()
has been initialized by a platform?
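
One way to read this suggestion, as a userspace sketch only: the enable call moves into setup_sched_clock() itself. Skipping it for the jiffy-based reader is an assumption carried over from Dmitry's granularity concern, not something stated in this message. The sketch also assumes that no_sched_irq_time is already set by the time a platform registers its clock. platform_counter_read() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*clock_read_fn)(void);

int no_sched_irq_time;			/* set by the "noirqtime" boot option */
static int irqtime_enabled;		/* models the effect of the enable call */
static uint32_t fake_jiffies;
static uint32_t fake_counter;
static clock_read_fn read_sched_clock;

/* Coarse fallback reader, one tick per jiffy. */
uint32_t jiffy_sched_clock_read(void) { return fake_jiffies; }

/* Hypothetical fine-grained platform counter. */
uint32_t platform_counter_read(void) { return ++fake_counter; }

static void enable_sched_clock_irqtime(void) { irqtime_enabled = 1; }

void setup_sched_clock(clock_read_fn read, int bits, unsigned long rate)
{
	assert(read != NULL);
	(void)bits;
	(void)rate;
	read_sched_clock = read;

	/*
	 * Enable accounting at registration time, so it is available as
	 * soon as the platform clock is. The jiffy check is this sketch's
	 * assumption, not part of the suggestion.
	 */
	if (read != jiffy_sched_clock_read && !no_sched_irq_time)
		enable_sched_clock_irqtime();
}
```

With this shape, sched_clock_postinit() would need no IRQ-time logic of its own.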

2012-02-09 02:48:52

by Yong Zhang

Subject: Re: [PATCH] sched: generalize CONFIG_IRQ_TIME_ACCOUNTING for X86 and ARM

Cc'ing PeterZ.

On Wed, Feb 08, 2012 at 04:48:34AM -0800, Dmitry Antipov wrote:
> Generalize CONFIG_IRQ_TIME_ACCOUNTING between X86 and
> ARM, move "noirqtime=" option to common debugging code.
> For a bit of backward compatibility, "tsc=noirqtime"
> is preserved, but issues a warning.
>
> Suggested-by: Venki Pallipadi <[email protected]>
> Signed-off-by: Dmitry Antipov <[email protected]>
> ---
> lib/Kconfig.debug | 12 ++++++++++++
> lib/Makefile | 2 ++
> lib/irqtime.c | 12 ++++++++++++

Do we need a separate file for this?
This feature is scheduler related, so why not just move it
to kernel/sched/core.c?

Thanks,
Yong

> 7 files changed, 35 insertions(+), 14 deletions(-)
> create mode 100644 lib/irqtime.c
>
> diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
> index 5416c7c..56d2a9d 100644
> --- a/arch/arm/kernel/sched_clock.c
> +++ b/arch/arm/kernel/sched_clock.c
> @@ -162,5 +162,8 @@ void __init sched_clock_postinit(void)
> if (read_sched_clock == jiffy_sched_clock_read)
> setup_sched_clock(jiffy_sched_clock_read, 32, HZ);
>
> + if (!no_sched_irq_time)
> + enable_sched_clock_irqtime();
> +
> sched_clock_poll(sched_clock_timer.data);
> }
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5bed94e..4759676 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -805,17 +805,6 @@ config SCHED_MC
> making when dealing with multi-core CPU chips at a cost of slightly
> increased overhead in some places. If unsure say N here.
>
> -config IRQ_TIME_ACCOUNTING
> - bool "Fine granularity task level IRQ time accounting"
> - default n
> - ---help---
> - Select this option to enable fine granularity task irq time
> - accounting. This is done by reading a timestamp on each
> - transitions between softirq and hardirq state, so there can be a
> - small performance impact.
> -
> - If in doubt, say N here.
> -
> source "kernel/Kconfig.preempt"
>
> config X86_UP_APIC
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index a62c201..70510a3 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -103,14 +103,15 @@ int __init notsc_setup(char *str)
>
> __setup("notsc", notsc_setup);
>
> -static int no_sched_irq_time;
> -
> static int __init tsc_setup(char *str)
> {
> if (!strcmp(str, "reliable"))
> tsc_clocksource_reliable = 1;
> - if (!strncmp(str, "noirqtime", 9))
> + if (!strncmp(str, "noirqtime", 9)) {
> + printk(KERN_WARNING "tsc: tsc=noirqtime is "
> + "obsolete, use noirqtime instead\n");
> no_sched_irq_time = 1;
> + }
> return 1;
> }
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 7d379a6..b3575b5 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1966,9 +1966,11 @@ extern void sched_clock_idle_wakeup_event(u64 delta_ns);
> * The reason for this explicit opt-in is not to have perf penalty with
> * slow sched_clocks.
> */
> +extern int no_sched_irq_time;
> extern void enable_sched_clock_irqtime(void);
> extern void disable_sched_clock_irqtime(void);
> #else
> +#define no_sched_irq_time 1
> static inline void enable_sched_clock_irqtime(void) {}
> static inline void disable_sched_clock_irqtime(void) {}
> #endif
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 8745ac7..48be210 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -299,6 +299,18 @@ config SCHEDSTATS
> application, you can say N to avoid the very slight overhead
> this adds.
>
> +config IRQ_TIME_ACCOUNTING
> + bool "Fine granularity task level IRQ time accounting"
> + depends on (X86 || (ARM && HAVE_SCHED_CLOCK))
> + default n
> + ---help---
> + Select this option to enable fine granularity task irq time
> + accounting. This is done by reading a timestamp on each
> + transitions between softirq and hardirq state, so there can be a
> + small performance impact.
> +
> + If in doubt, say N here.
> +
> config TIMER_STATS
> bool "Collect kernel timers statistics"
> depends on DEBUG_KERNEL && PROC_FS
> diff --git a/lib/Makefile b/lib/Makefile
> index 18515f0..44d67d4 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -49,6 +49,8 @@ obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
> obj-$(CONFIG_DEBUG_LIST) += list_debug.o
> obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o
>
> +obj-$(CONFIG_IRQ_TIME_ACCOUNTING) += irqtime.o
> +
> ifneq ($(CONFIG_HAVE_DEC_LOCK),y)
> lib-y += dec_and_lock.o
> endif
> diff --git a/lib/irqtime.c b/lib/irqtime.c
> new file mode 100644
> index 0000000..10d440d
> --- /dev/null
> +++ b/lib/irqtime.c
> @@ -0,0 +1,12 @@
> +#include <linux/kernel.h>
> +#include <linux/sched.h>
> +
> +int no_sched_irq_time;
> +
> +static int __init irqtime_setup(char *str)
> +{
> + no_sched_irq_time = 1;
> + return 1;
> +}
> +
> +__setup("noirqtime", irqtime_setup);
> --
> 1.7.7.6

--
Only stand for myself

2012-02-09 02:51:23

by Yong Zhang

Subject: Re: [PATCH] sched: generalize CONFIG_IRQ_TIME_ACCOUNTING for X86 and ARM

On Wed, Feb 08, 2012 at 01:18:33PM +0000, Russell King - ARM Linux wrote:
> On Wed, Feb 08, 2012 at 04:48:34AM -0800, Dmitry Antipov wrote:
> > Generalize CONFIG_IRQ_TIME_ACCOUNTING between X86 and
> > ARM, move "noirqtime=" option to common debugging code.
> > For a bit of backward compatibility, "tsc=noirqtime"
> > is preserved, but issues a warning.
> >
> > Suggested-by: Venki Pallipadi <[email protected]>
> > Signed-off-by: Dmitry Antipov <[email protected]>
> > ---
> > arch/arm/kernel/sched_clock.c | 3 +++
> > arch/x86/Kconfig | 11 -----------
> > arch/x86/kernel/tsc.c | 7 ++++---
> > include/linux/sched.h | 2 ++
> > lib/Kconfig.debug | 12 ++++++++++++
> > lib/Makefile | 2 ++
> > lib/irqtime.c | 12 ++++++++++++
> > 7 files changed, 35 insertions(+), 14 deletions(-)
> > create mode 100644 lib/irqtime.c
> >
> > diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
> > index 5416c7c..56d2a9d 100644
> > --- a/arch/arm/kernel/sched_clock.c
> > +++ b/arch/arm/kernel/sched_clock.c
> > @@ -162,5 +162,8 @@ void __init sched_clock_postinit(void)
> > if (read_sched_clock == jiffy_sched_clock_read)
> > setup_sched_clock(jiffy_sched_clock_read, 32, HZ);
> >
> > + if (!no_sched_irq_time)
> > + enable_sched_clock_irqtime();
>
> Why are you placing this here? sched_clock is available from the point
> that it's registered, which should be before the first sched_clock()
> call.
>
> > +config IRQ_TIME_ACCOUNTING
> > + bool "Fine granularity task level IRQ time accounting"
> > + depends on (X86 || (ARM && HAVE_SCHED_CLOCK))
>
> Even though it's not bad here, please get out of the habit of throwing
> unnecessary parens into the mix. It can make stuff more difficult to
> read and therefore confirm correctness. (I've spent many a time
> rewriting if() statements because of paren overuse.)
>
> This could have been written:
>
> depends on X86 || (ARM && HAVE_SCHED_CLOCK)
>
> However, ARM will always have HAVE_SCHED_CLOCK after the next merge window,
> so this can become a much simpler:
>
> depends on X86 || ARM

Maybe we can leave the dependency to each arch: let the arch provide
HAVE_IRQ_TIME_ACCOUNTING, and then make IRQ_TIME_ACCOUNTING
depend on HAVE_IRQ_TIME_ACCOUNTING.

Thanks,
Yong

>
> Apart from these two points, the rest of the patch looks fine to me but
> the ultimate decision about its acceptability is up to other people.

--
Only stand for myself