2018-05-01 13:12:25

by Du, Changbin

Subject: [PATCH 0/5] kernel hacking: GCC optimization for debug experience (-Og)

From: Changbin Du <[email protected]>

Hi all,
I know some kernel developers were searching for a method to disable GCC
optimizations; probably they wanted to apply the GCC '-O0' option. But since
the Linux kernel relies on GCC optimization to remove some dead code, '-O0'
just breaks the build. They need this because they want to debug the kernel
with qemu, simics, kgtp or kgdb.

Fortunately, GCC 4.8 introduced the '-Og' optimization level, which
offers a reasonable level of optimization while maintaining fast compilation
and a good debugging experience. It is similar to '-O1' while preferring to
keep debug ability over runtime speed. With '-Og', we can build a kernel with
better debug ability and little performance drop after some simple changes.

In this series, after two fixes, I first introduce a new config
CONFIG_NO_AUTO_INLINE. Selecting this option will make the compiler not
auto-inline kernel functions. This is useful when you are using ftrace to
understand the control flow of kernel code or tracing some static functions.

Then I introduce a new config CONFIG_DEBUG_EXPERIENCE, which applies the '-Og'
optimization level to the whole kernel, with a simple fix in fix_to_virt().
Currently this option has only been tested on a QEMU guest, where it works fine.

Comparison of vmlinux size: a bit smaller.

w/o CONFIG_DEBUG_EXPERIENCE
$ size vmlinux
    text     data     bss      dec     hex filename
22665554  9709674 2920908 35296136 21a9388 vmlinux

w/ CONFIG_DEBUG_EXPERIENCE
$ size vmlinux
    text     data     bss      dec     hex filename
21499032 10102758 2920908 34522698 20ec64a vmlinux

Comparison of system performance: a slight drop.

w/o CONFIG_DEBUG_EXPERIENCE
$ time make -j4
real 6m43.619s
user 19m5.160s
sys 2m20.287s

w/ CONFIG_DEBUG_EXPERIENCE
$ time make -j4
real 6m55.054s
user 19m11.129s
sys 2m36.345s
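
To put the two comparisons above into relative terms, here is a minimal C sketch. The helper names pct_change() and to_seconds() are made up for this illustration and are not part of the series; the inputs are the 'text' sizes and 'real' times quoted above.

```c
/* Relative change from 'before' to 'after' in percent (negative = decrease). */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Convert a "real 6m43.619s" style reading into seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}
```

Calling pct_change(22665554, 21499032) gives roughly -5.1 (text size), and pct_change(to_seconds(6, 43.619), to_seconds(6, 55.054)) gives roughly +2.8 (wall-clock build time). Both come from a single run each, so they say nothing about run-to-run noise.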

Changbin Du (5):
  x86/mm: surround level4_kernel_pgt with #ifdef
    CONFIG_X86_5LEVEL...#endif
  regulator: add dummy of_find_regulator_by_node
  kernel hacking: new config NO_AUTO_INLINE to disable compiler
    atuo-inline optimizations
  kernel hacking: new config DEBUG_EXPERIENCE to apply GCC -Og
    optimization
  asm-generic: fix build error in fix_to_virt with
    CONFIG_DEBUG_EXPERIENCE

Makefile | 10 ++++++++++
arch/x86/include/asm/pgtable_64.h | 2 ++
arch/x86/kernel/head64.c | 13 ++++++-------
drivers/regulator/internal.h | 9 +++++++--
include/asm-generic/fixmap.h | 3 ++-
include/linux/compiler-gcc.h | 2 +-
include/linux/compiler.h | 2 +-
lib/Kconfig.debug | 34 ++++++++++++++++++++++++++++++++++
8 files changed, 63 insertions(+), 12 deletions(-)

--
2.7.4



2018-05-01 13:11:37

by Du, Changbin

Subject: [PATCH 5/5] asm-generic: fix build error in fix_to_virt with CONFIG_DEBUG_EXPERIENCE

From: Changbin Du <[email protected]>

With the '-Og' optimization level, GCC does not optimize a loop count
into a constant value. But BUILD_BUG_ON() only accepts compile-time constant
values.

arch/arm/mm/mmu.o: In function `fix_to_virt':
/home/changbin/work/linux/./include/asm-generic/fixmap.h:31: undefined reference to `__compiletime_assert_31'
Makefile:1051: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1
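
The idea behind the fix can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: DEMO_END_OF_FIXED, demo_check_fired, demo_fix_to_virt() and the address layout are hypothetical stand-ins, and the range check bumps a counter at run time instead of breaking the build as BUILD_BUG_ON() would.

```c
/* Hypothetical stand-in for __end_of_fixed_addresses. */
#define DEMO_END_OF_FIXED 8u

/* Counts how often the guarded check fired; used only by this demo. */
static int demo_check_fired;

static inline unsigned long demo_fix_to_virt(const unsigned int idx)
{
	/*
	 * The range check only applies when the compiler can prove that idx
	 * is a constant. For runtime values __builtin_constant_p() yields 0
	 * and the check is skipped.
	 */
	if (__builtin_constant_p(idx) && idx >= DEMO_END_OF_FIXED)
		demo_check_fired++;

	/* Hypothetical layout: one 4 KiB page per index, growing downwards. */
	return 0xfffff000UL - ((unsigned long)idx << 12);
}

/* Feed in a value that the compiler cannot treat as a constant. */
static unsigned long demo_runtime_lookup(void)
{
	volatile unsigned int idx = 100;	/* out of range on purpose */

	return demo_fix_to_virt(idx);
}
```

Because the volatile read is never a compile-time constant, demo_runtime_lookup() leaves demo_check_fired untouched. Whether a literal out-of-range argument is recognized as constant depends on the optimization level, which is the behaviour this patch has to cope with.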

Signed-off-by: Changbin Du <[email protected]>
---
include/asm-generic/fixmap.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/fixmap.h b/include/asm-generic/fixmap.h
index 827e4d3..a6576d4 100644
--- a/include/asm-generic/fixmap.h
+++ b/include/asm-generic/fixmap.h
@@ -28,7 +28,8 @@
*/
static __always_inline unsigned long fix_to_virt(const unsigned int idx)
{
- BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
+ BUILD_BUG_ON(__builtin_constant_p(idx) &&
+ idx >= __end_of_fixed_addresses);
return __fix_to_virt(idx);
}

--
2.7.4


2018-05-01 13:11:58

by Du, Changbin

Subject: [PATCH 3/5] kernel hacking: new config NO_AUTO_INLINE to disable compiler atuo-inline optimizations

From: Changbin Du <[email protected]>

This patch adds a new kernel hacking option NO_AUTO_INLINE. Selecting
this option will make the compiler not auto-inline kernel functions. With
this option enabled, kernel functions (including static ones) will not be
inlined unless they are marked as inline or always_inline.
This is useful when you are using ftrace to understand the control flow
of kernel code or tracing some static functions.
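
For comparison, the same effect can be requested per function with GCC attributes. The sketch below is not part of this patch; demo_scale(), demo_bump() and demo_caller() are hypothetical names, and it only illustrates the difference between a function that is kept out of line and one whose inlining is explicitly requested.

```c
/* Kept out of line, so it remains a separate symbol that a tracer can see. */
static __attribute__((noinline)) int demo_scale(int x)
{
	return x * 2 + 1;
}

/* Explicitly requested inlining is still honoured. */
static inline __attribute__((always_inline)) int demo_bump(int x)
{
	return x + 1;
}

int demo_caller(int x)
{
	return demo_scale(x) + demo_bump(x);
}
```

The config option applies the "do not inline unless asked" policy to the whole build through compiler flags, so no per-function annotation is needed.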

Signed-off-by: Changbin Du <[email protected]>
Cc: Steven Rostedt <[email protected]>
---
Makefile | 6 ++++++
lib/Kconfig.debug | 13 +++++++++++++
2 files changed, 19 insertions(+)

diff --git a/Makefile b/Makefile
index 619a85a..eb694f6 100644
--- a/Makefile
+++ b/Makefile
@@ -775,6 +775,12 @@ KBUILD_CFLAGS += $(call cc-option, -femit-struct-debug-baseonly) \
$(call cc-option,-fno-var-tracking)
endif

+ifdef CONFIG_NO_AUTO_INLINE
+KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions) \
+ $(call cc-option, -fno-inline-small-functions) \
+ $(call cc-option, -fno-inline-functions-called-once)
+endif
+
ifdef CONFIG_FUNCTION_TRACER
ifndef CC_FLAGS_FTRACE
CC_FLAGS_FTRACE := -pg
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b7..90f35ad 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -198,6 +198,19 @@ config GDB_SCRIPTS
instance. See Documentation/dev-tools/gdb-kernel-debugging.rst
for further details.

+config NO_AUTO_INLINE
+ bool "Disable compiler atuo-inline optimizations"
+ default n
+ help
+ This will make compiler not auto-inline kernel functions for
+ optimization. By enabling this option, all the kernel functions
+ (including static ones) will not be optimized out except those
+ marked as inline or always_inline. This is useful when you are
+ using ftrace to understand the control flow of kernel code or
+ tracing some static functions.
+
+ Use only if you want to debug the kernel.
+
config ENABLE_WARN_DEPRECATED
bool "Enable __deprecated logic"
default y
--
2.7.4


2018-05-01 13:12:08

by Du, Changbin

Subject: [PATCH 4/5] kernel hacking: new config DEBUG_EXPERIENCE to apply GCC -Og optimization

From: Changbin Du <[email protected]>

This applies the GCC '-Og' optimization level, which is supported since
GCC 4.8. This optimization level offers a reasonable level of
optimization while maintaining fast compilation and a good
debugging experience. It is similar to '-O1' while preferring to keep
debug ability over runtime speed.

If enabling this option breaks your kernel, you should either
disable it or find a fix (mostly in the arch code). Currently
this option has only been tested in a qemu x86_64 guest.

This option can satisfy people who were searching for a method
to disable compiler optimizations so as to achieve a better kernel
debugging experience with kgdb or qemu.

The main problem with '-Og' is that we must not use
__attribute__((error(msg))). The compiler reports the error even though
the call to the error function could still be optimized out. So we must
fall back to the array trick.
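
The array-based fallback mentioned above relies on a negative array size being rejected at compile time. A minimal standalone sketch of that general technique, with hypothetical names (DEMO_BUILD_BUG_ON, struct demo_hdr) that are not taken from the kernel headers:

```c
#include <stddef.h>

/*
 * If cond is a true constant expression, the array size becomes -1 and the
 * compiler rejects the declaration. If cond is false, the size is 1 and the
 * expression has no effect.
 */
#define DEMO_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

struct demo_hdr {
	unsigned char tag;
	unsigned char len;
	unsigned short crc;
};

static size_t demo_hdr_size(void)
{
	/* Compiles only while the struct stays exactly 4 bytes large. */
	DEMO_BUILD_BUG_ON(sizeof(struct demo_hdr) != 4);
	return sizeof(struct demo_hdr);
}
```

Unlike the error attribute, this form does not depend on the optimizer removing a call, but it only works when the condition is an integer constant expression.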

Comparison of vmlinux size: a bit smaller.

w/o CONFIG_DEBUG_EXPERIENCE
$ size vmlinux
    text     data     bss      dec     hex filename
22665554  9709674 2920908 35296136 21a9388 vmlinux

w/ CONFIG_DEBUG_EXPERIENCE
$ size vmlinux
    text     data     bss      dec     hex filename
21499032 10102758 2920908 34522698 20ec64a vmlinux

Comparison of system performance: a slight drop.

w/o CONFIG_DEBUG_EXPERIENCE
$ time make -j4
real 6m43.619s
user 19m5.160s
sys 2m20.287s

w/ CONFIG_DEBUG_EXPERIENCE
$ time make -j4
real 6m55.054s
user 19m11.129s
sys 2m36.345s

Signed-off-by: Changbin Du <[email protected]>
---
Makefile | 4 ++++
include/linux/compiler-gcc.h | 2 +-
include/linux/compiler.h | 2 +-
lib/Kconfig.debug | 21 +++++++++++++++++++++
4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index eb694f6..6a10469 100644
--- a/Makefile
+++ b/Makefile
@@ -639,6 +639,9 @@ KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation)
KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow)
KBUILD_CFLAGS += $(call cc-disable-warning, int-in-bool-context)

+ifdef CONFIG_DEBUG_EXPERIENCE
+KBUILD_CFLAGS += $(call cc-option, -Og)
+else
ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += $(call cc-option,-Oz,-Os)
KBUILD_CFLAGS += $(call cc-disable-warning,maybe-uninitialized,)
@@ -649,6 +652,7 @@ else
KBUILD_CFLAGS += -O2
endif
endif
+endif

KBUILD_CFLAGS += $(call cc-ifversion, -lt, 0409, \
$(call cc-disable-warning,maybe-uninitialized,))
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index b4bf73f..b8b3832 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -192,7 +192,7 @@

#define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)

-#ifndef __CHECKER__
+#if !defined(__CHECKER__) && !defined(CONFIG_DEBUG_EXPERIENCE)
# define __compiletime_warning(message) __attribute__((warning(message)))
# define __compiletime_error(message) __attribute__((error(message)))
#endif /* __CHECKER__ */
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index ab4711c..952cc7f 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -301,7 +301,7 @@ unsigned long read_word_at_a_time(const void *addr)
* sparse see a constant array size without breaking compiletime_assert on old
* versions of GCC (e.g. 4.2.4), so hide the array from sparse altogether.
*/
-# ifndef __CHECKER__
+# if !defined(__CHECKER__) && !defined(CONFIG_DEBUG_EXPERIENCE)
# define __compiletime_error_fallback(condition) \
do { ((void)sizeof(char[1 - 2 * condition])); } while (0)
# endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 90f35ad..2432e77d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -211,6 +211,27 @@ config NO_AUTO_INLINE

Use only if you want to debug the kernel.

+config DEBUG_EXPERIENCE
+ bool "Optimize for better debugging experience (-Og)"
+ default n
+ select NO_AUTO_INLINE
+ depends on !CC_OPTIMIZE_FOR_SIZE
+ help
+ This will apply GCC '-Og' optimization level get supported from
+ GCC 4.8. This optimization level offers a reasonable level of
+ optimization while maintaining fast compilation and a good
+ debugging experience. It is similar to '-O1' while perfer keeping
+ debug ability over runtime speed. The overall performance will
+ drop a bit.
+
+ If enabling this option break your kernel, you should either
+ disable this or find a fix (mostly in the arch code). Currently
+ this option has only be tested in qemu x86_64 guest.
+
+ Use only if you want to debug the kernel, especially if you want
+ to have better kernel debugging experience with gdb facilities
+ like kgdb and qemu.
+
config ENABLE_WARN_DEPRECATED
bool "Enable __deprecated logic"
default y
--
2.7.4


2018-05-01 13:12:37

by Du, Changbin

Subject: [PATCH 1/5] x86/mm: surround level4_kernel_pgt with #ifdef CONFIG_X86_5LEVEL...#endif

From: Changbin Du <[email protected]>

The level4_kernel_pgt is only defined when X86_5LEVEL is enabled. So
surround level4_kernel_pgt with #ifdef CONFIG_X86_5LEVEL...#endif to
make the code correct.
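
The shape of the change in head64.c, an if/else whose first branch exists only under the #ifdef, can be sketched in isolation like this. DEMO_5LEVEL and the demo_* names are hypothetical and only mirror the structure, not the real page-table setup.

```c
/* Set to 1 to build the optional branch; 0 mirrors a kernel without 5-level paging. */
#define DEMO_5LEVEL 0

#if DEMO_5LEVEL
static unsigned long demo_level4_pgt[512];
#endif
static unsigned long demo_level3_pgt[512];

static unsigned long *demo_pick_table(int la57)
{
	(void)la57;	/* unused when DEMO_5LEVEL is 0 */
#if DEMO_5LEVEL
	if (la57)
		return demo_level4_pgt;
	else
#endif
		return demo_level3_pgt;
}
```

With DEMO_5LEVEL set to 0 the preprocessor removes every reference to demo_level4_pgt, so the build no longer depends on the optimizer discarding a dead branch.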

Signed-off-by: Changbin Du <[email protected]>
---
arch/x86/include/asm/pgtable_64.h | 2 ++
arch/x86/kernel/head64.c | 13 ++++++-------
2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 877bc27..9e7f667 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -15,7 +15,9 @@
#include <linux/bitops.h>
#include <linux/threads.h>

+#ifdef CONFIG_X86_5LEVEL
extern p4d_t level4_kernel_pgt[512];
+#endif
extern p4d_t level4_ident_pgt[512];
extern pud_t level3_kernel_pgt[512];
extern pud_t level3_ident_pgt[512];
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 0c408f8..775d7a6 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -143,16 +143,15 @@ unsigned long __head __startup_64(unsigned long physaddr,

pgd = fixup_pointer(&early_top_pgt, physaddr);
p = pgd + pgd_index(__START_KERNEL_map);
- if (la57)
- *p = (unsigned long)level4_kernel_pgt;
- else
- *p = (unsigned long)level3_kernel_pgt;
- *p += _PAGE_TABLE_NOENC - __START_KERNEL_map + load_delta;
-
+#ifdef CONFIG_X86_5LEVEL
if (la57) {
+ *p = (unsigned long)level4_kernel_pgt;
p4d = fixup_pointer(&level4_kernel_pgt, physaddr);
p4d[511] += load_delta;
- }
+ } else
+#endif
+ *p = (unsigned long)level3_kernel_pgt;
+ *p += _PAGE_TABLE_NOENC - __START_KERNEL_map + load_delta;

pud = fixup_pointer(&level3_kernel_pgt, physaddr);
pud[510] += load_delta;
--
2.7.4


2018-05-01 13:14:16

by Du, Changbin

Subject: [PATCH 2/5] regulator: add dummy of_find_regulator_by_node

From: Changbin Du <[email protected]>

If device tree is not enabled, of_find_regulator_by_node() should have
a dummy implementation, since the call to it is still there.

Signed-off-by: Changbin Du <[email protected]>
---
drivers/regulator/internal.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/regulator/internal.h b/drivers/regulator/internal.h
index abfd56e..24fde1e 100644
--- a/drivers/regulator/internal.h
+++ b/drivers/regulator/internal.h
@@ -56,14 +56,19 @@ static inline struct regulator_dev *dev_to_rdev(struct device *dev)
return container_of(dev, struct regulator_dev, dev);
}

-struct regulator_dev *of_find_regulator_by_node(struct device_node *np);
-
#ifdef CONFIG_OF
+struct regulator_dev *of_find_regulator_by_node(struct device_node *np);
struct regulator_init_data *regulator_of_get_init_data(struct device *dev,
const struct regulator_desc *desc,
struct regulator_config *config,
struct device_node **node);
#else
+static inline struct regulator_dev *
+of_find_regulator_by_node(struct device_node *np)
+{
+ return NULL;
+}
+
static inline struct regulator_init_data *
regulator_of_get_init_data(struct device *dev,
const struct regulator_desc *desc,
--
2.7.4


2018-05-01 14:54:51

by Steven Rostedt

Subject: Re: [PATCH 3/5] kernel hacking: new config NO_AUTO_INLINE to disable compiler atuo-inline optimizations

On Tue, 1 May 2018 21:00:12 +0800
[email protected] wrote:

> From: Changbin Du <[email protected]>
>
> This patch add a new kernel hacking option NO_AUTO_INLINE. Selecting
> this option will make compiler not auto-inline kernel functions. By
> enabling this option, all the kernel functions (including static ones)
> will not be optimized out except those marked as inline or always_inline.
> This is useful when you are using ftrace to understand the control flow
> of kernel code or tracing some static functions.

I'm not against this patch, but it's up to others if this gets included
or not.

>
> Signed-off-by: Changbin Du <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> ---
> Makefile | 6 ++++++
> lib/Kconfig.debug | 13 +++++++++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index 619a85a..eb694f6 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -775,6 +775,12 @@ KBUILD_CFLAGS += $(call cc-option, -femit-struct-debug-baseonly) \
> $(call cc-option,-fno-var-tracking)
> endif
>
> +ifdef CONFIG_NO_AUTO_INLINE
> +KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions) \
> + $(call cc-option, -fno-inline-small-functions) \
> + $(call cc-option, -fno-inline-functions-called-once)
> +endif
> +
> ifdef CONFIG_FUNCTION_TRACER
> ifndef CC_FLAGS_FTRACE
> CC_FLAGS_FTRACE := -pg
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index c40c7b7..90f35ad 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -198,6 +198,19 @@ config GDB_SCRIPTS
> instance. See Documentation/dev-tools/gdb-kernel-debugging.rst
> for further details.
>
> +config NO_AUTO_INLINE
> + bool "Disable compiler atuo-inline optimizations"

typo: s/atuo/auto/

> + default n
> + help
> + This will make compiler not auto-inline kernel functions for
> + optimization. By enabling this option, all the kernel functions
> + (including static ones) will not be optimized out except those
> + marked as inline or always_inline. This is useful when you are
> + using ftrace to understand the control flow of kernel code or
> + tracing some static functions.

Some grammar updates:

This will prevent the compiler from optimizing the kernel by
auto-inlining functions not marked with the inline keyword.
With this option, only functions explicitly marked with
"inline" will be inlined. This will allow the function tracer
to trace more functions because it only traces functions that
the compiler has not inlined.

Enabling this function can help debugging a kernel if using
the function tracer. But it can also change how the kernel
works, because inlining functions may change the timing,
which could make it difficult while debugging race conditions.

> +
> + Use only if you want to debug the kernel.

The proper way to say the above is:

If unsure, select N

-- Steve

> +
> config ENABLE_WARN_DEPRECATED
> bool "Enable __deprecated logic"
> default y


2018-05-01 15:26:01

by Randy Dunlap

Subject: Re: [PATCH 4/5] kernel hacking: new config DEBUG_EXPERIENCE to apply GCC -Og optimization

Good morning.

On 05/01/2018 06:00 AM, [email protected] wrote:
> From: Changbin Du <[email protected]>
>
>
> Signed-off-by: Changbin Du <[email protected]>
> ---
> Makefile | 4 ++++
> include/linux/compiler-gcc.h | 2 +-
> include/linux/compiler.h | 2 +-
> lib/Kconfig.debug | 21 +++++++++++++++++++++
> 4 files changed, 27 insertions(+), 2 deletions(-)
>

> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 90f35ad..2432e77d 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -211,6 +211,27 @@ config NO_AUTO_INLINE
>
> Use only if you want to debug the kernel.
>
> +config DEBUG_EXPERIENCE
> + bool "Optimize for better debugging experience (-Og)"
> + default n
> + select NO_AUTO_INLINE
> + depends on !CC_OPTIMIZE_FOR_SIZE
> + help
> + This will apply GCC '-Og' optimization level get supported from

which is supported since

> + GCC 4.8. This optimization level offers a reasonable level of
> + optimization while maintaining fast compilation and a good
> + debugging experience. It is similar to '-O1' while perfer keeping

while preferring to keep

> + debug ability over runtime speed. The overall performance will
> + drop a bit.
> +
> + If enabling this option break your kernel, you should either

breaks

> + disable this or find a fix (mostly in the arch code). Currently
> + this option has only be tested in qemu x86_64 guest.
> +
> + Use only if you want to debug the kernel, especially if you want
> + to have better kernel debugging experience with gdb facilities
> + like kgdb and qemu.
> +
> config ENABLE_WARN_DEPRECATED
> bool "Enable __deprecated logic"
> default y
>

thanks,
--
~Randy

2018-05-01 20:41:40

by Mark Brown

Subject: Re: [PATCH 2/5] regulator: add dummy of_find_regulator_by_node

On Tue, May 01, 2018 at 09:00:11PM +0800, [email protected] wrote:
> From: Changbin Du <[email protected]>
>
> If device tree is not enabled, of_find_regulator_by_node() should have
> a dummy function since the function call is still there.
>
> Signed-off-by: Changbin Du <[email protected]>

This appears to have no obvious connection with the cover letter for the
series... The first question here is if this is something best fixed
with a stub or by fixing the users - is the lack of a stub pointing out
some bugs in them? I'm a bit worried about how we've been managing to
avoid any build test issues here though, surely the various builders
would have spotted a problem?



2018-05-02 09:18:38

by Du, Changbin

Subject: Re: [PATCH 0/5] kernel hacking: GCC optimization for debug experience (-Og)

On Wed, May 02, 2018 at 09:33:15AM +0200, Ingo Molnar wrote:
>
> * [email protected] <[email protected]> wrote:
>
> > Comparison of system performance: a bit drop.
> >
> > w/o CONFIG_DEBUG_EXPERIENCE
> > $ time make -j4
> > real 6m43.619s
> > user 19m5.160s
> > sys 2m20.287s
> >
> > w/ CONFIG_DEBUG_EXPERIENCE
> > $ time make -j4
> > real 6m55.054s
> > user 19m11.129s
> > sys 2m36.345s
>
> Sorry, that's not a proper kbuild performance measurement - there's no noise
> estimation at all.
>
> Below is a description that should produce more reliable numbers.
>
> Thanks,
>
> Ingo
>
Thanks for your suggestion, I will try your tips to eliminate noise. Since it
was tested in a KVM guest, I just rebooted the guest before testing. But on the
host side I still need to consider these sources of noise.

>
> =========================>
>
> So here's a pretty reliable way to measure kernel build time, which tries to avoid
> the various pitfalls of caching.
>
> First I make sure that cpufreq is set to 'performance':
>
> for ((cpu=0; cpu<120; cpu++)); do
> G=/sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor
> [ -f $G ] && echo performance > $G
> done
>
> [ ... because it can be *really* annoying to discover that an ostensible
> performance regression was a cpufreq artifact ... again. ;-) ]
>
> Then I copy a kernel tree to /tmp (ramfs) as root:
>
> cd /tmp
> rm -rf linux
> git clone ~/linux linux
> cd linux
> make defconfig >/dev/null
>
> ... and then we can build the kernel in such a loop (as root again):
>
> perf stat --repeat 10 --null --pre '\
> cp -a kernel ../kernel.copy.$(date +%s); \
> rm -rf *; \
> git checkout .; \
> echo 1 > /proc/sys/vm/drop_caches; \
> find ../kernel* -type f | xargs cat >/dev/null; \
> make -j kernel >/dev/null; \
> make clean >/dev/null 2>&1; \
> sync '\
> \
> make -j16 >/dev/null
>
> ( I have tested these by pasting them into a terminal. Adjust the ~/linux source
> git tree and the '-j16' to your system. )
>
> Notes:
>
> - the 'pre' script portion is not timed by 'perf stat', only the raw build times
>
> - we flush all caches via drop_caches and re-establish everything again, but:
>
> - we also introduce an intentional memory leak by slowly filling up ramfs with
> copies of 'kernel/', thus continously changing the layout of free memory,
> cached data such as compiler binaries and the source code hierarchy. (Note
> that the leak is about 8MB per iteration, so it isn't massive.)
>
> With 10 iterations this is the statistical stability I get this on a big box:
>
> Performance counter stats for 'make -j128 kernel' (10 runs):
>
> 26.346436425 seconds time elapsed (+- 0.19%)
>
> ... which, despite a high iteration count of 10, is still surprisingly noisy,
> right?
>
> A 0.2% stddev is probably not enough to call a 0.7% regression with good
> confidence, so I had to use *30* iterations to make measurement noise to be about
> an order of magnitude lower than the effect I'm trying to measure:
>
> Performance counter stats for 'make -j128' (30 runs):
>
> 26.334767571 seconds time elapsed (+- 0.09% )
>
> i.e. "26.334 +- 0.023" seconds is a number we can have pretty high confidence in,
> on this system.
>
> And just to demonstrate that it's all real, I repeated the whole 30-iteration
> measurement again:
>
> Performance counter stats for 'make -j128' (30 runs):
>
> 26.311166142 seconds time elapsed (+- 0.07%)
>

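The noise figures in the quoted description can be checked with simple arithmetic. The C sketch below uses hypothetical helper names and only the numbers quoted above; it converts a relative "+-" percentage into seconds and compares an effect size against the noise.

```c
/* Absolute noise in seconds for a mean and a relative "+-" percentage. */
static double demo_abs_noise(double mean_s, double rel_pct)
{
	return mean_s * rel_pct / 100.0;
}

/* How many times larger the effect is than the measurement noise. */
static double demo_effect_to_noise(double effect_pct, double noise_pct)
{
	return effect_pct / noise_pct;
}
```

demo_abs_noise(26.334767571, 0.09) comes out at about 0.024 s, in line with the quoted "26.334 +- 0.023". demo_effect_to_noise(0.7, 0.19) is below 4 for the 10-run figure, while demo_effect_to_noise(0.7, 0.09) is close to 8 for the 30-run figure, which is the "about an order of magnitude" margin the description asks for.
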
--
Thanks,
Changbin Du

2018-05-02 09:35:14

by Du, Changbin

Subject: Re: [PATCH 2/5] regulator: add dummy of_find_regulator_by_node

On Wed, May 02, 2018 at 05:40:36AM +0900, Mark Brown wrote:
> On Tue, May 01, 2018 at 09:00:11PM +0800, [email protected] wrote:
> > From: Changbin Du <[email protected]>
> >
> > If device tree is not enabled, of_find_regulator_by_node() should have
> > a dummy function since the function call is still there.
> >
> > Signed-off-by: Changbin Du <[email protected]>
>
> This appears to have no obvious connection with the cover letter for the
> series... The first question here is if this is something best fixed
> with a stub or by fixing the users - is the lack of a stub pointing out
> some bugs in them? I'm a bit worried about how we've been managing to
> avoid any build test issues here though, surely the various builders
> would have spotted a problem?

This is to fix a build error after NO_AUTO_INLINE is introduced. If this option
is enabled, GCC will not auto-inline functions that are not explicitly marked
as inline.

In this case (no CONFIG_OF), the compiler will report an error in
regulator_dev_lookup(). W/o NO_AUTO_INLINE, function of_get_regulator() is
auto-inlined and then the call to of_find_regulator_by_node() is optimized out
since of_get_regulator() always returns NULL. W/ NO_AUTO_INLINE, the return
value of of_get_regulator() is a variable, so the call to
of_find_regulator_by_node() cannot be optimized out.

static struct regulator_dev *regulator_dev_lookup(struct device *dev,
						  const char *supply)
{
	struct regulator_dev *r = NULL;
	struct device_node *node;
	struct regulator_map *map;
	const char *devname = NULL;

	regulator_supply_alias(&dev, &supply);

	/* first do a dt based lookup */
	if (dev && dev->of_node) {
		node = of_get_regulator(dev, supply);
		if (node) {
			r = of_find_regulator_by_node(node);
			if (r)
				return r;
	....

It is safe to just provide a stub of_find_regulator_by_node() when CONFIG_OF
is not set.
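
The stub pattern can be reduced to the following standalone sketch. All demo_* names and DEMO_CONFIG_OF are hypothetical; demo_get_node() is marked noinline to mimic what NO_AUTO_INLINE does to of_get_regulator(), so the compiler cannot prove that the branch is dead.

```c
#include <stddef.h>

struct demo_node { int id; };
struct demo_rdev { int id; };

#ifdef DEMO_CONFIG_OF
/* Real implementations would live in another file. */
struct demo_node *demo_get_node(int supply);
struct demo_rdev *demo_find_by_node(struct demo_node *np);
#else
/* Not inlined, so callers cannot see that the result is always NULL. */
static __attribute__((noinline)) struct demo_node *demo_get_node(int supply)
{
	(void)supply;
	return NULL;
}

/* Stub: keeps the call below buildable when the feature is compiled out. */
static inline struct demo_rdev *demo_find_by_node(struct demo_node *np)
{
	(void)np;
	return NULL;
}
#endif

static struct demo_rdev *demo_lookup(int supply)
{
	struct demo_node *node = demo_get_node(supply);

	if (node)
		return demo_find_by_node(node);
	return NULL;
}
```

Without the stub in the #else branch, demo_lookup() would reference a function that has no definition, and the build would only succeed if the optimizer happened to remove the call.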

--
Thanks,
Changbin Du

2018-05-02 09:35:43

by Du, Changbin

Subject: Re: [PATCH 4/5] kernel hacking: new config DEBUG_EXPERIENCE to apply GCC -Og optimization

On Tue, May 01, 2018 at 08:25:27AM -0700, Randy Dunlap wrote:
> Good morning.
>
> On 05/01/2018 06:00 AM, [email protected] wrote:
> > From: Changbin Du <[email protected]>
> >
> >
> > Signed-off-by: Changbin Du <[email protected]>
> > ---
> > Makefile | 4 ++++
> > include/linux/compiler-gcc.h | 2 +-
> > include/linux/compiler.h | 2 +-
> > lib/Kconfig.debug | 21 +++++++++++++++++++++
> > 4 files changed, 27 insertions(+), 2 deletions(-)
> >
>
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 90f35ad..2432e77d 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -211,6 +211,27 @@ config NO_AUTO_INLINE
> >
> > Use only if you want to debug the kernel.
> >
> > +config DEBUG_EXPERIENCE
> > + bool "Optimize for better debugging experience (-Og)"
> > + default n
> > + select NO_AUTO_INLINE
> > + depends on !CC_OPTIMIZE_FOR_SIZE
> > + help
> > + This will apply GCC '-Og' optimization level get supported from
>
> which is supported since
>
> > + GCC 4.8. This optimization level offers a reasonable level of
> > + optimization while maintaining fast compilation and a good
> > + debugging experience. It is similar to '-O1' while perfer keeping
>
> while preferring to keep
>
> > + debug ability over runtime speed. The overall performance will
> > + drop a bit.
> > +
> > + If enabling this option break your kernel, you should either
>
> breaks
>
> > + disable this or find a fix (mostly in the arch code). Currently
> > + this option has only be tested in qemu x86_64 guest.
> > +
> > + Use only if you want to debug the kernel, especially if you want
> > + to have better kernel debugging experience with gdb facilities
> > + like kgdb and qemu.
> > +
> > config ENABLE_WARN_DEPRECATED
> > bool "Enable __deprecated logic"
> > default y
> >
>
> thanks,
> --
> ~Randy

Thanks for your corrections, I will update.

--
Thanks,
Changbin Du

2018-05-02 09:37:35

by Du, Changbin

Subject: Re: [PATCH 3/5] kernel hacking: new config NO_AUTO_INLINE to disable compiler atuo-inline optimizations

On Tue, May 01, 2018 at 10:54:20AM -0400, Steven Rostedt wrote:
> On Tue, 1 May 2018 21:00:12 +0800
> [email protected] wrote:
>
> > From: Changbin Du <[email protected]>
> >
> > This patch add a new kernel hacking option NO_AUTO_INLINE. Selecting
> > this option will make compiler not auto-inline kernel functions. By
> > enabling this option, all the kernel functions (including static ones)
> > will not be optimized out except those marked as inline or always_inline.
> > This is useful when you are using ftrace to understand the control flow
> > of kernel code or tracing some static functions.
>
> I'm not against this patch, but it's up to others if this gets included
> or not.
>
> >
> > Signed-off-by: Changbin Du <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > ---
> > Makefile | 6 ++++++
> > lib/Kconfig.debug | 13 +++++++++++++
> > 2 files changed, 19 insertions(+)
> >
> > diff --git a/Makefile b/Makefile
> > index 619a85a..eb694f6 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -775,6 +775,12 @@ KBUILD_CFLAGS += $(call cc-option, -femit-struct-debug-baseonly) \
> > $(call cc-option,-fno-var-tracking)
> > endif
> >
> > +ifdef CONFIG_NO_AUTO_INLINE
> > +KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions) \
> > + $(call cc-option, -fno-inline-small-functions) \
> > + $(call cc-option, -fno-inline-functions-called-once)
> > +endif
> > +
> > ifdef CONFIG_FUNCTION_TRACER
> > ifndef CC_FLAGS_FTRACE
> > CC_FLAGS_FTRACE := -pg
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index c40c7b7..90f35ad 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -198,6 +198,19 @@ config GDB_SCRIPTS
> > instance. See Documentation/dev-tools/gdb-kernel-debugging.rst
> > for further details.
> >
> > +config NO_AUTO_INLINE
> > + bool "Disable compiler atuo-inline optimizations"
>
> typo: s/atuo/auto/
>
> > + default n
> > + help
> > + This will make compiler not auto-inline kernel functions for
> > + optimization. By enabling this option, all the kernel functions
> > + (including static ones) will not be optimized out except those
> > + marked as inline or always_inline. This is useful when you are
> > + using ftrace to understand the control flow of kernel code or
> > + tracing some static functions.
>
> Some grammar updates:
>
> This will prevent the compiler from optimizing the kernel by
> auto-inlining functions not marked with the inline keyword.
> With this option, only functions explicitly marked with
> "inline" will be inlined. This will allow the function tracer
> to trace more functions because it only traces functions that
> the compiler has not inlined.
>
> Enabling this function can help debugging a kernel if using
> the function tracer. But it can also change how the kernel
> works, because inlining functions may change the timing,
> which could make it difficult while debugging race conditions.
>

Thanks for your kind grammar updates. I will apply them. :)

> > +
> > + Use only if you want to debug the kernel.
>
> The proper way to say the above is:
>
> If unsure, select N
>
Agree.

> -- Steve
>
> > +
> > config ENABLE_WARN_DEPRECATED
> > bool "Enable __deprecated logic"
> > default y
>


--
Thanks,
Changbin Du

2018-05-02 20:31:23

by Arnd Bergmann

Subject: Re: [PATCH 4/5] kernel hacking: new config DEBUG_EXPERIENCE to apply GCC -Og optimization

On Tue, May 1, 2018 at 9:00 AM, <[email protected]> wrote:
> From: Changbin Du <[email protected]>
> +config DEBUG_EXPERIENCE
> + bool "Optimize for better debugging experience (-Og)"
> + default n
> + select NO_AUTO_INLINE
> + depends on !CC_OPTIMIZE_FOR_SIZE
> + help

How about having this as another option alongside CC_OPTIMIZE_FOR_SIZE
and CC_OPTIMIZE_FOR_PERFORMANCE in the same choice statement?

We could also add another option for -Os (for faster compiles) or possibly
for -O3 (if anyone cares) in there.

Arnd

2018-05-05 01:48:15

by Mark Brown

Subject: Applied "regulator: add dummy function of_find_regulator_by_node" to the regulator tree

The patch

regulator: add dummy function of_find_regulator_by_node

has been applied to the regulator tree at

https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 08813e0ec1cb48e53c86a24d88d26b26878e7b6e Mon Sep 17 00:00:00 2001
From: Changbin Du <[email protected]>
Date: Wed, 2 May 2018 21:44:57 +0800
Subject: [PATCH] regulator: add dummy function of_find_regulator_by_node

If device tree is not enabled, of_find_regulator_by_node() should have
a dummy function since the function call is still there.

This is to fix build error after CONFIG_NO_AUTO_INLINE is introduced.
If this option is enabled, GCC will not auto-inline functions that are
not explicitly marked as inline.

In this case (no CONFIG_OF), the compiler will report an error in function
regulator_dev_lookup().

W/O NO_AUTO_INLINE, function of_get_regulator() is auto-inlined and then
the call to of_find_regulator_by_node() is optimized out since
of_get_regulator() always returns NULL.

W/ NO_AUTO_INLINE, the return value of of_get_regulator() is a variable
so the call to of_find_regulator_by_node() cannot be optimized out. So
we need a stub of_find_regulator_by_node().

static struct regulator_dev *regulator_dev_lookup(struct device *dev,
						  const char *supply)
{
	struct regulator_dev *r = NULL;
	struct device_node *node;
	struct regulator_map *map;
	const char *devname = NULL;

	regulator_supply_alias(&dev, &supply);

	/* first do a dt based lookup */
	if (dev && dev->of_node) {
		node = of_get_regulator(dev, supply);
		if (node) {
			r = of_find_regulator_by_node(node);
			if (r)
				return r;
	...

Signed-off-by: Changbin Du <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
---
drivers/regulator/internal.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/regulator/internal.h b/drivers/regulator/internal.h
index abfd56e8c78a..24fde1e08f3a 100644
--- a/drivers/regulator/internal.h
+++ b/drivers/regulator/internal.h
@@ -56,14 +56,19 @@ static inline struct regulator_dev *dev_to_rdev(struct device *dev)
return container_of(dev, struct regulator_dev, dev);
}

-struct regulator_dev *of_find_regulator_by_node(struct device_node *np);
-
#ifdef CONFIG_OF
+struct regulator_dev *of_find_regulator_by_node(struct device_node *np);
struct regulator_init_data *regulator_of_get_init_data(struct device *dev,
const struct regulator_desc *desc,
struct regulator_config *config,
struct device_node **node);
#else
+static inline struct regulator_dev *
+of_find_regulator_by_node(struct device_node *np)
+{
+ return NULL;
+}
+
static inline struct regulator_init_data *
regulator_of_get_init_data(struct device *dev,
const struct regulator_desc *desc,
--
2.17.0