From: Austin S. Hemmelgarn <[email protected]>
This patch adds options to specifically optimize for a number of newer 64-bit microarchitectures; specifically, Intel's Nehalem, Westmere, Ivy Bridge, and Sandy Bridge, and AMD's Family 10h, Bobcat, Jaguar, Bulldozer, Piledriver, and Steamroller. This serves primarily as an attempt to render this particular sub-menu up-to-date with respect to the options offered by current versions of GCC.
Signed-off-by: Austin S. Hemmelgarn <[email protected]>
---
Based on some testing of MIVYBRIDGE, MAMDFAM10, and MPILEDRIVER (the only three that I personally have the hardware to test on), most of these should perform better than GENERIC_CPU on the right hardware, and none of them will perform any worse than GENERIC_CPU on the correct hardware. Furthermore, testing of MPILEDRIVER seems to indicate an improvement to energy efficiency over GENERIC_CPU, as it causes the on-CPU power sensor to consistently read an average of 1.5 watts lower under idle load than when using GENERIC_CPU (this corresponds to about a 5% decrease in power consumption on an idle-tickless system, and about 2% on a non-dynticks system). In addition, most of the people who would be likely to use this are almost certainly going to use it regardless, because they either don't care how much of an improvement it provides, or use Linux on such a large scale that any performance improvement is significant (e.g. if you need 1000 servers to meet load requirements, then a 1% performance boost means that you need 10 fewer servers to meet the same load).
diff -urNp linux/arch/x86/Kconfig.cpu linux-patched/arch/x86/Kconfig.cpu
--- linux/arch/x86/Kconfig.cpu 2013-10-16 16:16:06.722058808 -0400
+++ linux-patched/arch/x86/Kconfig.cpu 2013-10-16 16:29:38.532078946 -0400
@@ -158,6 +158,7 @@ config MK8
bool "Opteron/Athlon64/Hammer/K8"
---help---
Select this for an AMD Opteron or Athlon64 Hammer-family processor.
+ These processors show a CPU family of 15 in /proc/cpuinfo.
Enables use of some extended instructions, and passes appropriate
optimization flags to GCC.
@@ -269,6 +270,105 @@ config MATOM
accordingly optimized code. Use a recent GCC with specific Atom
support in order to fully benefit from selecting this option.
+config MNEHALEM
+ bool "Intel Nehalem/Westmere"
+ depends on X86_64
+ ---help---
+ Select this for first generation Core i3, i5, and i7
+ processors and other Nehalem or Westmere based processors.
+ This also includes some Xeon server processors.
+
+config MSANDYBRIDGE
+ bool "Intel Sandy Bridge"
+ depends on X86_64
+ ---help---
+ Select this for second generation Core i3, i5, and i7
+ processors and other Sandy Bridge based processors.
+ In addition, this includes some Xeon processors, and many recent
+ mobile processors branded as Pentium or Celeron.
+
+config MIVYBRIDGE
+ bool "Intel Ivy Bridge"
+ depends on X86_64
+ ---help---
+ Select this for third generation Core i3, i5, and i7 processors
+ and other Ivy Bridge based processors. This also includes some
+ Xeon, Pentium, and Celeron processors.
+
+config MAMDFAM10
+ bool "AMD Family 10h (Athlon II, Phenom II, and Opteron)"
+ depends on X86_64
+ ---help---
+ Select this for AMD Family 10h processors.
+ This includes Athlon II, Phenom II, early third-generation
+ Opterons, and a number of other Socket AM2, AM2+, AM3, and
+ Socket F processors. CPUs in this series show CPU family
+ 16 in /proc/cpuinfo.
+
+config MBOBCAT
+ bool "AMD Bobcat (C, E, G, and Z series APUs)"
+ depends on X86_64
+ ---help---
+ Select this for AMD C, E, G, and Z series APUs. These are
+ ultra low power CPU+GPU combos that are similar to the
+ Bulldozer CPU cores, but have a significantly different
+ instruction pipeline, and fewer instruction set extensions.
+
+config MJAGUAR
+ bool "AMD Jaguar (Newer E and A series APUs)"
+ depends on X86_64
+ ---help---
+ Select this for AMD Jaguar based CPUs. These are the successors
+ of the Bobcat microarchitecture. CPUs based on this microarchitecture
+ will show a CPU family of 22 in /proc/cpuinfo.
+
+config MBULLDOZER
+ bool "AMD Bulldozer (FX and Opteron)"
+ depends on X86_64
+ ---help---
+ Select this for AMD Bulldozer microarchitecture processors.
+ This includes the following CPUs:
+ FX-41x0
+ FX-61x0
+ FX-6200
+ FX-81x0
+ Opteron 32xx
+ Opteron 42xx
+ Opteron 62xx
+
+config MPILEDRIVER
+ bool "AMD Piledriver (FX, APU, and Opteron)"
+ depends on X86_64
+ ---help---
+ Select this for AMD Piledriver microarchitecture processors.
+ This includes the following CPUs:
+ 'Trinity' APUs
+ 'Richland' APUs
+ FX-43xx
+ FX-63xx
+ FX-83xx
+ FX-9370
+ FX-9590
+ Opteron 33xx
+ Opteron 43xx
+ Opteron 63xx
+
+config MSTEAMROLLER
+ bool "AMD Steamroller (Kaveri and Berlin APUs)"
+ depends on X86_64
+ ---help---
+ Select this for AMD's 'Kaveri' and 'Berlin' APUs. These are the
+ next generation of Bulldozer derived processors.
+
config GENERIC_CPU
bool "Generic-x86-64"
depends on X86_64
@@ -300,7 +400,7 @@ config X86_INTERNODE_CACHE_SHIFT
config X86_L1_CACHE_SHIFT
int
default "7" if MPENTIUM4 || MPSC
- default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
+ default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || MNEHALEM || MSANDYBRIDGE || MIVYBRIDGE || MAMDFAM10 || MBOBCAT || MJAGUAR || MBULLDOZER || MPILEDRIVER || MSTEAMROLLER || X86_GENERIC || GENERIC_CPU
default "4" if MELAN || M486 || MGEODEGX1
default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
@@ -331,7 +431,7 @@ config X86_ALIGNMENT_16
config X86_INTEL_USERCOPY
def_bool y
- depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
+ depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || MNEHALEM || MSANDYBRIDGE || MIVYBRIDGE || MAMDFAM10 || MBOBCAT || MJAGUAR || MBULLDOZER || MPILEDRIVER || MSTEAMROLLER || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
config X86_USE_PPRO_CHECKSUM
def_bool y
diff -urNp linux/arch/x86/Makefile linux-patched/arch/x86/Makefile
--- linux/arch/x86/Makefile 2013-10-16 16:16:06.738725130 -0400
+++ linux-patched/arch/x86/Makefile 2013-10-16 17:28:28.200605479 -0400
@@ -68,6 +68,24 @@ else
$(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_MNEHALEM) += $(call cc-option,-march=corei7) \
+ $(call cc-option,-mtune=corei7,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_MSANDYBRIDGE) += $(call cc-option,-march=corei7-avx) \
+ $(call cc-option,-mtune=corei7-avx,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_MIVYBRIDGE) += $(call cc-option,-march=core-avx-i) \
+ $(call cc-option,-mtune=core-avx-i,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_MAMDFAM10) += $(call cc-option,-march=amdfam10) \
+ $(call cc-option,-mtune=amdfam10,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1) \
+ $(call cc-option,-mtune=btver1,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_MJAGUAR) += $(call cc-option,-march=btver2) \
+ $(call cc-option,-mtune=btver2,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1) \
+ $(call cc-option,-mtune=bdver1,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2) \
+ $(call cc-option,-mtune=bdver2,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_MSTEAMROLLER) += $(call cc-option,-march=bdver3) \
+ $(call cc-option,-mtune=bdver3,$(call cc-option,-mtune=generic))
cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
KBUILD_CFLAGS += $(cflags-y)
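
For readers unfamiliar with the cc-option pattern used in the Makefile hunk above: each flag is only passed on if the compiler accepts it, and otherwise the fallback (if one is given) is used instead. The following is a minimal shell sketch of that behaviour, not Kbuild's actual implementation; the function name cc_option and the CC environment override are assumptions made for illustration. One consequence worth noting is that an option the compiler rejects, including a misspelled one, is dropped silently rather than reported.

```sh
#!/bin/sh
# Rough stand-in for Kbuild's cc-option (illustrative only, hypothetical name):
# print the flag in $1 if the compiler accepts it, otherwise print the
# fallback in $2 (which may be empty). CC defaults to gcc.
cc_option() {
    tmp=$(mktemp) || return 1
    if ${CC:-gcc} -Werror "$1" -c -x c /dev/null -o "$tmp" 2>/dev/null; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "${2-}"
    fi
    rm -f "$tmp"
}

# Mirror one Makefile line: try -march on its own, then -mtune with
# -mtune=generic as the fallback if the specific tuning is unsupported.
cflags="$(cc_option -march=bdver2) $(cc_option -mtune=bdver2 "$(cc_option -mtune=generic)")"
echo "$cflags"
```

Running this with an older compiler that lacks bdver2 support would be expected to print only the generic tuning flag, which is the same degradation the Makefile relies on.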
On Sun, Oct 20, 2013 at 2:37 AM, Austin S Hemmelgarn
<[email protected]> wrote:
> From: Austin S. Hemmelgarn <[email protected]>
>
> This patch adds options to specifically optimize for a number of newer 64-bit microarchitectures; specifically, Intel's Nehalem, Westmere, Ivy Bridge, and Sandy Bridge, and AMD's Family 10h, Bobcat, Jaguar, Bulldozer, Piledriver, and Steamroller. This serves primarily as an attempt to render this particular sub-menu up-to-date with respect to the options offered by current versions of GCC.
>
> Signed-Off-By: Austin S. Hemmelgarn <[email protected]>
> ---
> Based on some testing of MIVYBRIDGE, MAMDFAM10, and MPILEDRIVER (the only three that I personally have the hardware to test on), most of these should perform better than GENERIC_CPU on the right hardware, and none of them will perform any worse than GENERIC_CPU on the correct hardware. Furthermore, testing of MPILEDRIVER seems to indicate an improvement to energy efficiency over GENERIC_CPU, as it causes the on-CPU power sensor to consistently read an average of 1.5 watts lower under idle load than when using GENERIC_CPU (this corresponds to about a 5% decrease in power consumption on an idle-tickless system, and about 2% on a non-dynticks system). In addition, most of the people who would be likely to use this are almost certainly going to use it regardless, because they either don't care how much of an improvement it provides, or use Linux on such a large scale that any performance improvement is significant (e.g. if you need 1000 servers to meet load requirements, then a 1% performance boost means that you need 10 fewer servers to meet the same load).
While discussing your initial patch, Boris had reasonable doubts. Have
they all been resolved?
--
Thanks,
//richard
On Sat, Oct 19, 2013 at 08:37:29PM -0400, Austin S Hemmelgarn wrote:
> This patch adds options to specifically optimize for a number of newer
> 64-bit microarchitectures; specifically, Intel's Nehalem, Westmere,
> Ivy Bridge, and Sandy Bridge, and AMD's Family 10h, Bobcat, Jaguar,
> Bulldozer, Piledriver, and Steamroller. This serves primarily as an
> attempt to render this particular sub-menu up-to-date with respect to
> the options offered by current versions of GCC.
I'm sorry but did I miss anything from the last time where we determined
that those don't bring any sensible speedup and don't mean whit on
distros?
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
On 10/20/2013 05:18 AM, Borislav Petkov wrote:
> On Sat, Oct 19, 2013 at 08:37:29PM -0400, Austin S Hemmelgarn wrote:
>> This patch adds options to specifically optimize for a number of newer
>> 64-bit microarchitectures; specifically, Intel's Nehalem, Westmere,
>> Ivy Bridge, and Sandy Bridge, and AMD's Family 10h, Bobcat, Jaguar,
>> Bulldozer, Piledriver, and Steamroller. This serves primarily as an
>> attempt to render this particular sub-menu up-to-date with respect to
>> the options offered by current versions of GCC.
>
> I'm sorry but did I miss anything from the last time where we determined
> that those don't bring any sensible speedup and don't mean whit on
> distros?
>
> Thanks.
>
I am not trying to say that this provides any improvement in speed (or
at least the AMD options, the one for Intel Ivy Bridge processors does
appear to do better in this respect). As I stated in the comments just
before the patch itself, "...testing of MPILEDRIVER seems to indicate an
improvement to energy efficiency over GENERIC_CPU as it causes the on
cpu power sensor to consistently read an average of 1.5 watts lower
under idle load than when using GENERIC_CPU (this corresponds to about
5% decrease in power consumption on an idle-tickless system, and about 2%
on a non-dynticks system.).". While this result is dependent on a large
number of factors (not the least of which being that I have my CPU
over-clocked to the point that the lowest C-state runs at 1.4GHz, which
really hurts energy efficiency) it is still a positive result, and most
of the equivalent options primarily improve energy efficiency.
Also, you have to understand that my target audience isn't the
mainstream distros, it's HPC users, data-centers, scientific computing,
and other areas where a 0.01% increase in efficiency is significant
because if you need a million servers to do a job, a 0.01% boost in
efficiency means that you need a hundred fewer systems to do the same
amount of work.
On Sun, Oct 20, 2013 at 07:58:06PM -0400, Austin S Hemmelgarn wrote:
> I am not trying to say that this provides any improvement in speed (or
> at least the AMD options, the one for Intel Ivy Bridge processors does
> appear to do better in this respect).
"Does appear" is not a very good answer - you need to give concrete
benchmark results which people can repeat on their own and confirm your
observations.
> As I stated in the comments just before the patch itself, "...testing
> of MPILEDRIVER seems to indicate an improvement to energy efficiency
> over GENERIC_CPU as it causes the on cpu power sensor to consistently
> read an average of 1.5 watts lower under idle load than when
> using GENERIC_CPU (this corresponds to about 5% decrease in power
> consumption on an idle-tickless system, and about 2% on a non-dynticks
> system.).". While this result is dependent on a large number of
> factors (not the least of which being that I have my CPU over-clocked
> to the point that the lowest C-state runs at 1.4GHz, which really
> hurts energy efficiency) it is still a positive result, and most of
> the equivalent options primarily improve energy efficiency.
Again, how exactly do I reproduce your observations on my boxes? I need
a step-by-step walk through please.
> Also, you have to understand that my target audience isn't the
> mainstream distros,...
Well, if your target audience is only a very small fraction of the linux
users, then we don't need this in the mainline kernel in the first
place, do we?
Thanks.
On 2013-10-21 06:54, Borislav Petkov wrote:
> On Sun, Oct 20, 2013 at 07:58:06PM -0400, Austin S Hemmelgarn wrote:
>> I am not trying to say that this provides any improvement in speed (or
>> at least the AMD options, the one for Intel Ivy Bridge processors does
>> appear to do better in this respect).
>
> "Does appear" is not a very good answer - you need to give concrete
> benchmark results which people can repeat on their own and confirm your
> observations.
>
Specifically, boot time was reduced by approximately half a second
(measured as time from starting init till having a usable graphical
login), and the system ran approximately 1 degree cooler under heavy
load (namely a full GCC+binutils bootstrap with one job per virtual CPU
core).
>> As I stated in the comments just before the patch itself, "...testing
>> of MPILEDRIVER seems to indicate an improvement to energy efficiency
>> over GENERIC_CPU as it causes the on cpu power sensor to consistently
>> read an average of 1.5 watts lower under idle load than when
>> using GENERIC_CPU (this corresponds to about 5% decrease in power
>> consumption on an idle-tickless system, and about 2% on a non-dynticks
>> system.).". While this result is dependent on a large number of
>> factors (not the least of which being that I have my CPU over-clocked
>> to the point that the lowest C-state runs at 1.4GHz, which really
>> hurts energy efficiency) it is still a positive result, and most of
>> the equivalent options primarily improve energy efficiency.
>
> Again, how exactly do I reproduce your observations on my boxes? I need
> a step-by-step walk through please.
>
Using lm_sensors with CONFIG_SENSORS_FAM15H_POWER enabled, simply run the
command `sensors` a couple of times on an idle system as root comparing
the values between having CONFIG_MPILEDRIVER=y and CONFIG_GENERIC_CPU=y.
For this specific case I recorded the wattage values at one minute
intervals for 2 hours on both and took the arithmetic mean of the 120
data points in each case.
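
The sampling procedure described above can be sketched as a small shell script. This is only a sketch under assumptions: the function names mean_power and sample_power are made up for illustration, and the parser assumes the fam15h_power reading appears on a line beginning with "power1:" followed by the value in watts, which may differ with other drivers or lm_sensors versions.

```sh
#!/bin/sh
# Average every power1 reading found in `sensors` output read from stdin.
# Assumed line format: "power1:       30.12 W  (crit = 125.19 W)".
mean_power() {
    awk '$1 == "power1:" { sum += $2; n++ }
         END { if (n) printf "%.2f\n", sum / n; else print "no samples" }'
}

# Take SAMPLES readings, INTERVAL seconds apart, and print their mean.
# Defaults match the description above: 120 readings at one-minute intervals.
sample_power() {
    samples=${1:-120}
    interval=${2:-60}
    i=0
    while [ "$i" -lt "$samples" ]; do
        sensors
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done | mean_power
}
```

Running `sample_power 120 60` once on each kernel (CONFIG_MPILEDRIVER=y and CONFIG_GENERIC_CPU=y) on an otherwise idle system would give the two means to compare.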
>> Also, you have to understand that my target audience isn't the
>> mainstream distros,...
>
> Well, if your target audience is only a very small fraction of the linux
> users, then we don't need this in the mainline kernel in the first
> place, do we?
>
Just because the target audience isn't the mainstream distros doesn't
mean that it's a very small fraction of the Linux users. Based on just
the number of systems running Linux, Google is probably the biggest
user, followed closely by supercomputers. I know that most
supercomputer operators do run custom builds of the kernel because they
need maximal efficiency, and Google is also known to run custom kernel
builds to try and increase performance. These are my primary intended
audience followed closely by users of Gentoo and similar distros that
build the kernel locally in preference to distributing pre-built kernel
images.
> Thanks.
>
Something else to keep in mind, the effects of -mtune=generic change
over time, as these processors become less common, the optimizations
done by -mtune=generic will shift away from them. The reason that many
of the equivalent options in the kernel currently provide as much
benefit as they do is that gcc no longer tries to create machine code
that is tuned for them unless you tell it to.
On Mon, Oct 21, 2013 at 07:44:53AM -0400, Austin S Hemmelgarn wrote:
> Specifically, boot time was reduced by approximately half a second
> (measured as time from starting init till having a usable graphical
> login),
How did you measure that? Kernel printk timestamps? I keep repeating
this and you simply don't state your benchmarking methods clearly
enough, for some reason: I need a detailed explanation about how exactly
you're doing your measurements so that I or anyone else for that matter,
can repeat them.
> and the system ran approximately 1 degree cooler under heavy
> load (namely a full GCC+binutils bootstrap with one job per virtual
> CPU core).
Ditto.
> Using lm_sensors with CONFIG_SENSORS_FAM15H_POWER enabled, simply run the
> command `sensors` a couple of times on an idle system as root comparing
> the values between having CONFIG_MPILEDRIVER=y and CONFIG_GENERIC_CPU=y.
> For this specific case I recorded the wattage values at one minute
That's too coarse-grained since the sensors output will give your
momentary power consumption. But I see what you do here and I'll run a
modified, finer-grained version of your test on my machine too to check.
> Something else to keep in mind, the effects of -mtune=generic change
> over time, as these processors become less common, the optimizations
Which processors?
> done by -mtune=generic will shift away from them. The reason that many
> of the equivalent options in the kernel currently provide as much
> benefit as they do is that gcc no longer tries to create machine code
> that is tuned for them unless you tell it to.
This doesn't really make much sense because the single biggest build
target is distros with a single system image which is supposed to run as
optimally as possible on any x86 hardware.
So specialized builds are only for Gentoo users and others who build
customized kernels. And those who can do that, can also apply this patch
to their own tree.
On 2013-10-21 09:59, Borislav Petkov wrote:
>
> This doesn't really make much sense because the single biggest
> build target is distros with a single system image which is
> supposed to run as optimally as possible on any x86 hardware.
>
> So specialized builds are only for Gentoo users and others who
> build customized kernels. And those who can do that, can also apply
> this patch to their own tree.
>
By this same logic though, none of the x86 processor tuning options
currently in the kernel should have been put in in the first place
(and, taking it to an extreme, should in fact be removed).
On Mon, Oct 21, 2013 at 10:20:55AM -0400, Austin S Hemmelgarn wrote:
> By this same logic though, none of the x86 processor tuning options
> currently in the kernel should have been put in in the first place
> (and, taking it to an extreme, should in fact be removed).
Some of them make sense like MATOM as Atom supports a different subset
of x86 insns, MOVBE being one example. And there's no need to touch them
as unnecessary code churn is something we don't do.