2002-08-04 14:23:58

by Luca Barbieri

Subject: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

This is a revised version of a patch I posted a few months ago and
implements all the suggestions that were posted in reply and several
other things.


This patch does the following:
- Modifies the CPU choice texts to include processor features (e.g.
"586+TSC+MMX...Pentium-MMX")
- Splits Pentium2 and PentiumPro
- Splits K6 and K6-2/K6-3
- Splits Athlon and Athlon-SSE
- Splits 586 and 6x86MX
- Makes CONFIG_X86_PPRO_FENCE user-settable and disables it if the kernel
is not SMP or a CPU incompatible with the Pentium Pro is selected
- Supports F0 0F bug workaround for all kernels that don't require 686
or 3DNow!
- Defines CONFIG_X86_{686,MMX{EXT,},SSE{2,},3DNOW{EXT,}}: all except
MMXEXT are currently unused (this is the reason for splitting
Athlon-SSE, 6x86MX and Pentium2)
- Updates and adds entries in Config.help
- Compiles K6s with -march=i586 -mcpu=i686 if -march=k6{-2,} is not
available (matches insn costs more closely)
- Replaces GCC tests in Makefile with calls to new cc_test function to
reduce redundancy
- Uses -march={pentium{-mmx,2,3,4},k6{-2,},athlon{-xp,}} when
appropriate and supported, and adds -mno-{mmx,sse{2,}} to prevent
careless use of builtins and automatic use by the compiler
- Adds panics for CONFIG_X86_{PPRO_FENCE,USE_SSE_PREFETCH,686,USE_3DNOW}
in bugs.h
- Enables prefetchnta when CONFIG_X86_USE_SSE_PREFETCH is defined, which
happens for Pentium3, Pentium4 and Athlon
- Causes Athlon to use prefetchnta rather than prefetch
- Adds Makefile entry for Elan (as 486)
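
The two workaround items in the list above (the F0 0F workaround and CONFIG_X86_PPRO_FENCE) come down to a pair of boolean tests in the config.in hunk of this patch. A minimal C sketch of those tests follows; the struct and function names are hypothetical, only the conditions are taken from the patch, and the extra CONFIG_SMP dependency of the fence option is left out:

```c
#include <stdbool.h>

/* Hypothetical model of the CPU-implied options that config.in defines. */
struct cpu_opts {
    bool x86_686;          /* CONFIG_X86_686 */
    bool use_3dnow;        /* CONFIG_X86_USE_3DNOW */
    bool use_sse_prefetch; /* CONFIG_X86_USE_SSE_PREFETCH */
    bool oostore;          /* CONFIG_X86_OOSTORE */
};

/* F0 0F workaround: kept unless the kernel would panic on a Pentium anyway. */
static bool wants_f00f_workaround(const struct cpu_opts *o)
{
    return !o->x86_686 && !o->use_3dnow;
}

/* The fence option is only offered when the selected CPU could still be a Pentium Pro. */
static bool offers_ppro_fence(const struct cpu_opts *o)
{
    return !o->use_sse_prefetch && !o->use_3dnow && !o->oostore;
}
```

With these rules a Pentium-III selection (686 plus SSE prefetch) gets neither workaround, while a plain 586 selection gets the F0 0F workaround and is offered the fence option.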


I need help on these issues:
- I have read the specification update, and Intel explicitly says that UP
systems are not affected, but the use of CONFIG_X86_PPRO_FENCE in io.h might
indicate the contrary. Does anyone know whether this is the case?
(the patch assumes Intel is right)

- Is prefetchnta really better than prefetch on Athlons? (consider both
prefetch() and memcpy()) (patch assumes it is for prefetch() but doesn't
modify memcpy())

- The optimized memcpy routines seem seriously deficient: IMHO they
should be available in MMX, MMX+prefetch, MMX+prefetch+nt, SSE and SSE2
flavors rather than only MMX+prefetch and a partial version of
MMX+prefetch+nt (uses movntq but not prefetchnta). Any comments? Is
anyone doing this/did anyone already do this?

- Should I port this to 2.4?
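
For the first question, the bugs.h hunk of this patch identifies the affected parts as Intel family 6 CPUs with model number 0 or 1. A minimal C sketch of that test follows; the struct is a hypothetical stand-in for boot_cpu_data, the vendor enum is a placeholder, and the CONFIG_SMP condition is left out because whether UP systems are affected is exactly the open question:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's vendor IDs and boot_cpu_data. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

struct cpu_id {
    enum vendor vendor;
    int family; /* boot_cpu_data.x86 */
    int model;  /* boot_cpu_data.x86_model */
};

/* True for CPUs the patch treats as Pentium Pro (Intel, family 6, model <= 1). */
static bool is_pentium_pro(const struct cpu_id *c)
{
    return c->vendor == VENDOR_INTEL && c->family == 6 && c->model <= 1;
}

/* A kernel built without the fence must refuse to run on such a CPU. */
static bool must_panic(const struct cpu_id *c, bool built_with_ppro_fence)
{
    return !built_with_ppro_fence && is_pentium_pro(c);
}
```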


diffstat:
arch/i386/config.in | 106 +++++++++++++++++++++++++++++++++---------
arch/i386/Config.help | 93 +++++++++++++++++++++++++++---------
arch/i386/Makefile | 43 ++++++++++++++---
include/asm-i386/bugs.h | 23 +++++++++
include/asm-i386/processor.h | 9 ++-
arch/i386/kernel/cpu/common.c | 4 +
arch/i386/lib/mmx.c | 2
7 files changed, 224 insertions(+), 56 deletions(-)


diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/arch/i386/Config.help b/arch/i386/Config.help
--- a/arch/i386/Config.help 2002-07-20 21:11:07.000000000 +0200
+++ b/arch/i386/Config.help 2002-08-04 07:37:29.000000000 +0200
@@ -403,14 +403,18 @@
- "486" for the AMD/Cyrix/IBM/Intel 486DX/DX2/DX4 or
SL/SLC/SLC2/SLC3/SX/SX2 and UMC U5D or U5S.
- "586" for generic Pentium CPUs lacking the TSC
- (time stamp counter) register.
- - "Pentium-Classic" for the Intel Pentium.
- - "Pentium-MMX" for the Intel Pentium MMX.
- - "Pentium-Pro" for the Intel Pentium Pro/Celeron/Pentium II.
+ (time stamp counter) register and MMX instructions.
+ - 6x86MX for the 6x86MX and generic Pentium CPUs lacking the TSC
+ but with MMX instructions.
+ - "Pentium-Classic" for the Intel Pentium (586 with TSC).
+ - "Pentium-MMX" for the Intel Pentium MMX (586 with TSC+MMX).
+ - "Pentium-Pro" for the Intel Pentium Pro/Celeron.
+ - "Pentium-II" for the Intel Pentium II.
- "Pentium-III" for the Intel Pentium III
and Celerons based on the Coppermine core.
- "Pentium-4" for the Intel Pentium 4.
- - "K6" for the AMD K6, K6-II and K6-III (aka K6-3D).
+ - "K6" for the AMD K6
+ - "K6-2" for the K6-II and K6-III (aka K6-3D).
- "Athlon" for the AMD K7 family (Athlon/Duron/Thunderbird).
- "Crusoe" for the Transmeta Crusoe series.
- "Winchip-C6" for original IDT Winchip.
@@ -418,7 +422,9 @@
- "Winchip-2A" for IDT Winchips with 3dNow! capabilities.
- "CyrixIII" for VIA Cyrix III or VIA C3.

- If you don't know what to do, choose "386".
+ If you don't know what to do, type "cat /proc/cpuinfo" or open the
+ case and look at the processor. If you are building a generic
+ kernel or don't have access to the target machine, choose "386".

CONFIG_M486
Select this for a x486 processor, ether Intel or one of the
@@ -431,18 +437,38 @@
Intel 5x86 or 6x86, or the Intel 6x86MX. This choice does not
assume the RDTSC (Read Time Stamp Counter) instruction.

+CONFIG_M586MX
+ Select this for the Cyrix 6x86MX and other Pentium-class processors
+ with the MMX graphics/multimedia extended instructions but without
+ the RDTSC (Read Time Stamp Counter) instruction.
+
CONFIG_M586TSC
Select this for a Pentium Classic processor with the RDTSC (Read
Time Stamp Counter) instruction for benchmarking.

CONFIG_M586MMX
Select this for a Pentium with the MMX graphics/multimedia
- extended instructions.
+ extended instructions and with the RDTSC (Read Time Stamp Counter)
+ instruction for benchmarking.

CONFIG_M686
- Select this for a Pro/Celeron/Pentium II. This enables the use of
- Pentium Pro extended instructions, and disables the init-time guard
- against the f00f bug found in earlier Pentiums.
+ Select this for a non-Intel/AMD 686 processor without MMX
+ support. This enables the use of Pentium Pro extended instructions
+ and disables the init-time guard against the f00f bug found in
+ earlier Pentiums.
+
+CONFIG_MPENTIUMPRO
+ Select this for a Pentium Pro or Celeron. This enables the use of
+ Pentium Pro extended instructions, disables the init-time guard
+ against the f00f bug found in earlier Pentiums and enables the
+ workaround for the Pentium Pro store ordering bug.
+
+CONFIG_MPENTIUMII
+ Select this for a Pentium II or another 686 processor with MMX
+ support. This enables the use of Pentium Pro extended instructions,
+ disables the init-time guard against the f00f bug found in earlier
+ Pentiums and disables the workaround for the Pentium Pro store
+ ordering bug.

CONFIG_MPENTIUMIII
Select this for Intel chips based on the Pentium-III and
@@ -450,9 +476,10 @@
instructions, in addition to the Pentium II extensions.

CONFIG_MPENTIUM4
- Select this for Intel Pentium 4 chips. Presently these are
- treated almost like Pentium IIIs, but with a different cache
- shift.
+ Select this for Intel Pentium 4 chips. They have a different cache
+ shift and if you are compiling with GCC 3.1 or later instructions
+ will be selected and ordered specifically for maximum performance on
+ the Intel Pentium 4.

CONFIG_MCRUSOE
Select this for Transmeta Crusoe processor. Treats the processor
@@ -460,14 +487,18 @@
Pentium Pro with no alignment requirements).

CONFIG_MK6
- Select this for an AMD K6-family processor. Enables use of
- some extended instructions, and passes appropriate optimization
- flags to GCC.
+ Select this for an AMD K6 processor. Enables use of some extended
+ instructions, and passes appropriate optimization flags to GCC.
+
+CONFIG_MK6
+ Select this for an AMD K6-2/K6-3D or K6-3. Enables use of some
+ extended instructions, passes appropriate optimization flags to GCC
+ and enables 3DNow!

CONFIG_MK7
Select this for an AMD Athlon K7-family processor. Enables use of
- some extended instructions, and passes appropriate optimization
- flags to GCC.
+ some extended instructions, passes appropriate optimization flags to
+ GCC and enables 3DNow!

CONFIG_MCYRIXIII
Select this for a Cyrix III or C3 chip. Presently Linux and GCC
@@ -477,20 +508,34 @@

CONFIG_MWINCHIPC6
Select this for a IDT Winchip C6 chip. Linux and GCC
- treat this chip as a 586TSC with some extended instructions
- and alignment requirements.
+ treat this chip as a 586 with some extended instructions
+ and alignment requirements. Development kernels also enable
+ out of order memory stores for this CPU, which can increase
+ performance of some operations.

CONFIG_MWINCHIP2
Select this for a IDT Winchip-2. Linux and GCC
treat this chip as a 586TSC with some extended instructions
- and alignment requirements.
+ and alignment requirements. Development kernels also enable
+ out of order memory stores for this CPU, which can increase
+ performance of some operations.

CONFIG_MWINCHIP3D
Select this for a IDT Winchip-2A or 3. Linux and GCC
treat this chip as a 586TSC with some extended instructions
- and alignment reqirements. Development kernels also enable
- out of order memory stores for this CPU, which can increase
- performance of some operations.
+ and alignment requirements and with 3DNow! support. Development
+ kernels also enable out of order memory stores for this CPU, which
+ can increase performance of some operations.
+
+CONFIG_X86_PPRO_FENCE
+ Allows the kernel to run on Pentium Pro SMP systems by supporting a
+ workaround for the store ordering bug present on them.
+ This slows down all processors except WinChips since they already do
+ out-of-order stores.
+
+ If you don't know what to do, type "cat /proc/cpuinfo" or open the
+ case and look at the processor. If you are building a generic
+ kernel or don't have access to the target machine, choose Y.

CONFIG_VGA_CONSOLE
Saying Y here will allow you to use Linux in text mode through a
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/arch/i386/config.in b/arch/i386/config.in
--- a/arch/i386/config.in 2002-07-29 04:22:22.000000000 +0200
+++ b/arch/i386/config.in 2002-08-04 14:59:39.000000000 +0200
@@ -17,20 +17,24 @@
choice 'Processor family' \
"386 CONFIG_M386 \
486 CONFIG_M486 \
- 586/K5/5x86/6x86/6x86MX CONFIG_M586 \
- Pentium-Classic CONFIG_M586TSC \
- Pentium-MMX CONFIG_M586MMX \
- Pentium-Pro/Celeron/Pentium-II CONFIG_M686 \
- Pentium-III/Celeron(Coppermine) CONFIG_MPENTIUMIII \
+ 586/K5/5x86/6x86 CONFIG_M586 \
+ 586+MMX...6x86MX CONFIG_M586MX \
+ 586+TSC...Pentium-Classic CONFIG_M586TSC \
+ 586+TSC+MMX...Pentium-MMX CONFIG_M586MMX \
+ 686...Pentium-Pro/Celeron CONFIG_M686 \
+ 686+MMX...Pentium-II CONFIG_MPENTIUMII \
+ 686+SSE...Pentium-III/Celeron(Coppermine) CONFIG_MPENTIUMIII \
Pentium-4 CONFIG_MPENTIUM4 \
- K6/K6-II/K6-III CONFIG_MK6 \
- Athlon/Duron/K7 CONFIG_MK7 \
+ K6 CONFIG_MK6 \
+ K6-II/K6-III CONFIG_MK6II \
+ K7...Athlon/Duron CONFIG_MK7 \
+ K7+SSE...Athlon-4/XP/MP CONFIG_MK7SSE \
Elan CONFIG_MELAN \
Crusoe CONFIG_MCRUSOE \
Winchip-C6 CONFIG_MWINCHIPC6 \
- Winchip-2 CONFIG_MWINCHIP2 \
- Winchip-2A/Winchip-3 CONFIG_MWINCHIP3D \
- CyrixIII/C3 CONFIG_MCYRIXIII" Pentium-Pro
+ Winchip+TSC...Winchip-2 CONFIG_MWINCHIP2 \
+ Winchip+TSC+3DNow...Winchip-2A/Winchip-3 CONFIG_MWINCHIP3D \
+ CyrixIII/C3 CONFIG_MCYRIXIII" Pentium-III
#
# Define implied options from the CPU selection here
#
@@ -41,8 +45,6 @@
define_int CONFIG_X86_L1_CACHE_SHIFT 4
define_bool CONFIG_RWSEM_GENERIC_SPINLOCK y
define_bool CONFIG_RWSEM_XCHGADD_ALGORITHM n
- define_bool CONFIG_X86_PPRO_FENCE y
- define_bool CONFIG_X86_F00F_BUG y
else
define_bool CONFIG_X86_WP_WORKS_OK y
define_bool CONFIG_X86_INVLPG y
@@ -57,23 +59,23 @@
define_int CONFIG_X86_L1_CACHE_SHIFT 4
define_bool CONFIG_X86_USE_STRING_486 y
define_bool CONFIG_X86_ALIGNMENT_16 y
- define_bool CONFIG_X86_PPRO_FENCE y
- define_bool CONFIG_X86_F00F_BUG y
fi
if [ "$CONFIG_M586" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_USE_STRING_486 y
define_bool CONFIG_X86_ALIGNMENT_16 y
- define_bool CONFIG_X86_PPRO_FENCE y
- define_bool CONFIG_X86_F00F_BUG y
+fi
+if [ "$CONFIG_M586MX" = "y" ]; then
+ define_int CONFIG_X86_L1_CACHE_SHIFT 5
+ define_bool CONFIG_X86_USE_STRING_486 y
+ define_bool CONFIG_X86_ALIGNMENT_16 y
+ define_bool CONFIG_X86_MMX y
fi
if [ "$CONFIG_M586TSC" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_USE_STRING_486 y
define_bool CONFIG_X86_ALIGNMENT_16 y
define_bool CONFIG_X86_TSC y
- define_bool CONFIG_X86_PPRO_FENCE y
- define_bool CONFIG_X86_F00F_BUG y
fi
if [ "$CONFIG_M586MMX" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -81,33 +83,60 @@
define_bool CONFIG_X86_ALIGNMENT_16 y
define_bool CONFIG_X86_TSC y
define_bool CONFIG_X86_GOOD_APIC y
- define_bool CONFIG_X86_PPRO_FENCE y
- define_bool CONFIG_X86_F00F_BUG y
+ define_bool CONFIG_X86_MMX y
fi
if [ "$CONFIG_M686" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_TSC y
define_bool CONFIG_X86_GOOD_APIC y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
- define_bool CONFIG_X86_PPRO_FENCE y
+ define_bool CONFIG_X86_686 y
+fi
+if [ "$CONFIG_MPENTIUMII" = "y" ]; then
+ define_int CONFIG_X86_L1_CACHE_SHIFT 5
+ define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_GOOD_APIC y
+ define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+ define_bool CONFIG_X86_686 y
+ define_bool CONFIG_X86_MMX y
fi
if [ "$CONFIG_MPENTIUMIII" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_TSC y
define_bool CONFIG_X86_GOOD_APIC y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+ define_bool CONFIG_X86_686 y
+ define_bool CONFIG_X86_MMX y
+ define_bool CONFIG_X86_MMXEXT y
+ define_bool CONFIG_X86_SSE y
+ define_bool CONFIG_X86_USE_SSE_PREFETCH y
fi
if [ "$CONFIG_MPENTIUM4" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 7
define_bool CONFIG_X86_TSC y
define_bool CONFIG_X86_GOOD_APIC y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+ define_bool CONFIG_X86_686 y
+ define_bool CONFIG_X86_MMX y
+ define_bool CONFIG_X86_MMXEXT y
+ define_bool CONFIG_X86_SSE y
+ define_bool CONFIG_X86_SSE2 y
+ define_bool CONFIG_X86_USE_SSE_PREFETCH y
fi
if [ "$CONFIG_MK6" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_ALIGNMENT_16 y
define_bool CONFIG_X86_TSC y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+ define_bool CONFIG_X86_MMX y
+fi
+if [ "$CONFIG_MK6II" = "y" ]; then
+ define_int CONFIG_X86_L1_CACHE_SHIFT 5
+ define_bool CONFIG_X86_ALIGNMENT_16 y
+ define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+ define_bool CONFIG_X86_MMX y
+ define_bool CONFIG_X86_3DNOW y
fi
if [ "$CONFIG_MK7" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 6
@@ -115,6 +144,26 @@
define_bool CONFIG_X86_GOOD_APIC y
define_bool CONFIG_X86_USE_3DNOW y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+ define_bool CONFIG_X86_686 y
+ define_bool CONFIG_X86_MMX y
+ define_bool CONFIG_X86_MMXEXT y
+ define_bool CONFIG_X86_3DNOW y
+ define_bool CONFIG_X86_3DNOWEXT y
+ define_bool CONFIG_X86_USE_SSE_PREFETCH y
+fi
+if [ "$CONFIG_MK7SSE" = "y" ]; then
+ define_int CONFIG_X86_L1_CACHE_SHIFT 6
+ define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_GOOD_APIC y
+ define_bool CONFIG_X86_USE_3DNOW y
+ define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+ define_bool CONFIG_X86_686 y
+ define_bool CONFIG_X86_MMX y
+ define_bool CONFIG_X86_MMXEXT y
+ define_bool CONFIG_X86_3DNOW y
+ define_bool CONFIG_X86_3DNOWEXT y
+ define_bool CONFIG_X86_SSE y
+ define_bool CONFIG_X86_USE_SSE_PREFETCH y
fi
if [ "$CONFIG_MELAN" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 4
@@ -127,16 +176,20 @@
define_bool CONFIG_X86_ALIGNMENT_16 y
define_bool CONFIG_X86_USE_3DNOW y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+ define_bool CONFIG_X86_MMX y
+ define_bool CONFIG_X86_3DNOW y
fi
if [ "$CONFIG_MCRUSOE" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_TSC y
+ define_bool CONFIG_X86_686 y
fi
if [ "$CONFIG_MWINCHIPC6" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
define_bool CONFIG_X86_ALIGNMENT_16 y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
define_bool CONFIG_X86_OOSTORE y
+ define_bool CONFIG_X86_MMX y
fi
if [ "$CONFIG_MWINCHIP2" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -144,6 +197,7 @@
define_bool CONFIG_X86_TSC y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
define_bool CONFIG_X86_OOSTORE y
+ define_bool CONFIG_X86_MMX y
fi
if [ "$CONFIG_MWINCHIP3D" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -151,6 +205,16 @@
define_bool CONFIG_X86_TSC y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
define_bool CONFIG_X86_OOSTORE y
+ define_bool CONFIG_X86_MMX y
+ define_bool CONFIG_X86_3DNOW y
+fi
+
+# Enable workarounds if kernel would not panic on an affected processor
+if [ "$CONFIG_X86_686" != "y" -a "$CONFIG_X86_USE_3DNOW" != "y" ]; then
+ define_bool CONFIG_X86_F00F_BUG y
+fi
+if [ "$CONFIG_X86_USE_SSE_PREFETCH" != "y" -a "$CONFIG_X86_USE_3DNOW" != "y" -a "$CONFIG_X86_OOSTORE" != "y" ]; then
+ dep_bool 'Support Pentium Pro SMP and slow down all processors' CONFIG_X86_PPRO_FENCE $CONFIG_SMP
fi

bool 'Symmetric multi-processing support' CONFIG_SMP
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/arch/i386/kernel/cpu/common.c b/arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c 2002-07-29 04:18:06.000000000 +0200
+++ b/arch/i386/kernel/cpu/common.c 2002-08-04 09:58:15.000000000 +0200
@@ -299,6 +299,10 @@
clear_bit(X86_FEATURE_TSC, c->x86_capability);
#endif

+ /* Intel SSE-capable processors have all AMD MMX extensions */
+ if(test_bit(X86_FEATURE_XMM, c->x86_capability))
+ set_bit(X86_FEATURE_MMXEXT, c->x86_capability);
+
/* FXSR disabled? */
if (disable_x86_fxsr) {
clear_bit(X86_FEATURE_FXSR, c->x86_capability);
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/arch/i386/lib/mmx.c b/arch/i386/lib/mmx.c
--- a/arch/i386/lib/mmx.c 2002-07-20 21:11:20.000000000 +0200
+++ b/arch/i386/lib/mmx.c 2002-08-04 09:58:49.000000000 +0200
@@ -121,7 +121,7 @@
return p;
}

-#ifdef CONFIG_MK7
+#ifdef CONFIG_X86_MMXEXT

/*
* The K7 has streaming cache bypass load/store. The Cyrix III, K6 and
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/arch/i386/Makefile b/arch/i386/Makefile
--- a/arch/i386/Makefile 2002-07-20 21:11:13.000000000 +0200
+++ b/arch/i386/Makefile 2002-08-04 15:51:08.000000000 +0200
@@ -20,10 +20,20 @@
OBJCOPYFLAGS := -O binary -R .note -R .comment -S
LDFLAGS_vmlinux := -T arch/i386/vmlinux.lds -e stext

+__cc_test = if $(CC) $(1) -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "$(1)"; $(2)fi
+cc_test = $(call __cc_test,$(1),)
+cc_test_ = $(call __cc_test,$(1),else $(2); )
+cc_test_march = $(call cc_test_,-march=$(1),echo "-march=$(2)")
+cc_test_march3 = $(call cc_test_,-march=$(1),$(call cc_test_march,$(2),$(3)))
+
CFLAGS += -pipe

# prevent gcc from keeping the stack 16 byte aligned
-CFLAGS += $(shell if $(CC) -mpreferred-stack-boundary=2 -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-mpreferred-stack-boundary=2"; fi)
+CFLAGS += $(shell $(call cc_test,-mpreferred-stack-boundary=2))
+
+# MMX, SSE and SSE2 are disabled to prevent use by compiler (that could clobber user registers and cause unexpected oopses)
+# and because builtins must not be used until we require GCC 3.1
+CFLAGS += $(shell $(call cc_test,-mno-mmx -mno-sse -mno-sse2))

ifdef CONFIG_M386
CFLAGS += -march=i386
@@ -33,6 +43,10 @@
CFLAGS += -march=i486
endif

+ifdef CONFIG_MELAN
+CFLAGS += -march=i486
+endif
+
ifdef CONFIG_M586
CFLAGS += -march=i586
endif
@@ -41,32 +55,48 @@
CFLAGS += -march=i586
endif

+ifdef CONFIG_M586MX
+CFLAGS += $(shell $(call cc_test_march,pentium-mmx,i586))
+endif
+
ifdef CONFIG_M586MMX
-CFLAGS += -march=i586
+CFLAGS += $(shell $(call cc_test_march,pentium-mmx,i586))
endif

ifdef CONFIG_M686
CFLAGS += -march=i686
endif

+ifdef CONFIG_MPENTIUMII
+CFLAGS += $(shell $(call cc_test_march,pentium2,i686))
+endif
+
ifdef CONFIG_MPENTIUMIII
-CFLAGS += -march=i686
+CFLAGS += $(shell $(call cc_test_march,pentium3,i686))
endif

ifdef CONFIG_MPENTIUM4
-CFLAGS += -march=i686
+CFLAGS += $(shell $(call cc_test_march,pentium4,i686))
endif

ifdef CONFIG_MK6
-CFLAGS += $(shell if $(CC) -march=k6 -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-march=k6"; else echo "-march=i586"; fi)
+CFLAGS += $(shell $(call cc_test_march,k6,i586 -mcpu=i686))
+endif
+
+ifdef CONFIG_MK6II
+CFLAGS += $(shell $(call cc_test_march3,k6-2,k6,i586 -mcpu=i686))
endif

ifdef CONFIG_MK7
-CFLAGS += $(shell if $(CC) -march=athlon -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-march=athlon"; else echo "-march=i686 -malign-functions=4"; fi)
+CFLAGS += $(shell $(call cc_test_march,athlon,i686))
+endif
+
+ifdef CONFIG_MK7SSE
+CFLAGS += $(shell $(call cc_test_march3,athlon-xp,athlon,i686))
endif

ifdef CONFIG_MCRUSOE
-CFLAGS += -march=i686 -malign-functions=0 -malign-jumps=0 -malign-loops=0
+CFLAGS += $(shell $(call cc_test_march,crusoe,i686 -malign-functions=0 -malign-jumps=0 -malign-loops=0))
endif

ifdef CONFIG_MWINCHIPC6
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/include/asm-i386/bugs.h b/include/asm-i386/bugs.h
--- a/include/asm-i386/bugs.h 2002-07-20 21:11:09.000000000 +0200
+++ b/include/asm-i386/bugs.h 2002-08-04 15:27:40.000000000 +0200
@@ -171,6 +171,11 @@
panic("Kernel requires i486+ for 'invlpg' and other features");
#endif

+#if defined(CONFIG_X86_686)
+ if(boot_cpu_data.x86 < 6)
+ panic("Kernel requires 686+ for cmov");
+#endif
+
/*
* If we configured ourselves for a TSC, we'd better have one!
*/
@@ -193,6 +198,24 @@
&& (boot_cpu_data.x86_mask < 6 || boot_cpu_data.x86_mask == 11))
panic("Kernel compiled for PMMX+, assumes a local APIC without the read-before-write bug!");
#endif
+
+#if !defined(CONFIG_X86_PPRO_FENCE) && defined(CONFIG_SMP)
+ if(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL
+ && boot_cpu_data.x86 == 6
+ && boot_cpu_data.x86_model <= 1
+ )
+ panic("Kernel compiled without Pentium Pro SMP support!");
+#endif
+
+#if defined(CONFIG_X86_USE_3DNOW)
+ if(!boot_cpu_has(X86_FEATURE_3DNOW))
+ panic("Kernel requires 3DNow support (K6-2/3, Athlon)");
+#endif
+
+#if defined(CONFIG_X86_USE_SSE_PREFETCH)
+ if(!boot_cpu_has(X86_FEATURE_MMXEXT))
+ panic("Kernel requires extended MMX support (Pentium3/4, Athlon)");
+#endif
}

static void __init check_bugs(void)
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/include/asm-i386/processor.h b/include/asm-i386/processor.h
--- a/include/asm-i386/processor.h 2002-08-02 01:19:14.000000000 +0200
+++ b/include/asm-i386/processor.h 2002-08-03 13:41:03.000000000 +0200
@@ -464,7 +464,7 @@
#define cpu_relax() rep_nop()

/* Prefetch instructions for Pentium III and AMD Athlon */
-#ifdef CONFIG_MPENTIUMIII
+#ifdef CONFIG_X86_USE_SSE_PREFETCH

#define ARCH_HAS_PREFETCH
extern inline void prefetch(const void *x)
@@ -475,13 +475,16 @@
#elif CONFIG_X86_USE_3DNOW

#define ARCH_HAS_PREFETCH
-#define ARCH_HAS_PREFETCHW
-#define ARCH_HAS_SPINLOCK_PREFETCH

extern inline void prefetch(const void *x)
{
__asm__ __volatile__ ("prefetch (%0)" : : "r"(x));
}
+#endif
+
+#ifdef CONFIG_X86_USE_3DNOW
+#define ARCH_HAS_PREFETCHW
+#define ARCH_HAS_SPINLOCK_PREFETCH

extern inline void prefetchw(const void *x)
{



2002-08-04 14:33:13

by Thunder from the hill

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

Hi,

On 4 Aug 2002, Luca Barbieri wrote:
> CONFIG_MK6
> +CONFIG_MK6

This one's fishy. Replace with CONFIG_MK6II.

Thunder
--
.-../../-./..-/-..- .-./..-/.-.././.../.-.-.-

2002-08-04 14:29:12

by Alan

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

On Sun, 2002-08-04 at 15:27, Luca Barbieri wrote:
> - Makes CONFIG_X86_PPRO_FENCE user-settable and disable if not SMP or
> CPU incompatible with Pentium Pro selected

PPro fence is required for uniprocessor pentium pro.

Alan

2002-08-04 14:42:01

by Luca Barbieri

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

> PPro fence is required for uniprocessor pentium pro.

This should fix this and the CONFIG_MK6(II) issue.


diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/arch/i386/Config.help b/arch/i386/Config.help
--- a/arch/i386/Config.help 2002-08-04 16:39:50.000000000 +0200
+++ b/arch/i386/Config.help 2002-08-04 16:38:53.000000000 +0200
@@ -490,7 +490,7 @@
Select this for an AMD K6 processor. Enables use of some extended
instructions, and passes appropriate optimization flags to GCC.

-CONFIG_MK6
+CONFIG_MK6II
Select this for an AMD K6-2/K6-3D or K6-3. Enables use of some
extended instructions, passes appropriate optimization flags to GCC
and enables 3DNow!
@@ -528,7 +528,7 @@
can increase performance of some operations.

CONFIG_X86_PPRO_FENCE
- Allows the kernel to run on Pentium Pro SMP systems by supporting a
+ Allows the kernel to run on Pentium Pro systems by supporting a
workaround for the store ordering bug present on them.
This slows down all processors except WinChips since they already do
out-of-order stores.
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/arch/i386/config.in b/arch/i386/config.in
--- a/arch/i386/config.in 2002-08-04 16:39:50.000000000 +0200
+++ b/arch/i386/config.in 2002-08-04 16:35:24.000000000 +0200
@@ -214,7 +214,7 @@
define_bool CONFIG_X86_F00F_BUG y
fi
if [ "$CONFIG_X86_USE_SSE_PREFETCH" != "y" -a "$CONFIG_X86_USE_3DNOW" != "y" -a "$CONFIG_X86_OOSTORE" != "y" ]; then
- dep_bool 'Support Pentium Pro SMP and slow down all processors' CONFIG_X86_PPRO_FENCE $CONFIG_SMP
+ dep_bool 'Support Pentium Pro and slow down all processors' CONFIG_X86_PPRO_FENCE
fi

bool 'Symmetric multi-processing support' CONFIG_SMP
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/include/asm-i386/bugs.h b/include/asm-i386/bugs.h
--- a/include/asm-i386/bugs.h 2002-08-04 16:39:51.000000000 +0200
+++ b/include/asm-i386/bugs.h 2002-08-04 16:40:12.000000000 +0200
@@ -199,12 +199,12 @@
panic("Kernel compiled for PMMX+, assumes a local APIC without the read-before-write bug!");
#endif

-#if !defined(CONFIG_X86_PPRO_FENCE) && defined(CONFIG_SMP)
+#if !defined(CONFIG_X86_PPRO_FENCE)
if(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL
&& boot_cpu_data.x86 == 6
&& boot_cpu_data.x86_model <= 1
)
- panic("Kernel compiled without Pentium Pro SMP support!");
+ panic("Kernel compiled without Pentium Pro support!");
#endif

#if defined(CONFIG_X86_USE_3DNOW)



2002-08-04 15:29:26

by Sebastian Droege

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

On 04 Aug 2002 16:27:16 +0200
Luca Barbieri wrote:
[...]
> if [ "$CONFIG_MK7" = "y" ]; then
> define_int CONFIG_X86_L1_CACHE_SHIFT 6
> @@ -115,6 +144,26 @@
> define_bool CONFIG_X86_GOOD_APIC y
> define_bool CONFIG_X86_USE_3DNOW y
> define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
> + define_bool CONFIG_X86_686 y
> + define_bool CONFIG_X86_MMX y
> + define_bool CONFIG_X86_MMXEXT y
> + define_bool CONFIG_X86_3DNOW y
> + define_bool CONFIG_X86_3DNOWEXT y
> + define_bool CONFIG_X86_USE_SSE_PREFETCH y
> +fi
Hi,
is there really support for SSE prefetch in athlons _without_ SSE?!
I don't know but this seems wrong...

Bye



2002-08-04 15:39:59

by Luca Barbieri

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

> is there really support for SSE prefetch in athlons _without_ SSE?!
> I don't know but this seems wrong...
Yes, according to
<http://www.amd.com/products/cpg/athlon/techdocs/pdf/22466.pdf>.
AMD added several instructions in Athlons, including movntq, sfence and
prefetchnta/t0/t1/t2.
The last 4 instructions are what I call "SSE prefetch" (they could be
called "MMXEXT prefetch" instead, but that's not much better).
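
The relationship between the two feature sets can be sketched in C as follows. This is only an illustration: the bit values and function names are hypothetical, and the kernel keeps the real flags in c->x86_capability. It mirrors the common.c hunk (SSE implies the AMD MMX extensions) and the bugs.h check on MMXEXT:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for the kernel's capability flags. */
#define FEAT_SSE    (1u << 0) /* X86_FEATURE_XMM in the patch */
#define FEAT_MMXEXT (1u << 1) /* X86_FEATURE_MMXEXT */
#define FEAT_3DNOW  (1u << 2) /* X86_FEATURE_3DNOW */

/* SSE implies the AMD MMX extensions, as the common.c hunk assumes. */
static uint32_t normalize_caps(uint32_t caps)
{
    if (caps & FEAT_SSE)
        caps |= FEAT_MMXEXT;
    return caps;
}

/* prefetchnta/t0/t1/t2 are treated as usable whenever MMXEXT is set after normalization. */
static bool has_sse_prefetch(uint32_t caps)
{
    return (normalize_caps(caps) & FEAT_MMXEXT) != 0;
}
```

Under this model an original Athlon (MMXEXT and 3DNow!, no SSE) and a Pentium III (SSE only) both pass the check, while a K6-2 (3DNow! only) does not.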



2002-08-04 18:57:24

by J.A. Magallon

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes


On 20020804 Luca Barbieri wrote:
> This is a revised version of a patch I posted a few months ago and
> implements all the suggestions that were posted in reply and several
> other things.
>
> - Defines CONFIG_X86_{686,MMX{EXT,},SSE{2,},3DNOW{EXT,}}: all except
> MMXEXT are currently unused (this is the reason for splitting
> Athlon-SSE, 6x86MX and Pentium2)

You could also add the optimized memory barriers from Zwane Mwaikambo.
Take a look at:
http://giga.cps.unizar.es/~magallon/linux/kernel/2.4.19-jam0/22-mem-barriers.bz2


--
J.A. Magallon \ Software is like sex:
junk.able.es \ It's better when it's free
Mandrake Linux release 9.0 (Cooker) for i586
Linux 2.4.19-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-0.2mdk))

2002-08-04 20:19:54

by Luca Barbieri

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

Added, with the exception that sfence is only used if CONFIG_X86_OOSTORE
is not defined (currently never).

This patch, to be applied after the previous two, does the following:
- s/dep_bool/bool/ in config.in for CONFIG_X86_PPRO_FENCE
- Works around make xconfig brokenness in config.in
- Adds an option in config.in to select the processor to optimize for,
which determines the -mcpu flags
- Supports lfence, mfence and sfence


diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/arch/i386/Config.help b/arch/i386/Config.help
--- a/arch/i386/Config.help 2002-08-04 21:54:59.000000000 +0200
+++ b/arch/i386/Config.help 2002-08-04 21:52:41.000000000 +0200
@@ -527,6 +527,38 @@
kernels also enable out of order memory stores for this CPU, which
can increase performance of some operations.

+CONFIG_MCPU_386
+ This is the processor type the kernel will be optimized for. The
+ kernel will run on the processor you selected in the previous
+ question and processors compatible with it, but it will run faster
+ on the CPUs selected here and slower on others.
+
+ The following settings are available:
+ - "386" for the AMD/Cyrix/Intel 386DX/DXL/SL/SLC/SX, Cyrix/TI
+ 486DLC/DLC2, UMC 486SX-S and NexGen Nx586.
+ - "486" for the AMD/Cyrix/IBM/Intel 486DX/DX2/DX4 or
+ SL/SLC/SLC2/SLC3/SX/SX2 and UMC U5D or U5S.
+ - "586" for Pentium/K5/5x86/6x86.
+ - "586+MMX" for Pentium-MMX/6x86MX/CyrixIII/C3/Winchip. Currently this
+ is the same as "586".
+ - "686" for Pentium-Pro and other 686 processors.
+ - "Pentium-II" for Pentium-II. Currently this is the same as "686".
+ - "Pentium-III" for Pentium-III. Currently this is the same as "686".
+ - "Pentium-4" for the Intel Pentium 4.
+ - "K6" for the AMD K6.
+ - "K6-II/K6-III" for the AMD K6-II/K6-III processors. Currently
+ this is the same as "K6".
+ - "Athlon" for the AMD Athlon
+ - "Athlon-ThunderBird" for the AMD Athlon ThunderBird. Currently
+ this is the same as "Athlon".
+ - "Athlon-4/XP/MP" for AMD Athlon 4/XP/MP processors. Currently
+ this is the same as "Athlon".
+ - "Crusoe" for the Transmeta Crusoe.
+
+ If you don't know what to do, choose the same processor that you
+ chose in the previous question. If you still don't know what to do,
+ choose "686".
+
CONFIG_X86_PPRO_FENCE
Allows the kernel to run on Pentium Pro systems by supporting a
workaround for the store ordering bug present on them.
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/arch/i386/config.in b/arch/i386/config.in
--- a/arch/i386/config.in 2002-08-04 21:54:59.000000000 +0200
+++ b/arch/i386/config.in 2002-08-04 22:09:05.000000000 +0200
@@ -14,7 +14,7 @@

mainmenu_option next_comment
comment 'Processor type and features'
-choice 'Processor family' \
+choice 'Required processor family' \
"386 CONFIG_M386 \
486 CONFIG_M486 \
586/K5/5x86/6x86 CONFIG_M586 \
@@ -35,10 +35,33 @@
Winchip+TSC...Winchip-2 CONFIG_MWINCHIP2 \
Winchip+TSC+3DNow...Winchip-2A/Winchip-3 CONFIG_MWINCHIP3D \
CyrixIII/C3 CONFIG_MCYRIXIII" Pentium-III
+
+choice 'Optimized for processor family' \
+ "386 CONFIG_MCPU_386 \
+ 486...486/Elan CONFIG_MCPU_486 \
+ 586...Pentium/K5/etc CONFIG_MCPU_586 \
+ 586+MMX...Pentium-MMX/6x86MX/etc CONFIG_MCPU_586MMX \
+ 686...Pentium-Pro CONFIG_MCPU_686 \
+ Pentium-II CONFIG_MCPU_PENTIUMII \
+ Pentium-III CONFIG_MCPU_PENTIUMIII \
+ Pentium-4 CONFIG_MCPU_PENTIUM4 \
+ K6 CONFIG_MCPU_K6 \
+ K6-II/K6-III CONFIG_MCPU_K6II \
+ Athlon CONFIG_MCPU_ATHLON \
+ Athlon-ThunderBird CONFIG_MCPU_ATHLON_TBIRD \
+ Athlon-4/XP/MP CONFIG_MCPU_ATHLON_XP \
+ Crusoe CONFIG_MCPU_CRUSOE" 686
+
#
# Define implied options from the CPU selection here
#

+# Workaround make xconfig brokenness
+define_bool CONFIG_X86_USE_SSE_PREFETCH n
+define_bool CONFIG_X86_USE_3DNOW n
+define_bool CONFIG_X86_OOSTORE n
+define_bool CONFIG_X86_686 n
+
if [ "$CONFIG_M386" = "y" ]; then
define_bool CONFIG_X86_CMPXCHG n
define_bool CONFIG_X86_XADD n
@@ -214,7 +237,7 @@
define_bool CONFIG_X86_F00F_BUG y
fi
if [ "$CONFIG_X86_USE_SSE_PREFETCH" != "y" -a "$CONFIG_X86_USE_3DNOW" != "y" -a "$CONFIG_X86_OOSTORE" != "y" ]; then
- dep_bool 'Support Pentium Pro and slow down all processors' CONFIG_X86_PPRO_FENCE
+ bool 'Support Pentium Pro and slow down all processors' CONFIG_X86_PPRO_FENCE
fi

bool 'Symmetric multi-processing support' CONFIG_SMP
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/arch/i386/Makefile b/arch/i386/Makefile
--- a/arch/i386/Makefile 2002-08-04 16:39:51.000000000 +0200
+++ b/arch/i386/Makefile 2002-08-04 21:57:21.000000000 +0200
@@ -23,9 +23,8 @@
__cc_test = if $(CC) $(1) -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "$(1)"; $(2)fi
cc_test = $(call __cc_test,$(1),)
cc_test_ = $(call __cc_test,$(1),else $(2); )
-cc_test_march = $(call cc_test_,-march=$(1),echo "-march=$(2)")
-cc_test_march3 = $(call cc_test_,-march=$(1),$(call cc_test_march,$(2),$(3)))
-
+cc_test_o = $(call cc_test_,-$(1)=$(2),echo "-$(1)=$(3)")
+cc_test_o3 = $(call cc_test_,-$(1)=$(2),$(call cc_test_o,$(1),$(3),$(4)))
CFLAGS += -pipe

# prevent gcc from keeping the stack 16 byte aligned
@@ -56,11 +55,11 @@
endif

ifdef CONFIG_M586MX
-CFLAGS += $(shell $(call cc_test_march,pentium-mmx,i586))
+CFLAGS += $(shell $(call cc_test_o,march,pentium-mmx,i586))
endif

ifdef CONFIG_M586MMX
-CFLAGS += $(shell $(call cc_test_march,pentium-mmx,i586))
+CFLAGS += $(shell $(call cc_test_o,march,pentium-mmx,i586))
endif

ifdef CONFIG_M686
@@ -68,35 +67,35 @@
endif

ifdef CONFIG_MPENTIUMII
-CFLAGS += $(shell $(call cc_test_march,pentium2,i686))
+CFLAGS += $(shell $(call cc_test_o,march,pentium2,i686))
endif

ifdef CONFIG_MPENTIUMIII
-CFLAGS += $(shell $(call cc_test_march,pentium3,i686))
+CFLAGS += $(shell $(call cc_test_o,march,pentium3,i686))
endif

ifdef CONFIG_MPENTIUM4
-CFLAGS += $(shell $(call cc_test_march,pentium4,i686))
+CFLAGS += $(shell $(call cc_test_o,march,pentium4,i686))
endif

ifdef CONFIG_MK6
-CFLAGS += $(shell $(call cc_test_march,k6,i586 -mcpu=i686))
+CFLAGS += $(shell $(call cc_test_o,march,k6,i586))
endif

ifdef CONFIG_MK6II
-CFLAGS += $(shell $(call cc_test_march3,k6-2,k6,i586 -mcpu=i686))
+CFLAGS += $(shell $(call cc_test_o3,march,k6-2,k6,i586))
endif

ifdef CONFIG_MK7
-CFLAGS += $(shell $(call cc_test_march,athlon,i686))
+CFLAGS += $(shell $(call cc_test_o,march,athlon,i686))
endif

ifdef CONFIG_MK7SSE
-CFLAGS += $(shell $(call cc_test_march3,athlon-xp,athlon,i686))
+CFLAGS += $(shell $(call cc_test_o3,march,athlon-xp,athlon,i686))
endif

ifdef CONFIG_MCRUSOE
-CFLAGS += $(shell $(call cc_test_march,crusoe,i686 -malign-functions=0 -malign-jumps=0 -malign-loops=0))
+CFLAGS += $(shell $(call cc_test_o,march,crusoe,i686))
endif

ifdef CONFIG_MWINCHIPC6
@@ -115,6 +114,62 @@
CFLAGS += -march=i586
endif

+ifdef CONFIG_MCPU_386
+CFLAGS += -march=i386
+endif
+
+ifdef CONFIG_MCPU_486
+CFLAGS += -march=i486
+endif
+
+ifdef CONFIG_MCPU_586
+CFLAGS += -march=i586
+endif
+
+ifdef CONFIG_MCPU_586MMX
+CFLAGS += $(shell $(call cc_test_o,mcpu,pentium-mmx,i586))
+endif
+
+ifdef CONFIG_MCPU_686
+CFLAGS += -march=i686
+endif
+
+ifdef CONFIG_MCPU_PENTIUMII
+CFLAGS += $(shell $(call cc_test_o,mcpu,pentium2,i686))
+endif
+
+ifdef CONFIG_MCPU_PENTIUMIII
+CFLAGS += $(shell $(call cc_test_o,mcpu,pentium3,i686))
+endif
+
+ifdef CONFIG_MCPU_PENTIUM4
+CFLAGS += $(shell $(call cc_test_o,mcpu,pentium4,i686))
+endif
+
+ifdef CONFIG_MCPU_K6
+CFLAGS += $(shell $(call cc_test_o,mcpu,k6,i686))
+endif
+
+ifdef CONFIG_MCPU_K6II
+CFLAGS += $(shell $(call cc_test_o3,mcpu,k6-2,k6,i686))
+endif
+
+ifdef CONFIG_MCPU_ATHLON
+CFLAGS += $(shell $(call cc_test_o,mcpu,athlon,i686))
+endif
+
+ifdef CONFIG_MCPU_ATHLON_TBIRD
+CFLAGS += $(shell $(call cc_test_o3,mcpu,athlon-tbird,athlon,i686))
+endif
+
+ifdef CONFIG_MCPU_ATHLON_XP
+CFLAGS += $(shell $(call cc_test_o3,mcpu,athlon-xp,athlon,i686))
+endif
+
+ifdef CONFIG_MCPU_CRUSOE
+CFLAGS += $(shell $(call cc_test_o,mcpu,crusoe,i686 -malign-functions=0 -malign-jumps=0 -malign-loops=0))
+endif
+
HEAD := arch/i386/kernel/head.o arch/i386/kernel/init_task.o

SUBDIRS += arch/i386/kernel arch/i386/mm arch/i386/lib
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/include/asm-i386/bugs.h b/include/asm-i386/bugs.h
--- a/include/asm-i386/bugs.h 2002-08-04 21:54:59.000000000 +0200
+++ b/include/asm-i386/bugs.h 2002-08-04 22:17:22.000000000 +0200
@@ -212,10 +212,15 @@
panic("Kernel requires 3DNow support (K6-2/3, Athlon)");
#endif

-#if defined(CONFIG_X86_USE_SSE_PREFETCH)
+#if defined(CONFIG_X86_USE_SSE_PREFETCH) || (defined(CONFIG_X86_MMXEXT) && defined(CONFIG_X86_OOSTORE))
if(!boot_cpu_has(X86_FEATURE_MMXEXT))
panic("Kernel requires extended MMX support (Pentium3/4, Athlon)");
#endif
+
+#if defined(CONFIG_X86_SSE2)
+ if(!boot_cpu_has(X86_FEATURE_XMM2))
+ panic("Kernel requires SSE2 (Pentium4)");
+#endif
}

static void __init check_bugs(void)
diff --exclude-from=/home/ldb/src/linux-exclude -urNd a/include/asm-i386/system.h b/include/asm-i386/system.h
--- a/include/asm-i386/system.h 2002-07-29 04:17:20.000000000 +0200
+++ b/include/asm-i386/system.h 2002-08-04 21:56:34.000000000 +0200
@@ -282,12 +282,21 @@
* Some non intel clones support out of order store. wmb() ceases to be a
* nop for these.
*/
-
+
+#ifdef CONFIG_X86_SSE2
+#define mb() __asm__ __volatile__ ("mfence": : :"memory")
+#define rmb() __asm__ __volatile__ ("lfence": : :"memory")
+#else
#define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
#define rmb() mb()
+#endif

#ifdef CONFIG_X86_OOSTORE
+#ifdef CONFIG_X86_MMXEXT /* never happens right now */
+#define wmb() __asm__ __volatile__ ("sfence": : :"memory")
+#else
#define wmb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
+#endif
#else
#define wmb() __asm__ __volatile__ ("": : :"memory")
#endif



2002-08-04 20:29:20

by Alan

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

On Sun, 2002-08-04 at 21:23, Luca Barbieri wrote:
> Added, with the exception that sfence is only used if CONFIG_X86_OOSTORE
> is not defined (currently never).

OOSTORE should be defined for all WINCHIP family processors. We run the
winchip cpus in weak store ordered mode which requires a store fence
when we want to be sure the cpu/memory/pci view is consistent for a DMA

2002-08-04 20:32:41

by Alan

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

On Sun, 2002-08-04 at 21:23, Luca Barbieri wrote:
> Added, with the exception that sfence is only used if CONFIG_X86_OOSTORE
> is not defined (currently never).

OK, sorry, I follow what you are doing. What I don't understand is why you
are generating unneeded sfence/mfence instructions in the other cases?

When we use MMX/SSE we need the view to be consistent anyway so the
various copying routines already handle this internally.

2002-08-04 20:40:46

by Luca Barbieri

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

On Sun, 2002-08-04 at 23:54, Alan Cox wrote:
> On Sun, 2002-08-04 at 21:23, Luca Barbieri wrote:
> > Added, with the exception that sfence is only used if CONFIG_X86_OOSTORE
> > is not defined (currently never).
>
> Ok sorry I follow what you are doing. What I don't understand is why you
> are generating unneeded sfence/mfence instructions in the other cases ?
It was my fault: I explained it incorrectly. sfence is only used if both
CONFIG_X86_OOSTORE and CONFIG_MMXEXT are set, which currently never
happens with the existing processors.

> When we use MMX/SSE we need the view to be consistent anyway so the
> various copying routines already handle this internally.
That's why sfence is not used unless CONFIG_X86_OOSTORE (and
CONFIG_X86_MMXEXT) is defined.
mfence and lfence instead replace the "lock; addl $0,0(%%esp)". Is this
wrong?



2002-08-04 22:40:17

by Alan

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

On Sun, 2002-08-04 at 21:43, Luca Barbieri wrote:
> > When we use MMX/SSE we need the view to be consistent anyway so the
> > various copying routines already handle this internally.
> That's why sfence is not used unless CONFIG_X86_OOSTORE (and
> CONFIG_X86_MMXEXT) is defined.
> mfence and lfence instead replace the "lock; addl $0,0(%%esp)". Is this
> wrong?

I'm trying to understand why you think they are needed at all. Except
for code that specifically does non-temporal we don't need fences on an
X86, and the code that uses non temporal stores has its own fences built
in.

So as far as I can see the only cases we ever have to care about are

PPro - processor bug
IDT Winchip - because we run it in oostore mode, not strict x86 mode

I don't see why you are generating extra fence instructions for other
cases

2002-08-05 08:10:10

by Luca Barbieri

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

> I'm trying to understand why you think they are needed at all. Except
> for code that specifically does non-temporal we don't need fences on an
> X86, and the code that uses non temporal stores has its own fences built
> in.
>
> So as far as I can see the only cases we ever have to care about are
>
> PPro - processor bug
> IDT Winchip - because we run it in oostore mode, not strict x86 mode
>
> I don't see why you are generating extra fence instructions for other
> cases
>

(__volatile__ and the : : :"memory" clobber are omitted from the asm
statements below.)

Both without and with patch:
- barrier(): asm("")

Without patch:
- mb(): asm("lock; addl $0,0(%%esp)")
- rmb(): asm("lock; addl $0,0(%%esp)")
- wmb(): if(OOSTORE) asm("lock; addl $0,0(%%esp)") else barrier()

With patch:
- mb(): if(SSE2) asm("mfence") else asm("lock; addl $0,0(%%esp)")
- rmb(): if(SSE2) asm("lfence") else asm("lock; addl $0,0(%%esp)")
- wmb(): if(OOSTORE) {if(MMXEXT) asm("sfence")
  else asm("lock; addl $0,0(%%esp)")} else barrier()

So I'm only replacing the lock; addl $0,0(%%esp) with the Xfence
instructions which are more efficient.

As for the need for fences, based on the Intel documentation it seems
that we need read fences to read all hardware locations not mapped as
uncacheable and write fences for all memory locations mapped as write
combining.

Since drivers often map cacheable memory and then use rmb(), rmb()
cannot be made a nop.
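
The with-patch selection listed above can be restated as a small C table. This is only an illustrative sketch: struct x86_cfg and the *_insn() helpers are hypothetical names standing in for the CONFIG_X86_SSE2, CONFIG_X86_OOSTORE and CONFIG_X86_MMXEXT options, and the functions return the instruction text instead of executing it, so the sketch builds on any architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the CONFIG_X86_* options named in the thread. */
struct x86_cfg {
    bool sse2;    /* CONFIG_X86_SSE2 */
    bool oostore; /* CONFIG_X86_OOSTORE */
    bool mmxext;  /* CONFIG_X86_MMXEXT */
};

/* The asm template used by the existing mb()/rmb() definitions. */
#define LOCK_ADDL "lock; addl $0,0(%%esp)"

/* mb(): mfence when SSE2 is required, otherwise the locked add. */
static const char *mb_insn(struct x86_cfg c)
{
    return c.sse2 ? "mfence" : LOCK_ADDL;
}

/* rmb(): lfence when SSE2 is required, otherwise the locked add. */
static const char *rmb_insn(struct x86_cfg c)
{
    return c.sse2 ? "lfence" : LOCK_ADDL;
}

/* wmb(): an empty string means "compiler barrier only". */
static const char *wmb_insn(struct x86_cfg c)
{
    if (!c.oostore)
        return "";
    return c.mmxext ? "sfence" : LOCK_ADDL;
}

/* Convenience comparison for callers that want to check a choice. */
static bool insn_is(const char *insn, const char *name)
{
    return strcmp(insn, name) == 0;
}

/* Self-check of the one combination the thread calls "never happens". */
static void check_oostore_mmxext(void)
{
    struct x86_cfg future = { false, true, true };
    assert(insn_is(wmb_insn(future), "sfence"));
}
```

With these helpers, a Winchip-like configuration (oostore set, mmxext clear) keeps the locked add for wmb(), and every configuration without oostore gets the empty compiler-only barrier, which matches the list above.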



2002-08-05 08:27:06

by Alan

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

On Mon, 2002-08-05 at 09:12, Luca Barbieri wrote:
> So I'm only replacing the lock; addl $0,0(%%esp) with the Xfence
> instructions which are more efficient.

The original code has rmb not doing any kind of CPU operation, and wmb
likewise. (Quoting 2.4 and 2.5.29 here)

You don't need stronger barriers except on the Pentium Pro or the
Winchip because of the guarantees already made by the processor and by
the PCI interface.

The only case you need a store fence with non buggy/weird processors is
when you do non temporal stores. In that situation the barriers are
still not needed because the non temporal using functions already have
their own sfence instructions and need them.


#define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
#define rmb() mb()

#ifdef CONFIG_X86_OOSTORE
#define wmb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
#else
#define wmb() __asm__ __volatile__ ("": : :"memory")
#endif


For the PPro a lock addl is the most efficient one I know of for working
around the store order errata. If you want to optimise it further then
the winchip appears to be fractionally faster using an rdmsr() but that
impacts registers so wants more profiling



2002-08-05 09:30:41

by Luca Barbieri

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

On Mon, 2002-08-05 at 11:49, Alan Cox wrote:
> On Mon, 2002-08-05 at 09:12, Luca Barbieri wrote:
> > So I'm only replacing the lock; addl $0,0(%%esp) with the Xfence
> > instructions which are more efficient.
>
> The original code has rmb not doing any kind of CPU operation, and wmb
> likewise. (Quoting 2.4 and 2.5.29 here)
No. As you quote below, it does a lock addl which is a serializing
operation.

lfence is the same but doesn't serialize write operations and is
probably more efficient since it is designed for the purpose of
serializing load operations.
mfence is like lfence but also serializes writes.

However I don't have a Pentium 4 so I haven't done any checking or
benchmarks.
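
For anyone who does have the hardware, a user-space timing loop along the following lines might give a first impression. It is a rough sketch, not a kernel benchmark: the names barrier_fn, candidates, ns_per_call() and report() are made up for illustration, clock() is coarse, and an uncontended loop of fences says little about real driver or SMP workloads. The x86 variants are only compiled where the compiler targets x86 (and SSE2 for lfence/mfence).

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

typedef void (*barrier_fn)(void);

/* Baseline: compiler-only barrier, emits no instruction. */
static void compiler_barrier(void) { __asm__ __volatile__("" : : : "memory"); }

/* Portable full barrier via a GCC/Clang builtin (not what the kernel uses). */
static void builtin_full_barrier(void) { __sync_synchronize(); }

#if defined(__i386__) || defined(__x86_64__)
/* Body of the existing mb()/rmb(); the stack pointer name depends on word size. */
static void lock_addl(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" : : : "memory", "cc");
#else
    __asm__ __volatile__("lock; addl $0,0(%%esp)" : : : "memory", "cc");
#endif
}
#if defined(__SSE2__)
static void lfence(void) { __asm__ __volatile__("lfence" : : : "memory"); }
static void mfence(void) { __asm__ __volatile__("mfence" : : : "memory"); }
#endif
#endif

struct candidate { const char *name; barrier_fn fn; };

static const struct candidate candidates[] = {
    { "compiler barrier", compiler_barrier },
    { "__sync_synchronize", builtin_full_barrier },
#if defined(__i386__) || defined(__x86_64__)
    { "lock; addl", lock_addl },
#if defined(__SSE2__)
    { "lfence", lfence },
    { "mfence", mfence },
#endif
#endif
};

/* Average CPU time per call, in nanoseconds, over iters back-to-back calls. */
static double ns_per_call(barrier_fn fn, unsigned long iters)
{
    assert(iters > 0);
    clock_t start = clock();
    for (unsigned long i = 0; i < iters; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}

/* Print one line per candidate barrier. */
static void report(unsigned long iters)
{
    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
        printf("%-20s %8.2f ns/call\n", candidates[i].name,
               ns_per_call(candidates[i].fn, iters));
}
```

Calling report(10000000UL) from a main() prints one line per barrier; the numbers are only meaningful relative to each other on the same machine.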

> You don't need stronger barriers except on the Pentium Pro or the
> Winchip because of the guarantees already made by the processor and by
> the PCI interface.
>
> The only case you need a store fence with non buggy/weird processors is
> when you do non temporal stores. In that situation the barriers are
> still not needed because the non temporal using functions already have
> their own sfence instructions and need them.
I agree and the patch only adds sfence _if_ CONFIG_X86_OOSTORE is
defined (and CONFIG_X86_MMXEXT is also defined).

>
> #define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
> #define rmb() mb()
>
> #ifdef CONFIG_X86_OOSTORE
> #define wmb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
> #else
> #define wmb() __asm__ __volatile__ ("": : :"memory")
> #endif
>
>
> For the PPro a lock addl is the most efficient one I know of for working
> around the store order errata. If you want to optimise it further then
> the winchip appears to be fractionally faster using an rdmsr() but that
> impacts registers so wants more profiling
This isn't changed.




2002-08-05 09:43:02

by Alan

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

On Mon, 2002-08-05 at 10:31, Luca Barbieri wrote:
> I agree and the patch only adds sfence _if_ CONFIG_X86_OOSTORE is
> defined (and CONFIG_X86_MMXEXT is also defined)

If OOSTORE is defined then we can't safely use any MMX operations, so
this is all noise and it's still the case that no change is required.


2002-08-05 09:50:24

by Luca Barbieri

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

> If OOSTORE is defined then we can't safely use any mmx operations, so
> this is all noise and its still the case no change is required
Yes, this is only for future processors (e.g. out-of-order AMD/Intels or
Winchips with extended MMX).

So are lfence and mfence OK?



2002-08-05 10:04:07

by Alan

Subject: Re: [PATCH] [RFC] [2.5 i386] GCC 3.1 -march support, PPRO_FENCE reduction, prefetch fixes and other CPU-related changes

On Mon, 2002-08-05 at 10:53, Luca Barbieri wrote:
> > If OOSTORE is defined then we can't safely use any mmx operations, so
> > this is all noise and its still the case no change is required
> Yes, this is only for future processors (e.g. out-order AMD/Intels or
> Winchips with extended MMX).

The winchip line is dead so that bit is ok as it is. PPro is sorted with
the current code. The wmb() case therefore seems resolved already. The
guarantees already given by the processors are sufficient to ensure that
wmb is simply a compiler optimisation barrier for other cases (and
indeed the spinlock code breaks if it ceases to be true, as does an
awful lot of 'other vendors' code)
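
To illustrate the point about wmb() being only a compiler barrier, here is a minimal sketch of the usual publish/consume pattern. It assumes an x86 processor that keeps stores in program order, as described above; struct mailbox, publish() and try_consume() are hypothetical names, and on a weakly ordered CPU (or with OOSTORE) the empty asm would have to become a real fence.

```c
#include <assert.h>

/* Compiler-only barrier: emits no instruction, just stops reordering
 * of memory accesses by the compiler across this point. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* The non-OOSTORE definition of wmb() quoted in this thread. */
#define wmb() barrier()

struct mailbox {
    int payload;
    volatile int ready;
};

/* Writer: store the data first, then raise the flag. */
static void publish(struct mailbox *m, int value)
{
    m->payload = value;
    wmb(); /* keep the compiler from sinking the payload store below the flag store */
    m->ready = 1;
}

/* Reader: returns 1 and fills *out only once the flag has been seen. */
static int try_consume(struct mailbox *m, int *out)
{
    if (!m->ready)
        return 0;
    barrier(); /* do not let the compiler hoist the payload load above the flag check */
    *out = m->payload;
    return 1;
}
```

The hardware never sees the barrier here; correctness across CPUs rests entirely on the processor's own store-ordering guarantee, which is why the same code needs the locked add on the Pentium Pro and on Winchips run in oostore mode.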

For the rmb() situation then yes your fence changes may well be a win on
the PIV, and would certainly be worth benchmarking. I'd be interested to
see the numbers.

Alan