Hi,
the AMD GEode LX has an x86 id of 5 (i586) tought it's technically an i686:
root@alix:/usr/src# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 10
model name : Geode(TM) Integrated Processor by AMD PCS
stepping : 2
cpu MHz : 498.060
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx
mmxext 3dnowext 3dnow
bogomips : 996.12
clflush size : 32
cache_alignment : 32
address sizes : 32 bits physical, 32 bits virtual
power management:
indeed it has mmx, mmxext and cmov.
So I added the quirk below and I can confirm that the system is running fine.
I did an x264 encode to trigger any SIGILL due to missing opcodes, but
it worked.
I also did this simple test which gives a ~2.5x speed boost:
http://pastebin.ca/1590089
I wish some feedback from someone, I really dunno why AMD set the
cpuid value to 5
--- a/arch/x86/kernel/cpu/bugs.c 2009-10-03 03:43:02.399323313 +0200
+++ b/arch/x86/kernel/cpu/bugs.c 2009-10-03 03:54:22.582205090 +0200
@@ -151,6 +151,12 @@
#endif
}
+static void __init check_geode_i586(void)
+{
+ if(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ boot_cpu_data.x86_model == 10 && boot_cpu_data.x86 == 5)
+ boot_cpu_data.x86 = 6;
+}
void __init check_bugs(void)
{
@@ -163,6 +169,7 @@
check_fpu();
check_hlt();
check_popad();
+ check_geode_i586();
init_utsname()->machine[1] =
'0' + (boot_cpu_data.x86 > 6 ? 6 : boot_cpu_data.x86);
alternative_instructions();
On 10/02/2009 07:12 PM, Matteo Croce wrote:
> Hi,
>
> the AMD GEode LX has an x86 id of 5 (i586) tought it's technically an i686:
>
a) Wrong place - it should be in cpu/amd.c.
b) You need to also make sure that it has FCOMI and the long NOP\
instructions.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On 10/02/2009 07:12 PM, Matteo Croce wrote:
> Hi,
>
> the AMD GEode LX has an x86 id of 5 (i586) tought it's technically an i686:
>
See checkin a7ef94e6889186848573a10c5bdb8271405f44de for additional details.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Sat, Oct 3, 2009 at 4:34 AM, H. Peter Anvin <[email protected]> wrote:
> On 10/02/2009 07:12 PM, Matteo Croce wrote:
>> Hi,
>>
>> the AMD GEode LX has an x86 id of 5 (i586) tought it's technically an i686:
>>
>
> a) Wrong place - it should be in cpu/amd.c.
> b) You need to also make sure that it has FCOMI and the long NOP\
> ? instructions.
>
> ? ? ? ?-hpa
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel. ?I don't speak on their behalf.
>
>
I've run the attached files and I get this:
root@alix:/usr/src# ./longnop
Long NOPs supported: no
root@alix:/usr/src# ./fcomi
Long FCOMi supported: yes
* Matteo Croce <[email protected]> wrote:
> Hi,
>
> the AMD GEode LX has an x86 id of 5 (i586) tought it's technically an i686:
>
> root@alix:/usr/src# cat /proc/cpuinfo
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 5
> model : 10
> model name : Geode(TM) Integrated Processor by AMD PCS
> stepping : 2
> cpu MHz : 498.060
> cache size : 128 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx
> mmxext 3dnowext 3dnow
> bogomips : 996.12
> clflush size : 32
> cache_alignment : 32
> address sizes : 32 bits physical, 32 bits virtual
> power management:
>
> indeed it has mmx, mmxext and cmov.
> So I added the quirk below and I can confirm that the system is running fine.
> I did an x264 encode to trigger any SIGILL due to missing opcodes, but
> it worked.
>
> I also did this simple test which gives a ~2.5x speed boost:
> http://pastebin.ca/1590089
>
> I wish some feedback from someone, I really dunno why AMD set the
> cpuid value to 5
>
> --- a/arch/x86/kernel/cpu/bugs.c 2009-10-03 03:43:02.399323313 +0200
> +++ b/arch/x86/kernel/cpu/bugs.c 2009-10-03 03:54:22.582205090 +0200
> @@ -151,6 +151,12 @@
> #endif
> }
>
> +static void __init check_geode_i586(void)
> +{
> + if(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> + boot_cpu_data.x86_model == 10 && boot_cpu_data.x86 == 5)
> + boot_cpu_data.x86 = 6;
> +}
>
> void __init check_bugs(void)
> {
> @@ -163,6 +169,7 @@
> check_fpu();
> check_hlt();
> check_popad();
> + check_geode_i586();
> init_utsname()->machine[1] =
> '0' + (boot_cpu_data.x86 > 6 ? 6 : boot_cpu_data.x86);
> alternative_instructions();
Looks good, but your signoff line is missing.
Ingo
On Sat, Oct 3, 2009 at 9:21 AM, Ingo Molnar <[email protected]> wrote:
>
> * Matteo Croce <[email protected]> wrote:
>
>> Hi,
>>
>> the AMD GEode LX has an x86 id of 5 (i586) tought it's technically an i686:
>>
>> root@alix:/usr/src# cat /proc/cpuinfo
>> processor ? ? ? : 0
>> vendor_id ? ? ? : AuthenticAMD
>> cpu family ? ? ?: 5
>> model ? ? ? ? ? : 10
>> model name ? ? ?: Geode(TM) Integrated Processor by AMD PCS
>> stepping ? ? ? ?: 2
>> cpu MHz ? ? ? ? : 498.060
>> cache size ? ? ?: 128 KB
>> fdiv_bug ? ? ? ?: no
>> hlt_bug ? ? ? ? : no
>> f00f_bug ? ? ? ?: no
>> coma_bug ? ? ? ?: no
>> fpu ? ? ? ? ? ? : yes
>> fpu_exception ? : yes
>> cpuid level ? ? : 1
>> wp ? ? ? ? ? ? ?: yes
>> flags ? ? ? ? ? : fpu de pse tsc msr cx8 sep pge cmov clflush mmx
>> mmxext 3dnowext 3dnow
>> bogomips ? ? ? ?: 996.12
>> clflush size ? ?: 32
>> cache_alignment : 32
>> address sizes ? : 32 bits physical, 32 bits virtual
>> power management:
>>
>> indeed it has mmx, mmxext and cmov.
>> So I added the quirk below and I can confirm that the system is running fine.
>> I did an x264 encode to trigger any SIGILL due to missing opcodes, but
>> it worked.
>>
>> I also did this simple test which gives a ~2.5x speed boost:
>> http://pastebin.ca/1590089
>>
>> I wish some feedback from someone, I really dunno why AMD set the
>> cpuid value to 5
>>
>> --- a/arch/x86/kernel/cpu/bugs.c ? ? ?2009-10-03 03:43:02.399323313 +0200
>> +++ b/arch/x86/kernel/cpu/bugs.c ? ? ?2009-10-03 03:54:22.582205090 +0200
>> @@ -151,6 +151,12 @@
>> ?#endif
>> ?}
>>
>> +static void __init check_geode_i586(void)
>> +{
>> + ? ? if(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
>> + ? ? ? ?boot_cpu_data.x86_model == 10 && boot_cpu_data.x86 == 5)
>> + ? ? ? ? ? ? boot_cpu_data.x86 = 6;
>> +}
>>
>> ?void __init check_bugs(void)
>> ?{
>> @@ -163,6 +169,7 @@
>> ? ? ? check_fpu();
>> ? ? ? check_hlt();
>> ? ? ? check_popad();
>> + ? ? check_geode_i586();
>> ? ? ? init_utsname()->machine[1] =
>> ? ? ? ? ? ? ? '0' + (boot_cpu_data.x86 > 6 ? 6 : boot_cpu_data.x86);
>> ? ? ? alternative_instructions();
>
> Looks good, but your signoff line is missing.
>
> ? ? ? ?Ingo
>
Sorry Ingo but if you read my last mail I wrote that the GEODE does
supports FCOMI but not NOPL
so i don't know if it's worth it, even is I haven't ANY binary in my
system with such instruction:
# find /lib/tls/i686 /lib/i686 -type f |xargs objdump -d |grep -c nopl
0
On 10/03/2009 02:53 AM, Matteo Croce wrote:
>
> Sorry Ingo but if you read my last mail I wrote that the GEODE does
> supports FCOMI but not NOPL
> so i don't know if it's worth it, even is I haven't ANY binary in my
> system with such instruction:
>
> # find /lib/tls/i686 /lib/i686 -type f |xargs objdump -d |grep -c nopl
> 0
Recent gcc/binutils has been known to start to use them.
Either way, it's a user-space-visible difference, so at least the kernel
will not signal anything else based on existing precedent.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Sat, Oct 3, 2009 at 4:12 PM, H. Peter Anvin <[email protected]> wrote:
> On 10/03/2009 02:53 AM, Matteo Croce wrote:
>>
>> Sorry Ingo but if you read my last mail I wrote that the GEODE does
>> supports FCOMI but not NOPL
>> so i don't know if it's worth it, even is I haven't ANY binary in my
>> system with such instruction:
>>
>> # find /lib/tls/i686 /lib/i686 -type f |xargs objdump -d |grep -c nopl
>> 0
>
> Recent gcc/binutils has been known to start to use them.
>
> Either way, it's a user-space-visible difference, so at least the kernel
> will not signal anything else based on existing precedent.
So it can't be applied, right?
On Sat, 3 Oct 2009 04:12:47 +0200
Matteo Croce <[email protected]> wrote:
>
> I also did this simple test which gives a ~2.5x speed boost:
> http://pastebin.ca/1590089
the number that is exposed in /proc/cpuinfo does not control
if cmov is used... so the patch and the benchmark are totally unrelated.
Anything that needs to know if cmov can be used will need to read the
flags line..
(since technically cmov is also optional for generation 6 and later)
While I can't say anything against your patch, I do need to say that it
is something purely cosmetic on first sight...
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Sat, Oct 3, 2009 at 8:05 PM, Arjan van de Ven <[email protected]> wrote:
> On Sat, 3 Oct 2009 04:12:47 +0200
> Matteo Croce <[email protected]> wrote:
>>
>> I also did this simple test which gives a ~2.5x speed boost:
>> http://pastebin.ca/1590089
>
> the number that is exposed in /proc/cpuinfo does not control
> if cmov is used... so the patch and the benchmark are totally unrelated.
>
> Anything that needs to know if cmov can be used will need to read the
> flags line..
>
> (since technically cmov is also optional for generation 6 and later)
>
> While I can't say anything against your patch, I do need to say that it
> is something purely cosmetic on first sight...
>
> --
> Arjan van de Ven ? ? ? ?Intel Open Source Technology Centre
> For development, discussion and tips for power savings,
> visit http://www.lesswatts.org
>
yes cmov and i686 are unrelated but some distros (Debian & co.)
compiles libraries for two archs:
i486 and i686-cmov so only i686 machines does benefit from cmov.
This is why I cared about having an i686 aware userspace but it seems
that for a missing instruction I can't do that
On Sun, Oct 04, 2009 at 12:04:03AM +0200, Matteo Croce wrote:
> yes cmov and i686 are unrelated but some distros (Debian & co.)
> compiles libraries for two archs:
> i486 and i686-cmov so only i686 machines does benefit from cmov.
> This is why I cared about having an i686 aware userspace but it seems
> that for a missing instruction I can't do that
You could emulate the missing instruction. As long as it is not really
used, the performance cost of the emulation should not matter. Of course
if userspace starts to use long NOPs extensively then it becomes a
problem.
Gabor
--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------
On Sun, Oct 4, 2009 at 12:32 AM, Gabor Gombas <[email protected]> wrote:
> On Sun, Oct 04, 2009 at 12:04:03AM +0200, Matteo Croce wrote:
>
>> yes cmov and i686 are unrelated but some distros (Debian & co.)
>> compiles libraries for two archs:
>> i486 and i686-cmov so only i686 machines does benefit from cmov.
>> This is why I cared about having an i686 aware userspace but it seems
>> that for a missing instruction I can't do that
>
> You could emulate the missing instruction. As long as it is not really
> used, the performance cost of the emulation should not matter. Of course
> if userspace starts to use long NOPs extensively then it becomes a
> problem.
Emulate? I can replace it with two plain NOPs.
How can I do the "emulation"? trapping SIGILL?
On Sun, 4 Oct 2009 00:04:03 +0200
Matteo Croce <[email protected]> wrote:
> >
> > the number that is exposed in /proc/cpuinfo does not control
> > if cmov is used... so the patch and the benchmark are totally
> > unrelated.
> >
>
> yes cmov and i686 are unrelated but some distros (Debian & co.)
> compiles libraries for two archs:
> i486 and i686-cmov so only i686 machines does benefit from cmov.
> This is why I cared about having an i686 aware userspace but it seems
> that for a missing instruction I can't do that
I'm surprised that "i686" in debian depends on the family flag
in /proc/cpuinfo.. but hey.. weirder things have been done.
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Sun, Oct 04, 2009 at 12:54:03AM +0200, Matteo Croce wrote:
> Emulate? I can replace it with two plain NOPs.
> How can I do the "emulation"? trapping SIGILL?
No, emulate it in the kernel, so userspace never notices.
Gabor
--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------
> I'm surprised that "i686" in debian depends on the family flag
> in /proc/cpuinfo.. but hey.. weirder things have been done.
Its compensating for the old gcc bugs where gcc "i686" generated cmov
instructions without any checks whether the CPU supported cmov (which is
optional for a 686 architecture)
RPM has (or had) similar hacks. Both arguably come about from fundamental
design thinkos in that they treat architecture as "special", not simply
as a set of dependancies (needs x86, x86-cmov, glibc x86-32, ...) as
should hve been done and which would also have made emulators just work
out of the box instead of the current mess.
Alan
On Sat, Oct 3, 2009 at 8:21 AM, Ingo Molnar <[email protected]> wrote:
>
> * Matteo Croce <[email protected]> wrote:
>
>> Hi,
>>
>> the AMD GEode LX has an x86 id of 5 (i586) tought it's technically an i686:
>>
>> root@alix:/usr/src# cat /proc/cpuinfo
>> processor ? ? ? : 0
>> vendor_id ? ? ? : AuthenticAMD
>> cpu family ? ? ?: 5
>> model ? ? ? ? ? : 10
>> model name ? ? ?: Geode(TM) Integrated Processor by AMD PCS
>> stepping ? ? ? ?: 2
>> cpu MHz ? ? ? ? : 498.060
>> cache size ? ? ?: 128 KB
>> fdiv_bug ? ? ? ?: no
>> hlt_bug ? ? ? ? : no
>> f00f_bug ? ? ? ?: no
>> coma_bug ? ? ? ?: no
>> fpu ? ? ? ? ? ? : yes
>> fpu_exception ? : yes
>> cpuid level ? ? : 1
>> wp ? ? ? ? ? ? ?: yes
>> flags ? ? ? ? ? : fpu de pse tsc msr cx8 sep pge cmov clflush mmx
>> mmxext 3dnowext 3dnow
>> bogomips ? ? ? ?: 996.12
>> clflush size ? ?: 32
>> cache_alignment : 32
>> address sizes ? : 32 bits physical, 32 bits virtual
>> power management:
>>
>> indeed it has mmx, mmxext and cmov.
>> So I added the quirk below and I can confirm that the system is running fine.
>> I did an x264 encode to trigger any SIGILL due to missing opcodes, but
>> it worked.
>>
>> I also did this simple test which gives a ~2.5x speed boost:
>> http://pastebin.ca/1590089
>>
>> I wish some feedback from someone, I really dunno why AMD set the
>> cpuid value to 5
>>
>> --- a/arch/x86/kernel/cpu/bugs.c ? ? ?2009-10-03 03:43:02.399323313 +0200
>> +++ b/arch/x86/kernel/cpu/bugs.c ? ? ?2009-10-03 03:54:22.582205090 +0200
>> @@ -151,6 +151,12 @@
>> ?#endif
>> ?}
>>
>> +static void __init check_geode_i586(void)
>> +{
>> + ? ? if(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
>> + ? ? ? ?boot_cpu_data.x86_model == 10 && boot_cpu_data.x86 == 5)
>> + ? ? ? ? ? ? boot_cpu_data.x86 = 6;
>> +}
>>
>> ?void __init check_bugs(void)
>> ?{
>> @@ -163,6 +169,7 @@
>> ? ? ? check_fpu();
>> ? ? ? check_hlt();
>> ? ? ? check_popad();
>> + ? ? check_geode_i586();
>> ? ? ? init_utsname()->machine[1] =
>> ? ? ? ? ? ? ? '0' + (boot_cpu_data.x86 > 6 ? 6 : boot_cpu_data.x86);
>> ? ? ? alternative_instructions();
>
> Looks good, but your signoff line is missing.
>
> ? ? ? ?Ingo
>
The AMD Geode LX has an x86 id of 5 (i586) tought it's technically an i686:
root@alix:~# egrep '^(cpu family|model name|flags)' /proc/cpuinfo
cpu family : 5
model name : Geode(TM) Integrated Processor by AMD PCS
flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx
mmxext 3dnowext 3dnow
indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
This patch adds a quirck to promote the Geode to an i686 and emulates
the NOPL in the do_invalid_op trap, so the userspace never notices.
Emulating the NOPL has minimum performance loss, emulating a NOPL
takes 0.5 usecs
and they are rarely used in x86
Signed-off-by: Matteo Croce <[email protected]>
--- a/arch/x86/kernel/Makefile 2009-11-06 15:06:52.246223989 +0100
+++ b/arch/x86/kernel/Makefile 2009-11-06 15:07:04.294054613 +0100
@@ -89,7 +89,7 @@
obj-$(CONFIG_HPET_TIMER) += hpet.o
obj-$(CONFIG_K8_NB) += k8.o
-obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o
+obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o nopl_emu.o
obj-$(CONFIG_DEBUG_RODATA_TEST) += test_rodata.o
obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o
--- a/arch/x86/kernel/cpu/amd.c 2009-11-06 15:06:52.254223805 +0100
+++ b/arch/x86/kernel/cpu/amd.c 2009-11-06 15:07:04.294054613 +0100
@@ -138,8 +138,10 @@
}
if (c->x86_model == 10) {
- /* AMD Geode LX is model 10 */
- /* placeholder for any needed mods */
+ /* Geode only lacks the NOPL instruction to be i686,
+ but we can emulate it in the exception handler
+ and promote it to a class 6 cpu */
+ boot_cpu_data.x86 = 6;
return;
}
}
--- a/arch/x86/kernel/entry_32.S 2009-11-06 15:06:52.258224172 +0100
+++ b/arch/x86/kernel/entry_32.S 2009-11-06 15:07:04.306230613 +0100
@@ -901,7 +901,11 @@
RING0_INT_FRAME
pushl $0
CFI_ADJUST_CFA_OFFSET 4
+#ifdef CONFIG_MGEODE_LX
+ pushl $do_nopl_emu
+#else
pushl $do_invalid_op
+#endif
CFI_ADJUST_CFA_OFFSET 4
jmp error_code
CFI_ENDPROC
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ b/arch/x86/kernel/nopl_emu.c 2009-11-06 15:07:33.537723795 +0100
@@ -0,0 +1,102 @@
+/*
+ * linux/arch/x86/kernel/nopl_emu.c
+ *
+ * Copyright (C) 2002 Willy Tarreau
+ * Copyright (C) 2009 Matteo Croce
+ */
+
+#include <linux/sched.h>
+#include <linux/linkage.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+void do_invalid_op(struct pt_regs *regs, long error_code);
+
+/* This code can be used to allow the AMD Geode to hopefully correctly execute
+ * some code which was originally compiled for an i686, by emulating NOPL,
+ * the only missing i686 instruction in the CPU
+ *
+ * Willy Tarreau <[email protected]>
+ * Matteo Croce <[email protected]>
+ */
+
+static inline int do_1f(u8 *ip)
+{
+ int length = 3;
+ switch(*ip) {
+ case 0x84:if(!ip[5])
+ length++;
+ else
+ return 0;
+ case 0x80:if(!ip[4] && !ip[3])
+ length += 2;
+ else
+ return 0;
+ case 0x44:if(!ip[2])
+ length++;
+ else
+ return 0;
+ case 0x40:if(!ip[1])
+ length++;
+ else
+ return 0;
+ case 0x00:return length;
+ default: return 0;
+ }
+ return length;
+}
+
+static inline int do_0f(u8 *ip)
+{
+ if(*ip == 0x1f)
+ return do_1f(ip + 1);
+ return 0;
+}
+
+static inline int do_66(u8 *ip)
+{
+ if(*ip == 0x90)
+ return 2;
+ if(*ip == 0x0f) {
+ int res = do_0f(ip + 1);
+ if(res)
+ return res + 1;
+ else
+ return 0;
+ }
+ return 0;
+}
+
+static inline int do_start(u8 *ip)
+{
+ if(*ip == 0x0f)
+ return do_0f(ip + 1);
+ if(*ip == 0x66)
+ return do_66(ip + 1);
+ return 0;
+}
+
+/* [do_nopl_emu] is called by exception 6 after an invalid opcode has been
+ * encountered. It will try to emulate it by doing nothing,
+ * and will send a SIGILL or SIGSEGV to the process if not possible.
+ * the NOPL can have variable length opcodes:
+
+bytes number opcode
+ 2 66 90
+ 3 0f 1f 00
+ 4 0f 1f 40 00
+ 5 0f 1f 44 00 00
+ 6 66 0f 1f 44 00 00
+ 7 0f 1f 80 00 00 00 00
+ 8 0f 1f 84 00 00 00 00 00
+ 9 66 0f 1f 84 00 00 00 00 00
+*/
+void do_nopl_emu(struct pt_regs *regs, long error_code)
+{
+ int res = do_start((u8*)regs->ip);
+
+ if(res)
+ regs->ip += res;
+ else
+ do_invalid_op(regs, error_code);
+}
On Fri, 6 Nov 2009 15:59:16 +0100 Matteo Croce wrote:
> On Sat, Oct 3, 2009 at 8:21 AM, Ingo Molnar <[email protected]> wrote:
> >
> > * Matteo Croce <[email protected]> wrote:
[snip]
> > Looks good, but your signoff line is missing.
> >
> > Ingo
> >
>
> The AMD Geode LX has an x86 id of 5 (i586) tought it's technically an
> i686:
>
> root@alix:~# egrep '^(cpu family|model name|flags)' /proc/cpuinfo
> cpu family : 5
> model name : Geode(TM) Integrated Processor by AMD PCS
> flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx
> mmxext 3dnowext 3dnow
>
> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction
> (NOPL).
> This patch adds a quirck to promote the Geode to an i686 and emulates
> the NOPL in the do_invalid_op trap, so the userspace never notices.
> Emulating the NOPL has minimum performance loss, emulating a NOPL
> takes 0.5 usecs
> and they are rarely used in x86
>
> Signed-off-by: Matteo Croce <[email protected]>
>
> --- a/arch/x86/kernel/Makefile 2009-11-06 15:06:52.246223989 +0100
> +++ b/arch/x86/kernel/Makefile 2009-11-06 15:07:04.294054613 +0100
> @@ -89,7 +89,7 @@
> obj-$(CONFIG_HPET_TIMER) += hpet.o
>
> obj-$(CONFIG_K8_NB) += k8.o
> -obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o
> +obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o nopl_emu.o
> obj-$(CONFIG_DEBUG_RODATA_TEST) += test_rodata.o
> obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o
>
> --- a/arch/x86/kernel/cpu/amd.c 2009-11-06 15:06:52.254223805 +0100
> +++ b/arch/x86/kernel/cpu/amd.c 2009-11-06 15:07:04.294054613 +0100
> @@ -138,8 +138,10 @@
> }
>
> if (c->x86_model == 10) {
> - /* AMD Geode LX is model 10 */
> - /* placeholder for any needed mods */
> + /* Geode only lacks the NOPL instruction to be i686,
> + but we can emulate it in the exception handler
> + and promote it to a class 6 cpu */
> + boot_cpu_data.x86 = 6;
> return;
> }
> }
> --- a/arch/x86/kernel/entry_32.S 2009-11-06 15:06:52.258224172 +0100
> +++ b/arch/x86/kernel/entry_32.S 2009-11-06 15:07:04.306230613 +0100
> @@ -901,7 +901,11 @@
> RING0_INT_FRAME
> pushl $0
> CFI_ADJUST_CFA_OFFSET 4
> +#ifdef CONFIG_MGEODE_LX
> + pushl $do_nopl_emu
> +#else
> pushl $do_invalid_op
> +#endif
> CFI_ADJUST_CFA_OFFSET 4
> jmp error_code
> CFI_ENDPROC
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ b/arch/x86/kernel/nopl_emu.c 2009-11-06 15:07:33.537723795 +0100
> @@ -0,0 +1,102 @@
> +/*
> + * linux/arch/x86/kernel/nopl_emu.c
> + *
> + * Copyright (C) 2002 Willy Tarreau
> + * Copyright (C) 2009 Matteo Croce
> + */
> +
> +#include <linux/sched.h>
> +#include <linux/linkage.h>
> +#include <linux/preempt.h>
> +#include <linux/types.h>
> +
> +void do_invalid_op(struct pt_regs *regs, long error_code);
> +
> +/* This code can be used to allow the AMD Geode to hopefully correctly
> execute
> + * some code which was originally compiled for an i686, by emulating
> NOPL,
> + * the only missing i686 instruction in the CPU
> + *
> + * Willy Tarreau <[email protected]>
> + * Matteo Croce <[email protected]>
> + */
> +
> +static inline int do_1f(u8 *ip)
> +{
> + int length = 3;
> + switch(*ip) {
> + case 0x84:if(!ip[5])
> + length++;
> + else
> + return 0;
> + case 0x80:if(!ip[4] && !ip[3])
> + length += 2;
> + else
> + return 0;
> + case 0x44:if(!ip[2])
> + length++;
> + else
> + return 0;
> + case 0x40:if(!ip[1])
> + length++;
> + else
> + return 0;
> + case 0x00:return length;
> + default: return 0;
> + }
> + return length;
> +}
> +
> +static inline int do_0f(u8 *ip)
> +{
> + if(*ip == 0x1f)
> + return do_1f(ip + 1);
> + return 0;
> +}
> +
> +static inline int do_66(u8 *ip)
> +{
> + if(*ip == 0x90)
> + return 2;
> + if(*ip == 0x0f) {
> + int res = do_0f(ip + 1);
> + if(res)
> + return res + 1;
> + else
> + return 0;
> + }
> + return 0;
> +}
> +
> +static inline int do_start(u8 *ip)
> +{
> + if(*ip == 0x0f)
> + return do_0f(ip + 1);
> + if(*ip == 0x66)
> + return do_66(ip + 1);
> + return 0;
> +}
> +
> +/* [do_nopl_emu] is called by exception 6 after an invalid opcode has
> been
> + * encountered. It will try to emulate it by doing nothing,
> + * and will send a SIGILL or SIGSEGV to the process if not possible.
> + * the NOPL can have variable length opcodes:
> +
> +bytes number opcode
> + 2 66 90
> + 3 0f 1f 00
> + 4 0f 1f 40 00
> + 5 0f 1f 44 00 00
> + 6 66 0f 1f 44 00 00
> + 7 0f 1f 80 00 00 00 00
> + 8 0f 1f 84 00 00 00 00 00
> + 9 66 0f 1f 84 00 00 00 00 00
> +*/
> +void do_nopl_emu(struct pt_regs *regs, long error_code)
> +{
> + int res = do_start((u8*)regs->ip);
> +
> + if(res)
> + regs->ip += res;
> + else
> + do_invalid_op(regs, error_code);
> +}
> --
Looks good? checkpatch.pl has a very different opinion.
WARNING: externs should be avoided in .c files
#56: FILE: arch/x86/kernel/nopl_emu.c:13:
+void do_invalid_op(struct pt_regs *regs, long error_code);
ERROR: switch and case should be at the same indent
#69: FILE: arch/x86/kernel/nopl_emu.c:26:
+ switch(*ip) {
+ case 0x84:if(!ip[5])
[...]
+ case 0x80:if(!ip[4] && !ip[3])
[...]
+ case 0x44:if(!ip[2])
[...]
+ case 0x40:if(!ip[1])
[...]
+ case 0x00:return length;
+ default: return 0;
ERROR: space required before the open parenthesis '('
#69: FILE: arch/x86/kernel/nopl_emu.c:26:
+ switch(*ip) {
ERROR: space required before the open parenthesis '('
#70: FILE: arch/x86/kernel/nopl_emu.c:27:
+ case 0x84:if(!ip[5])
ERROR: trailing statements should be on next line
#70: FILE: arch/x86/kernel/nopl_emu.c:27:
+ case 0x84:if(!ip[5])
ERROR: space required before the open parenthesis '('
#74: FILE: arch/x86/kernel/nopl_emu.c:31:
+ case 0x80:if(!ip[4] && !ip[3])
ERROR: trailing statements should be on next line
#74: FILE: arch/x86/kernel/nopl_emu.c:31:
+ case 0x80:if(!ip[4] && !ip[3])
ERROR: space required before the open parenthesis '('
#78: FILE: arch/x86/kernel/nopl_emu.c:35:
+ case 0x44:if(!ip[2])
ERROR: trailing statements should be on next line
#78: FILE: arch/x86/kernel/nopl_emu.c:35:
+ case 0x44:if(!ip[2])
ERROR: space required before the open parenthesis '('
#82: FILE: arch/x86/kernel/nopl_emu.c:39:
+ case 0x40:if(!ip[1])
ERROR: trailing statements should be on next line
#82: FILE: arch/x86/kernel/nopl_emu.c:39:
+ case 0x40:if(!ip[1])
ERROR: space required before the open parenthesis '('
#94: FILE: arch/x86/kernel/nopl_emu.c:51:
+ if(*ip == 0x1f)
ERROR: space required before the open parenthesis '('
#101: FILE: arch/x86/kernel/nopl_emu.c:58:
+ if(*ip == 0x90)
ERROR: space required before the open parenthesis '('
#103: FILE: arch/x86/kernel/nopl_emu.c:60:
+ if(*ip == 0x0f) {
ERROR: space required before the open parenthesis '('
#105: FILE: arch/x86/kernel/nopl_emu.c:62:
+ if(res)
ERROR: space required before the open parenthesis '('
#115: FILE: arch/x86/kernel/nopl_emu.c:72:
+ if(*ip == 0x0f)
ERROR: space required before the open parenthesis '('
#117: FILE: arch/x86/kernel/nopl_emu.c:74:
+ if(*ip == 0x66)
ERROR: "(foo*)" should be "(foo *)"
#139: FILE: arch/x86/kernel/nopl_emu.c:96:
+ int res = do_start((u8*)regs->ip);
ERROR: space required before the open parenthesis '('
#141: FILE: arch/x86/kernel/nopl_emu.c:98:
+ if(res)
ERROR: Missing Signed-off-by: line(s)
total: 19 errors, 1 warnings, 133 lines checked
This patch has style problems, please review.
If any of these errors are false positives report them
to the maintainer, see CHECKPATCH in MAINTAINERS.
--
DSL-Preisknaller: DSL Komplettpakete von GMX schon f?r
16,99 Euro mtl.!* Hier klicken: http://portal.gmx.net/de/go/dsl02
> Looks good? checkpatch.pl has a very different opinion.
Firstly please learn to trim your email
Secondly Ingo knows how to operate checkpatch and trivial style bits like
that are irrelevant to meaningful discussion about code
Alan
On Fri, Nov 6, 2009 at 4:59 PM, Alan Cox <[email protected]> wrote:
>> Looks good? checkpatch.pl has a very different opinion.
>
> Firstly please learn to trim your email
> Secondly Ingo knows how to operate checkpatch and trivial style bits like
> that are irrelevant to meaningful discussion about code
>
>
> Alan
>
I fixed the style issues with checkpatch.pl, here is it.
The AMD Geode LX has an x86 id of 5 (i586) tought it's technically an i686:
root@alix:~# egrep '^(cpu family|model name|flags)' /proc/cpuinfo
cpu family : 5
model name : Geode(TM) Integrated Processor by AMD PCS
flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx
mmxext 3dnowext 3dnow
indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
This patch adds a quirck to promote the Geode to an i686 and emulates
the NOPL in the do_invalid_op trap, so the userspace never notices.
Emulating the NOPL has minimum performance loss, emulating a NOPL
takes 0.5 usecs
and they are rarely used in x86
Signed-off-by: Matteo Croce <[email protected]>
--- a/arch/x86/kernel/Makefile 2009-11-06 15:06:52.246223989 +0100
+++ b/arch/x86/kernel/Makefile 2009-11-06 15:07:04.294054613 +0100
@@ -89,7 +89,7 @@
obj-$(CONFIG_HPET_TIMER) += hpet.o
obj-$(CONFIG_K8_NB) += k8.o
-obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o
+obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o nopl_emu.o
obj-$(CONFIG_DEBUG_RODATA_TEST) += test_rodata.o
obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o
--- a/arch/x86/kernel/cpu/amd.c 2009-11-06 15:06:52.254223805 +0100
+++ b/arch/x86/kernel/cpu/amd.c 2009-11-06 15:07:04.294054613 +0100
@@ -138,8 +138,10 @@
}
if (c->x86_model == 10) {
- /* AMD Geode LX is model 10 */
- /* placeholder for any needed mods */
+ /* Geode only lacks the NOPL instruction to be i686,
+ but we can emulate it in the exception handler
+ and promote it to a class 6 cpu */
+ boot_cpu_data.x86 = 6;
return;
}
}
--- a/arch/x86/kernel/entry_32.S 2009-11-06 15:06:52.258224172 +0100
+++ b/arch/x86/kernel/entry_32.S 2009-11-06 15:07:04.306230613 +0100
@@ -901,7 +901,11 @@
RING0_INT_FRAME
pushl $0
CFI_ADJUST_CFA_OFFSET 4
+#ifdef CONFIG_MGEODE_LX
+ pushl $do_nopl_emu
+#else
pushl $do_invalid_op
+#endif
CFI_ADJUST_CFA_OFFSET 4
jmp error_code
CFI_ENDPROC
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ b/arch/x86/kernel/nopl_emu.c 2009-11-06 17:36:40.684361766 +0100
@@ -0,0 +1,106 @@
+/*
+ * linux/arch/x86/kernel/nopl_emu.c
+ *
+ * Copyright (C) 2002 Willy Tarreau
+ * Copyright (C) 2009 Matteo Croce
+ */
+
+#include <linux/sched.h>
+#include <linux/linkage.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+void do_invalid_op(struct pt_regs *regs, long error_code);
+
+/* This code can be used to allow the AMD Geode to hopefully correctly execute
+ * some code which was originally compiled for an i686, by emulating NOPL,
+ * the only missing i686 instruction in the CPU
+ *
+ * Willy Tarreau <[email protected]>
+ * Matteo Croce <[email protected]>
+ */
+
+static inline int do_1f(u8 *ip)
+{
+ int length = 3;
+ switch (*ip) {
+ case 0x84:
+ if (!ip[5])
+ length++;
+ else
+ return 0;
+ case 0x80:
+ if (!ip[4] && !ip[3])
+ length += 2;
+ else
+ return 0;
+ case 0x44:
+ if (!ip[2])
+ length++;
+ else
+ return 0;
+ case 0x40:
+ if (!ip[1])
+ length++;
+ else
+ return 0;
+ case 0x00:return length;
+ default: return 0;
+ }
+ return length;
+}
+
+static inline int do_0f(u8 *ip)
+{
+ if (*ip == 0x1f)
+ return do_1f(ip + 1);
+ return 0;
+}
+
+static inline int do_66(u8 *ip)
+{
+ if (*ip == 0x90)
+ return 2;
+ if (*ip == 0x0f) {
+ int res = do_0f(ip + 1);
+ if (res)
+ return res + 1;
+ else
+ return 0;
+ }
+ return 0;
+}
+
+static inline int do_start(u8 *ip)
+{
+ if (*ip == 0x0f)
+ return do_0f(ip + 1);
+ if (*ip == 0x66)
+ return do_66(ip + 1);
+ return 0;
+}
+
+/* [do_nopl_emu] is called by exception 6 after an invalid opcode has been
+ * encountered. It will try to emulate it by doing nothing,
+ * and will send a SIGILL or SIGSEGV to the process if not possible.
+ * the NOPL can have variable length opcodes:
+
+bytes number opcode
+ 2 66 90
+ 3 0f 1f 00
+ 4 0f 1f 40 00
+ 5 0f 1f 44 00 00
+ 6 66 0f 1f 44 00 00
+ 7 0f 1f 80 00 00 00 00
+ 8 0f 1f 84 00 00 00 00 00
+ 9 66 0f 1f 84 00 00 00 00 00
+*/
+void do_nopl_emu(struct pt_regs *regs, long error_code)
+{
+ int res = do_start((u8 *)regs->ip);
+
+ if (res)
+ regs->ip += res;
+ else
+ do_invalid_op(regs, error_code);
+}
On 11/06/2009 06:59 AM, Matteo Croce wrote:
> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
I'm somewhat wondering about the general value of this patch; is i686
code really that much faster on Geode that it's worth it?
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Fri, 6 Nov 2009 15:59:37 Alan Cox wrote:
> > Looks good? checkpatch.pl has a very different opinion.
>
> Firstly please learn to trim your email
No, this email contained comments regarding the code.
If it wasn't riddled with 19 errors (not bad for only 133 lines),
I would have bothered to remove these irrelevant lines.
So this advises goes right out of the window.
> Secondly Ingo knows how to operate checkpatch and trivial style bits
> like that are irrelevant to meaningful discussion about code.
Oh, I'm deeply sorry Sir Cox,
I was unaware of the fact that Ingo is just one of your checkpatch minions!?
There's a document, you should have heard of before,
Documentation/SubmittingPatches and it states: "
4) Style check your changes.
Check your patch for basic style violations, details of which can be
found in Documentation/CodingStyle. Failure to do so simply wastes
the reviewers time and will get your patch rejected, probably
without even being read.
At a minimum you should check your patches with the patch style
checker prior to submission (scripts/checkpatch.pl). You should
be able to justify all violations that remain in your patch."
Style checks are indeed part of the job of submitting a patch.
It's supposed to make life easier for the Maintainers, so
they only need to add the SOB-line. Instead of wasting their
precious time with really trivia checkpatch.pl fixes.
If you don't like these guidelines,
I'm sure you can call for one of your other
minions e.g. linus to change that for your majesty.
---
Anyway, Matteo Croce reacted quickly and posted a followup.
Well done!
--
GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT!
Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01
> If it wasn't riddled with 19 errors (not bad for only 133 lines),
> I would have bothered to remove these irrelevant lines.
Checkpatch is just formatting - its just an aide nothing more. It's not
remotely useful to bother with them for stuff that is basically sanely
formatted until such point as someone is actually sure the patch is worth
going into the tree.
On Fri, 6 Nov 2009 18:22:18 Alan Cox wrote:
> > If it wasn't riddled with 19 errors (not bad for only 133 lines),
> > I would have bothered to remove these irrelevant lines.
>
> Checkpatch is just formatting - its just an aide nothing more.
> It's not remotely useful to bother with them for stuff that is
> basically sanely formatted until such point as someone is actually
> sure the patch is worth going into the tree.
the utility is called checkpatch and not checkstyle or checkformatting.
And there's a good reason behind this decision, because it does
more than just checking style.
e.g:
- correct use of some blackfin hi/lo macros.
- if certain data structures are declared as const
(struct seq_operations/file_operations)
- correct use of NR_CPUS is usually wrong
- complains about in_atomic() outside core kernel code
- warns about LINUX_VERSION_CODE, #if 0,
volatile or deprecated functions.
- informs about needless kfree/usb_free_urb checks
- etc...
and I'm sure that future modifications will add more
useful functionality _checks_ to many more _common pitfalls_
areas.
--
DSL-Preisknaller: DSL Komplettpakete von GMX schon f?r
16,99 Euro mtl.!* Hier klicken: http://portal.gmx.net/de/go/dsl02
On Fri, Nov 6, 2009 at 5:44 PM, H. Peter Anvin <[email protected]> wrote:
> On 11/06/2009 06:59 AM, Matteo Croce wrote:
>> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
>
> MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
> I'm somewhat wondering about the general value of this patch; is i686
> code really that much faster on Geode that it's worth it?
>
> ? ? ? ?-hpa
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel. ?I don't speak on their behalf.
>
>
yes, I did some test like gzip, bzip2, lame etc and they give more or less
the same results of dhrystone
root@alix:/usr/src# CFLAGS='-march=i586' ./dry.c
gcc -c -O3 -march=i586 ./dry.c -o dry1.o
gcc -DPASS2 -O3 -march=i586 ./dry.c dry1.o -o dry2
Dhrystone Benchmark, Version C, Version 2.2
Program compiled without 'register' attribute
Using times(), HZ=100
Trying 5000000 runs through Dhrystone:
Microseconds for one run through Dhrystone: 1.4
Dhrystones per Second: 740741
root@alix:/usr/src# CFLAGS='-march=i686' ./dry.c
gcc -c -O3 -march=i686 ./dry.c -o dry1.o
gcc -DPASS2 -O3 -march=i686 ./dry.c dry1.o -o dry2
Dhrystone Benchmark, Version C, Version 2.2
Program compiled without 'register' attribute
Using times(), HZ=100
Trying 5000000 runs through Dhrystone:
Microseconds for one run through Dhrystone: 1.2
Dhrystones per Second: 841751
On Fri, 6 Nov 2009 21:02:59 Alan Cox wrote:
> ** Plonk ***
*** Plonk ***
here, Sir, I have that posting fixed that for you.
--
GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT!
Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01
"Martin Schleier" <[email protected]> writes:
> e.g:
> - correct use of some blackfin hi/lo macros.
> - if certain data structures are declared as const
> (struct seq_operations/file_operations)
> - correct use of NR_CPUS is usually wrong
> - complains about in_atomic() outside core kernel code
> - warns about LINUX_VERSION_CODE, #if 0,
> volatile or deprecated functions.
> - informs about needless kfree/usb_free_urb checks
> - etc...
Did the patch in question contain such problems?
--
Krzysztof Halasa
Sat, 07 Nov 2009 00:05:12 Krzystof Halasa
> "Martin Schleier" <[email protected]> writes:
>On Fri, 6 Nov 2009 18:22:18 Alan Cox wrote:
> > > If it wasn't riddled with 19 errors (not bad for only 133 lines),
> > > I would have bothered to remove these irrelevant lines.
> >
> > Checkpatch is just formatting - its just an aide nothing more.
> > It's not remotely useful to bother with them for stuff that is
> > basically sanely formatted until such point as someone is actually
> > sure the patch is worth going into the tree.
> >
> > the utility is called checkpatch and not checkstyle or
> > checkformatting.
> > And there's a good reason behind this decision, because it does
> > more than just checking style.
> >
> > e.g:
> > - correct use of some blackfin hi/lo macros.
> > - if certain data structures are declared as const
> > (struct seq_operations/file_operations)
> > - correct use of NR_CPUS is usually wrong
> > - complains about in_atomic() outside core kernel code
> > - warns about LINUX_VERSION_CODE, #if 0,
> > volatile or deprecated functions.
> > - informs about needless kfree/usb_free_urb checks
> > - etc...
> >
> > and I'm sure that future modifications will add more
> >useful functionality _checks_ to many more _common pitfalls_
> >areas.
>
> Did the patch in question contain such problems?
the last point:
- etc... =>
"WARNING: externs should be avoided in .c files
#56: FILE: arch/x86/kernel/nopl_emu.c:13:
+void do_invalid_op(struct pt_regs *regs, long error_code);" ?
(or do you think that this is a formatting issue?!)
a grep will give you a header file where it is defined:
"arch/x86/include/asm/traps.h"
dotraplinkage void do_invalid_op(struct pt_regs *, long);
anyway, in case we get more followers here. I put your question back
in context of the original response. Because this discussion-branch was
not about arguing about nopl emulation, since - apparently - nothing
was/is wrong with the code itself.
Instead, we ended up here because of:
Fri, 6 Nov 2009 15:59:37 Alan Cox wrote:
"Secondly Ingo knows how to operate checkpatch and trivial style bits like
that are irrelevant to meaningful discussion about code."
And this is clearly not the case. It is the job of a Submitter
(as described in Documentations/SubmittingPatches section 4)
to check and test his patches with tools like checkpatch or sparse
before posting them.
After all this patch is going into /arch/x86 and not /drivers/staging
and this sort of "extern declaration" is prone to break one day when
void do_invalid_op(struct pt_regs *, long); declaration is modified.
--
GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT!
Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01
On Fri, 6 Nov 2009 23:18:06 +0100
Matteo Croce <[email protected]> wrote:
> On Fri, Nov 6, 2009 at 5:44 PM, H. Peter Anvin <[email protected]> wrote:
> > On 11/06/2009 06:59 AM, Matteo Croce wrote:
> >> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
> >
> > MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
cmov is not i686 either. Its an optional extension that *should* only be
used after you test its available. So gcc "i686" isn't quite "686". There
are of course good reasons for that choice.
> yes, I did some test like gzip, bzip2, lame etc and they give more or less
> the same results of dhrystone
How does that compare to i486. Certainly the old Nat Semi geode seemed to
prefer to be fed i486 code to i586 (and wouldn't of course hack i686).
You might also want to play around with -mtune= as well as arch= before
assuming why i686 is a win
(I still btw think the patch is a good idea, it simplifies life
enormously for users with the CPU)
Alan
"Martin Schleier" <[email protected]> writes:
>> Did the patch in question contain such problems?
> the last point:
> - etc... =>
Yeah.
> "WARNING: externs should be avoided in .c files
Ironically, it's the only "WARNING" while the rest are "ERRORS".
OTOH I personally believe all output from checkpatch should be labeled
"WARNING"; it's not for checkpatch to decide. It's only a tool.
> #56: FILE: arch/x86/kernel/nopl_emu.c:13:
> +void do_invalid_op(struct pt_regs *regs, long error_code);" ?
> (or do you think that this is a formatting issue?!)
Actually, I think it wasn't any issue at all at this point, when it
wasn't yet established if the patch makes sense at all.
> It is the job of a Submitter
> (as described in Documentations/SubmittingPatches section 4)
> to check and test his patches with tools like checkpatch or sparse
> before posting them.
You apparently forgot what SubmittingPatches file is all about:
"This text is a collection of suggestions which can greatly increase the
chances of your change being accepted."
You know, we don't have laws for everything here. And we're not
androids specialized in producing C code. We are supposed to use some
common sense first.
> After all this patch is going into /arch/x86 and not /drivers/staging
> and this sort of "extern declaration" is prone to break one day when
> void do_invalid_op(struct pt_regs *, long); declaration is modified.
That's true, though it's the same for "staging".
--
Krzysztof Halasa
On Sat, Nov 7, 2009 at 1:05 AM, Martin Schleier <[email protected]> wrote:
> Sat, 07 Nov 2009 00:05:12 Krzystof Halasa
>> "Martin Schleier" <[email protected]> writes:
>>On Fri, 6 Nov 2009 18:22:18 Alan Cox wrote:
>> > > If it wasn't riddled with 19 errors (not bad for only 133 lines),
>> > > I would have bothered to remove these irrelevant lines.
>> >
>> > Checkpatch is just formatting - its just an aide nothing more.
>> > It's not remotely useful to bother with them for stuff that is
>> > basically sanely formatted until such point as someone is actually
>> > sure the patch is worth going into the tree.
>> >
>> > the utility is called checkpatch and not checkstyle or
>> > checkformatting.
>> > And there's a good reason behind this decision, because it does
>> > more than just checking style.
>> >
>> > e.g:
>> > - correct use of some blackfin hi/lo macros.
>> > - if certain data structures are declared as const
>> > ? (struct seq_operations/file_operations)
>> > - correct use of NR_CPUS is usually wrong
>> > - complains about in_atomic() outside core kernel code
>> > - warns about LINUX_VERSION_CODE, #if 0,
>> > ? volatile or deprecated functions.
>> > - informs about needless kfree/usb_free_urb checks
>> > - etc...
>> >
>> > and I'm sure that future modifications will add more
>> >useful functionality _checks_ to many more _common pitfalls_
>> >areas.
>>
>> Did the patch in question contain such problems?
> the last point:
> ?- etc... =>
>
> "WARNING: externs should be avoided in .c files
> #56: FILE: arch/x86/kernel/nopl_emu.c:13:
> +void do_invalid_op(struct pt_regs *regs, long error_code);" ?
> (or do you think that this is a formatting issue?!)
>
> a grep will give you a header file where it is defined:
> "arch/x86/include/asm/traps.h"
> dotraplinkage void do_invalid_op(struct pt_regs *, long);
>
> anyway, in case we get more followers here. I put your question back
> in context of the original response. Because this discussion-branch was
> not about arguing about nopl emulation, since - apparently - nothing
> was/is wrong with the code itself.
>
> Instead, we ended up here because of:
>
> Fri, 6 Nov 2009 15:59:37 Alan Cox wrote:
> "Secondly Ingo knows how to operate checkpatch and trivial style bits like
> that are irrelevant to meaningful discussion about code."
>
> And this is clearly not the case. It is the job of a Submitter
> (as described in Documentations/SubmittingPatches section 4)
> to check and test his patches with tools like checkpatch or sparse
> before posting them.
>
> After all this patch is going into /arch/x86 and not /drivers/staging
> and this sort of "extern declaration" is prone to break one day when
> void do_invalid_op(struct pt_regs *, long); declaration is modified.
> --
> GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT!
> Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01
>
This one is perfect according to checkpatch.pl
The AMD Geode LX has an x86 id of 5 (i586) tought it's technically an i686:
root@alix:~# egrep '^(cpu family|model name|flags)' /proc/cpuinfo
cpu family : 5
model name : Geode(TM) Integrated Processor by AMD PCS
flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx
mmxext 3dnowext 3dnow
indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
This patch adds a quirck to promote the Geode to an i686 and emulates
the NOPL in the do_invalid_op trap, so the userspace never notices.
Emulating the NOPL has minimum performance loss, emulating a NOPL
takes 0.5 usecs
and they are rarely used in x86
Signed-off-by: Matteo Croce <[email protected]
--- a/arch/x86/kernel/Makefile 2009-11-06 15:06:52.246223989 +0100
+++ b/arch/x86/kernel/Makefile 2009-11-06 15:07:04.294054613 +0100
@@ -89,7 +89,7 @@
obj-$(CONFIG_HPET_TIMER) += hpet.o
obj-$(CONFIG_K8_NB) += k8.o
-obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o
+obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o nopl_emu.o
obj-$(CONFIG_DEBUG_RODATA_TEST) += test_rodata.o
obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o
--- a/arch/x86/kernel/cpu/amd.c 2009-11-06 15:06:52.254223805 +0100
+++ b/arch/x86/kernel/cpu/amd.c 2009-11-06 15:07:04.294054613 +0100
@@ -138,8 +138,10 @@
}
if (c->x86_model == 10) {
- /* AMD Geode LX is model 10 */
- /* placeholder for any needed mods */
+ /* Geode only lacks the NOPL instruction to be i686,
+ but we can emulate it in the exception handler
+ and promote it to a class 6 cpu */
+ boot_cpu_data.x86 = 6;
return;
}
}
--- a/arch/x86/kernel/entry_32.S 2009-11-06 15:06:52.258224172 +0100
+++ b/arch/x86/kernel/entry_32.S 2009-11-06 15:07:04.306230613 +0100
@@ -901,7 +901,11 @@
RING0_INT_FRAME
pushl $0
CFI_ADJUST_CFA_OFFSET 4
+#ifdef CONFIG_MGEODE_LX
+ pushl $do_nopl_emu
+#else
pushl $do_invalid_op
+#endif
CFI_ADJUST_CFA_OFFSET 4
jmp error_code
CFI_ENDPROC
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ b/arch/x86/kernel/nopl_emu.c 2009-11-07 11:59:17.667748571 +0100
@@ -0,0 +1,103 @@
+/*
+ * linux/arch/x86/kernel/nopl_emu.c
+ *
+ * Copyright (C) 2002 Willy Tarreau
+ * Copyright (C) 2009 Matteo Croce
+ */
+
+#include <linux/linkage.h>
+#include <asm/math_emu.h>
+#include <asm/traps.h>
+
+/* This code can be used to allow the AMD Geode to hopefully correctly execute
+ * some code which was originally compiled for an i686, by emulating NOPL,
+ * the only missing i686 instruction in the CPU
+ *
+ * Willy Tarreau <[email protected]>
+ * Matteo Croce <[email protected]>
+ */
+
+static inline int do_1f(u8 *ip)
+{
+ int length = 3;
+ switch (*ip) {
+ case 0x84:
+ if (!ip[5])
+ length++;
+ else
+ return 0;
+ case 0x80:
+ if (!ip[4] && !ip[3])
+ length += 2;
+ else
+ return 0;
+ case 0x44:
+ if (!ip[2])
+ length++;
+ else
+ return 0;
+ case 0x40:
+ if (!ip[1])
+ length++;
+ else
+ return 0;
+ case 0x00:
+ return length;
+ }
+ return 0;
+}
+
+static inline int do_0f(u8 *ip)
+{
+ if (*ip == 0x1f)
+ return do_1f(ip + 1);
+ return 0;
+}
+
+static inline int do_66(u8 *ip)
+{
+ if (*ip == 0x90)
+ return 2;
+ if (*ip == 0x0f) {
+ int res = do_0f(ip + 1);
+ if (res)
+ return res + 1;
+ else
+ return 0;
+ }
+ return 0;
+}
+
+static inline int do_start(u8 *ip)
+{
+ if (*ip == 0x0f)
+ return do_0f(ip + 1);
+ if (*ip == 0x66)
+ return do_66(ip + 1);
+ return 0;
+}
+
+/* [do_nopl_emu] is called by exception 6 after an invalid opcode has been
+ * encountered. It will try to emulate it by doing nothing,
+ * and will send a SIGILL or SIGSEGV to the process if not possible.
+ * the NOPL can have variable length opcodes:
+
+bytes number opcode
+ 2 66 90
+ 3 0f 1f 00
+ 4 0f 1f 40 00
+ 5 0f 1f 44 00 00
+ 6 66 0f 1f 44 00 00
+ 7 0f 1f 80 00 00 00 00
+ 8 0f 1f 84 00 00 00 00 00
+ 9 66 0f 1f 84 00 00 00 00 00
+*/
+void do_nopl_emu(struct pt_regs *regs, long error_code)
+{
+ int res = do_start((u8 *)instruction_pointer(regs));
+
+ if (res)
+ regs->ip += res;
+ else
+ do_invalid_op(regs, error_code);
+}
On Sat, 07 Nov 2009 11:37:46 Krzysztof Halasa
> "Martin Schleier" <[email protected]> writes:
> >> Did the patch in question contain such problems?
> > the last point:
> > - etc... =>
>
> Yeah.
>
> > "WARNING: externs should be avoided in .c files
>
> Ironically, it's the only "WARNING" while the rest are "ERRORS".
> OTOH I personally believe all output from checkpatch should be labeled
> "WARNING"; it's not for checkpatch to decide. It's only a tool.
Ironically, I assumed that these matters are taken somewhat serious
and everything would be fine after the respin...
But instead, all everyone (except the submitter) is barking whenever checkpatch.pl is irrelevant - because apparently it only catches formatting errors -
(BTW: I think this message should be an ERROR, because it
can really break Randy Dulap's massive kernel compile tests)
> > #56: FILE: arch/x86/kernel/nopl_emu.c:13:
> > +void do_invalid_op(struct pt_regs *regs, long error_code);" ?
> > (or do you think that this is a formatting issue?!)
>
> Actually, I think it wasn't any issue at all at this point, when it
> wasn't yet established if the patch makes sense at all.
here's the quote from on which the comment was based:
| On Sat, Oct 3, 2009 at 8:21 AM, Ingo Molnar <[email protected]> wrote:
|
| Looks good, but your signoff line is missing.
|
| Ingo
now tell me: What is the word he was using to say that the idea
needs _rethinking_ and that he's declined to merge the patch
in the foreseeable future because of these shortcomings?
I can't see them, but I would be delighted if you can
point them out to me.
The discussion whenever this feature make sense has
taken place _a bit_ earlier in the thread with a _positive_ result.
(if you look at the date: the thread started over a month ago:
http://lkml.org/lkml/2009/10/2/464 )
so I'm not sure if everyone was aware of this,
since this might explain the _differences_.
> > It is the job of a Submitter
> > (as described in Documentations/SubmittingPatches section 4)
> > to check and test his patches with tools like checkpatch or sparse
> > before posting them.
>
> You apparently forgot what SubmittingPatches file is all about:
>
> "This text is a collection of suggestions which can greatly increase the
> chances of your change being accepted."
> You know, we don't have laws for everything here.
> And we're not androids specialized in producing C code.
> We are supposed to use some common sense first.
Ahh common sense, so checking & testing your work
before submitting it not _common sense_ anymore?
Surely it's hard for anyone new to know about this before
hitting "send". But so far everyone has stumbled across this. :\
But back to the topic about laws.
What about: "12) Sign your work" in the same SubmittingPatches file?
Have you noticed that only SubmittingPatches talks about the signed-off-by?
And we all know that every patch has to have one.
So Clearly SubmittingPatches really contains LAWs for how to do things.
The only question is if "4) Style check your changes." is a
guideline or a _more_... And there's a paragraph in Documentation/SubmitChecklist:"
5: Check your patch for general style as detailed in
Documentation/CodingStyle. Check for trivial violations with the
patch style checker prior to submission (scripts/checkpatch.pl).
[BOLD] You should be able to justify all violations that remain in
your patch. [BOLD]"
Andy Whitcroft <[email protected]> clearly wrote that down.
(And there's no point in arguing about checkpatch.pl when you
have to JUSTIFY ALL REMAINING VIOLATIONS, or more to the point:
FIX THEM INSTEAD.)
And now my - head hurts - we need a lawyer to answer if this
IS or IS NOT a law before we can bang on with this.
And yes Documentation/SubmitChecklist also has the same _header_:
"Here are some basic things that developers should do if they want to see their kernel patch submissions accepted more quickly."
I know about that and I agree: time is always an issue.
This cycle is already @ -rc6 (rc5) and given that the debating
started over a month ago it's really time to get cracking...
Thankfully v3 is already available, and even better: fixed :-).
> > After all this patch is going into /arch/x86 and not /drivers/staging
> > and this sort of "extern declaration" is prone to break one day when
> > void do_invalid_op(struct pt_regs *, long); declaration is modified.
>
> That's true, though it's the same for "staging".
AFAIK the only rule for staging is: it must compile (somehow).
but I'm sure we can ask greg here if there are uncertainties.
--
DSL-Preisknaller: DSL Komplettpakete von GMX schon f?r
16,99 Euro mtl.!* Hier klicken: http://portal.gmx.net/de/go/dsl02
"Martin Schleier" <[email protected]> writes:
> And now my - head hurts - we need a lawyer to answer if this
> IS or IS NOT a law before we can bang on with this.
Good luck. I have no more questions.
--
Krzysztof Halasa
On 11/07/2009 03:11 AM, Matteo Croce wrote:
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ b/arch/x86/kernel/nopl_emu.c 2009-11-07 11:59:17.667748571 +0100
> @@ -0,0 +1,103 @@
> +/*
> + * linux/arch/x86/kernel/nopl_emu.c
> + *
> + * Copyright (C) 2002 Willy Tarreau
> + * Copyright (C) 2009 Matteo Croce
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/math_emu.h>
> +#include <asm/traps.h>
> +
> +/* This code can be used to allow the AMD Geode to hopefully correctly execute
> + * some code which was originally compiled for an i686, by emulating NOPL,
> + * the only missing i686 instruction in the CPU
> + *
> + * Willy Tarreau <[email protected]>
> + * Matteo Croce <[email protected]>
> + */
> +
If we're doing to introduce a missed-instruction interpreter (which is
what this is) in the kernel, it needs to handle all the subtleties of
x86 execution correctly; in particular I believe it needs to check the
code segment limits, permissions, and mode. Things it doesn't
understand it can SIGILL (or, if more appropriate, SIGSEGV) on, of course.
Personally I think the easiest is to verify that the code segment is
flat 32 bits or even more specifically CS == USER_CS, and SIGILL otherwise.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
See comment below. BTW, how does this affect performance on LXs?
Do you have any hard numbers for common tasks?
On Sat, 7 Nov 2009 12:11:55 +0100
Matteo Croce <[email protected]> wrote:
[...]
>
> --- a/arch/x86/kernel/Makefile 2009-11-06 15:06:52.246223989
> +0100 +++ b/arch/x86/kernel/Makefile 2009-11-06
> 15:07:04.294054613 +0100 @@ -89,7 +89,7 @@
> obj-$(CONFIG_HPET_TIMER) += hpet.o
>
> obj-$(CONFIG_K8_NB) += k8.o
> -obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o
> +obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o
> nopl_emu.o obj-$(CONFIG_DEBUG_RODATA_TEST) += test_rodata.o
> obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o
>
> --- a/arch/x86/kernel/cpu/amd.c 2009-11-06 15:06:52.254223805
> +0100 +++ b/arch/x86/kernel/cpu/amd.c 2009-11-06
> 15:07:04.294054613 +0100 @@ -138,8 +138,10 @@
> }
>
> if (c->x86_model == 10) {
> - /* AMD Geode LX is model 10 */
> - /* placeholder for any needed mods */
> + /* Geode only lacks the NOPL instruction to be i686,
> + but we can emulate it in the exception handler
> + and promote it to a class 6 cpu */
> + boot_cpu_data.x86 = 6;
> return;
> }
If you're going to update this, you also need to make sure that you're
not breaking things that check it. For example,
arch/x86/include/asm/geode.h has an is_geode_lx check that expects
boot_cpu_data.x86 to be 5. Please be sure to update all these places
when creating a patch like this.
Hi!
> The AMD Geode LX has an x86 id of 5 (i586) tought it's technically an i686:
It is not.
> root@alix:~# egrep '^(cpu family|model name|flags)' /proc/cpuinfo
> cpu family : 5
> model name : Geode(TM) Integrated Processor by AMD PCS
> flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx
> mmxext 3dnowext 3dnow
>
> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
> This patch adds a quirck to promote the Geode to an i686 and emulates
> the NOPL in the do_invalid_op trap, so the userspace never notices.
> Emulating the NOPL has minimum performance loss, emulating a NOPL
> takes 0.5 usecs
> and they are rarely used in x86
NOP should be fast, so this is bad idea.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Fri 2009-11-06 23:18:06, Matteo Croce wrote:
> On Fri, Nov 6, 2009 at 5:44 PM, H. Peter Anvin <[email protected]> wrote:
> > On 11/06/2009 06:59 AM, Matteo Croce wrote:
> >> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
> >
> > MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
> > I'm somewhat wondering about the general value of this patch; is i686
> > code really that much faster on Geode that it's worth it?
> >
> > ? ? ? ?-hpa
> >
> > --
> > H. Peter Anvin, Intel Open Source Technology Center
> > I work for Intel. ?I don't speak on their behalf.
> >
> >
>
> yes, I did some test like gzip, bzip2, lame etc and they give more or less
> the same results of dhrystone
>
> root@alix:/usr/src# CFLAGS='-march=i586' ./dry.c
> Microseconds for one run through Dhrystone: 1.4
> Dhrystones per Second: 740741
...
> root@alix:/usr/src# CFLAGS='-march=i686' ./dry.c
> Trying 5000000 runs through Dhrystone:
> Microseconds for one run through Dhrystone: 1.2
> Dhrystones per Second: 841751
Teach gcc that geodelx exists? No need to break kernel for that... and
you probably can gain even bigger gains.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Sun, Nov 8, 2009 at 6:37 PM, Pavel Machek <[email protected]> wrote:
> On Fri 2009-11-06 23:18:06, Matteo Croce wrote:
>> On Fri, Nov 6, 2009 at 5:44 PM, H. Peter Anvin <[email protected]> wrote:
>> > On 11/06/2009 06:59 AM, Matteo Croce wrote:
>> >> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
>> >
>> > MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
>> > I'm somewhat wondering about the general value of this patch; is i686
>> > code really that much faster on Geode that it's worth it?
>> >
>> > ? ? ? ?-hpa
>> >
>> > --
>> > H. Peter Anvin, Intel Open Source Technology Center
>> > I work for Intel. ?I don't speak on their behalf.
>> >
>> >
>>
>> yes, I did some test like gzip, bzip2, lame etc and they give more or less
>> the same results of dhrystone
>>
>> root@alix:/usr/src# CFLAGS='-march=i586' ./dry.c
>> Microseconds for one run through Dhrystone: ? ? ? ?1.4
>> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?740741
> ...
>> root@alix:/usr/src# CFLAGS='-march=i686' ./dry.c
>> Trying 5000000 runs through Dhrystone:
>> Microseconds for one run through Dhrystone: ? ? ? ?1.2
>> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?841751
>
> Teach gcc that geodelx exists? No need to break kernel for that... and
> you probably can gain even bigger gains.
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Pavel
Gcc 4.4 already knows about it, just sucks at optimizing:
# CFLAGS='-march=geode' ./dry.c
gcc -c -O3 -march=geode ./dry.c -o dry1.o
gcc -DPASS2 -O3 -march=geode ./dry.c dry1.o -o dry2
Dhrystone Benchmark, Version C, Version 2.2
Program compiled without 'register' attribute
Using times(), HZ=100
Trying 5000000 runs through Dhrystone:
Microseconds for one run through Dhrystone: 1.4
Dhrystones per Second: 719424
On Sun, Nov 8, 2009 at 5:05 PM, Andres Salomon <[email protected]> wrote:
> See comment below. ?BTW, how does this affect performance on LXs?
> Do you have any hard numbers for common tasks?
>
> On Sat, 7 Nov 2009 12:11:55 +0100
> Matteo Croce <[email protected]> wrote:
> [...]
>>
>> --- a/arch/x86/kernel/Makefile ? ? ? ?2009-11-06 15:06:52.246223989
>> +0100 +++ b/arch/x86/kernel/Makefile ?2009-11-06
>> 15:07:04.294054613 +0100 @@ -89,7 +89,7 @@
>> ?obj-$(CONFIG_HPET_TIMER) ? ? += hpet.o
>>
>> ?obj-$(CONFIG_K8_NB) ? ? ? ? ?+= k8.o
>> -obj-$(CONFIG_MGEODE_LX) ? ? ? ? ? ? ?+= geode_32.o mfgpt_32.o
>> +obj-$(CONFIG_MGEODE_LX) ? ? ? ? ? ? ?+= geode_32.o mfgpt_32.o
>> nopl_emu.o obj-$(CONFIG_DEBUG_RODATA_TEST) ? ?+= test_rodata.o
>> ?obj-$(CONFIG_DEBUG_NX_TEST) ?+= test_nx.o
>>
>> --- a/arch/x86/kernel/cpu/amd.c ? ? ? 2009-11-06 15:06:52.254223805
>> +0100 +++ b/arch/x86/kernel/cpu/amd.c 2009-11-06
>> 15:07:04.294054613 +0100 @@ -138,8 +138,10 @@
>> ? ? ? }
>>
>> ? ? ? if (c->x86_model == 10) {
>> - ? ? ? ? ? ? /* AMD Geode LX is model 10 */
>> - ? ? ? ? ? ? /* placeholder for any needed mods */
>> + ? ? ? ? ? ? /* Geode only lacks the NOPL instruction to be i686,
>> + ? ? ? ? ? ? ? ?but we can emulate it in the exception handler
>> + ? ? ? ? ? ? ? ?and promote it to a class 6 cpu */
>> + ? ? ? ? ? ? boot_cpu_data.x86 = 6;
>> ? ? ? ? ? ? ? return;
>> ? ? ? }
>
> If you're going to update this, you also need to make sure that you're
> not breaking things that check it. ?For example,
> arch/x86/include/asm/geode.h has an is_geode_lx check that expects
> boot_cpu_data.x86 to be 5. ?Please be sure to update all these places
> when creating a patch like this.
>
True, but also remove the duplicate function is_geode in the NAND driver
and use the identical one defined in geode.h:
--- a/drivers/mtd/nand/cs553x_nand.c 2009-11-08 18:58:14.835043214 +0100
+++ b/drivers/mtd/nand/cs553x_nand.c 2009-11-08 19:00:07.914117831 +0100
@@ -30,6 +30,7 @@
#include <asm/msr.h>
#include <asm/io.h>
+#include <asm/geode.h>
#define NR_CS553X_CONTROLLERS 4
@@ -260,23 +261,6 @@
return err;
}
-static int is_geode(void)
-{
- /* These are the CPUs which will have a CS553[56] companion chip */
- if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
- boot_cpu_data.x86 == 5 &&
- boot_cpu_data.x86_model == 10)
- return 1; /* Geode LX */
-
- if ((boot_cpu_data.x86_vendor == X86_VENDOR_NSC ||
- boot_cpu_data.x86_vendor == X86_VENDOR_CYRIX) &&
- boot_cpu_data.x86 == 5 &&
- boot_cpu_data.x86_model == 5)
- return 1; /* Geode GX (n?e GX2) */
-
- return 0;
-}
-
#ifdef CONFIG_MTD_PARTITIONS
static const char *part_probes[] = { "cmdlinepart", NULL };
On Sun 2009-11-08 18:40:06, Matteo Croce wrote:
> On Sun, Nov 8, 2009 at 6:37 PM, Pavel Machek <[email protected]> wrote:
> > On Fri 2009-11-06 23:18:06, Matteo Croce wrote:
> >> On Fri, Nov 6, 2009 at 5:44 PM, H. Peter Anvin <[email protected]> wrote:
> >> > On 11/06/2009 06:59 AM, Matteo Croce wrote:
> >> >> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
> >> >
> >> > MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
> >> > I'm somewhat wondering about the general value of this patch; is i686
> >> > code really that much faster on Geode that it's worth it?
> >> >
> >> > ? ? ? ?-hpa
> >> >
> >> > --
> >> > H. Peter Anvin, Intel Open Source Technology Center
> >> > I work for Intel. ?I don't speak on their behalf.
> >> >
> >> >
> >>
> >> yes, I did some test like gzip, bzip2, lame etc and they give more or less
> >> the same results of dhrystone
> >>
> >> root@alix:/usr/src# CFLAGS='-march=i586' ./dry.c
> >> Microseconds for one run through Dhrystone: ? ? ? ?1.4
> >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?740741
> > ...
> >> root@alix:/usr/src# CFLAGS='-march=i686' ./dry.c
> >> Trying 5000000 runs through Dhrystone:
> >> Microseconds for one run through Dhrystone: ? ? ? ?1.2
> >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?841751
> >
> > Teach gcc that geodelx exists? No need to break kernel for that... and
> > you probably can gain even bigger gains.
>
> Gcc 4.4 already knows about it, just sucks at optimizing:
Good. So there's really no point in breaking kernel.
> # CFLAGS='-march=geode' ./dry.c
> gcc -c -O3 -march=geode ./dry.c -o dry1.o
> gcc -DPASS2 -O3 -march=geode ./dry.c dry1.o -o dry2
>
> Dhrystone Benchmark, Version C, Version 2.2
> Program compiled without 'register' attribute
> Using times(), HZ=100
>
> Trying 5000000 runs through Dhrystone:
> Microseconds for one run through Dhrystone: 1.4
> Dhrystones per Second: 719424
...fix gcc to genereta code at least as good as for i686 and you are
done...?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Sun, Nov 8, 2009 at 7:10 PM, Pavel Machek <[email protected]> wrote:
> On Sun 2009-11-08 18:40:06, Matteo Croce wrote:
>> On Sun, Nov 8, 2009 at 6:37 PM, Pavel Machek <[email protected]> wrote:
>> > On Fri 2009-11-06 23:18:06, Matteo Croce wrote:
>> >> On Fri, Nov 6, 2009 at 5:44 PM, H. Peter Anvin <[email protected]> wrote:
>> >> > On 11/06/2009 06:59 AM, Matteo Croce wrote:
>> >> >> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
>> >> >
>> >> > MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
>> >> > I'm somewhat wondering about the general value of this patch; is i686
>> >> > code really that much faster on Geode that it's worth it?
>> >> >
>> >> > ? ? ? ?-hpa
>> >> >
>> >> > --
>> >> > H. Peter Anvin, Intel Open Source Technology Center
>> >> > I work for Intel. ?I don't speak on their behalf.
>> >> >
>> >> >
>> >>
>> >> yes, I did some test like gzip, bzip2, lame etc and they give more or less
>> >> the same results of dhrystone
>> >>
>> >> root@alix:/usr/src# CFLAGS='-march=i586' ./dry.c
>> >> Microseconds for one run through Dhrystone: ? ? ? ?1.4
>> >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?740741
>> > ...
>> >> root@alix:/usr/src# CFLAGS='-march=i686' ./dry.c
>> >> Trying 5000000 runs through Dhrystone:
>> >> Microseconds for one run through Dhrystone: ? ? ? ?1.2
>> >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?841751
>> >
>> > Teach gcc that geodelx exists? No need to break kernel for that... and
>> > you probably can gain even bigger gains.
>>
>> Gcc 4.4 already knows about it, just sucks at optimizing:
>
> Good. So there's really no point in breaking kernel.
>
>> # CFLAGS='-march=geode' ./dry.c
>> gcc -c -O3 -march=geode ./dry.c -o dry1.o
>> gcc -DPASS2 -O3 -march=geode ./dry.c dry1.o ?-o dry2
>>
>> Dhrystone Benchmark, Version C, Version 2.2
>> Program compiled without 'register' attribute
>> Using times(), HZ=100
>>
>> Trying 5000000 runs through Dhrystone:
>> Microseconds for one run through Dhrystone: ? ? ? ?1.4
>> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?719424
>
> ...fix gcc to genereta code at least as good as for i686 and you are
> done...?
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>
This is on my TODO list, but better to discuss it in the GCC mailing list,
the kernel should use -march=geode and GCC should generate the best code
for the AMD Geode, actually the i686 one
On Sun, Nov 8, 2009 at 5:05 PM, Andres Salomon <[email protected]> wrote:
> See comment below. ?BTW, how does this affect performance on LXs?
> Do you have any hard numbers for common tasks?
>
> On Sat, 7 Nov 2009 12:11:55 +0100
> Matteo Croce <[email protected]> wrote:
> [...]
>>
>> --- a/arch/x86/kernel/Makefile ? ? ? ?2009-11-06 15:06:52.246223989
>> +0100 +++ b/arch/x86/kernel/Makefile ?2009-11-06
>> 15:07:04.294054613 +0100 @@ -89,7 +89,7 @@
>> ?obj-$(CONFIG_HPET_TIMER) ? ? += hpet.o
>>
>> ?obj-$(CONFIG_K8_NB) ? ? ? ? ?+= k8.o
>> -obj-$(CONFIG_MGEODE_LX) ? ? ? ? ? ? ?+= geode_32.o mfgpt_32.o
>> +obj-$(CONFIG_MGEODE_LX) ? ? ? ? ? ? ?+= geode_32.o mfgpt_32.o
>> nopl_emu.o obj-$(CONFIG_DEBUG_RODATA_TEST) ? ?+= test_rodata.o
>> ?obj-$(CONFIG_DEBUG_NX_TEST) ?+= test_nx.o
>>
>> --- a/arch/x86/kernel/cpu/amd.c ? ? ? 2009-11-06 15:06:52.254223805
>> +0100 +++ b/arch/x86/kernel/cpu/amd.c 2009-11-06
>> 15:07:04.294054613 +0100 @@ -138,8 +138,10 @@
>> ? ? ? }
>>
>> ? ? ? if (c->x86_model == 10) {
>> - ? ? ? ? ? ? /* AMD Geode LX is model 10 */
>> - ? ? ? ? ? ? /* placeholder for any needed mods */
>> + ? ? ? ? ? ? /* Geode only lacks the NOPL instruction to be i686,
>> + ? ? ? ? ? ? ? ?but we can emulate it in the exception handler
>> + ? ? ? ? ? ? ? ?and promote it to a class 6 cpu */
>> + ? ? ? ? ? ? boot_cpu_data.x86 = 6;
>> ? ? ? ? ? ? ? return;
>> ? ? ? }
>
> If you're going to update this, you also need to make sure that you're
> not breaking things that check it. ?For example,
> arch/x86/include/asm/geode.h has an is_geode_lx check that expects
> boot_cpu_data.x86 to be 5. ?Please be sure to update all these places
> when creating a patch like this.
>
Right, but what if is_geode_lx() is called befor the x86.id change takes effect?
Maybe something like this?
--- a/arch/x86/include/asm/geode.h 2009-11-08 19:13:43.531117343 +0100
+++ b/arch/x86/include/asm/geode.h 2009-11-08 19:19:42.130618023 +0100
@@ -177,7 +177,7 @@
static inline int is_geode_lx(void)
{
return ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
- (boot_cpu_data.x86 == 5) &&
+ (boot_cpu_data.x86 == 5 || boot_cpu_data.x86 == 6) &&
(boot_cpu_data.x86_model == 10));
}
On Fri, Nov 6, 2009 at 5:44 PM, H. Peter Anvin <[email protected]> wrote:
> On 11/06/2009 06:59 AM, Matteo Croce wrote:
>> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
>
> MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
> I'm somewhat wondering about the general value of this patch; is i686
> code really that much faster on Geode that it's worth it?
>
> ? ? ? ?-hpa
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel. ?I don't speak on their behalf.
>
>
There is a small advantage, but considering that GCC isn't much geode aware yet
there is stil room for improvement IMHO:
root@alix:/usr/src/dist# ll
totale 257M
-rwxr-xr-x 1 1000 src 93K 8 nov 2009 bzip2-i586
-rwxr-xr-x 1 1000 src 93K 8 nov 2009 bzip2-i686
-rwxr-xr-x 1 1000 src 60K 8 nov 2009 gzip-i586
-rwxr-xr-x 1 1000 src 60K 8 nov 2009 gzip-i686
-rw-r--r-- 1 1000 src 256M 8 nov 2009 linux-2.6.31.5.tar
-rwxr-xr-x 1 1000 src 90K 8 nov 2009 lzma-i586
-rwxr-xr-x 1 1000 src 94K 8 nov 2009 lzma-i686
root@alix:/usr/src/dist# time cat linux-2.6.31.5.tar >/dev/null
real 0m10.168s
user 0m0.030s
sys 0m1.390s
root@alix:/usr/src/dist# time ./gzip-i586 -9 < linux-2.6.31.5.tar >/dev/null
real 5m22.331s
user 5m10.820s
sys 0m11.170s
root@alix:/usr/src/dist# time ./gzip-i686 -9 < linux-2.6.31.5.tar >/dev/null
real 5m3.737s
user 4m51.880s
sys 0m11.510s
root@alix:/usr/src/dist# time ./bzip2-i586 -9 < linux-2.6.31.5.tar >/dev/null
real 9m16.539s
user 9m4.410s
sys 0m11.760s
root@alix:/usr/src/dist# time ./bzip2-i686 -9 < linux-2.6.31.5.tar >/dev/null
real 8m48.682s
user 8m34.950s
sys 0m13.260s
On Sun, 8 Nov 2009 19:04:35 +0100
Matteo Croce <[email protected]> wrote:
[...]
>
> True, but also remove the duplicate function is_geode in the NAND
> driver and use the identical one defined in geode.h:
>
> --- a/drivers/mtd/nand/cs553x_nand.c 2009-11-08
> 18:58:14.835043214 +0100 +++ b/drivers/mtd/nand/cs553x_nand.c
> 2009-11-08 19:00:07.914117831 +0100 @@ -30,6 +30,7 @@
>
> #include <asm/msr.h>
> #include <asm/io.h>
> +#include <asm/geode.h>
>
> #define NR_CS553X_CONTROLLERS 4
>
> @@ -260,23 +261,6 @@
> return err;
> }
>
> -static int is_geode(void)
> -{
> - /* These are the CPUs which will have a CS553[56] companion
> chip */
> - if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> - boot_cpu_data.x86 == 5 &&
> - boot_cpu_data.x86_model == 10)
> - return 1; /* Geode LX */
> -
> - if ((boot_cpu_data.x86_vendor == X86_VENDOR_NSC ||
> - boot_cpu_data.x86_vendor == X86_VENDOR_CYRIX) &&
> - boot_cpu_data.x86 == 5 &&
> - boot_cpu_data.x86_model == 5)
> - return 1; /* Geode GX (née GX2) */
> -
> - return 0;
> -}
> -
>
> #ifdef CONFIG_MTD_PARTITIONS
> static const char *part_probes[] = { "cmdlinepart", NULL };
I think the nand driver needs a bit more love than this. The cs553x is
available for non-geode platforms, so a cs553x driver should not be
checking for the existence of a specific CPU.
On Sun, 8 Nov 2009 19:22:47 +0100
Matteo Croce <[email protected]> wrote:
> On Sun, Nov 8, 2009 at 5:05 PM, Andres Salomon
> <[email protected]> wrote:
> > See comment below. BTW, how does this affect performance on LXs?
> > Do you have any hard numbers for common tasks?
> >
> > On Sat, 7 Nov 2009 12:11:55 +0100
> > Matteo Croce <[email protected]> wrote:
> > [...]
> >>
> >> --- a/arch/x86/kernel/Makefile 2009-11-06 15:06:52.246223989
> >> +0100 +++ b/arch/x86/kernel/Makefile 2009-11-06
> >> 15:07:04.294054613 +0100 @@ -89,7 +89,7 @@
> >> obj-$(CONFIG_HPET_TIMER) += hpet.o
> >>
> >> obj-$(CONFIG_K8_NB) += k8.o
> >> -obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o
> >> +obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o
> >> nopl_emu.o obj-$(CONFIG_DEBUG_RODATA_TEST) += test_rodata.o
> >> obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o
> >>
> >> --- a/arch/x86/kernel/cpu/amd.c 2009-11-06 15:06:52.254223805
> >> +0100 +++ b/arch/x86/kernel/cpu/amd.c 2009-11-06
> >> 15:07:04.294054613 +0100 @@ -138,8 +138,10 @@
> >> }
> >>
> >> if (c->x86_model == 10) {
> >> - /* AMD Geode LX is model 10 */
> >> - /* placeholder for any needed mods */
> >> + /* Geode only lacks the NOPL instruction to be i686,
> >> + but we can emulate it in the exception handler
> >> + and promote it to a class 6 cpu */
> >> + boot_cpu_data.x86 = 6;
> >> return;
> >> }
> >
> > If you're going to update this, you also need to make sure that
> > you're not breaking things that check it. For example,
> > arch/x86/include/asm/geode.h has an is_geode_lx check that expects
> > boot_cpu_data.x86 to be 5. Please be sure to update all these
> > places when creating a patch like this.
> >
>
> Right, but what if is_geode_lx() is called befor the x86.id change
> takes effect? Maybe something like this?
>
> --- a/arch/x86/include/asm/geode.h 2009-11-08 19:13:43.531117343 +0100
> +++ b/arch/x86/include/asm/geode.h 2009-11-08 19:19:42.130618023
> +0100 @@ -177,7 +177,7 @@
> static inline int is_geode_lx(void)
> {
> return ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
> - (boot_cpu_data.x86 == 5) &&
> + (boot_cpu_data.x86 == 5 || boot_cpu_data.x86 == 6) &&
> (boot_cpu_data.x86_model == 10));
> }
Yeah, that looks better.
On Sun, 8 Nov 2009, Pavel Machek wrote:
> On Sun 2009-11-08 18:40:06, Matteo Croce wrote:
> > On Sun, Nov 8, 2009 at 6:37 PM, Pavel Machek <[email protected]> wrote:
> > > On Fri 2009-11-06 23:18:06, Matteo Croce wrote:
> > >> On Fri, Nov 6, 2009 at 5:44 PM, H. Peter Anvin <[email protected]> wrote:
> > >> > On 11/06/2009 06:59 AM, Matteo Croce wrote:
> > >> >> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
> > >> >
> > >> > MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
> > >> > I'm somewhat wondering about the general value of this patch; is i686
> > >> > code really that much faster on Geode that it's worth it?
> > >> >
> > >> > ? ? ? ?-hpa
> > >> >
> > >> > --
> > >> > H. Peter Anvin, Intel Open Source Technology Center
> > >> > I work for Intel. ?I don't speak on their behalf.
> > >> >
> > >> >
> > >>
> > >> yes, I did some test like gzip, bzip2, lame etc and they give more or less
> > >> the same results of dhrystone
> > >>
> > >> root@alix:/usr/src# CFLAGS='-march=i586' ./dry.c
> > >> Microseconds for one run through Dhrystone: ? ? ? ?1.4
> > >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?740741
> > > ...
> > >> root@alix:/usr/src# CFLAGS='-march=i686' ./dry.c
> > >> Trying 5000000 runs through Dhrystone:
> > >> Microseconds for one run through Dhrystone: ? ? ? ?1.2
> > >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?841751
> > >
> > > Teach gcc that geodelx exists? No need to break kernel for that... and
> > > you probably can gain even bigger gains.
But no standard distribution will be made available in a geode special
version - not enough machines in the marekt. So I think it is better to
be able to use the i686 specific things they already support, like
libc6-686 from debian for example.
c'ya
sven
--
The lights are fading out, once more...
On Sun 2009-11-08 20:29:55, Sven-Haegar Koch wrote:
> On Sun, 8 Nov 2009, Pavel Machek wrote:
>
> > On Sun 2009-11-08 18:40:06, Matteo Croce wrote:
> > > On Sun, Nov 8, 2009 at 6:37 PM, Pavel Machek <[email protected]> wrote:
> > > > On Fri 2009-11-06 23:18:06, Matteo Croce wrote:
> > > >> On Fri, Nov 6, 2009 at 5:44 PM, H. Peter Anvin <[email protected]> wrote:
> > > >> > On 11/06/2009 06:59 AM, Matteo Croce wrote:
> > > >> >> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
> > > >> >
> > > >> > MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
> > > >> > I'm somewhat wondering about the general value of this patch; is i686
> > > >> > code really that much faster on Geode that it's worth it?
> > > >> >
> > > >> > ? ? ? ?-hpa
> > > >> >
> > > >> > --
> > > >> > H. Peter Anvin, Intel Open Source Technology Center
> > > >> > I work for Intel. ?I don't speak on their behalf.
> > > >> >
> > > >> >
> > > >>
> > > >> yes, I did some test like gzip, bzip2, lame etc and they give more or less
> > > >> the same results of dhrystone
> > > >>
> > > >> root@alix:/usr/src# CFLAGS='-march=i586' ./dry.c
> > > >> Microseconds for one run through Dhrystone: ? ? ? ?1.4
> > > >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?740741
> > > > ...
> > > >> root@alix:/usr/src# CFLAGS='-march=i686' ./dry.c
> > > >> Trying 5000000 runs through Dhrystone:
> > > >> Microseconds for one run through Dhrystone: ? ? ? ?1.2
> > > >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?841751
> > > >
> > > > Teach gcc that geodelx exists? No need to break kernel for that... and
> > > > you probably can gain even bigger gains.
>
> But no standard distribution will be made available in a geode special
> version - not enough machines in the marekt. So I think it is better to
> be able to use the i686 specific things they already support, like
> libc6-686 from debian for example.
So hack your distribution to use libc6-686 if you know that it is
safe... (that is no NOPL usage there). Still no need to break
/proc/cpuinfo.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Sun, Nov 8, 2009 at 8:29 PM, Sven-Haegar Koch <[email protected]> wrote:
> On Sun, 8 Nov 2009, Pavel Machek wrote:
>
>> On Sun 2009-11-08 18:40:06, Matteo Croce wrote:
>> > On Sun, Nov 8, 2009 at 6:37 PM, Pavel Machek <[email protected]> wrote:
>> > > On Fri 2009-11-06 23:18:06, Matteo Croce wrote:
>> > >> On Fri, Nov 6, 2009 at 5:44 PM, H. Peter Anvin <[email protected]> wrote:
>> > >> > On 11/06/2009 06:59 AM, Matteo Croce wrote:
>> > >> >> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
>> > >> >
>> > >> > MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
>> > >> > I'm somewhat wondering about the general value of this patch; is i686
>> > >> > code really that much faster on Geode that it's worth it?
>> > >> >
>> > >> > ? ? ? ?-hpa
>> > >> >
>> > >> > --
>> > >> > H. Peter Anvin, Intel Open Source Technology Center
>> > >> > I work for Intel. ?I don't speak on their behalf.
>> > >> >
>> > >> >
>> > >>
>> > >> yes, I did some test like gzip, bzip2, lame etc and they give more or less
>> > >> the same results of dhrystone
>> > >>
>> > >> root@alix:/usr/src# CFLAGS='-march=i586' ./dry.c
>> > >> Microseconds for one run through Dhrystone: ? ? ? ?1.4
>> > >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?740741
>> > > ...
>> > >> root@alix:/usr/src# CFLAGS='-march=i686' ./dry.c
>> > >> Trying 5000000 runs through Dhrystone:
>> > >> Microseconds for one run through Dhrystone: ? ? ? ?1.2
>> > >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?841751
>> > >
>> > > Teach gcc that geodelx exists? No need to break kernel for that... and
>> > > you probably can gain even bigger gains.
>
> But no standard distribution will be made available in a geode special
> version - not enough machines in the marekt. So I think it is better to
> be able to use the i686 specific things they already support, like
> libc6-686 from debian for example.
That's exactly the patch point
On Sun, Nov 8, 2009 at 8:36 PM, Pavel Machek <[email protected]> wrote:
> On Sun 2009-11-08 20:29:55, Sven-Haegar Koch wrote:
>> On Sun, 8 Nov 2009, Pavel Machek wrote:
>>
>> > On Sun 2009-11-08 18:40:06, Matteo Croce wrote:
>> > > On Sun, Nov 8, 2009 at 6:37 PM, Pavel Machek <[email protected]> wrote:
>> > > > On Fri 2009-11-06 23:18:06, Matteo Croce wrote:
>> > > >> On Fri, Nov 6, 2009 at 5:44 PM, H. Peter Anvin <[email protected]> wrote:
>> > > >> > On 11/06/2009 06:59 AM, Matteo Croce wrote:
>> > > >> >> indeed it has MMX, MMXEXT and CMOV, just lacks the long NOP instruction (NOPL).
>> > > >> >
>> > > >> > MMX and MMXEXT are hardly hallmarks of i686, which leaves only cmov.
>> > > >> > I'm somewhat wondering about the general value of this patch; is i686
>> > > >> > code really that much faster on Geode that it's worth it?
>> > > >> >
>> > > >> > ? ? ? ?-hpa
>> > > >> >
>> > > >> > --
>> > > >> > H. Peter Anvin, Intel Open Source Technology Center
>> > > >> > I work for Intel. ?I don't speak on their behalf.
>> > > >> >
>> > > >> >
>> > > >>
>> > > >> yes, I did some test like gzip, bzip2, lame etc and they give more or less
>> > > >> the same results of dhrystone
>> > > >>
>> > > >> root@alix:/usr/src# CFLAGS='-march=i586' ./dry.c
>> > > >> Microseconds for one run through Dhrystone: ? ? ? ?1.4
>> > > >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?740741
>> > > > ...
>> > > >> root@alix:/usr/src# CFLAGS='-march=i686' ./dry.c
>> > > >> Trying 5000000 runs through Dhrystone:
>> > > >> Microseconds for one run through Dhrystone: ? ? ? ?1.2
>> > > >> Dhrystones per Second: ? ? ? ? ? ? ? ? ? ? ? ? ?841751
>> > > >
>> > > > Teach gcc that geodelx exists? No need to break kernel for that... and
>> > > > you probably can gain even bigger gains.
>>
>> But no standard distribution will be made available in a geode special
>> version - not enough machines in the marekt. So I think it is better to
>> be able to use the i686 specific things they already support, like
>> libc6-686 from debian for example.
>
> So hack your distribution to use libc6-686 if you know that it is
> safe... (that is no NOPL usage there). Still no need to break
> /proc/cpuinfo.
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Pavel
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>
Better to be sure that a NOPL whouldn't SIGILL your program, isn't it?
> >> > > Teach gcc that geodelx exists? No need to break kernel for that... and
> >> > > you probably can gain even bigger gains.
> >
> > But no standard distribution will be made available in a geode special
> > version - not enough machines in the marekt. So I think it is better to
> > be able to use the i686 specific things they already support, like
> > libc6-686 from debian for example.
>
> That's exactly the patch point
Not quite.
Your cpu is not 686; stop trying to pretend it is.
Now, maybe it is good idea to run libc6-686 on it. If you can
guarantee it contains no NOPL, fix whatever code is responsible for
selecting libc to use libc6-686 to use that. (Bonus points for
renaming libc6-686 to something more suitable. libc6-cmov?)
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> >> But no standard distribution will be made available in a geode special
> >> version - not enough machines in the marekt. So I think it is better to
> >> be able to use the i686 specific things they already support, like
> >> libc6-686 from debian for example.
> >
> > So hack your distribution to use libc6-686 if you know that it is
> > safe... (that is no NOPL usage there). Still no need to break
> > /proc/cpuinfo.
>
> Better to be sure that a NOPL whouldn't SIGILL your program, isn't
> it?
SIGILL is easier to debug than NOPL mysteriously taking 100x time it
should, sorry.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> > Better to be sure that a NOPL whouldn't SIGILL your program, isn't
> > it?
>
> SIGILL is easier to debug than NOPL mysteriously taking 100x time it
> should, sorry.
And a working 686 distribution is a great deal more useful to end users,
who outnumber you by a few million to one. It's a very sensible patch, or
perhaps you'd prefer we didn't say. And you forget the user "debug" for
a Geode + i686 without this will be "it hangs when I try and boot the
install CD"
It would probably also be worth having cmov fixups for the VIA C3 as well.
Alan
Sven-Haegar Koch <[email protected]> writes:
> But no standard distribution will be made available in a geode special
> version - not enough machines in the marekt. So I think it is better to
> be able to use the i686 specific things they already support, like
> libc6-686 from debian for example.
If it's only libc issue (and maybe of a couple of other packages like
gzip and bzip2), then I think it would be preferable for distributions
to provide a version optimized for Geode, without kernel hacks.
Though I guess the above is not actually the case, is it?
--
Krzysztof Halasa
Keep in mind that if we end up running that on code with any kind of significant fraction of CMOV it will utterly suck. Count 500-odd cycles per emulated instruction. It *is* still useful in some cases - which is why we have x87 emulation already, for example. The question is mostly where the point of diminishing returns go. In theory we could provide a full x86 interpreter (perhaps shared with KVM?) which would allow any instruction at all to be executed.
"Alan Cox" <[email protected]> wrote:
>> > Better to be sure that a NOPL whouldn't SIGILL your program, isn't
>> > it?
>>
>> SIGILL is easier to debug than NOPL mysteriously taking 100x time it
>> should, sorry.
>
>And a working 686 distribution is a great deal more useful to end users,
>who outnumber you by a few million to one. It's a very sensible patch, or
>perhaps you'd prefer we didn't say. And you forget the user "debug" for
>a Geode + i686 without this will be "it hangs when I try and boot the
>install CD"
>
>It would probably also be worth having cmov fixups for the VIA C3 as well.
>
>Alan
--
Sent from my mobile phone. Please excuse any lack of formatting.
On Sun, 08 Nov 2009 14:10:24 -0800
"H. Peter Anvin" <[email protected]> wrote:
> Keep in mind that if we end up running that on code with any kind of significant fraction of CMOV it will utterly suck.
cmov is pretty useless anyway and not a good scheduling choice generally
speaking on a real CPU - so in fact gcc generates very few of them, but a
few is enough to make 686 break on C3
On Sun, Nov 08, 2009 at 07:42:48PM +0100, Matteo Croce wrote:
> There is a small advantage, but considering that GCC isn't much geode aware yet
> there is stil room for improvement IMHO:
Perhaps gcc considers geode to mean geode GX[m12] not geode LX (which is
newer and the one in question here).
The Geode GX line prefers i486 code to i586 code (which appears to be
what -march=geode is for). I am using i486 at the moment when I build
for this one.
The Geode LX is what is being discussed which is in fact mostly a K6 as
far as I understand things. It seems to like i686 code, other than
apparently those NOP instructions. I wonder if the K6 has those noop
instructions and if not, perhaps gcc 4.4's -march=k6-3 would be the
right choice. I have always suspected the LX was really a K6-3 based
design (the cache sizes are a bit different, but clock speeds and
instruction sets seem to match).
The Geode NX (which no one has mentioned yet) is an Athlon derived chip.
> root@alix:/usr/src/dist# ll
> totale 257M
> -rwxr-xr-x 1 1000 src 93K 8 nov 2009 bzip2-i586
> -rwxr-xr-x 1 1000 src 93K 8 nov 2009 bzip2-i686
> -rwxr-xr-x 1 1000 src 60K 8 nov 2009 gzip-i586
> -rwxr-xr-x 1 1000 src 60K 8 nov 2009 gzip-i686
> -rw-r--r-- 1 1000 src 256M 8 nov 2009 linux-2.6.31.5.tar
> -rwxr-xr-x 1 1000 src 90K 8 nov 2009 lzma-i586
> -rwxr-xr-x 1 1000 src 94K 8 nov 2009 lzma-i686
> root@alix:/usr/src/dist# time cat linux-2.6.31.5.tar >/dev/null
>
> real 0m10.168s
> user 0m0.030s
> sys 0m1.390s
> root@alix:/usr/src/dist# time ./gzip-i586 -9 < linux-2.6.31.5.tar >/dev/null
>
> real 5m22.331s
> user 5m10.820s
> sys 0m11.170s
> root@alix:/usr/src/dist# time ./gzip-i686 -9 < linux-2.6.31.5.tar >/dev/null
>
> real 5m3.737s
> user 4m51.880s
> sys 0m11.510s
> root@alix:/usr/src/dist# time ./bzip2-i586 -9 < linux-2.6.31.5.tar >/dev/null
>
> real 9m16.539s
> user 9m4.410s
> sys 0m11.760s
> root@alix:/usr/src/dist# time ./bzip2-i686 -9 < linux-2.6.31.5.tar >/dev/null
>
> real 8m48.682s
> user 8m34.950s
> sys 0m13.260s
--
Len Sorensen
On Mon, Nov 9, 2009 at 9:16 PM, Lennart Sorensen
<[email protected]> wrote:
> On Sun, Nov 08, 2009 at 07:42:48PM +0100, Matteo Croce wrote:
>> There is a small advantage, but considering that GCC isn't much geode aware yet
>> there is stil room for improvement IMHO:
>
> Perhaps gcc considers geode to mean geode GX[m12] not geode LX (which is
> newer and the one in question here).
>
> The Geode GX line prefers i486 code to i586 code (which appears to be
> what -march=geode is for). ?I am using i486 at the moment when I build
> for this one.
>
> The Geode LX is what is being discussed which is in fact mostly a K6 as
> far as I understand things. ?It seems to like i686 code, other than
> apparently those NOP instructions. ?I wonder if the K6 has those noop
> instructions and if not, perhaps gcc 4.4's -march=k6-3 would be the
> right choice. ?I have always suspected the LX was really a K6-3 based
> design (the cache sizes are a bit different, but clock speeds and
> instruction sets seem to match).
>
> The Geode NX (which no one has mentioned yet) is an Athlon derived chip.
>
>> root@alix:/usr/src/dist# ll
>> totale 257M
>> -rwxr-xr-x 1 1000 src ?93K ?8 nov ?2009 bzip2-i586
>> -rwxr-xr-x 1 1000 src ?93K ?8 nov ?2009 bzip2-i686
>> -rwxr-xr-x 1 1000 src ?60K ?8 nov ?2009 gzip-i586
>> -rwxr-xr-x 1 1000 src ?60K ?8 nov ?2009 gzip-i686
>> -rw-r--r-- 1 1000 src 256M ?8 nov ?2009 linux-2.6.31.5.tar
>> -rwxr-xr-x 1 1000 src ?90K ?8 nov ?2009 lzma-i586
>> -rwxr-xr-x 1 1000 src ?94K ?8 nov ?2009 lzma-i686
>> root@alix:/usr/src/dist# time cat linux-2.6.31.5.tar >/dev/null
>>
>> real ? ?0m10.168s
>> user ? ?0m0.030s
>> sys ? ? 0m1.390s
>> root@alix:/usr/src/dist# time ./gzip-i586 -9 < linux-2.6.31.5.tar >/dev/null
>>
>> real ? ?5m22.331s
>> user ? ?5m10.820s
>> sys ? ? 0m11.170s
>> root@alix:/usr/src/dist# time ./gzip-i686 -9 < linux-2.6.31.5.tar >/dev/null
>>
>> real ? ?5m3.737s
>> user ? ?4m51.880s
>> sys ? ? 0m11.510s
>> root@alix:/usr/src/dist# time ./bzip2-i586 -9 < linux-2.6.31.5.tar >/dev/null
>>
>> real ? ?9m16.539s
>> user ? ?9m4.410s
>> sys ? ? 0m11.760s
>> root@alix:/usr/src/dist# time ./bzip2-i686 -9 < linux-2.6.31.5.tar >/dev/null
>>
>> real ? ?8m48.682s
>> user ? ?8m34.950s
>> sys ? ? 0m13.260s
>
> --
> Len Sorensen
>
You're right, indeed this is being discussed here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41989
On 10/04/2009 07:58 AM, Alan Cox wrote:
>> I'm surprised that "i686" in debian depends on the family flag
>> in /proc/cpuinfo.. but hey.. weirder things have been done.
>
> Its compensating for the old gcc bugs where gcc "i686" generated cmov
> instructions without any checks whether the CPU supported cmov (which is
> optional for a 686 architecture)
>
> RPM has (or had) similar hacks. Both arguably come about from fundamental
> design thinkos in that they treat architecture as "special", not simply
> as a set of dependancies (needs x86, x86-cmov, glibc x86-32, ...) as
> should hve been done and which would also have made emulators just work
> out of the box instead of the current mess.
>
> Alan
Well, for gcc, the string "i686" really means "pentiumpro". All of
these are nothing but named feature sets.
-hpa
On 11/09/2009 12:16 PM, Lennart Sorensen wrote:
>
> The Geode LX is what is being discussed which is in fact mostly a K6 as
> far as I understand things. It seems to like i686 code, other than
> apparently those NOP instructions. I wonder if the K6 has those noop
> instructions and if not, perhaps gcc 4.4's -march=k6-3 would be the
> right choice. I have always suspected the LX was really a K6-3 based
> design (the cache sizes are a bit different, but clock speeds and
> instruction sets seem to match).
>
> The Geode NX (which no one has mentioned yet) is an Athlon derived chip.
>
*As far as I know* K6 didn't have NOPL, whereas K7 does.
Someone who has access to these chips could run that as an experiment.
-hpa
On Mon, Nov 09, 2009 at 01:17:08PM -0800, H. Peter Anvin wrote:
> On 11/09/2009 12:16 PM, Lennart Sorensen wrote:
> >
> > The Geode LX is what is being discussed which is in fact mostly a K6 as
> > far as I understand things. It seems to like i686 code, other than
> > apparently those NOP instructions. I wonder if the K6 has those noop
> > instructions and if not, perhaps gcc 4.4's -march=k6-3 would be the
> > right choice. I have always suspected the LX was really a K6-3 based
> > design (the cache sizes are a bit different, but clock speeds and
> > instruction sets seem to match).
> >
> > The Geode NX (which no one has mentioned yet) is an Athlon derived chip.
> >
>
> *As far as I know* K6 didn't have NOPL, whereas K7 does.
Which would be mroe indication that the Geode LX may in fact be a K6
based CPU design. The Geode NX is very clearly an Athlon (K7) and AMD
has never even tried to hide that. They have never said what the LX is
based on, other than it isn't based on the GX (fortunately, that thing
was a buggy piece of crap).
> Someone who has access to these chips could run that as an experiment.
I have lots of Geode SC1200 (GX2 SoC) and Geode LX systems around.
I think I might even have a K6-2 at home in the garage come to think
of it.
--
Len Sorensen
On Sun, Nov 08, 2009 at 08:08:52PM +0000, Alan Cox wrote:
> > SIGILL is easier to debug than NOPL mysteriously taking 100x time it
> > should, sorry.
>
> And a working 686 distribution is a great deal more useful to end users,
> who outnumber you by a few million to one. It's a very sensible patch, or
> perhaps you'd prefer we didn't say. And you forget the user "debug" for
> a Geode + i686 without this will be "it hangs when I try and boot the
> install CD"
>
> It would probably also be worth having cmov fixups for the VIA C3 as well.
Agreed! I've been using the cmov patch on my kernels for years and
it has helped a lot. It isn't *that* slow and allows you to boot a
machine which wouldn't boot otherwise. Also, I have memories about
the C3 supporting register-to-register CMOV but faulting only on
register-to-memory, which is less common and makes the patch even
more useful.
I'm very happy that people finally consider instruction emulation
in kernel. Many other systems do that to help porting code between
CPUs, and it's easier to work with that than to have builds for a
large variety of CPUs in the same family. I'm sure that many distros
would prefer to provide a build optimized for the fastest CPUs around
and support the other ones in compatiblity mode than optimize for the
smallest ones in order to support everyone.
Also, I think that if we start adding emulation, a global and per-task
counter would immensely help to know what processes make intensive use
of emulation.
Willy
On Sun, Nov 08, 2009 at 01:47:59PM -0500, Andres Salomon wrote:
> > Right, but what if is_geode_lx() is called befor the x86.id change
> > takes effect? Maybe something like this?
> >
> > --- a/arch/x86/include/asm/geode.h 2009-11-08 19:13:43.531117343 +0100
> > +++ b/arch/x86/include/asm/geode.h 2009-11-08 19:19:42.130618023
> > +0100 @@ -177,7 +177,7 @@
> > static inline int is_geode_lx(void)
> > {
> > return ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
> > - (boot_cpu_data.x86 == 5) &&
> > + (boot_cpu_data.x86 == 5 || boot_cpu_data.x86 == 6) &&
> > (boot_cpu_data.x86_model == 10));
> > }
>
>
> Yeah, that looks better.
Wouldn't it be even better if we didn't touch boot_cpu_data.x86
in the first place ? We can provide the emulation to support
686 binaries without faking the CPU family/model, I think it
would be cleaner. Otherwise we would need to report "real" and
"emulated" families in /proc...
Willy
On 11/09/2009 09:27 PM, Willy Tarreau wrote:
>
> Also, I think that if we start adding emulation, a global and per-task
> counter would immensely help to know what processes make intensive use
> of emulation.
>
Agreed.
My first question would be if the interpreter in KVM can be reused in
any way.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On 11/10/2009 08:02 AM, H. Peter Anvin wrote:
> On 11/09/2009 09:27 PM, Willy Tarreau wrote:
>
>> Also, I think that if we start adding emulation, a global and per-task
>> counter would immensely help to know what processes make intensive use
>> of emulation.
>>
>>
> Agreed.
>
> My first question would be if the interpreter in KVM can be reused in
> any way.
>
>
In theory yes. There would need to be a bit of work to disassociate it
from kvm, but nothing too difficult.
Note that the kvm x86 emulator is more general than need be (emulates
real mode code and cpl 0 instructions) and less general than needed
(many instructions are not emulated, for example nopl).
--
error compiling committee.c: too many arguments to function
> In theory yes. There would need to be a bit of work to disassociate it
> from kvm, but nothing too difficult.
For the tiny cases that actually matter using the KVM code appears a bit
over the top.
On 11/10/2009 02:41 AM, Avi Kivity wrote:
>
> In theory yes. There would need to be a bit of work to disassociate it
> from kvm, but nothing too difficult.
>
> Note that the kvm x86 emulator is more general than need be (emulates
> real mode code and cpl 0 instructions) and less general than needed
> (many instructions are not emulated, for example nopl).
>
Yes, but I suspect that adding new instructions probably is
straightforward. I would still prefer to have a single interpreter in
the kernel as opposed to two different ones which are going to have
different bugs.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On 11/10/2009 02:56 AM, Alan Cox wrote:
>> In theory yes. There would need to be a bit of work to disassociate it
>> from kvm, but nothing too difficult.
>
> For the tiny cases that actually matter using the KVM code appears a bit
> over the top.
In the short term, yes, of course. However, if we're going to do
emulation, we might as well do it right.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Tue, 10 Nov 2009 09:08:20 -0800
"H. Peter Anvin" <[email protected]> wrote:
> On 11/10/2009 02:56 AM, Alan Cox wrote:
> >> In theory yes. There would need to be a bit of work to disassociate it
> >> from kvm, but nothing too difficult.
> >
> > For the tiny cases that actually matter using the KVM code appears a bit
> > over the top.
>
> In the short term, yes, of course. However, if we're going to do
> emulation, we might as well do it right.
Why is using KVM doing it right ? It sounds like its doing it slowly,
and hideously memory inefficiently. You are solving an uninteresting
general case problem when you just need two tiny fixups (or perhaps 3 if
you want to fix up early x86-64 prefetch)
On 11/10/2009 09:24 AM, Alan Cox wrote:
>>
>> In the short term, yes, of course. However, if we're going to do
>> emulation, we might as well do it right.
>
> Why is using KVM doing it right ? It sounds like its doing it slowly,
> and hideously memory inefficiently. You are solving an uninteresting
> general case problem when you just need two tiny fixups (or perhaps 3 if
> you want to fix up early x86-64 prefetch)
Why do we only need "two tiny fixups"? Where do we draw the line in
terms of ISA compatibility? One could easily argue that the Right
Thing[TM] is to be able to process any optional instruction -- otherwise
one has a very difficult place to draw a line.
Consider SSE3, for example. Why should the same concept not apply to
SSE3 instructions as to CMOV?
-hpa
On 11/10/2009 08:49 PM, H. Peter Anvin wrote:
>
>> Why is using KVM doing it right ? It sounds like its doing it slowly,
>> and hideously memory inefficiently. You are solving an uninteresting
>> general case problem when you just need two tiny fixups (or perhaps 3 if
>> you want to fix up early x86-64 prefetch)
>>
> Why do we only need "two tiny fixups"? Where do we draw the line in
> terms of ISA compatibility? One could easily argue that the Right
> Thing[TM] is to be able to process any optional instruction -- otherwise
> one has a very difficult place to draw a line.
>
> Consider SSE3, for example. Why should the same concept not apply to
> SSE3 instructions as to CMOV?
>
Because then user programs would run 20x or more slower than the user
expects. Better to terminate early (and teach userspace how to choose
the instruction subset correctly).
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
On 11/10/2009 11:50 AM, Avi Kivity wrote:
>>
>> Consider SSE3, for example. Why should the same concept not apply to
>> SSE3 instructions as to CMOV?
>
> Because then user programs would run 20x or more slower than the user
> expects. Better to terminate early (and teach userspace how to choose
> the instruction subset correctly).
>
I picked the example carefully: SSE3 is a small set of instructions
which probably aren't used very heavily. In that sense, it has
*exactly* the same properties as CMOV - if you have the source, you're
better off recompiling, but it *might* help you if you happen to only
have a binary.
What I want people to understand is that this is a *huge* rathole, and
it doesn't have any obvious bottom that I can see.
-hpa
On Tue, Nov 10, 2009 at 12:01:47PM -0800, H. Peter Anvin wrote:
> On 11/10/2009 11:50 AM, Avi Kivity wrote:
> >>
> >> Consider SSE3, for example. Why should the same concept not apply to
> >> SSE3 instructions as to CMOV?
> >
> > Because then user programs would run 20x or more slower than the user
> > expects. Better to terminate early (and teach userspace how to choose
> > the instruction subset correctly).
> >
>
> I picked the example carefully: SSE3 is a small set of instructions
> which probably aren't used very heavily. In that sense, it has
> *exactly* the same properties as CMOV - if you have the source, you're
> better off recompiling, but it *might* help you if you happen to only
> have a binary.
>
> What I want people to understand is that this is a *huge* rathole, and
> it doesn't have any obvious bottom that I can see.
Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
on one side and [sse*] on the other : distros are built assuming the
former are always available while they are not always. And the distro
which make the difference have to provide an dedicated build for earlier
systems just for compatibility. SSE*, 3dnow* etc... are only used by a
handful of media players/converters/encoders which are able to detect
themselves what to use and already have the necessary fallbacks because
these instruction sets vary too much between processors and vendors.
One could argue that cmpxchg/bswap/xadd are supported by 486 and that
implementing them for 386 is almost useless now (though it costs almost
nothing to provide them, I did a few years ago).
CMOV/NOPL are rarely used, thus have no reason to cause a massive
performance drop, but are frequent enough (at least cmov) for almost
any program to have at least one or two inside, making it incompatible
with a given processor, and are almost obvious to implement too.
SSE*/3dnow* would be much much harder and would only serve very few
programs, and serve them badly because when they're used, it would
be intensive.
I personally am not against being able to emulate every optional
instruction, quite the opposite instead. It's just that if in order
to do this, we add cost to the other obvious ones, we lose what we
expected to win (simplicity and efficiency).
Regards,
Willy
On 11/10/2009 12:16 PM, Willy Tarreau wrote:
>
> Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
> on one side and [sse*] on the other : distros are built assuming the
> former are always available while they are not always. And the distro
> which make the difference have to provide an dedicated build for earlier
> systems just for compatibility. SSE*, 3dnow* etc... are only used by a
> handful of media players/converters/encoders which are able to detect
> themselves what to use and already have the necessary fallbacks because
> these instruction sets vary too much between processors and vendors.
>
That is increasingly not true since gcc is now doing autovectorization.
> One could argue that cmpxchg/bswap/xadd are supported by 486 and that
> implementing them for 386 is almost useless now (though it costs almost
> nothing to provide them, I did a few years ago).
>
> CMOV/NOPL are rarely used, thus have no reason to cause a massive
> performance drop, but are frequent enough (at least cmov) for almost
> any program to have at least one or two inside, making it incompatible
> with a given processor, and are almost obvious to implement too.
I could 970 cmovs in libc out of 322660 instructions. That is one in
333 instruction. In other words, a trap-and-emulate of some 500 cycles
would add some two cycles *per instruction* during execution -- hardly
an insignificant number. All in all, any of this is really only useful
as a limp.
> SSE*/3dnow* would be much much harder and would only serve very few
> programs, and serve them badly because when they're used, it would
> be intensive.
>
> I personally am not against being able to emulate every optional
> instruction, quite the opposite instead. It's just that if in order
> to do this, we add cost to the other obvious ones, we lose what we
> expected to win (simplicity and efficiency).
I don't see any particular subset as being more obvious than the other,
with the *possible* exception of NOPL, simply because NOPL was
undocumented for so long.
-hpa
On Tue, Nov 10, 2009 at 12:25:02PM -0800, H. Peter Anvin wrote:
> On 11/10/2009 12:16 PM, Willy Tarreau wrote:
> >
> > Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
> > on one side and [sse*] on the other : distros are built assuming the
> > former are always available while they are not always. And the distro
> > which make the difference have to provide an dedicated build for earlier
> > systems just for compatibility. SSE*, 3dnow* etc... are only used by a
> > handful of media players/converters/encoders which are able to detect
> > themselves what to use and already have the necessary fallbacks because
> > these instruction sets vary too much between processors and vendors.
> >
>
> That is increasingly not true since gcc is now doing autovectorization.
But programs have to be built to use that specific platform anyway ; this
is different from all programs built with support for CMOV enabled by
default and which will work on 95% of the platforms.
(...)
> I could 970 cmovs in libc out of 322660 instructions. That is one in
> 333 instruction.
Not bad, I agree ! But on the C3, CMOV from/to register is implemented.
It's only CMOV from/to memory which has to be emulated, which makes it
a lot less common. Anyway that's why we need counters, so that the user
knows when he really ought to recompile.
(...)
> I don't see any particular subset as being more obvious than the other,
> with the *possible* exception of NOPL, simply because NOPL was
> undocumented for so long.
well, simply the availability of binaries making use of them. I'm not
sure you would find SSE* instructions in your libc where you found the
970 cmov. For NOPL, that's different, I first heard about it in this
thread, and my C3 running with the CMOV patch has never complained from
missing it :-)
Regards,
Willy
Hi!
> Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
> on one side and [sse*] on the other : distros are built assuming the
> former are always available while they are not always. And the
> distro
Well, fix the distros...
> which make the difference have to provide an dedicated build for earlier
> systems just for compatibility. SSE*, 3dnow* etc... are only used by a
> handful of media players/converters/encoders which are able to detect
> themselves what to use and already have the necessary fallbacks because
> these instruction sets vary too much between processors and vendors.
>
> One could argue that cmpxchg/bswap/xadd are supported by 486 and that
> implementing them for 386 is almost useless now (though it costs almost
> nothing to provide them, I did a few years ago).
>
> CMOV/NOPL are rarely used, thus have no reason to cause a massive
> performance drop, but are frequent enough (at least cmov) for almost
*One* CMOV in the inner loop will make your performance go down 20x.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Tue, Nov 10, 2009 at 09:54:45PM +0100, Pavel Machek wrote:
> Hi!
>
> > Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
> > on one side and [sse*] on the other : distros are built assuming the
> > former are always available while they are not always. And the
> > distro
>
> Well, fix the distros...
you know like me that it's as easy as useless to point the finger at
distros, because people running on low end want something that works
and people running on high end want something that runs fast. In order
to satisfy every one, you would have to build with optimizations for
every CPU around, which does not make sense. Simply count the number
of CPU variants in the kernel, and imagine that many CDs/DVDs for a
single platform distro.
However, targetting the most common denominator of high end machines
(basically i686) and having the lower end systems experience a tiny
slowdown is not stupid at all since performance is not what matters
the most there. The higher end systems will simply be able to run
CPU-specific optimizations per-program as they already do right now.
(...)
> > CMOV/NOPL are rarely used, thus have no reason to cause a massive
> > performance drop, but are frequent enough (at least cmov) for almost
>
> *One* CMOV in the inner loop will make your performance go down 20x.
yes, just like with emulated FPU or trapped unaligned accesses. It's
just like flying fishes. They exist but they aren't the most common
ones. If people encounter these cases on a specific program, then
they just have to recompile it if it is a problem. At least they
don't rebuild the whole distro. And once again, I've been using
cmpxchg/bswap emulation for years on my i386 without feeling any
need for a rebuild, and CMOV emulation for years now on my mini-itx
C3 without any problem either. These are real experiences, not just
fears of imaginary problems. Yes I can design a program to run 400
times slower on these machines if I want. I just don't feel the need
to do so and apparently existing programs' authors didn't either.
Regards,
Willy
On 11/10/2009 01:12 PM, Willy Tarreau wrote:
> yes, just like with emulated FPU or trapped unaligned accesses. It's
> just like flying fishes. They exist but they aren't the most common
> ones. If people encounter these cases on a specific program, then
> they just have to recompile it if it is a problem. At least they
> don't rebuild the whole distro. And once again, I've been using
> cmpxchg/bswap emulation for years on my i386 without feeling any
> need for a rebuild, and CMOV emulation for years now on my mini-itx
> C3 without any problem either. These are real experiences, not just
> fears of imaginary problems. Yes I can design a program to run 400
> times slower on these machines if I want. I just don't feel the need
> to do so and apparently existing programs' authors didn't either.
Willy, perhaps you can come up with a list of features you think should
be emulated, together with an explanation of why you opted for that list
of features and *did not* opt for others.
Note: emulated FPU is a special subcase. The FPU operations are
heavyweight enough that the overhead of trapping versus library calls is
relatively insignificant.
-hpa
On Tue, Nov 10, 2009 at 12:54 PM, Pavel Machek <[email protected]> wrote:
> *One* CMOV in the inner loop will make your performance go down 20x.
This is 20x slower than not running at all, right?
On 11/10/2009 01:21 PM, Matt Thrailkill wrote:
> On Tue, Nov 10, 2009 at 12:54 PM, Pavel Machek <[email protected]> wrote:
>> *One* CMOV in the inner loop will make your performance go down 20x.
>
> This is 20x slower than not running at all, right?
And that's the fundamental win of doing a fullscale emulator: you will
always be able to run, at *some* performance level.
-hpa
On Tue, Nov 10, 2009 at 9:54 PM, Pavel Machek <[email protected]> wrote:
> Hi!
>
>> Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
>> on one side and [sse*] on the other : distros are built assuming the
>> former are always available while they are not always. And the
>> distro
>
> Well, fix the distros...
$ objdump -d libflashplayer.so |grep cmov -c
10
and ask the companies to fix their binaries too?
>> which make the difference have to provide an dedicated build for earlier
>> systems just for compatibility. SSE*, 3dnow* etc... are only used by a
>> handful of media players/converters/encoders which are able to detect
>> themselves what to use and already have the necessary fallbacks because
>> these instruction sets vary too much between processors and vendors.
>>
>> One could argue that cmpxchg/bswap/xadd are supported by 486 and that
>> implementing them for 386 is almost useless now (though it costs almost
>> nothing to provide them, I did a few years ago).
>>
>> CMOV/NOPL are rarely used, thus have no reason to cause a massive
>> performance drop, but are frequent enough (at least cmov) for almost
>
> *One* CMOV in the inner loop will make your performance go down 20x.
Still better than a browser crashing for a SIGILL
On Tue, Nov 10, 2009 at 01:19:30PM -0800, H. Peter Anvin wrote:
> Willy, perhaps you can come up with a list of features you think should
> be emulated, together with an explanation of why you opted for that list
> of features and *did not* opt for others.
Well, the instructions I had to emulate were the result of failures
to run standard distros on older machines. When I ran a 486 distro
on my old 386, I found that almost everything worked except a few
programs making use of BSWAP for htonl(), and a small group of other
ones making occasional use of CMPXCHG for mutex handling. So I checked
the differences between 386 and 486 and found that the last remaining
one was XADD which I did not find in my binaries but which was really
obvious to implement, so it made sense to complete the emulator. That
said, a feature was missing with CMPXCHG. It was generally used with
a LOCK prefix which could not be emulated. In practice, that wasn't
an issue since I did not have any SMP i386 and I think we might only
find them on some very specific industrial boards if any.
So with just CMPXCHG + BSWAP (+xadd for the sake of completeness),
my 486 distro was fully operational on my 386.
At one point I got a laptop equipped with a K6-2. This one lacked
CMOV but I was very rarely hit, so I did not bother extending the
patch.
Later when I bought a VIA C3 the same issue happened when I sometimes
transferred a binary from my athlon to the C3 (both mounted the same
NFS home dirs, and my GCC on the athlon was optimizing by default for
686). Then I experimented a little bit with CMOV, discovering that it
only implemented CMOV reg,reg. So I added the instruction to the patch.
I then regularly started to get mails from people installing i686
distros on their C3 or K6 boards and who wanted the patch for the
same reasons (K6 induced people in error because of the "6" in its
name).
I remember that Debian at one point merged the part of the patch
providing the 486 emulation in their kernel. I don't know if they
finally merged the CMOV part too, I think not because they did
not optimize for 686.
But what I can say is that after emulating those instructions, I
never got any illegal instruction anymore on my systems. Here
Matteo reports an issue with NOPL, which might have been introduced
with newer compilers. So if we get NOPL+CMOV, I think that every
CPU starting from 486 will be able to execute all the applications
I have been running on those machines. We can add the 486 ones if
we think it's worth it.
Once again, I have no argument against emulating more instructions.
It's just that I never needed them, and I fear that doing so might
render the code a lot more complex and slower. Maybe time will prove
me wrong and I will have no problem with that. We can re-open this
thread after the first report of a SIGILL with the patch applied.
So in my opinion, we should have :
- CMOV (for 486, Pentium, C3, K6, ...)
- NOPL (newcomer)
And if we want to extend down to i386 :
- BSWAP (=htonl)
- CMPXCHG (mutex)
- XADD (never encoutered but cheap)
I still have the 2.4 patch for BSWAP, CMPXCHG, CMOV and XADD lying
around. I'm appending it to the end of this mail in case it can fuel
the discussion. I've not ported it to 2.6 yet simply because my old
systems are still on 2.4, but volunteers are welcome :-)
> Note: emulated FPU is a special subcase. The FPU operations are
> heavyweight enough that the overhead of trapping versus library calls is
> relatively insignificant.
Agreed for most of them, though some cheap ones such as FADD can
see a huge difference. In fact it's mostly that it's been common
for a long time to see slow software FPU (till 386 & 486-SX), so
it's been avoided for a long time.
Regards,
Willy
---
diff -urN linux-2.4.25-wt2-base/Documentation/Configure.help linux-2.4.25-wt2/Documentation/Configure.help
--- linux-2.4.25-wt2-base/Documentation/Configure.help Thu Mar 4 17:14:48 2004
+++ linux-2.4.25-wt2/Documentation/Configure.help Thu Mar 4 17:23:08 2004
@@ -5279,6 +5279,66 @@
Note, this kernel will not boot on older (pre model 9) C3s.
+486 emulation
+CONFIG_CPU_EMU486
+ When used on a 386, Linux can emulate 3 instructions from the 486 set.
+ This allows user space programs compiled for 486 to run on a 386
+ without crashing with a SIGILL. As any emulation, performance will be
+ very low, but since these instruction are not often used, this might
+ not hurt. The emulated instructions are :
+ - bswap (does the same as htonl())
+ - cmpxchg (used in multi-threading, mutex locking)
+ - xadd (rarely used)
+
+ Note that this can also allow Step-A 486's to correctly run multi-thread
+ applications since cmpxchg has a wrong opcode on this early CPU.
+
+ Don't use this to enable multi-threading on an SMP machine, the lock
+ atomicity can't be guaranted !
+
+ Although it's highly preferable that you only execute programs targetted
+ for your CPU, it may happen that, consecutively to a hardware replacement,
+ or during rescue of a damaged system, you have to execute such programs
+ on an inadapted processor. In this case, this option will help you get
+ your programs working, even if they will be slower.
+
+ It is recommended that you say N here in any case, except for the
+ kernels that you will use on your rescue disks.
+
+ This option should not be left on by default, because it means that
+ you execute a program not targetted for your CPU. You should recompile
+ your applications whenever possible.
+
+ If you are not sure, say N.
+
+Pentium-Pro CMOV emulation
+CONFIG_CPU_EMU686
+ Intel Pentium-Pro processor brought a new set of instructions borrowed
+ from RISC processors, which permit to write many simple conditionnal
+ blocks without a branch instruction, thus being faster. They are supported
+ on all PentiumII, PentiumIII, Pentium4 and Celerons to date. GCC generates
+ these instructions when "-march=i686" is specified. There is an ever
+ increasing number of programs compiled with this option, that will simply
+ crash on 386/486/Pentium/AmdK6 and others when trying to execute the
+ faulty instruction.
+
+ Although it's highly preferable that you only execute programs targetted
+ for your CPU, it may happen that, consecutively to a hardware replacement,
+ or during rescue of a damaged system, you have to execute such programs
+ on an inadapted processor. In this case, this option will help you keep
+ your programs working, even if some may be noticeably slower : an overhead
+ of 1us has been measured on a k6-2/450 (about 450 cycles).
+
+ It is recommended that you say N here in any case, except for the
+ kernels that you will use on your rescue disks. This emulation typically
+ increases a bzImage with 500 bytes.
+
+ This option should not be left on by default, because it means that
+ you execute a program not targetted for your CPU. You should recompile
+ your applications whenever possible.
+
+ If you are not sure, say N.
+
32-bit PDC
CONFIG_PDC_NARROW
Saying Y here will allow developers with a C180, C200, C240, C360,
diff -urN linux-2.4.25-wt2-base/arch/i386/config.in linux-2.4.25-wt2/arch/i386/config.in
--- linux-2.4.25-wt2-base/arch/i386/config.in Thu Mar 4 17:14:48 2004
+++ linux-2.4.25-wt2/arch/i386/config.in Thu Mar 4 17:26:01 2004
@@ -59,6 +59,8 @@
define_bool CONFIG_RWSEM_XCHGADD_ALGORITHM n
define_bool CONFIG_X86_PPRO_FENCE y
define_bool CONFIG_X86_F00F_WORKS_OK n
+ bool '486 emulation' CONFIG_CPU_EMU486
+ dep_bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686 $CONFIG_CPU_EMU486
else
define_bool CONFIG_X86_WP_WORKS_OK y
define_bool CONFIG_X86_INVLPG y
@@ -75,6 +77,7 @@
define_bool CONFIG_X86_ALIGNMENT_16 y
define_bool CONFIG_X86_PPRO_FENCE y
define_bool CONFIG_X86_F00F_WORKS_OK n
+ bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686
fi
if [ "$CONFIG_M586" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -82,6 +85,7 @@
define_bool CONFIG_X86_ALIGNMENT_16 y
define_bool CONFIG_X86_PPRO_FENCE y
define_bool CONFIG_X86_F00F_WORKS_OK n
+ bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686
fi
if [ "$CONFIG_M586TSC" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -90,6 +94,7 @@
define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_PPRO_FENCE y
define_bool CONFIG_X86_F00F_WORKS_OK n
+ bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686
fi
if [ "$CONFIG_M586MMX" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -99,6 +104,7 @@
define_bool CONFIG_X86_GOOD_APIC y
define_bool CONFIG_X86_PPRO_FENCE y
define_bool CONFIG_X86_F00F_WORKS_OK n
+ bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686
fi
if [ "$CONFIG_M686" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -139,6 +145,7 @@
define_bool CONFIG_X86_ALIGNMENT_16 y
define_bool CONFIG_X86_HAS_TSC y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
+ bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686
fi
if [ "$CONFIG_MK8" = "y" ]; then
# for now. may later want to add SSE support and optimized
@@ -162,6 +169,7 @@
define_bool CONFIG_X86_USE_STRING_486 y
define_bool CONFIG_X86_ALIGNMENT_16 y
define_bool CONFIG_X86_F00F_WORKS_OK y
+ bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686
fi
if [ "$CONFIG_MCYRIXIII" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -170,6 +178,7 @@
define_bool CONFIG_X86_USE_3DNOW y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
define_bool CONFIG_X86_F00F_WORKS_OK y
+ bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686
fi
if [ "$CONFIG_MVIAC3_2" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -177,6 +186,7 @@
define_bool CONFIG_X86_ALIGNMENT_16 y
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
define_bool CONFIG_X86_F00F_WORKS_OK y
+ bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686
fi
if [ "$CONFIG_MCRUSOE" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -189,6 +199,7 @@
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
define_bool CONFIG_X86_OOSTORE y
define_bool CONFIG_X86_F00F_WORKS_OK y
+ bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686
fi
if [ "$CONFIG_MWINCHIP2" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -197,6 +208,7 @@
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
define_bool CONFIG_X86_OOSTORE y
define_bool CONFIG_X86_F00F_WORKS_OK y
+ bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686
fi
if [ "$CONFIG_MWINCHIP3D" = "y" ]; then
define_int CONFIG_X86_L1_CACHE_SHIFT 5
@@ -205,6 +217,7 @@
define_bool CONFIG_X86_USE_PPRO_CHECKSUM y
define_bool CONFIG_X86_OOSTORE y
define_bool CONFIG_X86_F00F_WORKS_OK y
+ bool 'Pentium-Pro CMOV emulation' CONFIG_CPU_EMU686
fi
bool 'Machine Check Exception' CONFIG_X86_MCE
diff -urN linux-2.4.25-wt2-base/arch/i386/kernel/traps.c linux-2.4.25-wt2/arch/i386/kernel/traps.c
--- linux-2.4.25-wt2-base/arch/i386/kernel/traps.c Thu Mar 4 17:14:45 2004
+++ linux-2.4.25-wt2/arch/i386/kernel/traps.c Thu Mar 4 17:23:08 2004
@@ -85,6 +85,24 @@
asmlinkage void spurious_interrupt_bug(void);
asmlinkage void machine_check(void);
+#if defined(CONFIG_CPU_EMU486) || defined(CONFIG_CPU_EMU686)
+asmlinkage void do_general_protection(struct pt_regs * regs, long error_code);
+
+/* gives the address of any register member in a struct pt_regs */
+static const int reg_ofs[8] = {
+ (int)&((struct pt_regs *)0)->eax,
+ (int)&((struct pt_regs *)0)->ecx,
+ (int)&((struct pt_regs *)0)->edx,
+ (int)&((struct pt_regs *)0)->ebx,
+ (int)&((struct pt_regs *)0)->esp,
+ (int)&((struct pt_regs *)0)->ebp,
+ (int)&((struct pt_regs *)0)->esi,
+ (int)&((struct pt_regs *)0)->edi
+};
+
+#define REG_PTR(regs, reg) ((unsigned long *)(((void *)(regs)) + reg_ofs[reg]))
+#endif
+
int kstack_depth_to_print = 24;
@@ -404,11 +422,501 @@
do_trap(trapnr, signr, str, 1, regs, error_code, &info); \
}
+#if defined(CONFIG_CPU_EMU486) || defined(CONFIG_CPU_EMU686)
+/* This code can be used to allow old 386's to hopefully correctly execute some
+ * code which was originally compiled for a 486, and to allow CMOV-disabled
+ * processors to emulate CMOV instructions. In user space, only 3 instructions
+ * have been added between the 386 the 486 :
+ * - BSWAP reg performs exactly htonl())
+ * - CMPXCHG reg/mem, reg used for mutex locking
+ * - XADD reg/mem, reg not encountered yet.
+ *
+ * Warning: this will NEVER allow a kernel compiled for a 486 to boot on a 386,
+ * neither will it allow a CMOV-optimized kernel to run on a processor without
+ * CMOV ! It will only help to port programs, or save you on a rescue disk, but
+ * for performance's sake, it's far better to recompile.
+ *
+ * Tests patterns have been submitted to this code on a 386, and it now seems
+ * OK. If you think you've found a bug, please report it to
+ * Willy Tarreau <[email protected]>.
+ */
+
+/* [modrm_address] returns a pointer to a user-space location by decoding the
+ * mod/rm byte and the bytes at <from>, which point to the mod/reg/rm byte.
+ * This must only be called if modrm indicates memory and not register. The
+ * <from> parameter is updated when bytes are read.
+ * NOTE: this code has some ugly lines, which produce a better assembler output
+ * than the "cleaner" version.
+ */
+static void *modrm_address(struct pt_regs *regs, u8 **from,
+ int bit32, int modrm)
+{
+ u32 offset = 0;
+ u8 sib, mod, rm;
+
+ /* better optimization to compute them here, even
+ * if rm is not always used
+ */
+ rm = modrm & 7;
+ mod = modrm & 0xC0;
+
+ if (bit32) { /* 32-bits addressing mode (default) */
+ if (mod == 0 && rm == 5) /* 32 bits offset and nothing more */
+ return (void *)*((u32*)*from)++;
+
+ if (rm == 4) {
+ /* SIB byte is present and must be used */
+ sib = *(*from)++; /* SS(7-6) IDX(5-3) BASE(2-0) */
+
+ /* index * scale */
+ if (((sib >> 3) & 7) != 4)
+ offset += *REG_PTR(regs, (sib >> 3) & 7) << (sib >> 6);
+
+ rm = (sib & 7); /* base replaces rm from now */
+ if (mod == 0 && rm == 5) /* base off32 + scaled index */
+ return (void *)offset + *((u32*)*from)++;
+ }
+
+ /* base register */
+ offset += *REG_PTR(regs, rm);
+
+ if (mod) {
+ if (mod & 0x80) /* 32 bits unsigned offset */
+ offset += *((u32*)*from)++;
+ else /* 0x40: 8 bits signed offset */
+ offset += *((s8*)*from)++;
+ }
+
+ return (void *)offset;
+
+ } else { /* 16-bits addressing mode */
+ /* handle special case now */
+ if (mod == 0 && rm == 6) /* 16 bits offset */
+ return (void *)(u32)*((u16*)*from)++;
+
+ if ((rm & 4) == 0)
+ offset += (rm & 2) ? regs->ebp : regs->ebx;
+ if (rm < 6)
+ offset += (rm & 1) ? regs->edi : regs->esi;
+ else if (rm == 6) /* bp */
+ offset += regs->ebp;
+ else if (rm == 7) /* bx */
+ offset += regs->ebx;
+
+ /* now, let's include 8/16 bits offset */
+ if (mod) {
+ if (mod & 0x80) /* 16 bits unsigned offset */
+ offset += *((u16*)*from)++;
+ else /* 0x40: 8 bits signed offset */
+ offset += *((s8*)*from)++;
+ }
+ return (void *)(offset & 0xFFFF);
+ }
+}
+
+
+/*
+ * skip_modrm() computes the EIP value of next instruction from the
+ * pointer <from> which points to the first byte after the mod/rm byte.
+ * Its purpose is to implement a fast alternative to modrm_address()
+ * when offset value is not needed.
+ */
+static inline void *skip_modrm(u8 *from, int bit32, int modrm)
+{
+ u8 mod,rm;
+
+ /* better optimization to compute them here, even
+ * if rm is not always used
+ */
+ rm = modrm & 7;
+ mod = modrm & 0xC0;
+
+ /* most common case first : registers */
+ if (mod == 0xC0)
+ return from;
+
+ if (bit32) { /* 32 bits addressing mode (default) */
+ if (rm == 4) /* SIB byte : rm becomes base */
+ rm = (*from++ & 7);
+ if (mod == 0x00) {
+ if (rm == 5) /* 32 bits offset and nothing more */
+ return from + 4;
+ else
+ return from;
+ }
+ }
+ else { /* 16 bits mode */
+ if (mod == 0x00) {
+ if (rm == 6) /* 16 bits offset and nothing more */
+ return from + 2;
+ else
+ return from;
+ }
+ }
+
+ if (mod & 0x80)
+ return from + (2 * (bit32 + 1)); /* + 2 or 4 bytes */
+ else
+ return from + 1;
+}
+
+
+/* [reg_address] returns a pointer to a register in the regs struct, depending
+ * on <w> (byte/word) and reg. Since the caller knows about <w>, it's
+ * responsible for understanding the result as a byte, word or dword pointer.
+ * Only the 3 lower bits of <reg> are meaningful, higher ones are ignored.
+ */
+static inline void *reg_address(struct pt_regs *regs, char w, u8 reg)
+{
+ if (w)
+ /* 16/32 bits mode */
+ return REG_PTR(regs, reg & 7);
+ else
+ /* 8 bits mode : al,cl,dl,bl,ah,ch,dh,bh */
+ return ((reg & 4) >> 2) + (u8*)REG_PTR(regs, reg & 3);
+
+ /* this is set just to prevent the compiler from complaining */
+ return NULL;
+}
+
+/* [do_invalid_op] is called by exception 6 after an invalid opcode has been
+ * encountered. It will decode the prefixes and the instruction code, to try
+ * to emulate it, and will send a SIGILL or SIGSEGV to the process if not
+ * possible.
+ * REP/REPN prefixes are not supported anymore because it didn't make sense
+ * to emulate instructions prefixed with such opcodes since no arch-specific
+ * instruction start by one of them. At most, they will be the start of newer
+ * arch-specific instructions (SSE ?).
+ */
+asmlinkage void do_invalid_op(struct pt_regs * regs, long error_code)
+{
+ enum {
+ PREFIX_ES = 1,
+ PREFIX_CS = 2,
+ PREFIX_SS = 4,
+ PREFIX_DS = 8,
+ PREFIX_FS = 16,
+ PREFIX_GS = 32,
+ PREFIX_SEG = 63, /* any seg */
+ PREFIX_D32 = 64,
+ PREFIX_A32 = 128,
+ PREFIX_LOCK = 256,
+ } prefixes = 0;
+
+ u32 *src, *dst;
+ u8 *eip = (u8*)regs->eip;
+
+#ifdef BENCH_CPU_EXCEPTION_BUT_NOT_THE_CODE
+ regs->eip += 3;
+ return;
+#endif
+ /* we'll first read all known opcode prefixes, and discard obviously
+ invalid combinations.*/
+ while (1) {
+ /* prefix for CMOV, BSWAP, CMPXCHG, XADD */
+ if (*eip == 0x0F) {
+ eip++;
+#if defined(CONFIG_CPU_EMU686)
+ /* here, we'll emulate the CMOV* instructions, which gcc
+ * blindly generates when specifying -march=i686, even
+ * though the processor flags must be checked against
+ * support for these instructions.
+ */
+ if ((*eip & 0xF0) == 0x40) { /* CMOV* */
+ u8 cond, ncond, reg, modrm;
+ u32 flags;
+
+ /* to optimize processing, we'll associate a flag mask to each opcode.
+ * If the EFLAGS value ANDed with this mask is not null, then the cond
+ * is met. One exception is CMOVL which is true if SF != OF. For this
+ * purpose, we'll make a fake flag 'SFOF' (unused bit 3) which equals
+ * SF^OF, so that CMOVL is true if SFOF != 0.
+ */
+ static u16 cmov_flags[8] = {
+ 0x0800, /* CMOVO => OF */
+ 0x0001, /* CMOVB => CF */
+ 0x0040, /* CMOVE => ZF */
+ 0x0041, /* CMOVBE => CF | ZF */
+ 0x0080, /* CMOVS => SF */
+ 0x0004, /* CMOVP => PF */
+ 0x0008, /* CMOVL => SF^OF */
+ 0x0048, /* CMOVLE => SF^OF | ZF */
+ };
+
+ flags = regs->eflags & 0x08C5; /* OF, SF, ZF, PF, CF */
+
+ /* SFOF (flags_3) <= OF(flags_11) ^ SF(flags_7) */
+ flags |= ((flags ^ (flags >> 4)) >> 4) & 0x8;
+
+ cond = *eip & 0x0F;
+ ncond = cond & 1; /* condition is negated */
+ cond >>= 1;
+ ncond ^= !!(flags & cmov_flags[cond]);
+ /* ncond is now true if the cond matches the opcode */
+
+ modrm = *(eip + 1);
+ eip += 2; /* skips all the opcodes */
+
+ if (!ncond) {
+ /* condition is not valid, skip the instruction and do nothing */
+ regs->eip = (u32)skip_modrm(eip, !(prefixes & PREFIX_A32), modrm);
+ return;
+ }
+
+ /* condition is valid, we'll have to do the work */
+
+ reg = (modrm >> 3) & 7;
+ dst = reg_address(regs, 1, reg);
+ if ((modrm & 0xC0) == 0xC0) { /* register to register */
+ src = reg_address(regs, 1, modrm);
+ }
+ else {
+ src = modrm_address(regs, &eip, !(prefixes & PREFIX_A32), modrm);
+ /* we must verify that src is valid for this task */
+ if ((prefixes & (PREFIX_FS | PREFIX_GS)) ||
+ verify_area(VERIFY_WRITE, (void *)src, ((prefixes & PREFIX_D32) ? 2 : 4))) {
+ do_general_protection(regs, error_code);
+ return;
+ }
+ }
+
+ if (!(prefixes & PREFIX_D32)) /* 32 bits operands */
+ *(u32*)dst = *(u32*)src;
+ else
+ *(u16*)dst = *(u16*)src;
+
+ regs->eip = (u32)eip;
+ return;
+ } /* if CMOV */
+#endif /* CONFIG_CPU_EMU686 */
+
+#if defined(CONFIG_CPU_EMU486)
+ /* we'll verify if this is a BSWAP opcode, main source of SIGILL on 386's */
+ if ((*eip & 0xF8) == 0xC8) { /* BSWAP */
+ u8 w, reg, modrm;
+
+ reg = *eip++ & 0x07;
+ src = reg_address(regs, 1, reg);
+
+ __asm__ __volatile__ (
+ "xchgb %%al, %%ah\n\t"
+ "roll $16, %%eax\n\t"
+ "xchgb %%al, %%ah\n\t"
+ : "=a" (*(u32*)src)
+ : "a" (*(u32*)src));
+ regs->eip = (u32)eip;
+ return;
+ }
+
+
+ /* we'll also try to emulate the CMPXCHG instruction (used in mutex locks).
+ This instruction is often locked, but it's not possible to put a lock
+ here. Anyway, I don't believe that there are lots of multiprocessors
+ 386 out there ...
+ */
+ if ((*eip & 0xFE) == 0xB0) { /* CMPXCHG */
+ u8 w, reg, modrm;
+
+ w = *eip & 1;
+ modrm = *(eip + 1);
+ eip += 2; /* skips all the opcodes */
+
+ reg = (modrm >> 3) & 7;
+
+ dst = reg_address(regs, w, reg);
+ if ((modrm & 0xC0) == 0xC0) /* register to register */
+ src = reg_address(regs, w, modrm);
+ else {
+ src = modrm_address(regs, &eip, !(prefixes & PREFIX_A32), modrm);
+ /* we must verify that src is valid for this task */
+ if ((prefixes & (PREFIX_FS | PREFIX_GS)) ||
+ verify_area(VERIFY_WRITE, (void *)src, (w?((prefixes & PREFIX_D32)?2:4):1))) {
+ do_general_protection(regs, error_code);
+ return;
+ }
+ }
+
+ if (!w) { /* 8 bits operands */
+ if ((u8)regs->eax == *(u8*)src) {
+ *(u8*)src = *(u8*)dst;
+ regs->eflags |= X86_EFLAGS_ZF; /* set Zero Flag */
+ }
+ else {
+ *(u8*)&(regs->eax) = *(u8*)src;
+ regs->eflags &= ~X86_EFLAGS_ZF; /* clear Zero Flag */
+ }
+ }
+ else if (!(prefixes & PREFIX_D32)) { /* 32 bits operands */
+ if ((u32)regs->eax == *(u32*)src) {
+ *(u32*)src = *(u32*)dst;
+ regs->eflags |= X86_EFLAGS_ZF; /* set Zero Flag */
+ }
+ else {
+ regs->eax = *(u32*)src;
+ regs->eflags &= ~X86_EFLAGS_ZF; /* clear Zero Flag */
+ }
+ }
+ else { /* 16 bits operands */
+ if ((u16)regs->eax == *(u16*)src) {
+ *(u16*)src = *(u16*)dst;
+ regs->eflags |= X86_EFLAGS_ZF; /* set Zero Flag */
+ }
+ else {
+ *(u16*)®s->eax = *(u16*)src;
+ regs->eflags &= ~X86_EFLAGS_ZF; /* clear Zero Flag */
+ }
+ }
+ regs->eip = (u32)eip;
+ return;
+ }
+
+ /* we'll also try to emulate the XADD instruction (not very common) */
+ if ((*eip & 0xFE) == 0xC0) { /* XADD */
+ u8 w, reg, modrm;
+ u32 op1, op2;
+
+ w = *eip & 1;
+ modrm = *(eip + 1);
+ eip += 2; /* skips all the opcodes */
+
+ reg = (modrm >> 3) & 7;
+
+ dst = reg_address(regs, w, reg);
+ if ((modrm & 0xC0) == 0xC0) /* register to register */
+ src = reg_address(regs, w, modrm);
+ else {
+ src = modrm_address(regs, &eip, !(prefixes & PREFIX_A32), modrm);
+ /* we must verify that src is valid for this task */
+ if ((prefixes & (PREFIX_FS | PREFIX_GS)) ||
+ verify_area(VERIFY_WRITE, (void *)src, (w?((prefixes & PREFIX_D32)?2:4):1))) {
+ do_general_protection(regs, error_code);
+ return;
+ }
+ }
+
+ if (!w) { /* 8 bits operands */
+ op1 = *(u8*)src;
+ op2 = *(u8*)dst;
+ *(u8*)src = op1 + op2;
+ *(u8*)dst = op1;
+ }
+ else if (!(prefixes & PREFIX_D32)) { /* 32 bits operands */
+ op1 = *(u32*)src;
+ op2 = *(u32*)dst;
+ *(u32*)src = op1 + op2;
+ *(u32*)dst = op1;
+ }
+ else { /* 16 bits operands */
+ op1 = *(u16*)src;
+ op2 = *(u16*)dst;
+ *(u16*)src = op1 + op2;
+ *(u16*)dst = op1;
+ }
+ regs->eip = (u32)eip;
+ return;
+ }
+
+#endif /* CONFIG_CPU_EMU486 */
+
+ } /* if (*eip == 0x0F) */
+ else if ((*eip & 0xfc) == 0x64) {
+ switch (*eip) {
+ case 0x66: /* Operand switches 16/32 bits */
+ if (prefixes & PREFIX_D32)
+ goto invalid_opcode;
+ prefixes |= PREFIX_D32;
+ eip++;
+ continue;
+ case 0x67: /* Address switches 16/32 bits */
+ if (prefixes & PREFIX_A32)
+ goto invalid_opcode;
+ prefixes |= PREFIX_A32;
+ eip++;
+ continue;
+ case 0x64: /* FS: */
+ if (prefixes & PREFIX_SEG)
+ goto invalid_opcode;
+ prefixes |= PREFIX_FS;
+ eip++;
+ continue;
+ case 0x65: /* GS: */
+ if (prefixes & PREFIX_SEG)
+ goto invalid_opcode;
+ prefixes |= PREFIX_GS;
+ eip++;
+ continue;
+ }
+ }
+ else if (*eip == 0xf0) { /* lock */
+ if (prefixes & PREFIX_LOCK)
+ goto invalid_opcode;
+ prefixes |= PREFIX_LOCK;
+#ifdef CONFIG_SMP
+ /* if we're in SMP mode, a missing lock can lead to problems in
+ * multi-threaded environment. We must send a warning. In UP,
+ * however, this should have no effect.
+ */
+ printk(KERN_WARNING "Warning ! LOCK prefix found at EIP=0x%08x in"
+ "process %d(%s), has no effect before a software-emulated"
+ "instruction\n", regs->eip, current->pid, current->comm);
+#endif
+ eip++;
+ continue;
+ }
+ else if ((*eip & 0xe7) == 0x26) {
+ switch (*eip) {
+ case 0x26: /* ES: */
+ if (prefixes & PREFIX_SEG)
+ goto invalid_opcode;
+ prefixes |= PREFIX_ES;
+ eip++;
+ continue;
+ case 0x2E: /* CS: */
+ if (prefixes & PREFIX_SEG)
+ goto invalid_opcode;
+ prefixes |= PREFIX_CS;
+ eip++;
+ continue;
+ case 0x36: /* SS: */
+ if (prefixes & PREFIX_SEG)
+ goto invalid_opcode;
+ prefixes |= PREFIX_SS;
+ eip++;
+ continue;
+ case 0x3E: /* DS: */
+ if (prefixes & PREFIX_SEG)
+ goto invalid_opcode;
+ prefixes |= PREFIX_DS;
+ eip++;
+ continue;
+ }
+ }
+ /* if this opcode has not been processed, it's not a prefix. */
+ break;
+ }
+
+ /* it's a case we can't handle. Unknown opcode or too many prefixes. */
+invalid_opcode:
+#ifdef CONFIG_CPU_EMU486_DEBUG
+ printk(KERN_DEBUG "do_invalid_op() : invalid opcode detected @%p : %02x %02x ...\n", eip, eip[0], eip[1]);
+#endif
+ current->thread.error_code = error_code;
+ current->thread.trap_no = 6;
+ force_sig(SIGILL, current);
+ die_if_kernel("invalid operand",regs,error_code);
+}
+
+#endif /* CONFIG_CPU_EMU486 || CONFIG_CPU_EMU686 */
+
DO_VM86_ERROR_INFO( 0, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->eip)
DO_VM86_ERROR( 3, SIGTRAP, "int3", int3)
DO_VM86_ERROR( 4, SIGSEGV, "overflow", overflow)
DO_VM86_ERROR( 5, SIGSEGV, "bounds", bounds)
+
+#if !defined(CONFIG_CPU_EMU486) && !defined(CONFIG_CPU_EMU686)
DO_ERROR_INFO( 6, SIGILL, "invalid operand", invalid_op, ILL_ILLOPN, regs->eip)
+#endif
+
DO_VM86_ERROR( 7, SIGSEGV, "device not available", device_not_available)
DO_ERROR( 8, SIGSEGV, "double fault", double_fault)
DO_ERROR( 9, SIGFPE, "coprocessor segment overrun", coprocessor_segment_overrun)
---
On Tue, Nov 10, 2009 at 11:01:26PM +0100, Matteo Croce wrote:
> On Tue, Nov 10, 2009 at 9:54 PM, Pavel Machek <[email protected]> wrote:
> > Hi!
> >
> >> Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
> >> on one side and [sse*] on the other : distros are built assuming the
> >> former are always available while they are not always. And the
> >> distro
> >
> > Well, fix the distros...
>
> $ objdump -d libflashplayer.so |grep cmov -c
> 10
Good point !
Same here, only 2 of them won't work on C3 (the first ones) :
7e3a7c: 0f 4d 45 c8 cmovge -0x38(%ebp),%eax
7e3b4d: 0f 4d 45 c8 cmovge -0x38(%ebp),%eax
7e97b3: 0f 4c c2 cmovl %edx,%eax
7e9823: 0f 4c c2 cmovl %edx,%eax
7eb884: 0f 4c c2 cmovl %edx,%eax
7eb91d: 0f 4c c2 cmovl %edx,%eax
8095d7: 0f 42 ca cmovb %edx,%ecx
80997a: 0f 42 ca cmovb %edx,%ecx
809a4a: 0f 42 ca cmovb %edx,%ecx
80a5cb: 0f 42 ca cmovb %edx,%ecx
clearly worth emulating in my opinion.
Willy
On 11/10/2009 02:06 PM, Willy Tarreau wrote:
> On Tue, Nov 10, 2009 at 01:19:30PM -0800, H. Peter Anvin wrote:
>> Willy, perhaps you can come up with a list of features you think should
>> be emulated, together with an explanation of why you opted for that list
>> of features and *did not* opt for others.
>
> Well, the instructions I had to emulate were the result of failures
> to run standard distros on older machines. When I ran a 486 distro
> on my old 386, I found that almost everything worked except a few
> programs making use of BSWAP for htonl(), and a small group of other
> ones making occasional use of CMPXCHG for mutex handling. So I checked
> the differences between 386 and 486 and found that the last remaining
> one was XADD which I did not find in my binaries but which was really
> obvious to implement, so it made sense to complete the emulator. That
> said, a feature was missing with CMPXCHG. It was generally used with
> a LOCK prefix which could not be emulated. In practice, that wasn't
> an issue since I did not have any SMP i386 and I think we might only
> find them on some very specific industrial boards if any.
>
Linux doesn't support 386 SMP anyway, and we really don't support 486
SMP either since noone relevant seems to have a board we can test on.
> But what I can say is that after emulating those instructions, I
> never got any illegal instruction anymore on my systems. Here
> Matteo reports an issue with NOPL, which might have been introduced
> with newer compilers. So if we get NOPL+CMOV, I think that every
> CPU starting from 486 will be able to execute all the applications
> I have been running on those machines. We can add the 486 ones if
> we think it's worth it.
NOPL was introduced with a recent binutils(!) change. They might have
backed that one out.
> Once again, I have no argument against emulating more instructions.
> It's just that I never needed them, and I fear that doing so might
> render the code a lot more complex and slower. Maybe time will prove
> me wrong and I will have no problem with that. We can re-open this
> thread after the first report of a SIGILL with the patch applied.
>
> So in my opinion, we should have :
> - CMOV (for 486, Pentium, C3, K6, ...)
> - NOPL (newcomer)
>
> And if we want to extend down to i386 :
> - BSWAP (=htonl)
> - CMPXCHG (mutex)
> - XADD (never encoutered but cheap)
>
> I still have the 2.4 patch for BSWAP, CMPXCHG, CMOV and XADD lying
> around. I'm appending it to the end of this mail in case it can fuel
> the discussion. I've not ported it to 2.6 yet simply because my old
> systems are still on 2.4, but volunteers are welcome :-)
>
>> Note: emulated FPU is a special subcase. The FPU operations are
>> heavyweight enough that the overhead of trapping versus library calls is
>> relatively insignificant.
>
> Agreed for most of them, though some cheap ones such as FADD can
> see a huge difference. In fact it's mostly that it's been common
> for a long time to see slow software FPU (till 386 & 486-SX), so
> it's been avoided for a long time.
>
> Regards,
> Willy
I immediately note that you have absolutely no check on the code
segment, either in terms of code segment limits or even that we're in
the right mode. Furthermore, you read user space -- code in user space
is still user space -- without get_user(). We also need NX protection
to be honoured, and the various special subtleties of the x86
instruction format (15-byte limit, for example) to be preserved: they
aren't just there randomly, but are there to protect against specific
failures.
*THIS* is the kind of complexity that makes me think that having a
single source for all interpretation done in the kernel is the preferred
option.
-hpa
* H. Peter Anvin <[email protected]> wrote:
> *THIS* is the kind of complexity that makes me think that having a
> single source for all interpretation done in the kernel is the
> preferred option.
Definitely agreed ... The NX code is quite a maze right now, so changes
to it should come generously laced with cleanups.
Ingo
On Tue, Nov 10, 2009 at 02:15:55PM -0800, H. Peter Anvin wrote:
> I immediately note that you have absolutely no check on the code
> segment, either in terms of code segment limits or even that we're in
> the right mode. Furthermore, you read user space -- code in user space
> is still user space -- without get_user().
Yes I remember about that one now. HCH told me about it.
> We also need NX protection
> to be honoured, and the various special subtleties of the x86
> instruction format (15-byte limit, for example) to be preserved: they
> aren't just there randomly, but are there to protect against specific
> failures.
OK.
> *THIS* is the kind of complexity that makes me think that having a
> single source for all interpretation done in the kernel is the preferred
> option.
I understand, your point. We just need to check when it becomes overkill
to use a full-blown emulator of 3 instructions and a few "simple" rules.
Willy
On Tue, Nov 10, 2009 at 01:19:30PM -0800, H. Peter Anvin wrote:
> Willy, perhaps you can come up with a list of features you think should
> be emulated, together with an explanation of why you opted for that list
> of features and *did not* opt for others.
>
> Note: emulated FPU is a special subcase. The FPU operations are
> heavyweight enough that the overhead of trapping versus library calls is
> relatively insignificant.
That doesn't seem to be the experience of the arm EABI versus the old
arm ABI with kernel FPU emulation. Using user space library calls for
FPU is vastly faster than the trapping and kernel FPU emulation.
--
Len Sorensen
On 11/10/2009 02:27 PM, Lennart Sorensen wrote:
> On Tue, Nov 10, 2009 at 01:19:30PM -0800, H. Peter Anvin wrote:
>> Willy, perhaps you can come up with a list of features you think should
>> be emulated, together with an explanation of why you opted for that list
>> of features and *did not* opt for others.
>>
>> Note: emulated FPU is a special subcase. The FPU operations are
>> heavyweight enough that the overhead of trapping versus library calls is
>> relatively insignificant.
>
> That doesn't seem to be the experience of the arm EABI versus the old
> arm ABI with kernel FPU emulation. Using user space library calls for
> FPU is vastly faster than the trapping and kernel FPU emulation.
I don't believe we were talking about ARM.
-hpa
On Tue, Nov 10, 2009 at 02:29:33PM -0800, H. Peter Anvin wrote:
> I don't believe we were talking about ARM.
True. I do get the impression the ARM has higher trap overhead than x86.
--
Len Sorensen
On 11/10/2009 02:34 PM, Lennart Sorensen wrote:
> On Tue, Nov 10, 2009 at 02:29:33PM -0800, H. Peter Anvin wrote:
>> I don't believe we were talking about ARM.
>
> True. I do get the impression the ARM has higher trap overhead than x86.
Also, the x87 FPU is a very clunky beast that's slow to emulate (unlike
SSE).
-hpa
On Tue, Nov 10, 2009 at 11:20:31PM +0100, Ingo Molnar wrote:
>
> * H. Peter Anvin <[email protected]> wrote:
>
> > *THIS* is the kind of complexity that makes me think that having a
> > single source for all interpretation done in the kernel is the
> > preferred option.
>
> Definitely agreed ... The NX code is quite a maze right now, so changes
> to it should come generously laced with cleanups.
BTW, I don't see why we should be impacted by NX. Trying to
execute from an NX page would return a SEGV, not SIGILL, so
we should not be bothered, am I wrong ?
Willy
On 11/10/2009 02:42 PM, Willy Tarreau wrote:
> On Tue, Nov 10, 2009 at 11:20:31PM +0100, Ingo Molnar wrote:
>>
>> * H. Peter Anvin <[email protected]> wrote:
>>
>>> *THIS* is the kind of complexity that makes me think that having a
>>> single source for all interpretation done in the kernel is the
>>> preferred option.
>>
>> Definitely agreed ... The NX code is quite a maze right now, so changes
>> to it should come generously laced with cleanups.
>
> BTW, I don't see why we should be impacted by NX. Trying to
> execute from an NX page would return a SEGV, not SIGILL, so
> we should not be bothered, am I wrong ?
Yes. Consider a page-crossing instruction.
-hpa
On Tue, Nov 10, 2009 at 02:38:02PM -0800, H. Peter Anvin wrote:
> Also, the x87 FPU is a very clunky beast that's slow to emulate (unlike
> SSE).
Yeah x87 is rather horrible.
--
Len Sorensen
On Tue, Nov 10, 2009 at 02:47:20PM -0800, H. Peter Anvin wrote:
> On 11/10/2009 02:42 PM, Willy Tarreau wrote:
> > On Tue, Nov 10, 2009 at 11:20:31PM +0100, Ingo Molnar wrote:
> >>
> >> * H. Peter Anvin <[email protected]> wrote:
> >>
> >>> *THIS* is the kind of complexity that makes me think that having a
> >>> single source for all interpretation done in the kernel is the
> >>> preferred option.
> >>
> >> Definitely agreed ... The NX code is quite a maze right now, so changes
> >> to it should come generously laced with cleanups.
> >
> > BTW, I don't see why we should be impacted by NX. Trying to
> > execute from an NX page would return a SEGV, not SIGILL, so
> > we should not be bothered, am I wrong ?
>
> Yes. Consider a page-crossing instruction.
OK, but to be pragmatic, NX is there to prevent execution of
instructions in the stack (or heap) during buffer overflows.
If we only implement the few instructions lised in previous
mail, there is very little benefit to check for NX :
- those instructions cannot jump to other code, they just
change one register or memory location at most (or just nop)
- once we return from the signal handler, if we have crossed
a NX page boundary, the program will segfault anyway, taking
with it the change we just completed.
- last, the probability of having an NX page just after an
executable one seems too tight to me to even constitute
an attack vector ! BTW, I'm not even certain that all CPUs
correctly implement this check !
On the other hand, I certainly understand why this would be
an important check in a complete emulator which could parse
and emulate a flow of instructions.
So in short, I think we could reasonably implement CMOV/NOPL
with the instruction length control, with getuser for data
accesses but without checking the code pages permissions if
we know that the CPU could already fetch the beginning of
the instruction correctly to cause an invalid opcode trap.
I'm not saying this is perfect, just that this is reasonable.
Regards,
Willy
On 11/10/2009 09:52 PM, Willy Tarreau wrote:
>
> - last, the probability of having an NX page just after an
> executable one seems too tight to me to even constitute
> an attack vector ! BTW, I'm not even certain that all CPUs
> correctly implement this check !
>
Do you have *any* *evidence* *whatsoever* for that assertion?!
I personally will consider something that doesn't implement proper
security check to be a potential security hole and will NAK the patch.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Tue, Nov 10, 2009 at 10:15:58PM -0800, H. Peter Anvin wrote:
> On 11/10/2009 09:52 PM, Willy Tarreau wrote:
> >
> > - last, the probability of having an NX page just after an
> > executable one seems too tight to me to even constitute
> > an attack vector ! BTW, I'm not even certain that all CPUs
> > correctly implement this check !
> >
>
> Do you have *any* *evidence* *whatsoever* for that assertion?!
No, just basic feeling based on implementation cost and difficulty
vs gains as I explained.
> I personally will consider something that doesn't implement proper
> security check to be a potential security hole and will NAK the patch.
Even in the case of the NOPL instruction ? I clearly don't see
the potential security hole !
Willy
On 11/10/2009 10:36 PM, Willy Tarreau wrote:
>>
>> Do you have *any* *evidence* *whatsoever* for that assertion?!
>
> No, just basic feeling based on implementation cost and difficulty
> vs gains as I explained.
Quite on the contrary; in hardware it would be pretty hard to *not* do
the right thing.
>> I personally will consider something that doesn't implement proper
>> security check to be a potential security hole and will NAK the patch.
>
> Even in the case of the NOPL instruction ? I clearly don't see
> the potential security hole !
>
You have it backwards. Prove that there *isn't* one and we'll talk.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
> (...)
> > > CMOV/NOPL are rarely used, thus have no reason to cause a massive
> > > performance drop, but are frequent enough (at least cmov) for almost
> >
> > *One* CMOV in the inner loop will make your performance go down 20x.
>
> yes, just like with emulated FPU or trapped unaligned accesses. It's
> just like flying fishes. They exist but they aren't the most common
> ones. If people encounter these cases on a specific program, then
> they just have to recompile it if it is a problem. At least they
> don't rebuild the whole distro. And once again, I've been using
> cmpxchg/bswap emulation for years on my i386 without feeling any
> need for a rebuild, and CMOV emulation for years now on my mini-itx
And did you set cpu family to 6 for your 386?
That's the part I was objecting most. Yes, you can emulate, but
emulation is very bad for performance... so don't lie about cpu
family.
(Imagine application that has NOPL in inner loop, for performance
reasons. You want to use version _without_ the NOPL on processors that
lack it.)
So... I don't like instruction emulation, but can live with it. But
don't lie about supported instructions in /proc as original patch did.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi!
> > >>> *THIS* is the kind of complexity that makes me think that having a
> > >>> single source for all interpretation done in the kernel is the
> > >>> preferred option.
> > >>
> > >> Definitely agreed ... The NX code is quite a maze right now, so changes
> > >> to it should come generously laced with cleanups.
> > >
> > > BTW, I don't see why we should be impacted by NX. Trying to
> > > execute from an NX page would return a SEGV, not SIGILL, so
> > > we should not be bothered, am I wrong ?
> >
> > Yes. Consider a page-crossing instruction.
>
> OK, but to be pragmatic, NX is there to prevent execution of
> instructions in the stack (or heap) during buffer overflows.
> If we only implement the few instructions lised in previous
> mail, there is very little benefit to check for NX :
>
> - those instructions cannot jump to other code, they just
> change one register or memory location at most (or just nop)
>
> - once we return from the signal handler, if we have crossed
> a NX page boundary, the program will segfault anyway, taking
> with it the change we just completed.
>
> - last, the probability of having an NX page just after an
> executable one seems too tight to me to even constitute
> an attack vector ! BTW, I'm not even certain that all CPUs
> correctly implement this check !
Yes, you can probably "get away" with it. But I would not want to
debug problems on systems with half-instruction-emulation. Please do
it right, or not at all.
> So in short, I think we could reasonably implement CMOV/NOPL
> with the instruction length control, with getuser for data
> accesses but without checking the code pages permissions if
> we know that the CPU could already fetch the beginning of
> the instruction correctly to cause an invalid opcode trap.
>
> I'm not saying this is perfect, just that this is reasonable.
Reasonable hack to get distro booting yes. Reasonable for mainline?
No.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Tue, Nov 10, 2009 at 11:57:04PM -0800, H. Peter Anvin wrote:
> On 11/10/2009 10:36 PM, Willy Tarreau wrote:
> >>
> >> Do you have *any* *evidence* *whatsoever* for that assertion?!
> >
> > No, just basic feeling based on implementation cost and difficulty
> > vs gains as I explained.
>
> Quite on the contrary; in hardware it would be pretty hard to *not* do
> the right thing.
It just depends on how the instruction prefetcher is implemented.
If the check is only performed on the first byte of the opcode,
we can miss the tail. In my experience, intel processors have
been doing instruction checks right, but I have no reason to be
absolutely certain that all other vendors do that right, especially
those targetting cheap embedded systems.
Eventhough this one is not for this precise case, here's an example
of such a missed boundary check :
http://download.intel.com/design/processor/specupdt/315593.pdf
AK27: "general protection fault may not be signaled on data segment
limit violation above 4-G limit". Status: "no fix".
Note that I don't find this can present a significant vulnerability
risk. Maybe if someone comes with a concrete case where it is a real
trouble, they will fix it.
> >> I personally will consider something that doesn't implement proper
> >> security check to be a potential security hole and will NAK the patch.
> >
> > Even in the case of the NOPL instruction ? I clearly don't see
> > the potential security hole !
> >
>
> You have it backwards. Prove that there *isn't* one and we'll talk.
You're not helping. You're asking me to prove that something has no
issue, all I can say is that it does not have any issue I can imagine.
Proving the opposite with an example would be easier.
All I can say is that executing a NOP results in no state change in
the processor except the instruction pointer which points to the
next instruction after execution. Since a NOP changes nothing, it
cannot be used alone to provide any privilege, access to data or
any such thing. Since it does not perform any jump, it cannot either
be used to take back control of the execution flow. And it is certain
that the next instruction after it will be executed, so if the NOP
crosses a page boundary and completes on a non-executable one, the
next instruction will trigger the PF.
So I can't see how a NOP can be used to circumvent any protection.
Willy
On Wed, Nov 11, 2009 at 09:03:13AM +0100, Pavel Machek wrote:
>
> > (...)
> > > > CMOV/NOPL are rarely used, thus have no reason to cause a massive
> > > > performance drop, but are frequent enough (at least cmov) for almost
> > >
> > > *One* CMOV in the inner loop will make your performance go down 20x.
> >
> > yes, just like with emulated FPU or trapped unaligned accesses. It's
> > just like flying fishes. They exist but they aren't the most common
> > ones. If people encounter these cases on a specific program, then
> > they just have to recompile it if it is a problem. At least they
> > don't rebuild the whole distro. And once again, I've been using
> > cmpxchg/bswap emulation for years on my i386 without feeling any
> > need for a rebuild, and CMOV emulation for years now on my mini-itx
>
> And did you set cpu family to 6 for your 386?
No, not at all !
> That's the part I was objecting most. Yes, you can emulate, but
> emulation is very bad for performance... so don't lie about cpu
> family.
That's the point I agree with you, I don't like this setting either.
I don't want to incite applications to use features that are emulated.
However I think that allowing applications run when they don't/can't
perform the check is useful.
> (Imagine application that has NOPL in inner loop, for performance
> reasons. You want to use version _without_ the NOPL on processors that
> lack it.)
Exactly. Same for MMX, 3DNOW, SSE* and such.
> So... I don't like instruction emulation, but can live with it. But
> don't lie about supported instructions in /proc as original patch did.
100% agreed !
Regards,
Willy
On Tue, 10 Nov 2009 22:15:58 -0800
"H. Peter Anvin" <[email protected]> wrote:
> On 11/10/2009 09:52 PM, Willy Tarreau wrote:
> >
> > - last, the probability of having an NX page just after an
> > executable one seems too tight to me to even constitute
> > an attack vector ! BTW, I'm not even certain that all CPUs
> > correctly implement this check !
> >
>
> Do you have *any* *evidence* *whatsoever* for that assertion?!
>
> I personally will consider something that doesn't implement proper
> security check to be a potential security hole and will NAK the patch.
Assuming you are doing the fault handling only for a CPU where you expect
to need it (which would be wise I think) then it's a question of whether
the CPU supports NX in the first place.
Even if it does the only thing you can reasonably hope to do is move the
program counter one instruction into the next page. The user access
checks will trap any attempt to cross 0xC0000000 and the protection
fault might just occur one or part of an instruction on in the other
cases.
Alan
> So in my opinion, we should have :
> - CMOV (for 486, Pentium, C3, K6, ...)
> - NOPL (newcomer)
>
> And if we want to extend down to i386 :
> - BSWAP (=htonl)
> - CMPXCHG (mutex)
> - XADD (never encoutered but cheap)
>
> I still have the 2.4 patch for BSWAP, CMPXCHG, CMOV and XADD lying
If we go that far it needs a lot more thought and probably to use the KVM
code simply because of all the complexities around prefixes and the like
(Not of course that the real CPUs are consistent either)
On Wed, Nov 11, 2009 at 10:21:36AM +0000, Alan Cox wrote:
>
> > So in my opinion, we should have :
> > - CMOV (for 486, Pentium, C3, K6, ...)
> > - NOPL (newcomer)
> >
> > And if we want to extend down to i386 :
> > - BSWAP (=htonl)
> > - CMPXCHG (mutex)
> > - XADD (never encoutered but cheap)
> >
> > I still have the 2.4 patch for BSWAP, CMPXCHG, CMOV and XADD lying
>
> If we go that far it needs a lot more thought and probably to use the KVM
> code simply because of all the complexities around prefixes and the like
well, ironically the KVM decoder can decode an infinite string
of prefixes while the very simple and limited one in the patch
I showed did correct checks for invalid cases (multiple segments,
repeated locks, etc...). It would only accept one data size prefix,
one address size prefix, one lock and one segment prefix.
I have nothing against the KVM one, it's just that it's a
full-featured emulator while we were speaking about a 686
emulators for lower-end processors. 98% of the instructions
supported by KVM will never be used for that purpose. This
is where I see a waste. We're comparing 7000 lines of code
supporting 64-bit, real mode, NX, etc... to 400. I fail to
see how we can guarantee that we do it right in that larger
code (and the example above proves it wrong).
And as you said, NX is not an issue on the CPUs we're
targetting.
Regards,
Willy
Hi!
On Tue, 2009-11-10 at 21:54 +0100, Pavel Machek wrote:
[...]
> > CMOV/NOPL are rarely used, thus have no reason to cause a massive
> > performance drop, but are frequent enough (at least cmov) for almost
>
> *One* CMOV in the inner loop will make your performance go down 20x.
But it runs.
The pragmatic side is:
If people notices the performance drop, it would be good to have
something in syslog and/or dmesg and/or /proc and/or sysfs
If people do not notice the performance drop, who cares?
Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services
On 11/11/2009 02:43 AM, Willy Tarreau wrote:
>
> well, ironically the KVM decoder can decode an infinite string
> of prefixes while the very simple and limited one in the patch
> I showed did correct checks for invalid cases (multiple segments,
> repeated locks, etc...). It would only accept one data size prefix,
> one address size prefix, one lock and one segment prefix.
>
> I have nothing against the KVM one, it's just that it's a
> full-featured emulator while we were speaking about a 686
> emulators for lower-end processors. 98% of the instructions
> supported by KVM will never be used for that purpose. This
> is where I see a waste. We're comparing 7000 lines of code
> supporting 64-bit, real mode, NX, etc... to 400. I fail to
> see how we can guarantee that we do it right in that larger
> code (and the example above proves it wrong).
>
> And as you said, NX is not an issue on the CPUs we're
> targetting.
>
RIGHT NOW. Except that, guess what, once we have emulation in the
kernel, people are going to demand new instructions to cover *new* gaps
in the instruction set. And yes, this is going to mean 64 bits and what
not.
The main reason to unify with KVM is not because KVM is doing everything
right (I am perfectly aware that it doesn't), but because I don't really
want to see a plethora of half-arsed emulators spread across the kernel,
each with its own bugs. If unified, at least there is one codebase
which can get fixed.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
"H. Peter Anvin" <[email protected]> writes:
> On 11/10/2009 09:24 AM, Alan Cox wrote:
>>>
>>> In the short term, yes, of course. However, if we're going to do
>>> emulation, we might as well do it right.
>>
>> Why is using KVM doing it right ? It sounds like its doing it slowly,
>> and hideously memory inefficiently. You are solving an uninteresting
>> general case problem when you just need two tiny fixups (or perhaps 3 if
>> you want to fix up early x86-64 prefetch)
>
> Why do we only need "two tiny fixups"? Where do we draw the line in
> terms of ISA compatibility? One could easily argue that the Right
> Thing[TM] is to be able to process any optional instruction -- otherwise
> one has a very difficult place to draw a line.
>
> Consider SSE3, for example. Why should the same concept not apply to
> SSE3 instructions as to CMOV?
FWIW, the issue of the binary-only flashplayer.so came up later in the thread,
but to add my few cents:
When flash 10 was released the binary only 64-bit version generated
instructions from the LAHF set unconditionally, in part because Windows chose
to emulate those on the very few x86-64 platforms that didn't do them in
hardware.
At that time it would have been very nice from a "user support" point of view
to be able to add LAHF emulation to support the software. Yes, it is ugly,
binary-only code, but it is reasonably popular...
Daniel
...in the end, in fact, popular enough to have at least a couple of people
I know purchase a new CPU that did implement it, just for flash on Linux.
--
✣ Daniel Pittman ✉ [email protected] ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons
On 11/11/2009 04:51 PM, Daniel Pittman wrote:
>>
>> Consider SSE3, for example. Why should the same concept not apply to
>> SSE3 instructions as to CMOV?
>
> FWIW, the issue of the binary-only flashplayer.so came up later in the thread,
> but to add my few cents:
>
> When flash 10 was released the binary only 64-bit version generated
> instructions from the LAHF set unconditionally, in part because Windows chose
> to emulate those on the very few x86-64 platforms that didn't do them in
> hardware.
>
> At that time it would have been very nice from a "user support" point of view
> to be able to add LAHF emulation to support the software. Yes, it is ugly,
> binary-only code, but it is reasonably popular...
>
> ...in the end, in fact, popular enough to have at least a couple of people
> I know purchase a new CPU that did implement it, just for flash on Linux.
The main use case for emulation is indeed to support binary-only or
otherwise precompiled software that exposes holes in the instruction
set. As such, emulation can also be used to "raise the baseline", which
can be a highly desirable thing to do.
My point in all of this is that this is not a static problem, and that
if we're going to do emulation we need to consider the requirements
going forward. I would *prefer* to have only one interpreter to deal
with when it's broken, and I certainly trust Avi & co to do the right
thing, but I'm certainly willing to entertain technical reasons why it
is not the right thing to do -- *not just now but in the future*. The
latter is an absolutely critical constraint, though.
Once we have a general enough interpreter framework, we can add new
instructions as needed; it should make it a lot easier to phase in new
instructions while not breaking old legacy machines.
-hpa
On Wed, Nov 11, 2009 at 1:32 AM, Willy Tarreau <[email protected]> wrote:
> All I can say is that executing a NOP results in no state change in
> the processor except the instruction pointer which points to the
> next instruction after execution. Since a NOP changes nothing, it
> cannot be used alone to provide any privilege, access to data or
> any such thing. Since it does not perform any jump, it cannot either
> be used to take back control of the execution flow. And it is certain
> that the next instruction after it will be executed, so if the NOP
> crosses a page boundary and completes on a non-executable one, the
> next instruction will trigger the PF.
>
> So I can't see how a NOP can be used to circumvent any protection.
So a nop(l) sled won't be a problem, right?
On Wed, Nov 11, 2009 at 06:23:14PM -0800, Matt Thrailkill wrote:
> On Wed, Nov 11, 2009 at 1:32 AM, Willy Tarreau <[email protected]> wrote:
> > All I can say is that executing a NOP results in no state change in
> > the processor except the instruction pointer which points to the
> > next instruction after execution. Since a NOP changes nothing, it
> > cannot be used alone to provide any privilege, access to data or
> > any such thing. Since it does not perform any jump, it cannot either
> > be used to take back control of the execution flow. And it is certain
> > that the next instruction after it will be executed, so if the NOP
> > crosses a page boundary and completes on a non-executable one, the
> > next instruction will trigger the PF.
> >
> > So I can't see how a NOP can be used to circumvent any protection.
>
> So a nop(l) sled won't be a problem, right?
Right. However we just noticed that with the KVM emulator, you
can make it loop for a long time if you feed it with prefixes
only. For instance, write a function which does zillions of 0x66
(data size prefix) then return (0xC3) : 66 66 66 ... 66 C3.
This is typically the sort of things we must be very careful about
in emulators, because we don't want users to steal large amounts
of system CPU time doing nothing.
Regards,
Willy
On 11/11/2009 09:27 PM, Willy Tarreau wrote:
>
> Right. However we just noticed that with the KVM emulator, you
> can make it loop for a long time if you feed it with prefixes
> only. For instance, write a function which does zillions of 0x66
> (data size prefix) then return (0xC3) : 66 66 66 ... 66 C3.
>
> This is typically the sort of things we must be very careful about
> in emulators, because we don't want users to steal large amounts
> of system CPU time doing nothing.
>
That is a (serious) bug in the KVM interpreter, and indeed the exact
kind of issues interpreters tend to have... which is why I'd like one
piece of code with one set of bugs, and more eyeballs on that one piece
of code so they can be fixed.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Wed, Nov 11, 2009 at 09:31:26PM -0800, H. Peter Anvin wrote:
> On 11/11/2009 09:27 PM, Willy Tarreau wrote:
> >
> > Right. However we just noticed that with the KVM emulator, you
> > can make it loop for a long time if you feed it with prefixes
> > only. For instance, write a function which does zillions of 0x66
> > (data size prefix) then return (0xC3) : 66 66 66 ... 66 C3.
> >
> > This is typically the sort of things we must be very careful about
> > in emulators, because we don't want users to steal large amounts
> > of system CPU time doing nothing.
> >
>
> That is a (serious) bug in the KVM interpreter, and indeed the exact
> kind of issues interpreters tend to have... which is why I'd like one
> piece of code with one set of bugs, and more eyeballs on that one piece
> of code so they can be fixed.
Well, I could try to work on a fix (basically the same principle as in
mine, with prefix flags), but I simply don't know how to test the code.
I've never experimented with KVM yet and learned it embeds an emulator
for the first time a few days ago in this thread :-/ If it's easy to
make use of it, I'm not opposed to try.
Willy
Hi!
> > > The Geode NX (which no one has mentioned yet) is an Athlon derived chip.
> >
> > *As far as I know* K6 didn't have NOPL, whereas K7 does.
>
> Which would be mroe indication that the Geode LX may in fact be a K6
> based CPU design. The Geode NX is very clearly an Athlon (K7) and AMD
> has never even tried to hide that. They have never said what the LX is
> based on, other than it isn't based on the GX (fortunately, that thing
> was a buggy piece of crap).
Interesting...geode gx being buggy? I know pentium was famous for the
fdiv bug, but never heard of geode gx problems...
Are they something kernel can work around, or do those chips just have
to be avoided? Anything malicious user can use to crash the system?
Anything malicious user can use to gain extra priviledge?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Thu, 12 Nov 2009 13:18:05 +0100
Pavel Machek <[email protected]> wrote:
> Hi!
>
> > > > The Geode NX (which no one has mentioned yet) is an Athlon
> > > > derived chip.
> > >
> > > *As far as I know* K6 didn't have NOPL, whereas K7 does.
> >
> > Which would be mroe indication that the Geode LX may in fact be a K6
> > based CPU design. The Geode NX is very clearly an Athlon (K7) and
> > AMD has never even tried to hide that. They have never said what
> > the LX is based on, other than it isn't based on the GX
> > (fortunately, that thing was a buggy piece of crap).
>
> Interesting...geode gx being buggy? I know pentium was famous for the
> fdiv bug, but never heard of geode gx problems...
http://www.amd.com/files/connectivitysolutions/geode/geode_gx/31533E_gx_2.1_specupdate.pdf
1.4 is my favorite. The branch prediction logic can send the
instruction pointer off to la-la land.
While at olpc, we had a sad discussion about what the *real* impact of
the workaround was...
> Which would be mroe indication that the Geode LX may in fact be a K6
> based CPU design. The Geode NX is very clearly an Athlon (K7) and AMD
> has never even tried to hide that. They have never said what the LX is
> based on, other than it isn't based on the GX (fortunately, that thing
> was a buggy piece of crap).
Well, do you know the history of the Geode line? It originated in the Cyrix MediaGX, then that was sold to NSC and renamed to Geode, then that was sold to AMD later.
Yuhong Bao
_________________________________________________________________
Hotmail: Powerful Free email with security by Microsoft.
http://clk.atdmt.com/GBL/go/171222986/direct/01/-
> Interesting...geode gx being buggy? I know pentium was famous for the
> fdiv bug, but never heard of geode gx problems...
Depends which one - the older ones were very solid. The very first (5510)
had some interesting bugs but they were fairly soon stomped.
Alan
Lennart Sorensen wrote:
> Which would be mroe indication that the Geode LX may in fact be a K6
> based CPU design. The Geode NX is very clearly an Athlon (K7) and AMD
> has never even tried to hide that. They have never said what the LX is
> based on, other than it isn't based on the GX (fortunately, that thing
> was a buggy piece of crap).
When looking this up for http://www.pixelbeat.org/scripts/gcccpuopt
I determined that gcc --march=geode is for the LX which is
K6 + 3dnowext. Note gcc has no way currently of explicitly selecting 3dnowext.
cheers,
P?draig.
On Thu, Nov 12, 2009 at 01:18:05PM +0100, Pavel Machek wrote:
> Interesting...geode gx being buggy? I know pentium was famous for the
> fdiv bug, but never heard of geode gx problems...
>
> Are they something kernel can work around, or do those chips just have
> to be avoided? Anything malicious user can use to crash the system?
> Anything malicious user can use to gain extra priviledge?
Well the one problem I have encountered is in the jsm driver (exar pci
uart driver).
On a typical PC with a P3, P4, Core 2, athlong, etc, it works perfectly.
It works perfectly on a Geode LX as well. On a Geode SC1200 (which
is a GX with companion chip in one) it transmits out of order in a
consistent way.
If you transmit a chunk of data that looks like this:
0123456789abcdef
The data that comes out of the UART is:
123056749ab8defc
The fix to avoid the problem is:
- memcpy_toio(&ch->ch_neo_uart->txrxburst, ch->ch_wqueue + tail, s);
+ for(i=0;i<s;i++) {
+ memcpy_toio(&(ch->ch_neo_uart->txrxburst[i]), ch->ch_wqueue + tail + i, 1);
+ }
Calling memcpy_toio on the Geode SC1200 on anything larger than 1 byte
causes the data to be delivered (at least on the PCI bus) reordered in
that very consistent manner.
The alignment doesn't affect it (I tested that). Simply doing more than
1 byte at a time with memcpy_toio to the pci device messes up.
So as a result we run with this patch applied even though it certainly
is less efficient, but at least it works.
Every other CPU works fine that I have ever tried the driver on.
--
Len Sorensen
On Thu, Nov 12, 2009 at 09:55:12PM -0800, Yuhong Bao wrote:
> Well, do you know the history of the Geode line? It originated in the Cyrix MediaGX, then that was sold to NSC and renamed to Geode, then that was sold to AMD later.
That is the history of the Geode GX line yes. The Geode LX is an AMD
design, and the Geode NX is an athlon with a lower voltage. Hence the
LX probably isn't anything like the original GX. It certainly has none
of the bugs or performance issues of the GX.
--
Len Sorensen
On Fri, Nov 13, 2009 at 01:33:54PM +0000, Pádraig Brady wrote:
> Lennart Sorensen wrote:
> > Which would be mroe indication that the Geode LX may in fact be a K6
> > based CPU design. The Geode NX is very clearly an Athlon (K7) and AMD
> > has never even tried to hide that. They have never said what the LX is
> > based on, other than it isn't based on the GX (fortunately, that thing
> > was a buggy piece of crap).
>
> When looking this up for http://www.pixelbeat.org/scripts/gcccpuopt
> I determined that gcc --march=geode is for the LX which is
> K6 + 3dnowext. Note gcc has no way currently of explicitly selecting 3dnowext.
If that is the case then geode is an awful name choice given the GX,
LX and NX all have vastly different features and requirements, yet are
all geodes.
--
Len Sorensen
O> Calling memcpy_toio on the Geode SC1200 on anything larger than 1 byte
> causes the data to be delivered (at least on the PCI bus) reordered in
> that very consistent manner.
Sounds like you or your firmware have the caching and write combining
misconfigured.
>
> The alignment doesn't affect it (I tested that). Simply doing more than
> 1 byte at a time with memcpy_toio to the pci device messes up.
Which usually means some form of write gathering is enabled or something
thinks the device is write combining on the PCI bus. What does the PCI
bus and the RCRR MTRR set look like ?
On Fri, Nov 13, 2009 at 04:57:30PM +0000, Alan Cox wrote:
> O> Calling memcpy_toio on the Geode SC1200 on anything larger than 1 byte
> > causes the data to be delivered (at least on the PCI bus) reordered in
> > that very consistent manner.
>
> Sounds like you or your firmware have the caching and write combining
> misconfigured.
Well I can't wouch for the firmware, and unfortunately without knowing
what address the firmware placed the configuration registers at there
is no way for me to check the settings.
> > The alignment doesn't affect it (I tested that). Simply doing more than
> > 1 byte at a time with memcpy_toio to the pci device messes up.
>
> Which usually means some form of write gathering is enabled or something
> thinks the device is write combining on the PCI bus. What does the PCI
> bus and the RCRR MTRR set look like ?
Well there is no /proc/mtrr.
Here is lspci -vvx output. It is device 01:08.
00:00.0 Host bridge: Cyrix Corporation PCI Master
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 0
00: 78 10 01 00 07 00 80 22 00 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. RT8139
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at f800 [size=256]
Region 1: Memory at e0000000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: ec 10 39 81 03 00 90 02 10 00 00 02 00 40 00 00
10: 01 f8 00 00 00 00 00 e0 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ec 10 39 81
30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 20 40
00:11.0 PCI bridge: Hint Corp Unknown device 0031 (rev 01) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64, Cache Line Size: 32 bytes
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: e0100000-e01fffff
Prefetchable memory behind bridge: 00000000c0000000-00000000c03fffff
Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] Vital Product Data
00: 88 33 31 00 07 00 b0 02 01 00 04 06 08 40 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 d1 d1 a0 02
20: 10 e0 10 e0 01 c0 31 c0 00 00 00 00 00 00 00 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 ff 00 00 00
00:12.0 ISA bridge: National Semiconductor Corporation SCx200 Bridge
Subsystem: National Semiconductor Corporation SCx200 Bridge
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 32 bytes
Region 0: I/O ports at 6400 [size=64]
Region 1: I/O ports at 6600 [size=64]
00: 0b 10 00 05 0f 00 80 02 00 00 01 06 08 00 80 00
10: 01 64 00 00 01 66 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 0b 10 00 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00:12.1 Bridge: National Semiconductor Corporation SCx200 SMI
Subsystem: National Semiconductor Corporation SCx200 SMI
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Region 0: I/O ports at 6000 [size=256]
00: 0b 10 01 05 01 00 80 02 00 00 80 06 00 00 00 00
10: 01 60 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 0b 10 01 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00:12.2 IDE interface: National Semiconductor Corporation SCx200 IDE (rev 01) (prog-if 80 [Master])
Subsystem: National Semiconductor Corporation SCx200 IDE
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [disabled] [size=8]
Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) [disabled] [size=1]
Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [disabled] [size=8]
Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) [disabled] [size=1]
Region 4: I/O ports at fc00 [size=16]
00: 0b 10 02 05 05 00 80 02 01 80 01 01 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 fc 00 00 00 00 00 00 00 00 00 00 0b 10 02 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00:12.3 Multimedia audio controller: National Semiconductor Corporation SCx200 Audio
Subsystem: National Semiconductor Corporation SCx200 Audio
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: Memory at 40011000 (32-bit, non-prefetchable) [size=4K]
00: 0b 10 03 05 06 00 80 02 00 00 01 04 00 00 00 00
10: 00 10 01 40 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 0b 10 03 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00:12.4 VGA compatible controller: National Semiconductor Corporation SCx200 Video (rev 01) (prog-if 00 [VGA])
Subsystem: National Semiconductor Corporation SCx200 Video
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin C routed to IRQ 0
Region 0: Memory at 40800000 (32-bit, non-prefetchable) [size=4K]
Region 1: Memory at 40010000 (32-bit, non-prefetchable) [size=4K]
Region 2: Memory at e0004000 (32-bit, non-prefetchable) [size=4K]
00: 0b 10 04 05 02 00 80 02 01 00 00 03 00 00 00 00
10: 00 00 80 40 00 00 01 40 00 40 00 e0 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 0b 10 04 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 fe 03 00 00
00:12.5 Bridge: National Semiconductor Corporation SCx200 XBus
Subsystem: National Semiconductor Corporation SCx200 XBus
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Region 0: I/O ports at 6200 [size=64]
00: 0b 10 05 05 03 00 80 02 00 00 80 06 00 00 00 00
10: 01 62 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 0b 10 05 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00:13.0 USB Controller: Compaq Computer Corporation ZFMicro Chipset USB (rev 08) (prog-if 10 [OHCI])
Subsystem: Compaq Computer Corporation ZFMicro Chipset USB
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0 (20000ns max), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 11
Region 0: Memory at e0005000 (32-bit, non-prefetchable) [size=4K]
00: 11 0e f8 a0 07 00 80 02 08 10 03 0c 08 00 00 00
10: 00 50 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 11 0e f8 a0
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 01 00 50
01:04.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 36)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (6000ns min, 6000ns max)
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at dc00 [size=32]
Region 1: Memory at e0100000 (32-bit, non-prefetchable) [size=32]
Expansion ROM at c0000000 [disabled] [size=1M]
Capabilities: [40] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 22 10 00 20 07 00 90 02 36 00 00 02 00 40 00 00
10: 01 dc 00 00 00 00 10 e0 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 ff 40 00 00 00 00 00 00 00 0b 01 18 18
01:05.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 36)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (6000ns min, 6000ns max)
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at d800 [size=32]
Region 1: Memory at e0100020 (32-bit, non-prefetchable) [size=32]
Expansion ROM at c0100000 [disabled] [size=1M]
Capabilities: [40] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 22 10 00 20 07 00 90 02 36 00 00 02 00 40 00 00
10: 01 d8 00 00 20 00 10 e0 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 ff 40 00 00 00 00 00 00 00 0a 01 18 18
01:06.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 36)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at d400 [size=32]
Region 1: Memory at e0100040 (32-bit, non-prefetchable) [size=32]
Expansion ROM at c0200000 [disabled] [size=1M]
Capabilities: [40] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 22 10 00 20 03 00 90 02 36 00 00 02 00 40 00 00
10: 01 d4 00 00 40 00 10 e0 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 ff 40 00 00 00 00 00 00 00 0b 01 18 18
01:07.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 36)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at d000 [size=32]
Region 1: Memory at e0100060 (32-bit, non-prefetchable) [size=32]
Expansion ROM at c0300000 [disabled] [size=1M]
Capabilities: [40] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 22 10 00 20 03 00 90 02 36 00 00 02 00 40 00 00
10: 01 d0 00 00 60 00 10 e0 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 ff 40 00 00 00 00 00 00 00 0a 01 18 18
01:08.0 Serial controller: Unknown device 5243:0001 (rev 04) (prog-if 02 [16550])
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 11
Region 0: Memory at e0100800 (32-bit, non-prefetchable) [size=2K]
00: 43 52 01 00 02 00 80 00 04 02 00 07 00 00 00 00
10: 00 08 10 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 01 00 00
01:09.0 Network controller: Sangoma Technologies Corp. A104u Quad T1/E1 AFT
Subsystem: Unknown device a013:1400
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (1250ns min, 3750ns max)
Interrupt: pin A routed to IRQ 10
Region 0: Memory at e0110000 (32-bit, non-prefetchable) [size=64K]
00: 23 19 00 04 06 00 00 02 00 00 80 02 00 40 00 00
10: 00 00 11 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 13 a0 00 14
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 05 0f
Anything else I can grab that might give a clue?
--
Len Sorensen
> > Which usually means some form of write gathering is enabled or something
> > thinks the device is write combining on the PCI bus. What does the PCI
> > bus and the RCRR MTRR set look like ?
>
> Well there is no /proc/mtrr.
Linux doesn't have an RCRR driver - so you need to dump the MSRs directly
(see the Geode GX manual MSR list) and read out the msrs.
On Fri, Nov 13, 2009 at 09:21:48PM +0000, Alan Cox wrote:
> > > Which usually means some form of write gathering is enabled or something
> > > thinks the device is write combining on the PCI bus. What does the PCI
> > > bus and the RCRR MTRR set look like ?
> >
> > Well there is no /proc/mtrr.
>
> Linux doesn't have an RCRR driver - so you need to dump the MSRs directly
> (see the Geode GX manual MSR list) and read out the msrs.
Hmm, so looking at the Geode SC1200 data book, it doesn't mention mtrr,
rcrr or msr. I know the Geode LX has lots of MSR documentation.
Looking at the GX1 data book (which it seems the SC1200 refers to),
it does mention mtrr, as not supported. I have managed to find something
called the region configuration range registers, which must be RCRR.
I think this is a dump of the 8 region MSRs:
0x00001810->0xbf848ba8b7f282e8
0x00001811->0xbfcb5328b7f932e8
0x00001812->0xbfe93418b7f712e8
0x00001813->0xbfbd59b8b7fb52e8
0x00001814->0xbfb5f868b7f3d2e8
0x00001815->0xbf815018b7ef42e8
0x00001816->0xbfc4c0d8b7f2a2e8
0x00001817->0xbfe1eea8b7eff2e8
This must of course mean I read them wrong because those are not the
right values. Some bit ranges must be 0 and are not.
Trying rdmsr from msr-tools 1.2 gives me:
# ./rdmsr -x 0x00001810
rdmsr: CPU 0 cannot read MSR 0x00001810
Hmm, now what?
--
Len Sorensen
> Trying rdmsr from msr-tools 1.2 gives me:
>
> # ./rdmsr -x 0x00001810
> rdmsr: CPU 0 cannot read MSR 0x00001810
>
> Hmm, now what?
Beats me - I thought the region registers were read/write.
On Tue, Nov 17, 2009 at 11:59:29AM +0000, Alan Cox wrote:
> > Trying rdmsr from msr-tools 1.2 gives me:
> >
> > # ./rdmsr -x 0x00001810
> > rdmsr: CPU 0 cannot read MSR 0x00001810
> >
> > Hmm, now what?
>
> Beats me - I thought the region registers were read/write.
Actually it seems I can't read any MSRs on it.
Using the same kernel, same msr module, same rdmsr on a Geode LX system
works fine. I can read MSRs no problem. Seems the msr module isn't
working on the Geode SC1200 (and hence possibly the GX1).
--
Len Sorensen
On 11/17/2009 06:34 AM, Lennart Sorensen wrote:
> On Tue, Nov 17, 2009 at 11:59:29AM +0000, Alan Cox wrote:
>>> Trying rdmsr from msr-tools 1.2 gives me:
>>>
>>> # ./rdmsr -x 0x00001810
>>> rdmsr: CPU 0 cannot read MSR 0x00001810
>>>
>>> Hmm, now what?
>>
>> Beats me - I thought the region registers were read/write.
>
> Actually it seems I can't read any MSRs on it.
>
> Using the same kernel, same msr module, same rdmsr on a Geode LX system
> works fine. I can read MSRs no problem. Seems the msr module isn't
> working on the Geode SC1200 (and hence possibly the GX1).
>
That would seem highly unlikely ... unless the MSR flag isn't exposed in
CPUID.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Tue, 17 Nov 2009 11:59:29 GMT, Alan Cox said:
> > Trying rdmsr from msr-tools 1.2 gives me:
> >
> > # ./rdmsr -x 0x00001810
> > rdmsr: CPU 0 cannot read MSR 0x00001810
> >
> > Hmm, now what?
>
> Beats me - I thought the region registers were read/write.
Umm... 'modprobe msr'?
(Recently I had what *looked* like an xorg server bug, turned out to be a
missing CONFIG_INPUT_EVDEV.. Whoops. ;)
On Tue, Nov 17, 2009 at 08:43:45AM -0800, H. Peter Anvin wrote:
> On 11/17/2009 06:34 AM, Lennart Sorensen wrote:
> > On Tue, Nov 17, 2009 at 11:59:29AM +0000, Alan Cox wrote:
> >>> Trying rdmsr from msr-tools 1.2 gives me:
> >>>
> >>> # ./rdmsr -x 0x00001810
> >>> rdmsr: CPU 0 cannot read MSR 0x00001810
> >>>
> >>> Hmm, now what?
> >>
> >> Beats me - I thought the region registers were read/write.
> >
> > Actually it seems I can't read any MSRs on it.
> >
> > Using the same kernel, same msr module, same rdmsr on a Geode LX system
> > works fine. I can read MSRs no problem. Seems the msr module isn't
> > working on the Geode SC1200 (and hence possibly the GX1).
> >
>
> That would seem highly unlikely ... unless the MSR flag isn't exposed in
> CPUID.
That would be odd yes.
# cat /proc/cpuinfo
processor : 0
vendor_id : CyrixInstead
cpu family : 5
model : 9
model name : Geode(TM) Integrated Processor by National Semi
stepping : 1
cpu MHz : 266.612
cache size : 16 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu tsc msr cx8 cmov mmx cxmmx
bogomips : 534.41
clflush size : 32
power management:
That part looks OK. So why is reading MSRs not working on this stupid
CPU at any of the addresses the data book says should work.
--
Len Sorensen
On Tue, Nov 17, 2009 at 11:48:47AM -0500, [email protected] wrote:
> Umm... 'modprobe msr'?
>
> (Recently I had what *looked* like an xorg server bug, turned out to be a
> missing CONFIG_INPUT_EVDEV.. Whoops. ;)
I did. Makes no difference. If I unload it I get the same error.
# modinfo msr
filename: /lib/modules/2.6.26-2-gx1/kernel/arch/x86/kernel/msr.ko
author: H. Peter Anvin <[email protected]>
description: x86 generic MSR driver
license: GPL
vermagic: 2.6.26-2-gx1 mod_unload GEODEGX1
depends:
# lsmod |grep msr
msr 2692 0
I don't get it.
--
Len Sorensen
On 11/17/2009 09:25 AM, Lennart Sorensen wrote:
> On Tue, Nov 17, 2009 at 11:48:47AM -0500, [email protected] wrote:
>> Umm... 'modprobe msr'?
>>
>> (Recently I had what *looked* like an xorg server bug, turned out to be a
>> missing CONFIG_INPUT_EVDEV.. Whoops. ;)
>
> I did. Makes no difference. If I unload it I get the same error.
>
> # modinfo msr
> filename: /lib/modules/2.6.26-2-gx1/kernel/arch/x86/kernel/msr.ko
> author: H. Peter Anvin <[email protected]>
> description: x86 generic MSR driver
> license: GPL
> vermagic: 2.6.26-2-gx1 mod_unload GEODEGX1
> depends:
>
> # lsmod |grep msr
> msr 2692 0
>
> I don't get it.
>
Does /dev/cpu/*/msr exist?
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Tue, Nov 17, 2009 at 09:33:02AM -0800, H. Peter Anvin wrote:
> On 11/17/2009 09:25 AM, Lennart Sorensen wrote:
> > On Tue, Nov 17, 2009 at 11:48:47AM -0500, [email protected] wrote:
> >> Umm... 'modprobe msr'?
> >>
> >> (Recently I had what *looked* like an xorg server bug, turned out to be a
> >> missing CONFIG_INPUT_EVDEV.. Whoops. ;)
> >
> > I did. Makes no difference. If I unload it I get the same error.
> >
> > # modinfo msr
> > filename: /lib/modules/2.6.26-2-gx1/kernel/arch/x86/kernel/msr.ko
> > author: H. Peter Anvin <[email protected]>
> > description: x86 generic MSR driver
> > license: GPL
> > vermagic: 2.6.26-2-gx1 mod_unload GEODEGX1
> > depends:
> >
> > # lsmod |grep msr
> > msr 2692 0
> >
> > I don't get it.
> >
>
> Does /dev/cpu/*/msr exist?
I ran MAKEDEV cpu in /dev, but to make sure...
# ls -l /dev/cpu/*/msr
crw------- 1 root root 202, 0 Nov 16 12:28 /dev/cpu/0/msr
crw------- 1 root root 202, 1 Nov 16 12:28 /dev/cpu/1/msr
crw------- 1 root root 202, 2 Nov 16 12:28 /dev/cpu/2/msr
crw------- 1 root root 202, 3 Nov 16 12:28 /dev/cpu/3/msr
Looks OK too.
# strace ./rdmsr -x 0x00001810
execve("./rdmsr", ["./rdmsr", "-x", "0x00001810"], [/* 13 vars */]) = 0
uname({sys="Linux", node="ruggedrouter", ...}) = 0
brk(0) = 0x8106000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f23000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f22000
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=14975, ...}) = 0
mmap2(NULL, 14975, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f1e000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/tls/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240O\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=1245488, ...}) = 0
mmap2(NULL, 1251484, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7dec000
mmap2(0xb7f14000, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x128) = 0xb7f14000
mmap2(0xb7f1b000, 10396, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f1b000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7deb000
mprotect(0xb7f14000, 20480, PROT_READ) = 0
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7deb6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb7f1e000, 14975) = 0
open("/dev/cpu/0/msr", O_RDONLY|O_LARGEFILE) = 3
pread64(3, 0xbf83ac00, 8, 6160) = -1 EIO (Input/output error)
write(2, "rdmsr: CPU 0 cannot read MSR 0x0"..., 40rdmsr: CPU 0 cannot read MSR 0x00001810
) = 40
exit_group(4) = ?
Process 15044 detached
--
Len Sorensen
On Tue, Nov 17, 2009 at 01:33:57PM -0500, Lennart Sorensen wrote:
> # strace ./rdmsr -x 0x00001810
> execve("./rdmsr", ["./rdmsr", "-x", "0x00001810"], [/* 13 vars */]) = 0
> uname({sys="Linux", node="ruggedrouter", ...}) = 0
> brk(0) = 0x8106000
> access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f23000
> access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f22000
> open("/etc/ld.so.cache", O_RDONLY) = 3
> fstat64(3, {st_mode=S_IFREG|0644, st_size=14975, ...}) = 0
> mmap2(NULL, 14975, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f1e000
> close(3) = 0
> access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
> open("/lib/tls/libc.so.6", O_RDONLY) = 3
> read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240O\1"..., 512) = 512
> fstat64(3, {st_mode=S_IFREG|0644, st_size=1245488, ...}) = 0
> mmap2(NULL, 1251484, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7dec000
> mmap2(0xb7f14000, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x128) = 0xb7f14000
> mmap2(0xb7f1b000, 10396, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f1b000
> close(3) = 0
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7deb000
> mprotect(0xb7f14000, 20480, PROT_READ) = 0
> set_thread_area({entry_number:-1 -> 6, base_addr:0xb7deb6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
> munmap(0xb7f1e000, 14975) = 0
> open("/dev/cpu/0/msr", O_RDONLY|O_LARGEFILE) = 3
> pread64(3, 0xbf83ac00, 8, 6160) = -1 EIO (Input/output error)
> write(2, "rdmsr: CPU 0 cannot read MSR 0x0"..., 40rdmsr: CPU 0 cannot read MSR 0x00001810
> ) = 40
> exit_group(4) = ?
> Process 15044 detached
OK, I added a bit of debuging print statements to msr.c and I find that
calling rdmsr 0x1810 returns error 0xfffffff2 (-14 that is).
It seems to end up calling:
static inline unsigned long long native_read_msr_safe(unsigned int msr,
int *err)
{
DECLARE_ARGS(val, low, high);
asm volatile("2: rdmsr ; xor %0,%0\n"
"1:\n\t"
".section .fixup,\"ax\"\n\t"
"3: mov %3,%0 ; jmp 1b\n\t"
".previous\n\t"
_ASM_EXTABLE(2b, 3b)
: "=r" (*err), EAX_EDX_RET(val, low, high)
: "c" (msr), "i" (-EFAULT));
return EAX_EDX_VAL(val, low, high);
}
That unfortunately is too low level for me to make sense of.
--
Len Sorensen
On 11/18/2009 12:21 PM, Lennart Sorensen wrote:
>
> OK, I added a bit of debuging print statements to msr.c and I find that
> calling rdmsr 0x1810 returns error 0xfffffff2 (-14 that is).
>
What this means is the RDMSR instruction traps -- the CPU doesn't
recognize this as a valid MSR.
-hpa
On Wed, Nov 18, 2009 at 12:59:36PM -0800, H. Peter Anvin wrote:
> On 11/18/2009 12:21 PM, Lennart Sorensen wrote:
> >
> > OK, I added a bit of debuging print statements to msr.c and I find that
> > calling rdmsr 0x1810 returns error 0xfffffff2 (-14 that is).
> >
>
> What this means is the RDMSR instruction traps -- the CPU doesn't
> recognize this as a valid MSR.
Strange how it says that for every value I have tried so far. Makes me
wonder if there is a register somewhere that disables MSR support.
I did find a lockout value but I can't read that one either (which says
it is always readable).
--
Len Sorensen
On Wed, Nov 18, 2009 at 04:11:22PM -0500, Lennart Sorensen wrote:
> Strange how it says that for every value I have tried so far. Makes me
> wonder if there is a register somewhere that disables MSR support.
> I did find a lockout value but I can't read that one either (which says
> it is always readable).
I tried doing rdmsr on 0x1810 from grub, and it faults too. cpuid says
MSRs are supported, I just can't find any that work.
--
Len Sorensen
Willy Tarreau <[email protected]> writes:
> On Wed, Nov 11, 2009 at 09:31:26PM -0800, H. Peter Anvin wrote:
>> On 11/11/2009 09:27 PM, Willy Tarreau wrote:
>> >
>> > Right. However we just noticed that with the KVM emulator, you
>> > can make it loop for a long time if you feed it with prefixes
>> > only. For instance, write a function which does zillions of 0x66
>> > (data size prefix) then return (0xC3) : 66 66 66 ... 66 C3.
>> >
>> > This is typically the sort of things we must be very careful about
>> > in emulators, because we don't want users to steal large amounts
>> > of system CPU time doing nothing.
>> >
>>
>> That is a (serious) bug in the KVM interpreter, and indeed the exact
>> kind of issues interpreters tend to have... which is why I'd like one
>> piece of code with one set of bugs, and more eyeballs on that one piece
>> of code so they can be fixed.
>
> Well, I could try to work on a fix (basically the same principle as in
> mine, with prefix flags), but I simply don't know how to test the code.
> I've never experimented with KVM yet and learned it embeds an emulator
> for the first time a few days ago in this thread :-/ If it's easy to
> make use of it, I'm not opposed to try.
When working on dosemu and emulating EGA 16 color graphics we had to
unmap the frame buffer so we would cause move instructions to fault.
Trapping for each mov instruction in the loops that wrote to the frame
buffer was unusably slow. Ultimately that was fixed by trapping on
the first instruction and then running in the emulator until we had
gone N instructions without hitting an instruction we would trap for.
The result was usable software emulated EGA graphics.
I expect the same logic will apply any time there is a trapped and
emulated instruction in an inner loop. Emulating the entire loop
will be more efficient than trapping for each loop iteration.
Eric
On 11/23/2009 11:27 AM, Eric W. Biederman wrote:
>
> When working on dosemu and emulating EGA 16 color graphics we had to
> unmap the frame buffer so we would cause move instructions to fault.
> Trapping for each mov instruction in the loops that wrote to the frame
> buffer was unusably slow. Ultimately that was fixed by trapping on
> the first instruction and then running in the emulator until we had
> gone N instructions without hitting an instruction we would trap for.
> The result was usable software emulated EGA graphics.
>
> I expect the same logic will apply any time there is a trapped and
> emulated instruction in an inner loop. Emulating the entire loop
> will be more efficient than trapping for each loop iteration.
>
Yes, this is pretty typical. In terms of EGA/VGA it depends heavily on
how the application is coded, since it is possible to put EGA/VGA into
modes where the frame buffer depends mostly like memory except at
specific I/O points, and other modes where the frame buffer behaves
nothing like memory at all and every reference needs to be handled
specially. In real use, the former tends to dominate simply because
it's the sane way to code, but the only way to make the latter perform
sanely at all is to interpret everything.
-hpa
"H. Peter Anvin" <[email protected]> writes:
> On 11/23/2009 11:27 AM, Eric W. Biederman wrote:
>>
>> When working on dosemu and emulating EGA 16 color graphics we had to
>> unmap the frame buffer so we would cause move instructions to fault.
>> Trapping for each mov instruction in the loops that wrote to the frame
>> buffer was unusably slow. Ultimately that was fixed by trapping on
>> the first instruction and then running in the emulator until we had
>> gone N instructions without hitting an instruction we would trap for.
>> The result was usable software emulated EGA graphics.
>>
>> I expect the same logic will apply any time there is a trapped and
>> emulated instruction in an inner loop. Emulating the entire loop
>> will be more efficient than trapping for each loop iteration.
>>
>
> Yes, this is pretty typical. In terms of EGA/VGA it depends heavily on
> how the application is coded, since it is possible to put EGA/VGA into
> modes where the frame buffer depends mostly like memory except at
> specific I/O points, and other modes where the frame buffer behaves
> nothing like memory at all and every reference needs to be handled
> specially. In real use, the former tends to dominate simply because
> it's the sane way to code, but the only way to make the latter perform
> sanely at all is to interpret everything.
For old applications that we were concerned about in dosemu the masked
modes where you write to multiple frame buffer pages at once with a
single write dominated because on real hardware it is faster.
As I recall if the things were setup so we did not need to trap dosemu
mapped the appropriate frame buffer page directly, so we did not need
to handle that case.
Regardless the point is that if we start emulating rare instructions
with traps I expect we will reach the point where we have inner loops
that we want to emulate entirely instead of taking a multitude of traps.
Eric