Aug 13 06:47:48 middle -- MARK --
Aug 13 06:53:03 middle kernel: printing eip:
Aug 13 06:53:03 middle kernel: c016c14a
Aug 13 06:53:03 middle kernel: Oops: 0000 [#1]
Aug 13 06:53:03 middle kernel: PREEMPT
Aug 13 06:53:03 middle kernel: CPU: 0
Aug 13 06:53:03 middle kernel: EIP: 0060:[<c016c14a>] Not tainted VLI
Aug 13 06:53:03 middle kernel: EFLAGS: 00010286
Aug 13 06:53:03 middle kernel: EIP is at find_inode_fast+0x1a/0x60
Aug 13 06:53:03 middle kernel: eax: f7b7e000 ebx: 000d5ff4 ecx: e68e9a48 edx: 00000000
Aug 13 06:53:03 middle kernel: esi: f7b7e000 edi: c1a50d80 ebp: f2f41e14 esp: f2f41e04
Aug 13 06:53:03 middle kernel: ds: 007b es: 007b ss: 0068
Aug 13 06:53:03 middle kernel: Process make (pid: 9500, threadinfo=f2f40000 task=eb0966a0)
Aug 13 06:53:03 middle kernel: Stack: f0c05cc0 f2f40000 f0271cc0 000d5ff4 f2f41e38 c016c7c0 f7b7e000 c1a50d80
Aug 13 06:53:03 middle kernel: 000d5ff4 c1a50d80 000d5ff4 f0271cc0 f7b7e000 f2f41e58 c018fc92 f7b7e000
Aug 13 06:53:03 middle kernel: 000d5ff4 c3600234 fffffff4 dddd2a74 dddd2a08 f2f41e7c c0160b10 dddd2a08
Aug 13 06:53:03 middle kernel: Call Trace:
Aug 13 06:53:03 middle kernel: [<c016c7c0>] iget_locked+0x50/0xc0
Aug 13 06:53:03 middle kernel: [<c018fc92>] ext3_lookup+0x62/0xd0
Aug 13 06:53:03 middle kernel: [<c0160b10>] real_lookup+0xc0/0xf0
Aug 13 06:53:03 middle kernel: [<c0160d84>] do_lookup+0x84/0x90
Aug 13 06:53:03 middle kernel: [<c0161211>] link_path_walk+0x481/0x870
Aug 13 06:53:03 middle kernel: [<c0161abe>] __user_walk+0x3e/0x60
Aug 13 06:53:03 middle kernel: [<c015cdce>] vfs_stat+0x1e/0x60
Aug 13 06:53:03 middle kernel: [<c015d43b>] sys_stat64+0x1b/0x40
Aug 13 06:53:03 middle kernel: [<c03ca78f>] syscall_call+0x7/0xb
Aug 13 06:53:03 middle kernel:
Aug 13 06:53:03 middle kernel: Code: 75 ca eb c6 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 04 8b 5d 10 8b 7d 0c 8b 75 08 8b 0f 85 c9 74 13 8b 11 <0f> 18 02 90 39 59 18 89 c8 74 10 85 d2 89 d1 75 ed 31 c0 83 c4
Aug 13 06:53:03 middle kernel: <6>note: make[9500] exited with preempt_count 1
Aug 13 06:53:03 middle kernel: Call Trace:
Aug 13 06:53:03 middle kernel: [<c011c29e>] schedule+0x58e/0x5a0
Aug 13 06:53:03 middle kernel: [<c0143f31>] unmap_page_range+0x41/0x70
Aug 13 06:53:03 middle kernel: [<c014411f>] unmap_vmas+0x1bf/0x220
Aug 13 06:53:03 middle kernel: [<c0147e39>] exit_mmap+0x79/0x190
Aug 13 06:53:03 middle kernel: [<c011dd9a>] mmput+0x7a/0xe0
Aug 13 06:53:03 middle kernel: [<c0121a88>] do_exit+0x118/0x3f0
Aug 13 06:53:03 middle kernel: [<c011a1e0>] do_page_fault+0x0/0x46b
Aug 13 06:53:03 middle kernel: [<c010b6c9>] die+0xf9/0x100
Aug 13 06:53:03 middle kernel: [<c011a30d>] do_page_fault+0x12d/0x46b
Aug 13 06:53:03 middle kernel: [<c0155b38>] __getblk+0x28/0x50
Aug 13 06:53:03 middle kernel: [<c018b8e2>] ext3_getblk+0x92/0x290
Aug 13 06:53:03 middle kernel: [<c01542b1>] wake_up_buffer+0x11/0x30
Aug 13 06:53:03 middle kernel: [<c01542fe>] unlock_buffer+0x2e/0x50
Aug 13 06:53:03 middle kernel: [<c0157afd>] ll_rw_block+0x4d/0x80
Aug 13 06:53:03 middle kernel: [<c018f957>] ext3_find_entry+0x307/0x3c0
Aug 13 06:53:03 middle kernel: [<c011a1e0>] do_page_fault+0x0/0x46b
Aug 13 06:53:03 middle kernel: [<c03cb19b>] error_code+0x2f/0x38
Aug 13 06:53:03 middle kernel: [<c016c14a>] find_inode_fast+0x1a/0x60
Aug 13 06:53:03 middle kernel: [<c016c7c0>] iget_locked+0x50/0xc0
Aug 13 06:53:03 middle kernel: [<c018fc92>] ext3_lookup+0x62/0xd0
Aug 13 06:53:03 middle kernel: [<c0160b10>] real_lookup+0xc0/0xf0
Aug 13 06:53:03 middle kernel: [<c0160d84>] do_lookup+0x84/0x90
Aug 13 06:53:03 middle kernel: [<c0161211>] link_path_walk+0x481/0x870
Aug 13 06:53:03 middle kernel: [<c0161abe>] __user_walk+0x3e/0x60
Aug 13 06:53:03 middle kernel: [<c015cdce>] vfs_stat+0x1e/0x60
Aug 13 06:53:03 middle kernel: [<c015d43b>] sys_stat64+0x1b/0x40
Aug 13 06:53:03 middle kernel: [<c03ca78f>] syscall_call+0x7/0xb
Aug 13 06:53:03 middle kernel:
Kind regards,
Jurriaan
--
All lies all lies all schemes all schemes
Every winner means a loser in the western dream.
New Model Army - Western Dream
Debian (Unstable) GNU/Linux 2.6.0-test3-mm1 4276 bogomips load av: 0.00 0.26 0.26
Jurriaan <[email protected]> wrote:
>
> Aug 13 06:47:48 middle -- MARK --
> Aug 13 06:53:03 middle kernel: printing eip:
> Aug 13 06:53:03 middle kernel: c016c14a
> Aug 13 06:53:03 middle kernel: Oops: 0000 [#1]
> Aug 13 06:53:03 middle kernel: PREEMPT
> Aug 13 06:53:03 middle kernel: CPU: 0
> Aug 13 06:53:03 middle kernel: EIP: 0060:[<c016c14a>] Not tainted VLI
> Aug 13 06:53:03 middle kernel: EFLAGS: 00010286
> Aug 13 06:53:03 middle kernel: EIP is at find_inode_fast+0x1a/0x60
> Aug 13 06:53:03 middle kernel: eax: f7b7e000 ebx: 000d5ff4 ecx: e68e9a48 edx: 00000000
> Aug 13 06:53:03 middle kernel: esi: f7b7e000 edi: c1a50d80 ebp: f2f41e14 esp: f2f41e04
> Aug 13 06:53:03 middle kernel: ds: 007b es: 007b ss: 0068
> Aug 13 06:53:03 middle kernel: Process make (pid: 9500, threadinfo=f2f40000 task=eb0966a0)
> Aug 13 06:53:03 middle kernel: Stack: f0c05cc0 f2f40000 f0271cc0 000d5ff4 f2f41e38 c016c7c0 f7b7e000 c1a50d80
> Aug 13 06:53:03 middle kernel: 000d5ff4 c1a50d80 000d5ff4 f0271cc0 f7b7e000 f2f41e58 c018fc92 f7b7e000
> Aug 13 06:53:03 middle kernel: 000d5ff4 c3600234 fffffff4 dddd2a74 dddd2a08 f2f41e7c c0160b10 dddd2a08
> Aug 13 06:53:03 middle kernel: Call Trace:
> Aug 13 06:53:03 middle kernel: [<c016c7c0>] iget_locked+0x50/0xc0
> Aug 13 06:53:03 middle kernel: [<c018fc92>] ext3_lookup+0x62/0xd0
> Aug 13 06:53:03 middle kernel: [<c0160b10>] real_lookup+0xc0/0xf0
> Aug 13 06:53:03 middle kernel: [<c0160d84>] do_lookup+0x84/0x90
> Aug 13 06:53:03 middle kernel: [<c0161211>] link_path_walk+0x481/0x870
> Aug 13 06:53:03 middle kernel: [<c0161abe>] __user_walk+0x3e/0x60
> Aug 13 06:53:03 middle kernel: [<c015cdce>] vfs_stat+0x1e/0x60
> Aug 13 06:53:03 middle kernel: [<c015d43b>] sys_stat64+0x1b/0x40
> Aug 13 06:53:03 middle kernel: [<c03ca78f>] syscall_call+0x7/0xb
> Aug 13 06:53:03 middle kernel:
> Aug 13 06:53:03 middle kernel: Code: 75 ca eb c6 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 04 8b 5d 10 8b 7d 0c 8b 75 08 8b 0f 85 c9 74 13 8b 11 <0f> 18 02 90 39 59 18 89 c8 74 10 85 d2 89 d1 75 ed 31 c0 83 c4
You oopsed here:
Code; c016c144 No symbols available
25: 85 c9 test %ecx,%ecx
Code; c016c146 No symbols available
27: 74 13 je 3c <_EIP+0x3c>
Code; c016c148 No symbols available
29: 8b 11 mov (%ecx),%edx
This decode from eip onwards should be reliable
Code; c016c14a No symbols available
00000000 <_EIP>:
Code; c016c14a No symbols available <=====
0: 0f 18 02 prefetchnta (%edx) <=====
Code; c016c14d No symbols available
3: 90 nop
Code; c016c14e No symbols available
4: 39 59 18 cmp %ebx,0x18(%ecx)
Code; c016c151 No symbols available
And indeed, your %edx is zero.
But if a prefetch of zero oopses then we should be oopsing in there all the
time.
hlist_for_each() is completely assuming that prefetch(0) is safe, and you
undoubtedly oopsed doing it.
Colour me confused, and let me Cc lots of x86 guys ;)
Exactly what sort of CPU are you using?
On Wed, Aug 13, 2003 at 01:47:46AM -0700, Andrew Morton wrote:
> Jurriaan <[email protected]> wrote:
> >
> > Aug 13 06:47:48 middle -- MARK --
> > Aug 13 06:53:03 middle kernel: printing eip:
> > Aug 13 06:53:03 middle kernel: c016c14a
> > Aug 13 06:53:03 middle kernel: Oops: 0000 [#1]
> > Aug 13 06:53:03 middle kernel: PREEMPT
> > Aug 13 06:53:03 middle kernel: CPU: 0
> > Aug 13 06:53:03 middle kernel: EIP: 0060:[<c016c14a>] Not tainted VLI
> > Aug 13 06:53:03 middle kernel: EFLAGS: 00010286
> > Aug 13 06:53:03 middle kernel: EIP is at find_inode_fast+0x1a/0x60
>
> And indeed, your %edx is zero.
>
> But if a prefetch of zero oopses then we should be oopsing in there all the
> time.
>
> hlist_for_each() is completely assuming that prefetch(0) is safe, and you
> undoubtedly oopsed doing it.
>
>
> Colour me confused, and let me Cc lots of x86 guys ;)
>
> Exactly what sort of CPU are you using?
> -
AMD Athlon XP2400+ on a VIA KT400 chipset, single CPU-system.
Kind regards,
Jurriaan
Jurriaan on adsl-gate <[email protected]> wrote:
>
> > Exactly what sort of CPU are you using?
> > -
> AMD Athlon XP2400+ on a VIA KT400 chipset, single CPU-system.
OK, thanks. The word is that Athlons will, very occasionally,
take a fault when prefetching from an unmapped address.
include/linux/list.h | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)
diff -puN include/linux/list.h~hlist_for_each-fix include/linux/list.h
--- 25/include/linux/list.h~hlist_for_each-fix 2003-08-13 02:29:32.000000000 -0700
+++ 25-akpm/include/linux/list.h 2003-08-13 02:37:33.000000000 -0700
@@ -504,11 +504,15 @@ static __inline__ void hlist_add_after(s
#define hlist_entry(ptr, type, member) container_of(ptr,type,member)
-/* Cannot easily do prefetch unfortunately */
-#define hlist_for_each(pos, head) \
- for (pos = (head)->first; pos && ({ prefetch(pos->next); 1; }); \
- pos = pos->next)
+#define hlist_for_each(pos, head) \
+ for ( pos = (head)->first; \
+ likely(pos) && ({ \
+ if (likely(pos->next)) \
+ prefetch(pos->next); \
+ 1; }); \
+ pos = pos->next)
+/* Cannot easily do prefetch unfortunately */
#define hlist_for_each_safe(pos, n, head) \
for (pos = (head)->first; n = pos ? pos->next : 0, pos; \
pos = n)
_
> But if a prefetch of zero oopses then we should be oopsing in there all the
> time.
>
If it's an Opteron then it's a known erratum (#91 iirc). Update your BIOS in
this case.
The x86-64 kernel port also has a workaround for this (adding exception
handling to the prefetches)
-Andi
Alan Cox <[email protected]> writes:
> Put the likely(pos) in the asm/prefetch for Athlon until someone can
> figure out what is going on with some specific Athlons, 2.6 and certain
> kernels (notably 4G/4G)
You can use the same workaround as x86-64. add an exception handler and
just jump back. Advantage is that it is completely outside the fast path.
But note you also have to add runtime sorting of __ex_table when you
do this, otherwise the __ex_table becomes unsorted when someone uses
list_for_each (which does prefetch) in a __init function
(all code is available in x86-64, just needs to be ported over)
-Andi
On Mer, 2003-08-13 at 10:55, Andrew Morton wrote:
> Jurriaan on adsl-gate <[email protected]> wrote:
> >
> > > Exactly what sort of CPU are you using?
> > > -
> > AMD Athlon XP2400+ on a VIA KT400 chipset, single CPU-system.
>
> OK, thanks. The word is that Athlons will, very occasionally,
> take a fault when prefetching from an unmapped address.
Page zero in the kernel is mapped in 4Mb paging mode (which is what the
Athlon uses). Also your likely(pos) pretty much wiped out the point of
prefetching and punishes other processors because it is in the wrong
place. For that matter we could add a LIST_NULL that pointed somewhere
safe and wasn't NULL per se in 2.7.
Put the likely(pos) in the asm/prefetch for Athlon until someone can
figure out what is going on with some specific Athlons, 2.6 and certain
kernels (notably 4G/4G).
Long term we really do need to start supporting a zero page mapped at
0->64K when not debugging the kernel, then you can let the compiler do
NULL dereferences which is a _huge_ win because you can move stuff
around a lot of natural C conditionals to get better unrolling and
instruction scheduling.
The alternative is to start doing multipointer lists which is messier
and uses more memory (ie each node has next, prev, "several nodes on")
Alan
Alan Cox <[email protected]> wrote:
>
> Put the likely(pos) in the asm/prefetch for Athlon until someone can
> figure out what is going on with some specific Athlons, 2.6 and certain
> kernels (notably 4G/4G).
<riffles through random config options>
Like this?
What happens if someone runs a K6 kernel on a K7?
Or various other CPU types? What is the matrix here?
I don't like the way this is headed...
--- 25/include/asm-i386/processor.h~athlon-prefetch-fix 2003-08-13 04:21:01.000000000 -0700
+++ 25-akpm/include/asm-i386/processor.h 2003-08-13 04:22:10.000000000 -0700
@@ -568,6 +568,10 @@ static inline void rep_nop(void)
#define ARCH_HAS_PREFETCH
extern inline void prefetch(const void *x)
{
+#ifdef CONFIG_MK7
+ if (unlikely(x == NULL))
+ return; /* athlons like to oops in prefetch(0) */
+#endif
alternative_input(ASM_NOP4,
"prefetchnta (%1)",
X86_FEATURE_XMM,
_
On Mer, 2003-08-13 at 12:25, Andrew Morton wrote:
> Like this?
>
> What happens if someone runs a K6 kernel on a K7?
> Or various other CPU types? What is the matrix here?
Beats me, but then the prefetch code in 2.6 seems broken from
5 seconds of inspection anyway. We are testing the XMM feature
and using prefetchnta for Athlon; that's wrong for lots of Athlon
processors that don't have XMM but do have prefetch/prefetchw
(which btw also seem to work properly on all these processors,
while prefetchnta seems to do funky things).
Perhaps someone should fix prefetch() before they worry about
the rest of the mess?
For Athlon we should be testing 3DNow and using prefetch/prefetchw;
for Intel cases we want to go for prefetchnta if XMM is set (PIII, PIV)
Alan
Alan Cox <[email protected]> writes:
> On Mer, 2003-08-13 at 12:25, Andrew Morton wrote:
> > Like this?
> >
> > What happens if someone runs a K6 kernel on a K7?
> > Or various other CPU types? What is the matrix here?
>
> Beats me, but then the prefetch code in 2.6 seems broken from
> 5 seconds of inspection anyway. We are testing the XMM feature
> and using prefetchnta for Athlon, thats wrong for lots of athlon
> processors that dont have XMM but do have prefetch/prefetchw,
> (which btw also seem to work properly on all these processors
> while prefetchnta seems to do funky things)
The early Athlon-specific test was not done, to avoid too much bloat.
(three alternatives instead of two)
Most Athlons in existence should have XMM already and the rest works.
You can hardly call that broken.
I would be surprised if prefetch behaves differently than prefetchnta
on Athlon. If the bug is similar to what happens on Opteron then
I bet it won't make a difference.
> For Athlon we should be testing 3Dnow, and using prefetch/prefetchw
> for Intel cases we want to go for prefetchnta if XMM is set (PIII, PIV)
That's done for write prefetches correctly.
(as Intel does not have a write prefetch)
-Andi
On Mer, 2003-08-13 at 13:10, Andi Kleen wrote:
> > Beats me, but then the prefetch code in 2.6 seems broken from
> > 5 seconds of inspection anyway. We are testing the XMM feature
> > and using prefetchnta for Athlon, thats wrong for lots of athlon
> > processors that dont have XMM but do have prefetch/prefetchw,
> > (which btw also seem to work properly on all these processors
> > while prefetchnta seems to do funky things)
>
> The early Athlon Specific test was not done to avoid too much bloat.
> (three alternatives instead of two)
Let's replace working code with broken macros, whoooo.. progress. Lots of
Athlons don't have XMM, most of the older ones where prefetch has the
most impact in fact. (The XMM-using ones have the hw prefetcher too.)
> Most Athlons in existence should have XMM already and the rest works.
Lots don't have XMM
> You can hardly call that broken.
I just did. It's worse than 2.4 behaviour.
> That's done for write prefetches correctly.
> (as Intel does not have a write prefetch)
Actually it's iffy too. 3DNow doesn't imply prefetchw. You
must test 3DNow && vendor==AMD && Athlon. (K6 prefetchw
is slower than not using it; other 3DNow chips don't have it,
eg the Cyrix MII, which may explain a couple of things.) I don't
see anywhere we mask the 3DNow property by these, but I've not
dug through the CPU code right now to see if we have a "3Dnowplus"
type definition we can check.
I suspect the best way to do prefetch cleanly would be something
like this
#if defined(CONFIG_MK7)
alternative_input("prefetch" or "prefetchnta")
#else
alternative_input(ASM_NOP4 or "prefetchnta");
#endif
Ideally we want a 3 way patch table to fix up at boot time but the if
case at least gets us back to desirable situations. Also if I remember
the prefetch exception thing rightly you can misalign the prefetch
instruction as a workaround.
Alan
On Wed, Aug 13, 2003 at 01:48:45PM +0100, Alan Cox wrote:
> On Mer, 2003-08-13 at 13:10, Andi Kleen wrote:
> > > Beats me, but then the prefetch code in 2.6 seems broken from
> > > 5 seconds of inspection anyway. We are testing the XMM feature
> > > and using prefetchnta for Athlon, thats wrong for lots of athlon
> > > processors that dont have XMM but do have prefetch/prefetchw,
> > > (which btw also seem to work properly on all these processors
> > > while prefetchnta seems to do funky things)
> >
> > The early Athlon Specific test was not done to avoid too much bloat.
> > (three alternatives instead of two)
>
> Lets replace working code with broken macros whoooo.. progress. Lots of
> Athlons don't have XMM, most of the older ones where prefetch has the
> most impact in fact. (The XMM using ones have the hw prefetcher too).
hw prefetch has nothing to do with how the linux kernel uses prefetch.
It's only using it for data structures that cannot be handled by
the auto prefetcher.
[except the broken 3dnow! copy that was never enabled]
>
> > Most Athlons in existence should have XMM already and the rest works.
>
> Lots don't have XMM
All XPs have.
>
> > You can hardly call that broken.
>
> I just did. Its worse than 2.4 behaviour.
In 2.4 distribution users never got anything. That is what was really
broken.
>
> > That's done for write prefetches correctly.
> > (as Intel does not have a write prefetch)
>
> Actually its iffy too. 3Dnow doesnt imply prefetchw. You
My AMD manual lists it as part of 3dnow. If a CPU advertises 3dnow!
but doesn't have the instruction, it's broken.
> must test 3Dnow && vendor==AMD && Athlon. (K6 prefetchw
> is slower than not using it, other 3Dnow chips dont have it
> eg the Cyrix MII which may explain a couple of things. I don't
I would consider the MII broken then. setup should clear the 3dnow
bit.
> see anywhere we mask the 3Dnow property by these but I've not
> dug through the CPU code right now to see if we have a "3Dnowplus"
> type definition we can check.
there is 3dnowext, set on Athlons, but K6 has prefetchw too and
it
But if you only want Athlon you can check for X86_FEATURE_K7.
The problem is that it doesn't include K8 and K8
has prefetchw too (alternative currently only allows a single
bit, not a bitmask). Better is to either clear 3dnow on the MII
or define a new pseudo bit that defines working and useful
prefetchw
>
> I suspect the best way to do prefetch cleanly would be something
> like this
>
> #if defined(CONFIG_MK7)
> alternative_input("prefetch" or "prefetchnta")
> #else
> alternative_input(ASM_NOP4 or "prefetchnta");
> #endif
No for weird combinations you define a new pseudo CPUID capability
bit, check for that in the CPU detection and use that in the alternative.
If you really want 3 way alternative you can just define a macro
for it. The basic data structure supports it - the macro
just needs to have two .altinstructions records and two replacement codes.
But I have my doubt it is worth it for this case.
No stinkin' ifdefs please, that would break the whole concept.
>
> Ideally we want a 3 way patch table to fix up at boot time but the if
> case at least gets us back to desirable situations. Also if I remember
> the prefetch exception thing rightly you can misalign the prefetch
> instruction as a workaround.
Nope, no misalignment. All it does is to just handle the exception
using __ex_table and jumps to the next instruction.
[the exceptions are very rare, they need very specific circumstances
in the CPU to trigger, so it's ok to make it slow]
Only trap is that you have to add the exception table sorting too...
-Andi
On Mer, 2003-08-13 at 14:14, Andi Kleen wrote:
> hw prefetch has nothing to do with how the linux kernel uses prefetch.
> It's only using it for data structures that cannot be handled by
> the auto prefetcher.
>
> [except the broken 3dnow! copy that was never enabled]
Not broken in 2.4, although the 2.4-ac kernel uses movntq instead for
Athlon as it is faster than mmx_memcpy, which we use for Cyrix/VIA/IDT
processors where it is a win.
> > > Most Athlons in existence should have XMM already and the rest works.
> >
> > Lots don't have XMM
>
> All XPs have.
And what about all the pre MP/XP ones, lots of those.
> My AMD manual lists it as part of 3dnow. If an CPU advertises 3dnow!
> but doesn't have the instruction it's broken.
My AMD docs list it as part of the AMD extended 3dnow. The original
3dnow as done by AMD/Cyrix does not have it
> I would consider the MII broken then. setup should clear the 3dnow
> bit.
"Mummy, it doesn't work like I personally have decreed it shall, let's break
it and screw all the users." That's the Dan Bernstein school of charm
theory of software development.
> there is 3dnowext, set on Athlons, but K6 has prefetchw too and
> it
3dnowext is what we want here. It might end up doing a prefetchw on
K6 but at least K6 actually has the instruction...
> But if you only want Athlon you can check for X86_FEATURE_K7.
> The problem is that it doesn't include K8 and K8
> has prefetchw too (alternative currently only allows a single
> bit, not a bitmask). Better is to either clear 3dnow on the MII
> or define a new pseudo bit that defines working and useful
> prefetchw
We want a pseudobit - otherwise we'll break other code that checks
3dnow is present properly.
> > #if defined(CONFIG_MK7)
> > alternative_input("prefetch" or "prefetchnta")
> > #else
> > alternative_input(ASM_NOP4 or "prefetchnta");
> > #endif
>
> No for weird combinations you define a new pseudo CPUID capability
> bit, check for that in the CPU detection and use that in the alternative.
Ok
> If you really want 3 way alternative you can just define a macro
> for it. The basic data structure supports it - the macro
> just needs to have two .altinstructions records and two replacement codes.
> But I have my doubt it is worth it for this case.
prefetching is a big win on older Athlon because the CPU is fast and the
chipset/ram sucks hugely relative to it
> No stinkin' ifdefs please, that would break the whole concept.
Ok
> > case at least gets us back to desirable situations. Also if I remember
> > the prefetch exception thing rightly you can misalign the prefetch
> > instruction as a workaround.
>
> Nope, no misalignment. All it does is to just handle the exception
> using __ex_table and jumps to the next instruction.
If you misalign the instruction you don't seem to get the exception on
Athlon, dunno about the Opteron errata or if the opteron errata bites in
32bit. If it does I guess we should clear mmx, xmm for Opteron by your
arguments ;)
On Wed, Aug 13, 2003 at 03:09:55PM +0100, Alan Cox wrote:
> On Mer, 2003-08-13 at 14:14, Andi Kleen wrote:
> > hw prefetch has nothing to do with how the linux kernel uses prefetch.
> > It's only using it for data structures that cannot be handled by
> > the auto prefetcher.
> >
> > [except the broken 3dnow! copy that was never enabled]
>
> Not broken in 2.4, although the 2.4-ac kernel uses movntq instead for
> Athlon as it is faster than mmx_memcpy, which we use for Cyrix/VIA/IDT
> processors where it is a win.
movntq for a memcpy? That is a very bad idea. It wins in micro benchmarks,
but the destination is pushed out of cache and the next code accessing
the destination will suffer badly from the cache misses. All the NT
stuff is basically useless in the kernel because it only helps with data
sets significantly bigger than your cache, and we usually only deal
with 4K chunks of everything.
[I made the same mistake early on Opteron/x86-64 for copy_page, but later
fixed it]
> > My AMD manual lists it as part of 3dnow. If an CPU advertises 3dnow!
> > but doesn't have the instruction it's broken.
>
> My AMD docs list it as part of the AMD extended 3dnow. The original
> 3dnow as done by AMD/Cyrix does not have it
The x86-64 manual lists it as part of the 3dnow feature set.
The K6 has it, right?
Is there a "more original" 3dnow than what has been in the K6?
> > I would consider the MII broken then. setup should clear the 3dnow
> > bit.
>
> "Mummy it doesnt work like I personally have decreed it shall lets break
> it and screw all the users". Thats the Dan Bernstein school of charm
It doesn't work like the AMD instruction reference manual describes it.
> theory of software development.
Being a bit touchy from the heat today? @)
Of course it should be fixed, but the fix, as it is a bug workaround,
doesn't have to be very fast. So it would be ok to just clear the 3dnow bit.
But then to handle the K6 case (which is interesting, I didn't know) it
would probably be better to define a separate bit.
> > The problem is that it doesn't include K8 and K8
> > has prefetchw too (alternative currently only allows a single
> > bit, not a bitmask). Better is to either clear 3dnow on the MII
> > or define a new pseudo bit that defines working and useful
> > prefetchw
>
> We want a pseudobit - otherwise we'll break other code that checks
> 3dnow is present properly.
Ok. I will do that when I'm back next week unless someone beats me
to it ;-)
> > If you really want 3 way alternative you can just define a macro
> > for it. The basic data structure supports it - the macro
> > just needs to have two .altinstructions records and two replacement codes.
> > But I have my doubt it is worth it for this case.
>
> prefetching is a big win on older Athlon because the CPU is fast and the
> chipset/ram sucks hugely relative to it
Hmm ok. So it probably needs an alternative3().
It's not hard to do, just a bit ugly because the macro will have a lot of
arguments.
> > Nope, no misalignment. All it does is to just handle the exception
> > using __ex_table and jumps to the next instruction.
>
> If you misalign the instruction you don't seem to get the exception on
> Athlon, dunno about the Opteron errata or if the opteron errata bites in
> 32bit. If it does I guess we should clear mmx, xmm for Opteron by your
> arguments ;)
I didn't know about the misalignment bit. Interesting. Misalignment to
what boundary?
But is it slower than an aligned execution? If yes I would prefer my
solution because it keeps the fast path as fast as possible.
-Andi
On Mer, 2003-08-13 at 15:20, Andi Kleen wrote:
> stuff is basically useless in the kernel because it only helps with data
> sets significantly bigger than your cache, and we usually only deal
> with 4K chunks of everything.
Could be. I didnt write that code. I think Manfred also played with the
copy tricks that came from the AMD slides.
> The K6 has it, right?
> Is there a "more original" 3dnow that what has been in the K6?
K6-II/III does. I don't know about the original K6, but I believe it
doesn't. The original 3Dnow was a joint Cyrix/AMD thing and it lacks
several instructions later added (including prefetch). The later Cyrix
also has a couple of the additional ones but not prefetch.
> > "Mummy it doesnt work like I personally have decreed it shall lets break
> > it and screw all the users". Thats the Dan Bernstein school of charm
>
> It doesn't work like the AMD instruction reference manual describes it.
Well, there is a surprise: AMD didn't design it 8)
> Of course it should be fixed, but the fix as it is a bug workaround
> doesn't have to be very fast. So it would be ok to just clear the 3dnow bit.
> But then to handle the K6 case (which is interesting, I didn't know) too it
> would be probably better to define a separate bit.
What else checks the 3Dnow bit ?
> > We want a pseudobit - otherwise we'll break other code that checks
> > 3dnow is present properly.
>
> Ok. I will do that when I'm back next week unless someone beats me
> to it ;-)
Some kind of "has prefetch and its actually useful" 8)
> > If you misalign the instruction you don't seem to get the exception on
> > Athlon, dunno about the Opteron errata or if the opteron errata bites in
> > 32bit. If it does I guess we should clear mmx, xmm for Opteron by your
> > arguments ;)
>
> I didn't know about the misalignment bit. Interesting. Misalignment to
> what boundary?
I'll have to go check again. It's something RH internal testing found
when people were going "uh, what the hell is going on here" 8)
> But is it slower than an aligned execution? If yes I would prefer my
> solution because it keeps the fast path as fast as possible.
Has AMD confirmed that your solution is ok for the K7 as well as the K8 - ie
that if we hit the errata the fixup recovers the CPU from whatever
lunatic state it is now in?
On Wed, Aug 13, 2003 at 04:20:11PM +0100, Alan Cox wrote:
> On Mer, 2003-08-13 at 15:20, Andi Kleen wrote:
> > stuff is basically useless in the kernel because it only helps with data
> > sets significantly bigger than your cache, and we usually only deal
> > with 4K chunks of everything.
>
> Could be. I didnt write that code. I think Manfred also played with the
> copy tricks that came from the AMD slides.
The AMD slides all assume very big data sets ;-)
I would recommend to remove it.
> > Of course it should be fixed, but the fix as it is a bug workaround
> > doesn't have to be very fast. So it would be ok to just clear the 3dnow bit.
> > But then to handle the K6 case (which is interesting, I didn't know) too it
> > would be probably better to define a separate bit.
>
> What else checks the 3Dnow bit ?
Nothing in kernel AFAIK, but it's possible that it is used by user space
reading /proc/cpuinfo.
> >
> > Ok. I will do that when I'm back next week unless someone beats me
> > to it ;-)
>
> Some kind of "has prefetch and its actually useful" 8)
X86_FEATURE_PREFETCHW
X86_FEATURE_PREFETCH3DNOW
(note I didn't volunteer to write alternative3 for the latter;
someone else has to do that if they want it ;-)
> > But is it slower than an aligned execution? If yes I would prefer my
> > solution because it keeps the fast path as fast as possible.
>
> Has AMD confirmed that your solution is ok for the K7 as well as K8 - ie
> that if we hit the errata the fixup recovers the CPU from whatever
> lunatic state it is now in ?
My solution is a fix as the problem is described in the Opteron
Specification Update (and also as our own testing showed - we discovered
the problem originally)
The erratum is basically: when there is a prefetch and a load for the
same address in flight, the load faults, and the CPU is in a
very specific complicated state, then the exception is reported
on the prefetch, not the load.
The fix just handles the exception and doesn't crash.
At least on Opteron it can also be fixed with a magic bit in the BIOS;
maybe that's possible on XP too. But I opted to work around it in the kernel
to not force all people to get a new BIOS.
BTW we saw it mainly in the x86-64 copy_*_user and csum_copy_* functions,
which also do prefetches. LTP would sometimes trigger it when it tests
how the kernel behaves with invalid addresses. But it happened very
rarely in the dcache hash too. But still it's hard to trigger; the
linked-list one is very hard to hit. I tried to reproduce it in user space,
but failed. The LTP one is much easier, but still not that common.
-Andi
On Wed, Aug 13, 2003 at 04:20:11PM +0100, Alan Cox wrote:
> K6-II/III does. I don't know about original K6. but I believe it
> doesn't. The original 3Dnow was a joint Cyrix/AMD thing and it lacks
> several instructions later added (including prefetch). The later Cyrix
> also has a couple of the additional ones but not prefetch.
Which Cyrixen are you talking about?
C3's up to and including Ezra-T should DTRT when it comes to
3dnow prefetch instruction, and pre-VIA Cyrixen didn't have 3dnow
at all iirc.
Dave
--
Dave Jones http://www.codemonkey.org.uk
On Mer, 2003-08-13 at 17:39, Dave Jones wrote:
> > several instructions later added (including prefetch). The later Cyrix
> > also has a couple of the additional ones but not prefetch.
Ok, I got this crossed - the Cyrix/AMD thing was the extended MMX stuff;
they both did 3Dnow! but the Jalapeno old-style Cyrix CPU with 3dnow was
canned. Ok, so there is a different reason why my Cyrix crashes on boot
with 2.6test. Andi is right that 3Dnow safely implies prefetch. My docs
list it as part of extended MMX, not 3dnow, although the Cyrix seems to
not possess the instruction anyway.
So 3dnow == prefetch/prefetchw is ok but not useful on K6.
> Which Cyrixen are you talking about ?
> C3's up to and including Ezra-T should DTRT when it comes to
> 3dnow prefetch instruction, and pre-VIA Cyrixen didn't have 3dnow
> at all iirc.
pre-VIA Cyrixen have MMX and CXMMX. The CPU also sets bit 31 but doesn't
have 3dnow (which fooled me, but the kernel does know about it). C3's seem
to have prefetch/prefetchw (but not prefetchnta). I don't have a nemeiah,
but I assume Nemeiah has prefetchnta too?
I've tried building a summary list. Additional contributions welcomed
MMX: Pentium (later only), Cyrix MediaGX (later only), Cyrix 6x86/MII
Intel PII/PIII/PIV, AMD K6/Athlon/Opteron, VIA Cyrix III, VIA C3
CXMMX: Extended MMX - Cyrix MII/AMD K6(II+ ?)/K7/Opteron
3DNOW: AMD K6-II/III (not original K6), K7, Opteron, VIA Cyrix III,
VIA C3 (pre Nemiah only ??)
"Enhanced" 3DNow: Athlon Tbird
SSE: Intel PII, PIII, Athlon (XP, Duron >=1GHz only)
SSE2: Pentium IV
So the prefetch fallback is needed for pre-Nemiah C3, Duron < 1GHz and
pre-T-Bird Athlon, if my table is right.
On Mer, 2003-08-13 at 16:32, Andi Kleen wrote:
> The AMD slides assume all very big data sets ;-)
>
> I would recommend to remove it.
I'll do some timings when I get a moment - the prefetching mmx copy
was a win (and faster than the others for small data as well as large
on the K7-550 (really a K7 not "Athlon" 8)) way back when.
> > What else checks the 3Dnow bit ?
>
> Nothing in kernel AFAIK, but it's possible that it is used by user space
> reading /proc/cpuinfo.
DaveJ and your docs are right on 3dnow, it turns out, so sorry about that;
ignore me on prefetchw, it's just the prefetch side that's 3-way.
> BTW we saw it mainly in the x86-64 copy_*_user and csum_copy_* functions
> which do also prefetches. LTP would sometimes trigger it when it tests
> how the kernel behaves with invalid addresses. But it happened very
> rarely in the dcache hash too. But still it's hard to trigger, the
> linked list one is very hard to hit. I tried to reproduce it in user space,
> but failed. The LTP one is much easier, but still not that common.
Thanks
And one other update
Winchip C6 - MMX, extended MMX
Winchip II+ - MMX, extended MMX, 3Dnow (dunno if it has
prefetch; I don't have one of these)
See below.
] -Rich ...
] AMD Fellow
] richard.brunner at amd com
> From: Andi Kleen [mailto:[email protected]]
> On Wed, Aug 13, 2003 at 04:20:11PM +0100, Alan Cox wrote:
> > On Mer, 2003-08-13 at 15:20, Andi Kleen wrote:
> > Has AMD confirmed that your solution is ok for the K7 as well as K8 -
> > ie that if we hit the errata the fixup recovers the CPU from whatever
> > lunatic state it is now in ?
>
> My solution is a fix as the problem is described in the
> Opteron Specification Update (and also as our own testing
> showed - we discovered the problem originally)
Hi, AMD has not confirmed anything with respect to this issue
on K7/Athlon. We are currently trying to get the code that reproduces
this bug into AMD so we can see what triggers it.
Andi's workaround for Opteron (before a BIOS fix was available)
is probably a fine *short-term* workaround until we can get back
to you on this. AMD believes that dismissing an exception on a
prefetch as spurious on Athlon will work around the current
problem for you.
Linking the Opteron bug to an Athlon bug is premature at this point.
> The Errata is basically: When there is a prefetch and a load for the
> same address in flight and the load faults and the CPU is in
> a very specific complicated state then the Exception is
> reported on the prefetch, not the fault.
>
> The fix just handles the exception and doesn't crash.
>
> At least on Opteron it can be also fixed with a magic bit in
> the BIOS, maybe that's possible on XP too. But I opted to
> work around it in the kernel to not force all people to get a
> new BIOS.
Let us get back to you, ok? I am starting up our
internal validation people to go poke at this.
On Wed, Aug 13, 2003 at 07:44:24PM +0100, Alan Cox wrote:
> On Mer, 2003-08-13 at 16:32, Andi Kleen wrote:
> > The AMD slides assume all very big data sets ;-)
> >
> > I would recommend to remove it.
>
> I'll do some timings when I get a moment - the prefetching mmx copy
Microbenchmarks are useless for this. You have to benchmark the users too,
otherwise you don't notice the additional cache misses.
> was a win (and faster than the others for small data as well as large
> on the K7-550 (really a K7 not "Athlon" 8)) way back when.
Possible. In my experience the best copy functions vary wildly between
different steppings. Just optimizing for a single one is probably
not a good idea, especially not for a very old one
(except when you add dynamic patches for the different steppings, but
then it quickly gets ugly with too many variants).
When in doubt, use the simpler function.
-Andi
On Wed, Aug 13, 2003 at 07:34:36PM +0100, Alan Cox wrote:
> pre VIA Cyrixen have MMX and CXMMX. The CPU also set bit 31 but doesn't
> have 3dnow (which fooled me but the kernel does know about). C3's seem
> to have prefetch/prefetchw (but not prefetchnta). I don't have a nemeiah
> but I assume Nemeiah has prefetchnta too ?
With Nehemiah, they dropped 3dnow, and went with SSE.
> MMX: Pentium (later only), Cyrix MediaGX (later only), Cyrix 6x86/MII
> Intel PII/PIII/PIV, AMD K6/Athlon/Opteron, VIA Cyrix III, VIA C3
> CXMMX: Extended MMX - Cyrix MII/AMD K6(II+ ?)/K7/Opteron
> 3DNOW: AMD K6-II/III(not original K6),K7/,Opteron, VIA Cyrix III,
> VIA C3 (pre Nemiah only ??)
"Nehemiah".
+ Winchip-2A (though as mentioned, prefetch is a nop there; the rest of
3dnow worked, iirc).
> "Enhanced" 3DNow: Athlon Tbird
> SSE: Intel PII, PIII, Athlon (XP, Duron >=1Gz only)
> SSE2: Pentium IV
Dave
--
Dave Jones http://www.codemonkey.org.uk
Hi!
> > Put the likely(pos) in the asm/prefetch for Athlon until someone can
> > figure out what is going on with some specific Athlons, 2.6 and certain
> > kernels (notably 4G/4G).
>
> <riffles through random config options>
>
> Like this?
> What happens if someone runs a K6 kernel on a K7?
You break things :-(.
Also, prefetch with a test for NULL probably does more harm than
good. What about simply assuming the K7 can not do prefetch?
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]