I'm working on a driver for a custom PCI card on the i386 architecture.
The card uses a PLX9030 pci bridge to link an FPGA to the PCI bus using
a 16 bit bus. I found that something broke when moving from 2.6.10 to
2.6.17-rc4. In the driver, I use memcpy_toio to write 14 bytes to a
memory region in the FPGA.
To copy the 14 bytes, 2.6.10 does three 32 bit writes followed by one 16
bit write. 2.6.17-rc4 does three 32 bit writes followed by two 8 bit writes.
The PLX9030 breaks the 32 bit writes into 16 bit writes for its local
bus just fine. The problem is that my board doesn't handle byte
enables. It was assumed that if all memory transfers were a multiple of
2 bytes, then byte accesses wouldn't be used. This is no longer true in
2.6.17-rc4.
I've solved the problem by padding to 16 bytes, but should this be
considered a bug in the kernel?
Both kernels use __memcpy to implement memcpy_toio. Here is the
relevant code from <asm-i386/string.h>:
The 2.6.10 version:
static inline void * __memcpy(void * to, const void * from, size_t n)
{
    int d0, d1, d2;
    __asm__ __volatile__(
        "rep ; movsl\n\t"
        "testb $2,%b4\n\t"
        "je 1f\n\t"
        "movsw\n"
        "1:\ttestb $1,%b4\n\t"
        "je 2f\n\t"
        "movsb\n"
        "2:"
        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
        :"0" (n/4), "q" (n),"1" ((long) to),"2" ((long) from)
        : "memory");
    return (to);
}
The 2.6.17-rc4 version:
static __always_inline void * __memcpy(void * to, const void * from, size_t n)
{
    int d0, d1, d2;
    __asm__ __volatile__(
        "rep ; movsl\n\t"
        "movl %4,%%ecx\n\t"
        "andl $3,%%ecx\n\t"
#if 1   /* want to pay 2 byte penalty for a chance to skip microcoded rep? */
        "jz 1f\n\t"
#endif
        "rep ; movsb\n\t"
        "1:"
        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
        : "0" (n/4), "g" (n), "1" ((long) to), "2" ((long) from)
        : "memory");
    return (to);
}
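As a rough illustration of the difference, the tail handling of the two versions above can be modelled in plain C. This is only a sketch: `struct access_plan`, `plan_2_6_10()` and `plan_2_6_17_rc4()` are made-up names, and the model counts the string-instruction iterations the CPU executes, which is assumed (not verified here) to match the accesses seen on the bus for an uncached MMIO region.

```c
#include <stddef.h>

/* Number of writes of each width issued for an n-byte copy (illustrative). */
struct access_plan {
    size_t dwords;  /* 32-bit writes */
    size_t words;   /* 16-bit writes */
    size_t bytes;   /* 8-bit writes  */
};

/*
 * Models the 2.6.10 tail: "rep movsl" for n/4 dwords, then one "movsw"
 * if bit 1 of n is set, then one "movsb" if bit 0 of n is set.
 */
static struct access_plan plan_2_6_10(size_t n)
{
    struct access_plan p;

    p.dwords = n / 4;
    p.words  = (n & 2) ? 1 : 0;
    p.bytes  = (n & 1) ? 1 : 0;
    return p;
}

/*
 * Models the 2.6.17-rc4 tail: "rep movsl" for n/4 dwords, then
 * "rep movsb" for the remaining n & 3 bytes.
 */
static struct access_plan plan_2_6_17_rc4(size_t n)
{
    struct access_plan p;

    p.dwords = n / 4;
    p.words  = 0;
    p.bytes  = n & 3;
    return p;
}
```

For n = 14 the first model gives three dword writes and one word write, while the second gives three dword writes and two byte writes. For n = 16 both give four dword writes and no tail, which is why padding to 16 bytes hides the problem.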
--
Chris Lesiak
[email protected]
Chris Lesiak wrote:
> I'm working on a driver for a custom PCI card on the i386 architecture.
> The card uses a PLX9030 pci bridge to link an FPGA to the PCI bus using
> a 16 bit bus. I found that something broke when moving from 2.6.10 to
> 2.6.17-rc4. In the driver, I use memcpy_toio to write 14 bytes to a
> memory region in the FPGA.
>
> To copy the 14 bytes, 2.6.10 does three 32 bit writes followed by one 16
> bit write. 2.6.17-rc4 does three 32 bit writes followed by two 8 bit writes.
>
> The PLX9030 breaks the 32 bit writes into 16 bit writes for its local
> bus just fine. The problem is that my board doesn't handle byte
> enables. It was assumed that if all memory transfers were a multiple of
> 2 bytes, then byte accesses wouldn't be used. This is no longer true in
> 2.6.17-rc4.
>
> I've solved the problem by padding to 16 bytes, but should this be
> considered a bug in the kernel?
It does seem a little bit less efficient, but I don't think it's
necessarily a bug. There's no guarantee of what size writes will be used
with the memcpy_to/fromio functions.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
Followup to: <[email protected]>
By author: Robert Hancock <[email protected]>
In newsgroup: linux.dev.kernel
>
> It does seem a little bit less efficient, but I don't think it's
> necessarily a bug. There's no guarantee of what size writes will be used
> with the memcpy_to/fromio functions.
>
There are only a few semantics that make sense: fixed 8, 16, 32, or 64
bits, plus "optimal"; the latter to be used for anything that doesn't
require a specific transfer size. Logically, an unqualified
"memcpy_to/fromio" should be the optimal size (as few transfers as
possible) -- we have a qualified "memcpy_to/fromio32" already, and 8-
and 16-bit variants could/should be added.
However, having the unqualified version do byte transfers seems like a
really bad idea.
-hpa
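A fixed 16-bit variant of the kind suggested above might look roughly like the following. This is a user-space sketch, not an existing kernel interface: `copy_to_io16()` is a hypothetical name, the destination is modelled as a plain `volatile uint16_t` pointer, and a little-endian host (as on i386) is assumed. In a real driver each store would go through an MMIO accessor such as `writew()` instead of a direct pointer write.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy len bytes to a device window using only
 * 16-bit stores, so a board that ignores byte enables never sees a
 * byte access. Returns 0 on success, or -1 if len is odd, because an
 * odd length cannot be written without a byte access.
 */
static int copy_to_io16(volatile uint16_t *dst, const void *src, size_t len)
{
    const unsigned char *s = src;
    size_t i;

    if (len % 2 != 0)
        return -1;

    for (i = 0; i < len / 2; i++) {
        /*
         * Assemble each word from two bytes so that src needs no
         * particular alignment. Byte order assumes a little-endian host.
         */
        uint16_t w = (uint16_t)(s[2 * i] | ((uint16_t)s[2 * i + 1] << 8));

        dst[i] = w; /* exactly one 16-bit store per iteration */
    }
    return 0;
}
```

With a helper like this, the 14-byte transfer from the original report would become seven 16-bit writes regardless of how the generic memcpy is implemented, at the cost of more transactions than a dword-based copy.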
On Fri, 2006-05-26 at 17:46 -0600, Robert Hancock wrote:
> It does seem a little bit less efficient, but I don't think it's
> necessarily a bug. There's no guarantee of what size writes will be used
> with the memcpy_to/fromio functions.
I shouldn't have made that assumption in the first place, but I suspect
that I am not the only one to have done so. Probably other hardware
also gets caught not supporting byte enables.
--
Chris Lesiak
[email protected]
On Tue, 30 May 2006, Chris Lesiak wrote:
> On Fri, 2006-05-26 at 17:46 -0600, Robert Hancock wrote:
>> It does seem a little bit less efficient, but I don't think it's
>> necessarily a bug. There's no guarantee of what size writes will be used
>> with the memcpy_to/fromio functions.
>
> I shouldn't have made that assumption in the first place, but I suspect
> that I am not the only one to have done so. Probably other hardware
> also gets caught not supporting byte enables.
> --
> Chris Lesiak
> [email protected]
>
If byte writes are used, they should always be last for any
odd byte. I think you found a bug in spite of the fact that
whoever made the revision to memcpy probably thinks they
did something 'cool'. This is an example of cute code causing
problems. The classic example of a proper memcpy() that uses
the ix86 built-in macros runs like this:
    pushl   %esi                # Save precious registers
    pushl   %edi
    movl    COUNT(%esp),%ecx
    movl    SOURCE(%esp),%esi
    movl    DEST(%esp),%edi
    cld
    shrl    $1,%ecx             # Make WORDS, possibly set carry
    rep movsw                   # Copy the words
    adcl    %ecx,%ecx           # Any spare byte
    rep movsb                   # Copy any spare byte
    popl    %edi                # Restore precious registers
    popl    %esi
Note that there isn't any code for moving dwords because the
chances of gaining anything are slim (alignment may hurt).
This kind of code results in the principle of least surprise.
More sophisticated code usually takes longer to execute although
it often looks 'cute' as the designer attempts to create some
sort of alignment, at least for one of the elements. The jumps
in such code usually negate the advantages of any such cuteness.
I've found that it's often necessary to create private functions
to get around the disadvantages of some of the recent cute code.
You can always make a MemcpyTo_io().... It won't ever change
unless you change it! That way, your modules will compile and
work forever, regardless of any "improvements" made in the
source-code tree.
Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.73 BogoMips).
New book: http://www.AbominableFirebug.com/
linux-os (Dick Johnson) wrote:
> If byte writes are used, they should always be last for any
> odd byte. I think you found a bug in spite of the fact that
> whoever made the revision to memcpy probably thinks they
> did something 'cool'. This is an example of cute code causing
> problems. The classic example of a proper memcpy() that uses
> the ix86 built-in macros runs like this:
>
> pushl %esi # Save precious registers
> pushl %edi
> movl COUNT(%esp),%ecx
> movl SOURCE(%esp),%esi
> movl DEST(%esp),%edi
> cld
> shrl $1,%ecx # Make WORDS, possibly set carry
> rep movsw # Copy the words
> adcl %ecx,%ecx # Any spare byte
> rep movsb # Copy any spare byte
> popl %edi # Restore precious registers
> popl %esi
>
> Note that there isn't any code for moving dwords because the
> chances of gaining anything are slim (alignment may hurt).
I'd say the chances of gaining something from executing half as many
instructions on copying a large block of memory are very good indeed..
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
Followup to: <[email protected]>
By author: Robert Hancock <[email protected]>
In newsgroup: linux.dev.kernel
> >
> > Note that there isn't any code for moving dwords because the
> > chances of gaining anything are slim (alignment may hurt).
>
> I'd say the chances of gaining something from executing half as many
> instructions on copying a large block of memory are very good indeed..
>
For something that generates I/O transactions, it's imperative to
generate the smallest possible number of transactions. Furthermore,
smaller than dword transactions aren't burstable, except at the
beginning and end of a burst.
-hpa
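To put rough numbers on the transaction-count argument, the two copy strategies discussed in this thread can be compared with a small C sketch. The function names are made up for illustration, and the counts are string-instruction iterations, used here as a stand-in for bus transactions; bursting or write combining by the chipset, which would change the real figures, is not modelled.

```c
#include <stddef.h>

/*
 * Dword-first copy (2.6.10 style): n/4 dword writes, then one word
 * write if bit 1 of n is set, then one byte write if bit 0 is set.
 */
static size_t writes_dword_first(size_t n)
{
    return n / 4 + ((n >> 1) & 1) + (n & 1);
}

/*
 * Word-only copy (the "rep movsw" listing quoted earlier): n/2 word
 * writes, then at most one trailing byte write.
 */
static size_t writes_word_only(size_t n)
{
    return n / 2 + (n & 1);
}
```

For the 14-byte case this gives 4 writes for the dword-first copy against 7 for the word-only copy, and for a 4096-byte block 1024 against 2048.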