Hi all,
I'm trying to write a driver for a custom PCI board which is DMA bus-master capable (kernel 2.6.13 with SMP). Unfortunately I get a strange delay between the start of the transfer and the interrupt that signals its completion.
Concerning a DMA transfer from RAM to the PCI device, my code does the following:

    while (down_interruptible(my_device->write_semaphore));
    my_device->dma_write_complete = 0;
    my_device->dma_direction = PCI_DMA_TODEVICE;
    my_device->bus_addr = pci_map_single(my_device->pci_device, pointer_to_buffer, my_device->dma_size, my_device->dma_direction);

    writel (cpu_to_le32 (bus_addr), MY_DMA_ADDR_REGISTER);
    writel (cpu_to_le32 (my_device->dma_size/4), MY_DMA_COUNT_REGISTER); //triggers dma transfer

    if (wait_event_interruptible(write_wait_queue, my_device->dma_write_complete))
    {
        //handle error...
    }
    //test, if MY_DMA_COUNT_REGISTER contains 0
    up(my_device->write_semaphore);

Inside the interrupt handler I do the following:

    pci_unmap_single (my_device->pci_device, my_device->bus_addr, my_device->dma_size, my_device->dma_direction);
    my_device->dma_write_complete = 1;
    wake_up_interruptible(&write_wait_queue);
    return IRQ_HANDLED;

The DMA transfer actually works, but I get a strange timing issue which seems to be caused by wait_event_interruptible(). I measured the clock ticks elapsing from the start of the transfer until the interrupt appears. Converted to microseconds, I get more than 600 us for buffers of less than 3 kB. If I try busy waiting using "while (!my_device->dma_write_complete)" instead of wait_event_interruptible(), the transfer already completes successfully after about 80 us. The device has to transport very large amounts of data, so I have to get the transfer rate as high as possible.
I'm sorry if I made a very simple mistake; I'm quite inexperienced in driver development, so hints would be much appreciated.
Kind regards,
Burkhard
On Mon, 12 Dec 2005, Burkhard Sch?lpen wrote:
> Actually the dma transfer works but I get a strange timing issue,
> which seems to be caused by wait_event_interruptible(). I measured the
> clock ticks elapsing from the start of the transfer until the interrupt
> appears. Converted to microseconds I get more than 600 us for less than
> 3 kB buffers. If I try out busy waiting using
> "while (!my_device->dma_write_complete)" instead of
> wait_event_interruptible() the transfer already completes successfully
> after about 80 us. The device has to transport very large amounts of
> data, so I have to get the transfer rate as high as possible.
Don't you get an interrupt both on a completion and error?
I think you should be using interruptible_sleep_on(&write_wait_queue),
not spinning in wait_event_interruptible().
Almost all of my DMA transfers are done as above, and from the time the DMA
completion occurs until the time user-mode code gets awakened in
poll() (Much worse latency than your code), the time is always
less than 120 us on a 400 MHz ix86 embedded machine with a 100 MHz
front-side bus.
Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.56 BogoMips).
Warning : 98.36% of all statistics are fiction.
****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.
Thank you.
"linux-os (Dick Johnson)" <[email protected]> wrote on 12.12.05 21:14:39:
>Don't you get an interrupt both on a completion and error?
>I think you should be using interruptible_sleep_on(&write_wait_queue),
>not spinning in wait_event_interruptible().
Thanks a lot for your answer!
I just tried out interruptible_sleep_on(), but curiously I got the same delay as before. On the hardware side everything seems to be okay, because the data I'm transferring is relayed to the printhead of a laser printer (by an FPGA on the PCI board), whose LEDs light up as expected. The programmer of the FPGA (sitting next to me) says there would be no interrupt in the case of an error (so I should probably sleep with a timeout). But as there is in fact an interrupt (and MY_DMA_COUNT_REGISTER really contains 0), I think the DMA transfer succeeds, or could that be misleading? The only problem seems to be that the interrupt comes much later if I put the user process to sleep than if I let it busy-wait. Do you have any idea what could cause this strange behaviour? Could it be related to my SMP kernel (I use a processor with 2 cores)?
At first I used interruptible_sleep_on(), but then I changed to wait_event_interruptible(), because I read that the probability of a race condition is higher with interruptible_sleep_on() than with wait_event_interruptible(), so one shouldn't use the former any longer. Do you think interruptible_sleep_on() is okay for this case?
Kind regards,
Burkhard
On Tue, 13 Dec 2005, Burkhard Sch?lpen wrote:
> Thanks a lot for your answer!
> I just tried out interruptible_sleep_on(), but curiously I got the same
> delay as before. On the hardware side, everything seems to be okay, because
> the data I'm transferring is relayed to a printhead of a laser printer (by an
> FPGA on the PCI-Board), whose LEDs light up as expected. The programmer of
> the FPGA (sitting next to me) says there would be no interrupt in the case of
> an error (so probably I should sleep with a timeout). But as there is an
> interrupt (and MY_DMA_COUNT_REGISTER contains really 0) in fact, I think the
> dma transfer succeeds, or could that be misleading? The only problem seems
> to be, that the interrupt comes much later, if I put the user process to
> sleep than let it do busy waiting. Do you have any idea, what could cause
> this strange behaviour? Could it be concerned with my SMP kernel (I use a
> processor with 2 cores)?
>
I think I know what is happening. You are writing the count across the
PCI bus, thinking this will start the DMA transfer. However, the count
won't actually get to the device until the PCI interface is flushed
(it's a FIFO, waiting for more activity). You need to force that
write to occur NOW, by performing a dummy read in your address-space
on the PCI bus.
Then, you should find that the DMA seems to occur instantly and you
get your interrupt when you expect it. We use the PLX PCI 9656BA
for PCI interface on our datalink boards so I have a lot of
experience in this area.
In the case where you were polling the interface, the first read
of its status actually flushed the PCI bus and started the DMA
transfer. In the cases where you weren't polling, the count
got to the device whenever the PCI interface timed-out or when
there was other activity such as network.
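
To make the ordering concrete, here is a toy user-space model of a posted write followed by a flushing read. It is only a sketch, not kernel code: the toy_bus structure, toy_writel(), toy_readl() and the register indices are hypothetical names invented for this illustration, and a real bridge is of course more complicated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy user-space model, NOT kernel code. All names are hypothetical. */
enum { REG_DMA_ADDR = 0, REG_DMA_COUNT = 1, NUM_REGS = 2, FIFO_DEPTH = 8 };

struct posted_write {
    unsigned reg;
    uint32_t value;
};

struct toy_bus {
    uint32_t regs[NUM_REGS];               /* what the device actually sees */
    struct posted_write fifo[FIFO_DEPTH];  /* writes still parked in the bridge */
    size_t pending;                        /* number of parked writes */
    bool dma_started;                      /* set once the count register is hit */
};

/* Deliver all parked writes to the device, in order. */
static void toy_flush(struct toy_bus *bus)
{
    for (size_t i = 0; i < bus->pending; i++) {
        bus->regs[bus->fifo[i].reg] = bus->fifo[i].value;
        if (bus->fifo[i].reg == REG_DMA_COUNT)
            bus->dma_started = true;       /* writing the count triggers DMA */
    }
    bus->pending = 0;
}

/* A write returns immediately; the value is only posted.
 * The caller is assumed to pass a valid register index. */
void toy_writel(struct toy_bus *bus, unsigned reg, uint32_t value)
{
    if (bus->pending == FIFO_DEPTH)
        toy_flush(bus);                    /* a full FIFO has to drain first */
    bus->fifo[bus->pending].reg = reg;
    bus->fifo[bus->pending].value = value;
    bus->pending++;
}

/* A read must return device data, so earlier writes are pushed out first. */
uint32_t toy_readl(struct toy_bus *bus, unsigned reg)
{
    toy_flush(bus);
    return bus->regs[reg];
}
```

In this model the device does not see the count until toy_readl() is called. In driver terms that would correspond to following the writel() to the count register with a readl() of some register on the same device.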
> At first I used interruptible_sleep_on(), but then I changed to
> wait_event_interruptible(), because I read that the probability of a race
> condition is higher with interruptible_sleep_on() than with
> wait_event_interruptible(), so one shouldn't use the former any longer.
> Do you think interruptible_sleep_on() is okay for this case?
>
Every time somebody wants to rewrite a macro, they declare that the
previous one had some race condition. If, in fact, you have only
one DMA occurring from your device then no race is possible with
interruptible_sleep_on(). wait_event_interruptible() is the same thing
but with an additional access to some memory variable, possibly
causing a cache refill which means it might take more time.
In any event, both work okay.
Cheers,
Dick Johnson
linux-os (Dick Johnson) wrote:
> Every time somebody wants to rewrite a macro, they declare that the
> previous one had some race condition. If, in fact, you have only
> one DMA occurring from your device then no race is possible with
> interruptible_sleep_on(). wait_event_interruptible() is the same thing
> but with an additional access to some memory variable, possibly
> causing a cache refill which means it might take more time.
This is not correct. Using interruptible_sleep_on, there is no way to
prevent the race where the condition being waited on happens between the
test to see if it has become true and calling interruptible_sleep_on.
wait_event_interruptible puts the caller into the wait queue before
testing the condition, which prevents the race.
interruptible_sleep_on is, with good reason, no longer recommended for use.
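
The difference in ordering can be played through in a toy, single-threaded model. This is only a sketch, not kernel code: toy_task, toy_irq() and the two functions are hypothetical names made up for the illustration, and the "interrupt" is simply called at the point where the race window would be.

```c
#include <stdbool.h>

/* Toy single-threaded model, NOT kernel code. All names are hypothetical. */
struct toy_task {
    bool queued;    /* task is on the wait queue */
    bool sleeping;  /* task has been marked as asleep */
};

static bool dma_done;  /* the condition set by the simulated interrupt */

/* Simulated handler: set the flag, then wake whoever is on the queue. */
static void toy_irq(struct toy_task *t)
{
    dma_done = true;
    if (t->queued)
        t->sleeping = false;
}

/* sleep_on ordering: test the flag, then queue and sleep.
 * Returns true if the task ends up asleep although the flag is set. */
bool sleep_on_order_loses_wakeup(void)
{
    struct toy_task t = { false, false };

    dma_done = false;
    if (!dma_done) {        /* 1. test: not done yet */
        toy_irq(&t);        /* 2. interrupt fires in the window; queue is empty */
        t.queued = true;    /* 3. only now join the queue ... */
        t.sleeping = true;  /*    ... and go to sleep */
    }
    return t.sleeping && dma_done;
}

/* wait_event ordering: join the queue first, then test the flag. */
bool wait_event_order_loses_wakeup(void)
{
    struct toy_task t = { false, false };

    dma_done = false;
    t.queued = true;        /* 1. join the queue and mark as sleeping */
    t.sleeping = true;
    if (!dma_done) {        /* 2. test: not done yet */
        toy_irq(&t);        /* 3. interrupt fires in the same window; the task
                             *    is queued, so it is marked runnable again */
    }
    return t.sleeping && dma_done;
}
```

With the interrupt placed in the same window, only the first ordering leaves the task asleep with the condition already true.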
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
"linux-os (Dick Johnson)" <[email protected]> wrote on 13.12.05 15:30:33:
>
>I think I know what is happening. You are writing the count across the
>PCI bus, thinking this will start the DMA transfer. However, the count
>won't actually get to the device until the PCI interface is flushed
>(it's a FIFO, waiting for more activity). You need to force that
>write to occur NOW, by performing a dummy read in your address-space
>on the PCI bus.
Thank you for your help! The dummy read was a very helpful hint for making the DMA more reliable (although the FPGA programmer had to admit that there was some other problem in the hardware after all). I think it should work fine soon.
I'm glad to meet somebody with DMA experience, because I have some other difficulties concerning DMA buffers in RAM. The PCI board is to be used in a large-format copying machine, so it essentially has to transfer tons of data in 2 directions very fast without wasting CPU time (because the CPU has to run many image processing algorithms on this data in the meantime). So my approach is to allocate a fairly large ring buffer in kernel space (or more precisely, one ring buffer for each direction) which is DMA capable. Afterwards I would map this buffer to user space to avoid unnecessary memory copies and CPU usage. My problem for now is getting such a large DMA buffer. I tried out several things I read in O'Reilly's book, but they have all failed so far. My current attempt is to take a high memory area with ioremap:

    buffer_addr = ioremap( virt_to_phys(high_memory), large_size );

Mapping this buffer to user space works, but it does not seem to be DMA capable. Maybe it's just wrong to use ioremap() for that? I would be very glad to get some advice.
Kind regards,
Burkhard
On Wed, 14 Dec 2005, Burkhard Sch?lpen wrote:
> I'm glad to meet somebody with dma experience, because I have some other difficulties concerning DMA buffers in RAM. The PCI-Board is to be applied in a large size copying machine, so it essentially has to transfer tons of data in 2 directions very fast without wasting cpu time (because the cpu has to run many image processing algorithms meanwhile on this data). So my approach is to allocate a quite large ringbuffer in kernel space (or more precisely one ringbuffer for each direction) which is capable of dma. Afterwards I would map this buffer to user space to avoid unnecessary memcopies/cpu usage. My problem is for now to get such a large DMA buffer. I tried out several things I read in O'Reilly's book, but they all failed so far. My current attempt is to take a high memory area with ioremap:
>
> buffer_addr = ioremap( virt_to_phys(high_memory), large_size );
>
> Mapping this buffer to user space works, but it does not seem to be DMA capable. Maybe it's just wrong to use ioremap() for that? I would be very glad for getting some advice.
I have attached a "driver" that does nothing but map DMA-able pages
to user-space. It should show you what you need to do. It's really
quite simple, but the devil is in the details.
Also, if you are using the PLX or similar PCI interface device, you
can use the scatter-list capability so that the DMA pages don't
have to be contiguous. The mapping to user-space makes them
virtually contiguous to the user, but you can use pages from
anywhere in memory as long as it's addressable by your controller.
Cheers,
Dick Johnson
Burkhard Sch?lpen wrote:
> I'm glad to meet somebody with dma experience, because I have
> some other difficulties concerning DMA buffers in RAM. The PCI-Board
> is to be applied in a large size copying machine, so it essentially
> has to transfer tons of data in 2 directions very fast without wasting
> cpu time (because the cpu has to run many image processing algorithms
> meanwhile on this data). So my approach is to allocate a quite large
> ringbuffer in kernel space (or more precisely one ringbuffer for each
> direction) which is capable of dma. Afterwards I would map this buffer
> to user space to avoid unnecessary memcopies/cpu usage. My problem is
> for now to get such a large DMA buffer. I tried out several things I
> read in O'Reilly's book, but they all failed so far. My current
> attempt is to take a high memory area with ioremap:
You can't ioremap normal memory like that. ioremap is only for MMIO
address regions.
Better than trying to allocate lots of memory in the kernel (which you
can't, really), would be to make the userspace application allocate the
ringbuffer and do DMA from the device to userspace memory. To do this,
you'll have to either make the device do a separate DMA for every
contiguous chunk, or better yet make the device support scatter-gather
DMA so that it can read/write from discontiguous physical blocks of
memory. Have a look at Documentation/DMA-mapping.txt and
Documentation/DMA-API.txt, and also at the online edition of Linux Device
Drivers, 3rd ed.; these all have info on how this can be done.
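
As a rough illustration of "a separate DMA for every contiguous chunk", the sketch below splits a buffer at page boundaries into a list of pieces. It is a user-space toy under stated assumptions: the 4096-byte page size, the toy_sg_entry layout and toy_build_sg() are all hypothetical, and the addresses are plain virtual addresses. In a real driver each piece would still have to be pinned and translated to a bus address through the DMA mapping API described in the documents above.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical descriptor: one piece that lies within a single page. */
struct toy_sg_entry {
    uintptr_t addr;  /* start address of the piece (virtual in this toy) */
    size_t len;      /* length in bytes, never crossing a page boundary */
};

/* Split [addr, addr + len) at page boundaries.
 * Returns the number of entries written, or 0 if len is 0 or if
 * `max` entries are not enough to describe the whole buffer. */
size_t toy_build_sg(uintptr_t addr, size_t len,
                    struct toy_sg_entry *out, size_t max)
{
    size_t n = 0;

    while (len > 0) {
        /* bytes left until the end of the current page */
        size_t room = TOY_PAGE_SIZE - (size_t)(addr % TOY_PAGE_SIZE);
        size_t chunk = len < room ? len : room;

        if (n == max)
            return 0;  /* descriptor table too small */
        out[n].addr = addr;
        out[n].len = chunk;
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}
```

A device with scatter-gather support would walk such a list by itself; without it, the driver would have to start one transfer per entry.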
--
Robert Hancock Saskatoon, SK, Canada