These two patches address issues encountered while attempting to use the
DMA subsystem in the NTB driver. While outside the scope of that
forthcoming NTB patch, these fixes are necessary to get it working.
Thanks,
Jon
dma_sync_wait is declared regardless of whether CONFIG_DMA_ENGINE is
enabled, but calling the function without CONFIG_DMA_ENGINE enabled
results in the following build error:
ERROR: "dma_sync_wait" [drivers/ntb/ntb.ko] undefined!
To get around this, define dma_sync_wait as an inline stub when
CONFIG_DMA_ENGINE is not enabled.
Signed-off-by: Jon Mason <[email protected]>
Acked-by: Dave Jiang <[email protected]>
---
include/linux/dmaengine.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 96d3e4a..e3652ac 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -963,8 +963,8 @@ dma_set_tx_state(struct dma_tx_state *st, dma_cookie_t last, dma_cookie_t used,
}
}
-enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie);
#ifdef CONFIG_DMA_ENGINE
+enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie);
enum dma_status dma_wait_for_async_tx(struct dma_async_tx_descriptor *tx);
void dma_issue_pending_all(void);
struct dma_chan *__dma_request_channel(const dma_cap_mask_t *mask,
@@ -972,6 +972,10 @@ struct dma_chan *__dma_request_channel(const dma_cap_mask_t *mask,
struct dma_chan *dma_request_slave_channel(struct device *dev, const char *name);
void dma_release_channel(struct dma_chan *chan);
#else
+static inline enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie)
+{
+ return DMA_SUCCESS;
+}
static inline enum dma_status dma_wait_for_async_tx(struct dma_async_tx_descriptor *tx)
{
return DMA_SUCCESS;
--
1.7.9.5
Set the DMA_PRIVATE dma_transaction_type in the capability mask. This
enables getting an exclusive ioatdma DMA channel for any device that
requests one via the dma_request_channel function call.
Signed-off-by: Jon Mason <[email protected]>
Acked-by: Dave Jiang <[email protected]>
---
drivers/dma/ioat/dma_v3.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/dma/ioat/dma_v3.c b/drivers/dma/ioat/dma_v3.c
index ca6ea9b..ac2aeef 100644
--- a/drivers/dma/ioat/dma_v3.c
+++ b/drivers/dma/ioat/dma_v3.c
@@ -1883,6 +1883,7 @@ int ioat3_dma_probe(struct ioatdma_device *device, int dca)
dma->copy_align = 6;
dma_cap_set(DMA_INTERRUPT, dma->cap_mask);
+ dma_cap_set(DMA_PRIVATE, dma->cap_mask);
dma->device_prep_dma_interrupt = ioat3_prep_interrupt_lock;
device->cap = readl(device->reg_base + IOAT_DMA_CAP_OFFSET);
--
1.7.9.5
On Tue, Jun 18, 2013 at 5:46 PM, Jon Mason <[email protected]> wrote:
> Set the DMA_PRIVATE dma_transaction_type in the capability mask. This
> enables getting an exclusive ioatdma DMA channel for any device that
> requests one via the dma_request_channel function call.
>
> Signed-off-by: Jon Mason <[email protected]>
> Acked-by: Dave Jiang <[email protected]>
> ---
> drivers/dma/ioat/dma_v3.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/dma/ioat/dma_v3.c b/drivers/dma/ioat/dma_v3.c
> index ca6ea9b..ac2aeef 100644
> --- a/drivers/dma/ioat/dma_v3.c
> +++ b/drivers/dma/ioat/dma_v3.c
> @@ -1883,6 +1883,7 @@ int ioat3_dma_probe(struct ioatdma_device *device, int dca)
> dma->copy_align = 6;
>
> dma_cap_set(DMA_INTERRUPT, dma->cap_mask);
> + dma_cap_set(DMA_PRIVATE, dma->cap_mask);
> dma->device_prep_dma_interrupt = ioat3_prep_interrupt_lock;
DMA_PRIVATE here keeps all channels private, so they couldn't be used
elsewhere, for example raid offload. Do you need a private allocation
or can you get away with a dynamically assigned channel?
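The difference, roughly (sketch only; mask setup per the current
dmaengine API):

        /* private: the channel is pulled out of the general-purpose pool */
        dma_cap_mask_t mask;

        dma_cap_zero(mask);
        dma_cap_set(DMA_MEMCPY, mask);
        chan = dma_request_channel(mask, NULL, NULL);

versus:

        /* dynamic: a shared channel that async_tx may also be using */
        chan = dma_find_channel(DMA_MEMCPY);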
On Tue, Jun 18, 2013 at 5:46 PM, Jon Mason <[email protected]> wrote:
> dma_sync_wait is declared regardless of whether CONFIG_DMA_ENGINE is
> enabled, but calling the function without CONFIG_DMA_ENGINE enabled
> results in the following build error:
> ERROR: "dma_sync_wait" [drivers/ntb/ntb.ko] undefined!
>
> To get around this, define dma_sync_wait as an inline stub when
> CONFIG_DMA_ENGINE is not enabled.
>
> Signed-off-by: Jon Mason <[email protected]>
> Acked-by: Dave Jiang <[email protected]>
> ---
> include/linux/dmaengine.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index 96d3e4a..e3652ac 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -963,8 +963,8 @@ dma_set_tx_state(struct dma_tx_state *st, dma_cookie_t last, dma_cookie_t used,
> }
> }
>
> -enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie);
> #ifdef CONFIG_DMA_ENGINE
> +enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie);
> enum dma_status dma_wait_for_async_tx(struct dma_async_tx_descriptor *tx);
> void dma_issue_pending_all(void);
> struct dma_chan *__dma_request_channel(const dma_cap_mask_t *mask,
> @@ -972,6 +972,10 @@ struct dma_chan *__dma_request_channel(const dma_cap_mask_t *mask,
> struct dma_chan *dma_request_slave_channel(struct device *dev, const char *name);
> void dma_release_channel(struct dma_chan *chan);
> #else
> +static inline enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie)
> +{
> + return DMA_SUCCESS;
> +}
Seems like something we should fix, but why is this not an indication
that the code calling this is missing a "depends on DMA_ENGINE"?
On second look, dma_sync_wait is a use-as-a-last-resort hack that
probably should not escape its internal use in dmaengine.c. The other
escape, in drivers/media/platform/timblogiw.c, already looks
problematic: it has a live CPU spin immediately following a sleeping
wait, when obviously something event-based was wanted. What's the use
case in NTB?
--
Dan
On Tue, Jun 18, 2013 at 06:13:28PM -0700, Dan Williams wrote:
> On Tue, Jun 18, 2013 at 5:46 PM, Jon Mason <[email protected]> wrote:
> > dma_sync_wait is declared regardless of whether CONFIG_DMA_ENGINE is
> > enabled, but calling the function without CONFIG_DMA_ENGINE enabled
> > results in the following build error:
> > ERROR: "dma_sync_wait" [drivers/ntb/ntb.ko] undefined!
> >
> > To get around this, define dma_sync_wait as an inline stub when
> > CONFIG_DMA_ENGINE is not enabled.
> >
> > Signed-off-by: Jon Mason <[email protected]>
> > Acked-by: Dave Jiang <[email protected]>
> > ---
> > include/linux/dmaengine.h | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > index 96d3e4a..e3652ac 100644
> > --- a/include/linux/dmaengine.h
> > +++ b/include/linux/dmaengine.h
> > @@ -963,8 +963,8 @@ dma_set_tx_state(struct dma_tx_state *st, dma_cookie_t last, dma_cookie_t used,
> > }
> > }
> >
> > -enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie);
> > #ifdef CONFIG_DMA_ENGINE
> > +enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie);
> > enum dma_status dma_wait_for_async_tx(struct dma_async_tx_descriptor *tx);
> > void dma_issue_pending_all(void);
> > struct dma_chan *__dma_request_channel(const dma_cap_mask_t *mask,
> > @@ -972,6 +972,10 @@ struct dma_chan *__dma_request_channel(const dma_cap_mask_t *mask,
> > struct dma_chan *dma_request_slave_channel(struct device *dev, const char *name);
> > void dma_release_channel(struct dma_chan *chan);
> > #else
> > +static inline enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie)
> > +{
> > + return DMA_SUCCESS;
> > +}
>
> Seems like something we should fix, but why is this not an indication
> that the code calling this is missing a "depends on DMA_ENGINE"?
The driver can use either the CPU or DMA to move data. If no DMA
channels are free or none are present, the CPU is used. In this way,
there is no hard dependency on DMA_ENGINE being enabled.
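I.e., roughly (untested sketch; ntb_async_copy is a made-up name, not
taken from the forthcoming NTB patch):

        struct dma_chan *chan = dma_find_channel(DMA_MEMCPY);

        if (!chan) {
                /* no DMA channel available, fall back to the CPU */
                memcpy(dst, src, len);
                return;
        }
        /* otherwise hand the copy off to the DMA engine */
        ntb_async_copy(chan, dst, src, len);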
> On second look dma_sync_wait is a use as last resort hack that
> probably should not escape its internal use in dmaengine.c. The other
> escape in drivers/media/platform/timblogiw.c already looks problematic
> by having a live cpu spin immediately following a sleeping wait,
> obviously something event based was wanted. What's the use case in
> NTB?
NTB is currently using it to flush any pending DMAs. This is needed
to allow the DMA engine and the CPU to perform operations on the same
"Memory Window". Without this, it is possible for the operations to
complete out of order, which is not a desired outcome for any network
traffic over NTB. CPU is preferred over DMA engine for small
transfers. Also, it provides an alternative for errors in the DMA
engine copy process (e.g., DMA mapping, device_prep_dma_memcpy, and
dmaengine_submit).
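Roughly what the flush looks like (sketch only; cookie bookkeeping
elided, not the actual NTB code):

        /* wait for any in-flight DMA so that CPU and DMA accesses to
         * the memory window cannot be reordered against each other
         */
        if (chan)
                dma_sync_wait(chan, last_cookie);
        /* now it is safe for the CPU to touch the same window */
        memcpy_toio(mw_base + off, buf, len);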
Thanks,
Jon
>
> --
> Dan
On 06/18/2013 05:59 PM, Dan Williams wrote:
> On Tue, Jun 18, 2013 at 5:46 PM, Jon Mason <[email protected]> wrote:
>> Set the DMA_PRIVATE dma_transaction_type in the capability mask. This
>> enables getting an exclusive ioatdma DMA channel for any device that
>> requests one via the dma_request_channel function call.
>>
>> Signed-off-by: Jon Mason <[email protected]>
>> Acked-by: Dave Jiang <[email protected]>
>> ---
>> drivers/dma/ioat/dma_v3.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/dma/ioat/dma_v3.c b/drivers/dma/ioat/dma_v3.c
>> index ca6ea9b..ac2aeef 100644
>> --- a/drivers/dma/ioat/dma_v3.c
>> +++ b/drivers/dma/ioat/dma_v3.c
>> @@ -1883,6 +1883,7 @@ int ioat3_dma_probe(struct ioatdma_device *device, int dca)
>> dma->copy_align = 6;
>>
>> dma_cap_set(DMA_INTERRUPT, dma->cap_mask);
>> + dma_cap_set(DMA_PRIVATE, dma->cap_mask);
>> dma->device_prep_dma_interrupt = ioat3_prep_interrupt_lock;
> DMA_PRIVATE here keeps all channels private, so they couldn't be used
> elsewhere, for example raid offload. Do you need a private allocation
> or can you get away with a dynamically assigned channel?
So is there no way to reserve a channel as private if no one is using
it? It seems the current setup is async_tx or nothing.
--
Dave Jiang
Application Engineer, Storage Division
Intel Corp.
[email protected]
On Tue, Jun 18, 2013 at 05:59:59PM -0700, Dan Williams wrote:
> On Tue, Jun 18, 2013 at 5:46 PM, Jon Mason <[email protected]> wrote:
> > Set the DMA_PRIVATE dma_transaction_type in the capability mask. This
> > enables getting an exclusive ioatdma DMA channel for any device that
> > requests one via the dma_request_channel function call.
> >
> > Signed-off-by: Jon Mason <[email protected]>
> > Acked-by: Dave Jiang <[email protected]>
> > ---
> > drivers/dma/ioat/dma_v3.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/dma/ioat/dma_v3.c b/drivers/dma/ioat/dma_v3.c
> > index ca6ea9b..ac2aeef 100644
> > --- a/drivers/dma/ioat/dma_v3.c
> > +++ b/drivers/dma/ioat/dma_v3.c
> > @@ -1883,6 +1883,7 @@ int ioat3_dma_probe(struct ioatdma_device *device, int dca)
> > dma->copy_align = 6;
> >
> > dma_cap_set(DMA_INTERRUPT, dma->cap_mask);
> > + dma_cap_set(DMA_PRIVATE, dma->cap_mask);
> > dma->device_prep_dma_interrupt = ioat3_prep_interrupt_lock;
>
> DMA_PRIVATE here keeps all channels private, so they couldn't be used
> elsewhere, for example raid offload. Do you need a private allocation
> or can you get away with a dynamically assigned channel?
I would like to have a dedicated DMA engine. async_tx could cause the
copies to complete out of order. Do I need to add infrastructure to
allow for private channel usage, and when unused allow the channel to
be used by async_tx?
Thanks,
Jon
On Wed, Jun 19, 2013 at 10:52 AM, Jon Mason <[email protected]> wrote:
>> DMA_PRIVATE here keeps all channels private, so they couldn't be used
>> elsewhere, for example raid offload. Do you need a private allocation
>> or can you get away with a dynamically assigned channel?
>
> I would like to have a dedicated DMA engine. async_tx could cause the
> copies to complete out of order. Do I need to add infrastructure to
> allow for private channel usage, and when unused allow the channel to
> be used by async_tx?
Can NTB just call dma_find_channel() once and be done? Also, async_tx
will only get things out of order if you allow it to pick a new
channel for every operation, but as long as you specify a dependency
chain it will keep things in order (same as remembering the result of
dma_find_channel).
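I.e., something like (sketch; "nt" is a made-up context struct, and
dma_find_channel() assumes dmaengine_get() was called at init):

        /* setup: pick a channel once and remember it */
        nt->dma_chan = dma_find_channel(DMA_MEMCPY);

        /* per copy: always submit on that remembered channel */
        txd = nt->dma_chan->device->device_prep_dma_memcpy(nt->dma_chan,
                                                           dst, src, len, 0);
        if (txd) {
                cookie = dmaengine_submit(txd);
                dma_async_issue_pending(nt->dma_chan);
        }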
--
Dan
On Wed, Jun 19, 2013 at 11:56:38AM -0700, Dan Williams wrote:
> On Wed, Jun 19, 2013 at 10:52 AM, Jon Mason <[email protected]> wrote:
> >> DMA_PRIVATE here keeps all channels private, so they couldn't be used
> >> elsewhere, for example raid offload. Do you need a private allocation
> >> or can you get away with a dynamically assigned channel?
> >
> > I would like to have a dedicated DMA engine. async_tx could cause the
> > copies to complete out of order. Do I need to add infrastructure to
> > allow for private channel usage, and when unused allow the channel to
> > be used by async_tx?
>
> Can NTB just call dma_find_channel() once and be done? Also, async_tx
> will only get things out of order if you allow it to pick a new
> channel for every operation, but as long as you specify a dependency
> chain it will keep things in order (same as remembering the result of
> dma_find_channel).
Neither of these allows a device exclusive access to a channel. Is
this not something that is desired by other users of DMA Engines, or
do I have a unique usage model?
Thanks,
Jon
>
> --
> Dan
On Wed, Jun 19, 2013 at 1:10 PM, Jon Mason <[email protected]> wrote:
> On Wed, Jun 19, 2013 at 11:56:38AM -0700, Dan Williams wrote:
>> On Wed, Jun 19, 2013 at 10:52 AM, Jon Mason <[email protected]> wrote:
>> >> DMA_PRIVATE here keeps all channels private, so they couldn't be used
>> >> elsewhere, for example raid offload. Do you need a private allocation
>> >> or can you get away with a dynamically assigned channel?
>> >
>> > I would like to have a dedicated DMA engine. async_tx could cause the
>> > copies to complete out of order. Do I need to add infrastructure to
>> > allow for private channel usage, and when unused allow the channel to
>> > be used by async_tx?
>>
>> Can NTB just call dma_find_channel() once and be done? Also, async_tx
>> will only get things out of order if you allow it to pick a new
>> channel for every operation, but as long as you specify a dependency
>> chain it will keep things in order (same as remembering the result of
>> dma_find_channel).
>
> Neither of these allows a device exclusive access to a channel. Is
> this not something that is desired by other users of DMA Engines, or
> do I have a unique usage model?
>
Yes, ioatdma has never been used in a mixed model. Other platforms
either use devicetree bindings to specify channels for their master
devices or are exclusively slave channels. Other slave usage models
require a dedicated channel; since NTB is alternating between CPU and
DMA, it seems it could get away with using a "public" channel.
On Wed, Jun 19, 2013 at 9:28 AM, Jon Mason <[email protected]> wrote:
> On Tue, Jun 18, 2013 at 06:13:28PM -0700, Dan Williams wrote:
[..]
> NTB is currently using it to flush any pending DMAs. This is needed
> to allow the DMA engine and the CPU to perform operations on the same
> "Memory Window". Without this, it is possible for the operations to
> complete out of order, which is not a desired outcome for any network
> traffic over NTB. CPU is preferred over DMA engine for small
> transfers. Also, it provides an alternative for errors in the DMA
> engine copy process (e.g., DMA mapping, device_prep_dma_memcpy, and
> dmaengine_submit).
Ok, there really isn't a better alternative. Is NTB always polling, or
does it also use completion callbacks?
On Thu, Jun 20, 2013 at 02:20:30PM -0700, Dan Williams wrote:
> On Wed, Jun 19, 2013 at 9:28 AM, Jon Mason <[email protected]> wrote:
> > On Tue, Jun 18, 2013 at 06:13:28PM -0700, Dan Williams wrote:
> [..]
> > NTB is currently using it to flush any pending DMAs. This is needed
> > to allow the DMA engine and the CPU to perform operations on the same
> > "Memory Window". Without this, it is possible for the operations to
> > complete out of order, which is not a desired outcome for any network
> > traffic over NTB. CPU is preferred over DMA engine for small
> > transfers. Also, it provides an alternative for errors in the DMA
> > engine copy process (e.g., DMA mapping, device_prep_dma_memcpy, and
> > dmaengine_submit).
>
> Ok, there really isn't a better alternative. Is NTB always polling, or
> does it also use completion callbacks?
It uses the callbacks. I've changed the code to use dma_find_channel,
and will push it for review as an RFC shortly.
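For reference, the completion path just hangs off the descriptor
(sketch; the callback name is invented):

        txd->callback = ntb_rx_copy_callback;  /* made-up callback */
        txd->callback_param = entry;           /* per-transfer context */
        cookie = dmaengine_submit(txd);
        dma_async_issue_pending(chan);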
Thanks,
Jon
On Wed, Jun 19, 2013 at 01:44:08PM -0700, Dan Williams wrote:
> On Wed, Jun 19, 2013 at 1:10 PM, Jon Mason <[email protected]> wrote:
> > On Wed, Jun 19, 2013 at 11:56:38AM -0700, Dan Williams wrote:
> >> On Wed, Jun 19, 2013 at 10:52 AM, Jon Mason <[email protected]> wrote:
> >> >> DMA_PRIVATE here keeps all channels private, so they couldn't be used
> >> >> elsewhere, for example raid offload. Do you need a private allocation
> >> >> or can you get away with a dynamically assigned channel?
> >> >
> >> > I would like to have a dedicated DMA engine. async_tx could cause the
> >> > copies to complete out of order. Do I need to add infrastructure to
> >> > allow for private channel usage, and when unused allow the channel to
> >> > be used by async_tx?
> >>
> >> Can NTB just call dma_find_channel() once and be done? Also, async_tx
> >> will only get things out of order if you allow it to pick a new
> >> channel for every operation, but as long as you specify a dependency
> >> chain it will keep things in order (same as remembering the result of
> >> dma_find_channel).
> >
> > Neither of these allows a device exclusive access to a channel. Is
> > this not something that is desired by other users of DMA Engines, or
> > do I have a unique usage model?
> >
>
> Yes, ioatdma has never been used in a mixed model. Other platforms
> either use devicetree bindings to specify channels for their master
> devices or are exclusively slave channels. Other slave usage models
> require a dedicated channel; since NTB is alternating between CPU and
> DMA, it seems it could get away with using a "public" channel.
Unfortunately, the dma_find_channel model does not lend itself to
optimal usage of the available channels, as it seems to always give out
the same channel. Adding a randomizer (or some other way to spread the
selection across multiple channels) would be greatly beneficial.
Alternatively, adding an async_memcpy_mmio function (perhaps with a
minimum size below which the DMA engine is not used) would provide a
workable solution.
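By "randomizer" I mean something along these lines (purely
hypothetical, not implemented anywhere today):

        /* hypothetical: round-robin over a small pool of channels
         * grabbed up front with dma_request_channel()
         */
        static struct dma_chan *pool[NTB_NUM_CHANS];
        static atomic_t rr;

        static struct dma_chan *ntb_pick_chan(void)
        {
                return pool[atomic_inc_return(&rr) % NTB_NUM_CHANS];
        }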
However, dma_find_channel should be sufficient to get NTB use of DMA
engines out for review. I'll clean it up and send it out shortly.
Thanks for the insight.
Thanks,
Jon
On 6/26/13 4:55 PM, "Jon Mason" <[email protected]> wrote:
>Unfortunately, the dma_find_channel model does not lend itself to
>optimal usage of the available channels, as it seems to always give out
>the same channel. Adding a randomizer (or some other way to spread the
>selection across multiple channels) would be greatly beneficial.
Right now the allocation scheme is per-cpu. If you have multiple
submission contexts on different cpus, it should use all available
channels.
>Alternatively, adding an async_memcpy_mmio function (perhaps with a
>minimum size below which the DMA engine is not used) would provide a
>workable solution.
It's hard. The size is arch- and use-specific, so it may be best to
leave it up to the client.
>However, dma_find_channel should be sufficient to get NTB use of DMA
>engines out for review. I'll clean it up and send it out shortly.
>Thanks for the insight.
Cool, I'll take a look.