32-bit ARM kernels may have a 64-bit dma_addr_t but have no
implementation of the compiler helper for 64-bit unsigned division,
therefore the use of the modulo operator in pl330_prep_dma_memcpy causes
the link error "undefined reference to `__aeabi_uldivmod'".
As the burst value is always a power of two we can fix the problem, and
make the code more efficient, by replacing "% burst" with "& (burst-1)".
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Jon Medhurst <[email protected]>
---
Vinod. I haven't added a 'Fixes:' line because I was unsure if the patch
in linux-next is part of a stable branch or if the SHA1 might change
before hitting mainline. If it is stable then the line should be...
Fixes: 63369d0a96dc ("dmaengine: pl330: Align DMA memcpy operations to MFIFO width")
drivers/dma/pl330.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 38c9617..52c4c62 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -2464,11 +2464,8 @@ pl330_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dst,
 	 * parameters because our DMA programming algorithm doesn't cope with
 	 * transfers which straddle an entry in the DMA device's MFIFO.
 	 */
-	while (burst > 1) {
-		if (!((src | dst | len) % burst))
-			break;
+	while ((src | dst | len) & (burst - 1))
 		burst /= 2;
-	}

 	desc->rqcfg.brst_size = 0;
 	while (burst != (1 << desc->rqcfg.brst_size))
--
2.1.1
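For readers who want to check the identity the patch relies on, here is a
minimal user-space sketch. It is not driver code: uint64_t stands in for a
64-bit dma_addr_t, and the function names are hypothetical. It assumes burst
is a non-zero power of two, as the commit message states.

```c
#include <stdint.h>

/*
 * Remainder via the modulo operator. With a 64-bit operand and a
 * non-constant divisor, a 32-bit ARM compiler normally emits a call to
 * the __aeabi_uldivmod helper for this.
 */
static uint64_t rem_modulo(uint64_t value, unsigned int burst)
{
	return value % burst;
}

/*
 * Same remainder via a mask. Only valid when burst is a non-zero
 * power of two; no division helper is needed.
 */
static uint64_t rem_mask(uint64_t value, unsigned int burst)
{
	return value & (burst - 1);
}

/*
 * The loop from the patch, lifted into a standalone function
 * (hypothetical name). Halves burst until src, dst and len are all
 * multiples of it; terminates at burst == 1 because the mask is then 0.
 */
static unsigned int limit_burst(uint64_t src, uint64_t dst, uint64_t len,
				unsigned int burst)
{
	while ((src | dst | len) & (burst - 1))
		burst /= 2;
	return burst;
}
```

For example, with src = 0x1000, dst = 0x2004, len = 64 and a starting burst
of 16, the loop halves burst twice and stops at 4, the largest power of two
that divides all three values.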
On Thursday 13 November 2014 16:27:27 Jon Medhurst wrote:
> 32-bit ARM kernels may have a 64-bit dma_addr_t but have no
> implementation of the compiler helper for 64-bit unsigned division,
> therefore the use of the modulo operator in pl330_prep_dma_memcpy causes
> the link error "undefined reference to `__aeabi_uldivmod'"
>
> As the burst value is always a power of two we can fix the problem, and
> make the code more efficient, by replacing "% burst" with "& (burst-1)".
>
> Reported-by: kbuild test robot <[email protected]>
> Signed-off-by: Jon Medhurst <[email protected]>
>
Just saw the same thing and was going to send a different patch, but
yours is better.
Acked-by: Arnd Bergmann <[email protected]>
On Thu, Nov 13, 2014 at 04:27:27PM +0000, Jon Medhurst (Tixy) wrote:
> Vinod. I haven't added a 'Fixes:' line because I was unsure if the patch
> in linux-next is part of a stable branch or if the SHA1 might change
> before hitting mainline. If it is stable then the line should be...
>
> Fixes: 63369d0a96dc ("dmaengine: pl330: Align DMA memcpy operations to MFIFO width")
I have applied this for now but...
While at it, and also related to Fixes: typically the fixes branch won't be
rebased before it's sent to Linus and merged. But this is introduced in a patch
which is sent, so should I just fold it in and not cause this regression in
the first place...?
--
~Vinod
On Thu, 2014-11-13 at 16:27 +0000, Jon Medhurst (Tixy) wrote:
> @@ -2464,11 +2464,8 @@ pl330_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dst,
> * parameters because our DMA programming algorithm doesn't cope with
> * transfers which straddle an entry in the DMA device's MFIFO.
> */
> - while (burst > 1) {
> - if (!((src | dst | len) % burst))
> - break;
> + while ((src | dst | len) & (burst - 1))
> burst /= 2;
> - }
Maybe something like:

	div = ffs(src | dst | len);
	if (burst > 1 && div)
		burst >>= div;
?

dunno if dma_addr_t src or dst can ever be a 64 bit value
for AMBA or not. If so, the ffs would need to be different.

Maybe:

	if (sizeof(dma_addr_t) == sizeof(u64))
		div = __ffs64(src | dst | len);
	else
		div = ffs(src | dst | len);
	if (burst > 1 && div)
		burst >>= div;
On Thu, 2014-11-13 at 22:31 +0530, Vinod Koul wrote:
> I have applied this for now but...
>
> While at it, and also related to Fixes: typically the fixes branch won't be
> rebased before it's sent to Linus and merged. But this is introduced in a patch
> which is sent, so should I just fold it in and not cause this regression in
> the first place...?
I have no objection to folding it in, but then doesn't that remove
credit for Fengguang Wu's test system for finding and reporting errors?
--
Tixy
On Thu, 2014-11-13 at 09:02 -0800, Joe Perches wrote:
>
> Maybe something like:
>
> 	div = ffs(src | dst | len);
> 	if (burst > 1 && div)
> 		burst >>= div;
That doesn't work: the code is trying to limit burst to make it a factor
of src, dst and len, so it would need to be something like

	div = ffs(src | dst | len);
	if (div)
		burst = min(burst, 1 << (div - 1));
There are many ways to code the limiting of the burst width, but as it
starts out as the data bus width the DMA can handle (maximum 16 bytes)
then at most we'll be going round the existing while loop 4 times so I
don't think it's that much overhead, and probably less code size than
using ffs.
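To make the comparison concrete, here is a rough user-space sketch of the
two forms side by side. It is an illustration, not proposed driver code: the
function names are hypothetical, uint64_t stands in for dma_addr_t, and the
GCC/Clang builtin __builtin_ctzll stands in for the kernel's bit-scan
helpers. Note that the kernel's ffs() is 1-based, so the alignment implied
by ffs(x) is 1 << (ffs(x) - 1); the sketch uses a 0-based count of trailing
zeros instead. Both functions assume burst is a non-zero power of two.

```c
#include <stdint.h>

/* Loop form, as in the patch: halve burst until it divides 'bits'. */
static unsigned int limit_burst_loop(uint64_t bits, unsigned int burst)
{
	while (bits & (burst - 1))
		burst /= 2;
	return burst;
}

/*
 * Bit-scan form: clamp burst to the lowest set bit of 'bits', which is
 * the largest power of two dividing it.
 */
static unsigned int limit_burst_ctz(uint64_t bits, unsigned int burst)
{
	uint64_t align;

	/* Zero is a multiple of everything; __builtin_ctzll(0) is undefined. */
	if (!bits)
		return burst;

	/* The trailing-zero count is at most 63 here, so the shift is safe. */
	align = (uint64_t)1 << __builtin_ctzll(bits);

	return align < burst ? (unsigned int)align : burst;
}
```

Here 'bits' is meant to be src | dst | len. For any power-of-two burst the
two functions should return the same value, so the choice between them is
about readability and code size rather than behaviour.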
And as the driver has been broken for the unaligned memcpy case since
the day it was added then I can't see that anyone is actually using it
that way anyway, so all existing users (if any) must already be doing
bus aligned copies and the current while loop will iterate zero times.
That's probably enough bikeshedding from me :-)
> ?
>
> dunno if dma_addr_t src or dst can ever be a 64 bit value
> for AMBA or not.
The pl330 TRM I have and the current Linux driver explicitly have 32-bit
addresses, so you would need an IOMMU to access addresses above 4GB.
--
Tixy
On Thu, 2014-11-13 at 18:19 +0000, Jon Medhurst (Tixy) wrote:
> There are many ways to code the limiting of the burst width, but as it
> starts out as the data bus width the DMA can handle (maximum 16 bytes)
> then at most we'll be going round the existing while loop 4 times so I
> don't think it's that much overhead, and probably less code size than
> using ffs.
For arm, isn't ffs just a few instructions with no loops?
> And as the driver has been broken for the unaligned memcpy case since
> the day it was added then I can't see that anyone is actually using it
> that way anyway, so all existing users (if any) must already be doing
> bus aligned copies and the current while loop will iterate zero times.
That's probably right, I just don't like reading
while loops where ffs/fls might be suitable.
> That's probably enough bikeshedding from me :-)
;) Me too. cheers, Joe
On Thu, Nov 13, 2014 at 05:11:28PM +0000, Jon Medhurst (Tixy) wrote:
> On Thu, 2014-11-13 at 22:31 +0530, Vinod Koul wrote:
> > I have applied this for now but...
> >
> > While at it, and also related to Fixes: typically the fixes branch won't be
> > rebased before it's sent to Linus and merged. But this is introduced in a patch
> > which is sent, so should I just fold it in and not cause this regression in
> > the first place...?
>
> I have no objection to folding it in, but then doesn't that remove
> credit for Fengguang Wu's test system for finding and reporting errors?
I added an entry for that and retained credit to him.
--
~Vinod