2019-05-14 14:42:43

by Laurentiu Tudor

[permalink] [raw]
Subject: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

From: Laurentiu Tudor <[email protected]>

For HCs that have local memory, replace the current DMA API usage
with a genalloc generic allocator to manage the mappings for these
devices.
This is in preparation for dropping the existing "coherent" dma
mem declaration APIs. Current implementation was relying on a short
circuit in the DMA API that in the end, was acting as an allocator
for these type of devices.

Only compiled tested, so any volunteers willing to test are most welcome.

Thank you!

For context, see thread here: https://lkml.org/lkml/2019/4/22/357

Changes in v2:
- use genalloc also in core usb (based on comment from Robin)
- moved genpool decl to usb_hcd to be accesible from core usb

Laurentiu Tudor (3):
USB: use genalloc for USB HCs with local memory
usb: host: ohci-sm501: init genalloc for local memory
usb: host: ohci-tmio: init genalloc for local memory

drivers/usb/core/buffer.c | 12 ++++++-
drivers/usb/host/ohci-hcd.c | 23 +++++++++++---
drivers/usb/host/ohci-sm501.c | 60 +++++++++++++++++++----------------
drivers/usb/host/ohci-tmio.c | 23 +++++++++-----
include/linux/usb/hcd.h | 3 ++
5 files changed, 80 insertions(+), 41 deletions(-)

--
2.17.1


2019-05-14 15:19:14

by Robin Murphy

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

+Fredrik, Juergen

On 14/05/2019 15:38, [email protected] wrote:
> From: Laurentiu Tudor <[email protected]>
>
> For HCs that have local memory, replace the current DMA API usage
> with a genalloc generic allocator to manage the mappings for these
> devices.
> This is in preparation for dropping the existing "coherent" dma
> mem declaration APIs. Current implementation was relying on a short
> circuit in the DMA API that in the end, was acting as an allocator
> for these type of devices.
>
> Only compiled tested, so any volunteers willing to test are most welcome.

I recall an out-of-tree PlayStation 2 OHCI driver being another
HCD_LOCAL_MEM user - if Fredrik and Juergen are still active on that,
hopefully they might be able to comment on whether this approach can
work for them too. Patchwork link just in case:
https://patchwork.kernel.org/project/linux-usb/list/?series=117433

Robin.

>
> Thank you!
>
> For context, see thread here: https://lkml.org/lkml/2019/4/22/357
>
> Changes in v2:
> - use genalloc also in core usb (based on comment from Robin)
> - moved genpool decl to usb_hcd to be accesible from core usb
>
> Laurentiu Tudor (3):
> USB: use genalloc for USB HCs with local memory
> usb: host: ohci-sm501: init genalloc for local memory
> usb: host: ohci-tmio: init genalloc for local memory
>
> drivers/usb/core/buffer.c | 12 ++++++-
> drivers/usb/host/ohci-hcd.c | 23 +++++++++++---
> drivers/usb/host/ohci-sm501.c | 60 +++++++++++++++++++----------------
> drivers/usb/host/ohci-tmio.c | 23 +++++++++-----
> include/linux/usb/hcd.h | 3 ++
> 5 files changed, 80 insertions(+), 41 deletions(-)
>

2019-05-14 18:40:42

by Fredrik Noring

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

Thanks Robin!

> > For HCs that have local memory, replace the current DMA API usage
> > with a genalloc generic allocator to manage the mappings for these
> > devices.
> > This is in preparation for dropping the existing "coherent" dma
> > mem declaration APIs. Current implementation was relying on a short
> > circuit in the DMA API that in the end, was acting as an allocator
> > for these type of devices.
> >
> > Only compiled tested, so any volunteers willing to test are most welcome.
>
> I recall an out-of-tree PlayStation 2 OHCI driver being another
> HCD_LOCAL_MEM user - if Fredrik and Juergen are still active on that,
> hopefully they might be able to comment on whether this approach can
> work for them too. Patchwork link just in case:
> https://patchwork.kernel.org/project/linux-usb/list/?series=117433

True. In fact I'm preparing a patch submission for this PS2 OHCI driver,
along with about a hundred other patches (unrelated to the USB subsystem).
Hopefully in a few weeks. My patches are currently on top of v5.0. What
branch/version is recommended to try this DMA update?

Here is the v5.0.11 PS2 OHCI driver, for reference:

https://github.com/frno7/linux/blob/ps2-v5.0.11/drivers/usb/host/ohci-ps2.c

Fredrik

2019-05-15 10:48:03

by Laurentiu Tudor

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

Hello,

On 14.05.2019 21:29, Fredrik Noring wrote:
> Thanks Robin!
>
>>> For HCs that have local memory, replace the current DMA API usage
>>> with a genalloc generic allocator to manage the mappings for these
>>> devices.
>>> This is in preparation for dropping the existing "coherent" dma
>>> mem declaration APIs. Current implementation was relying on a short
>>> circuit in the DMA API that in the end, was acting as an allocator
>>> for these type of devices.
>>>
>>> Only compiled tested, so any volunteers willing to test are most welcome.
>>
>> I recall an out-of-tree PlayStation 2 OHCI driver being another
>> HCD_LOCAL_MEM user - if Fredrik and Juergen are still active on that,
>> hopefully they might be able to comment on whether this approach can
>> work for them too. Patchwork link just in case:
>> https://patchwork.kernel.org/project/linux-usb/list/?series=117433
>
> True. In fact I'm preparing a patch submission for this PS2 OHCI driver,
> along with about a hundred other patches (unrelated to the USB subsystem).
> Hopefully in a few weeks. My patches are currently on top of v5.0. What
> branch/version is recommended to try this DMA update?

I think that any recent kernel will do, so I'd say your current branch
should be fine.

> Here is the v5.0.11 PS2 OHCI driver, for reference:
>
> https://github.com/frno7/linux/blob/ps2-v5.0.11/drivers/usb/host/ohci-ps2.c
Please note that the driver will need to be updated, see here for an
example:

https://patchwork.kernel.org/patch/10943105/

---
Best Regards, Lauretniu

2019-05-15 16:30:44

by Fredrik Noring

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

Hi Lauretniu,

> I think that any recent kernel will do, so I'd say your current branch
> should be fine.

The kernel oopses with "unable to handle kernel paging request at virtual
address 000aba0b" in hcd_alloc_coherent via usb_hcd_map_urb_for_dma. This
relates to patch 2/3 that I didn't quite understand, where

- retval = dma_declare_coherent_memory(dev, mem->start,
- mem->start - mem->parent->start,
- resource_size(mem));

becomes

+ retval = gen_pool_add_virt(hcd->localmem_pool,
+ (unsigned long)mem->start -
+ mem->parent->start,
+ mem->start, resource_size(mem),

so that arguments two and three switch places. Given

int dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
dma_addr_t device_addr, size_t size);

and

int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys
size_t size, int nid)

it seems that the device address (dma_addr_t device_addr) now becomes a
virtual address (unsigned long virt). Is that intended?

My corresponding patch is below for reference. I applied your 1/3 patch
to test it.

Fredrik

diff --git a/drivers/usb/host/ohci-ps2.c b/drivers/usb/host/ohci-ps2.c
--- a/drivers/usb/host/ohci-ps2.c
+++ b/drivers/usb/host/ohci-ps2.c
@@ -7,6 +7,7 @@
*/

#include <linux/dma-mapping.h>
+#include <linux/genalloc.h>
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/usb.h>
@@ -92,12 +93,12 @@ static irqreturn_t ohci_ps2_irq(struct usb_hcd *hcd)
return ohci_irq(hcd); /* Call normal IRQ handler. */
}

-static int iopheap_alloc_coherent(struct platform_device *pdev,
- size_t size, int flags)
+static int iopheap_alloc_coherent(struct platform_device *pdev, size_t size)
{
struct device *dev = &pdev->dev;
struct usb_hcd *hcd = platform_get_drvdata(pdev);
struct ps2_hcd *ps2priv = hcd_to_priv(hcd);
+ int err;

if (ps2priv->iop_dma_addr != 0)
return 0;
@@ -112,28 +113,37 @@ static int iopheap_alloc_coherent(struct platform_device *pdev,
return -ENOMEM;
}

- if (dma_declare_coherent_memory(dev,
- iop_bus_to_phys(ps2priv->iop_dma_addr),
- ps2priv->iop_dma_addr, size, flags)) {
- dev_err(dev, "dma_declare_coherent_memory failed\n");
- iop_free(ps2priv->iop_dma_addr);
- ps2priv->iop_dma_addr = 0;
- return -ENOMEM;
+ hcd->localmem_pool = devm_gen_pool_create(dev,
+ PAGE_SHIFT, dev_to_node(dev), DRV_NAME);
+ if (IS_ERR(hcd->localmem_pool)) {
+ err = PTR_ERR(hcd->localmem_pool);
+ goto out_err;
+ }
+
+ err = gen_pool_add_virt(hcd->localmem_pool, ps2priv->iop_dma_addr,
+ iop_bus_to_phys(ps2priv->iop_dma_addr), size, dev_to_node(dev));
+ if (err < 0) {
+ dev_err(dev, "gen_pool_add_virt failed with %d\n", err);
+ goto out_err;
}

return 0;
+
+out_err:
+ iop_free(ps2priv->iop_dma_addr);
+ ps2priv->iop_dma_addr = 0;
+
+ return err;
}

static void iopheap_free_coherent(struct platform_device *pdev)
{
- struct device *dev = &pdev->dev;
struct usb_hcd *hcd = platform_get_drvdata(pdev);
struct ps2_hcd *ps2priv = hcd_to_priv(hcd);

if (ps2priv->iop_dma_addr == 0)
return;

- dma_release_declared_memory(dev);
iop_free(ps2priv->iop_dma_addr);
ps2priv->iop_dma_addr = 0;
}
@@ -177,8 +187,7 @@ static int ohci_hcd_ps2_probe(struct platform_device *pdev)
goto err;
}

- ret = iopheap_alloc_coherent(pdev,
- DMA_BUFFER_SIZE, DMA_MEMORY_EXCLUSIVE);
+ ret = iopheap_alloc_coherent(pdev, DMA_BUFFER_SIZE);
if (ret != 0)
goto err_alloc_coherent;

2019-05-16 09:37:40

by Laurentiu Tudor

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

Hi Fredrik,

Thanks very much for taking the time to look into this. Please see
comments inline.

On 15.05.2019 19:28, Fredrik Noring wrote:
> Hi Lauretniu,
>
>> I think that any recent kernel will do, so I'd say your current branch
>> should be fine.
>
> The kernel oopses with "unable to handle kernel paging request at virtual
> address 000aba0b" in hcd_alloc_coherent via usb_hcd_map_urb_for_dma.

By any chance, does this address looks like the dma_addr that the device
should target?

> This relates to patch 2/3 that I didn't quite understand, where
>
> - retval = dma_declare_coherent_memory(dev, mem->start,
> - mem->start - mem->parent->start,
> - resource_size(mem));
>
> becomes
>
> + retval = gen_pool_add_virt(hcd->localmem_pool,
> + (unsigned long)mem->start -
> + mem->parent->start,
> + mem->start, resource_size(mem),
>
> so that arguments two and three switch places. Given
>
> int dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
> dma_addr_t device_addr, size_t size);
>
> and
>
> int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys
> size_t size, int nid)
>
> it seems that the device address (dma_addr_t device_addr) now becomes a
> virtual address (unsigned long virt). Is that intended?

Actually, I think I'm misusing genalloc and also it appears that i need
to add a mapping on the phys address. So my plan is to change the
"unsigned long virt" to be the void * returned by the mapping operation
and the phys_addr_t be the dma_addr_t. I'll return with a patch.

Regarding the usage of unsigned long in genalloc, yeah seems a bit
strange but by looking at other users in the kernel it appears that
that's the design.

---
Best Regards, Laurentiu

> My corresponding patch is below for reference. I applied your 1/3 patch
> to test it.
>
> Fredrik
>
> diff --git a/drivers/usb/host/ohci-ps2.c b/drivers/usb/host/ohci-ps2.c
> --- a/drivers/usb/host/ohci-ps2.c
> +++ b/drivers/usb/host/ohci-ps2.c
> @@ -7,6 +7,7 @@
> */
>
> #include <linux/dma-mapping.h>
> +#include <linux/genalloc.h>
> #include <linux/module.h>
> #include <linux/platform_device.h>
> #include <linux/usb.h>
> @@ -92,12 +93,12 @@ static irqreturn_t ohci_ps2_irq(struct usb_hcd *hcd)
> return ohci_irq(hcd); /* Call normal IRQ handler. */
> }
>
> -static int iopheap_alloc_coherent(struct platform_device *pdev,
> - size_t size, int flags)
> +static int iopheap_alloc_coherent(struct platform_device *pdev, size_t size)
> {
> struct device *dev = &pdev->dev;
> struct usb_hcd *hcd = platform_get_drvdata(pdev);
> struct ps2_hcd *ps2priv = hcd_to_priv(hcd);
> + int err;
>
> if (ps2priv->iop_dma_addr != 0)
> return 0;
> @@ -112,28 +113,37 @@ static int iopheap_alloc_coherent(struct platform_device *pdev,
> return -ENOMEM;
> }
>
> - if (dma_declare_coherent_memory(dev,
> - iop_bus_to_phys(ps2priv->iop_dma_addr),
> - ps2priv->iop_dma_addr, size, flags)) {
> - dev_err(dev, "dma_declare_coherent_memory failed\n");
> - iop_free(ps2priv->iop_dma_addr);
> - ps2priv->iop_dma_addr = 0;
> - return -ENOMEM;
> + hcd->localmem_pool = devm_gen_pool_create(dev,
> + PAGE_SHIFT, dev_to_node(dev), DRV_NAME);
> + if (IS_ERR(hcd->localmem_pool)) {
> + err = PTR_ERR(hcd->localmem_pool);
> + goto out_err;
> + }
> +
> + err = gen_pool_add_virt(hcd->localmem_pool, ps2priv->iop_dma_addr,
> + iop_bus_to_phys(ps2priv->iop_dma_addr), size, dev_to_node(dev));
> + if (err < 0) {
> + dev_err(dev, "gen_pool_add_virt failed with %d\n", err);
> + goto out_err;
> }
>
> return 0;
> +
> +out_err:
> + iop_free(ps2priv->iop_dma_addr);
> + ps2priv->iop_dma_addr = 0;
> +
> + return err;
> }
>
> static void iopheap_free_coherent(struct platform_device *pdev)
> {
> - struct device *dev = &pdev->dev;
> struct usb_hcd *hcd = platform_get_drvdata(pdev);
> struct ps2_hcd *ps2priv = hcd_to_priv(hcd);
>
> if (ps2priv->iop_dma_addr == 0)
> return;
>
> - dma_release_declared_memory(dev);
> iop_free(ps2priv->iop_dma_addr);
> ps2priv->iop_dma_addr = 0;
> }
> @@ -177,8 +187,7 @@ static int ohci_hcd_ps2_probe(struct platform_device *pdev)
> goto err;
> }
>
> - ret = iopheap_alloc_coherent(pdev,
> - DMA_BUFFER_SIZE, DMA_MEMORY_EXCLUSIVE);
> + ret = iopheap_alloc_coherent(pdev, DMA_BUFFER_SIZE);
> if (ret != 0)
> goto err_alloc_coherent;
>
>

2019-05-16 11:51:40

by Laurentiu Tudor

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

Hi Fredrick,

On 15.05.2019 19:28, Fredrik Noring wrote:
> Hi Lauretniu,
>
>> I think that any recent kernel will do, so I'd say your current branch
>> should be fine.
>
> The kernel oopses with "unable to handle kernel paging request at virtual
> address 000aba0b" in hcd_alloc_coherent via usb_hcd_map_urb_for_dma. This
> relates to patch 2/3 that I didn't quite understand, where
>
> - retval = dma_declare_coherent_memory(dev, mem->start,
> - mem->start - mem->parent->start,
> - resource_size(mem));
>
> becomes
>
> + retval = gen_pool_add_virt(hcd->localmem_pool,
> + (unsigned long)mem->start -
> + mem->parent->start,
> + mem->start, resource_size(mem),
>
> so that arguments two and three switch places. Given
>
> int dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
> dma_addr_t device_addr, size_t size);
>
> and
>
> int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys
> size_t size, int nid)
>
> it seems that the device address (dma_addr_t device_addr) now becomes a
> virtual address (unsigned long virt). Is that intended?
>
> My corresponding patch is below for reference. I applied your 1/3 patch
> to test it.

I took your code, added the missing mapping and placed it in a patch.
Please see attached (compile tested only).

---
Thanks & Best Regards, Laurentiu


Attachments:
0001-usb-host-ohci-ps2-init-genalloc-for-local-memory.patch (3.35 kB)
0001-usb-host-ohci-ps2-init-genalloc-for-local-memory.patch

2019-05-16 13:57:56

by Fredrik Noring

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

Hi Laurentiu,

> > The kernel oopses with "unable to handle kernel paging request at virtual
> > address 000aba0b" in hcd_alloc_coherent via usb_hcd_map_urb_for_dma.
>
> By any chance, does this address looks like the dma_addr that the device
> should target?

Yes, that looks like a typical device address suitable for its DMA.

> Actually, I think I'm misusing genalloc and also it appears that i need
> to add a mapping on the phys address. So my plan is to change the
> "unsigned long virt" to be the void * returned by the mapping operation
> and the phys_addr_t be the dma_addr_t. I'll return with a patch.

Great, I'm happy to test your next patch revision.

Fredrik

2019-05-16 15:18:02

by Fredrik Noring

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

Hi Laurentiu,

> I took your code, added the missing mapping and placed it in a patch.
> Please see attached (compile tested only).

Thanks! Unfortunately, the OHCI fails with errors such as

usb 1-1: device descriptor read/64, error -12

that I tracked down to the calls

hub_port_init
-> usb_control_msg
-> usb_internal_control_msg
-> usb_start_wait_urb
-> usb_submit_urb
-> usb_hcd_submit_urb
-> hcd->driver->urb_enqueue
-> ohci_urb_enqueue
-> ed_get
-> ed_alloc
-> dma_pool_zalloc
-> dma_pool_alloc
-> pool_alloc_page,

which returns NULL. Then I noticed

/* pool_alloc_page() might sleep, so temporarily drop &pool->lock */

that might be a problem considering that the HCD handles pool memory in
IRQ handlers, for instance:

do_IRQ
-> generic_handle_irq
-> handle_level_irq
-> handle_irq_event
-> handle_irq_event_percpu
-> __handle_irq_event_percpu
-> usb_hcd_irq
-> ohci_irq
-> ohci_work
-> finish_urb
-> __usb_hcd_giveback_urb
-> usb_hcd_unmap_urb_for_dma
-> hcd_buffer_free

Also, DMA_BUFFER_SIZE in ohci-ps2.c is only 256 KiB in total. Is the new
pool implementation at least as efficient as the previous one?

Fredrik

2019-05-17 12:21:25

by Laurentiu Tudor

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

Hi Fredrik,

On 16.05.2019 18:15, Fredrik Noring wrote:
> Hi Laurentiu,
>
>> I took your code, added the missing mapping and placed it in a patch.
>> Please see attached (compile tested only).
>
> Thanks! Unfortunately, the OHCI fails with errors such as
>
> usb 1-1: device descriptor read/64, error -12

Alright, at least the crash went away. :-)

> that I tracked down to the calls
>
> hub_port_init
> -> usb_control_msg
> -> usb_internal_control_msg
> -> usb_start_wait_urb
> -> usb_submit_urb
> -> usb_hcd_submit_urb
> -> hcd->driver->urb_enqueue
> -> ohci_urb_enqueue
> -> ed_get
> -> ed_alloc
> -> dma_pool_zalloc
> -> dma_pool_alloc
> -> pool_alloc_page,
>
> which returns NULL. Then I noticed
>
> /* pool_alloc_page() might sleep, so temporarily drop &pool->lock */
> that might be a problem considering that the HCD handles pool memory in
> IRQ handlers, for instance:
>
> do_IRQ
> -> generic_handle_irq
> -> handle_level_irq
> -> handle_irq_event
> -> handle_irq_event_percpu
> -> __handle_irq_event_percpu
> -> usb_hcd_irq
> -> ohci_irq
> -> ohci_work
> -> finish_urb
> -> __usb_hcd_giveback_urb
> -> usb_hcd_unmap_urb_for_dma
> -> hcd_buffer_free


That looks indeed worrisome but I'd expect that it's not related to
these genalloc changes that I'm working on. I'll try to look into it but
as I'm not familiar with the area maybe others will comment.

> Also, DMA_BUFFER_SIZE in ohci-ps2.c is only 256 KiB in total. Is the new
> pool implementation at least as efficient as the previous one?

I don't see any obvious reasons for genalloc to be less efficient.

One thing I noticed between genalloc and the original implementation is
that previously the device memory was mapped with memremap() while with
genalloc I'm doing a devm_ioremap(). I can't think of a relevant
difference except that memremap() maps with WC buth maybe there are
other arch specific subtleties. I've attached a new version of the
ohci-ps2 patch replacing the devm_ioremap() with devm_memremap() to
better match the original implementation. Please test if you find some time.

---
Thanks & Best Regards, Laurentiu


Attachments:
v2-0001-usb-host-ohci-ps2-init-genalloc-for-local-memory.patch (3.37 kB)
v2-0001-usb-host-ohci-ps2-init-genalloc-for-local-memory.patch

2019-05-17 18:36:25

by Fredrik Noring

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

Hi Laurentiu,

> > that I tracked down to the calls
> >
> > hub_port_init
> > -> usb_control_msg
> > -> usb_internal_control_msg
> > -> usb_start_wait_urb
> > -> usb_submit_urb
> > -> usb_hcd_submit_urb
> > -> hcd->driver->urb_enqueue
> > -> ohci_urb_enqueue
> > -> ed_get
> > -> ed_alloc

I found that the generic OHCI driver takes a wrong turn here, in ed_alloc,
and eventually also in td_alloc. Fortunately, your patch can be easily
extended to fix them as well. Please see attached patch below.

With that, the OHCI seems to work as expected with HCD_LOCAL_MEM. :)

Would you like me to submit gen_pool_dma_zalloc as a separate patch?

> That looks indeed worrisome but I'd expect that it's not related to
> these genalloc changes that I'm working on. I'll try to look into it but
> as I'm not familiar with the area maybe others will comment.

It seems appropriate to verify that the IRQ will not end up sleeping
somewhere in the pool.

> > Also, DMA_BUFFER_SIZE in ohci-ps2.c is only 256 KiB in total. Is the new
> > pool implementation at least as efficient as the previous one?
>
> I don't see any obvious reasons for genalloc to be less efficient.

OK, good.

> One thing I noticed between genalloc and the original implementation is
> that previously the device memory was mapped with memremap() while with
> genalloc I'm doing a devm_ioremap(). I can't think of a relevant
> difference except that memremap() maps with WC buth maybe there are
> other arch specific subtleties. I've attached a new version of the
> ohci-ps2 patch replacing the devm_ioremap() with devm_memremap() to
> better match the original implementation. Please test if you find some time.

You're right, it makes no difference.

Fredrik

diff --git a/drivers/usb/host/ohci-mem.c b/drivers/usb/host/ohci-mem.c
--- a/drivers/usb/host/ohci-mem.c
+++ b/drivers/usb/host/ohci-mem.c
@@ -82,10 +82,14 @@ dma_to_td (struct ohci_hcd *hc, dma_addr_t td_dma)
static struct td *
td_alloc (struct ohci_hcd *hc, gfp_t mem_flags)
{
+ struct usb_hcd *hcd = ohci_to_hcd(hc);
dma_addr_t dma;
struct td *td;

- td = dma_pool_zalloc (hc->td_cache, mem_flags, &dma);
+ if (hcd->driver->flags & HCD_LOCAL_MEM)
+ td = gen_pool_dma_zalloc (hcd->localmem_pool, sizeof(*td), &dma);
+ else
+ td = dma_pool_zalloc (hc->td_cache, mem_flags, &dma);
if (td) {
/* in case hc fetches it, make it look dead */
td->hwNextTD = cpu_to_hc32 (hc, dma);
@@ -98,6 +102,7 @@ td_alloc (struct ohci_hcd *hc, gfp_t mem_flags)
static void
td_free (struct ohci_hcd *hc, struct td *td)
{
+ struct usb_hcd *hcd = ohci_to_hcd(hc);
struct td **prev = &hc->td_hash [TD_HASH_FUNC (td->td_dma)];

while (*prev && *prev != td)
@@ -106,7 +111,11 @@ td_free (struct ohci_hcd *hc, struct td *td)
*prev = td->td_hash;
else if ((td->hwINFO & cpu_to_hc32(hc, TD_DONE)) != 0)
ohci_dbg (hc, "no hash for td %p\n", td);
- dma_pool_free (hc->td_cache, td, td->td_dma);
+
+ if (hcd->driver->flags & HCD_LOCAL_MEM)
+ gen_pool_free (hcd->localmem_pool, (unsigned long)td, sizeof(*td));
+ else
+ dma_pool_free (hc->td_cache, td, td->td_dma);
}

/*-------------------------------------------------------------------------*/
@@ -115,10 +124,14 @@ td_free (struct ohci_hcd *hc, struct td *td)
static struct ed *
ed_alloc (struct ohci_hcd *hc, gfp_t mem_flags)
{
+ struct usb_hcd *hcd = ohci_to_hcd(hc);
dma_addr_t dma;
struct ed *ed;

- ed = dma_pool_zalloc (hc->ed_cache, mem_flags, &dma);
+ if (hcd->driver->flags & HCD_LOCAL_MEM)
+ ed = gen_pool_dma_zalloc (hcd->localmem_pool, sizeof(*ed), &dma);
+ else
+ ed = dma_pool_zalloc (hc->ed_cache, mem_flags, &dma);
if (ed) {
INIT_LIST_HEAD (&ed->td_list);
ed->dma = dma;
@@ -129,6 +142,11 @@ ed_alloc (struct ohci_hcd *hc, gfp_t mem_flags)
static void
ed_free (struct ohci_hcd *hc, struct ed *ed)
{
- dma_pool_free (hc->ed_cache, ed, ed->dma);
+ struct usb_hcd *hcd = ohci_to_hcd(hc);
+
+ if (hcd->driver->flags & HCD_LOCAL_MEM)
+ gen_pool_free (hcd->localmem_pool, (unsigned long)ed, sizeof(*ed));
+ else
+ dma_pool_free (hc->ed_cache, ed, ed->dma);
}

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,8 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
genpool_algo_t algo, void *data);
extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
+extern void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma);
extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
diff --git a/lib/genalloc.c b/lib/genalloc.c
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -362,6 +362,17 @@ void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
}
EXPORT_SYMBOL(gen_pool_dma_alloc);

+void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
+{
+ void *vaddr = gen_pool_dma_alloc(pool, size, dma);
+
+ if (vaddr)
+ memset(vaddr, 0, size);
+
+ return vaddr;
+}
+EXPORT_SYMBOL(gen_pool_dma_zalloc);
+
/**
* gen_pool_free - free allocated special memory back to the pool
* @pool: pool to free to

2019-05-20 12:17:40

by Laurentiu Tudor

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

Hello Fredrik,

On 17.05.2019 20:41, Fredrik Noring wrote:
> Hi Laurentiu,
>
>>> that I tracked down to the calls
>>>
>>> hub_port_init
>>> -> usb_control_msg
>>> -> usb_internal_control_msg
>>> -> usb_start_wait_urb
>>> -> usb_submit_urb
>>> -> usb_hcd_submit_urb
>>> -> hcd->driver->urb_enqueue
>>> -> ohci_urb_enqueue
>>> -> ed_get
>>> -> ed_alloc
>
> I found that the generic OHCI driver takes a wrong turn here, in ed_alloc,
> and eventually also in td_alloc. Fortunately, your patch can be easily
> extended to fix them as well. Please see attached patch below.
>
> With that, the OHCI seems to work as expected with HCD_LOCAL_MEM. :)

Wow, that's excellent news! Thanks a lot for looking into this.
Are you ok if I add your Signed-Off-by and maybe also Tested-by in the
first patch of the series?
As a side note, I plan to add a new HCD function and name it something
like hcd_setup_local_mem() that would take care of setting up the
genalloc for drivers.

> Would you like me to submit gen_pool_dma_zalloc as a separate patch?

Yes, I think it would make sense to put the new API in a distinct patch.
I think we can either include it in the next version of the patch series
or you can submit separately and I'll mention it as dependency for this
patch series, however you prefer.


---
Thanks & Best Regards, Laurentiu

2019-05-20 18:17:52

by Fredrik Noring

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

Hi Laurentiu,

> Wow, that's excellent news! Thanks a lot for looking into this.

You are welcome!

> Are you ok if I add your Signed-Off-by and maybe also Tested-by in the
> first patch of the series?

Yes, but I have two comments:

1. ohci_mem_init() allocates two DMA pools that are no longer relevant, so
it seems appropriate to assign NULL to ohci->td_cache and ohci->ed_cache,
and document this exception in struct ohci_hcd. Unless something more
elegant can be done, of course.

2. A device address is supplied as phys_addr_t phys to gen_pool_add_virt().
This seems to work in this particular DMA application, but there will be
problems if someone does phys_to_virt() or suchlike. Can this be improved
or clearly explained? gen_pool_virt_to_phys() searches in address ranges,
for example, so it appears the implementation uses phys quite carefully.

> As a side note, I plan to add a new HCD function and name it something
> like hcd_setup_local_mem() that would take care of setting up the
> genalloc for drivers.

Good! Then I suppose the HCD_LOCAL_MEM assignment can be removed from the
drivers too? Like this one:

ohci_ps2_hc_driver.flags |= HCD_LOCAL_MEM;

Regarding the previous HCD IRQ question, lib/genalloc.c says that

It is safe to use the allocator in NMI handlers and other special
unblockable contexts that could otherwise deadlock on locks. This
is implemented by using atomic operations and retries on any
conflicts. The disadvantage is that there may be livelocks in
extreme cases. For better scalability, one allocator can be used
for each CPU.

so it appears to be good enough for USB purposes.

> Yes, I think it would make sense to put the new API in a distinct patch.
> I think we can either include it in the next version of the patch series
> or you can submit separately and I'll mention it as dependency for this
> patch series, however you prefer.

Please find the patch below and if possible include it in your patch series.

Fredrik

lib/genalloc.c: Add gen_pool_dma_zalloc() for zeroed DMA allocations

gen_pool_dma_zalloc() is a zeroed memory variant of gen_pool_dma_alloc().
Document return values of both, and indicate NULL as a "%NULL" constant.

Signed-off-by: Fredrik Noring <[email protected]>

--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,8 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
genpool_algo_t algo, void *data);
extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
+extern void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma);
extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -337,12 +337,14 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
* gen_pool_dma_alloc - allocate special memory from the pool for DMA usage
* @pool: pool to allocate from
* @size: number of bytes to allocate from the pool
- * @dma: dma-view physical address return value. Use NULL if unneeded.
+ * @dma: dma-view physical address return value. Use %NULL if unneeded.
*
* Allocate the requested number of bytes from the specified pool.
* Uses the pool allocation function (with first-fit algorithm by default).
* Can not be used in NMI handler on architectures without
* NMI-safe cmpxchg implementation.
+ *
+ * Return: virtual address of the allocated memory, or %NULL on failure
*/
void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
{
@@ -362,6 +364,30 @@ void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
}
EXPORT_SYMBOL(gen_pool_dma_alloc);

+/**
+ * gen_pool_dma_zalloc - allocate special zeroed memory from the pool for DMA usage
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ * @dma: dma-view physical address return value. Use %NULL if unneeded.
+ *
+ * Allocate the requested number of zeroed bytes from the specified pool.
+ * Uses the pool allocation function (with first-fit algorithm by default).
+ * Can not be used in NMI handler on architectures without
+ * NMI-safe cmpxchg implementation.
+ *
+ * Return: virtual address of the allocated zeroed memory, or %NULL on failure
+ */
+void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
+{
+ void *vaddr = gen_pool_dma_alloc(pool, size, dma);
+
+ if (vaddr)
+ memset(vaddr, 0, size);
+
+ return vaddr;
+}
+EXPORT_SYMBOL(gen_pool_dma_zalloc);
+
/**
* gen_pool_free - free allocated special memory back to the pool
* @pool: pool to free to

2019-05-21 10:34:14

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/3] prerequisites for device reserved local mem rework

On Mon, May 20, 2019 at 05:41:56PM +0200, Fredrik Noring wrote:
> 2. A device address is supplied as phys_addr_t phys to gen_pool_add_virt().
> This seems to work in this particular DMA application, but there will be
> problems if someone does phys_to_virt() or suchlike. Can this be improved
> or clearly explained? gen_pool_virt_to_phys() searches in address ranges,
> for example, so it appears the implementation uses phys quite carefully.

Well, it is a physical address, but not one that has a kernel mapping.
So phys_addr_t seems right here, but an additional comment explaining it
is not in the kernel mapping never hurts.