2019-05-29 10:30:36

by Laurentiu Tudor

[permalink] [raw]
Subject: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

From: Laurentiu Tudor <[email protected]>

In preparation for dropping the existing "coherent" dma mem declaration
APIs, replace the current dma_declare_coherent_memory() based mechanism
with the creation of a genalloc pool that will be used in the OHCI
subsystem as replacement for the DMA APIs.

For context, see thread here: https://lkml.org/lkml/2019/4/22/357

Signed-off-by: Laurentiu Tudor <[email protected]>
---
drivers/usb/host/ohci-sm501.c | 47 +++++++++++++++--------------------
1 file changed, 20 insertions(+), 27 deletions(-)

diff --git a/drivers/usb/host/ohci-sm501.c b/drivers/usb/host/ohci-sm501.c
index c26228c25f99..b710e100aec9 100644
--- a/drivers/usb/host/ohci-sm501.c
+++ b/drivers/usb/host/ohci-sm501.c
@@ -110,40 +110,18 @@ static int ohci_hcd_sm501_drv_probe(struct platform_device *pdev)
goto err0;
}

- /* The sm501 chip is equipped with local memory that may be used
- * by on-chip devices such as the video controller and the usb host.
- * This driver uses dma_declare_coherent_memory() to make sure
- * usb allocations with dma_alloc_coherent() allocate from
- * this local memory. The dma_handle returned by dma_alloc_coherent()
- * will be an offset starting from 0 for the first local memory byte.
- *
- * So as long as data is allocated using dma_alloc_coherent() all is
- * fine. This is however not always the case - buffers may be allocated
- * using kmalloc() - so the usb core needs to be told that it must copy
- * data into our local memory if the buffers happen to be placed in
- * regular memory. The HCD_LOCAL_MEM flag does just that.
- */
-
- retval = dma_declare_coherent_memory(dev, mem->start,
- mem->start - mem->parent->start,
- resource_size(mem));
- if (retval) {
- dev_err(dev, "cannot declare coherent memory\n");
- goto err1;
- }
-
/* allocate, reserve and remap resources for registers */
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (res == NULL) {
dev_err(dev, "no resource definition for registers\n");
retval = -ENOENT;
- goto err2;
+ goto err1;
}

hcd = usb_create_hcd(driver, &pdev->dev, dev_name(&pdev->dev));
if (!hcd) {
retval = -ENOMEM;
- goto err2;
+ goto err1;
}

hcd->rsrc_start = res->start;
@@ -164,6 +142,24 @@ static int ohci_hcd_sm501_drv_probe(struct platform_device *pdev)

ohci_hcd_init(hcd_to_ohci(hcd));

+ /* The sm501 chip is equipped with local memory that may be used
+ * by on-chip devices such as the video controller and the usb host.
+ * This driver uses genalloc so that usb allocations with
+ * gen_pool_dma_alloc() allocate from this local memory. The dma_handle
+ * returned by gen_pool_dma_alloc() will be an offset starting from 0
+ * for the first local memory byte.
+ *
+ * So as long as data is allocated using gen_pool_dma_alloc() all is
+ * fine. This is however not always the case - buffers may be allocated
+ * using kmalloc() - so the usb core needs to be told that it must copy
+ * data into our local memory if the buffers happen to be placed in
+ * regular memory. The HCD_LOCAL_MEM flag does just that.
+ */
+
+ if (usb_hcd_setup_local_mem(hcd, mem->start,
+ mem->start - mem->parent->start,
+ resource_size(mem)) < 0)
+ goto err5;
retval = usb_add_hcd(hcd, irq, IRQF_SHARED);
if (retval)
goto err5;
@@ -181,8 +177,6 @@ static int ohci_hcd_sm501_drv_probe(struct platform_device *pdev)
release_mem_region(hcd->rsrc_start, hcd->rsrc_len);
err3:
usb_put_hcd(hcd);
-err2:
- dma_release_declared_memory(dev);
err1:
release_mem_region(mem->start, resource_size(mem));
err0:
@@ -197,7 +191,6 @@ static int ohci_hcd_sm501_drv_remove(struct platform_device *pdev)
usb_remove_hcd(hcd);
release_mem_region(hcd->rsrc_start, hcd->rsrc_len);
usb_put_hcd(hcd);
- dma_release_declared_memory(&pdev->dev);
mem = platform_get_resource(pdev, IORESOURCE_MEM, 1);
if (mem)
release_mem_region(mem->start, resource_size(mem));
--
2.17.1


2019-06-05 21:49:52

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

On Wed, May 29, 2019 at 01:28:41PM +0300, [email protected] wrote:
> From: Laurentiu Tudor <[email protected]>
>
> In preparation for dropping the existing "coherent" dma mem declaration
> APIs, replace the current dma_declare_coherent_memory() based mechanism
> with the creation of a genalloc pool that will be used in the OHCI
> subsystem as replacement for the DMA APIs.
>
> For context, see thread here: https://lkml.org/lkml/2019/4/22/357
>
> Signed-off-by: Laurentiu Tudor <[email protected]>

This patch results in usb access failures when trying to boot from the
sm501-usb controller on sh4 with qemu.

usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 08 7c 00 00 f0 00
print_req_error: I/O error, dev sda, sector 2172 flags 80700
usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 01 da 00 00 f0 00
print_req_error: I/O error, dev sda, sector 474 flags 84700
usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 02 da 00 00 f0 00
print_req_error: I/O error, dev sda, sector 730 flags 84700
usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 0b 50 00 00 f0 00
print_req_error: I/O error, dev sda, sector 2896 flags 84700

Qemu command line is:

The qemu command line is:

qemu-system-sh4 -M r2d \
-kernel ./arch/sh/boot/zImage \
-snapshot \
-usb -device usb-storage,drive=d0 \
-drive file=rootfs.ext2,if=none,id=d0,format=raw \
-append 'panic=-1 slub_debug=FZPUA root=/dev/sda rootwait console=ttySC1,115200 earlycon=scif,mmio16,0xffe80000 noiotrap' \
-serial null -serial stdio \
-net nic,model=rtl8139 -net user -nographic -monitor null

Reverting this patch as well as "USB: drop HCD_LOCAL_MEM flag" fixes the
problem. Reverting "USB: drop HCD_LOCAL_MEM flag" alone does not help.

Guenter

2019-06-11 14:05:22

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

On Wed, Jun 05, 2019 at 02:46:22PM -0700, Guenter Roeck wrote:
> On Wed, May 29, 2019 at 01:28:41PM +0300, [email protected] wrote:
> > From: Laurentiu Tudor <[email protected]>
> >
> > In preparation for dropping the existing "coherent" dma mem declaration
> > APIs, replace the current dma_declare_coherent_memory() based mechanism
> > with the creation of a genalloc pool that will be used in the OHCI
> > subsystem as replacement for the DMA APIs.
> >
> > For context, see thread here: https://lkml.org/lkml/2019/4/22/357
> >
> > Signed-off-by: Laurentiu Tudor <[email protected]>
>
> This patch results in usb access failures when trying to boot from the
> sm501-usb controller on sh4 with qemu.
>
> usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 08 7c 00 00 f0 00
> print_req_error: I/O error, dev sda, sector 2172 flags 80700
> usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 01 da 00 00 f0 00
> print_req_error: I/O error, dev sda, sector 474 flags 84700
> usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 02 da 00 00 f0 00
> print_req_error: I/O error, dev sda, sector 730 flags 84700
> usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 0b 50 00 00 f0 00
> print_req_error: I/O error, dev sda, sector 2896 flags 84700
>
> Qemu command line is:
>
> The qemu command line is:
>
> qemu-system-sh4 -M r2d \
> -kernel ./arch/sh/boot/zImage \
> -snapshot \
> -usb -device usb-storage,drive=d0 \
> -drive file=rootfs.ext2,if=none,id=d0,format=raw \
> -append 'panic=-1 slub_debug=FZPUA root=/dev/sda rootwait console=ttySC1,115200 earlycon=scif,mmio16,0xffe80000 noiotrap' \
> -serial null -serial stdio \
> -net nic,model=rtl8139 -net user -nographic -monitor null
>
> Reverting this patch as well as "USB: drop HCD_LOCAL_MEM flag" fixes the
> problem. Reverting "USB: drop HCD_LOCAL_MEM flag" alone does not help.
>

This problem is still seen in next-20190611.
Has anyone actually tested this code ?

Guenter

2019-06-11 17:27:50

by Fredrik Noring

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

Hi Guenter,

> > This patch results in usb access failures when trying to boot from the
> > sm501-usb controller on sh4 with qemu.
> >
> > usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> > sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> > sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 08 7c 00 00 f0 00
> > print_req_error: I/O error, dev sda, sector 2172 flags 80700
> > usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> > sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> > sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 01 da 00 00 f0 00
> > print_req_error: I/O error, dev sda, sector 474 flags 84700
> > usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> > sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> > sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 02 da 00 00 f0 00
> > print_req_error: I/O error, dev sda, sector 730 flags 84700
> > usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> > sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> > sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 0b 50 00 00 f0 00
> > print_req_error: I/O error, dev sda, sector 2896 flags 84700
> >
> > Qemu command line is:
> >
> > The qemu command line is:
> >
> > qemu-system-sh4 -M r2d \
> > -kernel ./arch/sh/boot/zImage \
> > -snapshot \
> > -usb -device usb-storage,drive=d0 \
> > -drive file=rootfs.ext2,if=none,id=d0,format=raw \
> > -append 'panic=-1 slub_debug=FZPUA root=/dev/sda rootwait console=ttySC1,115200 earlycon=scif,mmio16,0xffe80000 noiotrap' \
> > -serial null -serial stdio \
> > -net nic,model=rtl8139 -net user -nographic -monitor null
> >
> > Reverting this patch as well as "USB: drop HCD_LOCAL_MEM flag" fixes the
> > problem. Reverting "USB: drop HCD_LOCAL_MEM flag" alone does not help.
> >
>
> This problem is still seen in next-20190611.
> Has anyone actually tested this code ?

I tested patches 1, 2 and 5 with v5.0.19. Perhaps yet another part of the
OHCI subsystem allocates memory from the wrong pool? With some luck it is
relatively easy to trace backwards from the error messages to the point
where the memory is being allocated. One way to establish this is to
sprinkle printk around if-statements. There may be 10-20 levels of calls
including one or two indirect calls via pointers. Would you be able to do
that?

Fredrik

2019-06-11 22:03:02

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

On Tue, Jun 11, 2019 at 07:26:54PM +0200, Fredrik Noring wrote:
> Hi Guenter,
>
> > > This patch results in usb access failures when trying to boot from the
> > > sm501-usb controller on sh4 with qemu.
> > >
> > > usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> > > sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> > > sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 08 7c 00 00 f0 00
> > > print_req_error: I/O error, dev sda, sector 2172 flags 80700
> > > usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> > > sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> > > sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 01 da 00 00 f0 00
> > > print_req_error: I/O error, dev sda, sector 474 flags 84700
> > > usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> > > sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> > > sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 02 da 00 00 f0 00
> > > print_req_error: I/O error, dev sda, sector 730 flags 84700
> > > usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> > > sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> > > sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 0b 50 00 00 f0 00
> > > print_req_error: I/O error, dev sda, sector 2896 flags 84700
> > >
> > > Qemu command line is:
> > >
> > > The qemu command line is:
> > >
> > > qemu-system-sh4 -M r2d \
> > > -kernel ./arch/sh/boot/zImage \
> > > -snapshot \
> > > -usb -device usb-storage,drive=d0 \
> > > -drive file=rootfs.ext2,if=none,id=d0,format=raw \
> > > -append 'panic=-1 slub_debug=FZPUA root=/dev/sda rootwait console=ttySC1,115200 earlycon=scif,mmio16,0xffe80000 noiotrap' \
> > > -serial null -serial stdio \
> > > -net nic,model=rtl8139 -net user -nographic -monitor null
> > >
> > > Reverting this patch as well as "USB: drop HCD_LOCAL_MEM flag" fixes the
> > > problem. Reverting "USB: drop HCD_LOCAL_MEM flag" alone does not help.
> > >
> >
> > This problem is still seen in next-20190611.
> > Has anyone actually tested this code ?
>
> I tested patches 1, 2 and 5 with v5.0.19. Perhaps yet another part of the
> OHCI subsystem allocates memory from the wrong pool? With some luck it is
> relatively easy to trace backwards from the error messages to the point
> where the memory is being allocated. One way to establish this is to
> sprinkle printk around if-statements. There may be 10-20 levels of calls
> including one or two indirect calls via pointers. Would you be able to do
> that?
>

I don't think I'll have time to do that anytime soon. Not that I know what
exactly to look for in the first place.

Can you do that debugging yourself ? All you would need is a cross-compiler
(eg from kernel.org), qemu, and a working configuration (the root file
system doesn't really matter since the code doesn't get to the point of
loading it, but you can use [1]). For the configuration file, you can use
rts7751r2dplus_defconfig with CONFIG_CMDLINE and CONFIG_CMDLINE_OVERWRITE
removed.

Thanks,
Guenter

---
[1] https://github.com/groeck/linux-build-test/blob/master/rootfs/sh/rootfs.ext2.gz

2019-06-13 15:11:13

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

Hi Fredrik,

On 6/13/19 6:40 AM, Fredrik Noring wrote:
> Hi Guenter,
>
>> I don't think I'll have time to do that anytime soon. Not that I know what
>> exactly to look for in the first place.
>
> I can confirm that there is a problem with mass storage devices and these
> local memory patches.
>

Thanks for the confirmation. Do you see the problem only with the
ohci-sm501 driver or also with others ?

Thanks,
Guenter

2019-06-13 15:13:57

by Fredrik Noring

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

Hi Guenter,

> I don't think I'll have time to do that anytime soon. Not that I know what
> exactly to look for in the first place.

I can confirm that there is a problem with mass storage devices and these
local memory patches.

Fredrik

2019-06-13 15:36:46

by Fredrik Noring

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

Hi Guenter,

> Thanks for the confirmation. Do you see the problem only with the
> ohci-sm501 driver or also with others ?

All are likely affected, but it depends, because I believe the problem is
that the USB subsystem runs out of memory. Please try the attached patch!

The pool assumed 4096 byte page alignment for every allocation, which is
excessive given that many requests are for 16 and 32 bytes. In the patch
below, I have turned down the order to 5, which is good enough for the ED
and TD structures of the OHCI, but not enough for the HCCA that needs 256
byte alignment. With some luck, the WARN_ON_ONCE will not trigger in your
test, though. If it does, you may try to increase the order from 5 to 8.

I have observed strange things happen when the USB subsystem runs out of
memory. The mass storage drivers often seem to busy-wait on -ENOMEM,
consuming a lot of processor resources. It would be much more efficient
to sleep waiting for memory to become available.

In your case I suspect that allocation failures are not correctly
attributed. Certain kinds of temporary freezes may also occur, as the
various devices are reset due to host memory allocation errors.

Fredrik

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -3011,7 +3011,7 @@ int usb_hcd_setup_local_mem(struct usb_hcd *hcd, phys_addr_t phys_addr,
int err;
void __iomem *local_mem;

- hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, PAGE_SHIFT,
+ hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, 5,
dev_to_node(hcd->self.sysdev),
dev_name(hcd->self.sysdev));
if (IS_ERR(hcd->localmem_pool))
diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c
--- a/drivers/usb/host/ohci-hcd.c
+++ b/drivers/usb/host/ohci-hcd.c
@@ -517,6 +517,7 @@ static int ohci_init (struct ohci_hcd *ohci)
GFP_KERNEL);
if (!ohci->hcca)
return -ENOMEM;
+ WARN_ON_ONCE(ohci->hcca_dma & 0xff);

if ((ret = ohci_mem_init (ohci)) < 0)
ohci_stop (hcd);

2019-06-13 18:06:47

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

On 6/13/19 8:34 AM, Fredrik Noring wrote:
> Hi Guenter,
>
>> Thanks for the confirmation. Do you see the problem only with the
>> ohci-sm501 driver or also with others ?
>
> All are likely affected, but it depends, because I believe the problem is
> that the USB subsystem runs out of memory. Please try the attached patch!
>
> The pool assumed 4096 byte page alignment for every allocation, which is
> excessive given that many requests are for 16 and 32 bytes. In the patch
> below, I have turned down the order to 5, which is good enough for the ED
> and TD structures of the OHCI, but not enough for the HCCA that needs 256
> byte alignment. With some luck, the WARN_ON_ONCE will not trigger in your
> test, though. If it does, you may try to increase the order from 5 to 8.
>

You are right, the patch below fixes the problem. I did not get the warning
with order==5. Nevertheless, I also tested with order==8; that works as well.

Thanks a lot for tracking this down!
Guenter

> I have observed strange things happen when the USB subsystem runs out of
> memory. The mass storage drivers often seem to busy-wait on -ENOMEM,
> consuming a lot of processor resources. It would be much more efficient
> to sleep waiting for memory to become available.
>
> In your case I suspect that allocation failures are not correctly
> attributed. Certain kinds of temporary freezes may also occur, as the
> various devices are reset due to host memory allocation errors.
> > Fredrik
>
> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -3011,7 +3011,7 @@ int usb_hcd_setup_local_mem(struct usb_hcd *hcd, phys_addr_t phys_addr,
> int err;
> void __iomem *local_mem;
>
> - hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, PAGE_SHIFT,
> + hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, 5,
> dev_to_node(hcd->self.sysdev),
> dev_name(hcd->self.sysdev));
> if (IS_ERR(hcd->localmem_pool))
> diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c
> --- a/drivers/usb/host/ohci-hcd.c
> +++ b/drivers/usb/host/ohci-hcd.c
> @@ -517,6 +517,7 @@ static int ohci_init (struct ohci_hcd *ohci)
> GFP_KERNEL);
> if (!ohci->hcca)
> return -ENOMEM;
> + WARN_ON_ONCE(ohci->hcca_dma & 0xff);
>
> if ((ret = ohci_mem_init (ohci)) < 0)
> ohci_stop (hcd);
>

2019-06-14 14:30:23

by Fredrik Noring

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

Hi Guenter,

> You are right, the patch below fixes the problem. I did not get the warning
> with order==5. Nevertheless, I also tested with order==8; that works as well.
>
> Thanks a lot for tracking this down!

You are welcome, and thanks for your report!

This patch series needs some redesign, I think, because the problem you
reported would come back if one attaches two or more devices to the
system. Local memory devices are typically memory constrained and so it
has to be used efficiently. I believe there are four kinds of alignments
to consider when memory is allocated in the pool:

- 256 bytes for the host controller communication area (HCCA);
- 32 bytes for the general transfer descriptors (TDs);
- 16 bytes for the endpoint descriptors (EDs);
- buffer alignment for data.

Using the greatest common alignment for all is clearly an undesirable
regression. The TDs and EDs could have their own subpools, perhaps, as
they are abundant. There is only one instance of the HCCA.

As mentioned, the USB subsystem could be improved to properly report
allocation failures, and the logic to retry allocations could be more
efficient by avoiding polling loops.

Fredrik

2019-06-18 10:49:44

by Laurentiu Tudor

[permalink] [raw]
Subject: RE: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

Hello,

> -----Original Message-----
> From: Guenter Roeck <[email protected]> On Behalf Of Guenter Roeck
> Sent: Thursday, June 13, 2019 9:06 PM
>
> On 6/13/19 8:34 AM, Fredrik Noring wrote:
> > Hi Guenter,
> >
> >> Thanks for the confirmation. Do you see the problem only with the
> >> ohci-sm501 driver or also with others ?
> >
> > All are likely affected, but it depends, because I believe the problem
> is
> > that the USB subsystem runs out of memory. Please try the attached patch!
> >
> > The pool assumed 4096 byte page alignment for every allocation, which is
> > excessive given that many requests are for 16 and 32 bytes. In the patch
> > below, I have turned down the order to 5, which is good enough for the
> ED
> > and TD structures of the OHCI, but not enough for the HCCA that needs
> 256
> > byte alignment. With some luck, the WARN_ON_ONCE will not trigger in
> your
> > test, though. If it does, you may try to increase the order from 5 to 8.
> >
>
> You are right, the patch below fixes the problem. I did not get the
> warning
> with order==5. Nevertheless, I also tested with order==8; that works as
> well.

Sorry for the late reply, I was OOO for past week+ and many thanks for taking a look at this.
So if my understanding is correct, an order of PAGE_SHIFT is too large and leads to waste of memory and in the end to an out of memory condition. This leaves me wondering what a safe order value would be.
I'll try to look into the legacy dma coherent allocation code maybe I can gather some info on the subject.

---
Best Regards, Laurentiu

>
> > I have observed strange things happen when the USB subsystem runs out of
> > memory. The mass storage drivers often seem to busy-wait on -ENOMEM,
> > consuming a lot of processor resources. It would be much more efficient
> > to sleep waiting for memory to become available.
> >
> > In your case I suspect that allocation failures are not correctly
> > attributed. Certain kinds of temporary freezes may also occur, as the
> > various devices are reset due to host memory allocation errors.
> > > Fredrik
> >
> > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> > --- a/drivers/usb/core/hcd.c
> > +++ b/drivers/usb/core/hcd.c
> > @@ -3011,7 +3011,7 @@ int usb_hcd_setup_local_mem(struct usb_hcd *hcd,
> phys_addr_t phys_addr,
> > int err;
> > void __iomem *local_mem;
> >
> > - hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev,
> PAGE_SHIFT,
> > + hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, 5,
> > dev_to_node(hcd->self.sysdev),
> > dev_name(hcd->self.sysdev));
> > if (IS_ERR(hcd->localmem_pool))
> > diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c
> > --- a/drivers/usb/host/ohci-hcd.c
> > +++ b/drivers/usb/host/ohci-hcd.c
> > @@ -517,6 +517,7 @@ static int ohci_init (struct ohci_hcd *ohci)
> > GFP_KERNEL);
> > if (!ohci->hcca)
> > return -ENOMEM;
> > + WARN_ON_ONCE(ohci->hcca_dma & 0xff);
> >
> > if ((ret = ohci_mem_init (ohci)) < 0)
> > ohci_stop (hcd);
> >

2019-06-24 06:42:52

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

Can you send me the patch formally so that I can queue it up for the
dma-mapping tree?

2019-06-24 14:13:14

by Fredrik Noring

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

Hi Christoph,

> Can you send me the patch formally so that I can queue it up for the
> dma-mapping tree?

That patch would be detrimental to local memory devices, as previously
discussed, so I would like to suggest a much better approach, as shown below,
where allocations are aligned as required but not necessarily much more than
that.

Fredrik

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -3014,7 +3014,7 @@ int usb_hcd_setup_local_mem(struct usb_hcd *hcd, phys_addr_t phys_addr,
int err;
void __iomem *local_mem;

- hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, PAGE_SHIFT,
+ hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, 4,
dev_to_node(hcd->self.sysdev),
dev_name(hcd->self.sysdev));
if (IS_ERR(hcd->localmem_pool))
diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c
--- a/drivers/usb/host/ohci-hcd.c
+++ b/drivers/usb/host/ohci-hcd.c
@@ -507,9 +507,9 @@ static int ohci_init (struct ohci_hcd *ohci)
ohci->prev_frame_no = IO_WATCHDOG_OFF;

if (hcd->localmem_pool)
- ohci->hcca = gen_pool_dma_alloc(hcd->localmem_pool,
+ ohci->hcca = gen_pool_dma_alloc_align(hcd->localmem_pool,
sizeof(*ohci->hcca),
- &ohci->hcca_dma);
+ &ohci->hcca_dma, 256);
else
ohci->hcca = dma_alloc_coherent(hcd->self.controller,
sizeof(*ohci->hcca),
diff --git a/drivers/usb/host/ohci-mem.c b/drivers/usb/host/ohci-mem.c
--- a/drivers/usb/host/ohci-mem.c
+++ b/drivers/usb/host/ohci-mem.c
@@ -94,7 +94,8 @@ td_alloc (struct ohci_hcd *hc, gfp_t mem_flags)
struct usb_hcd *hcd = ohci_to_hcd(hc);

if (hcd->localmem_pool)
- td = gen_pool_dma_zalloc(hcd->localmem_pool, sizeof(*td), &dma);
+ td = gen_pool_dma_zalloc_align(hcd->localmem_pool,
+ sizeof(*td), &dma, 32);
else
td = dma_pool_zalloc(hc->td_cache, mem_flags, &dma);
if (td) {
@@ -137,7 +138,8 @@ ed_alloc (struct ohci_hcd *hc, gfp_t mem_flags)
struct usb_hcd *hcd = ohci_to_hcd(hc);

if (hcd->localmem_pool)
- ed = gen_pool_dma_zalloc(hcd->localmem_pool, sizeof(*ed), &dma);
+ ed = gen_pool_dma_zalloc_align(hcd->localmem_pool,
+ sizeof(*ed), &dma, 16);
else
ed = dma_pool_zalloc(hc->ed_cache, mem_flags, &dma);
if (ed) {
diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,7 +121,15 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
genpool_algo_t algo, void *data);
extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
-void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma);
+extern void *gen_pool_dma_alloc_algo(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, genpool_algo_t algo, void *data);
+extern void *gen_pool_dma_alloc_align(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, int align);
+extern void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma);
+extern void *gen_pool_dma_zalloc_algo(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, genpool_algo_t algo, void *data);
+extern void *gen_pool_dma_zalloc_align(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, int align);
extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
diff --git a/lib/genalloc.c b/lib/genalloc.c
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -347,13 +347,33 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
* Return: virtual address of the allocated memory, or %NULL on failure
*/
void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
+{
+ return gen_pool_dma_alloc_algo(pool, size, dma, pool->algo, pool->data);
+}
+EXPORT_SYMBOL(gen_pool_dma_alloc);
+
+/**
+ * gen_pool_dma_alloc_algo - allocate special memory from the pool for DMA
+ * usage with the given pool algorithm
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ * @dma: dma-view physical address return value. Use NULL if unneeded.
+ * @algo: algorithm passed from caller
+ * @data: data passed to algorithm
+ *
+ * Allocate the requested number of bytes from the specified pool. Uses the
+ * given pool allocation function. Can not be used in NMI handler on
+ * architectures without NMI-safe cmpxchg implementation.
+ */
+void *gen_pool_dma_alloc_algo(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, genpool_algo_t algo, void *data)
{
unsigned long vaddr;

if (!pool)
return NULL;

- vaddr = gen_pool_alloc(pool, size);
+ vaddr = gen_pool_alloc_algo(pool, size, algo, data);
if (!vaddr)
return NULL;

@@ -362,7 +382,31 @@ void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)

return (void *)vaddr;
}
-EXPORT_SYMBOL(gen_pool_dma_alloc);
+EXPORT_SYMBOL(gen_pool_dma_alloc_algo);
+
+/**
+ * gen_pool_dma_zalloc_align - allocate special from the pool for DMA usage
+ * with the given alignment
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ * @dma: dma-view physical address return value. Use %NULL if unneeded.
+ * @align: alignment in bytes for starting address
+ *
+ * Allocate the requested number bytes from the specified pool, with the given
+ * alignment restriction. Can not be used in NMI handler on architectures
+ * without NMI-safe cmpxchg implementation.
+ *
+ * Return: virtual address of the allocated memory, or %NULL on failure
+ */
+void *gen_pool_dma_alloc_align(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, int align)
+{
+ struct genpool_data_align data = { .align = align };
+
+ return gen_pool_dma_alloc_algo(pool, size, dma,
+ gen_pool_first_fit_align, &data);
+}
+EXPORT_SYMBOL(gen_pool_dma_alloc_align);

/**
* gen_pool_dma_zalloc - allocate special zeroed memory from the pool for
@@ -380,14 +424,60 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
*/
void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
{
- void *vaddr = gen_pool_dma_alloc(pool, size, dma);
+ return gen_pool_dma_zalloc_algo(pool, size, dma, pool->algo, pool->data);
+}
+EXPORT_SYMBOL(gen_pool_dma_zalloc);
+
+/**
+ * gen_pool_dma_zalloc_algo - allocate special zeroed memory from the pool for
+ * DMA usage with the given pool algorithm
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ * @dma: dma-view physical address return value. Use %NULL if unneeded.
+ * @algo: algorithm passed from caller
+ * @data: data passed to algorithm
+ *
+ * Allocate the requested number of zeroed bytes from the specified pool. Uses
+ * the pool allocation function. Can not be used in NMI handler on
+ * architectures without NMI-safe cmpxchg implementation.
+ *
+ * Return: virtual address of the allocated zeroed memory, or %NULL on failure
+ */
+void *gen_pool_dma_zalloc_algo(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, genpool_algo_t algo, void *data)
+{
+ void *vaddr = gen_pool_dma_alloc_algo(pool, size, dma, algo, data);

if (vaddr)
memset(vaddr, 0, size);

return vaddr;
}
-EXPORT_SYMBOL(gen_pool_dma_zalloc);
+EXPORT_SYMBOL(gen_pool_dma_zalloc_algo);
+
+/**
+ * gen_pool_dma_zalloc_align - allocate special zeroed memory from the pool for
+ * DMA usage with the given alignment
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ * @dma: dma-view physical address return value. Use %NULL if unneeded.
+ * @align: alignment in bytes for starting address
+ *
+ * Allocate the requested number of zeroed bytes from the specified pool,
+ * with the given alignment restriction. Can not be used in NMI handler on
+ * architectures without NMI-safe cmpxchg implementation.
+ *
+ * Return: virtual address of the allocated zeroed memory, or %NULL on failure
+ */
+void *gen_pool_dma_zalloc_align(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, int align)
+{
+ struct genpool_data_align data = { .align = align };
+
+ return gen_pool_dma_zalloc_algo(pool, size, dma,
+ gen_pool_first_fit_align, &data);
+}
+EXPORT_SYMBOL(gen_pool_dma_zalloc_align);

/**
* gen_pool_free - free allocated special memory back to the pool

2019-06-25 07:33:36

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v7 3/5] usb: host: ohci-sm501: init genalloc for local memory

On Mon, Jun 24, 2019 at 02:59:16PM +0200, Fredrik Noring wrote:
> Hi Christoph,
>
> > Can you send me the patch formally so that I can queue it up for the
> > dma-mapping tree?
>
> That patch would be detrimental to local memory devices, as previously
> discussed, so I would like to suggest a much better approach, as shown below,
> where allocations are aligned as required but not necessarily much more than
> that.

This looks sensible to me. Can you submit it with a proper patch
description and split into a separate patch for genalloc vs the user of
the new interface?

2019-06-25 15:06:44

by Fredrik Noring

[permalink] [raw]
Subject: [PATCH 1/2] lib/genalloc.c: Add algorithm, align and zeroed family of DMA allocators

Provide the algorithm option to DMA allocators as well, along with
convenience variants for zeroed and aligned memory. The following
four functions are added:

- gen_pool_dma_alloc_algo()
- gen_pool_dma_alloc_align()
- gen_pool_dma_zalloc_algo()
- gen_pool_dma_zalloc_align()

Signed-off-by: Fredrik Noring <[email protected]>
---
Hi Christoph,

This patch is based on my v5.0.21 branch, with Laurentiu Tudor's other
local memory changes.

Fredrik
---
include/linux/genalloc.h | 10 +++-
lib/genalloc.c | 100 +++++++++++++++++++++++++++++++++++++--
2 files changed, 105 insertions(+), 5 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,7 +121,15 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
genpool_algo_t algo, void *data);
extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
-void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma);
+extern void *gen_pool_dma_alloc_algo(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, genpool_algo_t algo, void *data);
+extern void *gen_pool_dma_alloc_align(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, int align);
+extern void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma);
+extern void *gen_pool_dma_zalloc_algo(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, genpool_algo_t algo, void *data);
+extern void *gen_pool_dma_zalloc_align(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, int align);
extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
diff --git a/lib/genalloc.c b/lib/genalloc.c
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -347,13 +347,35 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
* Return: virtual address of the allocated memory, or %NULL on failure
*/
void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
+{
+ return gen_pool_dma_alloc_algo(pool, size, dma, pool->algo, pool->data);
+}
+EXPORT_SYMBOL(gen_pool_dma_alloc);
+
+/**
+ * gen_pool_dma_alloc_algo - allocate special memory from the pool for DMA
+ * usage with the given pool algorithm
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ * @dma: DMA-view physical address return value. Use %NULL if unneeded.
+ * @algo: algorithm passed from caller
+ * @data: data passed to algorithm
+ *
+ * Allocate the requested number of bytes from the specified pool. Uses the
+ * given pool allocation function. Can not be used in NMI handler on
+ * architectures without NMI-safe cmpxchg implementation.
+ *
+ * Return: virtual address of the allocated memory, or %NULL on failure
+ */
+void *gen_pool_dma_alloc_algo(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, genpool_algo_t algo, void *data)
{
unsigned long vaddr;

if (!pool)
return NULL;

- vaddr = gen_pool_alloc(pool, size);
+ vaddr = gen_pool_alloc_algo(pool, size, algo, data);
if (!vaddr)
return NULL;

@@ -362,7 +384,31 @@ void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)

return (void *)vaddr;
}
-EXPORT_SYMBOL(gen_pool_dma_alloc);
+EXPORT_SYMBOL(gen_pool_dma_alloc_algo);
+
+/**
+ * gen_pool_dma_alloc_align - allocate special memory from the pool for DMA
+ * usage with the given alignment
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ * @dma: DMA-view physical address return value. Use %NULL if unneeded.
+ * @align: alignment in bytes for starting address
+ *
+ * Allocate the requested number bytes from the specified pool, with the given
+ * alignment restriction. Can not be used in NMI handler on architectures
+ * without NMI-safe cmpxchg implementation.
+ *
+ * Return: virtual address of the allocated memory, or %NULL on failure
+ */
+void *gen_pool_dma_alloc_align(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, int align)
+{
+ struct genpool_data_align data = { .align = align };
+
+ return gen_pool_dma_alloc_algo(pool, size, dma,
+ gen_pool_first_fit_align, &data);
+}
+EXPORT_SYMBOL(gen_pool_dma_alloc_align);

/**
* gen_pool_dma_zalloc - allocate special zeroed memory from the pool for
@@ -380,14 +426,60 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
*/
void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
{
- void *vaddr = gen_pool_dma_alloc(pool, size, dma);
+ return gen_pool_dma_zalloc_algo(pool, size, dma, pool->algo, pool->data);
+}
+EXPORT_SYMBOL(gen_pool_dma_zalloc);
+
+/**
+ * gen_pool_dma_zalloc_algo - allocate special zeroed memory from the pool for
+ * DMA usage with the given pool algorithm
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ * @dma: DMA-view physical address return value. Use %NULL if unneeded.
+ * @algo: algorithm passed from caller
+ * @data: data passed to algorithm
+ *
+ * Allocate the requested number of zeroed bytes from the specified pool. Uses
+ * the given pool allocation function. Can not be used in NMI handler on
+ * architectures without NMI-safe cmpxchg implementation.
+ *
+ * Return: virtual address of the allocated zeroed memory, or %NULL on failure
+ */
+void *gen_pool_dma_zalloc_algo(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, genpool_algo_t algo, void *data)
+{
+ void *vaddr = gen_pool_dma_alloc_algo(pool, size, dma, algo, data);

if (vaddr)
memset(vaddr, 0, size);

return vaddr;
}
-EXPORT_SYMBOL(gen_pool_dma_zalloc);
+EXPORT_SYMBOL(gen_pool_dma_zalloc_algo);
+
+/**
+ * gen_pool_dma_zalloc_align - allocate special zeroed memory from the pool for
+ * DMA usage with the given alignment
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ * @dma: DMA-view physical address return value. Use %NULL if unneeded.
+ * @align: alignment in bytes for starting address
+ *
+ * Allocate the requested number of zeroed bytes from the specified pool,
+ * with the given alignment restriction. Can not be used in NMI handler on
+ * architectures without NMI-safe cmpxchg implementation.
+ *
+ * Return: virtual address of the allocated zeroed memory, or %NULL on failure
+ */
+void *gen_pool_dma_zalloc_align(struct gen_pool *pool, size_t size,
+ dma_addr_t *dma, int align)
+{
+ struct genpool_data_align data = { .align = align };
+
+ return gen_pool_dma_zalloc_algo(pool, size, dma,
+ gen_pool_first_fit_align, &data);
+}
+EXPORT_SYMBOL(gen_pool_dma_zalloc_align);

/**
* gen_pool_free - free allocated special memory back to the pool
--
2.21.0

2019-06-25 15:11:00

by Fredrik Noring

[permalink] [raw]
Subject: [PATCH 2/2] usb: host: Fix excessive alignment restriction for local memory allocations

The PAGE_SHIFT alignment restriction to devm_gen_pool_create() quickly
exhaust local memory because most allocations are much smaller than
PAGE_SIZE. This causes USB device failures such as

usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 08 7c 00 00 f0 00
print_req_error: I/O error, dev sda, sector 2172 flags 80700

when trying to boot from the SM501 USB controller on SH4 with QEMU.

Align allocations as required but not necessarily much more than that.
The HCCA, TD and ED structures align with 256, 32 and 16 byte memory
boundaries, as specified by the Open HCI[1]. The min_alloc_order argument
to devm_gen_pool_create is now somewhat arbitrarily set to 4 (16 bytes).
Perhaps it could be somewhat lower for general buffer allocations.

Reference:

[1] "Open Host Controller Interface Specification for USB",
release 1.0a, Compaq, Microsoft, National Semiconductor, 1999,
pp. 16, 19, 33.

Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Fredrik Noring <[email protected]>
---
drivers/usb/core/hcd.c | 2 +-
drivers/usb/host/ohci-hcd.c | 4 ++--
drivers/usb/host/ohci-mem.c | 6 ++++--
3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index b2362303d32f..48483fa71854 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -3014,7 +3014,7 @@ int usb_hcd_setup_local_mem(struct usb_hcd *hcd, phys_addr_t phys_addr,
int err;
void __iomem *local_mem;

- hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, PAGE_SHIFT,
+ hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, 4,
dev_to_node(hcd->self.sysdev),
dev_name(hcd->self.sysdev));
if (IS_ERR(hcd->localmem_pool))
diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c
index 5801858d867e..b457fdaff297 100644
--- a/drivers/usb/host/ohci-hcd.c
+++ b/drivers/usb/host/ohci-hcd.c
@@ -507,9 +507,9 @@ static int ohci_init (struct ohci_hcd *ohci)
ohci->prev_frame_no = IO_WATCHDOG_OFF;

if (hcd->localmem_pool)
- ohci->hcca = gen_pool_dma_alloc(hcd->localmem_pool,
+ ohci->hcca = gen_pool_dma_alloc_align(hcd->localmem_pool,
sizeof(*ohci->hcca),
- &ohci->hcca_dma);
+ &ohci->hcca_dma, 256);
else
ohci->hcca = dma_alloc_coherent(hcd->self.controller,
sizeof(*ohci->hcca),
diff --git a/drivers/usb/host/ohci-mem.c b/drivers/usb/host/ohci-mem.c
index 4afe27cc7e46..1425335c6baf 100644
--- a/drivers/usb/host/ohci-mem.c
+++ b/drivers/usb/host/ohci-mem.c
@@ -94,7 +94,8 @@ td_alloc (struct ohci_hcd *hc, gfp_t mem_flags)
struct usb_hcd *hcd = ohci_to_hcd(hc);

if (hcd->localmem_pool)
- td = gen_pool_dma_zalloc(hcd->localmem_pool, sizeof(*td), &dma);
+ td = gen_pool_dma_zalloc_align(hcd->localmem_pool,
+ sizeof(*td), &dma, 32);
else
td = dma_pool_zalloc(hc->td_cache, mem_flags, &dma);
if (td) {
@@ -137,7 +138,8 @@ ed_alloc (struct ohci_hcd *hc, gfp_t mem_flags)
struct usb_hcd *hcd = ohci_to_hcd(hc);

if (hcd->localmem_pool)
- ed = gen_pool_dma_zalloc(hcd->localmem_pool, sizeof(*ed), &dma);
+ ed = gen_pool_dma_zalloc_align(hcd->localmem_pool,
+ sizeof(*ed), &dma, 16);
else
ed = dma_pool_zalloc(hc->ed_cache, mem_flags, &dma);
if (ed) {
--
2.21.0

2019-06-25 21:03:24

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 2/2] usb: host: Fix excessive alignment restriction for local memory allocations

On Tue, Jun 25, 2019 at 05:08:23PM +0200, Fredrik Noring wrote:
> The PAGE_SHIFT alignment restriction to devm_gen_pool_create() quickly
> exhaust local memory because most allocations are much smaller than
> PAGE_SIZE. This causes USB device failures such as
>
> usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 08 7c 00 00 f0 00
> print_req_error: I/O error, dev sda, sector 2172 flags 80700
>
> when trying to boot from the SM501 USB controller on SH4 with QEMU.
>
> Align allocations as required but not necessarily much more than that.
> The HCCA, TD and ED structures align with 256, 32 and 16 byte memory
> boundaries, as specified by the Open HCI[1]. The min_alloc_order argument
> to devm_gen_pool_create is now somewhat arbitrarily set to 4 (16 bytes).
> Perhaps it could be somewhat lower for general buffer allocations.
>
> Reference:
>
> [1] "Open Host Controller Interface Specification for USB",
> release 1.0a, Compaq, Microsoft, National Semiconductor, 1999,
> pp. 16, 19, 33.
>
> Reported-by: Guenter Roeck <[email protected]>
> Signed-off-by: Fredrik Noring <[email protected]>

Tested-by: Guenter Roeck <[email protected]>

> ---
> drivers/usb/core/hcd.c | 2 +-
> drivers/usb/host/ohci-hcd.c | 4 ++--
> drivers/usb/host/ohci-mem.c | 6 ++++--
> 3 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> index b2362303d32f..48483fa71854 100644
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -3014,7 +3014,7 @@ int usb_hcd_setup_local_mem(struct usb_hcd *hcd, phys_addr_t phys_addr,
> int err;
> void __iomem *local_mem;
>
> - hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, PAGE_SHIFT,
> + hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, 4,
> dev_to_node(hcd->self.sysdev),
> dev_name(hcd->self.sysdev));
> if (IS_ERR(hcd->localmem_pool))
> diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c
> index 5801858d867e..b457fdaff297 100644
> --- a/drivers/usb/host/ohci-hcd.c
> +++ b/drivers/usb/host/ohci-hcd.c
> @@ -507,9 +507,9 @@ static int ohci_init (struct ohci_hcd *ohci)
> ohci->prev_frame_no = IO_WATCHDOG_OFF;
>
> if (hcd->localmem_pool)
> - ohci->hcca = gen_pool_dma_alloc(hcd->localmem_pool,
> + ohci->hcca = gen_pool_dma_alloc_align(hcd->localmem_pool,
> sizeof(*ohci->hcca),
> - &ohci->hcca_dma);
> + &ohci->hcca_dma, 256);
> else
> ohci->hcca = dma_alloc_coherent(hcd->self.controller,
> sizeof(*ohci->hcca),
> diff --git a/drivers/usb/host/ohci-mem.c b/drivers/usb/host/ohci-mem.c
> index 4afe27cc7e46..1425335c6baf 100644
> --- a/drivers/usb/host/ohci-mem.c
> +++ b/drivers/usb/host/ohci-mem.c
> @@ -94,7 +94,8 @@ td_alloc (struct ohci_hcd *hc, gfp_t mem_flags)
> struct usb_hcd *hcd = ohci_to_hcd(hc);
>
> if (hcd->localmem_pool)
> - td = gen_pool_dma_zalloc(hcd->localmem_pool, sizeof(*td), &dma);
> + td = gen_pool_dma_zalloc_align(hcd->localmem_pool,
> + sizeof(*td), &dma, 32);
> else
> td = dma_pool_zalloc(hc->td_cache, mem_flags, &dma);
> if (td) {
> @@ -137,7 +138,8 @@ ed_alloc (struct ohci_hcd *hc, gfp_t mem_flags)
> struct usb_hcd *hcd = ohci_to_hcd(hc);
>
> if (hcd->localmem_pool)
> - ed = gen_pool_dma_zalloc(hcd->localmem_pool, sizeof(*ed), &dma);
> + ed = gen_pool_dma_zalloc_align(hcd->localmem_pool,
> + sizeof(*ed), &dma, 16);
> else
> ed = dma_pool_zalloc(hc->ed_cache, mem_flags, &dma);
> if (ed) {
> --
> 2.21.0
>

2019-06-25 21:04:15

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 1/2] lib/genalloc.c: Add algorithm, align and zeroed family of DMA allocators

On Tue, Jun 25, 2019 at 05:05:58PM +0200, Fredrik Noring wrote:
> Provide the algorithm option to DMA allocators as well, along with
> convenience variants for zeroed and aligned memory. The following
> four functions are added:
>
> - gen_pool_dma_alloc_algo()
> - gen_pool_dma_alloc_align()
> - gen_pool_dma_zalloc_algo()
> - gen_pool_dma_zalloc_align()
>
> Signed-off-by: Fredrik Noring <[email protected]>

The series fixes the problem I had observed in linux-next.

Tested-by: Guenter Roeck <[email protected]>

Guenter

> ---
> Hi Christoph,
>
> This patch is based on my v5.0.21 branch, with Laurentiu Tudor's other
> local memory changes.
>
> Fredrik
> ---
> include/linux/genalloc.h | 10 +++-
> lib/genalloc.c | 100 +++++++++++++++++++++++++++++++++++++--
> 2 files changed, 105 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -121,7 +121,15 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
> genpool_algo_t algo, void *data);
> extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
> dma_addr_t *dma);
> -void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma);
> +extern void *gen_pool_dma_alloc_algo(struct gen_pool *pool, size_t size,
> + dma_addr_t *dma, genpool_algo_t algo, void *data);
> +extern void *gen_pool_dma_alloc_align(struct gen_pool *pool, size_t size,
> + dma_addr_t *dma, int align);
> +extern void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma);
> +extern void *gen_pool_dma_zalloc_algo(struct gen_pool *pool, size_t size,
> + dma_addr_t *dma, genpool_algo_t algo, void *data);
> +extern void *gen_pool_dma_zalloc_align(struct gen_pool *pool, size_t size,
> + dma_addr_t *dma, int align);
> extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
> extern void gen_pool_for_each_chunk(struct gen_pool *,
> void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
> diff --git a/lib/genalloc.c b/lib/genalloc.c
> --- a/lib/genalloc.c
> +++ b/lib/genalloc.c
> @@ -347,13 +347,35 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
> * Return: virtual address of the allocated memory, or %NULL on failure
> */
> void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
> +{
> + return gen_pool_dma_alloc_algo(pool, size, dma, pool->algo, pool->data);
> +}
> +EXPORT_SYMBOL(gen_pool_dma_alloc);
> +
> +/**
> + * gen_pool_dma_alloc_algo - allocate special memory from the pool for DMA
> + * usage with the given pool algorithm
> + * @pool: pool to allocate from
> + * @size: number of bytes to allocate from the pool
> + * @dma: DMA-view physical address return value. Use %NULL if unneeded.
> + * @algo: algorithm passed from caller
> + * @data: data passed to algorithm
> + *
> + * Allocate the requested number of bytes from the specified pool. Uses the
> + * given pool allocation function. Can not be used in NMI handler on
> + * architectures without NMI-safe cmpxchg implementation.
> + *
> + * Return: virtual address of the allocated memory, or %NULL on failure
> + */
> +void *gen_pool_dma_alloc_algo(struct gen_pool *pool, size_t size,
> + dma_addr_t *dma, genpool_algo_t algo, void *data)
> {
> unsigned long vaddr;
>
> if (!pool)
> return NULL;
>
> - vaddr = gen_pool_alloc(pool, size);
> + vaddr = gen_pool_alloc_algo(pool, size, algo, data);
> if (!vaddr)
> return NULL;
>
> @@ -362,7 +384,31 @@ void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
>
> return (void *)vaddr;
> }
> -EXPORT_SYMBOL(gen_pool_dma_alloc);
> +EXPORT_SYMBOL(gen_pool_dma_alloc_algo);
> +
> +/**
> + * gen_pool_dma_alloc_align - allocate special memory from the pool for DMA
> + * usage with the given alignment
> + * @pool: pool to allocate from
> + * @size: number of bytes to allocate from the pool
> + * @dma: DMA-view physical address return value. Use %NULL if unneeded.
> + * @align: alignment in bytes for starting address
> + *
> + * Allocate the requested number bytes from the specified pool, with the given
> + * alignment restriction. Can not be used in NMI handler on architectures
> + * without NMI-safe cmpxchg implementation.
> + *
> + * Return: virtual address of the allocated memory, or %NULL on failure
> + */
> +void *gen_pool_dma_alloc_align(struct gen_pool *pool, size_t size,
> + dma_addr_t *dma, int align)
> +{
> + struct genpool_data_align data = { .align = align };
> +
> + return gen_pool_dma_alloc_algo(pool, size, dma,
> + gen_pool_first_fit_align, &data);
> +}
> +EXPORT_SYMBOL(gen_pool_dma_alloc_align);
>
> /**
> * gen_pool_dma_zalloc - allocate special zeroed memory from the pool for
> @@ -380,14 +426,60 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
> */
> void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
> {
> - void *vaddr = gen_pool_dma_alloc(pool, size, dma);
> + return gen_pool_dma_zalloc_algo(pool, size, dma, pool->algo, pool->data);
> +}
> +EXPORT_SYMBOL(gen_pool_dma_zalloc);
> +
> +/**
> + * gen_pool_dma_zalloc_algo - allocate special zeroed memory from the pool for
> + * DMA usage with the given pool algorithm
> + * @pool: pool to allocate from
> + * @size: number of bytes to allocate from the pool
> + * @dma: DMA-view physical address return value. Use %NULL if unneeded.
> + * @algo: algorithm passed from caller
> + * @data: data passed to algorithm
> + *
> + * Allocate the requested number of zeroed bytes from the specified pool. Uses
> + * the given pool allocation function. Can not be used in NMI handler on
> + * architectures without NMI-safe cmpxchg implementation.
> + *
> + * Return: virtual address of the allocated zeroed memory, or %NULL on failure
> + */
> +void *gen_pool_dma_zalloc_algo(struct gen_pool *pool, size_t size,
> + dma_addr_t *dma, genpool_algo_t algo, void *data)
> +{
> + void *vaddr = gen_pool_dma_alloc_algo(pool, size, dma, algo, data);
>
> if (vaddr)
> memset(vaddr, 0, size);
>
> return vaddr;
> }
> -EXPORT_SYMBOL(gen_pool_dma_zalloc);
> +EXPORT_SYMBOL(gen_pool_dma_zalloc_algo);
> +
> +/**
> + * gen_pool_dma_zalloc_align - allocate special zeroed memory from the pool for
> + * DMA usage with the given alignment
> + * @pool: pool to allocate from
> + * @size: number of bytes to allocate from the pool
> + * @dma: DMA-view physical address return value. Use %NULL if unneeded.
> + * @align: alignment in bytes for starting address
> + *
> + * Allocate the requested number of zeroed bytes from the specified pool,
> + * with the given alignment restriction. Can not be used in NMI handler on
> + * architectures without NMI-safe cmpxchg implementation.
> + *
> + * Return: virtual address of the allocated zeroed memory, or %NULL on failure
> + */
> +void *gen_pool_dma_zalloc_align(struct gen_pool *pool, size_t size,
> + dma_addr_t *dma, int align)
> +{
> + struct genpool_data_align data = { .align = align };
> +
> + return gen_pool_dma_zalloc_algo(pool, size, dma,
> + gen_pool_first_fit_align, &data);
> +}
> +EXPORT_SYMBOL(gen_pool_dma_zalloc_align);
>
> /**
> * gen_pool_free - free allocated special memory back to the pool
> --
> 2.21.0
>

2019-06-28 05:59:07

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/2] lib/genalloc.c: Add algorithm, align and zeroed family of DMA allocators

Thanks,

I've added both patches to the dma-mapping for-next tree.