2009-01-07 23:18:18

by Vadim Lobanov

[permalink] [raw]
Subject: amd5536udc interrupts bug

Hello,

>From perusing the code and playing with the module, it seems to me that
the amd5536udc driver's handling of interrupts is currently "bustificated".

The long story:

During the amd5536udc initialization sequence within udc_pci_probe(),
the code attempts to request a shared irq for the device thusly:
request_irq(pdev->irq, udc_irq, IRQF_SHARED, name, dev)
where 'dev' is the internal state structure. By the time this call is made,
the 'dev' structure is still not fully initialized and contains blank/zero
data for many of the fields, in particular both 'dev->lock' and 'dev->regs'
which are both clearly used within the udc_irq() handler. Those get
initialized a bit later, namely inside the udc_probe() call at the bottom of
udc_pci_probe().

It is my understanding that a handler for a shared interrupt can be
invoked at any time after the corresponding request_irq() call is made,
simply because some other device on the same interrupt may already
be active. This leaves us with a very noticeable race condition, where
udc_irq() can be invoked with uninitialized 'dev' data.

In particular, this effect is very noticeable when I try modprobing the
amd5536udc driver on a kernel that is built with CONFIG_DEBUG_SHIRQ
set. Given that the debug config option forces an invocation of the irq
handler from within the request_irq() function, the immediate effect is a
masterful kernel NULL-dereference OOPS within udc_irq().

The simple fix may be to say that amd5536udc does not support shared
irqs, and to remove the IRQF_SHARED flag from the request_irq() call. A
more complicated fix may be to try to shuffle all the code around to
make sure that 'dev' is fully initialized before we request the irq, but I
don't understand the code well enough (yet) to comfortably do this.

Comments? Thoughts?

On a side note, it occurs to me that the CONFIG_DEBUG_SHIRQ option
went into the kernel a bit less than two years ago, and that it exposes a
very immediate and reproducible OOPS in this driver. Does this mean
that noone uses the 5536 UDC functionality with any recent kernels?
Should I be worried? :)

-- Vadim Lobanov


2009-01-08 02:33:18

by Robert Hancock

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

Vadim Lobanov wrote:
> Hello,
>
> From perusing the code and playing with the module, it seems to me that
> the amd5536udc driver's handling of interrupts is currently "bustificated".
>
> The long story:
>
> During the amd5536udc initialization sequence within udc_pci_probe(),
> the code attempts to request a shared irq for the device thusly:
> request_irq(pdev->irq, udc_irq, IRQF_SHARED, name, dev)
> where 'dev' is the internal state structure. By the time this call is made,
> the 'dev' structure is still not fully initialized and contains blank/zero
> data for many of the fields, in particular both 'dev->lock' and 'dev->regs'
> which are both clearly used within the udc_irq() handler. Those get
> initialized a bit later, namely inside the udc_probe() call at the bottom of
> udc_pci_probe().
>
> It is my understanding that a handler for a shared interrupt can be
> invoked at any time after the corresponding request_irq() call is made,
> simply because some other device on the same interrupt may already
> be active. This leaves us with a very noticeable race condition, where
> udc_irq() can be invoked with uninitialized 'dev' data.

Yes, your analysis appears correct.

>
> In particular, this effect is very noticeable when I try modprobing the
> amd5536udc driver on a kernel that is built with CONFIG_DEBUG_SHIRQ
> set. Given that the debug config option forces an invocation of the irq
> handler from within the request_irq() function, the immediate effect is a
> masterful kernel NULL-dereference OOPS within udc_irq().
>
> The simple fix may be to say that amd5536udc does not support shared
> irqs, and to remove the IRQF_SHARED flag from the request_irq() call. A

That will bust any other hardware that tries to share the interrupt. If
a driver requests the interrupt without IRQF_SHARED, nothing else can
request that interrupt line.

> more complicated fix may be to try to shuffle all the code around to
> make sure that 'dev' is fully initialized before we request the irq, but I
> don't understand the code well enough (yet) to comfortably do this.

Yeah, that's the proper fix.

>
> Comments? Thoughts?
>
> On a side note, it occurs to me that the CONFIG_DEBUG_SHIRQ option
> went into the kernel a bit less than two years ago, and that it exposes a
> very immediate and reproducible OOPS in this driver. Does this mean
> that noone uses the 5536 UDC functionality with any recent kernels?
> Should I be worried? :)

Presumably nobody uses it with CONFIG_DEBUG_SHIRQ, that option wouldn't
normally be used on non-debug kernels..

>
> -- Vadim Lobanov
>

2009-01-08 16:40:46

by Thomas Dahlmann

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug


> Vadim Lobanov wrote:
>> It is my understanding that a handler for a shared interrupt can be
>> invoked at any time after the corresponding request_irq() call is
>> made, simply because some other device on the same interrupt may
>> already be active. This leaves us with a very noticeable race
>> condition, where udc_irq() can be invoked with uninitialized 'dev'
>> data.
>
> Yes, your analysis appears correct.

Ah, right. Thanks for your analysis! Never ran into this condition as on the
Geode systems I had the IRQ is shared between UDC and UOC/OTG
(both on the CS5536 chip) and UOC never fires interrupts while UDC
driver is loaded as both drivers (UOC/OTG driver is not in the kernel yet)
register each other which makes sure that interrupts are enabled
after udc_pci_probe().

>> more complicated fix may be to try to shuffle all the code around to
>> make sure that 'dev' is fully initialized before we request the irq,
>> but I don't understand the code well enough (yet) to comfortably do
>> this.
>
> Yeah, that's the proper fix.

I could provide a fix within the next weeks. Currently I move to another
company and
have no hardware to test it.

Maybe you want to try this. It should work to place the register init
from udc_probe()

/* udc csr registers base */
dev->csr = dev->virt_addr + UDC_CSR_ADDR;
/* dev registers base */
dev->regs = dev->virt_addr + UDC_DEVCFG_ADDR;
/* ep registers base */
dev->ep_regs = dev->virt_addr + UDC_EPREGS_ADDR;
/* fifo's base */
dev->rxfifo = (u32 __iomem *)(dev->virt_addr + UDC_RXFIFO_ADDR);
dev->txfifo = (u32 __iomem *)(dev->virt_addr + UDC_TXFIFO_ADDR);

just before request_irq(...) to allow the interrupt handler to read the
interrupt status
registers.

Thomas


2009-01-08 18:27:28

by Vadim Lobanov

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

On Thursday 08 January 2009 08:40:28 Thomas Dahlmann wrote:
> Ah, right. Thanks for your analysis! Never ran into this condition as on
> the Geode systems I had the IRQ is shared between UDC and
UOC/OTG
> (both on the CS5536 chip) and UOC never fires interrupts while UDC
> driver is loaded as both drivers (UOC/OTG driver is not in the kernel
yet)
> register each other which makes sure that interrupts are enabled
> after udc_pci_probe().

Ok, I understand.

> I could provide a fix within the next weeks. Currently I move to another
> company and
> have no hardware to test it.

No worries. :) I can also test your patches on my board here, if it helps
any.

> Maybe you want to try this. It should work to place the register init
> from udc_probe()
>
> /* udc csr registers base */
> dev->csr = dev->virt_addr + UDC_CSR_ADDR;
> /* dev registers base */
> dev->regs = dev->virt_addr + UDC_DEVCFG_ADDR;
> /* ep registers base */
> dev->ep_regs = dev->virt_addr + UDC_EPREGS_ADDR;
> /* fifo's base */
> dev->rxfifo = (u32 __iomem *)(dev->virt_addr + UDC_RXFIFO_ADDR);
> dev->txfifo = (u32 __iomem *)(dev->virt_addr + UDC_TXFIFO_ADDR);
>
> just before request_irq(...) to allow the interrupt handler to read the
> interrupt status
> registers.

I'll slip this into the next kernel build and see what happens. Thanks!

-- Vadim Lobanov

2009-01-09 02:02:53

by Vadim Lobanov

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

On Thursday 08 January 2009 08:40:28 Thomas Dahlmann wrote:
> Maybe you want to try this. It should work to place the register init
> from udc_probe()
>
> /* udc csr registers base */
> dev->csr = dev->virt_addr + UDC_CSR_ADDR;
> /* dev registers base */
> dev->regs = dev->virt_addr + UDC_DEVCFG_ADDR;
> /* ep registers base */spin_lock_init(&dev->lock);
> dev->ep_regs = dev->virt_addr + UDC_EPREGS_ADDR;
> /* fifo's base */
> dev->rxfifo = (u32 __iomem *)(dev->virt_addr + UDC_RXFIFO_ADDR);
> dev->txfifo = (u32 __iomem *)(dev->virt_addr + UDC_TXFIFO_ADDR);
>
> just before request_irq(...) to allow the interrupt handler to read the
> interrupt status
> registers.

I did this. Actually, I also yanked the "spin_lock_init(&dev->lock)" bit
before the request_irq() as well, since that field was also obviously
needed for the irq routine. (I didn't check for any other necessary but
less-obvious fields.) With these changes, the module modprobes just
fine.

Alas, the hardware doesn't work. When I try plugging in the other end of
the USB cable, absolutely nothing happens. Not even an interrupt:
/proc/interrupts for the amd5536udc line stays at zero. Any thoughts on
possible ways to tackle this / what could be going wrong / etc?

-- Vadim Lobanov

2009-01-09 06:25:34

by Vadim Lobanov

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

On Wednesday 07 January 2009 18:32:57 Robert Hancock wrote:
> Vadim Lobanov wrote:
> > The simple fix may be to say that amd5536udc does not support shared
> > irqs, and to remove the IRQF_SHARED flag from the request_irq() call. A
>
> That will bust any other hardware that tries to share the interrupt. If
> a driver requests the interrupt without IRQF_SHARED, nothing else can
> request that interrupt line.

Of course.

And I do still need to check what other bits and pieces of hardware, if any,
may want to make use of that particular interrupt. Given that I'm trying to
get this to run on a very specific system (a Hurricane LX-800), the simple
solution may very well work locally.

In fact, it may even be a good first problem-solving step for the vanilla
kernel, if only to quickly mark the driver as "known bad with shared irqs" to
possibly save someone else from having to perform a debugging session. The
full fix can then happen later.

> > more complicated fix may be to try to shuffle all the code around to
> > make sure that 'dev' is fully initialized before we request the irq, but
> > I don't understand the code well enough (yet) to comfortably do this.
>
> Yeah, that's the proper fix.
>
> > On a side note, it occurs to me that the CONFIG_DEBUG_SHIRQ option
> > went into the kernel a bit less than two years ago, and that it exposes a
> > very immediate and reproducible OOPS in this driver. Does this mean
> > that noone uses the 5536 UDC functionality with any recent kernels?
> > Should I be worried? :)
>
> Presumably nobody uses it with CONFIG_DEBUG_SHIRQ, that option wouldn't
> normally be used on non-debug kernels..

Which does nicely illustrate the value of running a debugging-enabled kernel
during development. Random OOPSen out in the field == ouch. Much pain was saved
this time around.

-- Vadim Lobanov

2009-01-09 11:41:45

by Thomas Dahlmann

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

Vadim Lobanov schrieb:
> On Thursday 08 January 2009 08:40:28 Thomas Dahlmann wrote:
>
>> Maybe you want to try this. It should work to place the register init
>> from udc_probe()
>>
>> /* udc csr registers base */
>> dev->csr = dev->virt_addr + UDC_CSR_ADDR;
>> /* dev registers base */
>> dev->regs = dev->virt_addr + UDC_DEVCFG_ADDR;
>> /* ep registers base */spin_lock_init(&dev->lock);
>> dev->ep_regs = dev->virt_addr + UDC_EPREGS_ADDR;
>> /* fifo's base */
>> dev->rxfifo = (u32 __iomem *)(dev->virt_addr + UDC_RXFIFO_ADDR);
>> dev->txfifo = (u32 __iomem *)(dev->virt_addr + UDC_TXFIFO_ADDR);
>>
>> just before request_irq(...) to allow the interrupt handler to read the
>> interrupt status
>> registers.
>>
>
> I did this. Actually, I also yanked the "spin_lock_init(&dev->lock)" bit
> before the request_irq() as well, since that field was also obviously
> needed for the irq routine. (I didn't check for any other necessary but
> less-obvious fields.) With these changes, the module modprobes just
> fine.
>

Thanks for the test!

> Alas, the hardware doesn't work. When I try plugging in the other end of
> the USB cable, absolutely nothing happens. Not even an interrupt:
> /proc/interrupts for the amd5536udc line stays at zero. Any thoughts on
> possible ways to tackle this / what could be going wrong / etc?
>

Is there any output in the kernel messages on the host side complaining
about that device is
not answering? If not than USB device port is probably not connected to
UDC PHY. Please
check in BIOS setup that port 4 is assigned to UDC.

This will set bits PAD_EN and APU in UOC controller (DEVID 0x2097):
http://www.amd.com/files/connectivitysolutions/geode/geode_lx/33238G_cs5536_db.pdf

Thomas
> -- Vadim Lobanov
>
>

2009-01-09 22:40:32

by Vadim Lobanov

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

On Friday 09 January 2009 03:41:21 Thomas Dahlmann wrote:
> Vadim Lobanov schrieb:
> > Alas, the hardware doesn't work. When I try plugging in the other end of
> > the USB cable, absolutely nothing happens. Not even an interrupt:
> > /proc/interrupts for the amd5536udc line stays at zero. Any thoughts on
> > possible ways to tackle this / what could be going wrong / etc?

Just realized that I forgot to mention that I do also have g_zero sitting on
top of amd5536udc, so there should at least be a dummy USB device to
enumerate.

> Is there any output in the kernel messages on the host side complaining
> about that device is
> not answering? If not than USB device port is probably not connected to
> UDC PHY. Please
> check in BIOS setup that port 4 is assigned to UDC.

Nope, nothing at all shows up in the logs or the interrupt counts when the
cable is plugged, neither for the host drivers nor the device drivers. (The
quickest test I have is to hook the device up to itself - one A port and one B
port.)

The BIOS is configured "correctly" if the manuals are to be trusted. Here are
the relevant settings
OHCI: Enabled
EHCI: Enabled
UDC: Enabled
UOC: Disabled
Overcurrent reporting: Disabled
Port 4 assignment: Device

> This will set bits PAD_EN and APU in UOC controller (DEVID 0x2097):
>
http://www.amd.com/files/connectivitysolutions/geode/geode_lx/33238G_cs5536
>_db.pdf

Please pardon my ignorance, but what would be the best way to check if
these bits are set?

Thanks.

2009-01-10 20:28:37

by Thomas Dahlmann

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

Vadim Lobanov schrieb:
> On Friday 09 January 2009 03:41:21 Thomas Dahlmann wrote:
>
>> Vadim Lobanov schrieb:
>>
>>> Alas, the hardware doesn't work. When I try plugging in the other end of
>>> the USB cable, absolutely nothing happens. Not even an interrupt:
>>> /proc/interrupts for the amd5536udc line stays at zero. Any thoughts on
>>> possible ways to tackle this / what could be going wrong / etc?
>>>
>
> Just realized that I forgot to mention that I do also have g_zero sitting on
> top of amd5536udc, so there should at least be a dummy USB device to
> enumerate.
>
>
Right, g_zero should enumerate. It also provides data sink/source and
loopback functionality.
If you want to test data traffic then g_file_storage would be a good
choice as it behaves as a mass storage
device.

Example:
dd bs=1M count=64 if=/dev/zero of=/tmp/image_file
modrobe g_file_storage file=/tmp/image_file
(create partition and file system from host side)

>> Is there any output in the kernel messages on the host side complaining
>> about that device is
>> not answering? If not than USB device port is probably not connected to
>> UDC PHY. Please
>> check in BIOS setup that port 4 is assigned to UDC.
>>
>
> Nope, nothing at all shows up in the logs or the interrupt counts when the
> cable is plugged, neither for the host drivers nor the device drivers. (The
> quickest test I have is to hook the device up to itself - one A port and one B
> port.)
>
> The BIOS is configured "correctly" if the manuals are to be trusted. Here are
> the relevant settings
> OHCI: Enabled
> EHCI: Enabled
> UDC: Enabled
> UOC: Disabled
> Overcurrent reporting: Disabled
> Port 4 assignment: Device
>
UOC has to be enabled. This controller switches the UDC PHY to port 4
and owns the mentioned
PAD_EN and APU bits. These bits live in the memory mapped register space
of UOC. I use a
company intern tools to look at memory spaces of PCI devices. Lets
forget these bits so far as I am
pretty sure that enabling UOC fixes the problem.

Thomas

2009-01-12 19:02:57

by Vadim Lobanov

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

On Saturday 10 January 2009 12:28:20 Thomas Dahlmann wrote:
> Vadim Lobanov schrieb:
> > Nope, nothing at all shows up in the logs or the interrupt counts when
> > the cable is plugged, neither for the host drivers nor the device
> > drivers. (The quickest test I have is to hook the device up to itself -
> > one A port and one B port.)
> >
> > The BIOS is configured "correctly" if the manuals are to be trusted. Here
> > are the relevant settings
> > OHCI: Enabled
> > EHCI: Enabled
> > UDC: Enabled
> > UOC: Disabled
> > Overcurrent reporting: Disabled
> > Port 4 assignment: Device
>
> UOC has to be enabled. This controller switches the UDC PHY to port 4
> and owns the mentioned
> PAD_EN and APU bits. These bits live in the memory mapped register space
> of UOC. I use a
> company intern tools to look at memory spaces of PCI devices. Lets
> forget these bits so far as I am
> pretty sure that enabling UOC fixes the problem.

That'll teach me for trusting the board vendor's documentation. :)

[ ... doing some tests ... ]

Unfortunately, however, even enabling UOC in the BIOS does not make the
USB connection work. The symptoms are exactly the same as before.

I'm going to try to figure out how to get at those two bits, and also will
start talking with the board vendor as well (since it's starting to look more
and more like a BIOS and/or HW problem).

Thanks for your help so far!

2009-01-13 19:20:09

by Vadim Lobanov

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

On Saturday 10 January 2009 12:28:20 Thomas Dahlmann wrote:
> UOC has to be enabled. This controller switches the UDC PHY to port 4
> and owns the mentioned
> PAD_EN and APU bits. These bits live in the memory mapped register space
> of UOC. I use a
> company intern tools to look at memory spaces of PCI devices. Lets
> forget these bits so far as I am
> pretty sure that enabling UOC fixes the problem.

I've managed to get at some values that look suspiciously like the UOC
register space, and they seem to be OK.

Here is the full description of what I did, so that google can slurp this
information up for others to use in the future. First things first, the AMD
Geode CS5536 Companion Device Data Book is a necessary resource, and tells us
where the the UOC-related MSR lives, how big the UOC register space is, and
what all the bits mean.

Armed with this data, we first grab the UOC MSR:

[root@hurricane ~]# modprobe msr
[root@hurricane ~]# rdmsr 0x5120000B
aefa06000

We note the UOC base address value, which, according to the data book, is in
bits 31:8 of the MSR. Using this information and a few more looks into the
data book, we get the following kernel code:

#define UOC_REGISTERS_ADDR 0xEFA06000
#define UOC_REGISTERS_SIZE 0x0100

struct uoc_registers {
u32 cap;
u32 mux;
u32 reserved;
u32 ctl;
} __attribute__((packed));

static void observe_uoc_registers(void)
{
struct uoc_registers __iomem *uoc;

if (!request_mem_region(UOC_REGISTERS_ADDR,
UOC_REGISTERS_SIZE,
"amd5536uoc")) {
printk(KERN_INFO "amd5536uoc mem region already used\n");
goto finish;
}
uoc = ioremap_nocache(UOC_REGISTERS_ADDR,
UOC_REGISTERS_SIZE);
if (uoc == NULL) {
printk(KERN_INFO "amd5536uoc io memory cannot be mapped\n");
goto release;
}
printk(KERN_INFO "amd5536uoc cap=0x%08X\n", readl(&uoc->cap));
printk(KERN_INFO "amd5536uoc mux=0x%08X\n", readl(&uoc->mux));
printk(KERN_INFO "amd5536uoc ctl=0x%08X\n", readl(&uoc->ctl));
iounmap(uoc);
release:
release_mem_region(UOC_REGISTERS_ADDR,
UOC_REGISTERS_SIZE);
finish:
printk(KERN_INFO "amd5536uoc observation finished\n");
}

Put a call to observe_uoc_registers() at the top of udc_pci_probe(), rebuild,
modprobe the amd5536udc driver, and observe the data. Namely, the values that
came back, as well as their interpreted meanings according to the data book,
are:

cap=0x000083EA
The host power control bits affect USB_PWR_EN1 for ports 1-3.
The host power control bits affect USB_PWR_EN2 for port 4.
Port power enabled with output of 1 on USB_PWR_EN1 and USB_PWR_EN2.
No overcurrent reporting.
Automatic pull-up (APU) is enabled.

mux=0x00000003
The muxed port is assigned to the device controller.

ctl=0x00000083
The muxed port is assigned to the device controller.
The comparator circuitry for VBUS detection (PADEN) is enabled.

The mystery deepens. :)

-- Vadim Lobanov

2009-01-14 12:44:17

by Thomas Dahlmann

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

Vadim Lobanov schrieb:
> On Saturday 10 January 2009 12:28:20 Thomas Dahlmann wrote:
>
>> UOC has to be enabled. This controller switches the UDC PHY to port 4
>> and owns the mentioned
>> PAD_EN and APU bits. These bits live in the memory mapped register space
>> of UOC. I use a
>> company intern tools to look at memory spaces of PCI devices. Lets
>> forget these bits so far as I am
>> pretty sure that enabling UOC fixes the problem.
>>
>
> I've managed to get at some values that look suspiciously like the UOC
> register space, and they seem to be OK.
>
> Here is the full description of what I did, so that google can slurp this
> information up for others to use in the future. First things first, the AMD
> Geode CS5536 Companion Device Data Book is a necessary resource, and tells us
> where the the UOC-related MSR lives, how big the UOC register space is, and
> what all the bits mean.
>
> Armed with this data, we first grab the UOC MSR:
>
> [root@hurricane ~]# modprobe msr
> [root@hurricane ~]# rdmsr 0x5120000B
> aefa06000
>
> We note the UOC base address value, which, according to the data book, is in
> bits 31:8 of the MSR. Using this information and a few more looks into the
> data book, we get the following kernel code:
>
> #define UOC_REGISTERS_ADDR 0xEFA06000
> #define UOC_REGISTERS_SIZE 0x0100
>
> struct uoc_registers {
> u32 cap;
> u32 mux;
> u32 reserved;
> u32 ctl;
> } __attribute__((packed));
>
> static void observe_uoc_registers(void)
> {
> struct uoc_registers __iomem *uoc;
>
> if (!request_mem_region(UOC_REGISTERS_ADDR,
> UOC_REGISTERS_SIZE,
> "amd5536uoc")) {
> printk(KERN_INFO "amd5536uoc mem region already used\n");
> goto finish;
> }
> uoc = ioremap_nocache(UOC_REGISTERS_ADDR,
> UOC_REGISTERS_SIZE);
> if (uoc == NULL) {
> printk(KERN_INFO "amd5536uoc io memory cannot be mapped\n");
> goto release;
> }
> printk(KERN_INFO "amd5536uoc cap=0x%08X\n", readl(&uoc->cap));
> printk(KERN_INFO "amd5536uoc mux=0x%08X\n", readl(&uoc->mux));
> printk(KERN_INFO "amd5536uoc ctl=0x%08X\n", readl(&uoc->ctl));
> iounmap(uoc);
> release:
> release_mem_region(UOC_REGISTERS_ADDR,
> UOC_REGISTERS_SIZE);
> finish:
> printk(KERN_INFO "amd5536uoc observation finished\n");
> }
>
> Put a call to observe_uoc_registers() at the top of udc_pci_probe(), rebuild,
> modprobe the amd5536udc driver, and observe the data. Namely, the values that
> came back, as well as their interpreted meanings according to the data book,
> are:
>
> cap=0x000083EA
> The host power control bits affect USB_PWR_EN1 for ports 1-3.
> The host power control bits affect USB_PWR_EN2 for port 4.
> Port power enabled with output of 1 on USB_PWR_EN1 and USB_PWR_EN2.
> No overcurrent reporting.
> Automatic pull-up (APU) is enabled.
>
> mux=0x00000003
> The muxed port is assigned to the device controller.
>
> ctl=0x00000083
> The muxed port is assigned to the device controller.
> The comparator circuitry for VBUS detection (PADEN) is enabled.
>
> The mystery deepens. :)
>
> -- Vadim Lobanov
>
>
Beside reading the MSR register for UOC_REGISTERS_ADDR
it should be possible to use BAR0 of UOC PCI config
space (if UOC enabled in BIOS).

The UOC registers look good to me too except for bit VBUSVLD
(bit 8 in UOCMUX). This bit should get set when cable is
connected. Maybe VBUS pin is not connected. In this case the
internal pull up will not get connected. For testing the pull up can
be connected manually by clearing APU in UOCCAP and
setting PUEN in UOCMUX.

There is one remaining bit which could prevent the physical USB
connection. It is bit SD (soft disconnect) in UDC DEVCTL
register (bit 10). This bit has to be cleared. This is done by
usb_gadget_register_driver() in amd5536udc.c. This function is
called by gadget driver like g_zero when gagdet driver loads.
Maybe you want add a printk() there to check that the bit was
cleared. Furthermore it could help to enable DEBUG and
#define UDC_VERBOSE in amd5536udc.c to have a closer
look at what UDC driver is doing.

But if SD bit is cleared (and UOC bits seem OK as you showed)
then only hardware can be the root cause of the issue. If host
detects a USB connection it will automatically send a
GET_DESCRIPTOR message to UDC. You would see kernel
messages on host side, at least messages indicating that device
is not responding.

I found a Hurricane LX800 manual at

http://www.lippert-at.de/fileadmin/lippert-at/products/EPIC/Hurricane-LX800/TME-EPIC-HURLX-R1V4.pdf

saying that there is no type B device port and device port is port 3.
Do you have a different one?

Chapter 3.8, page 19:
4 standard USB 2.0 host ports are provided at the I/O panel
of the Hurricane LX800. . . Port 3 can be used as USB device
port. This feature must be enabled in BIOS
under Motherboard Device Configuration -> PCI Configuration.
UDC must be enabled; Port 3 assignment should be set to device.

Thomas



2009-01-14 22:49:53

by Vadim Lobanov

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

On Wednesday 14 January 2009 04:43:40 Thomas Dahlmann wrote:
> Vadim Lobanov schrieb:
> > cap=0x000083EA
> > The host power control bits affect USB_PWR_EN1 for ports 1-3.
> > The host power control bits affect USB_PWR_EN2 for port 4.
> > Port power enabled with output of 1 on USB_PWR_EN1 and USB_PWR_EN2.
> > No overcurrent reporting.
> > Automatic pull-up (APU) is enabled.
> >
> > mux=0x00000003
> > The muxed port is assigned to the device controller.
> >
> > ctl=0x00000083
> > The muxed port is assigned to the device controller.
> > The comparator circuitry for VBUS detection (PADEN) is enabled.
>
> The UOC registers look good to me too except for bit VBUSVLD
> (bit 8 in UOCMUX). This bit should get set when cable is
> connected. Maybe VBUS pin is not connected. In this case the
> internal pull up will not get connected. For testing the pull up can
> be connected manually by clearing APU in UOCCAP and
> setting PUEN in UOCMUX.

I modprobed the driver before plugging in the cable, so that would explain the
VBUSVLD bit not being set.

> But if SD bit is cleared (and UOC bits seem OK as you showed)
> then only hardware can be the root cause of the issue. If host
> detects a USB connection it will automatically send a
> GET_DESCRIPTOR message to UDC. You would see kernel
> messages on host side, at least messages indicating that device
> is not responding.

It seems that the ultimate cause for all this strangeness was a mis-wired
board: the vendor at long last admitted that the VBUS signal is not connected
in the hardware. Why they didn't simply document this fact, and thereby save
me a lot of wasted effort, remains a mystery.

As a workaround, they suggested setting bit 0x00000010 in the ctl register,
which is strangely enough inside the "reserved" space according to the AMD
data book. Worth a shot, I thought. After hacking up the amd5536udc driver
even more to do this operation at load time, the register states change to:

cap=0x000083EA
mux=0x00000007
ctl=0x00000193

Seems that a few other "reserved" bits also raised in response. (I still do
not understand exactly how they wired up the CS5536 IO chip to be different
from the data book, but oh well.) In this mode, the USB link is finally
detected, and the board is finally able to enumerate itself as a g_zero
device.

Please accept my sincere thank-you for your help. I'm hoping that everything
will work from now on, and that I won't send any more questions your way. :)
I'll also await an update to the irq stuff in the amd5536udc driver that we
talked about earlier, if you get around to making the fix for it first.

> I found a Hurricane LX800 manual at
>
> http://www.lippert-at.de/fileadmin/lippert-at/products/EPIC/Hurricane-LX800
>/TME-EPIC-HURLX-R1V4.pdf
>
> saying that there is no type B device port and device port is port 3.
> Do you have a different one?

Yep, that's the one. We're using a USB A-to-B adapter to turn port 3 from an A
port to a B-port. The standard USB cable uses this adapter.

> Chapter 3.8, page 19:
> 4 standard USB 2.0 host ports are provided at the I/O panel
> of the Hurricane LX800. . . Port 3 can be used as USB device
> port. This feature must be enabled in BIOS
> under Motherboard Device Configuration -> PCI Configuration.
> UDC must be enabled; Port 3 assignment should be set to device.

See how they conveniently forgot to mention UOC in their description, much
less some magical bit in the ctl register? Tricky, tricky! :)

-- Vadim Lobanov

2009-01-15 09:26:28

by Thomas Dahlmann

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug


> It seems that the ultimate cause for all this strangeness was a mis-wired
> board: the vendor at long last admitted that the VBUS signal is not connected
> in the hardware. Why they didn't simply document this fact, and thereby save
> me a lot of wasted effort, remains a mystery.
>
> As a workaround, they suggested setting bit 0x00000010 in the ctl register,
> which is strangely enough inside the "reserved" space according to the AMD
> data book. Worth a shot, I thought. After hacking up the amd5536udc driver
> even more to do this operation at load time, the register states change to:
>
> cap=0x000083EA
> mux=0x00000007
> ctl=0x00000193
>
> Seems that a few other "reserved" bits also raised in response. (I still do
> not understand exactly how they wired up the CS5536 IO chip to be different
> from the data book, but oh well.) In this mode, the USB link is finally
> detected, and the board is finally able to enumerate itself as a g_zero
> device.
>
This reserved bit seems to be an alias to the earlier mentioned PUEN bit
which allows to connect the pull up manually regardless of VBUS.
You will find that bit at page 276 in

http://www.amd.com/files/connectivitysolutions/aufamily/au1200/32798e_Au1200_ds.pdf

If you compare the registers you will find that UDC controller of
Au1200 is 99.9% compatible with UDC of CS5536. Same for
CS5536 UOC and Au1200 OTG Controller. But OTG Controller of
Au1200 has more functionality and bits.

Unfortunately this bit is not the complete solution for your problem.
When setting PUE then connect and disconnect events cannot be
detected by UDC. If you disconnect and connect again you
probably will notice that there will be no new enumeration as
expected.
> Please accept my sincere thank-you for your help. I'm hoping that everything
> will work from now on, and that I won't send any more questions your way. :)
>
My pleasure! It is fun for me to continue this thread.
> I'll also await an update to the irq stuff in the amd5536udc driver that we
> talked about earlier, if you get around to making the fix for it first.
>
I will do when I have a new CS5536 board. Thanks for your
analysis again!

Thomas

2009-01-17 00:17:29

by Vadim Lobanov

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug

On Thursday 15 January 2009 01:26:04 Thomas Dahlmann wrote:
> > It seems that the ultimate cause for all this strangeness was a mis-wired
> > board: the vendor at long last admitted that the VBUS signal is not
> > connected in the hardware. Why they didn't simply document this fact, and
> > thereby save me a lot of wasted effort, remains a mystery.
> >
> > As a workaround, they suggested setting bit 0x00000010 in the ctl
> > register, which is strangely enough inside the "reserved" space according
> > to the AMD data book. Worth a shot, I thought. After hacking up the
> > amd5536udc driver even more to do this operation at load time, the
> > register states change to:
> >
> > cap=0x000083EA
> > mux=0x00000007
> > ctl=0x00000193
> >
> > Seems that a few other "reserved" bits also raised in response. (I still
> > do not understand exactly how they wired up the CS5536 IO chip to be
> > different from the data book, but oh well.) In this mode, the USB link is
> > finally detected, and the board is finally able to enumerate itself as a
> > g_zero device.
>
> This reserved bit seems to be an alias to the earlier mentioned PUEN bit
> which allows to connect the pull up manually regardless of VBUS.
> You will find that bit at page 276 in
>
> http://www.amd.com/files/connectivitysolutions/aufamily/au1200/32798e_Au120
>0_ds.pdf
>
> If you compare the registers you will find that UDC controller of
> Au1200 is 99.9% compatible with UDC of CS5536. Same for
> CS5536 UOC and Au1200 OTG Controller. But OTG Controller of
> Au1200 has more functionality and bits.
>
> Unfortunately this bit is not the complete solution for your problem.
> When setting PUE then connect and disconnect events cannot be
> detected by UDC. If you disconnect and connect again you
> probably will notice that there will be no new enumeration as
> expected.

Interestingly enough, when running in this configuration, unplugging and
replugging the cable does cause a re-enumeration. Hmmmm.

-- Vadim Lobanov

2009-01-19 12:24:18

by Thomas Dahlmann

[permalink] [raw]
Subject: Re: amd5536udc interrupts bug


>> Unfortunately this bit is not the complete solution for your problem.
>> When setting PUE then connect and disconnect events cannot be
>> detected by UDC. If you disconnect and connect again you
>> probably will notice that there will be no new enumeration as
>> expected.
>
> Interestingly enough, when running in this configuration, unplugging and
> replugging the cable does cause a re-enumeration. Hmmmm.
>
> -- Vadim Lobanov
>

Ah, I see. So PUE would be OK so far. I assume this worked as host
recognizes a disconnect/connect and performs a USB reset. USB reset
has almost the same effect as disconnect/connect on device side.

Thomas