2013-07-18 13:21:56

by Nestor Lopez Casado

Subject: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

This reverts commit 8af6c08830b1ae114d1a8b548b1f8b056e068887.

This patch re-adds the workaround introduced by 596264082f10dd4
which was reverted by 8af6c08830b1ae114.

The original patch 596264 was needed to overcome a situation where
the hid-core would drop incoming reports while probe() was being
executed.

This issue was solved by c849a6143bec520af which added
hid_device_io_start() and hid_device_io_stop() that enable a specific
hid driver to opt-in for input reports while its probe() is being
executed.

Commit a9dd22b730857347 modified hid-logitech-dj so as to use the
functionality added to hid-core. Having done that, workaround 596264
was no longer necessary and was reverted by 8af6c08.

We now encounter a different problem that ends up 'again' thwarting
the Unifying receiver enumeration. The problem is time and usb controller
dependent. Occasionally the reports sent to the usb receiver to start
the paired devices enumeration fail with -EPIPE and the receiver never
gets to enumerate the paired devices.

With dcd9006b1b053c7b1c the problem was "hidden" as the call to the usb
driver became asynchronous and no one was catching the error from the
failing URB.

As the root cause for this failing SET_REPORT is not understood yet
(possibly a race in the usb controller drivers or a problem with the
Unifying receiver), reintroducing this workaround solves the problem.

Overall what this workaround does is: If an input report from an
unknown device is received, then a (re)enumeration is performed.

related bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1194649

Signed-off-by: Nestor Lopez Casado <[email protected]>
---
drivers/hid/hid-logitech-dj.c | 45 +++++++++++++++++++++++++++++++++++++++++
drivers/hid/hid-logitech-dj.h | 1 +
2 files changed, 46 insertions(+)

diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
index db3192b..0d13389 100644
--- a/drivers/hid/hid-logitech-dj.c
+++ b/drivers/hid/hid-logitech-dj.c
@@ -192,6 +192,7 @@ static struct hid_ll_driver logi_dj_ll_driver;
static int logi_dj_output_hidraw_report(struct hid_device *hid, u8 * buf,
size_t count,
unsigned char report_type);
+static int logi_dj_recv_query_paired_devices(struct dj_receiver_dev *djrcv_dev);

static void logi_dj_recv_destroy_djhid_device(struct dj_receiver_dev *djrcv_dev,
struct dj_report *dj_report)
@@ -232,6 +233,7 @@ static void logi_dj_recv_add_djhid_device(struct dj_receiver_dev *djrcv_dev,
if (dj_report->report_params[DEVICE_PAIRED_PARAM_SPFUNCTION] &
SPFUNCTION_DEVICE_LIST_EMPTY) {
dbg_hid("%s: device list is empty\n", __func__);
+ djrcv_dev->querying_devices = false;
return;
}

@@ -242,6 +244,12 @@ static void logi_dj_recv_add_djhid_device(struct dj_receiver_dev *djrcv_dev,
return;
}

+ if (djrcv_dev->paired_dj_devices[dj_report->device_index]) {
+ /* The device is already known. No need to reallocate it. */
+ dbg_hid("%s: device is already known\n", __func__);
+ return;
+ }
+
dj_hiddev = hid_allocate_device();
if (IS_ERR(dj_hiddev)) {
dev_err(&djrcv_hdev->dev, "%s: hid_allocate_device failed\n",
@@ -305,6 +313,7 @@ static void delayedwork_callback(struct work_struct *work)
struct dj_report dj_report;
unsigned long flags;
int count;
+ int retval;

dbg_hid("%s\n", __func__);

@@ -337,6 +346,25 @@ static void delayedwork_callback(struct work_struct *work)
logi_dj_recv_destroy_djhid_device(djrcv_dev, &dj_report);
break;
default:
+ /* A normal report (i. e. not belonging to a pair/unpair notification)
+ * arriving here, means that the report arrived but we did not have a
+ * paired dj_device associated to the report's device_index, this
+ * means that the original "device paired" notification corresponding
+ * to this dj_device never arrived to this driver. The reason is that
+ * hid-core discards all packets coming from a device while probe() is
+ * executing. */
+ if (!djrcv_dev->paired_dj_devices[dj_report.device_index]) {
+ /* ok, we don't know the device, just re-ask the
+ * receiver for the list of connected devices. */
+ retval = logi_dj_recv_query_paired_devices(djrcv_dev);
+ if (!retval) {
+ /* everything went fine, so just leave */
+ break;
+ }
+ dev_err(&djrcv_dev->hdev->dev,
+ "%s:logi_dj_recv_query_paired_devices "
+ "error:%d\n", __func__, retval);
+ }
dbg_hid("%s: unexpected report type\n", __func__);
}
}
@@ -367,6 +395,12 @@ static void logi_dj_recv_forward_null_report(struct dj_receiver_dev *djrcv_dev,
if (!djdev) {
dbg_hid("djrcv_dev->paired_dj_devices[dj_report->device_index]"
" is NULL, index %d\n", dj_report->device_index);
+ kfifo_in(&djrcv_dev->notif_fifo, dj_report, sizeof(struct dj_report));
+
+ if (schedule_work(&djrcv_dev->work) == 0) {
+ dbg_hid("%s: did not schedule the work item, was already "
+ "queued\n", __func__);
+ }
return;
}

@@ -397,6 +431,12 @@ static void logi_dj_recv_forward_report(struct dj_receiver_dev *djrcv_dev,
if (dj_device == NULL) {
dbg_hid("djrcv_dev->paired_dj_devices[dj_report->device_index]"
" is NULL, index %d\n", dj_report->device_index);
+ kfifo_in(&djrcv_dev->notif_fifo, dj_report, sizeof(struct dj_report));
+
+ if (schedule_work(&djrcv_dev->work) == 0) {
+ dbg_hid("%s: did not schedule the work item, was already "
+ "queued\n", __func__);
+ }
return;
}

@@ -444,6 +484,10 @@ static int logi_dj_recv_query_paired_devices(struct dj_receiver_dev *djrcv_dev)
struct dj_report *dj_report;
int retval;

+ /* no need to protect djrcv_dev->querying_devices */
+ if (djrcv_dev->querying_devices)
+ return 0;
+
dj_report = kzalloc(sizeof(struct dj_report), GFP_KERNEL);
if (!dj_report)
return -ENOMEM;
@@ -455,6 +499,7 @@ static int logi_dj_recv_query_paired_devices(struct dj_receiver_dev *djrcv_dev)
return retval;
}

+
static int logi_dj_recv_switch_to_dj_mode(struct dj_receiver_dev *djrcv_dev,
unsigned timeout)
{
diff --git a/drivers/hid/hid-logitech-dj.h b/drivers/hid/hid-logitech-dj.h
index fd28a5e..4a40003 100644
--- a/drivers/hid/hid-logitech-dj.h
+++ b/drivers/hid/hid-logitech-dj.h
@@ -101,6 +101,7 @@ struct dj_receiver_dev {
struct work_struct work;
struct kfifo notif_fifo;
spinlock_t lock;
+ bool querying_devices;
};

struct dj_device {
--
1.7.9.5
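The control flow this patch restores (forward a report if its device index is known, otherwise ask the receiver to re-enumerate, and ignore "device paired" notifications for indices that already have a dj_device) can be sketched as a small userspace C model. This is only an illustration under stated assumptions: the struct, the helper names and the 1..6 index range are hypothetical simplifications, and the driver's kfifo/workqueue hand-off is collapsed into direct calls.

```c
#include <stdbool.h>

/* Hypothetical model of the workaround; not driver code. */
#define DEVICE_INDEX_MIN 1   /* assumed valid range of device_index */
#define DEVICE_INDEX_MAX 6

struct receiver_model {
	bool paired[DEVICE_INDEX_MAX + 1]; /* true once a dj_device exists */
	int enumerations_requested;        /* stands in for GET_PAIRED_DEVICES */
	int devices_allocated;             /* stands in for hid_allocate_device() */
};

static bool index_valid(int index)
{
	return index >= DEVICE_INDEX_MIN && index <= DEVICE_INDEX_MAX;
}

/* Stand-in for logi_dj_recv_query_paired_devices(): only counts requests. */
static int query_paired_devices(struct receiver_model *r)
{
	r->enumerations_requested++;
	return 0;
}

/* "Device paired" notification: allocate only for an unknown index. */
static void add_paired_device(struct receiver_model *r, int index)
{
	if (!index_valid(index) || r->paired[index])
		return; /* invalid, or already known: no reallocation */
	r->paired[index] = true;
	r->devices_allocated++;
}

/* Normal input report: returns true if it would be forwarded. */
static bool handle_input_report(struct receiver_model *r, int index)
{
	if (!index_valid(index))
		return false;
	if (r->paired[index])
		return true;
	/* Unknown device: its "paired" notification was missed, so ask the
	 * receiver for the list again. The report itself is not delivered. */
	(void)query_paired_devices(r);
	return false;
}
```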


2013-07-18 13:22:40

by Nestor Lopez Casado

Subject: [PATCH 2/2] HID: hid-logitech-dj, querying_devices was never set

Set the querying_devices flag to true when we start the enumeration
process.

This was missing from the original patch. It never produced
undesirable effects, as it is highly improbable that a second
enumeration is triggered while a first one is still in progress.

Signed-off-by: Nestor Lopez Casado <[email protected]>
---
drivers/hid/hid-logitech-dj.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
index 0d13389..d4657a5 100644
--- a/drivers/hid/hid-logitech-dj.c
+++ b/drivers/hid/hid-logitech-dj.c
@@ -488,6 +488,8 @@ static int logi_dj_recv_query_paired_devices(struct dj_receiver_dev *djrcv_dev)
if (djrcv_dev->querying_devices)
return 0;

+ djrcv_dev->querying_devices = true;
+
dj_report = kzalloc(sizeof(struct dj_report), GFP_KERNEL);
if (!dj_report)
return -ENOMEM;
--
1.7.9.5
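The effect of this one-line fix can be shown with a similar hypothetical model: if the flag is never set, the early return added in patch 1/2 never triggers and every call sends a request; once it is set, further calls are suppressed until the flag is cleared. In the patch, the only place that clears querying_devices is the SPFUNCTION_DEVICE_LIST_EMPTY branch of logi_dj_recv_add_djhid_device(); the names below are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical model of the querying_devices guard; not driver code. */
struct query_model {
	bool querying_devices;
	bool apply_patch_2;   /* false: patch 1/2 only; true: with patch 2/2 */
	int requests_sent;    /* simulated GET_PAIRED_DEVICES SET_REPORTs */
};

static int model_query_paired_devices(struct query_model *m)
{
	/* Same early return as patch 1/2 (unlocked, as in the patch). */
	if (m->querying_devices)
		return 0;
	if (m->apply_patch_2)
		m->querying_devices = true; /* the line patch 2/2 adds */
	m->requests_sent++;
	return 0;
}

/* Mirrors the SPFUNCTION_DEVICE_LIST_EMPTY branch, which clears the flag. */
static void model_device_list_empty(struct query_model *m)
{
	m->querying_devices = false;
}
```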

2013-07-18 20:28:09

by Peter Hurley

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

[ +cc Sarah Sharp, linux-usb ]

On 07/18/2013 09:21 AM, Nestor Lopez Casado wrote:
> This reverts commit 8af6c08830b1ae114d1a8b548b1f8b056e068887.
>
> This patch re-adds the workaround introduced by 596264082f10dd4
> which was reverted by 8af6c08830b1ae114.
>
> The original patch 596264 was needed to overcome a situation where
> the hid-core would drop incoming reports while probe() was being
> executed.
>
> This issue was solved by c849a6143bec520af which added
> hid_device_io_start() and hid_device_io_stop() that enable a specific
> hid driver to opt-in for input reports while its probe() is being
> executed.
>
> Commit a9dd22b730857347 modified hid-logitech-dj so as to use the
> functionality added to hid-core. Having done that, workaround 596264
> was no longer necessary and was reverted by 8af6c08.
>
> We now encounter a different problem that ends up 'again' thwarting
> the Unifying receiver enumeration. The problem is time and usb controller
> dependent. Occasionally the reports sent to the usb receiver to start
> the paired devices enumeration fail with -EPIPE and the receiver never
> gets to enumerate the paired devices.
>
> With dcd9006b1b053c7b1c the problem was "hidden" as the call to the usb
> driver became asynchronous and no one was catching the error from the
> failing URB.
>
> As the root cause for this failing SET_REPORT is not understood yet
> (possibly a race in the usb controller drivers or a problem with the
> Unifying receiver), reintroducing this workaround solves the problem.


Before we revert to using the workaround, I'd like to suggest that
this new "hidden" problem may be an interaction with the xhci_hcd host
controller driver only.

Looking at the related bug, the OP indicates the machine only has
USB3 ports. Additionally, comments #7, #100, and #104 of the original
bug report [1] add additional information that would seem to confirm
this suspicion.

Let me add that this USB device works for me on the uhci_hcd driver
on v3.10, with or without this workaround.


> Overall what this workaround does is: If an input report from an
> unknown device is received, then a (re)enumeration is performed.
>
> related bug:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1194649

I thought I saw someone reporting this problem recently on LKML;
where is the Reported-by so that Sarah can follow-up with the
reporter directly, if desired?

Regards,
Peter Hurley

[1]
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1039143



2013-07-18 22:09:56

by Sarah Sharp

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On Thu, Jul 18, 2013 at 04:28:01PM -0400, Peter Hurley wrote:
> [ +cc Sarah Sharp, linux-usb ]
>
> On 07/18/2013 09:21 AM, Nestor Lopez Casado wrote:
> >This reverts commit 8af6c08830b1ae114d1a8b548b1f8b056e068887.
> >
> >This patch re-adds the workaround introduced by 596264082f10dd4
> >which was reverted by 8af6c08830b1ae114.
> >
> >The original patch 596264 was needed to overcome a situation where
> >the hid-core would drop incoming reports while probe() was being
> >executed.
> >
> >This issue was solved by c849a6143bec520af which added
> >hid_device_io_start() and hid_device_io_stop() that enable a specific
> >hid driver to opt-in for input reports while its probe() is being
> >executed.
> >
> >Commit a9dd22b730857347 modified hid-logitech-dj so as to use the
> >functionality added to hid-core. Having done that, workaround 596264
> >was no longer necessary and was reverted by 8af6c08.
> >
> >We now encounter a different problem that ends up 'again' thwarting
> >the Unifying receiver enumeration. The problem is time and usb controller
> >dependent. Occasionally the reports sent to the usb receiver to start
> >the paired devices enumeration fail with -EPIPE and the receiver never
> >gets to enumerate the paired devices.
> >
> >With dcd9006b1b053c7b1c the problem was "hidden" as the call to the usb
> >driver became asynchronous and no one was catching the error from the
> >failing URB.
> >
> >As the root cause for this failing SET_REPORT is not understood yet
> >(possibly a race in the usb controller drivers or a problem with the
> >Unifying receiver), reintroducing this workaround solves the problem.
>
>
> Before we revert to using the workaround, I'd like to suggest that
> this new "hidden" problem may be an interaction with the xhci_hcd host
> controller driver only.
>
> Looking at the related bug, the OP indicates the machine only has
> USB3 ports. Additionally, comments #7, #100, and #104 of the original
> bug report [1] add additional information that would seem to confirm
> this suspicion.

Question: does this USB device need a control transfer to reset its
endpoints when the endpoints are not actually halted? If so, yes, that
is a known xHCI driver bug that needs to be fixed. The xHCI host will
not accept a Reset Endpoint command when the endpoints are not actually
halted, but the USB core will send the control transfer to reset the
endpoint. That means the device and host toggles will be out of sync,
and all messages will start to fail with -EPIPE.

Can the OP capture a usbmon trace when the device starts failing? That
will reveal whether this actually is the issue. dmesg output with
CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on would also
be helpful.

Sarah Sharp

2013-07-18 23:37:10

by Peter Hurley

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On 07/18/2013 06:09 PM, Sarah Sharp wrote:
> On Thu, Jul 18, 2013 at 04:28:01PM -0400, Peter Hurley wrote:
>> [ +cc Sarah Sharp, linux-usb ]
>>
>> On 07/18/2013 09:21 AM, Nestor Lopez Casado wrote:
>>> This reverts commit 8af6c08830b1ae114d1a8b548b1f8b056e068887.
>>>
>>> This patch re-adds the workaround introduced by 596264082f10dd4
>>> which was reverted by 8af6c08830b1ae114.
>>>
>>> The original patch 596264 was needed to overcome a situation where
>>> the hid-core would drop incoming reports while probe() was being
>>> executed.
>>>
>>> This issue was solved by c849a6143bec520af which added
>>> hid_device_io_start() and hid_device_io_stop() that enable a specific
>>> hid driver to opt-in for input reports while its probe() is being
>>> executed.
>>>
>>> Commit a9dd22b730857347 modified hid-logitech-dj so as to use the
>>> functionality added to hid-core. Having done that, workaround 596264
>>> was no longer necessary and was reverted by 8af6c08.
>>>
>>> We now encounter a different problem that ends up 'again' thwarting
>>> the Unifying receiver enumeration. The problem is time and usb controller
>>> dependent. Ocasionally the reports sent to the usb receiver to start
>>> the paired devices enumeration fail with -EPIPE and the receiver never
>>> gets to enumerate the paired devices.
>>>
>>> With dcd9006b1b053c7b1c the problem was "hidden" as the call to the usb
>>> driver became asynchronous and no one was catching the error from the
>>> failing URB.
>>>
>>> As the root cause for this failing SET_REPORT is not understood yet
>>> (possibly a race in the usb controller drivers or a problem with the
>>> Unifying receiver), reintroducing this workaround solves the problem.
>>
>>
>> Before we revert to using the workaround, I'd like to suggest that
>> this new "hidden" problem may be an interaction with the xhci_hcd host
>> controller driver only.
>>
>> Looking at the related bug, the OP indicates the machine only has
>> USB3 ports. Additionally, comments #7, #100, and #104 of the original
>> bug report [1] add additional information that would seem to confirm
>> this suspicion.
>
> Question: does this USB device need a control transfer to reset its
> endpoints when the endpoints are not actually halted? If so, yes, that
> is a known xHCI driver bug that needs to be fixed. The xHCI host will
> not accept a Reset Endpoint command when the endpoints are not actually
> halted, but the USB core will send the control transfer to reset the
> endpoint. That means the device and host toggles will be out of sync,
> and all messages will start to fail with -EPIPE.
>
> Can the OP capture a usbmon trace when the device starts failing? That
> will reveal whether this actually is the issue. dmesg output with
> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on would also
> be helpful.

Sarah,

I forwarded your usbmon capture request to the OP in the bug report
(I don't have an email address for the reporter).

As far as getting printk output from a custom kernel, I think that may
be beyond the reporter's capability. Perhaps one of the Ubuntu devs
triaging this bug could provide a test kernel for the OP with those
options on.

Joseph, would you be willing to do that?

Regards,
Peter Hurley

2013-07-19 08:36:08

by Benjamin Tissoires

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

Hi Peter,

thanks for forwarding this to the appropriate people & mailing list.

Hi Sarah,

thanks for starting to investigate this :)

On Fri, Jul 19, 2013 at 1:37 AM, Peter Hurley <[email protected]> wrote:
>>>
>>>
>>>
>>> Before we revert to using the workaround, I'd like to suggest that
>>> this new "hidden" problem may be an interaction with the xhci_hcd host
>>> controller driver only.
>>>
>>> Looking at the related bug, the OP indicates the machine only has
>>> USB3 ports. Additionally, comments #7, #100, and #104 of the original
>>> bug report [1] add additional information that would seem to confirm
>>> this suspicion.

Definitely, this is a USB3 problem. However, it is not generic (I
cannot reproduce it with my USB3 boards).

>>
>>
>> Question: does this USB device need a control transfer to reset its
>> endpoints when the endpoints are not actually halted? If so, yes, that
>> is a known xHCI driver bug that needs to be fixed. The xHCI host will
>> not accept a Reset Endpoint command when the endpoints are not actually
>> halted, but the USB core will send the control transfer to reset the
>> endpoint. That means the device and host toggles will be out of sync,
>> and all messages will start to fail with -EPIPE.
>>
>> Can the OP capture a usbmon trace when the device starts failing? That
>> will reveal whether this actually is the issue. dmesg output with
>> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on would also
>> be helpful.
>

Here is another linux-input thread where you have the usbmon traces:
http://www.spinics.net/lists/linux-input/msg26542.html
Wujun Zhou already tested one kernel patch for me (which did not
solve the problem, because my fix was not at the USB level), so I bet
he will be able to do some testing for you.

In the logs he posted (logitech_work.pcapng.gz), the interesting part
starts at capture #45:

#45: SET_REPORT request to switch the receiver to the "DJ" mode (the
receiver stops sending regular HID events, but goes into its
proprietary protocol)
#47: SET_REPORT response -> all good
#48: SET_REPORT request to ask the receiver to enumerate all of its
devices (sent right after we received the previous response)
#49: SET_REPORT response -> -EPIPE
#50: URB_INTERRUPT_IN (~3 seconds later) -> the device is working normally

The weird thing is that only the first enumeration message failed with
-EPIPE: the device answers later control transfers correctly (#54 /
#55).
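Transfers #45 and #48 are both HID class SET_REPORT control requests. As a sketch of what goes on the wire, the helpers below fill in the setup fields and a DJ "short" report body. The request values (bmRequestType 0x21, SET_REPORT 0x09, Output report type 2) come from the HID 1.11 specification; the DJ constants (report ID 0x20, command types 0x80/0x81, 15-byte length, receiver index 0xFF) are assumptions recalled from hid-logitech-dj.h and should be checked against the header. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* HID class request values (HID 1.11). */
#define HID_REQ_SET_REPORT   0x09
#define HID_OUTPUT_REPORT    0x02
#define USB_DIR_OUT_CLASS_IF 0x21   /* host-to-device, class, interface */

/* Assumed DJ protocol values; verify against hid-logitech-dj.h. */
#define REPORT_ID_DJ_SHORT                 0x20
#define REPORT_TYPE_CMD_SWITCH             0x80
#define REPORT_TYPE_CMD_GET_PAIRED_DEVICES 0x81
#define DJREPORT_SHORT_LENGTH              15
#define RECEIVER_INDEX                     0xFF

struct setup_packet {
	uint8_t  bmRequestType;
	uint8_t  bRequest;
	uint16_t wValue;   /* report type in the high byte, report ID in the low */
	uint16_t wIndex;   /* interface number */
	uint16_t wLength;
};

/* DJ short report: report id, device index, report type, then parameters. */
static void build_dj_short(uint8_t out[DJREPORT_SHORT_LENGTH], uint8_t type,
			   const uint8_t *params, size_t nparams)
{
	memset(out, 0, DJREPORT_SHORT_LENGTH);
	out[0] = REPORT_ID_DJ_SHORT;
	out[1] = RECEIVER_INDEX;
	out[2] = type;
	if (nparams > DJREPORT_SHORT_LENGTH - 3)
		nparams = DJREPORT_SHORT_LENGTH - 3;
	if (nparams)
		memcpy(&out[3], params, nparams);
}

/* SET_REPORT(Output, report_id) setup fields for the given interface. */
static struct setup_packet set_report_setup(uint8_t report_id,
					    uint16_t interface, uint16_t length)
{
	struct setup_packet s = {
		.bmRequestType = USB_DIR_OUT_CLASS_IF,
		.bRequest = HID_REQ_SET_REPORT,
		.wValue = (uint16_t)((HID_OUTPUT_REPORT << 8) | report_id),
		.wIndex = interface,
		.wLength = length,
	};
	return s;
}
```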

>
> Sarah,
>
> I forwarded your usbmon capture request to the OP in the bug report
> (I don't have an email address for the reporter).
>

Here is some other helpful information:
the first "fix" we made was dcd9006b1b053c7b1c. It is linked to
several bugs:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1072082
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1039143
https://bugzilla.redhat.com/show_bug.cgi?id=840391
https://bugzilla.kernel.org/show_bug.cgi?id=49781

Most of them are people complaining, but in one of the comments,
adding a 500ms wait between the two control transfers (switch to DJ +
enumerate) fixed the -EPIPE problem. I interpreted it as a scheduling
problem (a direct call to usb_control_msg() vs. the scheduled
usbhid_submit_message()), but it was just delaying the problem out
of the probe. Unfortunately, I missed that as I did not ask for the
usbmon traces at that time.

One last thing: I understand that Linus is also experiencing this
problem... Adding him in CC to let him know of the progress.

Cheers,
Benjamin

2013-07-19 14:39:16

by Joseph Salisbury

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On 07/18/2013 07:37 PM, Peter Hurley wrote:
> On 07/18/2013 06:09 PM, Sarah Sharp wrote:
>> On Thu, Jul 18, 2013 at 04:28:01PM -0400, Peter Hurley wrote:
>>> [ +cc Sarah Sharp, linux-usb ]
>>>
>>> On 07/18/2013 09:21 AM, Nestor Lopez Casado wrote:
>>>> This reverts commit 8af6c08830b1ae114d1a8b548b1f8b056e068887.
>>>>
>>>> This patch re-adds the workaround introduced by 596264082f10dd4
>>>> which was reverted by 8af6c08830b1ae114.
>>>>
>>>> The original patch 596264 was needed to overcome a situation where
>>>> the hid-core would drop incoming reports while probe() was being
>>>> executed.
>>>>
>>>> This issue was solved by c849a6143bec520af which added
>>>> hid_device_io_start() and hid_device_io_stop() that enable a specific
>>>> hid driver to opt-in for input reports while its probe() is being
>>>> executed.
>>>>
>>>> Commit a9dd22b730857347 modified hid-logitech-dj so as to use the
>>>> functionality added to hid-core. Having done that, workaround 596264
>>>> was no longer necessary and was reverted by 8af6c08.
>>>>
>>>> We now encounter a different problem that ends up 'again' thwarting
>>>> the Unifying receiver enumeration. The problem is time and usb controller
>>>> dependent. Occasionally the reports sent to the usb receiver to start
>>>> the paired devices enumeration fail with -EPIPE and the receiver never
>>>> gets to enumerate the paired devices.
>>>>
>>>> With dcd9006b1b053c7b1c the problem was "hidden" as the call to the usb
>>>> driver became asynchronous and no one was catching the error from the
>>>> failing URB.
>>>>
>>>> As the root cause for this failing SET_REPORT is not understood yet
>>>> (possibly a race in the usb controller drivers or a problem with the
>>>> Unifying receiver), reintroducing this workaround solves the problem.
>>>
>>>
>>> Before we revert to using the workaround, I'd like to suggest that
>>> this new "hidden" problem may be an interaction with the xhci_hcd host
>>> controller driver only.
>>>
>>> Looking at the related bug, the OP indicates the machine only has
>>> USB3 ports. Additionally, comments #7, #100, and #104 of the original
>>> bug report [1] add additional information that would seem to confirm
>>> this suspicion.
>>
>> Question: does this USB device need a control transfer to reset its
>> endpoints when the endpoints are not actually halted? If so, yes, that
>> is a known xHCI driver bug that needs to be fixed. The xHCI host will
>> not accept a Reset Endpoint command when the endpoints are not actually
>> halted, but the USB core will send the control transfer to reset the
>> endpoint. That means the device and host toggles will be out of sync,
>> and all messages will start to fail with -EPIPE.
>>
>> Can the OP capture a usbmon trace when the device starts failing? That
>> will reveal whether this actually is the issue. dmesg output with
>> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on would also
>> be helpful.
>
> Sarah,
>
> I forwarded your usbmon capture request to the OP in the bug report
> (I don't have an email address for the reporter).
>
> As far as getting printk output from a custom kernel, I think that may
> be beyond the reporter's capability. Perhaps one of the Ubuntu devs
> triaging this bug could provide a test kernel for the OP with those
> options on.
>
> Joseph, would you be willing to do that?

Sure thing. I'll build a kernel and request that the bug reporter
collect usbmon data.

Thanks everyone for the feedback!

>
> Regards,
> Peter Hurley

2013-07-19 15:14:13

by Alan Stern

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On Thu, 18 Jul 2013, Sarah Sharp wrote:

> Question: does this USB device need a control transfer to reset its
> endpoints when the endpoints are not actually halted? If so, yes, that
> is a known xHCI driver bug that needs to be fixed. The xHCI host will
> not accept a Reset Endpoint command when the endpoints are not actually
> halted, but the USB core will send the control transfer to reset the
> endpoint. That means the device and host toggles will be out of sync,
> and all messages will start to fail with -EPIPE.

Why -EPIPE? Isn't that code reserved to indicate a STALL?

In fact, there's no way to detect a toggle mismatch problem with a
USB-2 device. Packets with the wrong toggle value are simply ignored
(or acknowledged and ignored). The problem ends up appearing as an
indefinite delay (for an IN transfer) or as data lost (for an OUT
transfer).

I don't know what happens with USB-3 devices when the sequence numbers
get out of alignment.

Alan Stern
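Alan's point can be illustrated with a toy model of DATA0/DATA1 toggling on an OUT endpoint. It assumes the situation Sarah describes (the device's toggle is reset by a clear-halt request while the host side is not) and the USB 2.0 rule that a receiver ACKs and discards a packet whose toggle it did not expect. The names are hypothetical; this is not host-controller code.

```c
#include <stdbool.h>

/* Toy model of USB 2.0 data toggle on one OUT endpoint. */
struct toggle_state {
	int host_toggle;    /* toggle the host puts on its next packet */
	int device_toggle;  /* toggle the device expects next */
	int delivered;      /* packets whose data the device actually kept */
};

/* One OUT transaction. The device ACKs either way, so the host sees
 * success and flips its toggle; data is kept only if the toggles match. */
static bool out_transaction(struct toggle_state *s)
{
	bool accepted = (s->host_toggle == s->device_toggle);

	if (accepted) {
		s->delivered++;
		s->device_toggle ^= 1;
	}
	s->host_toggle ^= 1; /* host got an ACK: no error is reported */
	return accepted;
}

/* Clear-halt reaches the device (its toggle goes back to DATA0),
 * but the host-side endpoint state is not reset. */
static void clear_halt_device_only(struct toggle_state *s)
{
	s->device_toggle = 0;
}
```

Nothing in this sequence produces an error status on the host side: one packet is silently lost and the toggles then realign, which is why Alan questions attributing the -EPIPE to a toggle mismatch.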

2013-07-19 16:44:17

by Nestor Lopez Casado

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

Hi Sarah,

On Fri, Jul 19, 2013 at 12:09 AM, Sarah Sharp
<[email protected]> wrote:
> On Thu, Jul 18, 2013 at 04:28:01PM -0400, Peter Hurley wrote:
>> [ +cc Sarah Sharp, linux-usb ]
>>
>> On 07/18/2013 09:21 AM, Nestor Lopez Casado wrote:
>> >This reverts commit 8af6c08830b1ae114d1a8b548b1f8b056e068887.
>> >
>> >This patch re-adds the workaround introduced by 596264082f10dd4
>> >which was reverted by 8af6c08830b1ae114.
>> >
>> >The original patch 596264 was needed to overcome a situation where
>> >the hid-core would drop incoming reports while probe() was being
>> >executed.
>> >
>> >This issue was solved by c849a6143bec520af which added
>> >hid_device_io_start() and hid_device_io_stop() that enable a specific
>> >hid driver to opt-in for input reports while its probe() is being
>> >executed.
>> >
>> >Commit a9dd22b730857347 modified hid-logitech-dj so as to use the
>> >functionality added to hid-core. Having done that, workaround 596264
>> >was no longer necessary and was reverted by 8af6c08.
>> >
>> >We now encounter a different problem that ends up 'again' thwarting
>> >the Unifying receiver enumeration. The problem is time and usb controller
>> >dependent. Occasionally the reports sent to the usb receiver to start
>> >the paired devices enumeration fail with -EPIPE and the receiver never
>> >gets to enumerate the paired devices.
>> >
>> >With dcd9006b1b053c7b1c the problem was "hidden" as the call to the usb
>> >driver became asynchronous and no one was catching the error from the
>> >failing URB.
>> >
>> >As the root cause for this failing SET_REPORT is not understood yet
>> >(possibly a race in the usb controller drivers or a problem with the
>> >Unifying receiver), reintroducing this workaround solves the problem.
>>
>>
>> Before we revert to using the workaround, I'd like to suggest that
>> this new "hidden" problem may be an interaction with the xhci_hcd host
>> controller driver only.
>>
>> Looking at the related bug, the OP indicates the machine only has
>> USB3 ports. Additionally, comments #7, #100, and #104 of the original
>> bug report [1] add additional information that would seem to confirm
>> this suspicion.
>
> Question: does this USB device need a control transfer to reset its
> endpoints when the endpoints are not actually halted? If so, yes, that
> is a known xHCI driver bug that needs to be fixed. The xHCI host will
> not accept a Reset Endpoint command when the endpoints are not actually
> halted, but the USB core will send the control transfer to reset the
> endpoint. That means the device and host toggles will be out of sync,
> and all messages will start to fail with -EPIPE.

Could you rephrase your question, providing a bit more detail? I don't
quite get the idea.

>
> Can the OP capture a usbmon trace when the device starts failing? That
> will reveal whether this actually is the issue. dmesg output with
> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on would also
> be helpful.
>
> Sarah Sharp

Cheers,
Nestor
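Sarah's explanation of the toggle mismatch can be made concrete with a toy model. This is a hypothetical sketch, not xHCI or USB core code: `struct ep_state`, `transfer_ok()`, `reset_endpoint()` and `toggles_in_sync()` are made-up names. It only encodes the two facts she states: the control transfer always resets the device side, and the host refuses to reset an endpoint it does not consider halted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one endpoint: host and device each keep a data
 * toggle bit (0 = DATA0, 1 = DATA1) that must stay in step. */
struct ep_state {
	bool halted;     /* host-side view: is the endpoint halted? */
	int host_toggle;
	int dev_toggle;
};

/* A successful transfer advances both toggles together. */
static void transfer_ok(struct ep_state *ep)
{
	ep->host_toggle ^= 1;
	ep->dev_toggle ^= 1;
}

/* The control transfer that clears the halt always reaches the device,
 * which resets its toggle to DATA0. The host side is only reset when it
 * believes the endpoint is actually halted. */
static void reset_endpoint(struct ep_state *ep)
{
	ep->dev_toggle = 0;
	if (ep->halted) {
		ep->host_toggle = 0;
		ep->halted = false;
	}
}

static bool toggles_in_sync(const struct ep_state *ep)
{
	assert(ep->host_toggle == 0 || ep->host_toggle == 1);
	return ep->host_toggle == ep->dev_toggle;
}
```

Starting from a fresh endpoint, one `transfer_ok()` followed by `reset_endpoint()` on a non-halted endpoint leaves the host expecting DATA1 while the device sends DATA0. How the host then reports the failure (Sarah mentions -EPIPE) is not modelled here.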

2013-07-19 21:31:29

by Peter Hurley

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On 07/19/2013 10:38 AM, Joseph Salisbury wrote:
> On 07/18/2013 07:37 PM, Peter Hurley wrote:
>> On 07/18/2013 06:09 PM, Sarah Sharp wrote:
>>> On Thu, Jul 18, 2013 at 04:28:01PM -0400, Peter Hurley wrote:
>>>> [ +cc Sarah Sharp, linux-usb ]
>>>>
>>>> On 07/18/2013 09:21 AM, Nestor Lopez Casado wrote:
>>>>> This reverts commit 8af6c08830b1ae114d1a8b548b1f8b056e068887.
>>>>>
>>>>> This patch re-adds the workaround introduced by 596264082f10dd4
>>>>> which was reverted by 8af6c08830b1ae114.
>>>>>
>>>>> The original patch 596264 was needed to overcome a situation where
>>>>> the hid-core would drop incoming reports while probe() was being
>>>>> executed.
>>>>>
>>>>> This issue was solved by c849a6143bec520af which added
>>>>> hid_device_io_start() and hid_device_io_stop() that enable a specific
>>>>> hid driver to opt-in for input reports while its probe() is being
>>>>> executed.
>>>>>
>>>>> Commit a9dd22b730857347 modified hid-logitech-dj so as to use the
>>>>> functionality added to hid-core. Having done that, workaround 596264
>>>>> was no longer necessary and was reverted by 8af6c08.
>>>>>
>>>>> We now encounter a different problem that ends up 'again' thwarting
>>>>> the Unifying receiver enumeration. The problem is time and usb controller
>>>>> dependent. Occasionally the reports sent to the usb receiver to start
>>>>> the paired devices enumeration fail with -EPIPE and the receiver never
>>>>> gets to enumerate the paired devices.
>>>>>
>>>>> With dcd9006b1b053c7b1c the problem was "hidden" as the call to the usb
>>>>> driver became asynchronous and no one was catching the error from the
>>>>> failing URB.
>>>>>
>>>>> As the root cause for this failing SET_REPORT is not understood yet,
>>>>> -possibly a race on the usb controller drivers or a problem with the
>>>>> Unifying receiver- reintroducing this workaround solves the problem.
>>>>
>>>>
>>>> Before we revert to using the workaround, I'd like to suggest that
>>>> this new "hidden" problem may be an interaction with the xhci_hcd host
>>>> controller driver only.
>>>>
>>>> Looking at the related bug, the OP indicates the machine only has
>>>> USB3 ports. Additionally, comments #7, #100, and #104 of the original
>>>> bug report [1] add additional information that would seem to confirm
>>>> this suspicion.
>>>
>>> Question: does this USB device need a control transfer to reset its
>>> endpoints when the endpoints are not actually halted? If so, yes, that
>>> is a known xHCI driver bug that needs to be fixed. The xHCI host will
>>> not accept a Reset Endpoint command when the endpoints are not actually
>>> halted, but the USB core will send the control transfer to reset the
>>> endpoint. That means the device and host toggles will be out of sync,
>>> and all messages will start to fail with -EPIPE.
>>>
>>> Can the OP capture a usbmon trace when the device starts failing? That
>>> will reveal whether this actually is the issue. dmesg output with
>>> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on would also
>>> be helpful.
>>
>> Sarah,
>>
>> I forwarded your usbmon capture request to the OP in the bug report
>> (I don't have an email address for the reporter).
>>
>> As far as getting printk output from a custom kernel, I think that may
>> be beyond the reporter's capability. Perhaps one of the Ubuntu devs
>> triaging this bug could provide a test kernel for the OP with those
>> options on.
>>
>> Joseph, would you be willing to do that?
>
> Sure thing. I'll build a kernel and request that the bug reporter
> collect usbmon data.

Thanks Joseph for building the test kernel and getting it to the reporter!

Sarah,

I've attached the dmesg capture supplied by the original reporter on
a 3.10 custom kernel w/ the kbuild options you requested.

It seems as if your initial suspicion is correct:

[ 46.785490] xhci_hcd 0000:00:14.0: Endpoint 0x81 not halted, refusing to reset.
[ 46.785493] xhci_hcd 0000:00:14.0: Endpoint 0x82 not halted, refusing to reset.
[ 46.785496] xhci_hcd 0000:00:14.0: Endpoint 0x83 not halted, refusing to reset.
[ 46.785952] xhci_hcd 0000:00:14.0: Waiting for status stage event

At this point, would you recommend proceeding with the workaround or
waiting for an xHCI bug fix?

Regards,
Peter Hurley
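For readers decoding the xhci_hcd lines above: the value after "Endpoint" is a USB bEndpointAddress, where bit 7 gives the direction and the low four bits give the endpoint number. A small sketch of that decoding (the helper names are illustrative only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* bEndpointAddress layout from the USB specification:
 * bit 7 = direction (1 = IN, device-to-host), bits 3..0 = endpoint number. */
#define EP_DIR_IN   0x80u
#define EP_NUM_MASK 0x0fu

static bool ep_is_in(unsigned int addr)
{
	return (addr & EP_DIR_IN) != 0;
}

static unsigned int ep_number(unsigned int addr)
{
	return addr & EP_NUM_MASK;
}

static void describe_ep(unsigned int addr)
{
	assert(addr <= 0xffu); /* the field is a single byte */
	printf("0x%02x: EP%u %s\n", addr, ep_number(addr),
	       ep_is_in(addr) ? "IN" : "OUT");
}
```

Applied to the three addresses in the log, `describe_ep(0x81)` prints `0x81: EP1 IN`, and likewise EP2 IN and EP3 IN, so all three refusals concern device-to-host endpoints, the direction in which HID input reports arrive.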


Attachments:
dmesg.logitech_bug.reported_20130719 (79.53 kB)

2013-07-22 11:44:43

by Jiri Kosina

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On Fri, 19 Jul 2013, Peter Hurley wrote:

> > > As far as getting printk output from a custom kernel, I think that may
> > > be beyond the reporter's capability. Perhaps one of the Ubuntu devs
> > > triaging this bug could provide a test kernel for the OP with those
> > > options on.
> > >
> > > Joseph, would you be willing to do that?
> >
> > Sure thing. I'll build a kernel and request that the bug reporter
> > collect usbmon data.
>
> Thanks Joseph for building the test kernel and getting it to the reporter!
>
> Sarah,
>
> I've attached the dmesg capture supplied by the original reporter on
> a 3.10 custom kernel w/ the kbuild options you requested.
>
> It seems as if your initial suspicion is correct:
>
> [ 46.785490] xhci_hcd 0000:00:14.0: Endpoint 0x81 not halted, refusing to reset.
> [ 46.785493] xhci_hcd 0000:00:14.0: Endpoint 0x82 not halted, refusing to reset.
> [ 46.785496] xhci_hcd 0000:00:14.0: Endpoint 0x83 not halted, refusing to reset.
> [ 46.785952] xhci_hcd 0000:00:14.0: Waiting for status stage event
>
> At this point, would you recommend proceeding with the workaround or
> waiting for an xHCI bug fix?

Thanks for your efforts.

Seems like this might be a part of the picture, but not a complete one.
Linus claims to have similar problem, but his receiver is not connected
through xHCI (I got this as an off-list report, so can't really provide a
pointer to ML archives).

Thanks,

--
Jiri Kosina
SUSE Labs

2013-07-22 14:03:56

by Peter Hurley

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On 07/22/2013 07:44 AM, Jiri Kosina wrote:
> On Fri, 19 Jul 2013, Peter Hurley wrote:
>
>>>> As far as getting printk output from a custom kernel, I think that may
>>>> be beyond the reporter's capability. Perhaps one of the Ubuntu devs
>>>> triaging this bug could provide a test kernel for the OP with those
>>>> options on.
>>>>
>>>> Joseph, would you be willing to do that?
>>>
>>> Sure thing. I'll build a kernel and request that the bug reporter
>>> collect usbmon data.
>>
>> Thanks Joseph for building the test kernel and getting it to the reporter!
>>
>> Sarah,
>>
>> I've attached the dmesg capture supplied by the original reporter on
>> a 3.10 custom kernel w/ the kbuild options you requested.
>>
>> It seems as if your initial suspicion is correct:
>>
>> [ 46.785490] xhci_hcd 0000:00:14.0: Endpoint 0x81 not halted, refusing to reset.
>> [ 46.785493] xhci_hcd 0000:00:14.0: Endpoint 0x82 not halted, refusing to reset.
>> [ 46.785496] xhci_hcd 0000:00:14.0: Endpoint 0x83 not halted, refusing to reset.
>> [ 46.785952] xhci_hcd 0000:00:14.0: Waiting for status stage event
>>
>> At this point, would you recommend proceeding with the workaround or
>> waiting for an xHCI bug fix?
>
> Thanks for your efforts.
>
> Seems like this might be a part of the picture, but not a complete one.
> Linus claims to have similar problem, but his receiver is not connected
> through xHCI (I got this as an off-list report, so can't really provide a
> pointer to ML archives).

Ah, ok. I wasn't aware of that. I'll assume then that the necessary
people have the complete picture.

Regards,
Peter Hurley

2013-07-22 14:35:32

by Jiri Kosina

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On Thu, 18 Jul 2013, Nestor Lopez Casado wrote:

> This reverts commit 8af6c08830b1ae114d1a8b548b1f8b056e068887.
>
> This patch re-adds the workaround introduced by 596264082f10dd4
> which was reverted by 8af6c08830b1ae114.
>
> The original patch 596264 was needed to overcome a situation where
> the hid-core would drop incoming reports while probe() was being
> executed.
>
> This issue was solved by c849a6143bec520af which added
> hid_device_io_start() and hid_device_io_stop() that enable a specific
> hid driver to opt-in for input reports while its probe() is being
> executed.
>
> Commit a9dd22b730857347 modified hid-logitech-dj so as to use the
> functionality added to hid-core. Having done that, workaround 596264
> was no longer necessary and was reverted by 8af6c08.
>
> We now encounter a different problem that ends up 'again' thwarting
> the Unifying receiver enumeration. The problem is time and usb controller
> dependent. Occasionally the reports sent to the usb receiver to start
> the paired devices enumeration fail with -EPIPE and the receiver never
> gets to enumerate the paired devices.
[ ... snip ... ]

Ok, this is now in

git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid.git for-3.11/logitech-enumeration-fix

Linus added to CC.

Linus -- could you please by any chance test whether the two patches in
that branch make the problem you are observing any better? (and no, this
is not a pull request yet).

It's still not clear whether we are chasing two different issues here, or
not.

Thanks,

--
Jiri Kosina
SUSE Labs

2013-07-22 15:27:46

by Alan Stern

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On Mon, 22 Jul 2013, Peter Hurley wrote:

> On 07/22/2013 07:44 AM, Jiri Kosina wrote:
...
> > Seems like this might be a part of the picture, but not a complete one.
> > Linus claims to have similar problem, but his receiver is not connected
> > through xHCI (I got this as an off-list report, so can't really provide a
> > pointer to ML archives).
>
> Ah, ok. I wasn't aware of that. I'll assume then that the necessary
> people have the complete picture.

It might help to get a usbmon trace and dmesg log from Linus, if he
still has this problem.

Alan Stern

2013-07-22 19:21:14

by Linus Torvalds

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On Mon, Jul 22, 2013 at 7:35 AM, Jiri Kosina <[email protected]> wrote:
>
> Linus -- could you please by any chance test whether the two patches in
> that branch make the problem you are observing any better? (and no, this
> is not a pull request yet).
>
> It's still not clear whether we are chasing two different issues here, or
> not.

I think it's two different issues. It sounds like the USB3 one is
fairly repeatable?

Mine is quite rare. I think it's so far happened just once during the
3.11 merge window, and that's despite me rebooting usually a few
times a day (less now that things are calming down).

On Mon, Jul 22, 2013 at 8:27 AM, Alan Stern <[email protected]> wrote:
>
> It might help to get a usbmon trace and dmesg log from Linus, if he
> still has this problem.

So see above. I've sent the dmesg - not that it's useful, since it
just lacks the lines showing the actual wireless device after plugging
it in (but the receiver is recognized). No error messages, just not
any working keyboard.

And it's not common. I think it has happened a handful of times over
the last four months or so.

Linus

2013-08-01 10:09:48

by Benjamin Tissoires

Subject: Re: [PATCH 2/2] HID: hid-logitech-dj, querying_devices was never set

On Thu, Jul 18, 2013 at 3:21 PM, Nestor Lopez Casado
<[email protected]> wrote:
> Set querying_devices flag to true when we start the enumeration
> process.
>
> This was missing from the original patch. It never produced
> undesirable effects as it is highly improbable to have a second
> enumeration triggered while a first one was still in progress.
>
> Signed-off-by: Nestor Lopez Casado <[email protected]>
> ---
> drivers/hid/hid-logitech-dj.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
> index 0d13389..d4657a5 100644
> --- a/drivers/hid/hid-logitech-dj.c
> +++ b/drivers/hid/hid-logitech-dj.c
> @@ -488,6 +488,8 @@ static int logi_dj_recv_query_paired_devices(struct dj_receiver_dev *djrcv_dev)
> if (djrcv_dev->querying_devices)
> return 0;
>
> + djrcv_dev->querying_devices = true;
> +

Unfortunately, this breaks the fallback mechanism :(
We tried to add the two patches in Fedora [1], but this doesn't fix
the bug because the driver actually thinks that it already asked for
the enumeration, but as we get the -EPIPE error, the request was never
sent.

So, Jiri, if you were to submit that series to Linus (or Greg) for
fixing the bug, please just drop this second patch.

Cheers,
Benjamin

[1] https://bugzilla.redhat.com/show_bug.cgi?id=989138

> dj_report = kzalloc(sizeof(struct dj_report), GFP_KERNEL);
> if (!dj_report)
> return -ENOMEM;
> --
> 1.7.9.5
>
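The fallback breakage Benjamin describes can be reproduced in a small userspace model. This is a hypothetical sketch, not driver code: `struct receiver`, `link_broken`, `requests_sent`, `send_enumeration_request()` and `on_pairing_info_event()` are made-up names; only the early return on `querying_devices` and the added `querying_devices = true` line mirror the hunk above. The assumption is that the failing transfer is invisible to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the receiver state, not driver code. */
struct receiver {
	bool querying_devices;
	bool link_broken;  /* assumption: simulates the transfer failing with -EPIPE */
	int requests_sent; /* requests that actually reached the receiver */
};

/* Stand-in for the transport: a failure is not reported to the caller. */
static void send_enumeration_request(struct receiver *r)
{
	if (r->link_broken)
		return; /* -EPIPE occurs, but nobody sees it */
	r->requests_sent++;
}

static int query_paired_devices(struct receiver *r)
{
	assert(r);
	if (r->querying_devices)
		return 0; /* request silently dropped */

	r->querying_devices = true; /* the line added by patch 2/2 */
	send_enumeration_request(r);
	return 0;
}

/* The flag is only cleared when a pairing-information event arrives,
 * which requires an earlier request to have reached the receiver. */
static void on_pairing_info_event(struct receiver *r)
{
	r->querying_devices = false;
}
```

In the model, if the first request is lost, every later call (including the one triggered by a report from an unknown device) returns early, because the only thing that clears the flag is a reply that never comes.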

2013-08-02 01:11:39

by Jiri Kosina

Subject: Re: [PATCH 2/2] HID: hid-logitech-dj, querying_devices was never set

On Thu, 1 Aug 2013, Benjamin Tissoires wrote:

> > Set querying_devices flag to true when we start the enumeration
> > process.
> >
> > This was missing from the original patch. It never produced
> > undesirable effects as it is highly improbable to have a second
> > enumeration triggered while a first one was still in progress.
> >
> > Signed-off-by: Nestor Lopez Casado <[email protected]>
> > ---
> > drivers/hid/hid-logitech-dj.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
> > index 0d13389..d4657a5 100644
> > --- a/drivers/hid/hid-logitech-dj.c
> > +++ b/drivers/hid/hid-logitech-dj.c
> > @@ -488,6 +488,8 @@ static int logi_dj_recv_query_paired_devices(struct dj_receiver_dev *djrcv_dev)
> > if (djrcv_dev->querying_devices)
> > return 0;
> >
> > + djrcv_dev->querying_devices = true;
> > +
>
> Unfortunately, this breaks the fallback mechanism :(
> We tried to add the two patches in Fedora [1], but this doesn't fix
> the bug because the driver actually thinks that it already asked for
> the enumeration, but as we get the -EPIPE error, the request was never
> sent.
>
> So, Jiri, if you were to submit that series to Linus (or Greg) for
> fixing the bug, please just drop this second patch.

It's already on its way to Linus (he hasn't pulled yet though) ... which
is not a big deal per se, I can always push a revert, but I have to admit
I don't understand the breakage it is causing at all.

Could you please elaborate? (and put an elaborate description in the revert
commit log perhaps?)

Thanks,

--
Jiri Kosina
SUSE Labs

2013-08-02 18:32:04

by Benjamin Tissoires

Subject: Re: [PATCH 2/2] HID: hid-logitech-dj, querying_devices was never set

On Fri, Aug 2, 2013 at 3:11 AM, Jiri Kosina <[email protected]> wrote:
> On Thu, 1 Aug 2013, Benjamin Tissoires wrote:
>
>> > Set querying_devices flag to true when we start the enumeration
>> > process.
>> >
>> > This was missing from the original patch. It never produced
>> > undesirable effects as it is highly improbable to have a second
>> > enumeration triggered while a first one was still in progress.
>> >
>> > Signed-off-by: Nestor Lopez Casado <[email protected]>
>> > ---
>> > drivers/hid/hid-logitech-dj.c | 2 ++
>> > 1 file changed, 2 insertions(+)
>> >
>> > diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
>> > index 0d13389..d4657a5 100644
>> > --- a/drivers/hid/hid-logitech-dj.c
>> > +++ b/drivers/hid/hid-logitech-dj.c
>> > @@ -488,6 +488,8 @@ static int logi_dj_recv_query_paired_devices(struct dj_receiver_dev *djrcv_dev)
>> > if (djrcv_dev->querying_devices)
>> > return 0;
>> >
>> > + djrcv_dev->querying_devices = true;
>> > +
>>
>> Unfortunately, this breaks the fallback mechanism :(
>> We tried to add the two patches in Fedora [1], but this doesn't fix
>> the bug because the driver actually thinks that it already asked for
>> the enumeration, but as we get the -EPIPE error, the request was never
>> sent.
>>
>> So, Jiri, if you were to submit that series to Linus (or Greg) for
>> fixing the bug, please just drop this second patch.
>
> It's already on its way to Linus (he hasn't pulled yet though) ... which
> is not a big deal per se, I can always push a revert, but I have to admit
> I don't understand the breakage it is causing at all.
>
> Could you please elaborate? (and put an elaborate description in the revert
> commit log perhaps?)

Sure, so here is the revert commit log:

--

Commit "HID: hid-logitech-dj, querying_devices was never set" activates
a flag which guarantees that we do not ask the receiver for too many
enumerations. When the flag is set, each following enumeration call is
discarded (the usb request is not forwarded to the receiver). The flag
is then released when the driver receives a pairing information event,
which normally follows the enumeration request.
However, the USB3 bug makes the driver think the enumeration request
has been forwarded to the receiver, when it actually has not, because
the USB stack returns -EPIPE. So, when a new unknown device appears,
the workaround of asking for a new enumeration no longer works: this
new enumeration is discarded because of the flag, which is never reset.

A solution could be to trigger a timeout before releasing it, but for
now, let's just revert the patch.

--

Does that make it more understandable? I'm sorry I was not clear last
time, I was trying to catch this up between two sessions at GUADEC.

Cheers,
Benjamin
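Benjamin's alternative ("trigger a timeout before releasing it") could look roughly like the following userspace sketch. Everything here is an assumption for illustration: the 500 ms value, the `query_started_ms` field and the explicit `now_ms` parameter are hypothetical, and the thread does not settle on any particular design.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timeout; the thread does not propose a value. */
#define QUERY_TIMEOUT_MS 500UL

struct receiver {
	bool querying_devices;
	unsigned long query_started_ms;
	int requests_sent; /* stand-in for requests handed to the transport */
};

/* A request counts as in flight only until the timeout expires.
 * Unsigned subtraction keeps the comparison correct across counter
 * wrap-around. */
static bool query_in_flight(const struct receiver *r, unsigned long now_ms)
{
	return r->querying_devices &&
	       now_ms - r->query_started_ms < QUERY_TIMEOUT_MS;
}

static void query_paired_devices(struct receiver *r, unsigned long now_ms)
{
	assert(r);
	if (query_in_flight(r, now_ms))
		return; /* a recent request may still be answered */

	r->querying_devices = true;
	r->query_started_ms = now_ms;
	r->requests_sent++;
}

static void on_pairing_info_event(struct receiver *r)
{
	r->querying_devices = false;
}
```

Duplicate requests are still suppressed while a reply could plausibly arrive, but a lost request no longer blocks enumeration forever. Inside the kernel, jiffies with `time_after()` or a delayed work item would be the usual tools for this kind of expiry.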

2013-08-05 13:22:20

by Jiri Kosina

Subject: Re: [PATCH 2/2] HID: hid-logitech-dj, querying_devices was never set

On Fri, 2 Aug 2013, Benjamin Tissoires wrote:

> > Could you please elaborate? (and put an elaborate description in the revert
> > commit log perhaps?)
>
> Sure, so here is the revert commit log:
>
> --
>
> Commit "HID: hid-logitech-dj, querying_devices was never set" activates
> a flag which guarantees that we do not ask the receiver for too many
> enumerations. When the flag is set, each following enumeration call is
> discarded (the usb request is not forwarded to the receiver). The flag
> is then released when the driver receives a pairing information event,
> which normally follows the enumeration request.
> However, the USB3 bug makes the driver think the enumeration request
> has been forwarded to the receiver, when it actually has not, because
> the USB stack returns -EPIPE. So, when a new unknown device appears,
> the workaround of asking for a new enumeration no longer works: this
> new enumeration is discarded because of the flag, which is never reset.
>
> A solution could be to trigger a timeout before releasing it, but for
> now, let's just revert the patch.
>
> --

Thanks Benjamin.

I'd like to have a bit more clarification about the USB3 bug, as this
whole issue is not completely clear to me.

To be more specific -- when exactly do we receive -EPIPE, why do we
receive it and why do we not handle it properly?

Thanks again,

--
Jiri Kosina
SUSE Labs

2013-08-05 14:40:46

by Benjamin Tissoires

Subject: Re: [PATCH 2/2] HID: hid-logitech-dj, querying_devices was never set

On Mon, Aug 5, 2013 at 3:22 PM, Jiri Kosina <[email protected]> wrote:
> On Fri, 2 Aug 2013, Benjamin Tissoires wrote:
>
>> > Could you please elaborate? (and put an elaborate description in the revert
>> > commit log perhaps?)
>>
>> Sure, so here is the revert commit log:
>>
>> --
>>
>> Commit "HID: hid-logitech-dj, querying_devices was never set" activates
>> a flag which guarantees that we do not ask the receiver for too many
>> enumerations. When the flag is set, each following enumeration call is
>> discarded (the usb request is not forwarded to the receiver). The flag
>> is then released when the driver receives a pairing information event,
>> which normally follows the enumeration request.
>> However, the USB3 bug makes the driver think the enumeration request
>> has been forwarded to the receiver, when it actually has not, because
>> the USB stack returns -EPIPE. So, when a new unknown device appears,
>> the workaround of asking for a new enumeration no longer works: this
>> new enumeration is discarded because of the flag, which is never reset.
>>
>> A solution could be to trigger a timeout before releasing it, but for
>> now, let's just revert the patch.
>>
>> --
>
> Thanks Benjamin.
>
> I'd like to have a bit more clarification about the USB3 bug, as this
> whole issue is not completely clear to me.
>
> To be more specific -- when exactly do we receive -EPIPE, why do we
> receive it and why do we not handle it properly?

Sure, I'll try (though the more I think of it, the more it seems
blurry to me :( ).

So the initial probe function in hid-logitech-dj was implemented by
using a direct call to hid_output_raw_report(). This call was
synchronous, so we did get the -EPIPE return code. Then, the probe()
function returned the -EPIPE error, cleaning up the receiver and
unregistering it from the hid bus.

However, now, we use hid_hw_request(), which is asynchronous (from
what I can read). At least, this code returns "void" as the set_report
command seems to be scheduled for later handling. In usbhid, when the
queue is flushed, I did not find a way to retrieve the error code...

So basically, the -EPIPE is received in usbhid_restart_ctrl_queue(),
but nothing notifies hid-logitech-dj of the error. In the end, the
probe() function returns without error code, but the receiver never
received the notification.

Cheers,
Benjamin
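The difference Benjamin points out between the old synchronous call and the queued request can be sketched in plain C. This is a hypothetical model, not the usbhid code: `transport_set_report()`, `struct ctrl_queue`, `run_queue()` and the `on_complete` hook are made-up names, and the transport is hard-wired to fail with -EPIPE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport whose SET_REPORT always fails with -EPIPE. */
static int transport_set_report(void)
{
	return -EPIPE;
}

/* Synchronous style: the error reaches the caller, so probe() can fail. */
static int send_sync(void)
{
	return transport_set_report();
}

/* Asynchronous style: submission returns void and the transfer runs
 * later. Without a way to hand the status back, it is lost. */
typedef void (*completion_fn)(int status, void *ctx);

struct ctrl_queue {
	int pending;
	completion_fn on_complete; /* hypothetical hook, may be NULL */
	void *ctx;
};

static void submit_async(struct ctrl_queue *q)
{
	q->pending++;
}

static void run_queue(struct ctrl_queue *q)
{
	assert(q != NULL);
	while (q->pending > 0) {
		int status = transport_set_report();

		q->pending--;
		if (q->on_complete)
			q->on_complete(status, q->ctx);
		/* otherwise the status is simply dropped here */
	}
}

/* Example completion hook: store the status for the submitter. */
static void record_status(int status, void *ctx)
{
	*(int *)ctx = status;
}
```

With no hook, the -EPIPE is consumed inside `run_queue()` and the submitter carries on as if the request had been delivered, which matches the symptom of probe() succeeding without the receiver ever being asked. A completion hook is just one way to surface the status; it is not what usbhid provides.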

2013-08-06 21:13:51

by Sune Mølgaard

Subject: Re: [PATCH 2/2] HID: hid-logitech-dj, querying_devices was never set

Being affected by this bug, I can confirm that Linux 3.11-rc4 still
exhibits the unwanted behaviour for me, but that commenting out the
single line from the second patch makes it work.

Thus, for requesting a revert on that line, you are most welcome to put
me down as a "Tested-By".

Best regards,

Sune Mølgaard

Benjamin Tissoires wrote:
> On Mon, Aug 5, 2013 at 3:22 PM, Jiri Kosina <[email protected]> wrote:
>> On Fri, 2 Aug 2013, Benjamin Tissoires wrote:
>>
>>>> Could you please elaborate? (and put an elaborate description in the revert
>>>> commit log perhaps?)
>>>
>>> Sure, so here is the revert commit log:
>>>
>>> --
>>>
>>> Commit "HID: hid-logitech-dj, querying_devices was never set" activates
>>> a flag which guarantees that we do not ask the receiver for too many
>>> enumerations. When the flag is set, each following enumeration call is
>>> discarded (the usb request is not forwarded to the receiver). The flag
>>> is then released when the driver receives a pairing information event,
>>> which normally follows the enumeration request.
>>> However, the USB3 bug makes the driver think the enumeration request
>>> has been forwarded to the receiver, when it actually has not, because
>>> the USB stack returns -EPIPE. So, when a new unknown device appears,
>>> the workaround of asking for a new enumeration no longer works: this
>>> new enumeration is discarded because of the flag, which is never reset.
>>>
>>> A solution could be to trigger a timeout before releasing it, but for
>>> now, let's just revert the patch.
>>>
>>> --
>>
>> Thanks Benjamin.
>>
>> I'd like to have a bit more clarification about the USB3 bug, as this
>> whole issue is not completely clear to me.
>>
>> To be more specific -- when exactly do we receive -EPIPE, why do we
>> receive it and why do we not handle it properly?
>
> Sure, I'll try (though the more I think of it, the more it seems
> blurry to me :( ).
>
> So the initial probe function in hid-logitech-dj was implemented by
> using a direct call to hid_output_raw_report(). This call was
> synchronous, so we did get the -EPIPE return code. Then, the probe()
> function returned the -EPIPE error, cleaning up the receiver and
> unregistering it from the hid bus.
>
> However, now, we use hid_hw_request(), which is asynchronous (from
> what I can read). At least, this code returns "void" as the set_report
> command seems to be scheduled for later handling. In usbhid, when the
> queue is flushed, I did not find a way to retrieve the error code...
>
> So basically, the -EPIPE is received in usbhid_restart_ctrl_queue(),
> but nothing notifies hid-logitech-dj of the error. In the end, the
> probe() function returns without error code, but the receiver never
> received the notification.
>
> Cheers,
> Benjamin


--
"Testiculos habet et bene pendentes"
See http://www.newint.org/features/1993/06/05/curious/

2013-08-09 09:36:27

by Jiri Kosina

Subject: Re: [PATCH 2/2] HID: hid-logitech-dj, querying_devices was never set

On Tue, 6 Aug 2013, Sune Mølgaard wrote:

> Being affected by this bug, I can confirm that Linux 3.11-rc4 still
> exhibits the unwanted behaviour for me, but that commenting out the
> single line from the second patch makes it work.
>
> Thus, for requesting a revert on that line, you are most welcome to put
> me down as a "Tested-By".

Okay, this is now queued, and I will be pushing it to Linus.

Thanks,

--
Jiri Kosina
SUSE Labs

2013-08-10 17:21:36

by Jiri Kosina

Subject: Re: [PATCH 2/2] HID: hid-logitech-dj, querying_devices was never set

On Tue, 6 Aug 2013, Sune Mølgaard wrote:

> Being affected by this bug, I can confirm that Linux 3.11-rc4 still
> exhibits the unwanted behaviour for me, but that commenting out the
> single line from the second patch makes it work.
>
> Thus, for requesting a revert on that line, you are most welcome to put
> me down as a "Tested-By".

Will be reverted in 3.11-rc5 (8e5654ce69).

Thanks,

--
Jiri Kosina
SUSE Labs

2013-08-12 21:55:22

by Peter Wu

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On Thursday 18 July 2013 16:28:01 Peter Hurley wrote:
> Before we revert to using the workaround, I'd like to suggest that
> this new "hidden" problem may be an interaction with the xhci_hcd host
> controller driver only.
>
> Looking at the related bug, the OP indicates the machine only has
> USB3 ports. Additionally, comments #7, #100, and #104 of the original
> bug report add additional information that would seem to confirm
> this suspicion.
>
> Let me add I have this USB device running on the uhci_hcd driver
> with or without this workaround on v3.10.

This problem does not seem specific to xhci; uhci seems to be affected too. Today I
upgraded a system (running Arch Linux) from kernel 3.9.9 to 3.10.5. After a
reboot to 3.10.5, things broke. The setup:

- There are two USB receivers plugged into USB 1.1 ports (different buses
according to lsusb, uhci), each receiver is paired to a K360 keyboard.
- One of the receivers is passed to a QEMU guest with -usbdevice
host:$busid.$devid. This keyboard is working (probably because QEMU performed a reset).
- Since 3.10.5, the keyboard that is *not* passed to the QEMU guest is not
functioning on reboot.

After closing the QEMU guest, the USB bus gets reset(?) after which the other
keyboard suddenly gets detected. I had only booted 3.10.5 twice before rolling
back to 3.9.9; both boots triggered the issue. Do I need to provide a usbmon,
lsusb, dmesg and/or other details from 3.10.5?


Note that there are other Arch Linux users who have reported issues[1][2]
since upgrading to 3.10.z. Triggering a re-enumeration by writing the magic
HID++ message[3] makes the paired devices appear again (as reported in
forums[1], I haven't tried this on the affected UHCI machine).

While the underlying bug is fixed, can this patch be forwarded to stable? I see
that 3.10.6 has been released, but still without this patch.

Regards,
Peter

[1]: https://bbs.archlinux.org/viewtopic.php?id=167210
[2]: https://bugs.archlinux.org/task/35991
[3]: https://bbs.archlinux.org/viewtopic.php?pid=1309535#p1309535

2013-08-13 12:13:24

by Peter Hurley

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On 08/12/2013 05:54 PM, Peter Wu wrote:
> On Thursday 18 July 2013 16:28:01 Peter Hurley wrote:
>> Before we revert to using the workaround, I'd like to suggest that
>> this new "hidden" problem may be an interaction with the xhci_hcd host
>> controller driver only.
>>
>> Looking at the related bug, the OP indicates the machine only has
>> USB3 ports. Additionally, comments #7, #100, and #104 of the original
>> bug report add additional information that would seem to confirm
>> this suspicion.
>>
>> Let me add I have this USB device running on the uhci_hcd driver
>> with or without this workaround on v3.10.
>
> This problem does not seem specific to xhci; uhci seems to be affected too.

If true, it would certainly help to have a bug report confirming uhci
failure from a bare-metal system which contained:
1) kernel version
2) complete dmesg output
3) lsusb -v output
4) lsmod output
5) usbmon capture from a plug attempt

> Today I
> upgraded a system (running Arch Linux) from kernel 3.9.9 to 3.10.5. After a
> reboot to 3.10.5, things broke. The setup:
>
> - There are two USB receivers plugged into USB 1.1 ports (different buses
> according to lsusb, uhci), each receiver is paired to a K360 keyboard.
> - One of the receivers is passed to a QEMU guest with -usbdevice
> host:$busid.$devid. This keyboard is working (probably because QEMU performed a reset).
> - Since 3.10.5, the keyboard that is *not* passed to the QEMU guest is not
> functioning on reboot.
>
> After closing the QEMU guest, the USB bus gets reset(?) after which the other
> keyboard suddenly gets detected. I had only booted 3.10.5 twice before rolling
> back to 3.9.9; both boots triggered the issue. Do I need to provide a usbmon,
> lsusb, dmesg and/or other details from 3.10.5?

Do both keyboards work on bare metal? Seems like this problem might be
specific to qemu (or kvm) and you may get more insight on those lists.

> Note that there are other Arch Linux users who have reported issues[1][2]

Unfortunately, not even one user in the referenced reports identified
the usb hub the receiver was plugged into.

> since upgrading to 3.10.z. Triggering a re-enumeration by writing the magic
> HID++ message[3] makes the paired devices appear again (as reported in
> forums[1], I haven't tried this on the affected UHCI machine).
>
> While the underlying bug is fixed, can this patch be forwarded to stable? I see
> that 3.10.6 has been released, but still without this patch.

This is still a workaround and not really a fix for the underlying bug.

> Regards,
> Peter
>
> [1]: https://bbs.archlinux.org/viewtopic.php?id=167210
> [2]: https://bugs.archlinux.org/task/35991
> [3]: https://bbs.archlinux.org/viewtopic.php?pid=1309535#p1309535

2013-08-13 15:42:45

by Peter Wu

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On Tuesday 13 August 2013 08:13:17 Peter Hurley wrote:
> On 08/12/2013 05:54 PM, Peter Wu wrote:
> > On Thursday 18 July 2013 16:28:01 Peter Hurley wrote:
> >> Before we revert to using the workaround, I'd like to suggest that
> >> this new "hidden" problem may be an interaction with the xhci_hcd host
> >> controller driver only.
> >>
> >> Looking at the related bug, the OP indicates the machine only has
> >> USB3 ports. Additionally, comments #7, #100, and #104 of the original
> >> bug report add additional information that would seem to confirm
> >> this suspicion.
> >>
> >> Let me add I have this USB device running on the uhci_hcd driver
> >> with or without this workaround on v3.10.
> >
> > This problem does not seem specific to xhci; uhci seems to be affected as well.
>
> If true, it would certainly help to have a bug report confirming uhci
> failure from a bare-metal system which contained:
> 1) kernel version
> 2) complete dmesg output
> 3) lsusb -v output
> 4) lsmod output
> 5) usbmon capture from a plug attempt

I was too quick to draw a conclusion: besides the kernel, I also upgraded
some other packages. Today the issue also showed up in 3.9.9 + updated
packages.

When I checked dmesg, the issue solved by this patch had not occurred (the
enumeration was successful).

> > Today I upgraded a system (running Arch Linux) from kernel 3.9.9 to
> > 3.10.5. After a reboot to 3.10.5, things broke. The setup:
> >
> > - There are two USB receivers plugged into USB 1.1 ports (different buses
> > according to lsusb, uhci), each receiver is paired to a K360 keyboard.
> > - One of the receivers is passed to a QEMU guest with -usbdevice
> > host:$busid.$devid. This keyboard is working (probably because QEMU
> > performed a reset).
> > - Since 3.10.5, the keyboard that is *not* passed to the QEMU guest is
> > not functioning on reboot.
> >
> > After closing the QEMU guest, the USB bus gets reset(?) after which the
> > other keyboard suddenly gets detected. I had only booted 3.10.5 twice
> > before rolling back to 3.9.9, both boots triggered the issue. Do I need
> > to provide a usbmon, lsusb, dmesg and/or other details from 3.10.5?
>
> Do both keyboards work on bare metal? Seems like this problem might be
> specific to qemu (or kvm) and you may get more insight on those lists.

I haven't tested that; the system automatically boots into openbox + QEMU.
Previously, both keyboards worked on bare metal, so I assume they still do.

> > Note that there are other Arch Linux users who have reported issues[1][2]
>
> Unfortunately, not even one user in the referenced reports identified
> the usb hub the receiver was plugged into.

I've asked them now.

> > since upgrading to 3.10.z. Triggering a re-enumeration by writing the
> > magic HID++ message[3] makes the paired devices appear again (as reported
> > in forums[1]; I haven't tried this on the affected UHCI machine).
> >
> > While the underlying bug is fixed, can this patch be forwarded to stable?
> > I see that 3.10.6 has been released, but still without this patch.
>
> This is still a workaround and not really a fix for the underlying bug.

I meant to say, "while the underlying bug is *being* fixed". Anyway, can this
patch be applied to 3.10?

Sorry for the confusion with uhci; looking further, it seems that the wrong
USB receiver is being passed to QEMU. It's not a kernel issue; perhaps I can
blame libusbx.

Regards,
Peter

2013-08-13 16:35:04

by Peter Hurley

Subject: Re: [PATCH 1/2] Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""

On 08/13/2013 11:42 AM, Peter Wu wrote:
> On Tuesday 13 August 2013 08:13:17 Peter Hurley wrote:
>> On 08/12/2013 05:54 PM, Peter Wu wrote:
>>> On Thursday 18 July 2013 16:28:01 Peter Hurley wrote:
>>>> Before we revert to using the workaround, I'd like to suggest that
>>>> this new "hidden" problem may be an interaction with the xhci_hcd host
>>>> controller driver only.
>>>>
>>>> Looking at the related bug, the OP indicates the machine only has
>>>> USB3 ports. Additionally, comments #7, #100, and #104 of the original
>>>> bug report add additional information that would seem to confirm
>>>> this suspicion.
>>>>
>>>> Let me add I have this USB device running on the uhci_hcd driver
>>>> with or without this workaround on v3.10.
>>>
>>> This problem does not seem specific to xhci; uhci seems to be affected as well.
>>
>> If true, it would certainly help to have a bug report confirming uhci
>> failure from a bare-metal system which contained:
>> 1) kernel version
>> 2) complete dmesg output
>> 3) lsusb -v output
>> 4) lsmod output
>> 5) usbmon capture from a plug attempt
>
> I was too quick to draw a conclusion: besides the kernel, I also upgraded
> some other packages. Today the issue also showed up in 3.9.9 + updated
> packages.
>
> When I checked dmesg, the issue solved by this patch had not occurred (the
> enumeration was successful).

Thanks for double-checking.

>>> Today I upgraded a system (running Arch Linux) from kernel 3.9.9 to
>>> 3.10.5. After a reboot to 3.10.5, things broke. The setup:
>>>
>>> - There are two USB receivers plugged into USB 1.1 ports (different buses
>>> according to lsusb, uhci), each receiver is paired to a K360 keyboard.
>>> - One of the receivers is passed to a QEMU guest with -usbdevice
>>> host:$busid.$devid. This keyboard is working (probably because QEMU
>>> performed a reset).
>>> - Since 3.10.5, the keyboard that is *not* passed to the QEMU guest is
>>> not functioning on reboot.
>>>
>>> After closing the QEMU guest, the USB bus gets reset(?) after which the
>>> other keyboard suddenly gets detected. I had only booted 3.10.5 twice
>>> before rolling back to 3.9.9, both boots triggered the issue. Do I need
>>> to provide a usbmon, lsusb, dmesg and/or other details from 3.10.5?
>>
>> Do both keyboards work on bare metal? Seems like this problem might be
>> specific to qemu (or kvm) and you may get more insight on those lists.
>
> I haven't tested that; the system automatically boots into openbox + QEMU.
> Previously, both keyboards worked on bare metal, so I assume they still do.
>
>>> Note that there are other Arch Linux users who have reported issues[1][2]
>>
>> Unfortunately, not even one user in the referenced reports identified
>> the usb hub the receiver was plugged into.
>
> I've asked them now.

Thanks. And if someone has a uhci failure, filing a new bug on
bugzilla.kernel.org _and_ posting to linux-usb@ and
[email protected] improves the chances of the right
people seeing the problem. [xhci already has several reports plus I
subscribed to the kernel bug linked from the ArchLinux bug report.]

>>> since upgrading to 3.10.z. Triggering a re-enumeration by writing the
>>> magic HID++ message[3] makes the paired devices appear again (as reported
>>> in forums[1]; I haven't tried this on the affected UHCI machine).
>>>
>>> While the underlying bug is fixed, can this patch be forwarded to stable?
>>> I see that 3.10.6 has been released, but still without this patch.
>>
>> This is still a workaround and not really a fix for the underlying bug.
>
> I meant to say, "while the underlying bug is *being* fixed". Anyway, can this
> patch be applied to 3.10?

That's really Jiri's call. TBH, I can't blame him for wanting to shake this
out in 3.11 before pushing to stable.

> Sorry for the confusion with uhci; looking further, it seems that the wrong
> USB receiver is being passed to QEMU. It's not a kernel issue; perhaps I can
> blame libusbx.

No apologies necessary.

Regards,
Peter Hurley