2013-03-06 02:35:11

by Benson Leung

[permalink] [raw]
Subject: v3.9-rc1 instability on Chromebook Pixel with gmbus irq

Hi Daniel Vetter,

I'm working on touch devices Chromium OS, and I've noticed a
regression between 3.8 and 3.9-rc1, which was posted yesterday.

The hardware in question is a Chromebook Pixel. For this device, we
have i2c input devices: atmel mxt224s touchpad and atmel mxt1664s
touchscreen. The touchpad is on bus 1, "i915 gmbus vga" at 1-004b. The
touchscreen is on bus 2, "i915 gmbus panel" at 2-004a.

I was testing v3.9-rc1 on the Pixel and the touchscreen driver is
being returned -110 (-ETIMEDOUT) on an i2c_transfer after several
seconds of both touch devices working correctly. At the time of the
failure, there are no error messages from GMBUS that I can see, but
the bus never recovers. I can keep interacting with the touchscreen or
touchpad, and the interrupts trigger reads, but all subsequent reads
return -110.

I noticed that between 3.8 and 3.9-rc1, your patch series to add gmbus
irq support was merged. After bisecting, I found that this commit
seems to be causing the timeout problem. Reverting it makes the
problem go away, and the bus is stable.

commit 2c438c0273b76d6cb158f8bdd0aa3ebf66e48a28
Author: Daniel Vetter <[email protected]>
Date: Sat Dec 1 13:53:46 2012 +0100

drm/i915: use gmbus irq to wait for gmbus idle

GMBUS_ACTIVE has inverted sense and so doesn't fit into the
wait_hw_status helper, hence create a new gmbus_wait_idle functions.
Also, we only care about the idle irq event and nothing else, which
allows us to use the wait_event_timeout helper directly without
jumping through hoops to catch NAKs.

Since gen2/3 don't have gmbus interrupts, handle them separately with
the old wait_for macro.

This shaves another few ms off reading EDID from a hdmi screen on my
testbox here. EDID reading with interrupt driven gmbus is now as fast
as with busy-looping gmbus at 28 ms here (with negligible cpu
overhead).

Reviewed-by: Imre Deak <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>


Is there anything I can do to help debug this some more?

--
Benson Leung
Software Engineer, Chrom* OS
[email protected]


2013-03-06 08:14:35

by Daniel Vetter

[permalink] [raw]
Subject: Re: v3.9-rc1 instability on Chromebook Pixel with gmbus irq

On Wed, Mar 6, 2013 at 3:35 AM, Benson Leung <[email protected]> wrote:
> I'm working on touch devices Chromium OS, and I've noticed a
> regression between 3.8 and 3.9-rc1, which was posted yesterday.
>
> The hardware in question is a Chromebook Pixel. For this device, we
> have i2c input devices: atmel mxt224s touchpad and atmel mxt1664s
> touchscreen. The touchpad is on bus 1, "i915 gmbus vga" at 1-004b. The
> touchscreen is on bus 2, "i915 gmbus panel" at 2-004a.
>
> I was testing v3.9-rc1 on the Pixel and the touchscreen driver is
> being returned -110 (-ETIMEDOUT) on an i2c_transfer after several
> seconds of both touch devices working correctly. At the time of the
> failure, there are no error messages from GMBUS that I can see, but
> the bus never recovers. I can keep interacting with the touchscreen or
> touchpad, and the interrupts trigger reads, but all subsequent reads
> return -110.
>
> I noticed that between 3.8 and 3.9-rc1, your patch series to add gmbus
> irq support was merged. After bisecting, I found that this commit
> seems to be causing the timeout problem. Reverting it makes the
> problem go away, and the bus is stable.

Can you please retest with latest drm-intel-fixes merged into -rc1?
Paulo's patch fixes a race in handling PCH interrupts (where the gmbus
hw is) which matches rather well with your description here.

Thanks, Daniel

> commit 2c438c0273b76d6cb158f8bdd0aa3ebf66e48a28
> Author: Daniel Vetter <[email protected]>
> Date: Sat Dec 1 13:53:46 2012 +0100
>
> drm/i915: use gmbus irq to wait for gmbus idle
>
> GMBUS_ACTIVE has inverted sense and so doesn't fit into the
> wait_hw_status helper, hence create a new gmbus_wait_idle functions.
> Also, we only care about the idle irq event and nothing else, which
> allows us to use the wait_event_timeout helper directly without
> jumping through hoops to catch NAKs.
>
> Since gen2/3 don't have gmbus interrupts, handle them separately with
> the old wait_for macro.
>
> This shaves another few ms off reading EDID from a hdmi screen on my
> testbox here. EDID reading with interrupt driven gmbus is now as fast
> as with busy-looping gmbus at 28 ms here (with negligible cpu
> overhead).
>
> Reviewed-by: Imre Deak <[email protected]>
> Signed-off-by: Daniel Vetter <[email protected]>
>
>
> Is there anything I can do to help debug this some more?
>
> --
> Benson Leung
> Software Engineer, Chrom* OS
> [email protected]



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2013-03-06 08:36:36

by Jani Nikula

[permalink] [raw]
Subject: Re: v3.9-rc1 instability on Chromebook Pixel with gmbus irq

On Wed, 06 Mar 2013, Daniel Vetter <[email protected]> wrote:
> On Wed, Mar 6, 2013 at 3:35 AM, Benson Leung <[email protected]> wrote:
>> I'm working on touch devices Chromium OS, and I've noticed a
>> regression between 3.8 and 3.9-rc1, which was posted yesterday.
>>
>> The hardware in question is a Chromebook Pixel. For this device, we
>> have i2c input devices: atmel mxt224s touchpad and atmel mxt1664s
>> touchscreen. The touchpad is on bus 1, "i915 gmbus vga" at 1-004b. The
>> touchscreen is on bus 2, "i915 gmbus panel" at 2-004a.
>>
>> I was testing v3.9-rc1 on the Pixel and the touchscreen driver is
>> being returned -110 (-ETIMEDOUT) on an i2c_transfer after several
>> seconds of both touch devices working correctly. At the time of the
>> failure, there are no error messages from GMBUS that I can see, but
>> the bus never recovers. I can keep interacting with the touchscreen or
>> touchpad, and the interrupts trigger reads, but all subsequent reads
>> return -110.
>>
>> I noticed that between 3.8 and 3.9-rc1, your patch series to add gmbus
>> irq support was merged. After bisecting, I found that this commit
>> seems to be causing the timeout problem. Reverting it makes the
>> problem go away, and the bus is stable.
>
> Can you please retest with latest drm-intel-fixes merged into -rc1?
> Paulo's patch fixes a race in handling PCH interrupts (where the gmbus
> hw is) which matches rather well with your description here.

Also, some error paths only print with debug on. Did you try with
drm.debug=0xe?

BR,
Jani.


>
> Thanks, Daniel
>
>> commit 2c438c0273b76d6cb158f8bdd0aa3ebf66e48a28
>> Author: Daniel Vetter <[email protected]>
>> Date: Sat Dec 1 13:53:46 2012 +0100
>>
>> drm/i915: use gmbus irq to wait for gmbus idle
>>
>> GMBUS_ACTIVE has inverted sense and so doesn't fit into the
>> wait_hw_status helper, hence create a new gmbus_wait_idle functions.
>> Also, we only care about the idle irq event and nothing else, which
>> allows us to use the wait_event_timeout helper directly without
>> jumping through hoops to catch NAKs.
>>
>> Since gen2/3 don't have gmbus interrupts, handle them separately with
>> the old wait_for macro.
>>
>> This shaves another few ms off reading EDID from a hdmi screen on my
>> testbox here. EDID reading with interrupt driven gmbus is now as fast
>> as with busy-looping gmbus at 28 ms here (with negligible cpu
>> overhead).
>>
>> Reviewed-by: Imre Deak <[email protected]>
>> Signed-off-by: Daniel Vetter <[email protected]>
>>
>>
>> Is there anything I can do to help debug this some more?
>>
>> --
>> Benson Leung
>> Software Engineer, Chrom* OS
>> [email protected]
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

2013-03-06 19:22:49

by Benson Leung

[permalink] [raw]
Subject: Re: v3.9-rc1 instability on Chromebook Pixel with gmbus irq

Hi Daniel,

I've just tried drm-intel-fixes merged into v3.9-rc1, and so far it's
looking good. No suspicious timeouts.
Thanks for the quick response!

Benson

On Wed, Mar 6, 2013 at 12:14 AM, Daniel Vetter <[email protected]> wrote:
> On Wed, Mar 6, 2013 at 3:35 AM, Benson Leung <[email protected]> wrote:
>> I'm working on touch devices Chromium OS, and I've noticed a
>> regression between 3.8 and 3.9-rc1, which was posted yesterday.
>>
>> The hardware in question is a Chromebook Pixel. For this device, we
>> have i2c input devices: atmel mxt224s touchpad and atmel mxt1664s
>> touchscreen. The touchpad is on bus 1, "i915 gmbus vga" at 1-004b. The
>> touchscreen is on bus 2, "i915 gmbus panel" at 2-004a.
>>
>> I was testing v3.9-rc1 on the Pixel and the touchscreen driver is
>> being returned -110 (-ETIMEDOUT) on an i2c_transfer after several
>> seconds of both touch devices working correctly. At the time of the
>> failure, there are no error messages from GMBUS that I can see, but
>> the bus never recovers. I can keep interacting with the touchscreen or
>> touchpad, and the interrupts trigger reads, but all subsequent reads
>> return -110.
>>
>> I noticed that between 3.8 and 3.9-rc1, your patch series to add gmbus
>> irq support was merged. After bisecting, I found that this commit
>> seems to be causing the timeout problem. Reverting it makes the
>> problem go away, and the bus is stable.
>
> Can you please retest with latest drm-intel-fixes merged into -rc1?
> Paulo's patch fixes a race in handling PCH interrupts (where the gmbus
> hw is) which matches rather well with your description here.
>
> Thanks, Daniel
>
>> commit 2c438c0273b76d6cb158f8bdd0aa3ebf66e48a28
>> Author: Daniel Vetter <[email protected]>
>> Date: Sat Dec 1 13:53:46 2012 +0100
>>
>> drm/i915: use gmbus irq to wait for gmbus idle
>>
>> GMBUS_ACTIVE has inverted sense and so doesn't fit into the
>> wait_hw_status helper, hence create a new gmbus_wait_idle functions.
>> Also, we only care about the idle irq event and nothing else, which
>> allows us to use the wait_event_timeout helper directly without
>> jumping through hoops to catch NAKs.
>>
>> Since gen2/3 don't have gmbus interrupts, handle them separately with
>> the old wait_for macro.
>>
>> This shaves another few ms off reading EDID from a hdmi screen on my
>> testbox here. EDID reading with interrupt driven gmbus is now as fast
>> as with busy-looping gmbus at 28 ms here (with negligible cpu
>> overhead).
>>
>> Reviewed-by: Imre Deak <[email protected]>
>> Signed-off-by: Daniel Vetter <[email protected]>
>>
>>
>> Is there anything I can do to help debug this some more?
>>
>> --
>> Benson Leung
>> Software Engineer, Chrom* OS
>> [email protected]
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



--
Benson Leung
Software Engineer, Chrom* OS
[email protected]