In af93a167eda9, i2c_hid_parse was changed to continue with reading the
report descriptor before waiting for reset to be acknowledged.
This has lead to two regressions:
1. We fail to handle reset acknowledgement if it happens while reading
the report descriptor. The transfer sets I2C_HID_READ_PENDING, which
causes the IRQ handler to return without doing anything.
This affects both a Wacom touchscreen and a Sensel touchpad.
2. On a Sensel touchpad, reading the report descriptor this quickly
after reset results in all zeroes or partial zeroes.
The issues were observed on the Lenovo Thinkpad Z16 Gen 2.
The change in question was made based on a Microsoft article[0] stating
that Windows 8 *may* read the report descriptor in parallel with
awaiting reset acknowledgement, intended as a slight reset performance
optimization. Perhaps they only do this if reset is not completing
quickly enough for their tastes?
As the code is not currently ready to read registers in parallel with a
pending reset acknowledgement, and as reading quickly breaks the report
descriptor on the Sensel touchpad, revert to waiting for reset
acknowledgement before proceeding to read the report descriptor.
[0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/plug-and-play-support-and-power-management
Fixes: af93a167eda9 ("HID: i2c-hid: Move i2c_hid_finish_hwreset() to after reading the report-descriptor")
Signed-off-by: Kenny Levinsen <[email protected]>
---
drivers/hid/i2c-hid/i2c-hid-core.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c
index 2df1ab3c31cc..72d2bccf5621 100644
--- a/drivers/hid/i2c-hid/i2c-hid-core.c
+++ b/drivers/hid/i2c-hid/i2c-hid-core.c
@@ -735,9 +735,12 @@ static int i2c_hid_parse(struct hid_device *hid)
mutex_lock(&ihid->reset_lock);
do {
ret = i2c_hid_start_hwreset(ihid);
- if (ret)
+ if (ret == 0)
+ ret = i2c_hid_finish_hwreset(ihid);
+ else
msleep(1000);
} while (tries-- > 0 && ret);
+ mutex_unlock(&ihid->reset_lock);
if (ret)
goto abort_reset;
@@ -767,16 +770,8 @@ static int i2c_hid_parse(struct hid_device *hid)
}
}
- /*
- * Windows directly reads the report-descriptor after sending reset
- * and then waits for resets completion afterwards. Some touchpads
- * actually wait for the report-descriptor to be read before signalling
- * reset completion.
- */
- ret = i2c_hid_finish_hwreset(ihid);
abort_reset:
clear_bit(I2C_HID_RESET_PENDING, &ihid->flags);
- mutex_unlock(&ihid->reset_lock);
if (ret)
goto out;
--
2.44.0
Hi Kenny,
Sorry for causing this regression and thank you for your fix.
One small remark comment below. In the hope of getting this merged
soon I'll prepare a v3 addressing this myself (keeping you as the author).
On 3/31/24 8:24 PM, Kenny Levinsen wrote:
> In af93a167eda9, i2c_hid_parse was changed to continue with reading the
> report descriptor before waiting for reset to be acknowledged.
>
> This has lead to two regressions:
>
> 1. We fail to handle reset acknowledgement if it happens while reading
> the report descriptor. The transfer sets I2C_HID_READ_PENDING, which
> causes the IRQ handler to return without doing anything.
>
> This affects both a Wacom touchscreen and a Sensel touchpad.
>
> 2. On a Sensel touchpad, reading the report descriptor this quickly
> after reset results in all zeroes or partial zeroes.
>
> The issues were observed on the Lenovo Thinkpad Z16 Gen 2.
>
> The change in question was made based on a Microsoft article[0] stating
> that Windows 8 *may* read the report descriptor in parallel with
> awaiting reset acknowledgement, intended as a slight reset performance
> optimization. Perhaps they only do this if reset is not completing
> quickly enough for their tastes?
>
> As the code is not currently ready to read registers in parallel with a
> pending reset acknowledgement, and as reading quickly breaks the report
> descriptor on the Sensel touchpad, revert to waiting for reset
> acknowledgement before proceeding to read the report descriptor.
>
> [0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/plug-and-play-support-and-power-management
>
> Fixes: af93a167eda9 ("HID: i2c-hid: Move i2c_hid_finish_hwreset() to after reading the report-descriptor")
> Signed-off-by: Kenny Levinsen <[email protected]>
> ---
> drivers/hid/i2c-hid/i2c-hid-core.c | 13 ++++---------
> 1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c
> index 2df1ab3c31cc..72d2bccf5621 100644
> --- a/drivers/hid/i2c-hid/i2c-hid-core.c
> +++ b/drivers/hid/i2c-hid/i2c-hid-core.c
> @@ -735,9 +735,12 @@ static int i2c_hid_parse(struct hid_device *hid)
> mutex_lock(&ihid->reset_lock);
> do {
> ret = i2c_hid_start_hwreset(ihid);
> - if (ret)
> + if (ret == 0)
> + ret = i2c_hid_finish_hwreset(ihid);
> + else
> msleep(1000);
> } while (tries-- > 0 && ret);
> + mutex_unlock(&ihid->reset_lock);
>
> if (ret)
> goto abort_reset;
The abort_reset label here and in other places now is no longer
necessary. i2c_hid_start_hwreset() (on error) and i2c_hid_finish_hwreset()
(regardless of error or not) always clear I2C_HID_RESET_PENDING.
And we only do "goto abort_reset;" here and in 2 other places
below in a "if (ret) {}" branch, and abort_reset itself is:
abort_reset:
clear_bit(I2C_HID_RESET_PENDING, &ihid->flags);
if (ret)
goto out;
Since the reset loop now always exits with I2C_HID_RESET_PENDING
cleared, the clear_bit() is not necessary after your changes and
ret != 0 is always true when doing goto abort_reset so
"goto abort_reset" can be replaced with "goto out" or if there is
nothing to cleanup with a simple "return ret".
As mentioned above I'll post a v3 with this addressed myself,
so that we can hopefully get the fix upstream soonest.
Regards,
Hans
> @@ -767,16 +770,8 @@ static int i2c_hid_parse(struct hid_device *hid)
> }
> }
>
> - /*
> - * Windows directly reads the report-descriptor after sending reset
> - * and then waits for resets completion afterwards. Some touchpads
> - * actually wait for the report-descriptor to be read before signalling
> - * reset completion.
> - */
> - ret = i2c_hid_finish_hwreset(ihid);
> abort_reset:
> clear_bit(I2C_HID_RESET_PENDING, &ihid->flags);
> - mutex_unlock(&ihid->reset_lock);
> if (ret)
> goto out;
>
Hi,
On 4/2/24 1:30 PM, Kenny Levinsen wrote:
> On 4/2/24 1:06 PM, Hans de Goede wrote:
>> One small remark comment below. In the hope of getting this merged
>> soon I'll prepare a v3 addressing this myself (keeping you as the author).
>
>
> A-OK from my side, the abort_reset label is indeed redundant now.
>
> (The split between start and finish is also technically redundant when we always finish after start, but I wanted to keep the change minimal.)
Right actually for undoing the moving of the finish-reset
you could also have done a straight forward revert:
git revert af93a167eda9
My v3 is pretty close to this, but not exactly a full revert
since I kept your minimal approach.
Undoing the split however does not cleanly revert.
Regards,
Hans
On 4/2/24 1:06 PM, Hans de Goede wrote:
> One small remark comment below. In the hope of getting this merged
> soon I'll prepare a v3 addressing this myself (keeping you as the author).
A-OK from my side, the abort_reset label is indeed redundant now.
(The split between start and finish is also technically redundant when
we always finish after start, but I wanted to keep the change minimal.)
On 31.03.24 20:24, Kenny Levinsen wrote:
> In af93a167eda9, i2c_hid_parse was changed to continue with reading the
> report descriptor before waiting for reset to be acknowledged.
>
> This has lead to two regressions:
Lo! Jiri, Benjamin, quick question: is there a reason why this fix for a
6.8-rc1 regression after more than two and half weeks is not yet
mainlined? Or is there some good reason why we should be should be extra
cautious?
Side note: I noticed this due to the tracking today, but I also saw a
user that recently ran into the problem the quoted fix is supposed to
resolve: https://social.lol/@major/112294923280815017
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
#regzbot poke
> 1. We fail to handle reset acknowledgement if it happens while reading
> the report descriptor. The transfer sets I2C_HID_READ_PENDING, which
> causes the IRQ handler to return without doing anything.
>
> This affects both a Wacom touchscreen and a Sensel touchpad.
>
> 2. On a Sensel touchpad, reading the report descriptor this quickly
> after reset results in all zeroes or partial zeroes.
>
> The issues were observed on the Lenovo Thinkpad Z16 Gen 2.
>
> The change in question was made based on a Microsoft article[0] stating
> that Windows 8 *may* read the report descriptor in parallel with
> awaiting reset acknowledgement, intended as a slight reset performance
> optimization. Perhaps they only do this if reset is not completing
> quickly enough for their tastes?
>
> As the code is not currently ready to read registers in parallel with a
> pending reset acknowledgement, and as reading quickly breaks the report
> descriptor on the Sensel touchpad, revert to waiting for reset
> acknowledgement before proceeding to read the report descriptor.
>
> [0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/plug-and-play-support-and-power-management
>
> Fixes: af93a167eda9 ("HID: i2c-hid: Move i2c_hid_finish_hwreset() to after reading the report-descriptor")
> Signed-off-by: Kenny Levinsen <[email protected]>
> ---
> drivers/hid/i2c-hid/i2c-hid-core.c | 13 ++++---------
> 1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c
> index 2df1ab3c31cc..72d2bccf5621 100644
> --- a/drivers/hid/i2c-hid/i2c-hid-core.c
> +++ b/drivers/hid/i2c-hid/i2c-hid-core.c
> @@ -735,9 +735,12 @@ static int i2c_hid_parse(struct hid_device *hid)
> mutex_lock(&ihid->reset_lock);
> do {
> ret = i2c_hid_start_hwreset(ihid);
> - if (ret)
> + if (ret == 0)
> + ret = i2c_hid_finish_hwreset(ihid);
> + else
> msleep(1000);
> } while (tries-- > 0 && ret);
> + mutex_unlock(&ihid->reset_lock);
>
> if (ret)
> goto abort_reset;
> @@ -767,16 +770,8 @@ static int i2c_hid_parse(struct hid_device *hid)
> }
> }
>
> - /*
> - * Windows directly reads the report-descriptor after sending reset
> - * and then waits for resets completion afterwards. Some touchpads
> - * actually wait for the report-descriptor to be read before signalling
> - * reset completion.
> - */
> - ret = i2c_hid_finish_hwreset(ihid);
> abort_reset:
> clear_bit(I2C_HID_RESET_PENDING, &ihid->flags);
> - mutex_unlock(&ihid->reset_lock);
> if (ret)
> goto out;
>
On Mon, Apr 22, 2024 at 7:11 PM Linux regression tracking (Thorsten
Leemhuis) <[email protected]> wrote:
>
> On 31.03.24 20:24, Kenny Levinsen wrote:
> > In af93a167eda9, i2c_hid_parse was changed to continue with reading the
> > report descriptor before waiting for reset to be acknowledged.
> >
> > This has lead to two regressions:
>
> Lo! Jiri, Benjamin, quick question: is there a reason why this fix for a
> 6.8-rc1 regression after more than two and half weeks is not yet
> mainlined? Or is there some good reason why we should be should be extra
> cautious?
No special reasons I guess. Neither Jiri nor I have sent a HID update
for this rc cycle, so it's still there, waiting to be pushed.
I've been quite busy with BPF lately and dropped the ball slightly on
the HID maintainer side, but I'm sure we'll send the PR to Linus this
week or the next.
Cheers,
Benjamin
>
>
> Side note: I noticed this due to the tracking today, but I also saw a
> user that recently ran into the problem the quoted fix is supposed to
> resolve: https://social.lol/@major/112294923280815017
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> > 1. We fail to handle reset acknowledgement if it happens while reading
> > the report descriptor. The transfer sets I2C_HID_READ_PENDING, which
> > causes the IRQ handler to return without doing anything.
> >
> > This affects both a Wacom touchscreen and a Sensel touchpad.
> >
> > 2. On a Sensel touchpad, reading the report descriptor this quickly
> > after reset results in all zeroes or partial zeroes.
> >
> > The issues were observed on the Lenovo Thinkpad Z16 Gen 2.
> >
> > The change in question was made based on a Microsoft article[0] stating
> > that Windows 8 *may* read the report descriptor in parallel with
> > awaiting reset acknowledgement, intended as a slight reset performance
> > optimization. Perhaps they only do this if reset is not completing
> > quickly enough for their tastes?
> >
> > As the code is not currently ready to read registers in parallel with a
> > pending reset acknowledgement, and as reading quickly breaks the report
> > descriptor on the Sensel touchpad, revert to waiting for reset
> > acknowledgement before proceeding to read the report descriptor.
> >
> > [0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/plug-and-play-support-and-power-management
> >
> > Fixes: af93a167eda9 ("HID: i2c-hid: Move i2c_hid_finish_hwreset() to after reading the report-descriptor")
> > Signed-off-by: Kenny Levinsen <[email protected]>
> > ---
> > drivers/hid/i2c-hid/i2c-hid-core.c | 13 ++++---------
> > 1 file changed, 4 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c
> > index 2df1ab3c31cc..72d2bccf5621 100644
> > --- a/drivers/hid/i2c-hid/i2c-hid-core.c
> > +++ b/drivers/hid/i2c-hid/i2c-hid-core.c
> > @@ -735,9 +735,12 @@ static int i2c_hid_parse(struct hid_device *hid)
> > mutex_lock(&ihid->reset_lock);
> > do {
> > ret = i2c_hid_start_hwreset(ihid);
> > - if (ret)
> > + if (ret == 0)
> > + ret = i2c_hid_finish_hwreset(ihid);
> > + else
> > msleep(1000);
> > } while (tries-- > 0 && ret);
> > + mutex_unlock(&ihid->reset_lock);
> >
> > if (ret)
> > goto abort_reset;
> > @@ -767,16 +770,8 @@ static int i2c_hid_parse(struct hid_device *hid)
> > }
> > }
> >
> > - /*
> > - * Windows directly reads the report-descriptor after sending reset
> > - * and then waits for resets completion afterwards. Some touchpads
> > - * actually wait for the report-descriptor to be read before signalling
> > - * reset completion.
> > - */
> > - ret = i2c_hid_finish_hwreset(ihid);
> > abort_reset:
> > clear_bit(I2C_HID_RESET_PENDING, &ihid->flags);
> > - mutex_unlock(&ihid->reset_lock);
> > if (ret)
> > goto out;
> >
>
Linus,
On 23.04.24 16:59, Benjamin Tissoires wrote:
> On Mon, Apr 22, 2024 at 7:11 PM Linux regression tracking (Thorsten
> Leemhuis) <[email protected]> wrote:
>> On 31.03.24 20:24, Kenny Levinsen wrote:
>
> [previous subject: [PATCH v2] HID: i2c-hid: Revert to await reset ACK before reading report descriptor]
>
>>> In af93a167eda9, i2c_hid_parse was changed to continue with reading the
>>> report descriptor before waiting for reset to be acknowledged.
>>>
>>> This has lead to two regressions:
>>
>> Lo! Jiri, Benjamin, quick question: is there a reason why this fix for a
>> 6.8-rc1 regression after more than two and half weeks is not yet
>> mainlined? Or is there some good reason why we should be should be extra
>> cautious?
>
> No special reasons I guess. Neither Jiri nor I have sent a HID update
> for this rc cycle, so it's still there, waiting to be pushed.
> I've been quite busy with BPF lately and dropped the ball slightly on
> the HID maintainer side, but I'm sure we'll send the PR to Linus this
> week or the next.
out of interest: what's your stance on regression fixes sitting in
subsystem git trees for a week or longer before being mainlined?
The quoted patch is such a case. It fixes a regression caused by a
change that made it into 6.8-rc1, but the problem afaik was only
reported on 2024-03-19, e.g. ~nine days after 6.8 was out[1]; Kenny, the
author of the fix, apparently noticed and fixed the problem a bit later
independently[2]. Jiri merged a newer version of the fix on
2024-04-03[3], which was included in -next a day later -- the Thursday
before 6.9-rc3.
The fix thus would even have gotten two days of testing in -next, if
Benjamin or Jiri would have send it your way for that pre-release. But
from Benjamin's statement quoted above it seems the fix might even make
-rc6.
That obviously heavily reduces the time the fix will be tested before
6.9 is released.
It obviously also means that 6.8.y is as of now still unfixed, as the
stable team usually only applies fixes once they landed in mainline.
Which also means that even more people ran into the problem with
6.8.y[4] or mainline even after Jiri merged the patch into the hid tree
-- and maybe some of those people wasted their time on a bisection only
to find out that a fix exists.
That sounds, ehh, sub-optimal to me. Which is why I wonder what's your
stance here, as I encounter similar situations frequently[5] -- which
sometimes is kinda demotivating. :-/
Ciao, Thorsten
[1]
https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3]
https://lore.kernel.org/all/[email protected]/
[4] https://social.lol/@major/112294923280815017
[5] This fix for example:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=master&id=afc89870ea677bd5a44516eb981f7a259b74280c
Reports:
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/
> [...]
>>> 1. We fail to handle reset acknowledgement if it happens while reading
>>> the report descriptor. The transfer sets I2C_HID_READ_PENDING, which
>>> causes the IRQ handler to return without doing anything.
>>>
>>> This affects both a Wacom touchscreen and a Sensel touchpad.
>>>
>>> 2. On a Sensel touchpad, reading the report descriptor this quickly
>>> after reset results in all zeroes or partial zeroes.
>>>
>>> The issues were observed on the Lenovo Thinkpad Z16 Gen 2.
>>>
>>> The change in question was made based on a Microsoft article[0] stating
>>> that Windows 8 *may* read the report descriptor in parallel with
>>> awaiting reset acknowledgement, intended as a slight reset performance
>>> optimization. Perhaps they only do this if reset is not completing
>>> quickly enough for their tastes?
>>>
>>> As the code is not currently ready to read registers in parallel with a
>>> pending reset acknowledgement, and as reading quickly breaks the report
>>> descriptor on the Sensel touchpad, revert to waiting for reset
>>> acknowledgement before proceeding to read the report descriptor.
>>>
>>> [0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/plug-and-play-support-and-power-management
>>>
>>> Fixes: af93a167eda9 ("HID: i2c-hid: Move i2c_hid_finish_hwreset() to after reading the report-descriptor")
>>> Signed-off-by: Kenny Levinsen <[email protected]>
>>> ---
>>> drivers/hid/i2c-hid/i2c-hid-core.c | 13 ++++---------
>>> 1 file changed, 4 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c
>>> index 2df1ab3c31cc..72d2bccf5621 100644
>>> --- a/drivers/hid/i2c-hid/i2c-hid-core.c
>>> +++ b/drivers/hid/i2c-hid/i2c-hid-core.c
>>> @@ -735,9 +735,12 @@ static int i2c_hid_parse(struct hid_device *hid)
>>> mutex_lock(&ihid->reset_lock);
>>> do {
>>> ret = i2c_hid_start_hwreset(ihid);
>>> - if (ret)
>>> + if (ret == 0)
>>> + ret = i2c_hid_finish_hwreset(ihid);
>>> + else
>>> msleep(1000);
>>> } while (tries-- > 0 && ret);
>>> + mutex_unlock(&ihid->reset_lock);
>>>
>>> if (ret)
>>> goto abort_reset;
>>> @@ -767,16 +770,8 @@ static int i2c_hid_parse(struct hid_device *hid)
>>> }
>>> }
>>>
>>> - /*
>>> - * Windows directly reads the report-descriptor after sending reset
>>> - * and then waits for resets completion afterwards. Some touchpads
>>> - * actually wait for the report-descriptor to be read before signalling
>>> - * reset completion.
>>> - */
>>> - ret = i2c_hid_finish_hwreset(ihid);
>>> abort_reset:
>>> clear_bit(I2C_HID_RESET_PENDING, &ihid->flags);
>>> - mutex_unlock(&ihid->reset_lock);
>>> if (ret)
>>> goto out;
>>>
>>
>
>
>
On Wed, 24 Apr 2024 at 09:56, Thorsten Leemhuis
<[email protected]> wrote:
>
> out of interest: what's your stance on regression fixes sitting in
> subsystem git trees for a week or longer before being mainlined?
Annoying, but probably depends on circumstances. The fact that it took
a while to even be noticed presumably means it's not common or holding
anything up.
That said, th4e last HID pull I have is from March 14. If the issue is
just that there's nothing else happening, I think people should just
point me to the patch and say "can you apply this single fix?"
Linus
On 24.04.24 20:53, Linus Torvalds wrote:
> On Wed, 24 Apr 2024 at 09:56, Thorsten Leemhuis
> <[email protected]> wrote:
>>
>> out of interest: what's your stance on regression fixes sitting in
>> subsystem git trees for a week or longer before being mainlined?
>
> Annoying, but probably depends on circumstances. The fact that it took
> a while to even be noticed presumably means it's not common or holding
> anything up.
Well, I searched and found quite a few users that reported the problem:
https://bbs.archlinux.org/viewtopic.php?id=293971 (at least 4 people)
https://bbs.archlinux.org/viewtopic.php?id=293978 (2 people)
https://bugzilla.redhat.com/show_bug.cgi?id=2271136 (1)
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2061040 (1)
https://forums.opensuse.org/t/no-touchpad-found-el-touchpad-a-veces-es-reconocido-por-el-sistema/174100 (1)
https://oldos.me/@jay/112294956758222518 (1)
There are also these two I mentioned earlier already:
https://social.lol/@major/112294920993272987 (1)
https://lore.kernel.org/all/[email protected]/ (1)
Side note: there were more discussions about it here:
https://forums.lenovo.com/t5/Fedora/PSA-Z16-Gen-2-touchpad-not-working-on-kernel-6-8/m-p/5299530
https://www.reddit.com/r/thinkpad/comments/1bwxwnr/review_thinkpad_z16_gen_2_with_arch_linux/
https://www.reddit.com/r/linuxhardware/comments/1bwxhwa/review_thinkpad_z16_gen_2_arch_linux/
And the arch linux wiki even documents a workaround:
https://wiki.archlinux.org/title/Lenovo_ThinkPad_Z16_Gen_2#Initialization_failure
Those are just the reports and discussions I found. And you know how
it is: many people that struggle will never report a problem.
IMHO this all casts a bad light on our "no regression" rule, as the
fix is ready, just not mainlined and backported. And as I mentioned:
I see similar situations all the time. That's why I made noise here.
> That said, th4e last HID pull I have is from March 14. If the issue is
> just that there's nothing else happening, I think people should just
> point me to the patch and say "can you apply this single fix?"
Then I'll likely do so in my regression reports more often.
Is cherry picking from -next as easy for you? Maintainers sometimes
improve small details when merging a fix, so it might be better to
take fixes from there instead of pulling them from lore.
Ciao, Thorsten
P.S: Wondering if I should team up with the kernel package maintainers
of Arch Linux, Fedora, and openSUSE and start a git tree based on the
latest stable tree with additional fixes and reverts for regressions
not yet fixed upstream...[1] But that feels kinda wrong: it IMHO
would be better to resolve those problems quickly in the proper
upstream trees.
[1] yes, I'm fully aware that such a tree can only address some of the
issues; but from what I see that already would make quite a difference.
On Apr 25 2024, Thorsten Leemhuis wrote:
> On 24.04.24 20:53, Linus Torvalds wrote:
> > On Wed, 24 Apr 2024 at 09:56, Thorsten Leemhuis
> > <[email protected]> wrote:
> >>
> >> out of interest: what's your stance on regression fixes sitting in
> >> subsystem git trees for a week or longer before being mainlined?
> >
> > Annoying, but probably depends on circumstances. The fact that it took
> > a while to even be noticed presumably means it's not common or holding
> > anything up.
>
> Well, I searched and found quite a few users that reported the problem:
>
> https://bbs.archlinux.org/viewtopic.php?id=293971 (at least 4 people)
> https://bbs.archlinux.org/viewtopic.php?id=293978 (2 people)
> https://bugzilla.redhat.com/show_bug.cgi?id=2271136 (1)
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2061040 (1)
> https://forums.opensuse.org/t/no-touchpad-found-el-touchpad-a-veces-es-reconocido-por-el-sistema/174100 (1)
> https://oldos.me/@jay/112294956758222518 (1)
>
> There are also these two I mentioned earlier already:
> https://social.lol/@major/112294920993272987 (1)
> https://lore.kernel.org/all/[email protected]/ (1)
>
> Side note: there were more discussions about it here:
> https://forums.lenovo.com/t5/Fedora/PSA-Z16-Gen-2-touchpad-not-working-on-kernel-6-8/m-p/5299530
> https://www.reddit.com/r/thinkpad/comments/1bwxwnr/review_thinkpad_z16_gen_2_with_arch_linux/
> https://www.reddit.com/r/linuxhardware/comments/1bwxhwa/review_thinkpad_z16_gen_2_arch_linux/
>
> And the arch linux wiki even documents a workaround:
> https://wiki.archlinux.org/title/Lenovo_ThinkPad_Z16_Gen_2#Initialization_failure
>
> Those are just the reports and discussions I found. And you know how
> it is: many people that struggle will never report a problem.
>
short FYI, (I've Cc-ed you on the PR), but I just sent the PR for HID,
which includes this fix.
>
> IMHO this all casts a bad light on our "no regression" rule, as the
> fix is ready, just not mainlined and backported. And as I mentioned:
> I see similar situations all the time. That's why I made noise here.
>
>
> > That said, th4e last HID pull I have is from March 14. If the issue is
> > just that there's nothing else happening, I think people should just
> > point me to the patch and say "can you apply this single fix?"
>
> Then I'll likely do so in my regression reports more often.
>
> Is cherry picking from -next as easy for you? Maintainers sometimes
> improve small details when merging a fix, so it might be better to
> take fixes from there instead of pulling them from lore.
Maybe one suggestion that might help to reduce these kind of situations
in the future: can you configure your bot to notify the maintainers
after a couple of days that the patch has been merged that it would be
nice if they could send the PR to Linus?
In this case I bet Jiri forgot to send it because he was overloaded and
so was I. So a friendly reminder could make things go faster.
And maybe, before sending the reminder, if you could also check that the
target branch hasn't been touched in 2 days that would prevent annoyances
when we just added a commit and want to let it stay in for-next for 24h
before sending the full branch.
>
> Ciao, Thorsten
>
> P.S: Wondering if I should team up with the kernel package maintainers
> of Arch Linux, Fedora, and openSUSE and start a git tree based on the
> latest stable tree with additional fixes and reverts for regressions
> not yet fixed upstream...[1] But that feels kinda wrong: it IMHO
> would be better to resolve those problems quickly in the proper
> upstream trees.
I would also say that this is wrong. Unless all regressions go through
your tree and you then send PR to Linus, you might quickly get
overloaded because sometimes the fix can not be cherry-picked if there
is one other change just before.
However, do you have some kind of dashboard that you could share with
the package maintainers? This way they could easily compare the not-yet
applied fixes with their bugs and decide to backport them themselves.
In other words: let others do the hard work, you are doing a lot already
:)
Anyway, I really think a friendly reminder would help makes things go
faster. Something like "Hey, it seems that you applied a regression fix
that I am currently tracking and that you haven't sent the PR to Linus
yet. Could you please send it ASAP as we already have several users
reporting the issue?".
Cheers,
Benjamin
>
> [1] yes, I'm fully aware that such a tree can only address some of the
> issues; but from what I see that already would make quite a difference.
On 25.04.24 10:44, Benjamin Tissoires wrote:
> On Apr 25 2024, Thorsten Leemhuis wrote:
>> On 24.04.24 20:53, Linus Torvalds wrote:
>>> On Wed, 24 Apr 2024 at 09:56, Thorsten Leemhuis
>>> <[email protected]> wrote:
>> [...]
>> And the arch linux wiki even documents a workaround:
>> https://wiki.archlinux.org/title/Lenovo_ThinkPad_Z16_Gen_2#Initialization_failure
>>
>> Those are just the reports and discussions I found. And you know how
>> it is: many people that struggle will never report a problem.
>
> short FYI, (I've Cc-ed you on the PR), but I just sent the PR for HID,
> which includes this fix.
Great, many thx. Saw it right after sending my mail... :-/
>> Is cherry picking from -next as easy for you? Maintainers sometimes
>> improve small details when merging a fix, so it might be better to
>> take fixes from there instead of pulling them from lore.
>
> Maybe one suggestion that might help to reduce these kind of situations
> in the future: can you configure your bot to notify the maintainers
> after a couple of days that the patch has been merged that it would be
> nice if they could send the PR to Linus?
Yes, that is an idea in the long run, but I'm not sure if it's wise now
or later. People easily get annoyed by these mails (which I totally
understand!) and then will start hating the bot or regression tracking
in general. That's why I'm really careful here.
There are also the subsystems that regularly flush their fixes shortly
before a new -rc, so they likely never want to see such reminders. And
sending them right after a new -rc is better than nothing, but not ideal
either.
IOW: it's complicated. :-/
> In this case I bet Jiri forgot to send it because he was overloaded and
> so was I.
Understood and no worries. But this became a good opportunity to raise
the general problem, as that is something that bugs me. Sorry. Hope you
don't mind to much that I used that chance.
> So a friendly reminder could make things go faster.
I'll already did this occasionally manually, but that of course does not
scale. Sometimes I wonder if it would be more efficient for nearly all
of us if subsystems just flushed their -fixes branch shortly before each
new -rc, as Linus apparently is not bothered by PRs that contain just a
change or two. But that of course creates work for each of the subsystem
maintainers, unless they creates scripts to handle that work nearly for
free (it seems to me the x86 folks have something like that).
Of course that would mean...
> And maybe, before sending the reminder, if you could also check that the
> target branch hasn't been touched in 2 days that would prevent annoyances
> when we just added a commit and want to let it stay in for-next for 24h
> before sending the full branch.
..that nothing big or slightly dangerous should be merged to -fixes
branches on Fridays.
>> P.S: Wondering if I should team up with the kernel package maintainers
>> of Arch Linux, Fedora, and openSUSE and start a git tree based on the
>> latest stable tree with additional fixes and reverts for regressions
>> not yet fixed upstream...[1] But that feels kinda wrong: it IMHO
>> would be better to resolve those problems quickly in the proper
>> upstream trees.
>
> I would also say that this is wrong. Unless all regressions go through
> your tree and you then send PR to Linus, [...]
Ohh, sorry, I was not clear here, as that would be totally wrong --
fixes definitely should go through the subsystems trees, as they have
the knowledge and the infra to check them (hmm, maybe a dedicated tree
might make sense for the smaller subsystems, but let's ignore that).
What I meant was just a tree those distros could merge into their
kernels to quickly resolve issues that upstream is slow to fix. But that
obviously has downsides, too. And is yet more work.
> However, do you have some kind of dashboard that you could share with
> the package maintainers? This way they could easily compare the not-yet
> applied fixes with their bugs and decide to backport them themselves.
I have for the kernel overall, but nothing subsystem specific. But that
is pretty high on my todo list, as...
> In other words: let others do the hard work, you are doing a lot already
..I'm very well aware of this. :-/
Ciao, Thorsten