2016-03-29 02:40:25

by Andy Lutomirski

[permalink] [raw]
Subject: i915 4.5 bugfix backport and release management issue?

Hi all-

AFAICT something got rather screwed up in i915 land for 4.5.

$ git log --oneline --grep='Pretend cursor is always on' v4.5
drivers/gpu/drm/i915/
e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
calculations (v2)

$ git log --oneline --grep='Pretend cursor is always on' v4.6-rc1
drivers/gpu/drm/i915/
e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
calculations (v2)
b2435692dbb7 drm/i915: Pretend cursor is always on for ILK-style WM
calculations (v2)

The two patches there are almost, but not quite, the same thing, which
makes me wonder how they both ended up in Linus' tree without an
obvious merge conflict.

I have no idea what caused this. However, I think (on very little
inspection, but it's consistent with problems I have with 4.5 on my
laptop) that the first one is an *incorrect* fix for a regression in
4.5 and the second is a correct fix for the same regression. 4.6-rc1
seems okay.

I reported the regression and everyone involved has known about it for
weeks. Nonetheless, 4.5 final is busted.

Can you please:

a) figure out what happened and send a backport of whatever needs to
be backported to [email protected].

b) do whatever needs to be done so this doesn't happen again

c) teach the i915 CI system to test Linus' tree as-is in addition to
the development trees. Linus' tree and the versions of i915 in actual
released versions of Linux are supposed to work.


I hate to nag, but this is at least the third time I've noticed weird
release management issues in i915. I tripped on a regression a few
releases ago that was known to the i915 team and fixed but the fix
wasn't actually queued up for the current release. Before that, I
tripped on a regression caused by an intentional behavior change that
was folded in to a merge commit, making it essentially impossible to
bisect and making it pointlessly hard to understand what was going on
even once I found the offending code.

Thanks,
Andy


2016-03-29 07:43:26

by Daniel Vetter

[permalink] [raw]
Subject: Re: i915 4.5 bugfix backport and release management issue?

On Tue, Mar 29, 2016 at 4:39 AM, Andy Lutomirski <[email protected]> wrote:
> AFAICT something got rather screwed up in i915 land for 4.5.
>
> $ git log --oneline --grep='Pretend cursor is always on' v4.5
> drivers/gpu/drm/i915/
> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
> calculations (v2)
>
> $ git log --oneline --grep='Pretend cursor is always on' v4.6-rc1
> drivers/gpu/drm/i915/
> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
> calculations (v2)
> b2435692dbb7 drm/i915: Pretend cursor is always on for ILK-style WM
> calculations (v2)
>
> The two patches there are almost, but not quite, the same thing, which
> makes me wonder how they both ended up in Linus' tree without an
> obvious merge conflict.
>
> I have no idea what caused this. However, I think (on very little
> inspection, but it's consistent with problems I have with 4.5 on my
> laptop) that the first one is an *incorrect* fix for a regression in
> 4.5 and the second is a correct fix for the same regression. 4.6-rc1
> seems okay.
>
> I reported the regression and everyone involved has known about it for
> weeks. Nonetheless, 4.5 final is busted.

Quoting from e2e407dc093f

"(cherry picked from commit b2435692dbb709d4c8ff3b2f2815c9b8423b72bb)"

i.e. this is intentionally twice in the history. We started to soak
bugfixes in -next and then cherry pick them because we had too much
fun with things blowing up, and also too much fun with really messy
conflicts. It's not a botched patch in 4.5 or anything else nefarious
at all.

Which just means that there's another bugfix in 4.6-rc1 that needs to
be backported to 4.5 but hasn't arrived in there yet. Two possible
cases:
- Greg is too slow with cc: stable patches - especially around
releases we occasionally put the patch into the wrong branch at first,
or it simply misses the last pull. 4.6 closes for drm drivers already
at 4.5-rc5, so there's a sizeable overlap where placing patches
correctly requires some juggling.

- We've genuinely failed to cherry-pick a bugfix over. It happens,
despite our best efforts (which of course includes running stuff on
Linus' tree). Please do a reverse bisect so we know which precise
commit fell through the cracks.

Thanks, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2016-03-29 07:50:07

by Andy Lutomirski

[permalink] [raw]
Subject: Re: i915 4.5 bugfix backport and release management issue?

On Tue, Mar 29, 2016 at 12:43 AM, Daniel Vetter <[email protected]> wrote:
> On Tue, Mar 29, 2016 at 4:39 AM, Andy Lutomirski <[email protected]> wrote:
>> AFAICT something got rather screwed up in i915 land for 4.5.
>>
>> $ git log --oneline --grep='Pretend cursor is always on' v4.5
>> drivers/gpu/drm/i915/
>> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
>> calculations (v2)
>>
>> $ git log --oneline --grep='Pretend cursor is always on' v4.6-rc1
>> drivers/gpu/drm/i915/
>> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
>> calculations (v2)
>> b2435692dbb7 drm/i915: Pretend cursor is always on for ILK-style WM
>> calculations (v2)
>>
>> The two patches there are almost, but not quite, the same thing, which
>> makes me wonder how they both ended up in Linus' tree without an
>> obvious merge conflict.
>>
>> I have no idea what caused this. However, I think (on very little
>> inspection, but it's consistent with problems I have with 4.5 on my
>> laptop) that the first one is an *incorrect* fix for a regression in
>> 4.5 and the second is a correct fix for the same regression. 4.6-rc1
>> seems okay.
>>
>> I reported the regression and everyone involved has known about it for
>> weeks. Nonetheless, 4.5 final is busted.
>
> Quoting from e2e407dc093f
>
> "(cherry picked from commit b2435692dbb709d4c8ff3b2f2815c9b8423b72bb)"
>
> i.e. this is intentionally twice in the history. We started to soak
> bugfixes in -next and then cherry pick them because we had too much
> fun with things blowing up, and also too much fun with really messy
> conflicts. It's not a botched patch in 4.5 or anything else nefarious
> at all.

Bah, sorry, I read it wrong. They have the same final state but they
were on different bases. I somehow reversed this in my head and
thought they had the same initial state and different final states.

>
> - We've genuinely failed to cherry-pick a bugfix over. It happens,
> despite our best efforts (which of course includes running stuff on
> Linus' tree). Please do a reverse bisect so we know which precise
> commit fell through the cracks.

If I find some time, I'll try. I've already failed miserably at
bisecting this thing once.

--Andy

2016-03-29 16:16:32

by Andy Lutomirski

[permalink] [raw]
Subject: Re: i915 4.5 bugfix backport and release management issue?

On Tue, Mar 29, 2016 at 12:49 AM, Andy Lutomirski <[email protected]> wrote:
> On Tue, Mar 29, 2016 at 12:43 AM, Daniel Vetter <[email protected]> wrote:
>> On Tue, Mar 29, 2016 at 4:39 AM, Andy Lutomirski <[email protected]> wrote:
>>> AFAICT something got rather screwed up in i915 land for 4.5.
>>>
>>> $ git log --oneline --grep='Pretend cursor is always on' v4.5
>>> drivers/gpu/drm/i915/
>>> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
>>> calculations (v2)
>>>
>>> $ git log --oneline --grep='Pretend cursor is always on' v4.6-rc1
>>> drivers/gpu/drm/i915/
>>> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
>>> calculations (v2)
>>> b2435692dbb7 drm/i915: Pretend cursor is always on for ILK-style WM
>>> calculations (v2)
>>>
>>> The two patches there are almost, but not quite, the same thing, which
>>> makes me wonder how they both ended up in Linus' tree without an
>>> obvious merge conflict.
>>>
>>> I have no idea what caused this. However, I think (on very little
>>> inspection, but it's consistent with problems I have with 4.5 on my
>>> laptop) that the first one is an *incorrect* fix for a regression in
>>> 4.5 and the second is a correct fix for the same regression. 4.6-rc1
>>> seems okay.
>>>
>>> I reported the regression and everyone involved has known about it for
>>> weeks. Nonetheless, 4.5 final is busted.
>>
>> Quoting from e2e407dc093f
>>
>> "(cherry picked from commit b2435692dbb709d4c8ff3b2f2815c9b8423b72bb)"
>>
>> i.e. this is intentionally twice in the history. We started to soak
>> bugfixes in -next and then cherry pick them because we had too much
>> fun with things blowing up, and also too much fun with really messy
>> conflicts. It's not a botched patch in 4.5 or anything else nefarious
>> at all.
>
> Bah, sorry, I read it wrong. They have the same final state but they
> were on different bases. I somehow reversed this in my head and
> thought they had the same initial state and different final states.
>

Also, sorry for the excessive diatribe. I plead sleepiness and
mis-reading of code.

--Andy

2016-03-30 08:45:14

by Daniel Vetter

[permalink] [raw]
Subject: Re: i915 4.5 bugfix backport and release management issue?

On Tue, Mar 29, 2016 at 6:16 PM, Andy Lutomirski <[email protected]> wrote:
> On Tue, Mar 29, 2016 at 12:49 AM, Andy Lutomirski <[email protected]> wrote:
>> On Tue, Mar 29, 2016 at 12:43 AM, Daniel Vetter <[email protected]> wrote:
>>> On Tue, Mar 29, 2016 at 4:39 AM, Andy Lutomirski <[email protected]> wrote:
>>>> AFAICT something got rather screwed up in i915 land for 4.5.
>>>>
>>>> $ git log --oneline --grep='Pretend cursor is always on' v4.5
>>>> drivers/gpu/drm/i915/
>>>> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
>>>> calculations (v2)
>>>>
>>>> $ git log --oneline --grep='Pretend cursor is always on' v4.6-rc1
>>>> drivers/gpu/drm/i915/
>>>> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
>>>> calculations (v2)
>>>> b2435692dbb7 drm/i915: Pretend cursor is always on for ILK-style WM
>>>> calculations (v2)
>>>>
>>>> The two patches there are almost, but not quite, the same thing, which
>>>> makes me wonder how they both ended up in Linus' tree without an
>>>> obvious merge conflict.
>>>>
>>>> I have no idea what caused this. However, I think (on very little
>>>> inspection, but it's consistent with problems I have with 4.5 on my
>>>> laptop) that the first one is an *incorrect* fix for a regression in
>>>> 4.5 and the second is a correct fix for the same regression. 4.6-rc1
>>>> seems okay.
>>>>
>>>> I reported the regression and everyone involved has known about it for
>>>> weeks. Nonetheless, 4.5 final is busted.
>>>
>>> Quoting from e2e407dc093f
>>>
>>> "(cherry picked from commit b2435692dbb709d4c8ff3b2f2815c9b8423b72bb)"
>>>
>>> i.e. this is intentionally twice in the history. We started to soak
>>> bugfixes in -next and then cherry pick them because we had too much
>>> fun with things blowing up, and also too much fun with really messy
>>> conflicts. It's not a botched patch in 4.5 or anything else nefarious
>>> at all.
>>
>> Bah, sorry, I read it wrong. They have the same final state but they
>> were on different bases. I somehow reversed this in my head and
>> thought they had the same initial state and different final states.
>>
>
> Also, sorry for the excessive diatribe. I plead sleepiness and
> mis-reading of code.

Thanks. We really appreciate friendly discussions here on intel-gfx,
the technical challenges are hard enough as is.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch