2011-05-12 17:15:25

by Andrew Lutomirski

Subject: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Thu, May 12, 2011 at 9:31 AM, Andrew Lutomirski <[email protected]> wrote:
> I just installed 9f381a6 (-linus from yesterday) on my Sandy Bridge
> desktop, and it locks up hard within a few seconds of logging in.
> netconsole says:
>
> [  506.629723] block group 24725422080 has an wrong amount of free space
> [  506.629723] block group 24725422080 has an wrong amount of free space
> [  506.808501] fuse init (API version 7.16)
> [  506.819996] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
> [  506.829847] SELinux: initialized (dev fusectl, type fusectl), uses
> genfs_contexts
> [  506.808501] fuse init (API version 7.16)
> [  506.819996] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
> [  506.829847] SELinux: initialized (dev fusectl, type fusectl), uses
> genfs_contexts
>
> If it's any help, the system is locked so hard that the reset button
> doesn't work. It's an Intel DQ67SW board, which apparently doesn't
> have the most reliable reset button in the world :)
>
> 2.6.38.{4,5,6} are all rock-solid on this box.
>
> I've started bisecting, but I don't expect to finish today. I need to
> do some work other than kernel hacking...

OK, this sucks. In the course of bisecting this, I've hit two other
apparently unrelated bugs that prevent me from testing large numbers
of kernels. So I have two questions:

1. Anyone have any ideas from looking at the log?

It looks like most of what's left is network code, so cc netdev.

2. The !&$#@ bisection is skipping all over the place. I've seen
2.6.37 versions and all manner of -rc's out of order. Linus, and
other people who like pontificating about git bisection: is there any
way to get the bisection to follow Linus' tree? I think that if
bisect could be persuaded to consider only changes that are reached by
following only the *first* merge parent all the way from the bad
revision to the good revision, then the bisection would build versions
that were at least good enough for Linus to pull and might have fewer
bisection-killing bugs.

(This isn't a new idea [1], and git rev-list --bisect --first-parent
isn't so bad except that it doesn't bisect.)



Here's the log.

$ git bisect log
# bad: [9f381a61f58bb6487c93ce2233bb9992f8ea9211] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
# good: [521cb40b0c44418a4fd36dc633f575813d59a43d] Linux 2.6.38
git bisect start 'HEAD' 'v2.6.38'
# skip: [6899608533410557e6698cb9d4ff6df553916e98] Merge branch
'for-linus' of git://codeaurora.org/quic/kernel/davidb/linux-msm
# ******* This revision didn't build due to PSTORE.
# ******* Fixed config for the rest but no point in retrying...
git bisect skip 6899608533410557e6698cb9d4ff6df553916e98
# bad: [d3e458d78167102cc961237cfceef6fffc80c0b3] Merge branch
'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect bad d3e458d78167102cc961237cfceef6fffc80c0b3
# good: [6445ced8670f37cfc2c5e24a9de9b413dbfc788d] Merge branch
'staging-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
git bisect good 6445ced8670f37cfc2c5e24a9de9b413dbfc788d
# bad: [40c7f2112ce18fa5eb6dc209c50dd0f046790191] Merge branch
'drm-core-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
git bisect bad 40c7f2112ce18fa5eb6dc209c50dd0f046790191
# bad: [23b41168fc942a4a041325a04ecc1bd17d031a3e] netdevice: make
initial group visible to userspace
git bisect bad 23b41168fc942a4a041325a04ecc1bd17d031a3e
# bad: [c0c84ef5c130f8871adbdaac2ba824b9195cb6d9] Merge branch
'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
git bisect bad c0c84ef5c130f8871adbdaac2ba824b9195cb6d9
# skip: [3ad97fbcc233a295f2ccc2c6bdeb32323e360a5e] mac80211: remove
unneeded check
# ******* This revision hangs at edd=off
git bisect skip 3ad97fbcc233a295f2ccc2c6bdeb32323e360a5e
# skip: [5bec3e5ade813ee4bdbab03af1bb6f85859272ea] ath9k: fix tx queue
index confusion in debugfs code
# ******* This revision hangs at edd=off
git bisect skip 5bec3e5ade813ee4bdbab03af1bb6f85859272ea
# skip: [c210de8f88215db31cf3529c9763fc3124d6e09d] ath5k: Fix fast
channel switching
# ******* This revision hangs at edd=off
git bisect skip c210de8f88215db31cf3529c9763fc3124d6e09d

# ******* For added fun, 479600777bb588724d044815415f7d708d06644b gets
stuck in systemd initialization.

--Andy
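
The first-parent idea in question 2 above can be sketched on a toy history. This is only an illustration under stated assumptions: the commit names and the `parents` table are hypothetical, and `first_parent_chain` is not a git function. It walks from the bad revision back to the good one, looking only at the first parent of each commit, which is roughly what a first-parent-only bisection would have to choose from.

```python
def first_parent_chain(parents, bad, good):
    """Return the commits from `bad` back to (not including) `good`,
    following only the first parent of each commit."""
    chain = []
    commit = bad
    while commit != good:
        chain.append(commit)
        ps = parents.get(commit, [])
        if not ps:
            raise ValueError(f"{good!r} is not on the first-parent chain of {bad!r}")
        commit = ps[0]  # ignore every parent except the first one
    return chain

# Hypothetical toy history: m1 and m2 are mainline merges (first parent is
# the mainline), t1..t3 are commits that only exist on merged topic branches.
parents = {
    "good": [],
    "t1": ["good"],
    "t2": ["t1"],
    "t3": ["good"],
    "m1": ["good", "t2"],
    "m2": ["m1", "t3"],
    "bad": ["m2"],
}

chain = first_parent_chain(parents, "bad", "good")
print(chain)  # ['bad', 'm2', 'm1']
```

The topic commits t1..t3 never show up in the chain, so only revisions that actually existed as mainline states would be built and tested.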


2011-05-12 17:38:49

by Linus Torvalds

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Thu, May 12, 2011 at 10:15 AM, Andrew Lutomirski <[email protected]> wrote:
>
> OK, this sucks. In the course of bisecting this, I've hit two other
> apparently unrelated bugs that prevent me from testing large numbers
> of kernels. So I have two questions:
>
> 1. Anyone have any ideas from looking at the log?

Nope, that doesn't look very helpful.

> 2. The !&$#@ bisection is skipping all over the place. I've seen
> 2.6.37 versions and all manner of -rc's out of order.

That's the _point_ of bisection. It jumps around. You can start off
trying to pick points on my development tree, but I only have a
hundred merges or so. You're going to start delving into the actual
development versions very quickly. And if you don't do it early,
bisection is going to be much much slower, because it's not going to
pick half-way points.

So bisection works so well exactly because it picks points that are
far away from each other, and you should just totally ignore the
version number. It's meaningless. Looking at it just confuses you.
Don't do it.

Now, "pick stable points" would obviously be nice, but that is going
to have to be manual. You can certainly make some helper scripts, and
that's where that "--first-parent" thing comes in. So if you want to,
just use "git bisect reset" to the commit you want to test.

If you think it's networking, for example, and you've bisected into
there but aren't sure, do "gitk --bisect", find the point where I
merge, and pick that (and my parent), and "git bisect reset" those
points. That way you can verify that it's the networking merge (or
verify that it isn't).

Linus

2011-05-12 19:26:44

by Johannes Sixt

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

Am 12.05.2011 19:37, schrieb Linus Torvalds:
> If you think it's networking, for example, and you've bisected into
> there but aren't sure, do "gitk --bisect", find the point where I
> merge, and pick that (and my parent), and "git bisect reset" those
> points.

Except that you should git reset --hard; git bisect reset gets you out
of bisect-mode, no?

-- Hannes

2011-05-12 19:17:52

by Linus Torvalds

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Thu, May 12, 2011 at 11:54 AM, Johannes Sixt <[email protected]> wrote:
>
> Except that you should git reset --hard; git bisect reset gets you out
> of bisect-mode, no?

Yeah, sorry, my bad.

Linus

2011-05-13 08:21:00

by Christian Couder

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Thu, May 12, 2011 at 7:15 PM, Andrew Lutomirski <[email protected]> wrote:
>
> OK, this sucks. In the course of bisecting this, I've hit two other
> apparently unrelated bugs that prevent me from testing large numbers
> of kernels. So I have two questions:
>
> 1. Anyone have any ideas from looking at the log?
>
> It looks like most of what's left is network code, so cc netdev.
>
> 2. The !&$#@ bisection is skipping all over the place. I've seen
> 2.6.37 versions and all manner of -rc's out of order. Linus, and
> other people who like pontificating about git bisection: is there any
> way to get the bisection to follow Linus' tree? I think that if
> bisect could be persuaded to consider only changes that are reached by
> following only the *first* merge parent all the way from the bad
> revision to the good revision, then the bisection would build versions
> that were at least good enough for Linus to pull and might have fewer
> bisection-killing bugs.
>
> (This isn't a new idea [1], and git rev-list --bisect --first-parent
> isn't so bad except that it doesn't bisect.)

Did you forget to put the reference [1] in your email? Was it this one
you were thinking about:

http://thread.gmane.org/gmane.comp.version-control.git/165433/

?

Thanks,
Christian.

2011-05-13 13:38:40

by Andrew Lutomirski

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Fri, May 13, 2011 at 4:20 AM, Christian Couder
<[email protected]> wrote:
> On Thu, May 12, 2011 at 7:15 PM, Andrew Lutomirski <[email protected]> wrote:
>>
>> OK, this sucks. In the course of bisecting this, I've hit two other
>> apparently unrelated bugs that prevent me from testing large numbers
>> of kernels. So I have two questions:
>>
>> 1. Anyone have any ideas from looking at the log?
>>
>> It looks like most of what's left is network code, so cc netdev.
>>
>> 2. The !&$#@ bisection is skipping all over the place. I've seen
>> 2.6.37 versions and all manner of -rc's out of order. Linus, and
>> other people who like pontificating about git bisection: is there any
>> way to get the bisection to follow Linus' tree? I think that if
>> bisect could be persuaded to consider only changes that are reached by
>> following only the *first* merge parent all the way from the bad
>> revision to the good revision, then the bisection would build versions
>> that were at least good enough for Linus to pull and might have fewer
>> bisection-killing bugs.
>>
>> (This isn't a new idea [1], and git rev-list --bisect --first-parent
>> isn't so bad except that it doesn't bisect.)
>
> Did you forget to put the reference [1] in your email? Was it this one
> you were thinking about:
>
> http://thread.gmane.org/gmane.comp.version-control.git/165433/

No, it was this:

http://stackoverflow.com/questions/5638211/how-do-you-get-git-bisect-to-ignore-merged-branches

--Andy

>
> ?
>
> Thanks,
> Christian.
>

2011-05-13 13:39:37

by Andrew Lutomirski

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

[resend because the Android gmail client apparently generates HTML
emails even for plain text]

On Thu, May 12, 2011 at 1:37 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, May 12, 2011 at 10:15 AM, Andrew Lutomirski <[email protected]> wrote:
>>
>> OK, this sucks. In the course of bisecting this, I've hit two other
>> apparently unrelated bugs that prevent me from testing large numbers
>> of kernels. So I have two questions:
>>
>> 1. Anyone have any ideas from looking at the log?
>
> Nope, that doesn't look very helpful.
>
>> 2. The !&$#@ bisection is skipping all over the place. I've seen
>> 2.6.37 versions and all manner of -rc's out of order.
>
> That's the _point_ of bisection. It jumps around. You can start off
> trying to pick points on my development tree, but I only have a
> hundred merges or so. You're going to start delving into the actual
> development versions very quickly. And if you don't do it early,
> bisection is going to be much much slower, because it's not going to
> pick half-way points.
>
> So bisection works so well exactly because it picks points that are
> far away from each other, and you should just totally ignore the
> version number. It's meaningless. Looking at it just confuses you.
> Don't do it.
>

I actually had better results looking at the version number, saying
"blech", and running git merge v2.6.38. (git bisect good gets a
little confused if I feed it the merge result, but I can just lie.)

Anyway, I must have made a mistake somewhere. The regression is in
drm (presumably i915) and it has a new thread now.

--Andy

2011-05-13 14:57:17

by Andrew Lutomirski

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Fri, May 13, 2011 at 9:38 AM, Andrew Lutomirski <[email protected]> wrote:
> On Fri, May 13, 2011 at 4:20 AM, Christian Couder
> <[email protected]> wrote:
>> On Thu, May 12, 2011 at 7:15 PM, Andrew Lutomirski <[email protected]> wrote:
>>>
>>> OK, this sucks. In the course of bisecting this, I've hit two other
>>> apparently unrelated bugs that prevent me from testing large numbers
>>> of kernels. So I have two questions:
>>>
>>> 1. Anyone have any ideas from looking at the log?
>>>
>>> It looks like most of what's left is network code, so cc netdev.
>>>
>>> 2. The !&$#@ bisection is skipping all over the place. I've seen
>>> 2.6.37 versions and all manner of -rc's out of order. Linus, and
>>> other people who like pontificating about git bisection: is there any
>>> way to get the bisection to follow Linus' tree? I think that if
>>> bisect could be persuaded to consider only changes that are reached by
>>> following only the *first* merge parent all the way from the bad
>>> revision to the good revision, then the bisection would build versions
>>> that were at least good enough for Linus to pull and might have fewer
>>> bisection-killing bugs.
>>>
>>> (This isn't a new idea [1], and git rev-list --bisect --first-parent
>>> isn't so bad except that it doesn't bisect.)
>>
>> Did you forget to put the reference [1] in your email? Was it this one
>> you were thinking about:
>>
>> http://thread.gmane.org/gmane.comp.version-control.git/165433/
>
> No, it was this:
>
> http://stackoverflow.com/questions/5638211/how-do-you-get-git-bisect-to-ignore-merged-branches
>

Sadly even that's not enough. I finished the bisection (by
standard-ish techniques but with a bit of overriding of git bisect's
choices and occasional merging of newer versions of -linus to get
something that would boot) and it pointed to a commit that wasn't the
culprit.

The problem is that 2.6.39-rc7 is bad, 2.6.38 (and 2.6.38.{5,6}) is
good, but 2.6.38-rc5 is bad and fails identically to 2.6.39-rc7. I
think that git bisect makes the assumption that ancestors of a good
commit are good, and that just isn't true for this bug.

So what I really want is a fancy version of git bisect that makes no
assumptions about the relationship of good and bad commits in the
graph and just finds me a commit that is bad but for which all parents
are good or vice versa.

I'm currently bisecting the other way to find the commit before 2.6.38
that fixed the bug, since there's presumably less churn there than in
the early bits of 2.6.39.

--Andy

2011-05-13 16:11:56

by Linus Torvalds

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Fri, May 13, 2011 at 7:56 AM, Andrew Lutomirski <[email protected]> wrote:
>
> So what I really want is a fancy version of git bisect that makes no
> assumptions about the relationship of good and bad commits in the
> graph and just finds me a commit that is bad but for which all parents
> are good or vice versa.

Ehh. That's the "non-fancy" way of testing, I'm afraid: if you cannot
make assumptions about the relationship between good and bad commits,
then you have to test _every_ commit.

So yes, bisection has its problems. But they really do come from the
fact that it's very efficient. When you have (on average) about ten
thousand commits between releases, you have to make assumptions about
the relationships. But once you do that, the efficiency also results
in a certain fragility.

Think of it as a compression method: it generates the smallest
possible set of test points for you. But it's a "lossy" compression -
you don't test everything. And it's extreme: it boils down 10k commit
events to about 13 bisection events. If anything goes wrong (like the
bug not being entirely repeatable, or the bug comes and goes), it will
give the wrong answer.

The good news is that _usually_ it works really well. And when the
choice is between "works really well for 10k commits but can have
problems" and "you need to test all 10k commits", the "can have
problems" part turns out to be a pretty small downside ;)

Linus
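
The "10k commits to about 13 bisection events" figure above is just the base-2 logarithm of the number of candidates. A quick check, assuming an idealized linear history in which every test cuts the candidate set exactly in half (the function name is made up for illustration):

```python
import math

def bisection_steps(n_commits):
    """Idealized number of tests needed to binary-search n_commits candidates."""
    return math.log2(n_commits)

steps = bisection_steps(10_000)
worst_case = math.ceil(steps)

print(f"{steps:.1f}")  # 13.3
print(worst_case)      # 14
```

Real histories are DAGs rather than lines, and skipped revisions add extra steps, so this is only a lower-bound style estimate.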

2011-05-13 16:13:45

by Andrew Lutomirski

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Fri, May 13, 2011 at 12:11 PM, Linus Torvalds
<[email protected]> wrote:
> On Fri, May 13, 2011 at 7:56 AM, Andrew Lutomirski <[email protected]> wrote:
>>
>> So what I really want is a fancy version of git bisect that makes no
>> assumptions about the relationship of good and bad commits in the
>> graph and just finds me a commit that is bad but for which all parents
>> are good or vice versa.
>
> Ehh. That's the "non-fancy" way of testing, I'm afraid: if you cannot
> make assumptions about the relationship between good and bad commits,
> then you have to test _every_ commit.
>
> So yes, bisection has its problems. But they really do come from the
> fact that it's very efficient. When you have (on average) about ten
> thousand commits between releases, you have to make assumptions about
> the relationships. But once you do that, the efficiency also results
> in a certain fragility.
>
> Think of it as a compression method: it generates the smallest
> possible set of test points for you. But it's a "lossy" compression -
> you don't test everything. And it's extreme: it boils down 10k commit
> events to about 13 bisection events. If anything goes wrong (like the
> bug not being entirely repeatable, or the bug comes and goes), it will
> give the wrong answer.
>
> The good news is that _usually_ it works really well. And when the
> choice is between "works really well for 10k commits but can have
> problems" and "you need to test all 10k commits", the "can have
> problems" part turns out to be a pretty small downside ;)

In conclusion, I found the problem. It's a clusterfuck and I think
there's no way that any bisection tool under any sane assumptions
could have found it. Patch coming in a couple seconds b/c I think it
needs to go in to 2.6.39.

--Andy

>
>                                Linus
>

2011-05-13 17:24:25

by Andrew Lutomirski

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Fri, May 13, 2011 at 12:11 PM, Linus Torvalds
<[email protected]> wrote:
> On Fri, May 13, 2011 at 7:56 AM, Andrew Lutomirski <[email protected]> wrote:
>>
>> So what I really want is a fancy version of git bisect that makes no
>> assumptions about the relationship of good and bad commits in the
>> graph and just finds me a commit that is bad but for which all parents
>> are good or vice versa.
>
> Ehh. That's the "non-fancy" way of testing, I'm afraid: if you cannot
> make assumptions about the relationship between good and bad commits,
> then you have to test _every_ commit.

Actually, I disagree. I suspect, although I haven't convinced myself
very well yet, that if you assume that the bug was caused one or more
times by some commit C that works but where all of C's parents don't
work (or vice versa), then there exists an algorithm that, at least
for most histories, will find such a commit in polylog tries given a
starting commit that works and another one that fails. But I have to
do real work before I think too much more about that.

That being said, even the fairly weak requirement I wanted wasn't really true...

[I said in a different email:]
>
> In conclusion, I found the problem. It's a clusterfuck and I think
> there's no way that any bisection tool under any sane assumptions
> could have found it. Patch coming in a couple seconds b/c I think it
> needs to go in to 2.6.39.

I should clarify what the problem was for people who don't want to dig
around the archives:

I have a Sandy Bridge box, which means that I need to run a recent
kernel for things to work decently. The bug was introduced once way
back in the depths of time (i.e. before any kernel that I ever tried
since I got the machine). It was fixed shortly before 2.6.38 by
commit A. It was reintroduced in a merge B that was a little past A.
B went in to 2.6.39-something via airlied's tree. B's other parent
was bad because it didn't contain A. It looks like this:

-------------------------------.
                                \
(bad pre-2.6.38-rc2)--.          \  (etc)
                       \          \
          .--(good)-----B--(bad)-. \
         /                        \ \
(bad)---A--(good)--v2.6.38---------x-x-v2.6.39-rc7


(A is a1656b9090f7008d2941c314f5a64724bea2ae37 and B is
47ae63e0c2e5fdb582d471dc906eb29be94c732f)


The offending commit is B, but the bisection is screwed, because the
series of nonworking commits dangling off B looks just like any other
series of nonworking commits like the top line that have nothing to do
with the problem. Sure enough, my bisection ended up wandering into
dark corners (like the networking tree), which were innocent.

I found the problem by manually bisecting the --first-parent chain
from v2.6.39-rc7 to v2.6.38 to figure out that the problem came from a
drm merge and then noticing that something was screwed up when the
bisection pointed to a commit (in the right driver, even) that wasn't
the problem. (I even tried reverting it to no avail.) Bisection was
*sure* it was the problem, though, because its parent was in v2.6.38.

I thought that maybe the problem had been introduced more than once,
so I tried v2.6.38-rc5, and it *failed*. (That's what caused a lot of
my confusion the first time around -- lots of commits that were "good"
(in the sense that they would work if merged correctly into the
v2.6.39 branch before B got there) failed instead.)

So I bisected between v2.6.38 and v2.6.38-rc5 to find the commit that
fixed the problem, since there had to be something. Once I found it,
a bunch of confused calls to git blame found the merge that undid the
fix.

> Think of it as a compression method: it generates the smallest
> possible set of test points for you. But it's a "lossy" compression -
> you don't test everything. And it's extreme: it boils down 10k commit
> events to about 13 bisection events. If anything goes wrong (like the
> bug not being entirely repeatable, or the bug comes and goes), it will
> give the wrong answer.

As I just learned :)

--Andy
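
The failure described above can be reproduced on a toy graph. The sketch below is a simplification, not git's actual implementation: commit names other than A and B are hypothetical, `is_bad` is a hand-written table standing in for "boot and test", and the pick rule (choose the candidate that splits the remaining set most evenly) only approximates what git bisect does. The history has the same shape as the diagram: A fixes the bug, B merges a side branch that predates A and undoes the fix.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def bisect(parents, is_bad, good, bad):
    """Toy bisection: assumes every ancestor of a good commit is good."""
    candidates = ancestors(parents, bad) - ancestors(parents, good)
    tested = []
    while len(candidates) > 1:
        n = len(candidates)

        def score(c):
            k = len(ancestors(parents, c) & candidates)
            return min(k, n - k)  # how evenly testing c splits the set

        pick = max(sorted(candidates), key=score)
        tested.append(pick)
        if is_bad[pick]:
            candidates &= ancestors(parents, pick)
        else:
            candidates -= ancestors(parents, pick)
    return candidates.pop(), tested

parents = {
    "old0": [], "old1": ["old0"],                 # bug present from the start
    "A": ["old1"], "g1": ["A"], "v38": ["g1"],    # A fixes it; v38 is good
    "s1": ["old1"], "s2": ["s1"],                 # side branch without A
    "d1": ["A"],                                  # drm work on top of A
    "B": ["d1", "s2"],                            # merge that undoes the fix
    "d2": ["B"], "x": ["v38", "d2"], "rc7": ["x"],
}
is_bad = {c: True for c in parents}
for c in ("A", "g1", "v38", "d1"):
    is_bad[c] = False

culprit, tested = bisect(parents, is_bad, "v38", "rc7")
print(tested)   # ['B', 's2', 's1']
print(culprit)  # s1
```

The search tests B, sees it fail, and then descends into the side branch, ending at s1: an innocent commit that is bad only because it predates A. The real culprit, B, is discarded along the way.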

2011-05-13 17:55:23

by Linus Torvalds

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Fri, May 13, 2011 at 10:24 AM, Andrew Lutomirski <[email protected]> wrote:
> On Fri, May 13, 2011 at 12:11 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Ehh. That's the "non-fancy" way of testing, I'm afraid: if you cannot
>> make assumptions about the relationship between good and bad commits,
>> then you have to test _every_ commit.
>
> Actually, I disagree. I suspect, although I haven't convinced myself
> very well yet, that if you assume that the bug was caused one or more
> times by some commit C that works but where all of C's parents don't
> work (or vice versa), then there exists an algorithm that, at least
> for most histories, will find such a commit in polylog tries given a
> starting commit that works and another one that fails. But I have to
> do real work before I think too much more about that.

So I do think we could probably add a few more concepts to git-bisect
that could be quite useful.

For example, in your case, since you had certain requirements of
support that simply didn't exist earlier, something like

git bisect requires v2.6.38

would have been really useful - telling git bisect that any commit
that cannot reach that required commit is not even worth testing.

That would still have been a rather dangerous thing to say (it's not
actually a _true_ requirement: there may well be points in the i915
development tree that still had all the required sandybridge support,
but hadn't been merged into 38 yet), but it would have limited your
bisection space to a degree that would have been useful.

So if that "requirement" wasn't actually true (and the bug was
introduced by a commit that was based on something before v2.6.38),
the bisect would have pinpointed the particular merge that brought the
commit in. So "pinpointed" might in this case mean "thousands of
commits", but it would still likely be a very useful end result.

And no, git-bisect doesn't have that kind of concept. And it could
potentially be quite useful.

Another thing that would be useful for git bisect would be the notion
of "git bisect cherry-pick", which is useful for applying particular
commits that fix unrelated problems _while_ you bisect the one you're
interested in. You can currently do it manually, or by playing around
with 'git bisect run' and making hacky stuff, but it's a pain. You
didn't hit that case, but it's actually the most common problem there
is with git bisect - having multiple _different_ bugs, rather than
having the same bug show up twice.

Yet another issue - related to the "multiple different bugs" thing -
is exactly the fact that 'git bisect' only has a concept of a "single
bug". You cannot say "this revision is good, that revision has bug A,
that revision has bug B", where bug A might hide bug B and vice versa.
If you have multiple bugs and they change symptoms, it can be _really_
painful to bisect things, because you have to basically always pick
one of them, and then re-do the whole thing after you've found the
first one.

So there's no question that there are things we might want to add to
"git bisect".

Of course, one of the real advantages of "git bisect" is that for many
cases it's pretty simple. You can (and we absolutely rely on this)
have normal users that have _no_ idea about kernel development do a
bisect - the only thing they need to be able to do is to compile and
install their own kernel, and reliably recognize the problematic
symptoms.

And that's really the biggest advantage of bisecting - it doesn't
_always_ work, but it works often enough, and it's totally mindless.
So clever features and extra complexity and smart things that can be
done with it are often not all that useful - because a major user base
is very much the "I don't know kernel development, but I want to help
and my machine shows badness" kind of situation.

Linus

2011-05-13 18:34:13

by Johannes Sixt

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

Am 13.05.2011 19:54, schrieb Linus Torvalds:
> For example, in your case, since you had certain requirements of
> support that simply didn't exist earlier, something like
>
> git bisect requires v2.6.38
>
> would have been really useful - telling git bisect that any commit
> that cannot reach that required commit is not even worth testing.

You can already have this with

git bisect good v2.6.38

It sounds a bit unintuitive, but with a slight mind-twist it can even be
regarded as correct in a mathematical sense: when the precondition is
false, the result is true. ;-)

-- Hannes

2011-05-13 18:42:41

by Linus Torvalds

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Fri, May 13, 2011 at 11:34 AM, Johannes Sixt <[email protected]> wrote:
> Am 13.05.2011 19:54, schrieb Linus Torvalds:
>> For example, in your case, since you had certain requirements of
>> support that simply didn't exist earlier, something like
>>
>>    git bisect requires v2.6.38
>>
>> would have been really useful - telling git bisect that any commit
>> that cannot reach that required commit is not even worth testing.
>
> You can already have this with
>
>   git bisect good v2.6.38
>
> It sounds a bit unintuitive, but with a slight mind-twist it can even be
> regarded as correct in a mathematical sense: when the precondition is
> false, the result is true. ;-)

No. That's not the same thing AT ALL.

When you say that v2.6.38 is good, that means that everything that can
be reached from 2.6.38 is good.

NOT AT ALL the same thing as "git bisect requires v2.6.38" would be.

The "requires v2.6.38" would basically say that anything that doesn't
contain v2.6.38 is "off-limits". It's fine to call them "good", but
that's not the same thing as "git bisect good v2.6.38".

Why?

Think about it. It's the "reachable from v2.6.38" vs "cannot reach
v2.6.38" difference. That's a HUGE difference.

Linus
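
The "reachable from v2.6.38" versus "cannot reach v2.6.38" distinction above can be made concrete on a toy graph. This is a sketch under assumptions: the commit names are hypothetical, and "git bisect requires" is the proposed feature from this thread, not an existing git command.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

parents = {
    "old": [],
    "v38": ["old"],
    "side": ["old"],        # forked before v38 and never merged it
    "m": ["v38", "side"],   # merge that brings `side` into the mainline
    "rc7": ["m"],
}

# "git bisect good v38": everything reachable FROM v38 is assumed good.
assumed_good = ancestors(parents, "v38")

# Hypothetical "git bisect requires v38": everything that CANNOT REACH v38
# (does not contain it) is off-limits for testing.
off_limits = {c for c in parents if "v38" not in ancestors(parents, c)}

print(sorted(assumed_good))  # ['old', 'v38']
print(sorted(off_limits))    # ['old', 'side']
```

The two sets differ in `side`: marking v38 good still leaves `side` as a commit bisect may ask you to test, while the proposed "requires" would rule it out.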

2011-05-13 18:47:40

by Johannes Sixt

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

Am 13.05.2011 20:41, schrieb Linus Torvalds:
> On Fri, May 13, 2011 at 11:34 AM, Johannes Sixt <[email protected]> wrote:
>> git bisect good v2.6.38
>
> When you say that v2.6.38 is good, that means that everything that can
> be reached from 2.6.38 is good.
>
> NOT AT ALL the same thing as "git bisect requires v2.6.38" would be.
>
> Think about it. It's the "reachable from v2.6.38" vs "cannot reach
> v2.6.38" difference. That's a HUGE difference.

Oops, you're right, I got it upside-down.

Thanks,
-- Hannes

2011-05-13 18:49:19

by Junio C Hamano

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

Linus Torvalds <[email protected]> writes:

> When you say that v2.6.38 is good, that means that everything that can
> be reached from 2.6.38 is good.
>
> NOT AT ALL the same thing as "git bisect requires v2.6.38" would be.
>
> The "requires v2.6.38" would basically say that anything that doesn't
> contain v2.6.38 is "off-limits". It's fine to call them "good", but
> that's not the same thing as "git bisect good v2.6.38".
>
> Why?
>
> Think about it. It's the "reachable from v2.6.38" vs "cannot reach
> v2.6.38" difference. That's a HUGE difference.

Could you please clarify "off-limits"?

Do you mean "anything before v2.6.38 did not even have this feature, so
the result of testing a version in that range does not give us any
information"? The feature didn't even exist, so a bug can never trigger,
and seeing "good" from such a version does not mean everything reachable
from it is good? Upon seeing "bad" result from a version before v2.6.38,
what can we conclude? The breakage cannot possibly come from the feature
that is being checked, so the procedure to check itself is busted?

2011-05-13 18:55:25

by Andrew Lutomirski

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Fri, May 13, 2011 at 2:48 PM, Junio C Hamano <[email protected]> wrote:
> Linus Torvalds <[email protected]> writes:
>
>> When you say that v2.6.38 is good, that means that everything that can
>> be reached from 2.6.38 is good.
>>
>> NOT AT ALL the same thing as "git bisect requires v2.6.38" would be.
>>
>> The "requires v2.6.38" would basically say that anything that doesn't
>> contain v2.6.38 is "off-limits". It's fine to call them "good", but
>> that's not the same thing as "git bisect good v2.6.38".
>>
>> Why?
>>
>> Think about it. It's the "reachable from v2.6.38" vs "cannot reach
>> v2.6.38" difference. That's a HUGE difference.
>
> Could you please clarify "off-limits"?
>
> Do you mean "anything before v2.6.38 did not even have this feature, so
> the result of testing a version in that range does not give us any
> information"? The feature didn't even exist, so a bug can never trigger,
> and seeing "good" from such a version does not mean everything reachable
> from it is good? Upon seeing "bad" result from a version before v2.6.38,
> what can we conclude? The breakage cannot possibly come from the feature
> that is being checked, so the procedure to check itself is busted?
>

In my case, if I'd given bisect a hint that commits that don't include
v2.6.38 are unlikely to work for reasons unrelated to the bug, then
there should still have been enough revisions left for bisect to tell
me "the bug was introduced by the merge of the drm tree but I can't
tell you more without testing off-limits revisions". That would have
avoided testing three or four revisions that just failed to boot.

In my particular case I think it would also have avoided an
unnecessary set of tests to figure out why the networking merge broke
my system when the networking merge did not, in fact, break my system.
This is coincidence -- all of the commits that didn't have the change
that fixed the bug the first time around also didn't contain v2.6.38,
so I never would have tested them.

This is maybe some further justification for a bisect mode that
follows the --first-parent path as long as possible -- it might take
one or two more kernel builds, but it avoids odd trips around the
history that can hit random crap like that. (Of course, it could lead
to different random crap, but what can you do?)
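
As a rough illustration of the first-parent idea above, the mainline path between a known-good and a known-bad revision can be listed with `git rev-list --first-parent`. The helper name `first_parent_chain` is made up for this sketch; it only enumerates the candidate commits and does not drive a bisection itself.

```shell
# Hypothetical helper (not a git command): list the commits between a
# known-good and a known-bad revision, following only the first parent
# of every merge, i.e. the path the mainline itself took.
first_parent_chain() {
    good=$1
    bad=$2
    git rev-list --first-parent "$good..$bad"
}

# Example (revisions are placeholders):
#   first_parent_chain v2.6.38 HEAD | wc -l
```

Newer Git releases also accept `git bisect start --first-parent`, which bisects along this path directly.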

--Andy


2011-05-13 19:19:01

by Linus Torvalds

Subject: Re: AAARGH bisection is hard (Re: [2.6.39 regression] X locks up hard right after logging in)

On Fri, May 13, 2011 at 11:48 AM, Junio C Hamano <[email protected]> wrote:
>
> Could you please clarify "off-limits"?
>
> Do you mean "anything before v2.6.38 did not even have this feature, so
> the result of testing a version in that range does not give us any
> information"?

Well, I think it's useful in two cases.

It's useful for the "before this version, the test we're doing doesn't
even make sense and cannot succeed" sense.

That doesn't have to be about hardware support, it could be any
feature. For example, in git, say that you noticed that
--dirstat-by-file stopped working at some point. You know it was good
when you merged it, so you'd do

    git bisect start
    git bisect good ac9666f84a59

but you'd also go "that's also when I introduced the *test* for it, so
I'll need to require that":

    git bisect requires ac9666f84a59

and then you can start it all off:

    git bisect bad
    git bisect run sh -c "make test"

or whatever.

Because you don't want to go into the merges that were based on code
that didn't even _have_ that feature.

Ok, so that's a made-up and contrived example (it would make more
sense for when you add a whole new flag, and your test-script is
testing that new functionality), but it kind of explains the notion:
it will not bother to run bisect on code that simply isn't _relevant_
for the issue you are bisecting.
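
Since "git bisect requires" is only a proposal here, something close to it can be approximated with what `git bisect run` already offers: a test script that exits with status 125 makes bisect skip the current commit. The sketch below assumes a Git new enough to have `git merge-base --is-ancestor`; the function name `bisect_step` and its arguments are invented for illustration, and skipping is not identical to the proposed semantics (bisect may still end on a range of skipped commits).

```shell
# Hypothetical wrapper for use with `git bisect run`.
# Usage: bisect_step <required-commit> <test command...>
bisect_step() {
    required=$1
    shift
    # A commit that does not contain the required one cannot be tested
    # meaningfully. Exit status 125 tells `git bisect run` to skip it.
    if ! git merge-base --is-ancestor "$required" HEAD; then
        return 125
    fi
    # Otherwise run the real test; its exit status (0 = good,
    # 1..124 = bad) is passed through unchanged.
    "$@"
}

# Example, with this function saved in a script that ends with
#   bisect_step ac9666f84a59 make test
# and started as:
#   git bisect run ./that-script.sh
```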

> Upon seeing "bad" result from a version before v2.6.38, what can we conclude?

The point would be that such versions aren't even _testable_. So the
whole "seeing 'bad'" concept is a NULL concept. It's like the above
"new command line flag to 'git'" example: it's not that those commits
might not have broken something, but those commits are crazy to test.

If it turns out that a merge brought in the breakage, we'd have to do
a totally new kind of test thing. But from a bisect standpoint, it's
already very interesting if the end result is "hey, you merged that
code that didn't even _support_ the feature we're testing, and that
broke it". That gives quite a bit of information, and opens up new
avenues for testing.

For example, at that point, we might decide that "Oh, ok, now I will
need to re-run the bisect for everything that came in in that merge,
but I will do a new merge at that point to see which commit it is that
doesn't play nice with the new feature".

> The breakage cannot possibly come from the feature
> that is being checked, so the procedure to check itself is busted?

Right.

HOWEVER.

There's another reason to say "require version XYZ", and that's
essentially a "I want to do a (quicker) high-level bisect". Especially
the way the kernel merge window is done, it might be that versions
prior to v2.6.38 work perfectly _fine_, but what you want to do is to
quickly bisect down to which subsystem caused breakage.

A good way to do that would be to just say "requires v2.6.38", and
suddenly the actual set of commits that we're going to bisect is going
to be *much* smaller. We're basically throwing away all the individual
commits that were merged in the merge window, and saying something
that approximates to "we are only interested in the merge points".

Why would we do that? Just to get a quicker "this is the problematic
subsystem". So the "requires xyz" might be quite useful for that
reason too.
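
To get a feel for how much smaller that set is, the commits that actually contain a given version can be listed with `git rev-list --ancestry-path`, which keeps only commits that are both descendants of the first revision and ancestors of the second. The helper name `commits_containing` is an assumption of this sketch, not an existing git subcommand.

```shell
# Hypothetical helper: list the commits reachable from <bad> that
# contain <required>, i.e. the set a "requires" bisect would consider.
commits_containing() {
    required=$1
    bad=$2
    git rev-list --ancestry-path "$required..$bad"
}

# Example (revisions are placeholders): compare the restricted set
# with the full range that a plain bisect would walk.
#   commits_containing v2.6.38 HEAD | wc -l
#   git rev-list v2.6.38..HEAD | wc -l
```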

Linus