2002-11-04 12:01:40

by Richard J Moore

Subject: Re: [lkcd-general] Re: What's left over.



> What he really wants is for Andrew or Alan or someone else he trusts
> to merge it, get actual field results, and declare it useful. If
> people start visibly passing around crash dump results on l-k and
> solving problems with them, that'll help too. Until then all he has is
> his gut feel to go on.

Are you sure? Isn't what Linus is saying that he understands that some
problems can be solved using dumps, some from the oops message, some by
source code inspection and some by other means? But he's not interested
in a timely resolution; he has a preference for solving the problems by
looking at the source and only that way. That's his preference: arguments
relating to timeliness and commercial considerations are of no interest to
him - simply because they argue for benefits in which he has no interest.
Because LKCD doesn't personally interest him he has declared that he will
not merge it; it's up to some trusted advocate.

So, for those of us who passionately care whether Linux has a system
dumping mechanism, we need to regroup, we need to decide the correct
strategy for gaining LKCD's inclusion into the kernel. Many of the
arguments relate to timeliness and ultimately have a commercial benefit. I
suggest we actively campaign among the various distros who are interested
in selling Linux to businesses and providing support. We also need to
concentrate on consolidating the various requirements of a system crash
dump - it's going to be much easier for everyone if there is a consensus on
system dumping technology.


First crucial question - are there any avenues still open for 2.5?


Richard J Moore
RAS Project Lead - IBM Linux Technology Centre



2002-11-04 12:21:45

by Lars Marowsky-Bree

Subject: Re: [lkcd-general] Re: What's left over.

On 2002-11-04T11:59:23,
Richard J Moore <[email protected]> said:

> So, for those of us who passionately care whether Linux has a system
> dumping mechanism, we need to regroup, we need to decide the correct
> strategy for gaining LKCD's inclusion into the kernel. Many of the
> arguments relate to timeliness and ultimately have a commercial benefit. I
> suggest we actively campaign among the various distros who are interested
> in selling Linux to businesses and providing support. We also need to
> concentrate on consolidating the various requirements of a system crash
> dump - it's going to be much easier for everyone if there is a consensus on
> system dumping technology.

I think you are somewhat missing the point.

Both RH and UnitedLinux seem to care enough for system dump facilities that
they ship patched kernels (netdump / LKCD, respectively). Anyone who cares can
simply apply the patch themselves, if they want to compile from vanilla
sources. Just buy RH AS or any enterprise product powered by UnitedLinux, and
off you go. I assume that your "enterprise customers" will want to do that
anyway because they need all those very useful certifications...

And since l-k (rightly!) mostly refuses to deal with crash/oops reports from
vendor patched kernels anyway, the distributors have to deal with the
diagnosis themselves already and do so as part of the support contracts.
Anyone who runs their own patched kernels probably also is able to do so.

While I can see the issue that having the patch included in the mainstream
kernel offers the usual advantages, it is by no means the absolute requirement
you make it out to be.

It appears that the facilities are all there now; so 2.6 should be the
perfect time to test the various approaches in the field. (And face it, field
experience is rather limited still, but I am very sure it will grow soon
because it is such a useful feature.)

Then it can be included. This is how Linux has always worked. reiserfs has
gone through this, as has ext3, XFS, quite a few of the VM patches etc. So no
worries, nobody is being exceptionally harsh in any fashion.

But arguing about "I have so many fortune 100 companies just lined up ready to
say that they support this campaign!" is marketing speak. Go away with that
from Linux kernel, will you.

Come back when it is "I have so many fortune 100 companies actively using this
feature and have solved many problems with it!".


Sincerely,
Lars Marowsky-Brée <[email protected]>

--
Principal Squirrel
SuSE Labs - Research & Development, SuSE Linux AG

"If anything can go wrong, it will." "Chance favors the prepared (mind)."
-- Capt. Edward A. Murphy -- Louis Pasteur

2002-11-04 12:29:04

by naoise

Subject: Re: [lkcd-devel] Re: [lkcd-general] Re: What's left over.

>
>
> > What he really wants is for Andrew or Alan or someone else he trusts
> > to merge it, get actual field results, and declare it useful. If
> > people start visibly passing around crash dump results on l-k and
> > solving problems with them, that'll help too. Until then all he has is
> > his gut feel to go on.
>
> Are you sure? Isn't what Linus is saying that he understands that some
> problems can be solved using dumps, some from the oops message, some by
> source code inspection and some by other means? But he's not interested
> in a timely resolution; he has a preference for solving the problems by
> looking at the source and only that way. That's his preference: arguments
> relating to timeliness and commercial considerations are of no interest to
> him - simply because they argue for benefits in which he has no interest.
> Because LKCD doesn't personally interest him he has declared that he will
> not merge it; it's up to some trusted advocate.

Richard, IMHO what Linus is trying to say is that he wants proof that problems
are solved using crash dumps by developers outside of the corporations as well.
He wants LKCD to have an audience before he includes it. Linus strongly
believes that "oops scribbles" and source code reading are his way of debugging
the kernel, and he is sure that he doesn't need LKCD. But he leaves the rest of
the developers free to use LKCD in problem solving. He leaves it up to Alan
(for example) to include it in a tree as a trial. Once he is convinced that
enough people (users/developers, including those outside of businesses) use it,
I strongly believe he will merge LKCD into the mainstream tree.

Personally, I think crash dumps make debugging the kernel a lot easier. They
are also useful for teaching the inner workings of the Linux kernel to
students. Crash dumps are the equivalent of core dumps for processes: you can
see the state of the machine at a given time.

>
> So, for those of us who passionately care whether Linux has a system
> dumping mechanism, we need to regroup, we need to decide the correct
> strategy for gaining LKCD's inclusion into the kernel. Many of the
> arguments relate to timeliness and ultimately have a commercial benefit. I
> suggest we actively campaign among the various distros who are interested
> in selling Linux to businesses and providing support. We also need to
> concentrate on consolidating the various requirements of a system crash
> dump - it's going to be much easier for everyone if there is a consensus on
> system dumping technology.

TurboLinux advertised they used LKCD in their distro. Unfortunately the demo didn't contain LKCD.

>
>
> First crucial question - are there any avenues still open for 2.5?

I suggest the road to late 2.5 or early 2.6 is by turning left on Cox' road. But let's ask him. Alan?

> Richard J Moore
> RAS Project Lead - IBM Linux Technology Centre
>
>
>
>

2002-11-04 16:11:23

by John Alvord

Subject: Re: [lkcd-general] Re: What's left over.

On Mon, 4 Nov 2002 11:59:23 +0000, "Richard J Moore"
<[email protected]> wrote:

>
>
>> What he really wants is for Andrew or Alan or someone else he trusts
>> to merge it, get actual field results, and declare it useful. If
>> people start visibly passing around crash dump results on l-k and
>> solving problems with them, that'll help too. Until then all he has is
>> his gut feel to go on.
>
>Are you sure? Isn't what Linus is saying that he understands that some
>problems can be solved using dumps, some from the oops message, some by
>source code inspection and some by other means? But he's not interested
>in a timely resolution; he has a preference for solving the problems by
>looking at the source and only that way. That's his preference: arguments
>relating to timeliness and commercial considerations are of no interest to
>him - simply because they argue for benefits in which he has no interest.
>Because LKCD doesn't personally interest him he has declared that he will
>not merge it; it's up to some trusted advocate.

What you describe is certainly Linus' general philosophy.

But he also said that the feature was in "vendor push" mode, which
means if enough vendors adopt the feature he would consider it. Why do
you think reiserfs got into the mainline? Certainly not because he
uses it personally.

He also said he has seen no evidence of its usefulness... not one
report on L-K of kernel problems resolved.

Seems pretty clear to me...

john alvord

2002-11-04 16:17:12

by Linus Torvalds

Subject: Re: [lkcd-general] Re: What's left over.


On Mon, 4 Nov 2002, Richard J Moore wrote:
>
> Are you sure? Isn't what Linus is saying that he understands that some
> problems can be solved using dumps, some from the oops message, some by
> source code inspection and some by other means? But he's not interested
> in a timely resolution;

Ok, with tons of explanation:

- I'm clearly not interested. I've not seen any discussion of the usage
of the tools or how great it is, and that's apparently because all the
LKCD people are off in their own mailing lists and do not want to have
anything to do with the rest of the world. Except when they come out of
the blue one week before feature freeze and _demand_ that I accept
their patches that I've never seen before or heard anybody talk about.

Hint: think about this part. Deeply. And then go and bother SOMEBODY
ELSE.

- Since I'm not personally convinced, it's not going into my tree.

It's as simple as that. I take stuff that I feel is good. Often that
feeling of goodness comes from trusting the person who sends it to me,
simply by past performance. At other times, it is because I think the
feature is cool, or well done, or whatever.

Hint: if you want stuff in my tree, make me trust you. Or work on
things that I feel are innately interesting. Don't bother dragging me
into your flame-wars and trying to convince me that I "must" apply your
patches.

- If it doesn't go into my tree, is that bad?

NO! Open source is all about _other_ people being able to make their
changes. It by no means means that those changes have to be accepted
back: the license basically only boils down to that I must be _able_ to
accept them back. But the really important thing, the thing that really
makes a difference, is that you, your dog, and your company can make
your OWN changes.

- If it doesn't have to happen in my tree, then whose tree _does_ it have
to happen in?

Doesn't much matter, actually. You can keep it in your tree, for all I
care. OSDL has already picked it up and apparently maintains it in
their tree. The only thing that matters is whether it gets used or not,
and whether it proves itself.

More people use vendor trees than my tree. And if you don't find a
vendor who will apply your patches, there are several "personal
vendors" out there, with the -ac, -aa and -mm trees being the obvious
ones. Many of those trees are not just used, they are also
obviously backed by people I do trust, which brings us back to the
criteria for _me_ to apply patches.

- Considering the above, if you still want it to _eventually_ make it
into my tree, what should you do?

Do you think pestering me makes me like the patches any more and trust
you? And if it doesn't, then how do you expect it to help, considering
my patch acceptance criteria?

No. The way to get it into my tree is not to whine about it. There are
a few different ways to get it into my tree:

(a) prove me wrong. And btw, it doesn't help to do so in your LKCD
mailing list. You need to get those patches out there to
_other_ people, or convince your own people that living in
your little hole just means that nobody else knows or cares
about you.

(b) If you can't convince me, convince somebody else. Maybe that
somebody else is somebody I trust, and that somebody else
feels that I was wrong and since _he_ believes in the project
he will try to convince me about it.

And trust me, the people I trust don't revere me and think I'm
always right. These people call me "pinhead" and tell me when
I'm full of shit. If these people don't believe in your
project, don't blame me and think it's because I "poisoned
their minds".

(c) Push your vendor. I have absolutely _zero_ incentives to care
about whining users (I care deeply about the non-whining
kind), but vendors do. Sometimes they do things just to get
their users off their backs.

And once it's in a vendor tree, that doesn't guarantee I pick
it up, but it _does_ guarantee that the patch is at least
widely used and thus we get more easily to (a) - proving me
wrong outside your own little world.

- Never whine about a patch. I know whining works with a lot of people
("Oh, for chrissake, I'll just do it to get him off my back") but it
works remarkably badly with me. Trust me on this.

Was this clear enough? Any confusion on any particular issue?

In short: convince somebody else. So far, the only thing that the
discussion has convinced me of is that people somehow seem to think that
they are ENTITLED to being merged into my tree. Tough. It ain't so. That
tree is called "Linus' tree" for a reason. The only thing you are
ENTITLED to is to have your own tree.

Linus

2002-11-04 16:30:34

by Alan

Subject: Re: [lkcd-general] Re: What's left over.

Let me ask another question here:

Other than "register_reboot_notifier()" and adding a
"register_exception_notifier()" chain, what else does a dump tool need?
register_exception_notifier seems to solve about 90% of the insmod gdb
problem space as well?
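
For readers unfamiliar with the notifier-chain idea behind this question, here is a minimal user-space sketch in C. It is only a model under stated assumptions, not kernel code: every name in it (struct notifier, chain_register, chain_call, the two example hooks) is hypothetical, and register_exception_notifier() itself is just the proposal made above. The sketch shows how a dump tool and a debugger could both subscribe to the same exception event without the core code knowing about either.

```c
#include <stddef.h>
#include <string.h>

/* Return codes for a hook; the names are local to this sketch. */
enum { CHAIN_DONE = 0, CHAIN_STOP = 1 };

/* One subscriber on a chain (hypothetical user-space model). */
struct notifier {
    int (*call)(struct notifier *self, unsigned long event, void *data);
    int priority;               /* higher priority runs first */
    struct notifier *next;
};

struct notifier_chain {
    struct notifier *head;
};

/* Insert n so that the list stays sorted by descending priority. */
void chain_register(struct notifier_chain *c, struct notifier *n)
{
    struct notifier **pp = &c->head;

    while (*pp && (*pp)->priority >= n->priority)
        pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
}

/* Run the hooks in order; a hook returning CHAIN_STOP ends the walk.
 * Returns the number of hooks that were invoked. */
int chain_call(struct notifier_chain *c, unsigned long event, void *data)
{
    int calls = 0;

    for (struct notifier *n = c->head; n; n = n->next) {
        calls++;
        if (n->call(n, event, data) == CHAIN_STOP)
            break;
    }
    return calls;
}

/* Helper for the example hooks: append one letter to a trace string. */
static void trace_append(char *trace, char tag)
{
    size_t len = strlen(trace);

    trace[len] = tag;
    trace[len + 1] = '\0';
}

/* Hypothetical dump hook: records 'D' and lets the walk continue. */
int dump_on_exception(struct notifier *self, unsigned long event, void *data)
{
    (void)self;
    (void)event;
    trace_append(data, 'D');
    return CHAIN_DONE;
}

/* Hypothetical debugger hook: records 'G' and stops the walk. */
int debugger_on_exception(struct notifier *self, unsigned long event, void *data)
{
    (void)self;
    (void)event;
    trace_append(data, 'G');
    return CHAIN_STOP;
}
```

Registering the dump hook with a higher priority than the debugger hook makes it run first regardless of registration order, which is the property a dump tool would rely on if it had to capture state before anything else reacted to the exception.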




2002-11-05 08:58:00

by Suparna Bhattacharya

Subject: Re: [lkcd-general] Re: What's left over.

On Mon, Nov 04, 2002 at 04:40:11PM +0000, Alan Cox wrote:
> Let me ask another question here
>
> Other than "register_reboot_notifier()" and adding a
> "register_exception_notifier()" chain what else does a dump tool need.
> Register_exception_notifier seems to solve about 90% of the insmod gdb
> problem space as well ?
>
>

I had tried to list these in an earlier mail, added a few more
comments now marked by ">>"

1. Enabling IPI to collect CPU state on all processors in the
system right when dump is triggered (may not be a normal
situation, so NMIs where supported are the best option)

>> set/register_nmi_callback could also help in part (though
>> synchronization issues need to be thought through so that
>> the effect on regular system operation is as low as possible),
>> but we also need an interface to generate the NMI ipi when
>> required, and something that generalises on all architectures.

2. Ability to quiesce (silence) the system before dumping
(and if in non-disruptive mode, then restore it back)

>> smp_call_function may not be the ideal option for many situations
>> - in general we would like to have a separate "force" path
>> available for some troublesome situations, and it would be
>> nice to be able to tackle non-disruptive (but accurate) dumping
>> as well.

>> maybe 1 & 2 can be combined in some form
>> Dump should preferably not overlap with a regularly used IPI.

3. Calls into dump from kernel paths (panic, oops, sysrq
etc).

>> This is where your register_xxx_notifier(s) fit in

4. Exports of symbols to help with physical memory
traversal and verification

>> Covers what Andi Kleen referred to as
>> iterate_over_memmap_and_give_me_type()
>> (a way to figure out the type of memory - true ram or other)
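
Item 4 can be pictured with a small user-space sketch in C. It is an illustration under assumptions only: the region table, the type names and for_each_region_of_type() are all hypothetical stand-ins, not an existing kernel interface. It shows the shape of an "iterate over the memory map and tell me the type" helper that lets a dumper visit true RAM and skip everything else.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a physical address range. */
enum mem_type { MEM_RAM, MEM_RESERVED, MEM_IO };

struct mem_region {
    uint64_t start;             /* first byte of the range */
    uint64_t end;               /* one past the last byte */
    enum mem_type type;
};

typedef void (*region_fn)(const struct mem_region *r, void *ctx);

/* Walk the map and call fn for every region of the wanted type.
 * Returns the number of regions that matched. */
size_t for_each_region_of_type(const struct mem_region *map, size_t count,
                               enum mem_type want, region_fn fn, void *ctx)
{
    size_t matched = 0;

    for (size_t i = 0; i < count; i++) {
        if (map[i].type != want)
            continue;
        fn(&map[i], ctx);
        matched++;
    }
    return matched;
}

/* Example callback: add the size of the region to a running byte total. */
void add_bytes(const struct mem_region *r, void *ctx)
{
    uint64_t *total = ctx;

    *total += r->end - r->start;
}
```

A dumper built this way would pass its own "write this range" callback in place of add_bytes, so the decision about what counts as dumpable memory stays in one place.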

Regards
Suparna


--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Labs, India