2005-04-10 23:12:32

by Benjamin Herrenschmidt

Subject: New SCM and commit list

Hi Linus !

Do you intend to continue posting "committed" patches to a mailing list
like bk scripts did to bk-commits-head@vger ? As I said a while ago, I
find this very useful, especially with the actual patch included in the
commit message (which isn't the case with most other projects CVS commit
lists, and I find that annoying).

If yes, then I would appreciate if you could either keep the same list,
or if you want to change the list name, keep the subscriber list so
those of us who actually archive it don't miss anything ;)

Thanks !

Regards,
Ben.



2005-04-10 23:27:40

by Linus Torvalds

Subject: Re: New SCM and commit list



On Mon, 11 Apr 2005, Benjamin Herrenschmidt wrote:
>
> Do you intend to continue posting "committed" patches to a mailing list
> like bk scripts did to bk-commits-head@vger ? As I said a while ago, I
> find this very useful, especially with the actual patch included in the
> commit message (which isn't the case with most other projects CVS commit
> lists, and I find that annoying).

Absolutely. GIT isn't quite at the point where I can start using it yet,
though.

I could actually start committing patches, but I want to make sure that I
can also do automated simple merges, so that there is any _point_ to doing
this in the first place. My plan is to not be very good at merging (in
particular, I don't see GIT resolving renames _at_all_), but my hope is
that the people who used to merge with me using BK might be able to still
do so using GIT, as long as we try actively to be very careful.

> If yes, then I would appreciate if you could either keep the same list,
> or if you want to change the list name, keep the subscriber list so
> those of us who actually archive it don't miss anything ;)

I didn't even set up the list. I think it's Bottomley. I'm cc'ing him just
so that he sees the message, but I don't actually expect him to do
anything about it. I'm not even ready to start _testing_ real merges yet.
But I hope that I can get non-conflicting merges done fairly soon, and
maybe I can con James or Jeff or somebody to try out GIT then...

Linus

2005-04-11 03:26:00

by James Bottomley

Subject: Re: New SCM and commit list

On Sun, 2005-04-10 at 16:26 -0700, Linus Torvalds wrote:
> On Mon, 11 Apr 2005, Benjamin Herrenschmidt wrote:
> > If yes, then I would appreciate if you could either keep the same list,
> > or if you want to change the list name, keep the subscriber list so
> > those of us who actually archive it don't miss anything ;)
>
> I didn't even set up the list. I think it's Bottomley. I'm cc'ing him just
> so that he sees the message, but I don't actually expect him to do
> anything about it. I'm not even ready to start _testing_ real merges yet.
> But I hope that I can get non-conflicting merges done fairly soon, and
> maybe I can con James or Jeff or somebody to try out GIT then...

Not guilty. If I remember correctly, the list was set up by the vger
list maintainers (davem and company). It was tied to a trigger in one
of your trees (which I think Larry did). It shouldn't be too difficult
to add to git ... it just means traversing all the added patches on a
merge and sending out mail.

I can try out your source control tools ... I have some rc fixes
ready ... when you're ready to try out merges...

James


2005-04-11 05:54:05

by Jeff Garzik

Subject: Re: New SCM and commit list

Linus Torvalds wrote:
> On Mon, 11 Apr 2005, Benjamin Herrenschmidt wrote:
>>If yes, then I would appreciate if you could either keep the same list,
>>or if you want to change the list name, keep the subscriber list so
>>those of us who actually archive it don't miss anything ;)
>
>
> I didn't even set up the list. I think it's Bottomley. I'm cc'ing him just
> so that he sees the message, but I don't actually expect him to do
> anything about it. I'm not even ready to start _testing_ real merges yet.

When you think kernel.org and BitKeeper, think either me or David
Woodhouse. :)

DaveM / Matti(?) manage the lists ([email protected]), but
largely just create them on request from others, and make sure they
continue to work.


> But I hope that I can get non-conflicting merges done fairly soon, and
> maybe I can con James or Jeff or somebody to try out GIT then...

I don't mind being a guinea pig as long as someone else does the hard
work of finding a new way to merge :)

Jeff


2005-04-11 06:13:36

by Linus Torvalds

Subject: Re: New SCM and commit list



On Mon, 11 Apr 2005, Jeff Garzik wrote:
>
> > But I hope that I can get non-conflicting merges done fairly soon, and
> > maybe I can con James or Jeff or somebody to try out GIT then...
>
> I don't mind being a guinea pig as long as someone else does the hard
> work of finding a new way to merge :)

So I can tell you what merges are going to be like, just to prepare you.

First, the good news: I think we can make the workflow look like bk, ie
pretty much like "git pull" and "git push". And for well-behaved stuff
(ie minimal changes to the same files on both sides) it will even be fast.
I think.

Then the bad news: the merge algorithm is going to suck. It's going to be
just plain 3-way merge, the same RCS/CVS thing you've seen before. With no
understanding of renames etc. I'll try to find the best parent to base the
merge off of, although early testers may have to tell the piece of crud
what the most recent common parent was.

So anything that got modified in just one tree obviously merges to that
version. Any file that got modified in two trees will end up just being
passed to the "merge" program. See "man merge" and "man diff3". The merger
gets to fix up any conflicts by hand.

Quite frankly, that means that we really want to avoid any "exciting"
merges with GIT. Maybe somebody can come up with something smarter.
Eventually. Don't count on it, at least not in the near future.

The good news is that it's not like a three-way file merge is any worse
than many people are used to. The bad news is that BK is just a hell of a
lot better. So anybody who has been depending heavily on BK merges (and
hey, the beauty of them is that you often don't even _know_ that you are
depending on them) will be a bit bummed by the "Welcome back to the
1980's" message from a three-way merge.

Linus
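
A minimal sketch of the per-file rule described above, not GIT's actual code. It assumes GNU diffutils (`cmp`, `diff3`) is installed and uses `diff3 -m` in place of the RCS "merge" program; the function name `merge_file` and its argument order are made up for illustration.

```sh
#!/bin/sh
# merge_file BASE OURS THEIRS: print the merged file on stdout.
# Exit status: 0 for a clean result, 1 if conflicts are left for the
# merger to fix by hand (that status comes straight from diff3).
merge_file() {
    base=$1 ours=$2 theirs=$3
    if cmp -s "$base" "$theirs"; then
        cat "$ours"      # only our tree (or neither tree) changed the file
    elif cmp -s "$base" "$ours"; then
        cat "$theirs"    # only their tree changed the file
    elif cmp -s "$ours" "$theirs"; then
        cat "$ours"      # both trees made exactly the same change
    else
        # Modified differently in both trees: line-based 3-way merge.
        diff3 -m "$ours" "$base" "$theirs"
    fi
}
```

Something like `merge_file base.c ours.c theirs.c > merged.c` would then leave conflict markers in merged.c only when both sides touched the same lines.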

2005-04-11 06:41:26

by Ryan Anderson

Subject: Re: New SCM and commit list

On Sun, Apr 10, 2005 at 11:15:20PM -0700, Linus Torvalds wrote:
> On Mon, 11 Apr 2005, Jeff Garzik wrote:
> > > But I hope that I can get non-conflicting merges done fairly soon, and
> > > maybe I can con James or Jeff or somebody to try out GIT then...
> >
> > I don't mind being a guinea pig as long as someone else does the hard
> > work of finding a new way to merge :)
>
> So I can tell you what merges are going to be like, just to prepare you.
>
> First, the good news: I think we can make the workflow look like bk, ie
> pretty much like "git pull" and "git push". And for well-behaved stuff
> (ie minimal changes to the same files on both sides) it will even be fast.
> I think.

If you can stick something meaningful in a simple text file, overwritten
after each merge completes, similar to the BitKeeper/csets-in file, it
should be trivial to write a wrapper for the basic merge tool that calls
a trigger after each merge and uses csets-in to generate diffs and email
them out.

--

Ryan Anderson
sometimes Pug Majere

2005-04-11 06:48:17

by Geert Uytterhoeven

Subject: Re: New SCM and commit list

On Sun, 10 Apr 2005, Linus Torvalds wrote:
> Then the bad news: the merge algorithm is going to suck. It's going to be
> just plain 3-way merge, the same RCS/CVS thing you've seen before. With no

Actually 3-way merge is not that bad. It's definitely better than ClearCase's
merge (I always fall back to RCS merge if ClearCase cannot resolve a merge
automatically).

> understanding of renames etc. I'll try to find the best parent to base the
> merge off of, although early testers may have to tell the piece of crud
> what the most recent common parent was.

Yep, finding the best parent is the important part :-)

I guess 3-way merge got a bad name because CVS always uses the branch point as
the parent, which fails miserably for any but the first merge after the branch.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
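
The point about the choice of parent can be tried out with a small sketch. The file contents below are invented for illustration, and GNU diff3 is assumed. The branch and the trunk are merged a second time, once with the state of the first merge as the base and once with the original branch point.

```sh
#!/bin/sh
# Compare a second merge done against two different bases (made-up data).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# State at the original branch point.
printf 'x = 1\nkeep a\nkeep b\ny = 1\n' > "$tmp/branch_point"
# The branch sets x = 2; the first merge brings that into the trunk.
printf 'x = 2\nkeep a\nkeep b\ny = 1\n' > "$tmp/first_merge"
# Afterwards the trunk changes x again and the branch changes y.
printf 'x = 3\nkeep a\nkeep b\ny = 1\n' > "$tmp/trunk"
printf 'x = 2\nkeep a\nkeep b\ny = 2\n' > "$tmp/branch"

# try_merge BASE: merge the branch into the trunk with BASE as the parent.
try_merge() {
    if diff3 -m "$tmp/trunk" "$1" "$tmp/branch" > "$tmp/out"; then
        echo "clean"
    else
        echo "conflict"
    fi
}

echo "base = first merge:  $(try_merge "$tmp/first_merge")"
echo "base = branch point: $(try_merge "$tmp/branch_point")"
```

With the first merge as the base, only the new changes on each side are compared, so the merge should come out clean. With the branch point as the base, the already-merged change to x shows up again on both sides and collides with the later trunk edit.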

2005-04-11 07:13:21

by David Woodhouse

Subject: Re: New SCM and commit list

On Mon, 2005-04-11 at 09:10 +1000, Benjamin Herrenschmidt wrote:
> Do you intend to continue posting "committed" patches to a mailing list
> like bk scripts did to bk-commits-head@vger ? As I said a while ago, I
> find this very useful, especially with the actual patch included in the
> commit message (which isn't the case with most other projects CVS commit
> lists, and I find that annoying).
>
> If yes, then I would appreciate if you could either keep the same list,
> or if you want to change the list name, keep the subscriber list so
> those of us who actually archive it don't miss anything ;)

The commits lists currently only accept posts from dwmw2@hera, I
believe. That can relatively easily be changed if the mail is going to
come from somewhere else.

I did ask Linus to let me know as soon as possible when he starts to
commit patches, so we can come up with a way to keep the list fed. Since
he thinks I'm James, however, I suspect that part of the message didn't
get through. Perhaps he was just distracted by the Britishness?

--
dwmw2


2005-04-11 07:39:15

by Ingo Molnar

Subject: Re: New SCM and commit list


* Linus Torvalds <[email protected]> wrote:

> Then the bad news: the merge algorithm is going to suck. It's going to
> be just plain 3-way merge, the same RCS/CVS thing you've seen before.
> With no understanding of renames etc. I'll try to find the best parent
> to base the merge off of, although early testers may have to tell the
> piece of crud what the most recent common parent was.
>
> So anything that got modified in just one tree obviously merges to
> that version. Any file that got modified in two trees will end up just
> being passed to the "merge" program. See "man merge" and "man diff3".
> The merger gets to fix up any conflicts by hand.

at that point Chris Mason's "rej" tool is pretty nifty:

ftp://ftp.suse.com/pub/people/mason/rej/rej-0.13.tar.gz

it gets the trivial rejects right, and is pretty powerful to quickly
cycle through the nontrivial ones too. It shows the old and new code
side by side too, etc.

(There is no fully automatic mode in which it would not bother the user
with the really trivial rejects - but it has an automatic mode where you
basically have to do nothing - maybe a fully automatic one could be
added that would resolve low-risk rejects?)

it's really easy to use (but then again i'm a vim user, so i'm biased),
just try it on a random .rej file you have ("rej -a kernel/sched.c.rej"
or whatever).

Ingo

2005-04-11 12:59:47

by Chris Mason

Subject: Re: New SCM and commit list

On Monday 11 April 2005 03:38, Ingo Molnar wrote:
> * Linus Torvalds <[email protected]> wrote:
> > So anything that got modified in just one tree obviously merges to
> > that version. Any file that got modified in two trees will end up just
> > being passed to the "merge" program. See "man merge" and "man diff3".
> > The merger gets to fix up any conflicts by hand.
>
> at that point Chris Mason's "rej" tool is pretty nifty:
>
> ftp://ftp.suse.com/pub/people/mason/rej/rej-0.13.tar.gz
>
> (There is no fully automatic mode in which it would not bother the user
> with the really trivial rejects - but it has an automatic mode where you
> basically have to do nothing - maybe a fully automatic one could be
> added that would resolve low-risk rejects?)
>

rej -M skips the merge program, so rej -a -M will give you something like
this:

coffee:/local/linux.p # rej -a -M drivers/ide/ide.c.rej
drivers/ide/ide.c: 1 matched, 0 conflicts remain

But I would want to go over the bit that calculates the conflicts remaining
more carefully if people plan on trusting this ;) It'll run on unified diffs
too, although it will be slower than patch since the assumption is that the
quick and easy placement patch does has already failed. (That's easy enough
to fix, though.)

> it's really easy to use (but then again i'm a vim user, so i'm biased),
> just try it on a random .rej file you have ("rej -a kernel/sched.c.rej"
> or whatever).

you can rej -m kdiff3|meld|tkdiff or any program that does a side by side
comparison of two files. (export REJMERGE=foo sets the diff prog as well)

I use rej frequently to merge patches in here, but that is mostly because
there is no easy way to get the common ancestor and parent revision of the
patches I'm merging.

With that info in hand, kdiff3 is pretty nice. You would have to spoon feed
it the renames, but it should have most of the other features you're looking
for, including the 'no gui if all conflicts are auto-solvable'

-chris

2005-04-11 18:32:49

by Adam J. Richter

Subject: Re: New SCM and commit list

On 2005-04-11 Linus Torvalds wrote:
>Then the bad news: the merge algorithm is going to suck. It's going to be
>just plain 3-way merge, the same RCS/CVS thing you've seen before. With no
>understanding of renames etc. I'll try to find the best parent to base the
>merge off of, although early testers may have to tell the piece of crud
>what the most recent common parent was.

I've been surprised at how well it works to put each character on a
separate line, pipe the input into diff3 and then join the lines
back together. For example, let's consider the case of
adding parameters to a function. Here one version adds a parameter
before the existing parameter, and another version adds another parameter
after the existing parameter:

$ cat orig
call(bar);
$ cat ver1
call(foo,bar);
$ cat ver2
call(bar,baz);
$ charmerge ver1 orig ver2
call(foo,bar,baz);

A more practically scaled application that I tried was with
another filter that I wrote that would automatically resolve certain
types of diff3 conflicts[1]. With that filter, I took the SCSI
FlashPoint driver, and made an edited version by piping it through GNU
indent, which not only reindents, but also splits and joins lines.
I made a second edited version by changing all 146 instances of
"SYNC" to "GROP" in the original. It merged apparently successfully,
giving me a GNU indented version with all of the keyword changes.
The current version of this resolution program dies if it hits a diff3
conflict of a type that it is not prepared to resolve. I'll post
it once I've got it properly preserving the conflicts that it
doesn't try to fix. In the meantime, here is an illustrative
script to get diff3 to do character-based merges, although it
gives garbage results if there are any conflicts.

[1] The type of conflict that was automatically resolved is as follows:

variant1 = <prepended-new-text><original><appended-new-text>

result --> <prepended-new-text><variant2><appended-new-text>

...this is actually exactly the order one would want in the
case where <original> also occurs in variant2, but it was close
enough for this test.

__ ______________
Adam J. Richter \ /
[email protected] | g g d r a s i l



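#!/bin/sh
# Usage: charmerge ver1_file orig_file ver2_file
# Character-level 3-way merge: put every character on its own line,
# run diff3 on the result, then join the lines back together.

# Emit each character on its own line. The extra empty line that ends
# each input line stands for the original newline.
lineify() {
    sed 's/./&\
/g'
}

# Join the characters again; an empty line becomes a real newline.
unlineify() {
    awk '/^$/ { print } /^./ { printf "%s", $0 }'
}

tmpdir=/tmp/charmerge.$$

mkdir "$tmpdir" || exit 1
lineify < "$1" > "$tmpdir/1"
lineify < "$2" > "$tmpdir/2"
lineify < "$3" > "$tmpdir/3"
# The three files are spelled out because {1,2,3} is not POSIX sh.
diff3 -m "$tmpdir/1" "$tmpdir/2" "$tmpdir/3" | unlineify
rm -rf "$tmpdir"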

2005-04-11 19:38:39

by Chris Mason

Subject: Re: New SCM and commit list

On Monday 11 April 2005 08:51, Chris Mason wrote:

> rej -M skips the merge program, so rej -a -M will give you something like
> this:
>
> coffee:/local/linux.p # rej -a -M drivers/ide/ide.c.rej
> drivers/ide/ide.c: 1 matched, 0 conflicts remain
>
> But I would want to go over the bit that calculates the conflicts remaining
> more carefully if people plan on trusting this ;)

Ok, looks like this should be safe. I changed -q to skip the gui compare
when rej thinks it has resolved all the conflicts correctly. With rej 0.14
(just uploaded now) this should do what you want:

rej -q -a foo.rej

Download site is here: ftp://ftp.suse.com/pub/people/mason/rej/

Please let me know if you find patches where rej is doing the wrong thing.

-chris

2005-04-11 21:02:17

by Greg KH

Subject: Re: New SCM and commit list

On Sun, Apr 10, 2005 at 10:25:22PM -0500, James Bottomley wrote:
> On Sun, 2005-04-10 at 16:26 -0700, Linus Torvalds wrote:
> > On Mon, 11 Apr 2005, Benjamin Herrenschmidt wrote:
> > > If yes, then I would appreciate if you could either keep the same list,
> > > or if you want to change the list name, keep the subscriber list so
> > > those of us who actually archive it don't miss anything ;)
> >
> > I didn't even set up the list. I think it's Bottomley. I'm cc'ing him just
> > so that he sees the message, but I don't actually expect him to do
> > anything about it. I'm not even ready to start _testing_ real merges yet.
> > But I hope that I can get non-conflicting merges done fairly soon, and
> > maybe I can con James or Jeff or somebody to try out GIT then...
>
> Not guilty. If I remember correctly, the list was set up by the vger
> list maintainers (davem and company). It was tied to a trigger in one
> of your trees (which I think Larry did). It shouldn't be too difficult
> to add to git ... it just means traversing all the added patches on a
> merge and sending out mail.
>
> I can try out your source control tools ... I have some rc fixes
> ready ... when you're ready to try out merges...

I have some rc fixes too, let us know when you are ready to accept them,
and what format you want them in.

I have a feeling that the kernel.org mirror system is just going to
_love_ us using it to store temporary git trees :)

thanks,

greg k-h

2005-04-11 21:24:15

by Linus Torvalds

Subject: Re: New SCM and commit list



On Mon, 11 Apr 2005, Greg KH wrote:
>
> I have a feeling that the kernel.org mirror system is just going to
> _love_ us using it to store temporary git trees :)

I don't think kernel.org mirrors the private home directories, so if you
do _temporary_ trees, just make them readable in your private home
directory rather than in /pub/linux/kernel/people. For people with
kernel.org accounts, we can use that as the "bkbits.net" thing.

For really public hosting, we need to find some other approach.

Linus

2005-04-11 21:32:03

by James Bottomley

Subject: Re: New SCM and commit list

On Mon, 2005-04-11 at 14:26 -0700, Linus Torvalds wrote:
> I don't think kernel.org mirrors the private home directories, so if you
> do _temporary_ trees, just make them readable in your private home
> directory rather than in /pub/linux/kernel/people. For people with
> kernel.org accounts, we can use that as the "bkbits.net" thing.

It's also going to be a slight problem for those of us who don't have a
kernel.org account...although I think the hosting I use on the parisc
website might actually be outside the HP firewall, so I can probably beg
for it to run any protocol you need (like rsync).

James


2005-04-11 22:50:14

by Daniel Barkalow

Subject: Re: New SCM and commit list

On Sun, 10 Apr 2005, Linus Torvalds wrote:

> On Mon, 11 Apr 2005, Jeff Garzik wrote:
> >
> > > But I hope that I can get non-conflicting merges done fairly soon, and
> > > maybe I can con James or Jeff or somebody to try out GIT then...
> >
> > I don't mind being a guinea pig as long as someone else does the hard
> > work of finding a new way to merge :)
>
> So I can tell you what merges are going to be like, just to prepare you.
>
> First, the good news: I think we can make the workflow look like bk, ie
> pretty much like "git pull" and "git push". And for well-behaved stuff
> (ie minimal changes to the same files on both sides) it will even be fast.
> I think.
>
> Then the bad news: the merge algorithm is going to suck. It's going to be
> just plain 3-way merge, the same RCS/CVS thing you've seen before. With no
> understanding of renames etc. I'll try to find the best parent to base the
> merge off of, although early testers may have to tell the piece of crud
> what the most recent common parent was.
>
> So anything that got modified in just one tree obviously merges to that
> version. Any file that got modified in two trees will end up just being
> passed to the "merge" program. See "man merge" and "man diff3". The merger
> gets to fix up any conflicts by hand.

If merge took trees instead of single files, and had some way of detecting
renames (or it got additional information about the differences between
files), would that give BK-quality performance? Or does BK also support
cases like:

orig ---> first ---> first-merge -
 |                /               \
 |------> second -                 -> final
 |                \               /
 |------> third ---> third-merge -

where the final merge requires, for complete cleanliness, a comparison of
more than 3 states (since some changes will have orig as the common
ancestor and some will have second).

Does this happen in real life? It seems like sane development processes
wouldn't have multiple mainline-candidate patch sets including the same
patches, if for no other reason than that, should the merge fail, nobody
with any clue about the original patches would be anywhere nearby. It
seems better to throw something back to someone to rebase their diffs.

Otherwise, the problem seems to boil down to finding the common ancestor
well, getting trees instead of files to merge, and improving merge until
it handles all of the tractable cases.

-Daniel
*This .sig left intentionally blank*
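
A rough sketch of the "finding the common ancestor" step on the diagram above. The history format is purely hypothetical (one "child parent" pair per line) and is not how any real tool stores it. The function walks back from the second commit breadth-first and prints the first commit that is also reachable from the first one. In a graph with several equally near candidates it simply reports the first one it meets.

```sh
#!/bin/sh
# common_ancestor EDGES A B: print the commit nearest to B that is also
# A itself or an ancestor of A. EDGES holds "child parent" pairs.
common_ancestor() {
    awk -v a="$2" -v b="$3" '
        { parents[$1] = parents[$1] " " $2 }
        END {
            # Mark A and everything reachable from it.
            n = 0; queue[n++] = a; seen[a] = 1
            for (i = 0; i < n; i++) {
                m = split(parents[queue[i]], p, " ")
                for (j = 1; j <= m; j++)
                    if (!(p[j] in seen)) { seen[p[j]] = 1; queue[n++] = p[j] }
            }
            # Walk back from B breadth-first; the first marked commit wins.
            n = 0; walk[n++] = b; visited[b] = 1
            for (i = 0; i < n; i++) {
                if (walk[i] in seen) { print walk[i]; exit }
                m = split(parents[walk[i]], p, " ")
                for (j = 1; j <= m; j++)
                    if (!(p[j] in visited)) { visited[p[j]] = 1; walk[n++] = p[j] }
            }
            exit 1
        }' "$1"
}

# The diagram above, written out as child/parent pairs.
edges=$(mktemp)
cat > "$edges" <<'EOF'
first orig
second orig
third orig
first-merge first
first-merge second
third-merge third
third-merge second
final first-merge
final third-merge
EOF

common_ancestor "$edges" first-merge third-merge
```

For the final merge in the diagram this picks "second", while changes that came only through "first" and "third" still have "orig" as their real ancestor, which is the mismatch the question is about.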

2005-04-12 04:59:59

by Adam J. Richter

Subject: Re: New SCM and commit list

On 2005-04-11, Daniel Barkalow wrote:
>If merge took trees instead of single files, and had some way of detecting
>renames (or it got additional information about the differences between
>files), would that give BK-quality performance? Or does BK also support
>cases like:
>
>orig ---> first ---> first-merge -
> |                /               \
> |------> second -                 -> final
> |                \               /
> |------> third ---> third-merge -
>
>where the final merge requires, for complete cleanliness, a comparison of
>more than 3 states (since some changes will have orig as the common
>ancestor and some will have second).

With 3-way merge and the ability to regenerate the relevant
files from each step, this should be easy to handle as long
as you have a list of which patches are considered to have been
duplicated. Let's detail your example:

orig ---> first 1a 1b 1c ---> first-merge - 1d 1e
 |                         /                      \
 |------> second 2a 2b 2c -                        -> final
 |                         \                      /
 |------> third 3a 3b 3c ---> third-merge - 3d 3e

Here, 1a, 1b, etc. refer to specific states of the source tree.
I will refer to differences by a notation like "1a->1b", which
is the difference to go from snapshot 1a to 1b. All that the
merge algorithm for the final merge needs to know is that the
ends of the branches (that is, 1e and 3e) both contain the
following diffs:

orig->2a
2a->2b
2b->2c

The function merge(orig, ver1, ver2) can try to reverse
the duplicate merges in one of the branches:

1e' = merge( 1e, 2c->2b);
1e'' = merge(1e', 2b->2a);
1e''' = merge(1e'', 2a->orig);
return merge(1e''', 2c->3e)

Of course, conflicts can happen, but that can happen
in any merge. There are also other ways to calculate the
merge and because there are different ways one can write a
merge function, it is possible that merging in a different
order might produce slightly different results. For example,
it would be possible to reverse the duplicates in your "third merge"
branch instead of your "first merge" branch, or one could
reconstruct a branch without the duplicated merges by executing
the other changes forward from a common ancestor, like so:

1e''' = merge(orig, 3d->3e);

...regardless, the point is that all the information
that is absolutely needed is a list of instances of diffs
to be skipped. It is not even necessary that the changes
have such a clearly explainable ancestry as the one you have
described. All the merge program needs to know are the changes
to be skipped, although information like which changes the skipped
patches are duplicating may be useful for things like trying
to reverse a patch in your "third-merge" branch in your
example if reversing the patch in "first-merge" fails.

I believe that at least bitkeeper, darcs, a free python-based
system that I can't remember at the moment, and possibly arch do this
sort of machination already.


>Does this happen in real life? [...]

Yes. Both individual users and Linux distributions incorporate
patches that they think are useful to them and then further patches
that they develop. The time costs of rejecting such patches would
likely be paid for by other integration or development work not being
done.

>It seems like sane development processes
^^^^
>wouldn't have multiple mainline-candidate patch sets including the same
>patches, if for no other reason than that, should the merge fail, nobody
>with any clue about the original patches would be anywhere nearby.

If you could avoid prejudicial subjective adjectives, it
would make it easier for the saneness or insaneness of an
approach to be apparent just by discussing your more objective criteria,
like the remainder of your sentence, which is where the focus should
be.

(1) Does allowing duplicate patches really mean that
"nobody with any clue about the original patches would be
anywhere nearby?" What attracts these clueful people
just by third parties having to rebase their patches?

(2) Does this supposed benefit outweigh the cost of rejecting
many patches unnecessarily? I know from my own experience
that I have either given up on or had to put into a very low
priority mode at least 66% of the patches that I haven't
gotten integrated, but which I am confident the kernel
would be better having (e.g.: devfs shrink, lookup()
trapping, ipv4 as a loadable (but not yet removable) module,
sysfs memory shrink, factoring much of the DMA mapping to
the common bus code from individual drivers, fewer kmap's
in crypto, I could go on).

>It
>seems better to throw something back to someone to rebase their diffs.
^^^^^^

I try to avoid general subjective adjectives like "better"
unless I am claiming that I've covered the trade-offs fully, and, even
then, avoiding it keeps the focus on analyzing the trade-offs.

__ ______________
Adam J. Richter \ /
[email protected] | g g d r a s i l
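
One way to read a step like merge(1e, 2c->2b) above is "apply the difference 2c->2b to 1e", which diff3 can do when given 2c as the old file. A minimal single-file sketch, assuming GNU diff3; the helper name `apply_diff` and the file contents are invented, and only one reversal step is shown rather than the whole sequence.

```sh
#!/bin/sh
# apply_diff TREE FROM TO: apply the change FROM->TO to TREE.
# Passing the newer state as FROM reverses a change TREE already contains.
apply_diff() {
    diff3 -m "$1" "$2" "$3"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# orig, then 2c (orig plus the shared patch to the first line),
# then 1e (the shared patch plus a local change to the last line).
printf 'alpha\nbeta\ngamma\ndelta\nepsilon\nzeta\n'   > "$tmp/orig"
printf 'ALPHA-2\nbeta\ngamma\ndelta\nepsilon\nzeta\n' > "$tmp/2c"
printf 'ALPHA-2\nbeta\ngamma\ndelta\nepsilon\nZETA-1\n' > "$tmp/1e"

# Reverse the shared patch in 1e by applying 2c->orig to it.
apply_diff "$tmp/1e" "$tmp/2c" "$tmp/orig"
```

The result should keep the local ZETA-1 change while the first line goes back to "alpha". If the local change had touched the same lines as the shared patch, diff3 would report a conflict instead.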

2005-04-12 05:29:59

by Arjan van de Ven

Subject: Re: New SCM and commit list

On Mon, 2005-04-11 at 16:31 -0500, James Bottomley wrote:
> On Mon, 2005-04-11 at 14:26 -0700, Linus Torvalds wrote:
> > I don't think kernel.org mirrors the private home directories, so if you
> > do _temporary_ trees, just make them readable in your private home
> > directory rather than in /pub/linux/kernel/people. For people with
> > kernel.org accounts, we can use that as the "bkbits.net" thing.
>
> It's also going to be a slight problem for those of us who don't have a
> kernel.org account...although I think the hosting I use on the parisc
> website might actually be outside the HP firewall, so I can probably beg
> for it to run any protocol you need (like rsync).

rsync also runs over ssh so if you can ssh in you can rsync to it

2005-04-12 08:37:26

by Geert Uytterhoeven

Subject: Re: New SCM and commit list

On Mon, 11 Apr 2005, Daniel Barkalow wrote:
> If merge took trees instead of single files, and had some way of detecting
> renames (or it got additional information about the differences between
> files), would that give BK-quality performance? Or does BK also support

I wrote a script to do merges on a tree (so far without rename detection,
though ;-) a long time ago, and still use it every time Linus or Marcelo
release a new version.

Look at `mergetree' on http://linux-m68k-cvs.ubb.ca/~geert/

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2005-04-12 09:53:04

by Catalin Marinas

Subject: Re: New SCM and commit list

Linus Torvalds <[email protected]> wrote:
> So anything that got modified in just one tree obviously merges to that
> version. Any file that got modified in two trees will end up just being
> passed to the "merge" program. See "man merge" and "man diff3". The merger
> gets to fix up any conflicts by hand.

"merge" does a better job than "diff3" since it can resolve the
conflicts caused by similar changes to a "parent" file (this is
available in both BK and GNU Arch). This is useful when you try to
merge 2 branches that both include a patch which is not under the
revision control. It also solves the conflicts caused by
cherry-picking changes (just need to find the last consecutive common
changeset as the common ancestor).

--
Catalin

2005-04-12 22:02:00

by Daniel Barkalow

Subject: Re: New SCM and commit list

On Tue, 12 Apr 2005, Adam J. Richter wrote:

> On 2005-04-11, Daniel Barkalow wrote:
> >If merge took trees instead of single files, and had some way of detecting
> >renames (or it got additional information about the differences between
> >files), would that give BK-quality performance? Or does BK also support
> >cases like:
> >
> >orig ---> first ---> first-merge -
> > |                /               \
> > |------> second -                 -> final
> > |                \               /
> > |------> third ---> third-merge -
> >
> >where the final merge requires, for complete cleanliness, a comparison of
> >more than 3 states (since some changes will have orig as the common
> >ancestor and some will have second).
>
> With 3-way merge and the ability to regenerate the relevant
> files from each step, this should be easy to handle as long
> as you have a list of which patches are considered to have been
> duplicated. Let's detail your example:
>
> orig ---> first 1a 1b 1c ---> first-merge - 1d 1e
>  |                         /                      \
>  |------> second 2a 2b 2c -                        -> final
>  |                         \                      /
>  |------> third 3a 3b 3c ---> third-merge - 3d 3e
>
> Here, 1a, 1b, etc. refer to specific states of the source tree.
> I will refer to differences by a notation like "1a->1b", which
> is the difference to go from snapshot 1a to 1b. All that the
> merge algorithm for the final merge needs to know is that the
> ends of the branches (that is, 1e and 3e) both contain the
> following diffs:
>
> orig->2a
> 2a->2b
> 2b->2c
>
> The function merge(orig, ver1, ver2) can try to reverse
> the duplicate merges in one of the branches:
>
> 1e' = merge( 1e, 2c->2b);
> 1e'' = merge(1e', 2b->2a);
> 1e''' = merge(1e'', 2a->orig);
> return merge(1e''', 2c->3e)

If 1d->1e depends on something in the 2 series, which is why I would
expect 1e to be pushing something containing the 2 series, there must be
conflicts. Likewise on the 3 series.

> Of course, conflicts can happen, but that can happen
> in any merge. There are also other ways to calculate the
> merge and because there are different ways one can write a
> merge function, it is possible that merging in a different
> order might produce slightly different results. For example,
> it would be possible to reverse the duplicates in your "third merge"
> branch instead of your "first merge" branch, or one could
> reconstruct a branch without the duplicated merges by executing
> the other changes forward from a common ancestor, like so:
>
> 1e''' = merge(orig, 3d->3e);
>
> ...regardless, the point is that all the information
> that is absolutely needed is a list of instances of diffs
> to be skipped. It is not even necessary that the changes
> have as clearly explainable an ancestry as the one you have
> described. All the merge program needs to know are the changes
> to be skipped, although information like which changes the skipped
> patches are duplicating may be useful for things like trying
> to reverse a patch in your "third-merge" branch in your
> example if reversing the patch in "first-merge" fails.

Right, an extended primitive solves the problem, certainly, and much more
effectively than sticking with 3-way merge.

> I believe that at least bitkeeper, darcs, a free python-based
> system that I can't remember at the moment, and possibly arch do this
> sort of machination already.
>
>
> >Does this happen in real life? [...]
>
> Yes. Both individual users and Linux distributions incorporate
> patches that they think are useful to them and then further patches
> that they develop. The time costs of rejecting such patches would
> likely be paid for by other integration or development work not being
> done.

It seems to me that users who use extra patches keep these separate from
their own patches (which they often keep in multiple series):

orig ---> other-people ---> local use, distribution
|                       /
|------> mine ----------
|                       \
|------> etc ---------> mainline

If mainline is going to get the third-party patches in a distro tree, it
should get them from the original authors, not as part of a miscellaneous
patch set from the distro. If one patch series depends on another patch
series, it should hold off until the other one goes in, not include it in
the submission.

> >It seems like sane development processes
> ^^^^
> >wouldn't have multiple mainline-candidate patch sets including the same
> >patches, if for no other reason than that, should the merge fail, nobody
> >with any clue about the original patches would be anywhere nearby.
>
> If you could avoid prejudicial subjective adjectives, it
> would make it easier for the saneness or insaneness of an
> approach to be apparent just by discussing your more objective criteria,
> like the remainder of your sentence, which is where the focus should
> be.
>
> (1) Does allowing duplicate patches really mean that
> "nobody with any clue about the original patches would be
> anywhere nearby?" What attracts these clueful people
> just by third parties having to rebase their patches?

The clueful people are the original authors (first, second, and
third); 1d->1e and 3d->3e would be rebased by their authors against a new
orig that's the merge of 1c, 2c, and 3c (which all have a good common
ancestor).

Actually, the best 3-way merge path might be:

merge(merge(merge(3d,orig->1c),3d->3e),1d->1e)

That is, generate a complete merge at the point where people each merged
in the second line, and then continue forward from there.
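
As a rough sketch of that path (an illustration only, not anything GIT
does): the Python below models a tree as a dict of whole-file contents
and reads merge(state, X->Y) as a 3-way merge with X as the base. The
names merge3, apply_diff and the file names are all made up for the
example, and a real merge would work on hunks rather than whole files.

```python
from typing import Dict

# Hypothetical model: a tree maps a path to the whole contents of that file.
Tree = Dict[str, str]


class MergeConflict(Exception):
    """Both sides changed the same file in different ways."""


def merge3(base: Tree, ours: Tree, theirs: Tree) -> Tree:
    """File-level 3-way merge: keep whichever side differs from base."""
    result: Tree = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            pick = o          # same result on both sides, or only ours changed
        elif o == b:
            pick = t          # only theirs changed
        else:
            raise MergeConflict(path)
        if pick is not None:  # None means the file does not exist
            result[path] = pick
    return result


def apply_diff(state: Tree, frm: Tree, to: Tree) -> Tree:
    """merge(state, frm->to): replay the change frm->to on top of state."""
    return merge3(base=frm, ours=state, theirs=to)


# Snapshots from the diagram; each series touches its own file here.
orig = {"first.c": "f0", "second.c": "s0", "third.c": "t0"}
s1c = {**orig, "first.c": "f1"}
s2c = {**orig, "second.c": "s1"}
s3c = {**orig, "third.c": "t1"}
s1d = merge3(orig, s1c, s2c)          # first-merge
s3d = merge3(orig, s3c, s2c)          # third-merge
s1e = {**s1d, "first.c": "f2"}
s3e = {**s3d, "third.c": "t2"}

# merge(merge(merge(3d, orig->1c), 3d->3e), 1d->1e)
final = apply_diff(apply_diff(apply_diff(s3d, orig, s1c), s3d, s3e), s1d, s1e)
print(final)
```

Every step here is an ordinary 3-way merge against a well-defined base,
and the second series is only ever taken once, via 3d.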

> (2) Does this supposed benefit outweigh the cost of rejecting
> many patches unnecessarily? I know from my own experience
> that I have either given up on or had to put into a very low
> priority mode at least 66% of the patches that I haven't
> gotten integrated, but which I am confident the kernel
> would be better having (e.g.: devfs shrink, lookup()
> trapping, ipv4 as a loadable (but not yet removable) module,
> sysfs memory shrink, factoring much of the DMA mapping to
> the common bus code from individual drivers, fewer kmap's
> in crypto, I could go on).

This is unfortunate, certainly, but the alternative under discussion would
be to get those patches into as many other trees as possible until
Andrew/Linus picks them up and then finds that he's gotten multiple
copies of them via different routes. If each of these went in through the
respective maintainer, there would be no problem, and if any of them went
in without the respective maintainer's sign-off, that would upset
people.

-Daniel
*This .sig left intentionally blank*

2005-04-13 20:04:32

by H. Peter Anvin

Subject: Re: New SCM and commit list

Followup to: <[email protected]>
By author: Linus Torvalds <[email protected]>
In newsgroup: linux.dev.kernel
>
> On Mon, 11 Apr 2005, Greg KH wrote:
> >
> > I have a feeling that the kernel.org mirror system is just going to
> > _love_ us using it to store temporary git trees :)
>
> I don't think kernel.org mirrors the private home directories, so if you
> do _temporary_ trees, just make them readable in your private home
> directory rather than in /pub/linux/kernel/people. For people with
> kernel.org accounts, we can use that as the "bkbits.net" thing.
>
> For really public hosting, we need to find some other approach.
>

It's also pretty trivial to set up an additional /pub hierarchy, like
the current /pub/scm, which is up to individual mirrors to pick up or
not to pick up. We only require /pub/linux and /pub/software to be
mirrored.

-hpa

2005-04-16 08:37:30

by Paul Jackson

Subject: Re: New SCM and commit list

> "merge" does a better job than "diff3" since it can resolve the

The merge command I know of is part of Tichy's RCS tools,
and calls diff3, and has no inherent superior abilities.

Is this the merge command you have in mind here?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401

2005-04-18 08:18:46

by Catalin Marinas

Subject: Re: New SCM and commit list

Paul Jackson <[email protected]> wrote:
>> "merge" does a better job than "diff3" since it can resolve the
>
> The merge command I know of is part of Tichy's RCS tools,
> and calls diff3, and has no inherent superior abilities.

You are right, I missed some diff3 options. It looks like "diff3 -mE"
generates the same output as "merge" (i.e. solving the identical
changes in the derived files). Sorry for the noise :-)

--
Catalin