2003-03-02 00:02:10

by Adam J. Richter

Subject: Re: BitBucket: GPL-ed KitBeeper clone

Pavel Machek wrote:
> I've created little project for read-only (for now ;-) kitbeeper
> clone. It is available at http://www.sf.net/projects/bitbucket (no tar balls,
> just get it fresh from CVS).

Thank you for taking some initiative and improving this
situation by constructive means. You are an example to us all,
as is Andrea Arcangeli with his openbkweb project, which you
will probably want to examine and perhaps integrate
(ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/openbkweb).

bitbucket is about 350 lines of shell scripts, documentation
and diffs, the most interesting file of which is FORMAT, which
documents some reverse engineering efforts on bitkeeper internal file
formats. bitbucket currently uses rsync to update data from the
repository. openbkweb is 500+ lines of python that implements enough
of the bitkeeper network protocol to do downloads, although perhaps
inefficiently. That sounds like some functionality that you might be
interested in integrating.

I think the suggestion made by Pavel Janik that it would
be better to work on adding BitKeeper-like functionality to existing
free software packages is a bit misdirected. BitKeeper uses SCCS
format, and we have a GPL'ed SCCS clone ("cssc"), so you are
adding functionality to existing free software version control
code anyhow.

However, I would like to turn Pavel Janik's point in
what I think might be a more constructive direction.

Aegis, BitKeeper and probably other configuration management
tools that use sccs or rcs basically share a common type of lower
layer. This lower layer converts a file-based revision control system
such as sccs to an "uber-cvs", as someone called it in a slashdot
discussion, that can:

1. process a transaction against a group of files atomically,
2. associate a comment with such a transaction rather than
with just one file,
3. represent symbolic links and file protections,
4. represent file renames (and perhaps copies?)
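
A minimal sketch of what such a lower layer might record, assuming an in-memory store. The names Transaction, FileChange, ChangeKind and Repository are hypothetical and are not taken from bitkeeper, aegis or cvs. Each transaction carries one comment for the whole group of files, and the group is applied to a staged copy so that a failure leaves the repository untouched:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ChangeKind(Enum):
    MODIFY = "modify"    # new file contents
    RENAME = "rename"    # path -> new_path
    SYMLINK = "symlink"  # path becomes a link to target
    CHMOD = "chmod"      # change file protections


@dataclass
class FileChange:
    kind: ChangeKind
    path: str
    new_path: Optional[str] = None
    target: Optional[str] = None
    mode: Optional[int] = None
    content: Optional[str] = None


@dataclass
class Transaction:
    author: str
    comment: str  # one comment for the whole group of files
    changes: List[FileChange] = field(default_factory=list)


class Repository:
    def __init__(self):
        self.files = {}  # path -> {"content", "mode", "target"}
        self.log = []    # committed transactions, oldest first

    def commit(self, txn):
        # Apply every change to a copy; publish only if all of them succeed.
        staged = {path: dict(meta) for path, meta in self.files.items()}
        for change in txn.changes:
            self._apply(staged, change)
        self.files = staged
        self.log.append(txn)

    @staticmethod
    def _apply(files, change):
        if change.kind is ChangeKind.MODIFY:
            entry = files.setdefault(
                change.path, {"content": "", "mode": 0o644, "target": None})
            entry["content"] = change.content
        elif change.kind is ChangeKind.RENAME:
            if change.path not in files:
                raise KeyError(change.path)
            files[change.new_path] = files.pop(change.path)
        elif change.kind is ChangeKind.SYMLINK:
            files[change.path] = {
                "content": None, "mode": 0o777, "target": change.target}
        elif change.kind is ChangeKind.CHMOD:
            if change.path not in files:
                raise KeyError(change.path)
            files[change.path]["mode"] = change.mode
```

A real implementation would have to make the same guarantee on disk, on top of per-file sccs or rcs histories, which is the hard part this sketch leaves out.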

You might want to keep in the back of your mind the
possibility of someday splitting off this lower level into a separate
software package that programs like your bitkeeper clone and aegis could
use in common. If the interface to this lower level took cvs
commands, then it could probably replace cvs, although the repository
would probably be incompatible since the meaning of things like
checking in multiple files together with a single comment would be
different, and there would be other kinds of changes to represent
beyond what cvs currently does. Using a repository format that is
compatible with another system (for example bitkeeper or aegis) would
make such a tool more useful, and if such a tool makes it easier for
people to migrate from a proprietary system to a free one, that's
even better, so your starting with bitkeeper's format seems like an
excellent choice to me.

Thanks again for starting this project. I will at least
try to be a user of it.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."


2003-03-02 00:10:11

by Larry McVoy

Subject: Re: BitBucket: GPL-ed KitBeeper clone

> Thanks again for starting this project. I will at least
> try to be a user of it.

Enjoy yourself.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2003-03-02 00:11:34

by David Lang

Subject: Re: BitBucket: GPL-ed KitBeeper clone

Adam, the openbkweb project didn't reverse engineer the BK network
protocol, it used the HTTP access that is provided on bkbits.net to
download the individual items and created a repository from that.

Unfortunately the bandwidth requirements to support that are high enough
that Larry indicated that if people keep doing that he would have to
shut down the HTTP access.

bitbucket uses rsync as that is the most efficient way to get a copy of
the repository without trying to talk the bitkeeper protocol. It is FAR
more efficient and accurate than the openbkweb interface.

David Lang


On Sat, 1 Mar 2003, Adam J. Richter wrote:

> Date: Sat, 1 Mar 2003 16:11:55 -0800
> From: Adam J. Richter <[email protected]>
> To: [email protected], [email protected], [email protected],
> [email protected]
> Cc: [email protected]
> Subject: Re: BitBucket: GPL-ed KitBeeper clone
>
> Pavel Machek wrote:
> > I've created little project for read-only (for now ;-) kitbeeper
> > clone. It is available at http://www.sf.net/projects/bitbucket (no tar balls,
> > just get it fresh from CVS).
>
> Thank you for taking some initiative and improving this
> situation by constructive means. You are an example to us all,
> as is Andrea Arcangeli with his openbkweb project, which you
> will probably want to examine and perhaps integrate
> (ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/openbkweb).
>
> bitbucket is about 350 lines of shell scripts, documentation
> and diffs, the most interesting file of which is FORMAT, which
> documents some reverse engineering efforts on bitkeeper internal file
> formats. bitbucket currently uses rsync to update data from the
> repository. openbkweb is 500+ lines of python that implements enough
> of the bitkeeper network protocol to do downloads, although perhaps
> inefficiently. That sounds like some functionality that you might be
> interested in integrating.
>
> I think the suggestion made by Pavel Janik that it would
> be better to work on adding BitKeeper-like functionality to existing
> free software packages is a bit misdirected. BitKeeper uses SCCS
> format, and we have a GPL'ed SCCS clone ("cssc"), so you are
> adding functionality to existing free software version control
> code anyhow.
>
> However, I would like to turn Pavel Janik's point in
> what I think might be a more constructive direction.
>
> Aegis, BitKeeper and probably other configuration management
> tools that use sccs or rcs basically share a common type of lower
> layer. This lower layer converts a file-based revision control system
> such as sccs to an "uber-cvs", as someone called it in a slashdot
> discussion, that can:
>
> 1. process a transaction against a group of files atomically,
> 2. associate a comment with such a transaction rather than
> with just one file,
> 3. represent symbolic links and file protections,
> 4. represent file renames (and perhaps copies?)
>
> You might want to keep in the back of your mind the
> possibility of someday splitting off this lower level into a separate
> software package that programs like your bitkeeper clone and aegis could
> use in common. If the interface to this lower level took cvs
> commands, then it could probably replace cvs, although the repository
> would probably be incompatible since the meaning of things like
> checking in multiple files together with a single comment would be
> different, and there would be other kinds of changes to represent
> beyond what cvs currently does. Using a repository format that is
> compatible with another system (for example bitkeeper or aegis) would
> make such a tool more useful, and if such a tool makes it easier for
> people to migrate from a proprietary system to a free one, that's
> even better, so your starting with bitkeeper's format seems like an
> excellent choice to me.
>
> Thanks again for starting this project. I will at least
> try to be a user of it.
>
> Adam J. Richter __ ______________ 575 Oroville Road
> [email protected] \ / Milpitas, California 95035
> +1 408 309-6081 | g g d r a s i l United States of America
> "Free Software For The Rest Of Us."

2003-03-02 00:38:42

by Diego Calleja

Subject: Re: BitBucket: GPL-ed KitBeeper clone

On Sat, 1 Mar 2003 16:11:55 -0800
"Adam J. Richter" <[email protected]> wrote:

(Just a very personal suggestion)
Why waste time trying to clone a
tool such as bitkeeper? Why not support things like subversion?



Diego Calleja

2003-03-02 00:53:32

by Jeff Garzik

Subject: Re: BitBucket: GPL-ed KitBeeper clone

Arador wrote:
> On Sat, 1 Mar 2003 16:11:55 -0800
> "Adam J. Richter" <[email protected]> wrote:
>
> (Just a very personal suggestion)
> Why waste time trying to clone a
> tool such as bitkeeper? Why not support things like subversion?


...because, clearly, Pavel is being paid by BitMover to dilute
programmer resources and user mindshare, thus slowing all open source
SCM efforts.

</sarcasm>

That's not Pavel's aim, obviously, but it's the net effect.

Jeff



2003-03-02 01:02:03

by Alan

Subject: Re: BitBucket: GPL-ed KitBeeper clone

On Sun, 2003-03-02 at 00:49, Arador wrote:
> On Sat, 1 Mar 2003 16:11:55 -0800
> "Adam J. Richter" <[email protected]> wrote:
>
> (Just a very personal suggestion)
> Why waste time trying to clone a
> tool such as bitkeeper? Why not support things like subversion?

Because the repositories people need to read are in BK format, for better
or worse. It doesn't ultimately matter if you use it as an input filter
for CVS, subversion or no VCS at all.


2003-03-02 01:09:45

by Jeff Garzik

Subject: Re: BitBucket: GPL-ed KitBeeper clone

Alan Cox wrote:
> On Sun, 2003-03-02 at 00:49, Arador wrote:
>
>>On Sat, 1 Mar 2003 16:11:55 -0800
>>"Adam J. Richter" <[email protected]> wrote:
>>
>>(Just a very personal suggestion)
>>Why waste time trying to clone a
>>tool such as bitkeeper? Why not support things like subversion?
>
>
> Because the repositories people need to read are in BK format, for better
> or worse. It doesn't ultimately matter if you use it as an input filter
> for CVS, subversion or no VCS at all.

"BK format"? Not really. Patches have been posted (to lkml, even) to
GNU CSSC which allow it to read SCCS files BK reads and writes.

Since that already exists, a full BitKeeper clone is IMO a bit silly,
because it draws users and programmers away from projects that could
potentially _replace_ BitKeeper.

Jeff



2003-03-02 01:15:54

by Olivier Galibert

Subject: Re: BitBucket: GPL-ed KitBeeper clone

On Sat, Mar 01, 2003 at 04:11:55PM -0800, Adam J. Richter wrote:
> Aegis, BitKeeper and probably other configuration management
> tools that use sccs or rcs basically share a common type of lower
> layer. This lower layer converts a file-based revision control system
> such as sccs to an "uber-cvs", as someone called it in a slashdot
> discussion, that can:
>
> 1. process a transaction against a group of files atomically,
> 2. associate a comment with such a transaction rather than
> with just one file,
> 3. represent symbolic links and file protections,
> 4. represent file renames (and perhaps copies?)

5. Represent merges. That's what is making cvs branches unusable.

Frankly, if you want all of that you'd better design a repository
format that is actually adapted to it. The RCS format is not very
good, the SCCS weave is a little better but not by much (it reminds me
of Hurd, looks cool but slow by design). Larry did quite a feat
turning it into a distributed DAG of versions but I'm not convinced it
was that smart, technically. In particular, everything suddenly looks
much nicer when you have one file per DAG node plus a cache zone for
full versions.

But anyway, what made[1] Bitkeeper suck less is the real DAG
structure. Neither arch nor subversion seem to have understood that
and, as a result, don't and won't provide the same level of semantics.
Zero hope for Linus to use them, ever. They're needed for any
decently distributed development process.

Hell, arch is still at the update-before-commit level. I'd have hoped
PRCS would have cured that particular sickness in SCM design ages ago.

Atomicity, symbolic links, file renames, splits (copy) and merges (the
different files suddenly ending up being the same one) are somewhat
important, but not the interesting part. A good distributed DAG
structure and a quality 3-point version "merge" is what you actually
need to build bk-level SCMs.
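
A minimal sketch of why the recorded DAG matters, assuming versions are plain strings and the graph is a dict mapping each version to its parents. The function names are hypothetical, and the 3-point merge below works on whole values rather than on lines, which a real tool would not do:

```python
def ancestors(parents, node):
    """Return node plus every version reachable through parent links."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def three_way(base, ours, theirs):
    """Keep whichever side changed; fail if both changed differently."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    raise ValueError("conflict: both sides changed")


# History where "m" merged branch b (at b1) into branch a.
with_merge = {"root": [], "a1": ["root"], "a2": ["a1"],
              "b1": ["root"], "b2": ["b1"], "m": ["a2", "b1"]}
# The same history with the merge edge forgotten, as cvs branches do.
without_merge = dict(with_merge, m=["a2"])
```

With the merge edge recorded, the base for merging "m" and "b2" is "b1", so only the work after "b1" has to be merged again. Without it the base falls back to "root" and everything already merged shows up as a change on both sides.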

OG.

[1] 2.1.6-pre5, I don't know about current versions

2003-03-02 01:20:55

by Filip Van Raemdonck

Subject: Re: BitBucket: GPL-ed KitBeeper clone

On Sat, Mar 01, 2003 at 04:11:55PM -0800, Adam J. Richter wrote:
> Pavel Machek wrote:
> > I've created little project for read-only (for now ;-) kitbeeper
> > clone. It is available at http://www.sf.net/projects/bitbucket (no tar balls,
> > just get it fresh from CVS).
>
> Thank you for taking some initiative and improving this
> situation by constructive means. You are an example to us all,
> as is Andrea Arcangeli with his openbkweb project, which you
> will probably want to examine and perhaps integrate
> (ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/openbkweb).

I've said this (indirectly) before, and I'll say it again:
BitBucket, and you, are missing the point here. Openbkweb isn't.
Before one can use bitbucket there still has to be a bkbits mirror first,
which incidentally may be true for the main linux kernel trees but isn't
for other projects developed with the help of bitkeeper.

I've also said this before, and I'll also repeat this again:
While politics & philosophy are my main reasons not to use bitkeeper, I
also am not bothered enough by other issues to use it plain and simple.
Nor to use openbkweb instead. And I'm not going to tell other people what
they should do.

However, until we have a tool (as openbkweb tries to be, although very
inefficiently) which can extract patches from the "main" openlogging
bitkeeper repositories, the schism remains between developers who use BK
and those who cannot use it - be it for political or real legal (i.e.
license violation, because of involvement in another SCM) reasons.

> bitbucket currently uses rsync to update data from the
> repository.
(...)
> I think the suggestion made by Pavel Janik that it would
> be better to work on adding BitKeeper-like functionality to existing
> free software packages is a bit misdirected. BitKeeper uses SCCS
> format, and we have a GPL'ed SCCS clone ("cssc"), so you are
> adding functionality to existing free software version control
> code anyhow.

Not until you can use that functionality to access the main BK
repositories directly. When you're still accessing mirrors of it, as in
the rsync case, you are - pragmatically speaking - no better off than when
not accessing it at all.


Regards,

Filip

--
"To me it sounds like Cowpland just doesn't know what the hell he is talking
about. That's to be expected: he's CEO, isn't he?"
-- John Hasler

2003-03-02 01:28:39

by Andrea Arcangeli

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

On Sat, Mar 01, 2003 at 08:19:52PM -0500, Jeff Garzik wrote:
> Alan Cox wrote:
> >On Sun, 2003-03-02 at 00:49, Arador wrote:
> >
> >>On Sat, 1 Mar 2003 16:11:55 -0800
> >>"Adam J. Richter" <[email protected]> wrote:
> >>
> >>(Just a very personal suggestion)
> >>Why waste time trying to clone a
> >>tool such as *notrademarkhere*? Why not support things like subversion?
> >
> >
> >Because the repositories people need to read are in BK format, for better
> >or worse. It doesn't ultimately matter if you use it as an input filter
> >for CVS, subversion or no VCS at all.
>
> "BK format"? Not really. Patches have been posted (to lkml, even) to
> GNU CSSC which allow it to read SCCS files BK reads and writes.

you never tried what you're talking about. there's no way to make any
use of the SCCS tree from Rik's website with only the patched CSSC. The
whole point of bitbucket is to find a way to use CSSC on that tree. And
the longer Larry takes to export the whole data in an open format (CVS,
subversion or whatever), the more progress will be accomplished in
getting the data out of the only service we have right now (Rik's
server). Sure, CSSC is a fundamental piece to extract the data out of
the single files, but CSSC alone is useless. CSSC only allows you to
work on a single file, you lose the whole view of the tree and in turn
it is completely unusable for doing anything useful like watching
changesets, or checking out a branch or whatever else useful thing. As
Pavel found _all_ the info we are interested in is in the
SCCS/s.ChangeSet file and that has nothing to do with CSSC or SCCS.
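
A minimal sketch of the per-file view, assuming the rsynced tree keeps each history file as SCCS/s.<name> next to the working file it describes (the layout that SCCS/s.ChangeSet suggests). The function name is hypothetical:

```python
import os


def sccs_history_files(root):
    """Yield (history_path, working_path) for every SCCS/s.* file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) != "SCCS":
            continue
        parent = os.path.dirname(dirpath)
        for name in sorted(filenames):
            if name.startswith("s."):
                # s.fork.c in kernel/SCCS/ is the history of kernel/fork.c
                yield (os.path.join(dirpath, name),
                       os.path.join(parent, name[2:]))
```

Each history file found this way could be handed to CSSC on its own, but nothing in the listing says which deltas across files belong to the same changeset. That grouping is the part that has to come from the ChangeSet file.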

>
> Since that already exists, a full BitKeeper clone is IMO a bit silly,
> because it draws users and programmers away from projects that could
> potentially _replace_ BitKeeper.

Jeff, please uninstall *notrademarkhere* from your harddisk, install the
patched CSSC instead (like I just did), rsync Rik's SCCS tree on your
harddisk (like I just did), and then send me via email the diff of the
last Changeset that Linus applied to his tree with author, date,
comments etc... If you can do that, you're completely right and at
least personally I will agree 100% with you, again: iff you can.

Andrea

2003-03-02 01:35:04

by Jeff Garzik

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

Andrea Arcangeli wrote:
> Jeff, please uninstall *notrademarkhere* from your harddisk, install the
> patched CSSC instead (like I just did), rsync Rik's SCCS tree on your
> harddisk (like I just did), and then send me via email the diff of the
> last Changeset that Linus applied to his tree with author, date,
> comments etc... If you can do that, you're completely right and at
> least personally I will agree 100% with you, again: iff you can.


You're missing the point:

A BK exporter is useful. A BK clone is not.

If Pavel is _not_ attempting to clone BK, then I retract my arguments. :)

Jeff



2003-03-02 01:56:47

by Andrea Arcangeli

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

On Sat, Mar 01, 2003 at 08:45:08PM -0500, Jeff Garzik wrote:
> Andrea Arcangeli wrote:
> >Jeff, please uninstall *notrademarkhere* from your harddisk, install the
> >patched CSSC instead (like I just did), rsync Rik's SCCS tree on your
> >harddisk (like I just did), and then send me via email the diff of the
> >last Changeset that Linus applied to his tree with author, date,
> >comments etc... If you can do that, you're completely right and at
> >least personally I will agree 100% with you, again: iff you can.
>
>
> You're missing the point:
>
> A BK exporter is useful. A BK clone is not.
>
> If Pavel is _not_ attempting to clone BK, then I retract my arguments. :)

hey, in your previous email you claimed all we need is the patched CSSC,
you change topic quick! Glad you agree CSSC alone is useless and to make
anything useful with Rik's *notrademarkhere* tree we need a true
*notrademarkhere* exporter (of course the exporter will be backed by
CSSC to extract the single file changes, since they're in SCCS format
and it would be pointless to reinvent the wheel).

Now you say the bitbucket project (you read Pavel's announcement, he
said "read only for now", that means exporter in my vocabulary) is
useful, to me that sounds the opposite of your previous claims, but
again: glad we agree on this too now.

Andrea

2003-03-02 17:18:16

by Jeff Garzik

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

Andrea Arcangeli wrote:
> On Sat, Mar 01, 2003 at 08:45:08PM -0500, Jeff Garzik wrote:
>
>>Andrea Arcangeli wrote:
>>
>>>Jeff, please uninstall *notrademarkhere* from your harddisk, install the
>>>patched CSSC instead (like I just did), rsync Rik's SCCS tree on your
>>>harddisk (like I just did), and then send me via email the diff of the
>>>last Changeset that Linus applied to his tree with author, date,
>>>comments etc... If you can do that, you're completely right and at
>>>least personally I will agree 100% with you, again: iff you can.
>>
>>
>>You're missing the point:
>>
>>A BK exporter is useful. A BK clone is not.
>>
>>If Pavel is _not_ attempting to clone BK, then I retract my arguments. :)
>
>
> hey, in your previous email you claimed all we need is the patched CSSC,
> you change topic quick! Glad you agree CSSC alone is useless and to make
> anything useful with Rik's *notrademarkhere* tree we need a true
> *notrademarkhere* exporter (of course the exporter will be backed by
> CSSC to extract the single file changes, since they're in SCCS format
> and it would be pointless to reinvent the wheel).

I have not changed the topic, you are still missing my point.

Let us get this small point out of the way: I agree that GNU CSSC
cannot read the BitKeeper ChangeSet file, which is a file critical for
getting the "weave" correct.

But that point is not relevant to my thread of discussion.

Let us continue in the below paragraph...


> Now you say the bitbucket project (you read Pavel's announcement, he
> said "read only for now", that means exporter in my vocabulary) is
> useful, to me that sounds the opposite of your previous claims, but
> again: glad we agree on this too now.

I disagree with your translation. Maybe this is the source of the
misunderstanding.

To me, a "BK clone, read only for now" is vastly different from a "BK
exporter". The "for now" clearly implies that it will eventually
attempt to be a full SCM.

Why do we need Yet Another Open Source SCM?
Why does Pavel not work on an existing open source SCM, to enable it to
read/write BitKeeper files?

These are the key questions which bother me.

Why do they bother me?

The open source world does not need yet another project that is "not
quite as good as BitKeeper." The open source world needs something that
can do all that BitKeeper does, and more :) A BK clone would be in a
perpetual state of "not quite as good as BitKeeper".

Jeff



2003-03-02 18:04:19

by Andrea Arcangeli

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

On Sun, Mar 02, 2003 at 12:28:23PM -0500, Jeff Garzik wrote:
> Andrea Arcangeli wrote:
> >On Sat, Mar 01, 2003 at 08:45:08PM -0500, Jeff Garzik wrote:
> >
> >>Andrea Arcangeli wrote:
> >>
> >>>Jeff, please uninstall *notrademarkhere* from your harddisk, install the
> >>>patched CSSC instead (like I just did), rsync Rik's SCCS tree on your
> >>>harddisk (like I just did), and then send me via email the diff of the
> >>>last Changeset that Linus applied to his tree with author, date,
> >>>comments etc... If you can do that, you're completely right and at
> >>>least personally I will agree 100% with you, again: iff you can.
> >>
> >>
> >>You're missing the point:
> >>
> >>A BK exporter is useful. A BK clone is not.
> >>
> >>If Pavel is _not_ attempting to clone BK, then I retract my arguments. :)
> >
> >
> >hey, in your previous email you claimed all we need is the patched CSSC,
> >you change topic quick! Glad you agree CSSC alone is useless and to make
> >anything useful with Rik's *notrademarkhere* tree we need a true
> >*notrademarkhere* exporter (of course the exporter will be backed by
> >CSSC to extract the single file changes, since they're in SCCS format
> >and it would be pointless to reinvent the wheel).
>
> I have not changed the topic, you are still missing my point.

your point is purely theoretical at this point in time. bitbucket is so
far from being an efficient exporter that arguing right now about
stopping at the exporter or going ahead to clone it completely is a
totally pointless discussion at this point in time.

Once it will be a fully functional exporter please raise your point
again, only then it will make sense to discuss your point.

I'm not even convinced it will become a full exporter if Larry finally
provides the kernel data via an open protocol stored in an open format
as he promised us some weeks ago, go figure how much I can care what it
will become after it has the readonly capability.

> Let us get this small point out of the way: I agree that GNU CSSC
> cannot read the BitKeeper ChangeSet file, which is a file critical for
> getting the "weave" correct.

This is not what I understood from your previous email:

"BK format"? Not really. Patches have been posted (to lkml, even) to
GNU CSSC which allow it to read SCCS files BK reads and writes.

Since that already exists, a full BitKeeper clone is IMO a bit silly,

now you're saying something completely different, you're saying, "yes the
CSSC obviously isn't enough and we _only_ _need_ the exporter but please
don't do more than the exporter or it will waste development
resources". This is why you changed topic as far as I'm concerned, but
no problem, I'm glad we agree the exporter is useful now.

> To me, a "BK clone, read only for now" is vastly different from a "BK
> exporter". The "for now" clearly implies that it will eventually
> attempt to be a full SCM.

Why do you care that much now? I can't care less. Period. I need the
exporter and for me the exporter or the bk-clone-read-only is the same
thing, I don't mind if I've to run `bk` or `exportbk` or rsync or
whatever to get the data out.

If bitbucket will become much better than bitkeeper 100 years from now,
much better than a clone, is something I can't care less at this point
in time, and it may be the best or worst thing it will happen to the
whole SCM open source arena, you can't know, I can't know, nobody can
know at this point in time.

You agreed the exporter is useful, so we agree, I don't mind what will
happen after the useful thing is available, it's the last of my worries,
and until we reach that point obviously there is no risk to reinvent the
wheel (unless the data become available in an open protocol first).

> Why do we need Yet Another Open Source SCM?
> Why does Pavel not work on an existing open source SCM, to enable it to
> read/write BitKeeper files?

bitbucket could be merged into any SCM at any time, it is _the
exporter_ that the other SCM needs to import from the *notrademarkhere*
trees.

> These are the key questions which bother me.
>
> Why do they bother me?
>
> The open source world does not need yet another project that is "not
> quite as good as BitKeeper." The open source world needs something that
> can do all that BitKeeper does, and more :) A BK clone would be in a
> perpetual state of "not quite as good as BitKeeper".

Disagree, if it will become more than a read-only thing, it will likely
become as good and most probably better than bitkeeper (maybe not
graphical but still usable) because it means it has the critical mass of
development power _iff_ it can reach that point. But at this point in time
I doubt it will become more than an exporter, in fact I even doubt it
will become a full exporter if Larry spares us from wasting time. I
personally would have no interest in bitbucket if Linus would provide
the data in an open protocol for efficient downloads and in an open format
for backup-archive downloads as we discussed some weeks ago.

But again, what bitbucket will become after it will be a functional
exporter (i.e. your "point") is entirely pointless to argue about right
now IMHO. But feel free to keep discussing it with others if you think
it matters right now (now that I made my point clear, I probably won't
feel the need to answer since my interest in that matter is so low).

Andrea

2003-03-02 20:02:20

by Jeff Garzik

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

Andrea Arcangeli wrote:
> your point is purely theoretical at this point in time. bitbucket is so
> far from being an efficient exporter that arguing right now about
> stopping at the exporter or going ahead to clone it completely is a
> totally pointless discussion at this point in time.
>
> Once it will be a fully functional exporter please raise your point
> again, only then it will make sense to discuss your point.

Ok, fair enough ;)


> I'm not even convinced it will become a full exporter if Larry finally
> provides the kernel data via an open protocol stored in an open format
> as he promised us some weeks ago, go figure how much I can care what it
> will become after it has the readonly capability.

I think this is a fair request.

IMO a good start would be to get BK to export its metadata for each
changeset in XML. Once that is accomplished, (a) nobody gives a damn
about BK file format, and (b) it is easy to set up an automated, public
distribution of XML changesets that can be imported into OpenCM, cvs, or
whatever.
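
A minimal sketch of what one exported changeset could look like, assuming a made-up layout. The element names (changeset, author, date, comment, files, file) are hypothetical and are not a format BK produces:

```python
import xml.etree.ElementTree as ET


def changeset_to_xml(cset):
    """Serialize one changeset dict to an XML string."""
    root = ET.Element("changeset", id=cset["id"])
    ET.SubElement(root, "author").text = cset["author"]
    ET.SubElement(root, "date").text = cset["date"]
    ET.SubElement(root, "comment").text = cset["comment"]
    files = ET.SubElement(root, "files")
    for path, diff in cset["files"]:
        # One element per touched file, carrying that file's diff as text.
        ET.SubElement(files, "file", path=path).text = diff
    return ET.tostring(root, encoding="unicode")


def changeset_from_xml(text):
    """Parse the XML produced above back into the same dict shape."""
    root = ET.fromstring(text)
    return {
        "id": root.get("id"),
        "author": root.findtext("author", default=""),
        "date": root.findtext("date", default=""),
        "comment": root.findtext("comment", default=""),
        "files": [(f.get("path"), f.text or "") for f in root.iter("file")],
    }
```

An importer for OpenCM, cvs or anything else would only need the second function, and would never have to look at the BK files themselves.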


>>Let us get this small point out of the way: I agree that GNU CSSC
>>cannot read the BitKeeper ChangeSet file, which is a file critical for
>>getting the "weave" correct.
>
>
> This is not what I understood from your previous email:
>
> "BK format"? Not really. Patches have been posted (to lkml, even) to
> GNU CSSC which allow it to read SCCS files BK reads and writes.
>
> Since that already exists, a full BitKeeper clone is IMO a bit silly,
>
> now you're saying something completely different, you're saying, "yes the
> CSSC obviously isn't enough and we _only_ _need_ the exporter but please
> don't do more than the exporter or it will waste development
> resources". This is why you changed topic as far as I'm concerned, but
> no problem, I'm glad we agree the exporter is useful now.

I am sorry for the misunderstanding then. Let me quote from an email I
sent to you yesterday:

A BK exporter is useful.

So I think we do agree :)


>>To me, a "BK clone, read only for now" is vastly different from a "BK
>>exporter". The "for now" clearly implies that it will eventually
>>attempt to be a full SCM.
>
>
> Why do you care that much now? I can't care less. Period. I need the
> exporter and for me the exporter or the bk-clone-read-only is the same
> thing, I don't mind if I've to run `bk` or `exportbk` or rsync or
> whatever to get the data out.
>
> If bitbucket will become much better than bitkeeper 100 years from now,
> much better than a clone, is something I can't care less at this point
> in time, and it may be the best or worst thing it will happen to the
> whole SCM open source arena, you can't know, I can't know, nobody can
> know at this point in time.
>
> You agreed the exporter is useful, so we agree, I don't mind what will
> happen after the useful thing is available, it's the last of my worries,
> and until we reach that point obviously there is no risk to reinvent the
> wheel (unless the data become available in an open protocol first).


Yes. As you see, I care about the future and not the present, in my
arguments: I believe that a BK clone may hurt the overall [future]
effort of creating a good quality open source SCM. So, in my mind I
separate the two topics of "BK exporter" and "future BK clone."


To get back to the topic of "BK exporter", I think it is more productive
to get Larry to export in an open file format. I will work with him
this week to do that. Reading the BK format itself may be interesting
to some, but I would rather have BitMover do the work and export in an
open file format ;-) Reading BK format directly is "chasing a moving
target" in my opinion.

Jeff



2003-03-02 21:41:33

by Geert Uytterhoeven

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

On Sun, 2 Mar 2003, Jeff Garzik wrote:
> Andrea Arcangeli wrote:
> > I'm not even convinced it will become a full exporter if Larry finally
> > provides the kernel data via an open protocol stored in an open format
> > as he promised us some weeks ago, go figure how much I can care what it
> > will become after it has the readonly capability.
>
> I think this is a fair request.
>
> IMO a good start would be to get BK to export its metadata for each
> changeset in XML. Once that is accomplished, (a) nobody gives a damn
> about BK file format, and (b) it is easy to set up an automated, public
> distribution of XML changesets that can be imported into OpenCM, cvs, or
> whatever.

Read: an XML schema with a public, open specification?

Ask Microsoft how to `encrypt' documents using an `open' standard like XML...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2003-03-03 18:27:24

by Larry McVoy

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

How close is http://www.bitmover.com/EXPORT to what you want (3MB file)?

Note that this is very coarse granularity, it's everything from 2.5.62 up to
2.5.63; in practice the granularity would be at least as fine as each of Linus'
pushes and finer if possible. We can't capture all the branching structure
in patches, there is too much parallelism, but what we can do is capture
each push that Linus does and if he did more than one merge in that push,
we can break it up into each merge.

We can also provide this as a BK url on bkbits for any cset or range of
csets (we'll have to get another T1 line but I don't see a way around that).

This should give enough information that anyone could build their own
BK 2 SVN gateway (or whatever, we're doing the CVS one).

Also, here's what Linus' recent pushes look like WITHOUT breaking it into
each merge, we're still working on that code:

57 csets on 2003/03/03 08:49:44
5 csets on 2003/03/02 21:30:31
28 csets on 2003/03/02 21:04:02
1 csets on 2003/03/02 10:19:24
49 csets on 2003/03/01 19:03:58
2 csets on 2003/03/01 11:04:04
5 csets on 2003/03/01 09:19:24
1 csets on 2003/02/28 19:34:30
37 csets on 2003/02/28 15:30:29
8 csets on 2003/02/28 15:18:12
23 csets on 2003/02/28 15:05:08
31 csets on 2003/02/27 23:30:05
16 csets on 2003/02/27 09:15:07
11 csets on 2003/02/27 07:45:06
47 csets on 2003/02/26 23:09:53
32 csets on 2003/02/25 21:35:34
24 csets on 2003/02/25 18:34:41
22 csets on 2003/02/25 15:49:41
14 csets on 2003/02/24 21:23:34
3 csets on 2003/02/24 15:19:44
1 csets on 2003/02/24 11:16:14
15 csets on 2003/02/24 11:00:36
4 csets on 2003/02/24 10:48:49
1 csets on 2003/02/24 10:03:36
15 csets on 2003/02/24 09:49:34
1 csets on 2003/02/23 20:33:00
3 csets on 2003/02/23 11:15:28
8 csets on 2003/02/23 11:01:10
6 csets on 2003/02/23 10:49:14
2 csets on 2003/02/22 19:32:35
4 csets on 2003/02/22 16:17:27
1 csets on 2003/02/22 12:45:28
76 csets on 2003/02/22 12:34:13
1 csets on 2003/02/21 20:18:19
6 csets on 2003/02/21 19:49:32
86 csets on 2003/02/21 18:03:23
3 csets on 2003/02/21 16:18:24
30 csets on 2003/02/21 14:14:48
1 csets on 2003/02/21 10:18:19
1 csets on 2003/02/21 09:49:15
etc.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2003-03-03 18:36:16

by Larry McVoy

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

On Mon, Mar 03, 2003 at 10:37:34AM -0800, Larry McVoy wrote:
> How close is http://www.bitmover.com/EXPORT to what you want (3MB file).
>
> Note that this is very coarse granularity, it's very 2.5.62 up to 2.5.63,

This was too big, I replaced it with the diffs + comments for the last push
Linus did. Even this is pretty big, he pulled 57 csets from DaveM if I
understand things properly.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2003-03-03 22:06:53

by Pavel Machek

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

Hi!

> >Jeff, please uninstall *notrademarkhere* from your harddisk, install the
> >patched CSSC instead (like I just did), rsync Rik's SCCS tree on your
> >harddisk (like I just did), and then send me via email the diff of the
> >last Changeset that Linus applied to his tree with author, date,
> >comments etc... If you can do that, you're completely right and at
> >least personally I will agree 100% with you, again: iff you can.
>
>
> You're missing the point:
>
> A BK exporter is useful. A BK clone is not.

I meant exporter.
Pavel

--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2003-03-03 22:06:16

by Pavel Machek

Subject: Re: BitBucket: GPL-ed KitBeeper clone

Hi!

> >>(Just a very personal suggestion)
> >>Why to waste time trying to clone a
> >>tool such as bitkeeper? Why not to support things like subversion?
> >
> >
> >Because the repositories people need to read are in BK format, for better
> >or worse. It doesn't ultimately matter if you use it as an input filter
> >for CVS, subversion or no VCS at all.
>
> "BK format"? Not really. Patches have been posted (to lkml, even) to
> GNU CSSC which allow it to read SCCS files BK reads and writes.
>
> Since that already exists, a full BitKeeper clone is IMO a bit silly,
> because it draws users and programmers away from projects that could
> potentially _replace_ BitKeeper.

Read-only access to the bk repositories is the first goal. Then, I'll
either add write support (unlikely) or feed it into some existing
version control system to work with that. I'm still not sure what's
the best.

[bk's on-disk format is quite reasonable; it might be okay to reuse
that.]

Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2003-03-03 22:45:09

by Andrea Arcangeli

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

On Mon, Mar 03, 2003 at 10:37:34AM -0800, Larry McVoy wrote:
> How close is http://www.bitmover.com/EXPORT to what you want (3MB file).
>
> Note that this is very coarse granularity, it's very 2.5.62 up to 2.5.63,

I'm probably missing something obvious but it's not clear to me how to
extract the changeset info from this format.

Let's assume I want to extract this changeset:

[email protected], 2003-02-24 10:49:30-08:00, [email protected]
[PATCH] convert /proc/io{mem,ports} to seq_file

This converts /proc/io{mem,ports} to the seq_file interface
(single_open).

How can I?

I mean, the above format is fine, as long as we have a file like that per
changeset (or alternatively per Linus's merge, even if not for every
single changeset, when he does the pulls). Clearly a file of that format
for a 2.5.62->63 diff is not fine-grained enough.

Correct me if I'm wrong, but if I understand correctly the changeset numbers
aren't fixed in the bitkeeper tree; a changeset number can change while
the merging happens across different cloned trees. So in short, the
changeset numbers are useless to the outside (but still providing them
won't hurt as long as nobody relies on them).

> in practice the granularity would be as least as fine as each of Linus'
> pushes and finer if possible. We can't capture all the branching structure
> in patches, there is too much parallelism, but what we can do is capture
> each push that Linus does and if he did more than one merge in that push,
> we can break it up into each merge.
>
> We can also provide this as a BK url on bkbits for any cset or range of
> csets (we'll have to get another T1 line but I don't see way around that).

If that hurts, you could simply upload them to kernel.org. Even if it's
not a file, can't you simply check in to a remote cvs on kernel.org or
osdl.org or sourceforge, or wherever else, so you won't need to
pay for it? It's up to you of course, but I'm sure you're not forced to
pay for this service (besides the one-time setup of the exports,
which I hope won't generate any maintenance overhead for you).

> This should give enough information that anyone could build their own
> BK 2 SVN gateway (or whatever, we're doing the CVS one).

Yes, as long as this file format is per-merge I think this is all we
need. This way it will be usable to check out, browse and regenerate the
tree, unlike the cset directory currently on kernel.org.

> Also, here's what Linus' recent pushes look like WITHOUT breaking it into
> each merge, we're still working on that code:
>
> 57 csets on 2003/03/03 08:49:44
> 5 csets on 2003/03/02 21:30:31
> 28 csets on 2003/03/02 21:04:02
> 1 csets on 2003/03/02 10:19:24
> 49 csets on 2003/03/01 19:03:58
> 2 csets on 2003/03/01 11:04:04
> 5 csets on 2003/03/01 09:19:24
> 1 csets on 2003/02/28 19:34:30
> 37 csets on 2003/02/28 15:30:29
> 8 csets on 2003/02/28 15:18:12
> 23 csets on 2003/02/28 15:05:08
> 31 csets on 2003/02/27 23:30:05
> 16 csets on 2003/02/27 09:15:07
> 11 csets on 2003/02/27 07:45:06
> 47 csets on 2003/02/26 23:09:53
> 32 csets on 2003/02/25 21:35:34
> 24 csets on 2003/02/25 18:34:41
> 22 csets on 2003/02/25 15:49:41
> 14 csets on 2003/02/24 21:23:34
> 3 csets on 2003/02/24 15:19:44
> 1 csets on 2003/02/24 11:16:14
> 15 csets on 2003/02/24 11:00:36
> 4 csets on 2003/02/24 10:48:49
> 1 csets on 2003/02/24 10:03:36
> 15 csets on 2003/02/24 09:49:34
> 1 csets on 2003/02/23 20:33:00
> 3 csets on 2003/02/23 11:15:28
> 8 csets on 2003/02/23 11:01:10
> 6 csets on 2003/02/23 10:49:14
> 2 csets on 2003/02/22 19:32:35
> 4 csets on 2003/02/22 16:17:27
> 1 csets on 2003/02/22 12:45:28
> 76 csets on 2003/02/22 12:34:13
> 1 csets on 2003/02/21 20:18:19
> 6 csets on 2003/02/21 19:49:32
> 86 csets on 2003/02/21 18:03:23
> 3 csets on 2003/02/21 16:18:24
> 30 csets on 2003/02/21 14:14:48
> 1 csets on 2003/02/21 10:18:19
> 1 csets on 2003/02/21 09:49:15

Just curious, this also means that at least around 80% of the merges
in Linus's tree are submitted via a bitkeeper pull, right?

Andrea

2003-03-03 23:04:06

by Pavel Machek

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

Hi!

> > How close is http://www.bitmover.com/EXPORT to what you want (3MB file).
> >
> > Note that this is very coarse granularity, it's very 2.5.62 up to 2.5.63,
>
> I'm probably missing something obvious but it's not clear to me how to
> extract the changeset info from this format.

Is that format parsable at all? It looks like strange changeset
comments could confuse parsers...

> Let's assume I want to extract this changeset:
>
> [email protected], 2003-02-24 10:49:30-08:00, [email protected]
> [PATCH] convert /proc/io{mem,ports} to seq_file
>
> This converts /proc/io{mem,ports} to the seq_file interface
> (single_open).
>
> How can I?
>
> I mean, the above format is fine, as far as we have a file like that per
> changeset (or alternatively per Linus's merge, even if not for every
> single changeset, when he does the pulls). Clearly a file of that format
> for a 2.5.62->63 diff is not finegrined enough.

Ben's bitsubversion script is somewhat slow, but should be capable of
pulling any diff you want...
Pavel

--
Horseback riding is like software...
...vgf orggre jura vgf serr.

2003-03-03 23:47:13

by David Lang

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

On Mon, 3 Mar 2003, Andrea Arcangeli wrote:

> Just curious, this also means that at least around the 80% of merges
> in Linus's tree is submitted via a bitkeeper pull, right?
>
> Andrea

remember how Linus works, all normal patches get copied into a single
large patch file as he reads his mail then he runs patch to apply them to
the tree. I think this would make the entire batch of messages look like
one cset.

David Lang

2003-03-03 23:52:22

by Jeff Garzik

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

David Lang wrote:
> On Mon, 3 Mar 2003, Andrea Arcangeli wrote:
>
>
>>Just curious, this also means that at least around the 80% of merges
>>in Linus's tree is submitted via a bitkeeper pull, right?
>>
>>Andrea
>
>
> remember how Linus works, all normal patches get copied into a single
> large patch file as he reads his mail then he runs patch to apply them to
> the tree. I think this would make the entire batch of messages look like
> one cset.


Not correct. His commits properly separate the patches out into
individual csets.

Jeff


2003-03-03 23:54:52

by Larry McVoy

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

On Mon, Mar 03, 2003 at 07:02:28PM -0500, Jeff Garzik wrote:
> David Lang wrote:
> >On Mon, 3 Mar 2003, Andrea Arcangeli wrote:
> >
> >
> >>Just curious, this also means that at least around the 80% of merges
> >>in Linus's tree is submitted via a bitkeeper pull, right?
> >>
> >>Andrea
> >
> >
> >remember how Linus works, all normal patches get copied into a single
> >large patch file as he reads his mail then he runs patch to apply them to
> >the tree. I think this would make the entire batch of messages look like
> >one cset.
>
>
> Not correct. His commits properly separate the patches out into
> individual csets.

And we've written code which finds the longest path through the graph
to get the finest granularity; when run on his tree we get 8138 nodes.
That is 43% of the 18837 nodes possible. The trunk only includes
1068 nodes. So we can do a very good job exporting to CVS.
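
The longest-path idea can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not BitMover's code: the
`parents` mapping (node -> list of parent nodes) and the node names are
hypothetical, since the thread does not show how BK stores the graph.

```python
def longest_path(parents, tip):
    """Return the longest chain of nodes from a root down to `tip`.

    `parents` is an assumed representation: node -> list of parent nodes.
    The graph must be acyclic and node names must not be None.
    """
    depth = {}  # node -> number of nodes on the longest chain ending there
    prev = {}   # node -> parent chosen on that chain (None for a root)
    stack = [tip]
    while stack:  # iterative DFS, so long histories do not hit recursion limits
        node = stack[-1]
        pending = [p for p in parents.get(node, ()) if p not in depth]
        if pending:
            stack.extend(pending)  # visit unprocessed parents first
            continue
        stack.pop()
        if node in depth:
            continue  # already processed via another child
        best = max(parents.get(node, ()), key=lambda p: depth[p], default=None)
        prev[node] = best
        depth[node] = 1 + (depth[best] if best is not None else 0)
    path = []
    node = tip
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


# Toy graph: "e" merges a two-step line (b, d) with a one-step line (c).
toy = {"a": [], "b": ["a"], "c": ["a"], "d": ["b"], "e": ["d", "c"]}
chain = longest_path(toy, "e")  # ['a', 'b', 'd', 'e']
```

The chain keeps 4 of the 5 nodes in the toy graph; node "c" is the one
that gets folded into the merge, which is the kind of loss the 43% figure
above describes (8138 / 18837 is roughly 0.432).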
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2003-03-04 00:06:25

by Andrea Arcangeli

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

On Mon, Mar 03, 2003 at 07:02:28PM -0500, Jeff Garzik wrote:
> David Lang wrote:
> >On Mon, 3 Mar 2003, Andrea Arcangeli wrote:
> >
> >
> >>Just curious, this also means that at least around the 80% of merges
> >>in Linus's tree is submitted via a bitkeeper pull, right?
> >>
> >>Andrea
> >
> >
> >remember how Linus works, all normal patches get copied into a single
> >large patch file as he reads his mail then he runs patch to apply them to
> >the tree. I think this would make the entire batch of messages look like
> >one cset.
>
>
> Not correct. His commits properly separate the patches out into
> individual csets.

and they're unusable as a source to regenerate a tree. I had similar
issues with the web too. To make use of the single csets you need to
implement the internal bitkeeper branching knowledge too. Not to mention
that the cset numbers apparently change all the time.

Andrea

2003-03-04 00:20:02

by Jeff Garzik

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

Andrea Arcangeli wrote:
> On Mon, Mar 03, 2003 at 07:02:28PM -0500, Jeff Garzik wrote:
>
>>David Lang wrote:
>>
>>>On Mon, 3 Mar 2003, Andrea Arcangeli wrote:
>>>
>>>
>>>
>>>>Just curious, this also means that at least around the 80% of merges
>>>>in Linus's tree is submitted via a bitkeeper pull, right?
>>>>
>>>>Andrea
>>>
>>>
>>>remember how Linus works, all normal patches get copied into a single
>>>large patch file as he reads his mail then he runs patch to apply them to
>>>the tree. I think this would make the entire batch of messages look like
>>>one cset.
>>
>>
>>Not correct. His commits properly separate the patches out into
>>individual csets.
>
>
> and they're unusable as source to regenerate a tree. I had similar
> issues with the web too. to make use of the single csets you need to
> implement the internal bitkeeper branching knowledge too. Not to tell
> apparently the cset numbers changes all the time.


The "weave", or order of csets, certainly changes each time Linus does a
'bk pull'. I wonder if a 'cset_order' file would be useful -- an
automated job uses BK to export the weave for a specific point in time.
One could use that to glue the csets together, perhaps?

WRT cset numbers, ignore them. Each cset has a unique key. When
setting up the 2.5 snapshot cron job, Linus asked me to export this key
so that the definitive top-of-tree may be identified, regardless of cset
number. Here is an example:
ftp://ftp.kernel.org/pub/linux/kernel/v2.5/snapshots/patch-2.5.63-bk6.key

Jeff



2003-03-04 02:19:19

by Martin J. Bligh

Subject: Re: BitBucket: GPL-ed *notrademarkhere* clone

>> Just curious, this also means that at least around the 80% of merges
>> in Linus's tree is submitted via a bitkeeper pull, right?
>>
>> Andrea
>
> remember how Linus works, all normal patches get copied into a single
> large patch file as he reads his mail then he runs patch to apply them to
> the tree. I think this would make the entire batch of messages look like
> one cset.

I think he also creates subtrees, applies flat patches to those, then
merges the subtrees back into his main tree as a bk-merge ... won't that
distort the stats?

M.

2003-03-04 16:06:20

by David Woodhouse

Subject: Re: BitBucket: GPL-ed KitBeeper clone

On Mon, 2003-03-03 at 00:10, Pavel Machek wrote:
> [bk's on-disk format is quite reasonable; it might be okay to reuse
> that.]

I disagree. Keeping the checked-out files _outside_ the repository, and
being able to have multiple checked-out trees from the same repository
with uncommitted changes outstanding while you pull from a remote
repository, etc, is useful.

cvs with cvsup does some of this but has obvious disadvantages, not
least of which being the one-way nature of change propagation. SVN and a
yet-to-be-invented SVNup (hopefully not in Modula-3 this time) may be a
lot closer to what we want.

--
dwmw2

2003-03-04 16:17:23

by Pavel Machek

Subject: Re: BitBucket: GPL-ed KitBeeper clone

Hi!

> > [bk's on-disk format is quite reasonable; it might be okay to reuse
> > that.]
>
> I disagree. Keeping the checked-out files _outside_ the repository, and
> being able to have multiple checked-out trees from the same repository
> with uncommitted changes outstanding while you pull from a remote
> repository, etc, is useful.

Agreed, but bk's SCCS-based format does not prevent you from keeping
checked-out files outside the repository or from having multiple
checked-out trees. In fact I'm doing exactly that with bitbucket.

Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.

2003-03-07 10:59:00

by Pavel Machek

Subject: Re: BitBucket: GPL-ed KitBeeper clone

Hi!

> But anyway, what made[1] Bitkeeper suck less is the real DAG
> structure. Neither arch nor subversion seem to have understood that
> and, as a result, don't and won't provide the same level of semantics.
> Zero hope for Linus to use them, ever. They're needed for any
> decently distributed development process.

Can you elaborate? I thought that this
"real DAG" structure is more or less
equivalent to each developer having
his own CVS repository...

> Hell, arch is still at the update-before-commit level. I'd have hoped
> PRCS would have cured that particular sickness in SCM design ages ago.
>
> Atomicity, symbolic links, file renames, splits (copy) and merges (the
> different files suddendly ending up being the same one) are somewhat
> important, but not the interesting part. A good distributed DAG
> structure and a quality 3-point version "merge" is what you actually
> need to build bk-level SCMs.

If I fixed CVS renames, added atomic
commits, splits and merges, and gave each
developer his own CVS repository,
would I be in the same league as bk?
I.e. 10 times slower but equivalent
functionality?

(3-point merge should be doable for CVS
too and would be a good thing anyway,
right?)
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...

2003-03-07 12:01:57

by Olivier Galibert

Subject: Re: BitBucket: GPL-ed KitBeeper clone

On Thu, Mar 06, 2003 at 05:18:53PM +0100, Pavel Machek wrote:
> Can you elaborate? I thought that this
> "real DAG" structure is more or less
> equivalent to each developer having
> his owm CVS repository...

Nope. CVS uses RCS, and RCS only knows about trees, not graphs.
Specifically, branch merges are not tagged as such, and as a result
CVS is unable to pick the best grandparent when doing a merge.
That's the main reason why branching under CVS is so painful
(forgetting about the performance issues).


> If I fixed CVS renames, added atomic
> commits, splits and merges, and gave each
> developer his own CVS repository,
> would I be in same league as bk?
> Ie 10 times slower but equivalent
> functionality?

Nope. You'll find out that this per-developer repository quickly
needs to become a per-branch repository, and even then you need to
write somewhere when the merges with other repositories happen, and
you end up with the DAG again.

Another way to see it is that CVS and friends use an
update-then-commit scheme, which is proven crap because you lose the
working version you had when you do the update to get a result that is
sometimes interesting. Nice systems, like PRCS and bk, first commit
to a new branch (no update necessary obviously) then merge in the
mainline. As a side effect, they are Good with branches. Bk's main
quality over PRCS is the distribution. This lack is what makes PRCS
essentially unusable for serious open source projects. Otherwise
they're semantically the same.


> (3 point merge should be doable for CVS
> to and would be good thing anyway,
> right?)

Technically, CVS does 3-point merge, it's just crap at finding the
third point, and diff3 -m (which is what is used under the hood) isn't
that spectacular either.

You can see the merge operation in a different way. You take 3
versions of your complete repository A, B and R (reference). You
compute the deltas dA and dB so that A=dA(R) and B=dB(R). Then you
try to build M=dA(dB(R))=dB(dA(R)), when it makes sense (not only are
the deltas not necessarily commutative, they can't even always be applied
one after the other). When it doesn't work there are conflicts to be
resolved by the user. You can see that when it works M=dA(B)=dB(A).

You can do a lot of things with that, merging branches is just one of
them. You can back out patches from within the history for instance
(D->E->F, merge D and F using E as reference removes the D->E patch
from F).
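
To make the A/B/R picture concrete, here is a minimal Python sketch at
whole-file granularity. It rests on assumptions that are not in the
discussion above: a repository snapshot is modelled as a dict mapping a
file name to its content, and the function name `merge` and the file
names are made up for illustration. A real tool works on lines within
files, not on whole files.

```python
def merge(a, b, ref):
    """Three-way merge of snapshots `a` and `b` against reference `ref`.

    Each snapshot is a dict {file name: content} (an assumed, simplified
    model). Returns (merged snapshot, list of conflicting file names).
    """
    merged, conflicts = {}, []
    for name in sorted(set(a) | set(b) | set(ref)):
        va, vb, vr = a.get(name), b.get(name), ref.get(name)
        if va == vr:
            value = vb            # only B touched this file (or nobody did)
        elif vb == vr or va == vb:
            value = va            # only A touched it, or both made the same change
        else:
            conflicts.append(name)  # both changed it differently: user decides
            continue
        if value is not None:     # None means the file was deleted
            merged[name] = value
    return merged, conflicts


# History D -> E -> F: the D->E patch changes x, the E->F patch changes y.
D = {"x": "x0", "y": "y0"}
E = {"x": "x1", "y": "y0"}
F = {"x": "x1", "y": "y1"}

# Backing out D->E from F: merge D and F using E as the reference.
backed_out, clashes = merge(D, F, ref=E)  # ({'x': 'x0', 'y': 'y1'}, [])
```

The same function does an ordinary branch merge when the reference is the
common starting point: merging E with {"x": "x0", "y": "y1"} against D
gives F back.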

The trick is, the simpler your deltas are, the lower the conflict
probability is. That's where the DAG kicks in. For a branch merge,
the lowest probability of conflict tends to occur when the
two deltas are a linear combination of small user-made deltas, with no
delta common between the two chains. I.e. the best reference to use
is the latest merge point. The DAG allows you to find it. CVS
doesn't note the merge points so it always goes all the way back to where
the branch is rooted, ensuring that the two delta chains have a large
common prefix.

Sub-optimal reference point plus diff3's algorithm being what it is
makes the CVS branches plain unusable. Multiple repositories won't
fix that, since you'll need to merge between repositories anyway.

OG.

2003-03-07 12:22:04

by Pavel Machek

Subject: Re: BitBucket: GPL-ed KitBeeper clone

Hi!

> > Can you elaborate? I thought that this
> > "real DAG" structure is more or less
> > equivalent to each developer having
> > his owm CVS repository...
>
> Nope. CVS uses RCS, and RCS only knows about trees, not graphs.
> Specifically, branch merges are not tagged as such, and as a result
> CVS is unable to pick up the best grandparent when doing a merge.
> That's the main reason of why branching under CVS is so painful
> (forgetting about the performance issues).

I see. But I still somehow can not understand how merging is
possible. Merge possibly means work-by-hand, right? So it is not as
simple as noting that 1.8 and 1.7.1.1 were merged into 1.9, no? [And
what if developer did really crap job at merging that, like dropping
all changes from 1.7.1.1?]

> > If I fixed CVS renames, added atomic
> > commits, splits and merges, and gave each
> > developer his own CVS repository,
> > would I be in same league as bk?
> > Ie 10 times slower but equivalent
> > functionality?
>
> Nope. You'll find out that this per-developper repository quickly
> needs to become a per-branch repository, and even need you need to
> write somewhere when the merges with other repositories happen, and
> you end up with the DAG again.

Yep, that's what I wanted to know. [I see per-branch repository is
pain, but it helps me to understand that.]

Thanx for your explanations,
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.

2003-03-07 16:43:41

by Olivier Galibert

Subject: Re: BitBucket: GPL-ed KitBeeper clone

On Fri, Mar 07, 2003 at 01:32:37PM +0100, Pavel Machek wrote:
> > Nope. CVS uses RCS, and RCS only knows about trees, not graphs.
> > Specifically, branch merges are not tagged as such, and as a result
> > CVS is unable to pick up the best grandparent when doing a merge.
> > That's the main reason of why branching under CVS is so painful
> > (forgetting about the performance issues).
>
> I see. But I still somehow can not understand how merging is
> possible. Merge possibly means work-by-hand, right? So it is not as
> simple as noting that 1.8 and 1.7.1.1 were merged into 1.9, no? [And
> what if developer did really crap job at merging that, like dropping
> all changes from 1.7.1.1?]

Calling A and B the versions to merge and R the reference, diff3 uses
this algorithm (probably the simplest possible):
- Compute the diff between A and R, call it dA
- Compute the diff between B and R, call it dB
- Merge the two diffs into one (and conflict where you can't)
- Apply the merged diff to R

Better algorithms do the alignments per-character instead of per-line,
detect moved and changed functions, detect duplicate inserts, etc.
None, of course, is perfect, as Larry could tell you.
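
The four steps above can be sketched with Python's standard `difflib`.
This is a simplified illustration, not GNU diff3's actual implementation:
the function name `merge3` and the conflict-marker text are assumptions,
and lines of R that survive unchanged in both A and B are used as sync
points between which the three versions are compared chunk by chunk.

```python
import difflib


def _match_map(ref, other):
    """Map index of each ref line to the index of its match in `other`."""
    mapping = {}
    matcher = difflib.SequenceMatcher(None, ref, other, autojunk=False)
    for i, j, n in matcher.get_matching_blocks():
        for k in range(n):
            mapping[i + k] = j + k
    return mapping


def merge3(ref, a, b):
    """Line-based three-way merge; returns (merged lines, conflict count)."""
    ma, mb = _match_map(ref, a), _match_map(ref, b)
    # Sync points: ref lines kept by both sides, plus an end-of-file sentinel.
    sync = [(r, ma[r], mb[r]) for r in range(len(ref)) if r in ma and r in mb]
    sync.append((len(ref), len(a), len(b)))
    out, conflicts = [], 0
    r0 = a0 = b0 = 0
    for r, ia, ib in sync:
        cr, ca, cb = ref[r0:r], a[a0:ia], b[b0:ib]
        if ca == cr:
            out += cb                    # only B changed this chunk
        elif cb == cr or ca == cb:
            out += ca                    # only A changed it, or both agree
        else:
            conflicts += 1               # overlapping changes: leave markers
            out += ["<<<<<<< A"] + ca + ["======="] + cb + [">>>>>>> B"]
        if r < len(ref):
            out.append(ref[r])           # the shared, unchanged line itself
        r0, a0, b0 = r + 1, ia + 1, ib + 1
    return out, conflicts


R = ["a", "b", "c", "d"]
A = ["a", "B", "c", "d"]        # A edits the second line
B = ["a", "b", "c", "d", "e"]   # B appends a line
merged, n_conflicts = merge3(R, A, B)  # (['a', 'B', 'c', 'd', 'e'], 0)
```

Both edits land in the result because they touch different chunks; if A
and B had each rewritten the second line differently, the chunk would come
back wrapped in conflict markers instead.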

Now if the development went that way:

1.7  ->  1.7.1.1 (branching, i.e. copy)
 v          v
 v       1.7.1.2
1.8         v
 v   ->  1.7.1.3 (merge)
1.9         v
 v          v
1.10        v
 v   ->  1.7.1.4 (merge)
 v          v
 v       1.7.1.5
 v          v
1.11 <-  (merge)

Pretty much standard, a developer created a new branch, made some
changes in it, synced with mainline, synced with mainline again a
little later, made some new changes and finally folded the branch back
into the mainline. Let's assume the developer's changes don't conflict by
themselves with the mainline changes.

CVS, for all the merges, is going to pick 1.7 as the reference. The
first time, for 1.7.1.3, it's going to work correctly. It will fuse
the 1.7->1.8 patch with the 1.7.1.1->1.7.1.2 patch and apply the
result to 1.7 to get 1.7.1.3. The two patches have no reason to
overlap. 1.7.1.2->1.7.1.3 will essentially be identical to 1.7->1.8,
and 1.8->1.7.1.3 will essentially be identical to 1.7.1.2->1.7.1.3.

As soon as the next merge, i.e. 1.7.1.4, it breaks. CVS is going to
try to fuse the 1.7->1.10 patch with the 1.7->1.7.1.3 patch. But
1.7->1.10 = 1.7->1.8+1.8->1.10 and 1.7->1.7.1.3 ~= 1.7->1.7.1.2+1.7->1.8.
So they have components in common, hence they _will_ conflict.

If CVS had taken the latest common ancestor by keeping in the
repository the existence of the 1.8->1.7.1.3 link, it would have taken
the 1.8 version as the reference. The patches to fuse would have been
1.8->1.10 and 1.8->1.7.1.3, which have no reason to conflict.

Same for the next merge, the optimal merge point is in that case 1.10,
and it ends up being a null merge, i.e. 1.11 is a copy of 1.7.1.5.

You can see the final structure is a DAG, with each node having a max
of 2 ancestors. And that's what PRCS and bk are working with,
fundamentally.
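
Picking the reference from the DAG can be sketched as follows, in Python.
The revision numbers come from the diagram above; the `parents` mapping,
the function names and the brute-force search are assumptions made for
illustration, not how PRCS or bk actually do it.

```python
def ancestors(parents, node):
    """All nodes reachable from `node` through parent links, itself included."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen


def merge_bases(parents, a, b):
    """Latest common ancestors of `a` and `b` (best merge references)."""
    common = ancestors(parents, a) & ancestors(parents, b)
    dominated = set()
    for c in common:
        # anything strictly older than another common ancestor is not "latest"
        dominated |= ancestors(parents, c) - {c}
    return common - dominated


# The history from the diagram, with the merge links recorded.
dag = {
    "1.7": [], "1.8": ["1.7"], "1.9": ["1.8"], "1.10": ["1.9"],
    "1.7.1.1": ["1.7"], "1.7.1.2": ["1.7.1.1"],
    "1.7.1.3": ["1.7.1.2", "1.8"],     # first sync with mainline
    "1.7.1.4": ["1.7.1.3", "1.10"],    # second sync with mainline
    "1.7.1.5": ["1.7.1.4"],
    "1.11": ["1.10", "1.7.1.5"],       # branch folded back
}
# The same history as a plain tree, i.e. with the merge links forgotten.
tree = {**dag, "1.7.1.3": ["1.7.1.2"], "1.7.1.4": ["1.7.1.3"]}

second_sync_ref = merge_bases(dag, "1.10", "1.7.1.3")   # {'1.8'}
fold_back_ref = merge_bases(dag, "1.10", "1.7.1.5")     # {'1.10'}
tree_ref = merge_bases(tree, "1.10", "1.7.1.3")         # {'1.7'}
```

With the merge links kept, the references come out as 1.8 and 1.10, as
argued above; with only the tree, every merge falls back to 1.7.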

OG.

2003-03-07 17:03:59

by Geert Uytterhoeven

Subject: Re: BitBucket: GPL-ed KitBeeper clone

On Fri, 7 Mar 2003, Olivier Galibert wrote:
> Now if the development went that way:
>
> 1.7  ->  1.7.1.1 (branching, i.e. copy)
>  v          v
>  v       1.7.1.2
> 1.8         v
>  v   ->  1.7.1.3 (merge)
> 1.9         v
>  v          v
> 1.10        v
>  v   ->  1.7.1.4 (merge)
>  v          v
>  v       1.7.1.5
>  v          v
> 1.11 <-  (merge)
>
> Pretty much standard, a developper created a new branch, made some
> changes in it, synced with mainline, synced with mailine again a
> little later, made some new changes and finally folded the branch back
> in the mainline. Let's admit the developper changes don't conflict by
> themselves with the mainline changes.
>
> CVS, for all the merges, is going to pick 1.7 as the reference. The
> first time, for 1.7.1.3, it's going to work correctly. It will fuse
> the 1.7->1.8 patch with the 1.7.1.1->1.7.1.2 patch and apply the
> result to 1.7 to get 1.7.1.3. The two patches have no reason to
> overlap. 1.7.1.2->1.7.1.3 will essentially be identical to 1.7->1.8,
> and 1.8->1.7.1.3 will essentially be identical to 1.7.1.2->1.7.1.3.
                                                    ^^^^^^^^^^^^^^^^
                                                    1.7.1.1->1.7.1.2, I assume?

> As soon as the next merge, i.e 1.7.1.4, it breaks. CVS is going to
> try to fuse the 1.7->1.10 patch with the 1.7->1.7.1.3 patch. But
> 1.7->1.10 = 1.7->1.8+1.8->1.10 and 1.7->1.7.1.3 ~= 1.7->1.7.1.2+1.7->1.8.
> So they have components in common, hance they _will_ conflict.
>
> If CVS had taken the latest common ancestor by keeping in the
> repository the existence of the 1.8->1.7.1.3 link, it would have taken
> the 1.8 version as the reference. The patches to fuse would have been
> 1.8->1.10 and 1.8->1.7.1.3, which have no reason to conflict.
>
> Same for the next merge, the optimal merge point is in that case 1.10,
> and it ends up being a null merge, i.e. 1.11 is a copy of 1.7.1.5.
>
> You can see the final structure is a DAG, with each node having a max
> of 2 ancestors. And that's what PRCS and bk are working with,
> fundamentally.

Aha, so that's why my `mergetree' script (which basically is some directory
recursion around plain RCS merge, with additional support for hardlinking
identical files) works better than CVS, when I merge e.g. linux-2.5.64 and
linux-m68k-2.5.63 into linux-m68k-2.5.64. It always uses the latest common
ancestor (linux-2.5.63)...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2003-03-07 18:58:15

by Pavel Machek

Subject: Re: BitBucket: GPL-ed KitBeeper clone

Hi!

> Now if the development went that way:
>
> 1.7  ->  1.7.1.1 (branching, i.e. copy)
>  v          v
>  v       1.7.1.2
> 1.8         v
>  v   ->  1.7.1.3 (merge)
> 1.9         v
>  v          v
> 1.10        v
>  v   ->  1.7.1.4 (merge)
>  v          v
>  v       1.7.1.5
>  v          v
> 1.11 <-  (merge)
>
> Pretty much standard, a developper created a new branch, made some
> changes in it, synced with mainline, synced with mailine again a
> little later, made some new changes and finally folded the branch back
> in the mainline. Let's admit the developper changes don't conflict by
> themselves with the mainline changes.
>
> CVS, for all the merges, is going to pick 1.7 as the reference. The
> first time, for 1.7.1.3, it's going to work correctly. It will fuse
> the 1.7->1.8 patch with the 1.7.1.1->1.7.1.2 patch and apply the
> result to 1.7 to get 1.7.1.3. The two patches have no reason to
> overlap. 1.7.1.2->1.7.1.3 will essentially be identical to
> 1.7->1.8,

So, basically, if branch was killed and recreated after each merge
from mainline, problem would be solved, right?

Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.

2003-03-09 01:57:08

by Horst H. von Brand

Subject: Re: BitBucket: GPL-ed KitBeeper clone

Pavel Machek <[email protected]> said:

[...]

> So, basically, if branch was killed and recreated after each merge
> from mainline, problem would be solved, right?

Who is branch, who is mainline? The branch owner _will_ be pissed off if
his head version changes each time he synchronizes. What if mainline dies,
and the official line moves to one of the branches? What happens when there
aren't just two, but a dozen developers swizzling individual csets from
each other (not necessarily just resyncing with each other)? If said
developers also apply random patches from a common mailing list?

This is _much_ harder than it looks on the surface.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513