In article <[email protected]>,
Pavel Machek <[email protected]> wrote:
>
>So, basically, if branch was killed and recreated after each merge
>from mainline, problem would be solved, right?
Wrong.
Now think three trees. Each merging back and forth between each other.
Or, in the case of something like the Linux kernel tree, where you don't
have two or three trees. You've got at least 20 actively developed
concurrent trees with branches at different points.
Trust me. CVS simply CANNOT do this. You need the full information.
Give it up. BitKeeper is simply superior to CVS/SVN, and will stay that
way indefinitely since most people don't seem to even understand _why_
it is superior.
Linus
Hi Linus,
On Fri, Mar 07, 2003 at 11:16:47PM +0000, Linus Torvalds wrote:
> In article <[email protected]>,
> Pavel Machek <[email protected]> wrote:
> >
> >So, basically, if branch was killed and recreated after each merge
> >from mainline, problem would be solved, right?
>
> Wrong.
>
> Now think three trees. Each merging back and forth between each other.
>
> Or, in the case of something like the Linux kernel tree, where you don't
> have two or three trees. You've got at least 20 actively developed
> concurrent trees with branches at different points.
>
> Trust me. CVS simply CANNOT do this. You need the full information.
>
> Give it up. BitKeeper is simply superior to CVS/SVN, and will stay that
> way indefinitely since most people don't seem to even understand _why_
> it is superior.
You make it sound like no one is even interested ;-). But it's not true! A
lot of people currently working on alternative version control systems would
like very much to know what it would take to satisfy the needs of kernel
development. Maybe, being on the inside of the process and well aware of
your own needs, you don't realize how difficult it is to figure these things
out from the outside. I think only very few people (perhaps only one) really
understand this issue, and they aren't communicating with the horde of people
who really want to help, if only they knew how.
My impression is that Pavel is really smart and pretty close to the core of
kernel development. But you say even he doesn't get it? Come on! Throw
us a bone, willya!? ;-)
Be well,
Zack
>
> Linus
--
Zack Brown
> > Give it up. BitKeeper is simply superior to CVS/SVN, and will stay that
> > way indefinitely since most people don't seem to even understand _why_
> > it is superior.
>
> You make it sound like no one is even interested ;-). But it's not true! A
> lot of people currently working on alternative version control systems would
> like very much to know what it would take to satisfy the needs of kernel
> development. Maybe, being on the inside of the process and well aware of
> your own needs, you don't realize how difficult it is to figure these things
> out from the outside. I think only very few people (perhaps only one) really
> understand this issue, and they aren't communicating with the horde of people
> who really want to help, if only they knew how.
[Long rant, summary: it's harder than you think, read on for the details]
There are parts of BitKeeper which required multiple years of thought by
people a lot smarter than me. You guys are under the mistaken impression
that BitKeeper is my doing; it's not. There are a lot of people who
work here and they have some amazing brains. To create something like
BK is actually more difficult than creating a kernel.
To understand why, think of BK as a distributed, replicated, version
controlled user level file system with no limits on any of the file system
events which may happen in parallel. Now put the changes back together,
correctly, no matter how much parallelism there has been. Pavel hasn't
understood anything but a tiny fraction of the problem space yet, he
just doesn't realize it. Even Linus doesn't know how BitKeeper works,
we haven't told him and I can tell from his explanations that he gets
part of it but not most of it. That's not a slam on Linus or Pavel or
anyone else. I'm just trying to tell you guys that this stuff is a lot
harder than you think. I've told people that before, like the SVN and
OpenCM guys, and the leaders of both those efforts showed up later and
said "yup, you're right, it is a hell of a lot harder than it looks".
And they are nowhere near being able to do what BK does. Ask them if
you have doubts about what I am saying.
Merging is just one of the complex areas. It gets all the attention
because it is hard enough but easy enough that people like to work on it.
It's actually fun to work on merging. Ditto for the graph structure,
that's trivial. The other parts aren't fun and they are more difficult
so they don't get talked about. But they are more important because
the user has no idea how to deal with them and users do know how to deal
with merge problems, lots of you understand patch rejects.
Rename handling in a distributed system is actually much harder than
getting the merging done. It doesn't seem like it is, but we've rewritten
how we do it 3 times and are working on a 4th, all because we've been
forced to learn about all the different ways that people move things
around. CVS doesn't have any of the rename problems because it doesn't
do renames at all, and SVN doesn't have 1/1000th of the problems we do because it
is centralized. Centralized means that there is never any confusion
about where something should go, you can only create one file in one
directory entry because there is only one directory entry available.
In BK's case, there can be an infinite number of different files which
all want to be src/foo.c.
Symbolic tags are really hard. What?!? What could be easier than adding
a symbolic label on a revision? Well, in a centralized system it is
trivial but in a distributed system you have to handle the fact that
the same symbol can be put on multiple revs. It's the same problem as
the file names, just a variation. Add to that the fact that time can
march forward or backwards in a distributed system, even if all the
events were marching forward, and the fun really starts. I personally
have redone the tags support about 6 times and it still isn't right.
Security semantics are hard in a distributed system. Where do you
put them, how do you integrate them into the system, what happens when
people try and work around them? In CVS or SVN you can simply lock down
the server and not worry about it, but in BK, the user has the revision
history and they are root, they can do whatever they want.
Time semantics are the hardest of all. You simply can't depend on time
being correct. It goes forwards, backwards, and sideways on you and
if you think you can use time you don't have the slightest idea of the
scope of the problem. Again, not a problem for CVS/SVN/whatever, all the
deltas are made against the same clock. Not true in a distributed system.
That's a taste of what it is like. You have to get all of those right
and the many other ones that I didn't tell you about or you might as
well not bother. Why? Because the problems are very subtle and there
isn't any hope of getting an end user to figure out a subtle problem,
they don't have the time or the inclination. We've seen users throw away
weeks of work just because they didn't understand the merge conflict, so
they started over on an updated tree. And those people will understand
the rename corner cases? Not a chance.
The main point here is that if you think that BK happened quickly,
by one guy, you are nuts. It started in May of 1997, that's almost 6
years ago, not the 2 years that Pavel thinks, and I had already written
a complete version control system prior to that, so this was round two.
Even with that knowledge, I wasn't near enough to get BK to where it is
today; there are more than 40 man-years of effort in BK so far. A bunch
of people, working 60-90 hour weeks, for almost 6 years. Not average
people, either, any one of these people would be a staff engineer or
better at Sun (salaries for those people are in the $115K - $140K range).
The disbelievers think that I'm out here waving the "it's too hard"
flag so you'll go away. And the arrogant people think that they are
smarter than us and can do it quicker. I doubt it but by all means go
for it and see what you can do. Just file away a copy of this and let
me know what you think three or four years from now.
Oh, by the way, you'll need a business model, I found that out 2 or 3
years into it when my savings ran out. Oh, my, you might not be able
to GPL it! Why it might even end up being just like BitKeeper with
an evil corporate dude named Pavel running the show. Believe me, if
that happens, I'll be here to rake him over the coals on a daily basis
for being such an evil person who doesn't understand the point of free
software. I can't wait.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
On Sat, 8 Mar 2003, Larry McVoy wrote:
> > > Give it up. BitKeeper is simply superior to CVS/SVN, and will stay that
> > > way indefinitely since most people don't seem to even understand _why_
> > > it is superior.
> >
> > You make it sound like no one is even interested ;-). But it's not true! A
> > lot of people currently working on alternative version control systems would
> > like very much to know what it would take to satisfy the needs of kernel
> > development. Maybe, being on the inside of the process and well aware of
> > your own needs, you don't realize how difficult it is to figure these things
> > out from the outside. I think only very few people (perhaps only one) really
> > understand this issue, and they aren't communicating with the horde of people
> > who really want to help, if only they knew how.
>
> [Long rant, summary: it's harder than you think, read on for the details]
>
> There are parts of BitKeeper which required multiple years of thought by
> people a lot smarter than me. You guys are under the mistaken impression
> that BitKeeper is my doing; it's not. There are a lot of people who
> work here and they have some amazing brains. To create something like
> BK is actually more difficult than creating a kernel.
Larry, how many years have you been working as a developer, side by
side with other developers? 15, maybe 20? Do you know the best way to
keep developers from doing something? Well, just say the task is
trivial, easy, for dummies. And you will see developers stay away from the
project like cats from water. Try, even remotely, to dress the project
up with complexity, and they'll come in storms ...
- Davide
On Sat, Mar 08, 2003 at 04:05:14PM -0800, Larry McVoy wrote:
> Zack Brown wrote:
> > Linus Torvalds wrote:
> > > Give it up. BitKeeper is simply superior to CVS/SVN, and will stay that
> > > way indefinitely since most people don't seem to even understand _why_
> > > it is superior.
> >
> > You make it sound like no one is even interested ;-). But it's not true! A
> > lot of people currently working on alternative version control systems would
> > like very much to know what it would take to satisfy the needs of kernel
> > development.
>
> [Long rant, summary: it's harder than you think, read on for the details]
[skipping long description]
OK, so here is my distillation of Larry's post.
Basic summary: a distributed, replicated, version controlled user level file
system with no limits on any of the file system events which may happen
in parallel. All changes must be put correctly back together, no matter how
much parallelism there has been.
* Merging.
* The graph structure.
* Distributed rename handling. Centralized systems like Subversion don't
have as many problems with this because you can only create one file in
one directory entry because there is only one directory entry available.
In distributed rename handling, there can be an infinite number of different
files which all want to be src/foo.c. There are also many rename corner-cases.
* Symbolic tags. This is adding a symbolic label on a revision. A distributed
system must handle the fact that the same symbol can be put on multiple
revisions. This is a variation of file renaming. One important thing to
consider is that time can go forward or backward.
* Security semantics. Where should they go? How can they be integrated
into the system? How are hostile users handled when there is no central
server to lock down?
* Time semantics. A distributed system cannot depend on reported time
being correct. It can go forward or backward at any rate.
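(To make that last point concrete, here is a minimal sketch --- hypothetical
repositories, invented timestamps, Python purely for illustration --- of why
sorting deltas by wall-clock time reconstructs the wrong history:)

    # Repo B's clock runs ten minutes slow, so a delta made in B *after*
    # pulling A's 1.2 still carries an earlier timestamp than A's deltas.
    deltas = [
        ("A", "1.1", 1000),             # (repo, rev, local wall-clock time)
        ("A", "1.2", 1060),
        ("B", "1.3", 1060 - 600 + 30),  # causally after 1.2, stamped before 1.1
    ]
    by_time = sorted(deltas, key=lambda d: d[2])
    print([rev for _, rev, _ in by_time])   # ['1.3', '1.1', '1.2'] -- wrong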
I'd be willing to maintain this as the beginning of a feature list and
post it regularly to lkml if enough people feel it would be useful and not
annoying. The goal would be to identify the features/problems that would
need to be handled by a kernel-ready version control system.
Be well,
Zack
--
Zack Brown
Hi,
On Sat, 8 Mar 2003, Zack Brown wrote:
> * Distributed rename handling. Centralized systems like Subversion don't
> have as many problems with this because you can only create one file in
> one directory entry because there is only one directory entry available.
> In distributed rename handling, there can be an infinite number of different
> files which all want to be src/foo.c. There are also many rename corner-cases.
This is actually a very bk-specific problem: under bk the real problem is
that there can be only one src/SCCS/s.foo.c. A separate repository doesn't
have this problem, because it has control over the naming in the
repository, and the original naming is restored with an explicit checkout.
In this context it will be really interesting to see how Larry wants to
implement "lines of development" (aka branches which don't suck) and
also maintain SCCS compatibility.
bye, Roman
On Sun, 9 Mar 2003, Roman Zippel wrote:
> On Sat, 8 Mar 2003, Zack Brown wrote:
>
> > * Distributed rename handling.
>
> This is actually a very bk-specific problem: under bk the real problem is
> that there can be only one src/SCCS/s.foo.c.
I don't think that is the issue.
[ Well, yes, I agree that the SCCS format is bad, but for other reasons ]
> A separate repository doesn't have this problem
You're wrong.
The problem is _distribution_. In other words, two people rename the same
file. Or two people rename two _different_ files to the same name. Or two
people create two different files with the same name. What happens when
you merge?
None of these are issues for broken systems like CVS or SVN, since they
have a central repository, so there _cannot_ be multiple concurrent
renames that have to be merged much later (well, CVS cannot handle renames
at all, but the "same name creation" issue you can see even with CVS).
With a central repository, you avoid a lot of the problems, because the
conflicts must have been resolved _before_ the commit ever happens - put
another way, you can never have a conflict in the revision history.
Separate repositories and SCCS file formats have nothing to do with the
real problem. Distribution is key, not the repository format.
Linus
Hi,
On Sat, 8 Mar 2003, Linus Torvalds wrote:
> None of these are issues for broken systems like CVS or SVN, since they
> have a central repository, so there _cannot_ be multiple concurrent
> renames that have to be merged much later.
It is possible, you only have to remember that the file foo.c doesn't have
to be called foo.c,v in the repository. SVN should be able to handle this,
it's just lacking important merging mechanisms.
This is actually a key feature I want to see in a SCM system - the ability
to keep multiple developments within the same repository. I want to pull
other source trees into a branch and compare them with other branches and
merge them into new branches.
> Separate repositories and SCCS file formats have nothing to do with the
> real problem. Distribution is key, not the repository format.
I agree, what I was trying to say is that the SCCS format makes a few
things more complex than they had to be.
bye, Roman
Roman Zippel <[email protected]> writes:
> Hi,
>
> On Sat, 8 Mar 2003, Linus Torvalds wrote:
>
> > None of these are issues for broken systems like CVS or SVN, since they
> > have a central repository, so there _cannot_ be multiple concurrent
> > renames that have to be merged much later.
>
> It is possible, you only have to remember that the file foo.c doesn't have
> to be called foo.c,v in the repository. SVN should be able to handle this,
> it's just lacking important merging mechanisms.
> This is actually a key feature I want to see in a SCM system - the ability
> to keep multiple developments within the same repository. I want to pull
> other source trees into a branch and compare them with other branches and
> merge them into new branches.
In a distributed system everything happens on a branch.
> > Separate repositories and SCCS file formats have nothing to do with the
> > real problem. Distribution is key, not the repository format.
>
> I agree, what I was trying to say is that the SCCS format makes a few
> things more complex than they had to be.
I don't know if the problem really changes that much. How do
you pick a globally unique inode number for a file? And then
how do you reconcile this when people on 2 different branches create
the same file and want to merge their versions together?
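(One common answer, sketched here in Python with invented names --- an
assumption on my part, not a claim about how BK or anything else does it:
give each file a creation key that is unique with overwhelming probability,
so branches never need to coordinate; deciding later that two independently
created files are "the same" then has to be an explicit, recorded merge
decision rather than a numbering accident.)

    import os, time, getpass, socket

    def new_file_key():
        # who/where/when plus random bits; no central allocator needed
        return "%s@%s|%d|%s" % (getpass.getuser(), socket.gethostname(),
                                int(time.time()), os.urandom(8).hex())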
So, as a very rough approximation:
- Distribution is the problem.
- Powerful branching is the only thing that helps this.
- Non-branch-local data (labels/tags) is very difficult.
Eric
On Sat, Mar 08, 2003 at 07:42:24PM -0800, Linus Torvalds wrote:
>
> On Sun, 9 Mar 2003, Roman Zippel wrote:
> > On Sat, 8 Mar 2003, Zack Brown wrote:
> >
> > > * Distributed rename handling.
> >
> > This is actually a very bk-specific problem: under bk the real problem is
> > that there can be only one src/SCCS/s.foo.c.
>
> I don't think that is the issue.
>
> [ Well, yes, I agree that the SCCS format is bad, but for other reasons ]
It is a large part of the issue though. If you don't have one
repository file per project file, named after the project file, you
find out that the project file name is somewhat unimportant --- just
yet another piece of the metadata to track.
> The problem is _distribution_.
The only problem with distribution is sending as little as possible
over the network. All the problems you're talking about exist with a
single repository as soon as you have decent branches.
> In other words, two people rename the same
> file. Or two people rename two _different_ files to the same name. Or two
> people create two different files with the same name. What happens when
> you merge?
A conflict, what else? The file name is only one of the
characteristics of a file. And BTW, the interesting problem, which is
what to do when you find out two different files end up being in fact
the same one, is not covered by bk (or wasn't).
OG.
Hi,
On 9 Mar 2003, Eric W. Biederman wrote:
> > This is actually a key feature I want to see in a SCM system - the ability
> > to keep multiple developments within the same repository. I want to pull
> > other source trees into a branch and compare them with other branches and
> > merge them into new branches.
>
> In a distributed system everything happens on a branch.
That's true, but with bk you have to use separate directories for that,
which makes cross references between branches more difficult.
> > I agree, what I was trying to say is that the SCCS format makes a few
> > things more complex than they had to be.
>
> I don't know if the problem really changes that much. How do
> you pick a globally unique inode number for a file? And then
> how do you reconcile this when people on 2 different branches create
> the same file and want to merge their versions together?
Unique identifiers are needed for changesets anyway, and if you decide
during a merge that two files are identical, at least one branch has to
carry the information that these identifiers point to the same file.
> So as a very rough approximation.
> - Distribution is the problem.
I would rather say, that it's only one (although very important) problem.
bye, Roman
>> > This is actually a key feature I want to see in a SCM system - the ability
>> > to keep multiple developments within the same repository. I want to pull
>> > other source trees into a branch and compare them with other branches and
>> > merge them into new branches.
>>
>> In a distributed system everything happens on a branch.
>
> That's true, but with bk you have to use separate directories for that,
> which makes cross references between branches more difficult.
>
>> > I agree, what I was trying to say is that the SCCS format makes a few
>> > things more complex than they had to be.
>>
>> I don't know if the problem really changes that much. How do
>> you pick a globally unique inode number for a file? And then
>> how do you reconcile this when people on 2 different branches create
>> the same file and want to merge their versions together?
>
> Unique identifiers are needed for changesets anyway, and if you decide
> during a merge that two files are identical, at least one branch has to
> carry the information that these identifiers point to the same file.
>
>> So as a very rough approximation.
>> - Distribution is the problem.
>
> I would rather say, that it's only one (although very important) problem.
I think it's possible to get 90% of the functionality that most of us
(or at least I) want without the distributed stuff. If that's 10% of
the effort, it would be really nice to have the auto-merging type of
functionality at least.
If the "maintainer" heirarchy was a strict tree structure, where you
send patches to your parent, and receive them from your children, that
doesn't seem to need anything particularly fancy to me.
Personally, I just collect together patches mainly from IBM people here,
test them for functionality and performance, and sync up with Linus every
new release by reapplying them on top of the new tree, and fix the conflicts
by hand. Then I just email the patches as flat diffs to Linus. If I could
get some really basic auto-merge functionality, that would get rid of 90%
of the work, even if it only worked 95% of the time, and showed me what
it had done that patch couldn't have done by itself. I don't see why that
requires all this distributed stuff. If I resync with the latest -bk
snapshot just before I send, the chances of Linus having to do much merge
work are pretty small.
I'm sure Bitkeeper is better than that, and has all sorts of fancy features,
and perhaps Linus even uses some of them. But if I can get 90% of that for
10% of the effort, I'd be happy. Some way to pass Linus some basic metadata
like changelog comments would be good (at the moment, I just slap those atop
the patch, and he edits them, but a basic perl script could hack off a
"comment to Linus" section from a "changelog section", which might save
Linus some editing).
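(Something along those lines --- sketched in Python rather than perl, and the
section markers are made up for the example, not any existing convention:)

    def split_preamble(text):
        # Separate a patch preamble into the "comment to Linus" part and
        # the "changelog" part, keyed on marker lines.
        sections = {"comment": [], "changelog": []}
        current = None
        for line in text.splitlines():
            if line.strip() in ("[comment]", "[changelog]"):
                current = line.strip()[1:-1]
            elif current is not None:
                sections[current].append(line)
        return "\n".join(sections["comment"]), "\n".join(sections["changelog"])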
Andrew and Alan seem to work pretty well with flat patches too - Larry
seemed to imply that he thought the merge part of the problem was easy
enough in a non-distributed system ... if anything existing has or could
have that without the distributed stuff and the complexity, that would be cool.
If I'm missing something fundamental here, it wouldn't surprise me ;-)
M.
On Sun, Mar 09, 2003 at 08:55:44AM -0800, Martin J. Bligh wrote:
> I think it's possible to get 90% of the functionality that most of us
> (or at least I) want without the distributed stuff. If that's 10% of
> the effort, would be really nice to have the auto-merging type of
> functionality at least.
> If I'm missing something fundamental here, it wouldn't surprise me ;-)
I think the fundamental thing you're missing is that Linus doesn't want it. ;-)
As long as people keep trying to avoid the hard problems that Linus and Larry
keep pointing out, I doubt any effort will get very far. I see a lot of cases
where someone says, "yeah, but we can side-step that problem if we do x,
y, or z." That doesn't help. The question is, what are the actual features
required for a version control system that could win support among the top
kernel developers?
People in the know hint at these features ("naming is really important"),
but the details are apparently complicated enough that no one wants to sit
down and actually describe them. They just hint at the *sort* of problems
they are, and then someone says, "but that's not really a problem because
of x, y, or z that can be done instead."
Then people get sidetracked on the features they personally would settle for,
and the real point gets lost in the fog. Or else they start dreaming
about what the perfect system would be like, describing features that
would not actually be required for a kernel-ready version control
system.
Unless the people in the know actually speak up, the rest of us just won't
be able to figure out what they need. A lot of projects are chasing their
tails right now, trying to do something, but lacking the direction they need
in order to do it.
Be well,
Zack
>
> M.
--
Zack Brown
On Sun, 9 Mar 2003, Martin J. Bligh wrote:
>
> I think it's possible to get 90% of the functionality that most of us
> (or at least I) want without the distributed stuff.
No, I really don't think so.
The distribution is absolutely fundamental, and _the_ reason why I use BK.
Now, it's true that in 90% of all cases (probably closer to 99%) you will
never see the really nasty cases Larry was talking about. People just
don't rename files that much, and more importantly: when they do, they
very, very seldom have anybody else doing the same.
But what are you going to do when it happens? Because it _does_ happen:
different people pick up the same patch or fix suggestion from the mailing
list, and do that as just a small part of normal development. Are the
tools now going to break down?
BK doesn't. That's kind of the point Larry was making.
> If the "maintainer" heirarchy was a strict tree structure, where you
> send patches to your parent, and receive them from your children, that
> doesn't seem to need anything particularly fancy to me.
But it's not, and the above would make BK much less than it is today.
One of the things I always hated about CVS is how it makes it IMPOSSIBLE
to "work together" on something between two different random people. Take
one person who's been working on something for a while, but is chasing
that one final bug, and asks another person for help. It just DOES NOT
WORK in the CVS mentality (or _any_ centralized setup).
You have to either share the same sandbox (without any source control
support AT ALL), or you have to go to the central repository and create a
branch (never mind that you may not have write permissions, or that you
may not know whether it's going to ever be something worthwhile yet).
With BK, the receiver just "bk pull"s. And if he is smart, he does that
from a cloned repository so that after he's done with it he will just do a
"rm -rf" or something.
This is FUNDAMENTAL.
And yes, maybe the really hard cases are rare. But does that mean that you
aren't going to do it?
Linus
>> I think it's possible to get 90% of the functionality that most of us
>> (or at least I) want without the distributed stuff. If that's 10% of
>> the effort, would be really nice to have the auto-merging type of
>> functionality at least.
>
>> If I'm missing something fundamental here, it wouldn't surprise me ;-)
>
> I think the fundamental thing you're missing is that Linus doesn't want it. ;-)
Depends what your goal is ;-) I'm not on a holy quest to stop Linus using
Bitkeeper .... I'm just trying to make the non-Bitkeeper users' life a
little easier.
M.
>> I think it's possible to get 90% of the functionality that most of us
>> (or at least I) want without the distributed stuff.
>
> No, I really don't think so.
>
> The distribution is absolutely fundamental, and _the_ reason why I use BK.
>
> Now, it's true that in 90% of all cases (probably closer to 99%) you will
> never see the really nasty cases Larry was talking about. People just
> don't rename files that much, and more importantly: when they do, they
> very, very seldom have anybody else doing the same.
>
> But what are you going to do when it happens? Because it _does_ happen:
> different people pick up the same patch or fix suggestion from the mailing
> list, and do that as just a small part of normal development. Are the
> tools now going to break down?
I'm going to fix it by hand ;-) As long as it stops at a sensible point,
and clearly barfs and says what the problem is, that's fine by me.
> BK doesn't. That's kind of the point Larry was making.
Right ... I appreciate that. I'd just rather fix things up by hand 1% of
the time than use Bitkeeper myself. I'm not trying to stop *you* using
Bitkeeper by any stretch of the imagination ... you probably need the
heavyweight tools, but I'm OK without them.
> This is FUNDAMENTAL.
>
> And yes, maybe the really hard cases are rare. But does that mean that you
> aren't going to do it?
Yup, that's exactly what I'm saying. I'm not saying this is as good as
BitKeeper; I'm saying it's "good enough" for me and I suspect several others
(not saying it's good enough for you), and significantly better than diff and patch.
(though cp -lR is *blindingly* fast, and diff understands hard links).
M.
> And yes, maybe the really hard cases are rare. But does that mean that you
> aren't going to do it?
This is sort of the point I've been trying to make for years. It is
unlikely that an open source project is going to solve these problems.
It's possible, but unlikely because the problems are rare and the code to
solve them is incredibly difficult. It isn't obvious at all, it wasn't
obvious to me the first time around, it's only after you've done it that
you can see how something that appeared really simple wasted 6 months.
In the open source model, the portion of the work which is relatively
easy gets done, but the remaining part only gets done if there is a
huge amount of pressure to do so. If you take a problem which occurs
only rarely, is difficult to solve, and has only a small set of users,
that's a classic example of something that just isn't going to get fixed
in the open source environment.
It's a lot different when you have a very small set of users and the
solutions are very expensive. I'm not saying that people don't solve hard
problems in open source projects, they do, the kernel is a good example.
The kernel also has millions of users, gets all sorts of friendly press
every day, and is fun. In the SCM space, there are hundreds of products
for a potential market that is about 4000 times smaller than the potential
market for the kernel.
SVN is a good example. They side-stepped almost all of the problems
that BK solves and it was absolutely the right call. It would have cost
them millions to solve them and their product is free, it would take
decades to recoup the investment at the low rates they can charge for
support or bundling or hosting.
Going back to the engineering problems, those problems are not going to
get fixed by people working on them in their spare time, that's for sure,
it's just not fun enough nor are they important enough. Who wants to
spend a year working on a problem which only 10 people see in the world
each year? And commercial customers aren't going to pay for this either
if the model is the traditional open source support model. If you hit a
problem and it costs us $200K to fix it and you only hit it a few times
a year, if that, then you are not going to be OK with us billing you
that $200K, there isn't a chance that will work.
I'm starting to think that the best thing I could do is encourage Pavel &
Co to work as hard as they can to solve these problems. Telling them that
it is too hard is just not believable, they are convinced I'm trying to
make them go away. The fastest way to make them go away is to get them
to start solving the problems. Let's see how well Pavel likes it when
people bitch at him that BitBucket doesn't handle problem XYZ and he
realizes that he needs to take another year of 80 hour weeks to fix it.
Go for it, dude, here's hoping that we can make it as pleasant for you
as you have made it for us. Looking forward to it.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
Hi,
On Sun, 9 Mar 2003, Linus Torvalds wrote:
> The distribution is absolutely fundamental, and _the_ reason why I use BK.
I agree that this is an important aspect, and for your kind of work it's
absolutely necessary.
But source management is more than just distributing and merging changes.
E.g. if I want to develop a driver, I would start like most people from a
stable base, that means 2.4. At a certain point the development splits
into a development and stable branch, eventually I also want to merge my
driver into the 2.5 tree.
This means I have to deal with 5 different source trees (branches); two
branches track external trees, and I want to know what has been merged from
my development branch into my 2.4 and 2.5 stable branches, which I can use to
make official releases. I want to be able to push multiple changes as a
single change into the stable branches, and it should be able to tell me
which changes are still left.
If there were a free SCM system which could do this, I could easily
do without a distributed option. Although I think that once it got this
far, it should be relatively easy to add a distribution mechanism (by
using a separate branch which is only used for pulling changes). OTOH I
suspect that it will be very hard to add the other capabilities to bk
without a major redesign, as it's not a simple hierarchic structure
anymore.
bye, Roman
On Sun, Mar 09, 2003 at 09:20:45AM -0800, Zack Brown wrote:
> People in the know hint at these features ("naming is really important"),
> but the details are apparently complicated enough that no one wants to sit
> down and actually describe them.
What part of "40 man years" did you not understand? Do you seriously
think that it is easy to "sit down and actually describe them"? And if
you think I would do so just so you can go try to copy our solution
you have to be nuts, of course we aren't going to do that. It took
us years to figure it out; we're still figuring things out every day,
if you want a free SCM you can bloody well go figure it out yourself.
The whole point of the non-compete clause in the well loved BK license
is to say "this stuff is hard. If you want to create a similar product,
do it without the benefit of looking at our product". That seems to be
lost on you and a lot of other people as well.
It's perfectly OK for you to go invent a new SCM system. Go for it.
But stop asking for help from the BK crowd. Not only will we not
give you that help, we will do absolutely everything we can to make
sure that you can't copy BK. Everything up to and including selling
the company to the highest bidder and letting them chase after you.
Get it through your thick head that BK is something valuable to this
community, even if you don't use it you directly benefit from its use.
All you people trying to copy BK are just shooting yourselves in the foot
unless you can come up with a solution that Linus will use in the short
term. And nobody but an idiot believes that is possible. So play nice.
Playing nice means you can use it, you can't copy it. You can also
go invent your own SCM system, go for it, it's a challenging problem,
just don't use BK's files, commands, or anything else in the process.
We didn't have the benefit of copying something that you wrote, you
don't get the benefit of copying something we wrote.
You don't have to agree with us, you can do whatever you want, but do
so realizing that if you become too annoying we'll simply decide that
supporting the kernel isn't worth the aggravation. As for you armchair
CEO's who think we're raking in the bucks because of the kernel's usage
of BK, think again. That is not how sales are made in this space, sales
are made at the VP of engineering, CTO, CIO, and/or CEO level. If you
think those guys read this list or slashdot or care about the kernel
using BK, think again, they don't. All they care about is how much
it costs and how much effort it will save them. And they all know that
their development model is dramatically different than that of the
kernel so any BK success here is of marginal interest at best.
BK is made available for free for one reason and one reason only: to
help Linus not burn out. That's based on my personal belief that he is
critical to success of the Linux effort, he is a unique resource and has
to be protected. I've paid a very heavy price for that belief and I'm
telling you that you are right on the edge of making that price too high.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
On Sun, Mar 09, 2003 at 11:58:52AM -0800, Larry McVoy wrote:
> On Sun, Mar 09, 2003 at 09:20:45AM -0800, Zack Brown wrote:
> > People in the know hint at these features ("naming is really important"),
> > but the details are apparently complicated enough that no one wants to sit
> > down and actually describe them.
>
> It's perfectly OK for you to go invent a new SCM system. Go for it.
> But stop asking for help from the BK crowd.
I haven't been asking you for help. I've been asking Linus and other
kernel developers to describe their needs. There seem to be three
camps in this discussion:
1) the people who feel that the hard problems solved by BitKeeper are
crucial
2) the people who feel that the hard problems are not that important,
and that a decent feature set could be designed to handle pretty much
everything anyone might normally need
3) the people who want features that are not really related to finding a
BitKeeper alternative.
My own opinion is that the people in camp (2) are falling into the trap which
has been described often enough, in which they will realize their design
mistakes too late to do anything about them. While the people in camp (3)
seem to be getting ahead of the game. The features they want are all great,
but the question of the basic structure still remains.
I think what needs to be done is to identify the hard problems, so that
any version control project that starts up can avoid mistakes that will
put a glass ceiling over their heads. Even if they choose not to implement
everything, or if they choose to implement features orthogonal to a real
BitKeeper alternative, they would still have the proper framework to raise
the project to the highest level later.
Of kernel developers, only Linus seems to have a clear idea of what the kernel
development process' needs are; but aside from insisting that distribution
is key (which people in camp (1) know already), he hasn't gone into the kind
of detail that folks would need in order to actually make a decent attempt.
Be well,
Zack
--
Zack Brown
On Sun, 09 Mar 2003 13:32:46 PST, Zack Brown <[email protected]> said:
> Of kernel developers, only Linus seems to have a clear idea of what the kernel
> development process' needs are; but aside from insisting that distribution
> is key (which people in camp (1) know already), he hasn't gone into the kind
> of detail that folks would need in order to actually make a decent attempt.
It's quite possible that even Linus doesn't have a clear cognitive grasp of
all the problems - Larry gave BK to Linus to prevent burn-out. I'd not be
surprised if Linus was so busy dealing with the *first* order problems in
the pre-BK world (just getting patches to apply to his tree) that he never
encountered all the 'tough problems', and once he started using BK, he
also never hit any of the 'tough problems' because Larry's crew had already
spent 40 man-years making sure Linus *didn't* hit them.
On Sun 09 Mar 03 03:45, Zack Brown wrote:
> OK, so here is my distillation of Larry's post.
>
> Basic summary: a distributed, replicated, version controlled user level
> file system with no limits on any of the file system events which may
> happen in parallel. All changes must be put correctly back together, no
> matter how much parallelism there has been.
>
> * Merging.
>
> * The graph structure.
>
> * Distributed rename handling. Centralized systems like Subversion don't
> have as many problems with this because you can only create one file in
> one directory entry because there is only one directory entry available.
> In distributed rename handling, there can be an infinite number of
> different files which all want to be src/foo.c. There are also many rename
> corner-cases.
>
> * Symbolic tags. This is adding a symbolic label on a revision. A
> distributed system must handle the fact that the same symbol can be put on
> multiple revisions. This is a variation of file renaming. One important
> thing to consider is that time can go forward or backward.
>
> * Security semantics. Where should they go? How can they be integrated
> into the system? How are hostile users handled when there is no central
> server to lock down?
>
> * Time semantics. A distributed system cannot depend on reported time
> being correct. It can go forward or backward at any rate.
>
> I'd be willing to maintain this as the beginning of a feature list and
> post it regularly to lkml if enough people feel it would be useful and not
> annoying. The goal would be to identify the features/problems that would
> need to be handled by a kernel-ready version control system.
>
> Be well,
> Zack
Hi Zack,
You might want to have a look here, there's lots of good stuff:
http://arx.fifthvision.net/bin/view/Arx/LinuxKernel
(Kernel Hackers SCM wish list)
http://arx.fifthvision.net/bin/view/Arx/GccHackers
(Gcc Hackers SCM wish list)
Arx is a fork of Tom Lord's Arch, now in version 1.0pre5.
Regards,
Daniel
On Sun, Mar 09, 2003 at 10:20:09AM -0800, Larry McVoy wrote:
> In the open source model, the portion of the work which is relatively
> easy gets done, but the remaining part only gets done if there is a
> huge amount of pressure to do so. If you take a problem which occurs
> only rarely, is difficult to solve, and has only a small set of users,
> that's a classic example of something that just isn't going to get fixed
> in the open source environment.
You are wrong. The choice of a license by you and your team is well respected
here, both by the tree maintainer and its users, but we don't need to go
further into pissing on open source projects by claiming yours wouldn't
have made it if it were one. I (an almost anonymous reader), and most here,
respect both your work and your honesty in describing why you made it
commercial, but that is one thing, and generalizing is another.
The Linux kernel by itself is a good example. It has code for things
that Microsoft will only create when people need them to a great extent,
like IPv6, an encryption API and IA-64/x64 support. Well, the examples are
numerous and I'm sure some experienced hackers can enlighten you
better.
The Grub bootloader is another example: an open source project that
provides support for booting almost any kernel in existence, with a
command line and autocompletion on demand. Features that *nobody asked
for*, but they exist.
More experienced people on open source projects I'm sure will say "wtf,
there are plenty of better examples".
And think of it the other way. If a closed source project is more advanced at
something, that is a result of what *its* users want. If Microsoft is better
at GUIs, that is a result of what its users want. The open source operating
systems have traditionally (for the past 10 years) been better at networking
and multiuser capabilities because that's what their users want.
That of course fits your words, but the fact that most closed source
projects do indeed follow what their users want doesn't make a
difference.
So, if your project is better, that's another matter. That you and your team
chose to make it commercial is well respected and understood. Even more
understood is the fact that you actually *spent money* on it. It is a
fundamental right of yours to do what you want with your code, especially
when it is a matter of personal economic health. But generalizing that into
saying that every open source project is just a hobbyist thing that is
always inferior to closed source unless 2^64 people ask for a feature?
No sir, real examples show otherwise.
-fs
On Sun, Mar 09, 2003 at 04:54:02PM -0500, [email protected] wrote:
> On Sun, 09 Mar 2003 13:32:46 PST, Zack Brown <[email protected]> said:
>
> > Of kernel developers, only Linus seems to have a clear idea of what the kernel
> > development process' needs are; but aside from insisting that distribution
> > is key (which people in camp (1) know already), he hasn't gone into the kind
> > of detail that folks would need in order to actually make a decent attempt.
>
> It's quite possible that even Linus doesn't have a clear cognitive grasp of
> all the problems - Larry gave BK to Linus to prevent burn-out. I'd not be
> surprised if Linus was so busy dealing with the *first* order problems in
> the pre-BK world (just getting patches to apply to his tree) that he never
> encountered all the 'tough problems', and once he started using BK, he
> also never hit any of the 'tough problems' because Larry's crew had already
> spent 40 man-years making sure Linus *didn't* hit them.
Bingo. We work hard to make sure that we've thought of and solved the
problems *before* they are hit in the field. We try to be proactive,
not reactive (at least in coding, mailing lists are another matter).
We're not that great at it, but we've definitely solved all sorts of
problems long before Linus did anything to hit them.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
Dear diary, on Sun, Mar 09, 2003 at 03:45:22AM CET, I got a letter,
where Zack Brown <[email protected]> told me, that...
> On Sat, Mar 08, 2003 at 04:05:14PM -0800, Larry McVoy wrote:
> > Zack Brown wrote:
> > > Linus Torvalds wrote:
> > > > Give it up. BitKeeper is simply superior to CVS/SVN, and will stay that
> > > > way indefinitely since most people don't seem to even understand _why_
> > > > it is superior.
> > >
> > > You make it sound like no one is even interested ;-). But it's not true! A
> > > lot of people currently working on alternative version control systems would
> > > like very much to know what it would take to satisfy the needs of kernel
> > > development.
> >
> > [Long rant, summary: it's harder than you think, read on for the details]
> [skipping long description]
>
> OK, so here is my distillation of Larry's post.
I've decided to elaborate a little more on how BK in fact works, for those who
don't use it and don't want to read through all the documentation, and also to
share some thoughts and possible solutions to the individual problems.
All this is derived from various LKML threads and BK.com's documentation, as
I'm not permitted to use BK myself; corrections are more than welcome.
> Basic summary: a distributed, replicated, version controlled user level file
> system with no limits on any of the file system events which may happen
> in parallel. All changes must be put correctly back together, no matter how
> much parallelism there has been.
[in the following text, "checkin" and "commit" are not interchangeable;
"checkin" means recording a set of changes to one file, "commit" means
forming a changeset from several checked-in changes in several files; this
mirrors BK's semantics]
I'd add
* ChangeSets.
at the top. Unlike e.g. SVN, file checkins and changeset commits are
separate in BK, and that sounds like a good thing to do --- it encourages
people to check in more frequently and to group a changeset from the
uncommitted changes when the changes are finished and good enough. See also
http://www.bitkeeper.com/UG/Getting.Working.Checking.html. Basically, you
check in files as you want, and the checkins to individual files are
independent. When you finish some logical change over several files, you use
bk commit and the checkins which aren't part of any changeset yet are
automagically grouped into one; you write a summary comment for the
changeset, the ChangeSet revision number increases, and somewhere it is
written down which checkins are part of this ChangeSet. One changeset is
then an atomic unit when sharing changes with others, that is, you must
form one in order to make the changes
available.
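(A toy model of that separation --- class and method names invented here to
pin down the semantics, not BK's actual interfaces:)

    class Repo:
        def __init__(self):
            self.pending = []     # checkins not yet in any changeset
            self.changesets = []  # each entry: (comment, list of checkins)

        def checkin(self, path, delta):
            # per-file and independent of all other files
            self.pending.append((path, delta))

        def commit(self, comment):
            # bk commit: sweep all uncommitted checkins into one changeset
            self.changesets.append((comment, self.pending))
            self.pending = []

    r = Repo()
    r.checkin("fs/inode.c", "fix leak")
    r.checkin("fs/namei.c", "other half of the fix")
    r.commit("plug the inode leak")   # one atomic, shareable unit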
The more-or-less opposite concept is to have each checkin (or group of
checkins, when you check in multiple files at once) be a changeset (this is
what SVN does) --- then you don't need per-file revision numbers but just one
per-repository revision number which is increased by each checkin (which is
also a commit in SVN). This can seem more elegant and generic, but I
personally believe that it's better to have file checkins and changeset
commits separated. Then per-repository revision numbers should obviously
increase with each commit, not each checkin.
In BK, you usually work with the changeset numbers, but for the internal
structure the per-file revision numbers are also important. Since a changeset
number can be taken as the revision number of the ChangeSet metafile, I will
operate mostly with revision numbers below.
> * Merging.
>
> * The graph structure.
About these two, it could be worth noting how BK works now, what branching
and merging look like, and how it could be done internally.
When you want to branch, you just clone the repository --- each clone
automatically creates a branch of the parent repository. Similarly, you merge
the branch by simply pulling the "branch repository" back. This way the
distributed repositories concept is tightly connected with the branch&merge
concept. When I talk about merging below, it doesn't matter whether it
happens from a cloned repository just one directory away or over the network
from a different machine.
[note that the following is figured out from various resources but not from
the documentation, where I didn't find it; thus I may well be completely
wrong here; please substitute "is" with "looks to be", "I think" etc. in the
following text]
BK works with a DAG (Directed Acyclic Graph) formed from the versions;
however, the graph looks different from each repository (the diagrams show
ChangeSet numbers).
From the imaginary Linus' branch, it looks like:
linus 1.1 -> 1.2 -> 1.3 -----> 1.4 -> 1.5 -----> 1.6 -----> 1.7
\ / \ /
alan \-> 1.2.1.1 --/---\-> 1.2.1.2 -> 1.2.1.3 --/
But from Alan's branch, it looks like:
linus 1.1 -> 1.2 -> 1.2.1.1 -> 1.2.1.2 -> 1.2.1.3 -> 1.2.1.4 -> 1.2.1.5
\ / \ /
alan \-> 1.3 ------/---\-----> 1.4 -----> 1.5 ------/
But now, how does merging happen? One of the goals is to preserve history
even when merging. Thus you merge the individual per-file checkins of the
changeset one by one, each checkin receiving its own revision in the target
tree as well --- this means the revision numbers of individual checkins
change during a merge if there were other checkins to the file between fork
and merge.
But it's a bit more complicated: the ChangeSet revision number is not
globally unique either, and it changes. You cannot make it globally unique
at clone time, because then you would have to increase the branch identifier
after each clone (and most clones are probably just read-only). Thus in the
cloned repository you work as if you were continuing on the branch you just
cloned, and the branch number is assigned during the merge.
A virtual branch (used only to track ChangeSets, not per-file revisions) is
created in the parent repository during the merge, where the merged
changesets receive new numbers appropriate for the branch. However, the
branch is really only virtual and there is still only one line of development
in the repository. If you want to see the ChangeSets in the order they were
applied and are present in the files, you have to sort them not by revision
but by merge time. Thus the order in which they are applied to the files is
(from Linus' POV):
1.1 1.2 1.3 1.2.1.1 1.4 1.5 1.6 1.2.1.2 1.2.1.3 1.7
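(The same rule in a few lines of Python --- the arrival numbers are invented;
the point is only that the apply order sorts by when each changeset entered
*this* repository, not by its revision number:)

    csets = [                            # (rev, arrival order in Linus' repo)
        ("1.1", 1), ("1.2", 2), ("1.3", 3),
        ("1.2.1.1", 4),                  # first pull from Alan
        ("1.4", 5), ("1.5", 6), ("1.6", 7),
        ("1.2.1.2", 8), ("1.2.1.3", 9),  # second pull from Alan
        ("1.7", 10),
    ]
    print(" ".join(rev for rev, _ in sorted(csets, key=lambda c: c[1])))
    # 1.1 1.2 1.3 1.2.1.1 1.4 1.5 1.6 1.2.1.2 1.2.1.3 1.7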
> * Distributed rename handling. Centralized systems like Subversion don't
> have as many problems with this because you can only create one file in
> one directory entry because there is only one directory entry available.
> In distributed rename handling, there can be an infinite number of different
> files which all want to be src/foo.c. There are also many rename corner-cases.
One obvious solution hits me here. First, you virtualize files into inodes
and give them numbers (in practice it's not necessary, and in fact it could
be better not to do that, but it can be much easier to think about it as if
it worked this way) --- the numbers don't have to be globally unique, they
are just a convenience abstraction; they are inherited upon clone, though.
Then in the repository you have each file name being just that inode number,
and for each inode you keep a history of the names it had and the revision in
which each name was assigned (thus you also know in which changeset it was
assigned).
When you are merging an "inode", you just go back to the last common
ChangeSet revision in the name history and look up what the name was. If
there's no name for that changeset yet, it's a new file, and if there's a
filename conflict, you cannot do much with it. Otherwise you know that the
inode number has to be the same in both repositories. Then you just rename
the inode in the target repository to its current name in the source
repository. If there is a conflict, you check whether you can repeat this
whole operation on the file in the way in the target repository --- if not
(or you can, but the conflict was not solved anyway), you again probably
cannot do much with this and you have
to let the user decide.
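(In code, my guess at that algorithm would look roughly like this --- Python,
every name invented, certainly not how BK actually implements it; the name
histories map an integer changeset sequence number to the file name:)

    def merge_name(src_hist, dst_hist, common):
        # src_hist/dst_hist: {cset_seq: name} for one inode in the source
        # and target repository; common: last shared changeset sequence.
        base = src_hist.get(common)
        if base is None:
            return "create", src_hist[max(src_hist)]  # born on the branch
        src_now = src_hist[max(src_hist)]
        dst_now = dst_hist[max(dst_hist)]
        if src_now == dst_now:
            return "keep", dst_now                    # nothing to do
        if dst_now == base:
            return "rename", src_now                  # only source renamed
        if src_now == base:
            return "keep", dst_now                    # only target renamed
        return "conflict", (src_now, dst_now)         # both renamed: ask user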
What am I missing?
> * Symbolic tags. This is adding a symbolic label on a revision. A distributed
> system must handle the fact that the same symbol can be put on multiple
> revisions. This is a variation of file renaming. One important thing to
> consider is that time can go forward or backward.
You remap the tags when you remap the changeset numbers, and? BK seems to
allow one tag to be on multiple changesets, and I presume that the latest one
is then normally used --- you can do something similar here: the latest
such-named tag is used normally, the merged ones are just preserved in the
history.
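(A sketch of "latest tag wins, the rest stay in history", with an invented
representation:)

    tags = {}   # tag name -> list of (when, cset); nothing is ever dropped

    def add_tag(name, cset, when):
        tags.setdefault(name, []).append((when, cset))

    def resolve(name):
        return max(tags[name])[1]        # normal lookup: most recent record

    add_tag("v2.5.64", "1.2.1.9", 90)    # merged in from another repository
    add_tag("v2.5.64", "1.1021", 100)
    print(resolve("v2.5.64"))            # 1.1021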
> * Security semantics. Where should they go? How can they be integrated
> into the system? How are hostile users handled when there is no central
> server to lock down?
I'm not sure which points exactly this attempts to bring up. Which particular
issues are open here? This is mostly a question of configuration of the
individual repositories (whether you allow pushes and from whom) and trust
(whether you will pull and from whom), isn't it?
> * Time semantics. A distributed system cannot depend on reported time
> being correct. It can go forward or backward at any rate.
Yes, then let's not depend on the time ;-).
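(If an ordering is still needed, the usual trick is a logical clock instead
of wall time --- a minimal Lamport-style sketch, which is a standard
technique, not anything BK is documented to do:)

    class LogicalClock:
        def __init__(self):
            self.t = 0
        def local_event(self):        # e.g. a checkin
            self.t += 1
            return self.t
        def on_pull(self, remote_t):  # jump past whatever the sender has seen
            self.t = max(self.t, remote_t) + 1
            return self.t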
Kind regards,
--
Petr "Pasky" Baudis
.
When in doubt, use brute force.
-- Ken Thompson
.
Crap: http://pasky.ji.cz/
> What am I missing?
Nothing that half a decade of work wouldn't fill in :)
More seriously, lots. I keep saying this and people keep not hearing it,
but it's the corner cases that get you. You seem to have a healthy grasp
of the problem space on the surface but in reading over your stuff, you
aren't even where I was before BK was started. That's not supposed to be
offensive, just an observation. As you try out the ideas you described
you'll find that they don't work in all sorts of corner cases and the
problem is that there are a zillion of them. And the solutions have
this nasty habit of fighting with each other, you solve one problem
and that creates another.
The thing we've found is that this problem space is much bigger than one
person can handle, even an exceptionally talented person. The number of
variables is such that you can't do it in your head, you need to have a
muse for each area and both of you have to be thinking about it full time.
This isn't a case of "oh, I get it, now I'll write the code". It's a
case of "write the code, deploy the code, get taught that it didn't work,
get the insight from that, write new code, repeat". And the problems are
such that if you aren't on them all the time then you work very slowly,
99% of the work is recreating the state you had in your brain the last
time you were here.
I strongly urge you to wander off and talk to people who are actually
writing code for real users. Arch, SVN, CVS, whatever. Get deeply
involved and understand their choices. Personally, I'd suggest the SVN
guys because I think they are the most serious, they have a team which
has been together for a long time and thought hard about it. On the
other hand, Arch is probably closer to mimicking how development really
happens in the real world; in theory, at least, it can do better than BK,
since it lets you take out-of-order changesets and BK does not. But it is light
years behind SVN in terms of actually working and being a real product.
SVN is almost there, they are self hosting, they eat their own dog food,
Arch is more a collection of ideas and some shell scripts. From SVN,
you're going to learn more of the hard problems that actually occur,
but Arch might be a better long term investment, hard to say.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
Zack Brown <[email protected]> said:
[...]
> I'd be willing to maintain this as the beginning of a feature list and
> post it regularly to lkml if enough people feel it would be useful and not
> annoying. The goal would be to identify the features/problems that would
> need to be handled by a kernel-ready version control system.
I believe that has very little relevance to lkml, only perhaps to a mailing
list for a bk replacement. For the kernel this work has already been done
(by Larry and the head penguins).
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
Horst von Brand wrote:
> Zack Brown <[email protected]> said:
> > I'd be willing to maintain this as the beginning of a feature list and
> > post it regularly to lkml if enough people feel it would be useful and not
> > annoying. The goal would be to identify the features/problems that would
> > need to be handled by a kernel-ready version control system.
>
> I believe that has very little relevance to lkml, only perhaps to a mailing
> list for a bk replacement. For the kernel this work has already been done
> (by Larry and the head penguins).
I'd like to thank those kind souls who explained how branch _and_
merge history is used by the better merging utilities. Now I see why
tracking merge history is so helpful. (Tracking it for credit and
blame history was obvious, but tracking it to enable tools to be
better at resolving conflicts was not something I'd thought of).
Of course there will be times when two or more people apply a patch
without the history of that patch being tracked, and then try to merge
both changes - any version control system should handle that as
gracefully as it can. However I now see how much actively tracking
the history of those operations can help tools to reduce the amount of
human effort required to combine changes from different places.
So thank you for illustrating that.
ps. Yes I know that CVS sucks at these things. I've seen _awful_
software engineering disasters due to the difficulty of tracking
different lines of development through CVS, first hand :)
-- Jamie
On Tue, Mar 11, 2003 at 12:03:18AM +0100, Daniel Phillips wrote:
> On Sun 09 Mar 03 03:45, Zack Brown wrote:
> > OK, so here is my distillation of Larry's post.
...
> > I'd be willing to maintain this as the beginning of a feature list and
> > post it regularly to lkml if enough people feel it would be useful and not
> > annoying. The goal would be to identify the features/problems that would
> > need to be handled by a kernel-ready version control system.
> >
> > Be well,
> > Zack
>
> Hi Zack,
>
> You might want to have a look here, there's lots of good stuff:
>
> http://arx.fifthvision.net/bin/view/Arx/LinuxKernel
> (Kernel Hackers SCM wish list)
Hi,
I remember that discussion. It was pretty interesting, but there were some
conflicting ideas about what should be done, and not much organization
to it all.
I've taken a lot of stuff from that wish list, combined it with what I gathered
from Larry's earlier post, and from Petr Baudis' recent post, and elsewhere,
and organized it into something that might be interesting. If anyone would
like to host this document on the web, please let me know.
--------------------------------- cut here ---------------------------------
Linux Kernel Requirements For A Version Control System
Document version 0.0.1
This document describes the features required for a version control system
that would be acceptable to Linux kernel developers. A second section below
lists features that would also be good, but not required for adoption by the
kernel team.
Please help out by clarifying features; identifying which features are
really required and which would just be nice; and by listing corner cases
and other implementation issues.
* * * Basic summary * * *
A distributed, replicated, version controlled user level file system with no
limits on any of the file system events which may happen in parallel. All
changes must be put correctly back together, no matter how much parallelism
there has been.
* * * Requirements For The Kernel * * *
Distributed Branches
1. Introduction
The idea of distributed branches is to allow developers to pull an entire,
full-featured repository onto their home system from the 'main' repository,
allow them to work off-line or with other groups of developers without
sacrificing the features of a full repository, then merge their work back to
the main repository or to other repositories.
A 'main' repository in this case is simply a repository used by the project
leader of a given project. It has no special features or privileges missing
from other branches. It is only considered the 'main' repository for social
reasons, not technical ones. Therefore, branches that have been cloned from
the main repository should not have to 'register' with the repository they
cloned from. i.e. one repository should be able to interact fully with
another, without either of them having prior knowledge of the other.
2. Behavior
Creating one repository from another should produce a full clone; not just
the current state of the parent repository, but all data from the parent
should be included in the child.
When cloning a repository, committing changes back to the parent, or sharing
changes with any other repositories, no assumptions should be made about the
location of the repositories on the network. Repositories may be on the same
machine, or on entirely different machines anywhere in the world.
Changesets
1. Introduction
A changeset is a group of files in a repository that have been tagged by
the developer as being logical parts of a patch dealing with a single
feature or fix. A developer working on multiple aspects of a repository may
create one changeset for each aspect, in which each changeset consists of
the files relevant to that aspect.
In the context of sharing changesets between repositories, a changeset
consists of a diff between the set of files in the local and remote
repositories.
2. Behavior
2.1 Tagging
It must be trivial for a developer to tag a file as part of a given
changeset.
It must be possible to reorganize changesets, so that a given changeset may
be split up into more manageable pieces.
2.2 Versioning
Changesets are given their own local version number, incremented with each
checkin.
3. Problems For Clarification
If a file is tagged as being part of two different changesets, which
changeset should changes to that file be associated with?
Checkins
1. Introduction
Checkins consist of making local modifications to a given repository. This
is distinct from merging changes from one repository into another. A
developer making local changes to their own repository is doing checkins. A
developer sharing their changes with a separate repository is doing merging.
2. Behavior
Files that are not part of a changeset are treated individually. On checkin,
the developer may include a comment for each file. This is distinct from
version control systems that take a single comment for the whole checkin.
It must be possible to checkin a single changeset to a local repository, and
have that changeset be treated as an individual unit, just as plain files
are: on checkin, the developer includes a single comment for the entire
changeset.
Merging
1. Introduction
Merging consists of sending and receiving changes between two or more
repositories.
2. Behavior
2.1 Preserving Local Work
It must be possible to update a local repository to match changes that have
been made to a remote repository, while at the same time preserving changes
that have been made to the local repository. If conflicts arise because some
of the same files have changed in both the local and remote repositories,
conflict resolution tools should be automatically invoked for the local
developer (see below).
If a checkin is interrupted for some reason, it should be easy to clean up
the tree, bringing it back to a consistent, useful state.
It should be possible to mark a file as private to a local repository, so
that a merge will never try to commit that file's changes to a remote
repository.
2.2 Preserving History
Checkin tags and version numbers are local to a given repository. Because
duplicates may exist across repositories, these historical details must be
remapped during a merge, to values that are unique within the remote
repository, but that can still be identified with their originals.
A merge between two repositories does not consist only of merging the
current state of a set of changesets, but their entire history, including
all their versions and the files that comprise them.
Even if no history is available for a given patch, it should be easy to
checkin and merge that patch.
The implementation must not depend on time being accurately reported by any
of the repositories.
3. Graph Structure
To illustrate some of the above behaviors, see the following DAG (Directed
Acyclic Graph). This graph will look different when viewed from each
repository (diagrams show the ChangeSet numbers). From the imaginary Linus'
branch, it looks like:
linus 1.1 -> 1.2 -> 1.3 -----> 1.4 -> 1.5 -----> 1.6 -----> 1.7
\ / \ /
alan \-> 1.2.1.1 --/---\-> 1.2.1.2 -> 1.2.1.3 --/
But from Alan's branch, it looks like:
linus 1.1 -> 1.2 -> 1.2.1.1 -> 1.2.1.2 -> 1.2.1.3 -> 1.2.1.4 -> 1.2.1.5
\ / \ /
alan \-> 1.3 ------/---\-----> 1.4 -----> 1.5 ------/
A virtual branch, used to track changesets, not per-file revisions, is
created in the parent repository during merge. At this time the merged
changesets receive new numbers appropriate for that branch. But since the
branch is only virtual, there is still only one line of development in the
repository. To see the changesets in the order they were applied, they must
be sorted not by revision number but by merge time. Thus, with respect to
the above diagrams, the order in which the patches were applied, from Linus'
perspective, is:
1.1 1.2 1.3 1.2.1.1 1.4 1.5 1.6 1.2.1.2 1.2.1.3 1.7
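For illustration only (this is not itself a requirement), the rule amounts
to sorting on a repository-local merge sequence rather than on the revision
number. In Python, with merge-sequence numbers invented to match the diagram
above:

# Revision number -> local merge sequence in the Linus repository.
merge_seq = {"1.1": 1, "1.2": 2, "1.3": 3, "1.2.1.1": 4, "1.4": 5,
             "1.5": 6, "1.6": 7, "1.2.1.2": 8, "1.2.1.3": 9, "1.7": 10}

applied_order = sorted(merge_seq, key=merge_seq.get)
# -> 1.1 1.2 1.3 1.2.1.1 1.4 1.5 1.6 1.2.1.2 1.2.1.3 1.7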
Distributed Rename Handling
1. Introduction
This consists of allowing developers to rename files and directories, and
have all repository operations properly recognize and handle this.
2. Behavior
2.1 Local
Renaming files and directories locally should preserve all historical
information including changeset tags.
2.2 Distributed
In the general case, a single local repository attempts to merge
name-changes with a remote repository. In this case, the remote repository
receives the name change, along with all history including changeset tags.
2.2.1 Conflicts
An arbitrary number of repositories cloned from either a single remote
repository or from each other may attempt to change the name of a single
file to arbitrary other names and then merge that change back to a single
remote repository or to each other.
An arbitrary number of repositories cloned from either a single remote
repository or from each other may rename file A to something else, and then
other files to the name formerly used by File A, or create a new file with
the name formerly used by file A; and then merge those changes to the single
remote repository or to each other.
An arbitrary number of repositories cloned from either a single remote
repository or from each other may attempt to create a file with the same
name and merge that change back to the remote repository or to each other.
Graphical 2- And 3-Way Merging Tool
1. Introduction
Merge tools are tools used to resolve conflicts when merging files. See
tkdiff ( http://www.accurev.com/free/tkdiff/ )
2. Behavior
The merge tools should identify precisely the areas of conflict, and enable
the user to quickly edit the files to resolve the conflicts and apply the
patch.
Merge tools must be able to handle patches as well as entire files.
A typical usage would be to pull all recent changes to a local tree from a
remote repository; then run the merge tools to resolve any conflicts between
the remote repository and changes that have been made locally; tag local
files to produce a changeset; and generate a diff for sharing.
* * * Not Required For Kernel Development * * *
Changesets
It should be possible to exchange changesets via email.
File Types
The system should support symlinks, device special files, fifos, etc. (i.e.
inode metadata)
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
This document is copyright Zack Brown and released under the terms of the
GNU General Public License, version 2.0.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
--------------------------------- cut here ---------------------------------
>
> Regards,
>
> Daniel
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Zack Brown
> I've taken a lot of stuff from that wish list, combined it with what I gathered
> from Larry's earlier post, and from Petr Baudis' recent post, and elsewhere,
> and organized it into something that might be interesting. If anyone would
> like to host this document on the web, please let me know.
Not sure if this was captured before (I don't see it explicitly in what
you sent), but one thing that I don't think current tools do well is to
keep changes separated out. We need to be able to put a stack of 200
patches on top of 2.5.10, then be able to break those out again easily
come 2.5.60, once we've merged forward. Treating things as one big blob
will work great for Linus, but badly for others.
At the moment, I slap the patches back on top of every new version
separately, which works well, but is a PITA. I hear this is something
of a pain to do with Bitkeeper (don't know, I've never tried it).
People muttered things about keeping 200 different views, which is
fine for hardlinked diff & patch (takes < 1s to clone normally), but
I'm not sure how long a merge would take in Bitkeeper this way? Perhaps
people who've done this in other SCM's could comment?
M.
On Tue 11 Mar 03 19:46, Martin J. Bligh wrote:
> > I've taken a lot of stuff from that wish list, combined it with what I
> > gathered from Larry's earlier post, and from Petr Baudis' recent post,
> > and elsewhere, and organized it into something that might be interesting.
> > If anyone would like to host this document on the web, please let me
> > know.
>
> Not sure if this was captured before (I don't see it explicitly in what
> you sent), but one thing that I don't think current tools do well is to
> keep changes separated out. We need to be able to put a stack of 200
> patches on top of 2.5.10, then be able to break those out again easily
> come 2.5.60, once we've merged forward. Treating things as one big blob
> will work great for Linus, but badly for others.
Coincidentally, I was having a little think about that exact thing earlier
today. Suppose we call the process of turning an exact delta into a
delta-with-context, "softening". So you select a set of deltas somehow
(e.g., all deltas in a wild-carded set of files), then soften them by adding
context, or in the deluxe version, convert to lists of tokens with whitespace
markup. The result is a first-class object in the database, called a, hmm,
soft changeset? (Surely there is a better name.)
A soft changeset can be carried forward in the database automatically as long
as there are no conflicts (like patch with fuzz) and where there are
conflicts, the soft changeset itself can be versioned. To implement soft
changeset versioning the lazy way, just merge the changeset with some version
and generate a new soft changeset against some other version. A name for the
versioned soft changeset can be generated automatically, e.g.:
changeset.name-from.version-to.version.
You can wave your wand, and the soft changeset will turn into a universal
diff or a BK changeset. But it's obviously a lot cleaner, extensible,
flexible and easier to process automatically than a text diff. It's an
internal format, so it can be improved from time to time with little or no
breakage.
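A toy sketch of the two basic operations, with the data layout invented on
the spot (the deluxe token version left out):

CONTEXT = 3     # lines of context kept on each side, like patch(1)

def soften(exact_delta, old_file_lines):
    # exact_delta: (start_line, old_lines, new_lines), 0-based
    start, old, new = exact_delta
    return {"before": old_file_lines[max(0, start - CONTEXT):start],
            "old": old, "new": new,
            "after": old_file_lines[start + len(old):
                                    start + len(old) + CONTEXT]}

def apply_soft(soft, lines):
    # find the unique spot where context+old matches, splice in new
    needle = soft["before"] + soft["old"] + soft["after"]
    hits = [i for i in range(len(lines) - len(needle) + 1)
            if lines[i:i + len(needle)] == needle]
    if len(hits) != 1:
        raise ValueError("conflict: context is ambiguous or gone")
    i = hits[0] + len(soft["before"])
    return lines[:i] + soft["new"] + lines[i + len(soft["old"]):]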
Did that make sense?
> At the moment, I slap the patches back on top of every new version
> separately, which works well, but is a PITA.
Tell me about it.
> I hear this is something
> of a pain to do with Bitkeeper (don't know, I've never tried it).
> People muttered things about keeping 200 different views, which is
> fine for hardlinked diff & patch (takes < 1s to clone normally), but
> I'm not sure how long a merge would take in Bitkeeper this way? Perhaps
> people who've done this in other SCM's could comment?
I've never seriously used any commercial SCM, so nobody can accuse me of
stealing their ideas. On the other hand, it means I may have to take a few
shots way wide of the target before hitting any bullseyes.
Regards,
Daniel
>> Not sure if this was captured before (I don't see it explicitly in what
>> you sent), but one thing that I don't think current tools do well is to
>> keep changes separated out. We need to be able to put a stack of 200
>> patches on top of 2.5.10, then be able to break those out again easily
>> come 2.5.60, once we've merged forward. Treating things as one big blob
>> will work great for Linus, but badly for others.
>
> Coincidentally, I was having a little think about that exact thing earlier
> today.
Good, then either I'm not insane, or at least I have company in the
madhouse ;-)
> Suppose we call the process of turning an exact delta into a
> delta-with-context, "softening". So you select a set of deltas somehow
> (e.g., all deltas in a wild-carded set of files), then soften them by adding
> context, or in the deluxe version, convert to lists of tokens with whitespace
> markup. The result is a first-class object in the database, called a, hmm,
> soft changeset? (Surely there is a better name.)
a "patch" ? ;-) A context-diff is kind of a delta with context.
I have some similar patch tools to akpm, and he uses the patch as the
base concept of what he does.
> A soft changeset can be carried forward in the database automatically as long
> as there are no conflicts (like patch with fuzz) and where there are
> conflicts, the soft changeset itself can be versioned. To implement soft
> changeset versioning the lazy way, just merge the changeset with some version
> and generate a new soft changeset against some other version. A name for the
> versioned soft changeset can be generated automatically, e.g.:
>
> changeset.name-from.version-to.version.
Right ... what I do is basically have a script that does:
for i in *; do
        <copy lastview to $i>
        (cd $i; <apply $i>)
done
My patches all start with a sequence number (a bit like Andrea does), so
"for i in *" orders them really nicely. What it's *meant* to do is read $?
back from patch, and stop if patch failed to apply properly (more than
just offsets), and barf for user intervention, but that bit's broken
at the moment ;-)
> You can wave your wand, and the soft changeset will turn into a universal
> diff or a BK changeset. But it's obviously a lot cleaner, extensible,
> flexible and easier to process automatically than a text diff. It's an
> internal format, so it can be improved from time to time with little or no
> breakage.
>
> Did that make sense?
Yeah, the wand is called "creatediffs" in my case, and it takes all the
views in a dir, and diffs the first against the second, second against
third, etc. I always start with "000-virgin".
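In spirit it's just something like this (a rough Python sketch, not my real
scripts):

import os, subprocess

# diff each view against the next one; the leading sequence numbers
# make the directory names sort correctly
views = sorted(d for d in os.listdir(".") if os.path.isdir(d))
for old, new in zip(views, views[1:]):     # 000-virgin vs 001-..., etc.
    with open(new + ".diff", "w") as out:
        subprocess.call(["diff", "-urN", old, new], stdout=out)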
I might even clean up my tools, turn them into one perl script, and send
them out at the weekend. They're a fetid (but working) mess right now.
What we need is a "better context-diff", with something smarter to apply
it that understands C syntax (can fall back to cdiff for text / asm for
now).
And whilst we're at it, it would be nice to have something that tried to
produce the most human-readable diffs, not the smallest ones. Renaming
the function at the top is frigging annoying.
>> At the moment, I slap the patches back on top of every new version
>> separately, which works well, but is a PITA.
>
> Tell me about it.
Well, it normally only takes me an hour per release. But it's still a
waste of time. And yes, I have to do some things by hand. But the screams
of others around me when BK goes wrong tell me it's not much better for
all its fancy tricks (for *my* usage at least), in terms of applying
patches happily to deleted files, etc., so it still needs manual fixing up.
>> I hear this is something
>> of a pain to do with Bitkeeper (don't know, I've never tried it).
>> People muttered things about keeping 200 different views, which is
>> fine for hardlinked diff & patch (takes < 1s to clone normally), but
>> I'm not sure how long a merge would take in Bitkeeper this way? Perhaps
>> people who've done this in other SCM's could comment?
>
> I've never seriously used any commercial SCM, so nobody can accuse me of
> stealing their ideas. On the other hand, it means I may have to take a few
> shots way wide of the target before hitting any bullseyes.
Yeah, neither have I. CVS I tried for a day, and it was just laughable.
BK I never looked at yet (have been tempted by the fancy looking merge
tool a few times). I tend to be slow to pick up new tools ... I prefer
to let others knock out the bugs first, and most of the time they don't
stick anyway ... so it was wasted time.
BK seems to be sticking better than most, but from the feedback I get
about it from others, I think I like my scripts well enough ... and can
change them to do what I want, and I understand what they're doing,
which makes me happy (they're 10 lines of sh or perl ;-)). And that's
not an open-source license thing ... it's complex enough that it wouldn't
do me any good to be open source (for any non-trivial mod). I want
something *simple* personally.
M.
"Martin J. Bligh" <[email protected]> wrote:
>
> >> At the moment, I slap the patches back on top of every new version
> >> separately, which works well, but is a PITA.
> >
> > Tell me about it.
>
> Well, it normally only takes me an hour per release.
Whoa. You need better tools.
A bunch of fine people took patch-tools and turned them into a real project.
They have .deb's and .rpm's, but it looks like they're a bit old and a `cvs co'
is needed. I'm still using the old stuff, but I'm sure theirs is better.
See http://savannah.nongnu.org/projects/quilt/
>> >> At the moment, I slap the patches back on top of every new version
>> >> separately, which works well, but is a PITA.
>> >
>> > Tell me about it.
>>
>> Well, it normally only takes me an hour per release.
>
> Whoa. You need better tools.
>
> A bunch of fine people took patch-tools and turned them into a real project.
> They have .deb's and .rpm's, but it looks like they're a bit old and a `cvs co'
> is needed. I'm still using the old stuff, but I'm sure theirs is better.
>
> See http://savannah.nongnu.org/projects/quilt/
I did take a look at your stuff in the past ... had a few minor objections
at the time, but have actually grown closer to what you do since then.
I *do* like the numbering of my patches though. I might try to merge them
together at some point soon.
So when I say 1 hour ... bear in mind I don't normally take Linus' bk-drops,
only the full releases, so the delta is bigger (and I'm slower than you! ;-))
You still have to fix up the rejects from 'patch -p1' by hand though,
right? That's what normally takes most of the time, especially if it's
code I'm unfamiliar with, or I make a mistake (reboot takes 5-10 mins ;-))
M.
Zack Brown <[email protected]> said:
> --------------------------------- cut here ---------------------------------
>
> Linux Kernel Requirements For A Version Control System
>
> Document version 0.0.1
[...]
> Changesets
>
> 1. Introduction
>
> A changeset is a group of files in a repository that have been tagged by
> the developer as being logical parts of a patch dealing with a single
> feature or fix. A developer working on multiple aspects of a repository may
> create one changeset for each aspect, in which each changeset consists of
> the files relevant to that aspect.
Nope. A changeset is (roughly) what was traded as a patch before. I.e., a
coordinated _change_ to a set of files. The RCS problem (inherited by lots
of systems) is that it handles only a diff to _one_ file at a time.
> In the context of sharing changesets between repositories, a changeset
> consists of a diff between the set of files in the local and remote
> repositories.
I don't think it is a good idea to handle differences _between_
repositories, as they could be arbitrary and change in time. A change
_within_ a repository is well defined.
> 2. Behavior
>
> 2.1 Tagging
>
> It must be trivial for a developer to tag a file as part of a given
> changeset.
An individual change, not a file. You need to focus on changes to files,
not files. I.e., a file appeared/disappeared/changed name/was edited by
altering lines so and so.
The bk method of accepting individual changes, and then bundling them up
should be enough, people tend to work at one problem at a time. It might be
possible to take a bunch of changes and slice&dice them into changesets
later, but that could create changesets that interdigitate and interdepend
(i.e., changeset 13 has edits that depend on changeset 14 having been
applied, and 14 similarly depends on 13 in other areas; also called
"deadlock" when talking about locking ;).
> It must be possible to reorganize changesets, so that a given changeset may
> be split up into more manageable pieces.
I don't see this as very useful. The user should take care to make changes
to foo.c and foo.h that touch one aspect into a changeset, and unrelated
changes (even touching the same files) into others. Breaking a changeset up
might break dependencies between changes. It might make sense to group
changesets into larger changes, i.e., changesets 12-25 are the move to the
new driver model in /net; the sets for /net, /block, /char together are the
move to the new driver model; and so on upwards. Then 2.8.15 to 2.8.16 would
be "just" a (super)changeset. Such a (super)changeset would make sense to
break up into its parts, but not the individual ones.
[...]
> 3. Problems For Clarification
>
> If a file is tagged as being part of two different changesets, then changes
> to that file should be associated with which changeset???
Individual changes to files can't belong to more than one changeset, AFAICS.
[...]
> Merging
[...]
> It should be possible to mark a file as private to a local repository, so
> that a merge will never try to commit that file's changes to a remote
> repository.
Gets hairy... what if I create file foo as private, and later try to
integrate stuff that creates the same file? Better keep this out of the
repository in the first place.
> 2.2 Preserving History
[...]
> Even if no history is available for a given patch, it should be easy to
> checkin and merge that patch.
Just take that patch as a local edit, and make it a changeset.
> The implementation must not depend on time being accurately reported by any
> of the repositories.
It is more complicated than that. On a distributed system without some form
of shared clock it might be impossible (== nonsense, as in relativity
theory) to talk of a global "before" and "after".
[...]
> Distributed Rename Handling
>
> 1. Introduction
>
> This consists of allowing developers to rename files and directories, and
> have all repository operations properly recognize and handle this.
And create and destroy. Note "rename" must include moving directories
around, and moving stuff from one directory to another, etc.
[...]
> 2.2.1 Conflicts
>
> An arbitrary number of repositories cloned from either a single remote
> repository or from each other may attempt to change the name of a single
> file to arbitrary other names and then merge that change back to a single
> remote repository or to each other.
Or several create the same file, or rename random files to the same name,
or even create and then destroy a file created somewhere else. Or create a
file in a directory that was just destroyed or moved locally, etc. I'm sure
this is one of the rat's nests of hairy special cases no one has thought
through that Larry is so fond of mentioning.
[...]
> * * * Not Required For Kernel Development * * *
>
> Changesets
>
> It should be possible to exchange changesets via email.
I'd say this is mandatory.
> File Types
>
> The system should support symlinks, device special files, fifos, etc. (i.e.
> inode metadata)
Urgh. If possible/convenient, yes. If not, leave it out. [I fail to see any
use for this, but that might just be lack of imagination on my side.]
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> This document is copyright Zack Brown and released under the terms of the
> GNU General Public License, version 2.0.
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Why not the documentation license? Just curious.
>
> --------------------------------- cut here ---------------------------------
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> > This document is copyright Zack Brown and released under the terms of the
> > GNU General Public License, version 2.0.
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Since a substantial amount of the information in there is what I said,
Zack has no right to impose any license on the information. It's a bit
unethical if you ask me; it's my copyright, not his. And I didn't impose
any silly license on it.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
Apologies in advance if this is so trivial as to be non-patch-worthy.
I was poking around and noticed a possible improvement to kernel/sys.c.
This change results in marginally better output using gcc 3.2 on x86.
As a test I constructed look-alike functions and a small driver. There
appeared to be a ~40% speedup on the "true" branch and ~5% slowdown on the
"false" branch. No effort was made to account for overhead when figuring
the percentages.
Unfortunately I don't know enough to say which side of the branch is more
commonly taken.
--- linux-2.5.64.orig/kernel/sys.c Tue Mar 4 21:28:58 2003
+++ linux-2.5.64/kernel/sys.c Tue Mar 11 22:06:12 2003
@@ -1096,18 +1096,12 @@
*/
int in_group_p(gid_t grp)
{
- int retval = 1;
- if (grp != current->fsgid)
- retval = supplemental_group_member(grp);
- return retval;
+ return (grp != current->fsgid) ? supplemental_group_member(grp) : 1;
}
int in_egroup_p(gid_t grp)
{
- int retval = 1;
- if (grp != current->egid)
- retval = supplemental_group_member(grp);
- return retval;
+ return (grp != current->egid) ? supplemental_group_member(grp) : 1;
}
DECLARE_RWSEM(uts_sem);
On Tue, Mar 11, 2003 at 11:47:50PM -0400, Horst von Brand wrote:
> Zack Brown <[email protected]> said:
> > --------------------------------- cut here ---------------------------------
> >
> > Linux Kernel Requirements For A Version Control System
> >
> > Document version 0.0.1
>
> [...]
>
> > In the context of sharing changesets between repositories, a changeset
> > consists of a diff between the set of files in the local and remote
> > repositories.
>
> I don't think it is a good idea to handle differences _between_
> repositories, as they could be arbitrary and change in time. A change
> _within_ a repository is well defined.
But isn't it necessary to exchange changesets between repositories? How
else would a developer choose exactly what changes get merged with a
remote repository?
>
> > 2. Behavior
> >
> > 2.1 Tagging
> >
> > It must be trivial for a developer to tag a file as part of a given
> > changeset.
>
> An individual change, not a file. You need to focus on changes to files,
> not files. I.e., a file appeared/disappeared/changed name/was edited by
> altering lines so and so.
>
> The bk method of accepting individual changes, and then bundling them up
> should be enough, people tend to work at one problem at a time.
I'm not so familiar with how BitKeeper operates. What do you mean by
"accepting individual changes, and then bundling them up"?
> > The implementation must not depend on time being accurately reported by any
> > of the repositories.
>
> It is more complicated than that. On a distributed system without some form
> of shared clock it might be impossible (== nonsense, like in relativity
> theory) to talk of a global "before" and "after"
Maybe the system should simply ignore the whole concept of time as occurring
in discrete ticks, and just measure time as the relative history of
changesets. That might give it enough of a basis to make estimates on which
changes came 'before' and 'after' other changes in most cases. I imagine a
lot of subtle intelligence could be implemented. And for situations defying
that intelligence, the system could query the user.
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> > This document is copyright Zack Brown and released under the terms of the
> > GNU General Public License, version 2.0.
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>
> Why not the documentation license? Just curious.
I read it when it first came out, and it just seemed to be trying to do
something that wasn't really feasible, and to do it in a fairly arbitrary
way. Besides, the protections it claimed to offer didn't interest me. The
GPL may have a soft spot or two, but I really like it, and I think it
applies just as well to text as to computer program code.
> >
> > --------------------------------- cut here ---------------------------------
> --
> Dr. Horst H. von Brand User #22616 counter.li.org
> Departamento de Informatica Fono: +56 32 654431
> Universidad Tecnica Federico Santa Maria +56 32 654239
> Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
--
Zack Brown
Zack Brown <[email protected]> said:
> On Tue, Mar 11, 2003 at 11:47:50PM -0400, Horst von Brand wrote:
> > Zack Brown <[email protected]> said:
> > > --------------------------------- cut here --------------------------------
> -
> > >
> > > Linux Kernel Requirements For A Version Control System
> > >
> > > Document version 0.0.1
> >
> > [...]
> >
> > > In the context of sharing changesets between repositories, a changeset
> > > consists of a diff between the set of files in the local and remote
> > > repositories.
> >
> > I don't think it is a good idea to handle differences _between_
> > repositories, as they could be arbitrary and change in time. A change
> > _within_ a repository is well defined.
> But isn't it necessary to exchange changesets between repositories? How
> else would a developer choose exactly what changes get merged with a
> remote repository?
_From_ a remote repository. I pull stuff, I can't push it. Once I've got the
"patch" here, I start integrating it into my repository. The granularity
should be a changeset (i.e., changes between two well defined points in the
remote repository). If it patches in cleanly, great! If not, do merging (==
resolve problems, by hand if need be).
> > > 2. Behavior
> > >
> > > 2.1 Tagging
> > >
> > > It must be trivial for a developer to tag a file as part of a given
> > > changeset.
> >
> > An individual change, not a file. You need to focus on changes to files,
> > not files. I.e., a file appeared/disappeared/changed name/was edited by
> > altering lines so and so.
> >
> > The bk method of accepting individual changes, and then bundling them up
> > should be enough, people tend to work at one problem at a time.
>
> I'm not so familiar with how BitKeeper operates. What do you mean by
> "accepting individual changes, and then bundling them up"?
In bk you edit a bunch of files, and commit the changes (individually or as
a set), and then you say "Now make all pending changes into a changeset".
> > > The implementation must not depend on time being accurately reported
> > > by any of the repositories.
> > It is more complicated than that. On a distributed system without some form
> > of shared clock it might be impossible (== nonsense, like in relativity
> > theory) to talk of a global "before" and "after"
> Maybe the system should simply ignore the whole concept of time as occurring
> in discrete ticks, and just measure time as the relative history of
> changesets.
Exactly. But this timeline makes sense for one repository only, and (in a
limited way, via merge points) it makes timelines (somewhat) comparable
between repositories. But note that A might take 13 and much later 5 from
B, as long as there is no conflict they will go in cleanly. But this is
time going backwards. Now factor in unrelated exchanging of changesets with
other actors...
> That might give it enough of a basis to make estimates on which
> changes came 'before' and 'after' other changes in most cases. I imagine a
> lot of subtle intelligence could be implemented. And for situations defying
> that intelligence, the system could query the user.
There is no universal "before" and "after", even within one repository;
there might be changes that can't be ordered. I.e., changes to files foo
and bar are independent, and might have happened in any order for the same
result. Same for all non-overlapping changes.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
Daniel Phillips wrote:
> Coincidentally, I was having a little think about that exact thing earlier
> today. Suppose we call the process of turning an exact delta into a
> delta-with-context, "softening".
Why not just make all deltas "soft" and just ignore the context in
cases when you're absolutely sure you can? (Provided that such
cases exist and aren't trivial.)
> A soft changeset can be carried forward in the database automatically as long
> as there are no conflicts
You probably also want to be able to apply them to different
views, e.g. if I fix X, I may send it off to integration, and
also apply it independently to my projects Y and Z. When X gets
merged into whatever I consider my "mainstream" (again, that's a
local decision, e.g. it may be Linus' tree, plus net/* and anything
related to changes in net/* from David Miller), I may want to get
notified, e.g. if there's a conflict, but also such that I can drop
that part from my fix (which may contain elements that I didn't
push yet).
Not all of this needs to be known to the SCM if the right tagging
tools are available to users. In fact, limiting the number of work
flows inherently supported by the SCM would probably be a
feature :-)
> and generate a new soft changeset against some other version. A name for the
> versioned soft changeset can be generated automatically, e.g.:
>
> changeset.name-from.version-to.version.
Hmm, I'd distinguish three elements in a change set's name:
- its history (i.e. all changesets applied to the file(s)
when the change set was created)
- a globally unique ID
- a human-readable title that doesn't need to be perfectly
unique
I think, for simplicity, changesets should just carry their history
with them. This can later be compressed, e.g. by omitting items
before major convergence points (releases), by using automatically
generated reference points, or simply by fetching additional
information from a repository if needed (hairy).
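I.e., something like this sketch (the field names and the id recipe are
invented, not any real format):

import hashlib, os, time

class ChangeSetName:
    def __init__(self, title, history):
        self.title = title      # human-readable, need not be unique
        self.history = history  # ids of the changesets already applied
        # globally unique id, built message-id style from the content,
        # the clock and some local entropy
        raw = repr((title, history, time.time(), os.getpid())).encode()
        self.uid = hashlib.sha1(raw).hexdigest()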
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/
Zack Brown wrote:
> Maybe the system should simply ignore the whole concept of time as occurring
> in discrete ticks, and just measure time as the relative history of
> changesets.
Real time is still useful, if only as a hint to users. E.g.
assume that you have dependencies the SCM doesn't know about.
Example: somebody posts on linux-kernel a one-line fix for a
remote root exploit. You'll instantly get dozens of people who
will apply that one to their local views, without waiting or
making a common unique change set.
Some of those view may have branched from a long time ago, and
not have touched any common change set for months. So the
partial order of applied change sets tells you very little.
Naturally, such one-line fixes will be slightly different, and
eventually, some of them will merge ...
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/
On Wed 12 Mar 03 04:47, Horst von Brand wrote:
> ...You need to focus on changes to files,
> not files. I.e., a file appeared/disappeared/changed name/was edited by
> altering lines so and so.
It's useful to make the distinction that "file appeared/dissapeared/changed
name" are changes to a directory object, while "was edited by altering lines
so and so" is a change to a file object...
[...]
> > This consists of allowing developers to rename files and directories, and
> > have all repository operations properly recognize and handle this.
>
> And create and destroy. Note "rename" must include moving directories
> around, and moving stuff from one directory to another, etc.
...then this part gets much easier.
Regards,
Daniel
On Wed 12 Mar 03 06:44, Horst von Brand wrote:
> There is no universal "before" and "after", even within one repository;
Sure there is, e.g., by incrementing a master transaction number on the
repository database.
> there might be changes that can't be ordered. I.e., changes to files foo
> and bar are independent, and might have happened in any order for the same
> result. Same for all non-overlapping changes.
I think what you're saying is that the repository may be ordered in more than
one way at the same time. Transaction serial number is just one way.
Whatever else is recorded in the repository, at least there ought to be a
serial number on every transaction, a simple unstructured counter. With just
this serial number you already have a way to roll back the entire repository
to any point in the past, provided all repository transactions are reversible.
For dependencies between changes, rather than any fixed ordering, it's better
to record the actual precedence information, i.e., "a before b", where a and
b are id numbers of changes (I think everybody agrees changes are first class
objects). These precedence relations can be determined automatically: if two
changes do not occur in the same file, there is certainly no precedence
relation. If two changes overlap the same text, then there is a precedence
relation. If two changes do not overlap, there may or may not be a
precedence relation, depending on whether the changes are exact deltas or
deltas-with-context, and if the latter, whether the context is unambiguous.
Once you have the precedence relations, there are all kinds of useful things
you can do with them.
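A sketch of the automatic part, with a change boiled down to a
(serial, file, start, end) tuple over the pre-change text --- all invented
for illustration:

def overlaps(a, b):
    return a[2] < b[3] and b[2] < a[3]     # half-open line ranges

def precedence(changes):
    # return the (earlier, later) pairs that genuinely must be ordered
    rels = set()
    ordered = sorted(changes)              # by transaction serial
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[1] != b[1]:
                continue                   # different files: no relation
            if overlaps(a, b):
                rels.add((a[0], b[0]))     # overlapping text: ordered
            # same file, disjoint spans: may or may not be a relation,
            # depending on the context --- glossed over in this sketch
    return rels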
Regards,
Daniel
Zack Brown <[email protected]> said:
> On Tue, Mar 11, 2003 at 11:47:50PM -0400, Horst von Brand wrote:
> > Zack Brown <[email protected]> said:
> > > --------------------------------- cut here --------------------------------
> -
> > >
> > > Linux Kernel Requirements For A Version Control System
> > >
> > > Document version 0.0.1
> >
> > [...]
> >
> > > In the context of sharing changesets between repositories, a changeset
> > > consists of a diff between the set of files in the local and remote
> > > repositories.
> >
> > I don't think it is a good idea to handle differences _between_
> > repositories, as they could be arbitrary and change in time. A change
> > _within_ a repository is well defined.
>
> But isn't it necessary to exchange changesets between repositories? How
> else would a developer choose exactly what changes get merged with a
> remote repository?
Again, _from_ a remote repository. I want control over the stuff I have
here.
The idea should be to be able to browse the changesets at the remote
repository and then pick changesets from there. Or just pull all
outstanding changesets (from the last synchronization point on). But that is
a bit hard... say I clone Linus' tree, and then want to synchronize with,
say, DaveM. But DaveM's tree is a few changesets behind Linus', and has
extra stuff. If I'm going promiscuous, I'll add some patches of my own, get
some random stuff from lkml (some of which are picked up later by Linus,
others aren't). I'd later try to get up to date with Andrea's tree, where we
again have the same scenario. And then go to Linus' next point release; he
mixed and matched, and sometimes mangled, changesets from the above in the
meantime... please tell me what the synchronization points for all those
transactions should be. Consider that DaveM might have applied changesets
to his tree in a certain order, and later Linus picked up some of the later
ones, and after some time finally integrated an earlier changeset of
DaveM's (perhaps having to merge it (i.e., adjust it) due to intervening
changes). So you don't even have a "standard order in which changesets are
applied" across the board, and "the same changeset" is different depending
on the tree to which it is applied.
So, a changeset is local, or something to be sent out and merged elsewhere
(where, due to the merging, it loses its former identity). Think traditional
patches: I can create a patch here and give it to you. But what you end up
applying is different due to changes at your place. You apply a different
patch.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
On Wed 12 Mar 03 16:32, Horst von Brand wrote:
> ...a changeset is local, or something to be sent out and merged elsewhere
> (where due to the merging it loses its former identity). Think traditional
> patches: I can create a patch here and give it to you. But what you end up
> applying is different due to changes at your place. You apply a different
> patch.
This is why changesets need to be first-class objects in the repository,
that can be versioned, segmented and recombined. I'd be able to pull
slightly differing changesets from a variety of sources, *merge
the changesets* and carry the result forward in my repository. This
way, no changeset needs to lose its identity until I explicitly want it
to.
Regards,
Daniel
Dear diary, on Mon, Mar 10, 2003 at 01:32:33AM CET, I got a letter,
where Larry McVoy <[email protected]> told me, that...
> > What am I missing?
>
> Nothing that half a decade of work wouldn't fill in :)
Good then ;-).
> More seriously, lots. I keep saying this and people keep not hearing it,
> but it's the corner cases that get you. You seem to have a healthy grasp
> of the problem space on the surface but in reading over your stuff, you
> aren't even where I was before BK was started. That's not supposed to be
> offensive, just an observation. As you try out the ideas you described
> you'll find that they don't work in all sorts of corner cases and the
> problem is that there are a zillion of them. And the solutions have
> this nasty habit of fighting with each other, you solve one problem
> and that creates another.
Sure, it's expected not to work perfectly. But we must start somewhere; IMHO
that's better than just sitting in one place saying "we won't manage to do
it perfectly anyway". That way we indeed won't; if we start actually doing
something and discussing the basic design ideas, we may.
I can already see notes of flaws^Wshadow areas ;-) in my ideas, but I believe
most of these can be pruned out. The rest will just have to be fixed later.
..snip..
> I strongly urge you to wander off and talk to people who are actually
> writing code for real users. Arch, SVN, CVS, whatever. Get deeply
> involved and understand their choices.
Certainly, I'm going to start digging into Arch very soon.
> Personally, I'd suggest the SVN guys because I think they are the most
> serious, they have a team which has been together for a long time and thought
> hard about it. On the other hand, Arch is probably closer to mimicking how
> development really happens in the real world; in theory, at least, it can do
> better than BK, since it lets you take changesets out of order and BK does not.
> But it is light years behind SVN in terms of actually working and being a
> real product. SVN is almost there, they are self hosting, they eat their own
> dog food, Arch is more a collection of ideas and some shell scripts.
> From SVN, you're going to learn more of the hard problems that actually
> occur, but Arch might be a better long term investment, hard to say.
I would probably base my potential work on Arch (or maybe Arx, I have to
actually compare these, I didn't find any good summary of differences), but I
dislike some concepts so it would be Yet Another Fork anyway ;-).
Kind regards,
--
Petr "Pasky" Baudis
.
When in doubt, use brute force.
-- Ken Thompson
.
Crap: http://pasky.ji.cz/
Hi!
> > OK, so here is my distillation of Larry's post.
>
> I've decided to elaborate a little more how BK in fact works for those who
> don't use it and don't want to read over all the documentation, and also share
> some thoughts and possible solutions of the individual problems.
>
What about committing this to bitbucket CVS?
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
Hi!
> > [Long rant, summary: it's harder than you think, read on for the details]
> [skipping long description]
>
> OK, so here is my distillation of Larry's post.
>
> Basic summary: a distributed, replicated, version controlled user level file
> system with no limits on any of the file system events which may happen
> in parallel. All changes must be put correctly back together, no matter how
> much parallelism there has been.
>
> * Merging.
>
> * The graph structure.
>
> * Distributed rename handling. Centralized systems like Subversion don't
> have as many problems with this because you can only create one file in
> one directory entry because there is only one directory entry available.
> In distributed rename handling, there can be an infinite number of different
> files which all want to be src/foo.c. There are also many rename corner-cases.
>
> * Symbolic tags. This is adding a symbolic label on a revision. A distributed
> system must handle the fact that the same symbol can be put on multiple
> revisions. This is a variation of file renaming. One important thing to
> consider is that time can go forward or backward.
>
> * Security semantics. Where should they go? How can they be integrated
> into the system? How are hostile users handled when there is no central
> server to lock down?
>
> * Time semantics. A distributed system cannot depend on reported time
> being correct. It can go forward or backward at any rate.
>
> I'd be willing to maintain this as the beginning of a feature list and
> post it regularly to lkml if enough people feel it would be useful and not
> annoying. The goal would be to identify the features/problems that would
Actually, check it into bitbucket's repository on sf.net; it should not be
annoying there.
(He he "send it to the bitbucket" :-)
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
Hi!
> Going back to the engineering problems, those problems are not going to
> get fixed by people working on them in their spare time, that's for sure,
> it's just not fun enough nor are they important enough. Who wants to
> spend a year working on a problem which only 10 people see in the world
> each year? And commercial
Well, if it happens only to 10 people per year, it is a non-problem.
> I'm starting to think that the best thing I could do is encourage Pavel &
> Co to work as hard as they can to solve these problems. Telling them that
> it is too hard is just not believable, they are convinced I'm trying to
> make them go away. The fastest way to make them go away is to get them
> to start solving the problems. Let's see how well Pavel likes it when
> people bitch at him that BitBucket doesn't handle problem XYZ and he
If it only happens so rarely, people are unlikely to complain too loudly.
Take a look at e2fsck. That's similar to bk -- an awful lot of corner
cases. And guess what: if you mess up your disk badly enough, it will just
tell you to fix it by hand (deallocate block free bitmap in full group).
And it's okay. (Plus I believe chkdsk has *way* bigger problems than that.)
I'm sure you are not going to throw away ext2 just because it has a
1-person-per-3-years problem. A 99% solution is going to be good enough
for me, Andrea and Martin. Linus can keep using bk.
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
Hi!
> I don't know, if the problem really changes that much. How do
> you pick a globally unique inode number for a file?
Use <emailaddress>@locallyuniq. Every developer should have an email
address, right? :-)
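Sketch (format invented):

counter = 0     # only has to be unique within one developer's repository

def new_file_id(email="dev@example.org"):  # hypothetical address
    global counter
    counter += 1
    return "%s@%d" % (email, counter)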
> And then
> how do you reconcile this when people on 2 different branches create
> the same file and want to merge their versions together?
That's a conflict, and user interaction is necessary at this point.
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
Hi!
> Get it through your thick head that BK is something valuable to this
> community, even if you don't use it you directly benefit from its use.
> All you people trying to copy BK are just shooting yourself in the foot
> unless you can come up with a solution that Linus will use in the short
> term. And nobody but an idiot believes that is possible. So play nice.
> Playing nice means you can use it, you can't copy it. You can also
> go invent your own SCM system, go for it, it's a challenging problem,
> just don't use BK's files, commands, or anything else in the process.
Eh? It is perfectly okay to look at BK's commands, ask people how BK works
and study its docs. (Heh, does anyone still have sources of BK from the time
they were available, preferably as hardcopy, so no license needs to be
agreed to for looking at them?)
> BK is made available for free for one reason and one reason only: to
> help Linus not burn out. That's based on my personal belief that he is
> critical to success of the Linux effort, he is a unique resource and has
> to be protected. I've paid a very heavy price for that belief and I'm
> telling you that you are right on the edge of making that price too high.
So go ahead and disallow no-price use of BitKeeper. It will reduce
flamewars on l-k quite a bit...
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
Hi!
> > A separate repository doesn't have this problem
>
> You're wrong.
>
> The problem is _distribution_. In other words, two people rename the same
> file. Or two people rename two _different_ files to the same name. Or two
> people create two different files with the same name. What happens when
> you merge?
>
> None of these are issues for broken systems like CVS or SVN, since they
Actually this does not have much to do with a central repository. prcs has
a central repository, too, but it has branches (= multiple repositories in
bk); so yes, you have the very same problem. prcs does not have problems
like trust and non-synchronized time, though.
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
On Wed 12 Mar 03 07:14, Werner Almesberger wrote:
> Daniel Phillips wrote:
> > Coincidentally, I was having a little think about that exact thing earlier
> > today. Suppose we call the process of turning an exact delta into a
> > delta-with-context, "softening".
>
> Why not just make all deltas "soft" and just ignore the context in
> cases when you're absolutely sure you can ? (Provided that such
> cases exist and aren't trivial.)
Just because there's no point in storing context that you don't have to, and
when you get into more sophisticated operations on deltas, you'd just
introduce a first step of discarding the context in many cases.
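To make the distinction concrete, a toy sketch in Python (the hunk
representation here is made up):
    # Toy illustration: an "exact" delta pins a hunk to line numbers
    # only; "softening" attaches surrounding context lines so the hunk
    # can still be located after the file has drifted.
    def soften(exact_hunk, file_lines, context=3):
        start, old, new = exact_hunk  # (start line, old lines, new lines)
        before = file_lines[max(0, start - context):start]
        after = file_lines[start + len(old):start + len(old) + context]
        return {"old": old, "new": new, "before": before, "after": after}
    lines = ["a", "b", "c", "d", "e"]
    print(soften((2, ["c"], ["C"]), lines, context=1))
    # {'old': ['c'], 'new': ['C'], 'before': ['b'], 'after': ['d']}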
> > A soft changeset can be carried forward in the database automatically as
> > long as there are no conflicts
>
> You probably also want to be able to apply them to different
> views, e.g. if I fix X, I may send it off to integration, and
> also apply it independently to my projects Y and Z. When X gets
> merged into whatever I consider my "mainstream" (again, that's a
> local decision, e.g. it may be Linus' tree, plus net/* and anything
> related to changes in net/* from David Miller), I may want to get
> notified, e.g. if there's a conflict, but also such that I can drop
> that part from my fix (which may contain elements that I didn't
> push yet).
Yes, and if we have the concept of a versioned changeset, your system will
notice automatically that Linus applied either exactly what you sent him or a
descendant (i.e., he had to massage it, but his history still recorded the
fact that he started with your changeset), so your system will know to
automatically reverse your original version during your next merge with
Linus. Um, that's if Linus is using this new spiffy system, of course; you
may want to substitute "Pavel" in the above.
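A minimal sketch of that bookkeeping in Python, assuming each changeset
carries a recorded ancestry list (all field names invented):
    # Sketch: before merging from upstream, find the local changesets
    # that upstream already contains, either verbatim or as a recorded
    # ancestor of something it massaged; those get reversed first.
    def to_reverse(local_pending, upstream_history):
        applied = set()
        for c in upstream_history:
            applied.add(c["id"])
            applied.update(c.get("ancestry", []))
        return [c for c in local_pending if c["id"] in applied]
    mine = [{"id": "X"}]
    linus = [{"id": "Y", "ancestry": ["X"]}]  # X was massaged into Y
    print(to_reverse(mine, linus))            # [{'id': 'X'}]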
> > and generate a new soft changeset against some other version. A name for
> > the versioned soft changeset can be generated automatically, e.g.:
> >
> > changeset.name-from.version-to.version.
>
> Hmm, I'd distinguish three elements in a change set's name:
>
> - its history (i.e. all changesets applied to the file(s)
> when the change set was created)
> - a globally unique ID
> - a human-readable title that doesn't need to be perfectly
> unique
Such things as history (if you need it) and a globally-unique id can be
tucked into the header of the changeset. The unique id is good; it means you
can let names collide. For the name itself, I personally am mostly interested
in the catchy moniker I thought up for the patch, um, I mean changeset, the
kernel version it applies to, and a sequence number in case I generate more
than one version against the same kernel, so that when I post the changesets
on the web, people can find the file they need. Boring, huh?
Naming is a matter of taste, and you ought to be able to do it according to
your own taste, including hooking in your own name-generating script.
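For instance, a trivial pluggable name generator along those lines, in
Python (field names invented; the collision-proof unique id stays in the
changeset header):
    # Sketch: human-friendly names from moniker + kernel version +
    # sequence number; collisions are fine, the header id is unique.
    def default_name(cset):
        return "%s-%s-%d" % (cset["moniker"], cset["applies_to"],
                             cset["sequence"])
    cset = {"id": "some-unique-id", "moniker": "dirsync",
            "applies_to": "2.5.64", "sequence": 2}
    print(default_name(cset))  # dirsync-2.5.64-2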
> I think, for simplicity, changesets should just carry their history
> with them. This can later be compressed, e.g. by omitting items
> before major convergence points (releases), by using automatically
> generated reference points, or simply by fetching additional
> information from a repository if needed (hairy).
I would not call that hairy; it sounds more like fun. The hairy part is
getting the underlying framework to function properly. Larry is entirely
correct in pointing out that it's hard, though in my opinion, not nearly as
hard as kernel development. Your edit/compile/test cycle is a fraction as
long, for one thing.
Regards,
Daniel
Daniel Phillips wrote:
> Naming is a matter of taste, and you ought to be able to do it according to
> your own taste, including hooking in your own name-generating script.
Yup, what I mean is that the system shouldn't have to depend on a
human-usable name. It's usually very hard to generate unique names
that are also human-friendly, so I think it's better not to try in
the first place. (Just look at e-mail message-ids for an example.)
> > I think, for simplicity, changesets should just carry their history
> > with them. This can later be compressed, e.g. by omitting items
> > before major convergence points (releases), by using automatically
> > generated reference points, or simply by fetching additional
> > information from a repository if needed (hairy).
>
> I would not call that hairy, it sounds more like fun.
I called it hairy, because you need to retrieve something from a
machine that may not be available at that time. Waiting until it
comes back usually isn't a choice. Of course, this information
may be replicated on other machines that are available, and that
your repository/agent knows of, etc.
In any case, this would be an optimization. Bandwidth and disk
space are cheap, so it's not so bad to carry a few kB of history
around for each file.
> getting the underlying framework to function properly. Larry is entirely
> correct in pointing out that it's hard, though in my opinion, not nearly as
> hard as kernel development. Your edit/compile/test cycle is a fraction as
> long for one thing.
Oh, I'd say it's an entirely different type of development. The
kernel has to deal with real-time concurrency and subtle
performance issues. An SCM can quite easily eliminate concurrency
to the point that all operations become nice, linear batch jobs
on a completely static data set. On the other hand, the SCM is
likely to work on more complex data structures, and will have a
closer interaction with user policy.
While performance is certainly an important issue for an SCM, I'd
expect this to be something that can be safely ignored for a good
while during development. (I'm a firm believer in the
prototype-burn-rewrite-burn_again-... type of software development.
Maybe this shows :-)
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/
Werner Almesberger <[email protected]> said:
[...]
> Real time is still useful, if only as a hint to users.
Lots of things would be useful to have, but you just can't get them.
There is no guarantee that the clocks of the machines are even remotely
near synchronized (don't get me started on that).
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
Daniel Phillips <[email protected]> said:
[...]
> For dependencies between changes, rather than any fixed ordering, it's better
> to record the actual precedence information, i.e., "a before b", where a and
> b are id numbers of changes (I think everybody agrees changes are first class
> objects). These precedence relations can be determined automatically: if two
> changes do not occur in the same file, there is certainly no precedence
> relation.
Wrong. Edit a header adding a new type T. Later change an existing file
that already includes said header to use T. Change a function, fix most
uses. Find a wrong usage later and fix it separately. Change something, fix
its Documentation/ later. Note how you can come up with dependent changes
that _can't_ be detected automatically.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
Daniel Phillips <[email protected]> said:
> On Wed 12 Mar 03 04:47, Horst von Brand wrote:
> > ...You need to focus on changes to files,
> > not files. I.e., file appeared/disappeared/changed name/was edited by
> > altering lines so and so.
> It's useful to make the distinction that "file appeared/disappeared/changed
> name" are changes to a directory object, while "was edited by altering lines
> so and so" is a change to a file object...
I don't think so. As the user sees it, a directory is mostly a convenient
labeled container for files. You think in terms of moving files around, not
destroying one and magically creating an exact copy elsewhere (even if
mv(1) does exactly this in some cases). Also, this breaks up the operation
"mv foo bar/baz" into _two_ changes, and this is wrong as the file loses
its revision history.
> [...]
>
> > > This consists of allowing developers to rename files and directories, and
> > > have all repository operations properly recognize and handle this.
> >
> > And create and destroy. Note "rename" must include moving directories
> > around, and moving stuff from one directory to another, etc.
>
> ...then this part gets much easier.
... by screwing it up. This is exactly one of the problems noted for CVS.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
On Thu 13 Mar 03 01:52, Horst von Brand wrote:
> Daniel Phillips <[email protected]> said:
> > On Wed 12 Mar 03 04:47, Horst von Brand wrote:
> > > ...You need to focus on changes to files,
> > > not files. I.e., file appeared/disappeared/changed name/was edited by
> > > altering lines so and so.
> >
> > It's useful to make the distinction that "file
> > appeared/disappeared/changed name" are changes to a directory object,
> > while "was edited by altering lines so and so" is a change to a file
> > object...
>
> I don't think so. As the user sees it, a directory is mostly a convenient
> labeled container for files. You think in terms of moving files around, not
> destroying one and magically creating an exact copy elsewhere (even if
> mv(1) does exactly this in some cases). Also, this breaks up the operation
> "mv foo bar/baz" into _two_ changes, and this is wrong as the file loses
> its revision history.
No, that's a single change to one directory object.
> > ...then this part gets much easier.
>
> ... by screwing it up. This is exactly one of the problems noted for CVS.
CVS doesn't have directory objects.
Does anybody have a convenient mailing list for this design discussion?
Regards,
Daniel
On Thu 13 Mar 03 02:03, Horst von Brand wrote:
> Daniel Phillips <[email protected]> said:
>
> [...]
>
> > For dependencies between changes, rather than any fixed ordering, it's
> > better to record the actual precedence information, i.e., "a before b",
> > where a and b are id numbers of changes (I think everybody agrees changes
> > are first class objects). These precedence relations can be determined
> > automatically: if two changes do not occur in the same file, there is
> > certainly no precedence relation.
>
> Wrong. Edit a header adding a new type T. Later change an existing file
> that already includes said header to use T. Change a function, fix most
> uses. Find a wrong usage later and fix it separately. Change something, fix
> its Documentation/ later. Note how you can come up with dependent changes
> that _can't_ be detected automatically.
You confused semantic dependencies with structural dependencies that
govern whether or not deltas conflict in the reject sense. Detailed reply is
off-list.
Regards,
Daniel
On Thu, Mar 13, 2003 at 06:00:48PM +0100, Daniel Phillips wrote:
> Does anybody have a convenient mailing list for this design discussion?
Keep in mind that one part of the discussion is to figure out what is
and is not required for adoption by the kernel team. For that, this is
probably the best place to discuss it. Otherwise, it's just the same
tail-chasing that has been going on with the various version control
projects up till now.
Later on, people can just be referred to an existing feature description,
which will cut down on future flamewars on lkml.
Be well,
Zack
>
> Regards,
>
> Daniel
--
Zack Brown
On Thu 13 Mar 03 22:48, Zack Brown wrote:
> On Thu, Mar 13, 2003 at 06:00:48PM +0100, Daniel Phillips wrote:
> > Does anybody have a convenient mailing list for this design discussion?
>
> Keep in mind that one part of the discussion is to figure out what is
> and is not required for adoption by the kernel team. For that, this is
> probably the best place to discuss it. Otherwise, it's just the same
> tail-chasing that has been going on with the various version control
> projects up till now.
Well, I know that, but HPA declared it off-topic and I wish to respect that.
> Later on, people can just be referred to an existing feature description,
> which will cut down on future flamewars on lkml.
Right, but we went well beyond what the features should be and started into
the implementation details. I'm getting a lot out of it, personally, but
others may not be.
Regards,
Daniel
[Cc: chopped _way_ down]
Pavel Machek <[email protected]> said:
[...]
> Take a look at e2fsck. That's similar to
> bk -- an awful lot of corner cases. And
> guess what, if you mess up your disk
> badly enough, it will just tell you to
> fix it by hand (deallocate block free bitmap
> in full group). And it's okay.
> (Plus I believe chkdsk has *way* bigger
> problems than that.)
> I'm sure you are not going to throw away
> ext2 just because it has a
> 1-person-per-3-years problem. A 99%
> solution is going to be good enough
> for me, Andrea and Martin. Linus can
> keep using bk.
"Sorry, corner case encountered. Your repository is toast, get a fresh
copy" will make you an extremely popular sort of game...
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
Dear diary, on Thu, Mar 13, 2003 at 11:36:15AM CET, I got a letter,
where Pavel Machek <[email protected]> told me, that...
> Hi!
Hello,
> > > OK, so here is my distillation of Larry's post.
> >
> > I've decided to elaborate a little more how BK in fact works for those who
> > don't use it and don't want to read over all the documentation, and also share
> > some thoughts and possible solutions of the individual problems.
> >
>
> What about committing this to bitbucket CVS?
Feel free to do anything you want with this, just please keep some credit
there. Maybe you would prefer to use Zack's summary instead, though; dunno.
Kind regards,
--
Petr "Pasky" Baudis
.
When in doubt, use brute force.
-- Ken Thompson
.
Crap: http://pasky.ji.cz/
Daniel Phillips <[email protected]> said:
[...]
> You confused semantic dependencies with structural dependencies that
> govern whether or not deltas conflict in the reject sense. Detailed
> reply is off-list.
In both cases hand fixup is needed. The "overlapping patch" partial order
is a (small, or even very small) subset of the "depends on" partial order
which you really want. It would be nice to be able to get a much better
approximation than "conflicting patch" automatically, but I fail to see
how. Giving dependencies by hand is a possibility, but it will most of the
time give as bad an approximation as the above (Do you really know _all_
patches on which your latest and greatest depends? Some (or even most) of
them will be old patches, that by now will be just part of the general
landscape. And this can happen even with direct dependencies: Think of
"disabling IRQs doesn't ensure mutual exclusion" or some such pervasive
change that will affect a small part of any patch, and now move an old
patch forward...).
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
Daniel Phillips <[email protected]> said:
> On Thu 13 Mar 03 01:52, Horst von Brand wrote:
[...]
> > I don't think so. As the user sees it, a directory is mostly a convenient
> > labeled container for files. You think in terms of moving files around, not
> > destroying one and magically creating an exact copy elsewhere (even if
> > mv(1) does exactly this in some cases). Also, this breaks up the operation
> > "mv foo bar/baz" into _two_ changes, and this is wrong as the file loses
> > its revision history.
> No, that's a single change to one directory object.
mv some/where/foo bar/baz
How is that _one_ change to _one_ directory object?
> > > ...then this part gets much easier.
> >
> > ... by screwing it up. This is exactly one of the problems noted for CVS.
>
> CVS doesn't have directory objects.
And it doesn't keep history across moves, as the only way it knows to move
a file is destroying the original and creating a fresh copy.
> Does anybody have a convenient mailing list for this design discussion?
Good idea to move this off LKML.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
Hi!
> You can wave your wand, and the soft changeset will turn into a universal
> diff or a BK changeset. But it's obviously a lot cleaner, extensible,
> flexible and easier to process automatically than a text diff. It's an
> internal format, so it can be improved from time to time with little or no
> breakage.
>
> Did that make sense?
Yes.
Some kind of better-patch is badly needed.
What kind of data would have to be in soft-changeset?
* unique id of changeset
* unique id of previous changeset
(two previous if it is merge)
? or would it be better to have here
whole path to first change?
* commit comment
* for each file:
** diff -u of change
** file's unique id
** in case of rename: new name (delete is rename to special dir)
** in case of chmod/chown: new permissions
** per-file comment
? How to handle directory moves?
Does it seem sane? Any comments?
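For illustration, the fields above
might map onto something like this
(Python; layout and names invented):
    # Sketch of one possible shape for the soft-changeset above.
    changeset = {
        "id": "unique-changeset-id",
        "parents": ["previous-changeset-id"],  # two entries for a merge
        "comment": "commit comment",
        "files": [{
            "file_id": "unique-file-id",
            "diff": "... output of diff -u ...",
            "rename_to": None,   # or new name; delete is a rename
                                 # into a special dir
            "chmod": None,       # or new permissions
            "comment": "per-file comment",
        }],
    }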
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
Hi!
> I remember that discussion. It was pretty interesting, but some
> conflicting ideas about what should be done; and not much organization
> to it all.
>
> I've taken a lot of stuff from that wish list, combined it with what I gathered
> from Larry's earlier post, and from Petr Baudis' recent post, and elsewhere,
> and organized it into something that might be interesting. If anyone would
> like to host this document on the web, please let me know.
I'd like to host it in bitbucket CVS. If you
have an sf account, I'll just add you as a
developer.
> 2.1 Tagging
>
> It must be trivial for a developer to tag a file as part of a given
> changeset.
>
> It must be possible to reorganize changesets, so that a given changeset may
> be split up into more manageable pieces.
>
What does this have to do with tagging?
> 3. Problems For Clarification
>
> If a file is tagged as being part of two different changesets, then changes
> to that file should be associated with which changeset???
>
Perhaps tagging should be explained?
I thought tagging was assigning a
symbolic name to some release?
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
> Yes.
>
> Some kind of better-patch is badly needed.
>
> What kind of data would have to be in soft-changeset?
> * unique id of changeset
> * unique id of previous changeset
> (two previous if it is merge)
> ? or would it be better to have here
> whole path to first change?
> * commit comment
> * for each file:
> ** diff -u of change
> ** file's unique id
> ** in case of rename: new name (delete is rename to special dir)
> ** in case of chmod/chown: new permissions
> ** per-file comment
>
> ? How to handle directory moves?
>
> Does it seem sane? Any comments?
Looks good to me.
If people keep changesets sanely, then there should be no need for
per-file comments IMHO, but I'm sure that's a matter of debate.
M.
On Sat 15 Mar 03 17:21, Horst von Brand wrote:
> Daniel Phillips <[email protected]> said:
> > On Thu 13 Mar 03 01:52, Horst von Brand wrote:
>
> [...]
>
> > > I don't think so. As the user sees it, a directory is mostly a
> > > convenient labeled container for files. You think in terms of moving
> > > files around, not destroying one and magically creating an exact copy
> > > elsewhere (even if mv(1) does exactly this in some cases). Also, this
> > > breaks up the operation "mv foo bar/baz" into _two_ changes, and this
> > > is wrong as the file loses its revision history.
> >
> > No, that's a single change to one directory object.
>
> mv some/where/foo bar/baz
>
> How is that _one_ change to _one_ directory object?
Oops, sorry, I didn't read your bar/baz correctly. Yes, it's two directory
objects, but it's only one file object, and the history (not including the
name changes) is attached to the file object, not the directory object. This
is implemented via an object id for each file object, something like an inode
number.
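A bare-bones sketch of that split in Python (names invented):
    # History hangs off the file object, keyed by an inode-like object
    # id; directories are just name -> object id maps, so a move edits
    # two directory objects while the file's history stays put.
    files = {42: {"history": ["delta1", "delta2"]}}
    dirs = {"some/where": {"foo": 42}, "bar": {}}
    def mv(src_dir, src_name, dst_dir, dst_name):
        oid = dirs[src_dir].pop(src_name)  # change to directory object 1
        dirs[dst_dir][dst_name] = oid      # change to directory object 2
        # files[oid]["history"] is untouched by the move
    mv("some/where", "foo", "bar", "baz")
    print(dirs, files[42]["history"])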
> > > > ...then this part gets much easier.
> > >
> > > ... by screwing it up. This is exactly one of the problems noted for
> > > CVS.
> >
> > CVS doesn't have directory objects.
>
> And it doesn't keep history across moves, as the only way it knows to move
> a file is destroying the original and creating a fresh copy.
Ah, but it does. Sorry for not explaining the object id thing earlier.
> > Does anybody have a convenient mailing list for this design discussion?
>
> Good idea to move this off LKML
Yup, but nobody has offered one yet, so...
Regards,
Daniel
On Sat 15 Mar 03 16:02, Horst von Brand wrote:
> Daniel Phillips <[email protected]> said:
>
> [...]
>
> > You confused semantic dependencies with structural dependencies that
> > govern whether or not deltas conflict in the reject sense. Detailed
> > reply is off-list.
>
> In both cases hand fixup is needed. The "overlapping patch" partial order
> is a (small, or even very small) subset of the "depends on" partial order
> which you really want.
But it's a very irritating subset and much of the work involved can be
handled automatically, so it should be.
> It would be nice to be able to get a much better
> approximation than "conflicting patch" automatically, but I fail to see
> how.
I suppose automatic syntactic analysis could be worked in there, or trial
builds could be done automatically (check out how Visual Age aka Eclipse does
it). I'd put that in the "extra credit" category, and for starters I'd be
entirely satisfied with:
- Automatic handling of most structural conflicts, which would result
in multiple possible deltas between two objects involved in a merge.
These would be marked by the UI as "conflicts", and the system could
helpfully point your editor at the relevant source texts.
- Manual handling of semantic conflicts, but good support for navigating
your editor to where the problems likely are (e.g., probably involves
a changeset you recently merged).
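For the structural part, the core check is just hunk overlap --- a
minimal sketch in Python, with a made-up hunk representation:
    # Two deltas conflict "in the reject sense" when, for some file,
    # their hunks touch overlapping line ranges. Hunks here are
    # (file, first line, last line) tuples.
    def structurally_conflict(delta_a, delta_b):
        for fa, sa, ea in delta_a:
            for fb, sb, eb in delta_b:
                if fa == fb and sa <= eb and sb <= ea:
                    return True  # overlapping ranges in the same file
        return False
    a = [("fs/inode.c", 100, 120)]
    b = [("fs/inode.c", 115, 130)]
    print(structurally_conflict(a, b))  # True -> mark as conflict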
> Giving dependencies by hand is a possibility,
Very useful, and not hard to do.
> but it will most of the
> time give as bad an approximation as the above (Do you really know _all_
> patches on which your latest and greatest depends?
You don't need to, you just provide a little help to the system. When you
don't provide enough help, you'll get extra compile/run errors, which isn't
worse than what happens now.
Chances are, the same dependencies will carry over from version to version,
so it's largely a one-time effort. When you do put in a manual dependency,
you can also put a notation on it, explaining why it's there in case that
needs clarification.
> Some (or even most) of
> them will be old patches, that by now will be just part of the general
> landscape. And this can happen even with direct dependencies: Think of
> "disabling IRQs doesn't ensure mutual exclusion" or some such pervasive
> change that will affect a small part of any patch, and now move an old
> patch forward...).
Eventually, a changeset that the system is carrying forward could become
moot, because it's unlikely ever to be backed out. In that case, just merge
it permanently and stop carrying it forward. And if you happen to be wrong
about needing to carry it forward, it just means you have to bring it forward
from where you ended it.
Regards,
Daniel
On Fri 14 Mar 03 13:29, Pavel Machek wrote:
> What kind of data would have to be in soft-changeset?
> * unique id of changeset
> * unique id of previous changeset
> (two previous if it is merge)
> ? or would it be better to have here
> whole path to first change?
> * commit comment
> * for each file:
> ** diff -u of change
> ** file's unique id
> ** in case of rename: new name (delete is rename to special dir)
> ** in case of chmod/chown: new permissions
> ** per-file comment
This *very* closely matches the schema I worked up some months ago, and
dusted off again when I saw your original Bitbucket post.
> ? How to handle directory moves?
>
> Does it seem sane? Any comments?
Oh yes. Comment: see response to Horst von Brand, on much the same subject.
Regards,
Daniel
On Sat, 2003-03-15 at 13:25, Daniel Phillips wrote:
> On Sat 15 Mar 03 17:21, Horst von Brand wrote:
> > Daniel Phillips <[email protected]> said:
> > > On Thu 13 Mar 03 01:52, Horst von Brand wrote:
> >
> > [...]
> >
> > > > I don't think so. As the user sees it, a directory is mostly a
> > > > convenient labeled container for files. You think in terms of moving
> > > > files around, not destroying one and magically creating an exact copy
> > > > elsewhere (even if mv(1) does exactly this in some cases). Also, this
> > > > breaks up the operation "mv foo bar/baz" into _two_ changes, and this
> > > > is wrong as the file loses its revision history.
> > >
> > > No, that's a single change to one directory object.
> >
> > mv some/where/foo bar/baz
> >
> > How is that _one_ change to _one_ directory object?
>
> Oops, sorry, I didn't read your bar/baz correctly. Yes, it's two directory
> objects, but it's only one file object, and the history (not including the
> name changes) is attached to the file object, not the directory object. This
> is implemented via an object id for each file object, something like an inode
> number.
>
> > > > > ...then this part gets much easier.
> > > >
> > > > ... by screwing it up. This is exactly one of the problems noted for
> > > > CVS.
> > >
> > > CVS doesn't have directory objects.
> >
> > And it doesn't keep history across moves, as the only way it knows to move
> > a file is destroying the original and creating a fresh copy.
>
> Ah, but it does. Sorry for not explaining the object id thing earlier.
>
> > > Does anybody have a convenient mailing list for this design discussion?
> >
> > Good idea to move this off LKML
>
> Yup, but nobody has offered one yet, so...
I think the [email protected] list would be happy to host
continuing discussion in this vein. Considering Larry's repeated
attempts to get people to look at arch as a "better fit," it seems
particularly appropriate.
Of course, you'd have to tolerate "arch community" views on a lot of
these issues, but I suspect that might help focus the discussion.
Bob
Dear diary, on Fri, Mar 14, 2003 at 01:29:03PM CET, I got a letter,
where Pavel Machek <[email protected]> told me, that...
> Hi!
>
> > You can wave your wand, and the soft changeset will turn into a universal
> > diff or a BK changeset. But it's obviously a lot cleaner, extensible,
> > flexible and easier to process automatically than a text diff. It's an
> > internal format, so it can be improved from time to time with little or no
> > breakage.
> >
> > Did that make sense?
>
> Yes.
>
> Some kind of better-patch is badly needed.
>
> What kind of data would have to be in soft-changeset?
> * unique id of changeset
> * unique id of previous changeset
> (two previous if it is merge)
> ? or would it be better to have here
> whole path to first change?
> * commit comment
> * for each file:
> ** diff -u of change
> ** file's unique id
> ** in case of rename: new name (delete is rename to special dir)
> ** in case of chmod/chown: new permissions
> ** per-file comment
>
> ? How to handle directory moves?
>
> Does it seem sane? Any comments?
Sounds almost sane (except the requirement for -u; it should probably be
possible to use the same range of diff types as now </nitpicking>). While
we're at it, it should probably also mention the original name of the file in
case of a move/rename, and especially the original chmod/chown
permissions/ownership. About chown, I'm not that sure ownership should be
recorded/carried, given that normal users can't even chown, and the usernames
probably won't exist on the other system anyway. Maybe make that an optional
feature which the patching side may ignore.
A whole separate issue is how to generate the unique ids. First, we need
unique ids for people; that shouldn't be that difficult. In fact, email
should do quite well, as it does for BitKeeper or arch. An interesting thing
could happen when someone's email changes and he wants to use the new one.
Probably, when changing this information in his repository, the old one
should be kept as an "alias" and sent along with any updates next to the new
one --- I believe the backlog shouldn't ever reach any dangerous length; for
standard communication some sane upper threshold (like 5) could be set, and
more would be sent only in direct communication and only in case of
conflicts.
The changeset unique id should probably include the author of the changeset
and the time (with seconds precision) of committing the changeset to the
[original] repository [of the changeset]. However, some insane script could
create, check in and commit several changesets in a row fast enough, so you
want something else in the id as well, which could further differentiate
commits happening at the same time. A checksum of the changeset's changes (in
some suitable form) would do. Now, if you want to annoy Larry, separate the
fields by '|'s and you could get something familiar.
A file's unique id is a little harder. The best thing to do is probably to
identify the file by its origin. The file appeared in some changeset, and we
already have unique ids for changesets. And the file appeared under some
original name there, which has to be unique inside one changeset. Thus take
the changeset unique id and add another field there: the original file name
under which it appeared in that changeset. It should still be unique and also
cheap to look up --- you only have to look for changes in that one changeset,
look up the particular file in the list of files appearing there, and you
should keep some file name -> "virtual inode" number mapping near the files
anyway.
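Roughly, in Python (a sketch only; exact fields and separators are of
course up for debate):
    # Changeset id = author + commit time + checksum of the changes;
    # file id = id of the changeset the file first appeared in, plus
    # its original name there.
    import hashlib, time
    def changeset_id(author_email, changes_blob):
        csum = hashlib.sha1(changes_blob).hexdigest()[:12]
        return "%s|%d|%s" % (author_email, int(time.time()), csum)
    def file_id(origin_cset_id, original_name):
        return "%s|%s" % (origin_cset_id, original_name)
    cid = changeset_id("pasky@example.org", b"... the diffs ...")
    print(file_id(cid, "fs/inode.c"))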
What do you think?
Kind regards,
--
Petr "Pasky" Baudis
.
The pure and simple truth is rarely pure and never simple.
-- Oscar Wilde
.
Stuff: http://pasky.ji.cz/
>> > > Does anybody have a convenient mailing list for this design
>> discussion?
>> >
>> > Good idea to move this off LKML
>>
>> Yup, but nobody has offered one yet, so...
>
> I think the [email protected] list would be happy to host
> continuing discussion in this vein. Considering Larry's repeated
> attempts to get people to look at arch as a "better fit," it seems
> particularly appropriate.
>
> Of course, you'd have to tolerate "arch community" views on a lot of these
> issues, but I suspect that might help focus the discussion.
>
> Bob
> -
Yes, that sounds good to me too.
And they have already begun a list of gcc and Linux kernel SCM
requirements (see http://arch.fifthvision.net/bin/view/Arx/WebHome
for "Requirements").
~Randy
On Sat, 2003-03-15 at 13:50, Randy.Dunlap wrote:
> >> > > Does anybody have a convenient mailing list for this design
> >> discussion?
> >> >
> >> > Good idea to move this off LKML
> >>
> >> Yup, but nobody has offered one yet, so...
> >
> > I think the [email protected] list would be happy to host
> > continuing discussion in this vein. Considering Larry's repeated
> > attempts to get people to look at arch as a "better fit," it seems
> > particularly appropriate.
> >
> > Of course, you'd have to tolerate "arch community" views on a lot of these
> > issues, but I suspect that might help focus the discussion.
> >
> > Bob
> > -
>
> Yes, that sounds good to me too.
> And they have already begun a list of gcc and Linux kernel SCM
> requirements (see http://arch.fifthvision.net/bin/view/Arx/WebHome
I actually just moved this topic to:
http://arch.fifthvision.net/bin/view/Main/WebHome
since it doesn't properly belong exclusively to the "ArX" fork of the
arch project.
Bob
On Sat, 2003-03-15 at 13:50, Randy.Dunlap wrote:
> >> > > Does anybody have a convenient mailing list for this design
> >> discussion?
> >> >
> >> > Good idea to move this off LKML
> >>
> >> Yup, but nobody has offered one yet, so...
> >
> > I think the [email protected] list would be happy to host
> > continuing discussion in this vein. Considering Larry's repeated
> > attempts to get people to look at arch as a "better fit," it seems
> > particularly appropriate.
> >
> > Of course, you'd have to tolerate "arch community" views on a lot of these
> > issues, but I suspect that might help focus the discussion.
> >
> > Bob
> > -
>
> Yes, that sounds good to me too.
> And they have already begun a list of gcc and Linux kernel SCM
> requirements (see http://arch.fifthvision.net/bin/view/Arx/WebHome
> for "Requirements").
>
> ~Randy
Sounds good. Here's the mailing list page:
http://lists.fifthvision.net/mailman/listinfo/arch-users/
You have to be registered, or your messages will be queued for
moderation.
Bob
Dear diary, on Sat, Mar 15, 2003 at 10:32:46PM CET, I got a letter,
where Petr Baudis <[email protected]> told me, that...
..snip..
> The changeset unique id should probably include the author of the changeset
> and the time (with seconds precision) of committing the changeset to the
> [original] repository [of the changeset]. However, some insane script could
> create, check in and commit several changesets in a row fast enough, so you
> want something else in the id as well, which could further differentiate
> commits happening at the same time. A checksum of the changeset's changes
> (in some suitable form) would do. Now, if you want to annoy Larry, separate
> the fields by '|'s and you could get something familiar.
..snip..
Okay, you will also need to define some project unique id (let's define a
project as a group of files with a history, where the instances of a project
are called "repositories" and are nodes of a DAG with a common root, which we
will call the initial repository) and include it in the changeset id. I think
the best project unique id would be some checksum (so that it isn't too
long..?) of the initial repository owner (project founder), the project name
(such as 'linux' or 'foobar' or "this isn't going to be unique, who cares")
and some roughly random number (be it a timestamp, a /dev/urandom output
snippet or a meteorological situation snapshot).
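Roughly, in Python (again only a sketch, using exactly the inputs
listed above):
    # Project unique id: short checksum over founder, project name and
    # a roughly random salt.
    import hashlib, os
    def project_id(founder_email, project_name):
        salt = os.urandom(8)  # the "/dev/urandom output snippet"
        blob = founder_email.encode() + project_name.encode() + salt
        return hashlib.sha1(blob).hexdigest()[:16]
    print(project_id("founder@example.org", "linux"))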
We could maybe raise the precision of the timestamp in changeset ids instead
of having the checksum there --- is it really necessary? I fear the changeset
id becoming too annoyingly long and complicated. And yes, I'm looking at BK
heavily regarding these concepts --- they seem to get them fairly right, so
why not.
Kind regards,
--
Petr "Pasky" Baudis
.
The pure and simple truth is rarely pure and never simple.
-- Oscar Wilde
.
Stuff: http://pasky.ji.cz/
Dear diary, on Sat, Mar 15, 2003 at 10:53:34PM CET, I got a letter,
where Robert Anderson <[email protected]> told me, that...
> On Sat, 2003-03-15 at 13:25, Daniel Phillips wrote:
> > On Sat 15 Mar 03 17:21, Horst von Brand wrote:
> > > Daniel Phillips <[email protected]> said:
> > > > On Thu 13 Mar 03 01:52, Horst von Brand wrote:
..snip..
> > > > Does anybody have a convenient mailing list for this design discussion?
> > >
> > > Good idea to move this off LKML
> >
> > Yup, but nobody has offered one yet, so...
>
> I think the [email protected] list would be happy to host
> continuing discussion in this vein. Considering Larry's repeated
> attempts to get people to look at arch as a "better fit," it seems
> particularly appropriate.
>
> Of course, you'd have to tolerate "arch community" views on a lot of
> these issues, but I suspect that might help focus the discussion.
I'm not sure if arch is the right thing to base on. Its concepts are surely
interesting; however, there are several problems (some of them may be
subjective):
* Terrible interface. Work with arch involves much more typing out of long
commands (and sequences of these), subcommands and parameters to get
functionality equivalent to what other SCMs provide much more simply. I see
it is for the sake of genericity and sometimes a more sophisticated usage
scheme, but I fear it can be a PITA in practice for daily work.
* Awful revision names (just the unique ids format). Again, it involves much
more typing, and after some hours of work the dashes will start to dance
around and regroup at random places in front of your eyes. The concepts
behind it (like seamless division into multiple archives; I can't say I see
sense in categories) are intriguing, but the result again doesn't seem very
practical.
* Evil directory naming. {arch} seems much more visible than CVS/ and SCCS/,
particularly as it gets sorted last in a directory, so you see it at the
bottom of ls output. Also it's a PITA with bash, as stuff starting with '='
(which arch likes to spawn as well) is. The files starting with '+' are a
problem for vi, which is kind of a flaw when they are probably the only arch
files dedicated for editing by the user (they are supposed to contain log
messages).
* Cloud of shell scripts. It poses a lot of limitations which are a pain to
work around (including speed, two-field version numbers [eek] and I can
imagine several others; I'm not sure about these though, so I won't name
further; you can possibly imagine something by yourself).
* Absence of sufficient merging ability, at least that's the impression I got
from the documentation. Merging at the *.rej files level I cannot call
sufficient ;-). Also, history is not preserved during merging, which is quite
fatal. And it looks to me, at least from the documentation, that arch is
still at the update-before-commit stage.
* Absence of a checkin/commit distinction. File revisions and changesets seem
to be tied together, losing some of the cute flexibility BK has.
I must have missed something terribly in the documentation, given how arch is
being recommended; please feel encouraged to correct me. But as I see it,
most of the juicy stuff is missing (although I really like the concept of
configurations and especially the concept of caching --- mainly that you do
not _have_ to pull all the stuff from the repository you cloned from, which
can be a pain with a poorer internet connection; also, if you aren't doing
any big changes and you're confident that the remote repository is going to
stay there, it is less expensive to talk with the repository over the
network), and the existing stuff is mostly in the form of shell scripts,
which it will have to leave behind and be rewritten sooner or later anyway.
The backend history format doesn't appear to be particularly great either.
Dunno. What's so special about arch then?
Kind regards,
--
Petr "Pasky" Baudis
.
The pure and simple truth is rarely pure and never simple.
-- Oscar Wilde
.
Stuff: http://pasky.ji.cz/
On Sun, 16 Mar 2003, Petr Baudis wrote:
> I'm not sure if arch is the right thing to base on. Its concepts are surely
> interesting; however, there are several problems (some of them may be
> subjective):
>
> * Terrible interface. Work with arch involves much more typing out of long
> commands (and sequences of these), subcommands and parameters to get
> functionality equivalent to what other SCMs provide much more simply. I see
> it is for the sake of genericity and sometimes a more sophisticated usage
> scheme, but I fear it can be a PITA in practice for daily work.
>
> * Awful revision names (just the unique ids format). Again, it involves much
> more typing, and after some hours of work the dashes will start to dance
> around and regroup at random places in front of your eyes. The concepts
> behind it (like seamless division into multiple archives; I can't say I see
> sense in categories) are intriguing, but the result again doesn't seem very
> practical.
>
> * Evil directory naming. {arch} seems much more visible than CVS/ and SCCS/,
> particularly as it gets sorted last in a directory, so you see it at the
> bottom of ls output. Also it's a PITA with bash, as stuff starting with '='
> (which arch likes to spawn as well) is. The files starting with '+' are a
> problem for vi, which is kind of a flaw when they are probably the only arch
> files dedicated for editing by the user (they are supposed to contain log
> messages).
>
> * Cloud of shell scripts. It poses a lot of limitations which are a pain to
> work around (including speed, two-field version numbers [eek] and I can
> imagine several others; I'm not sure about these though, so I won't name
> further; you can possibly imagine something by yourself).
>
> * Absence of sufficient merging ability, at least that's the impression I
> got from the documentation. Merging at the *.rej files level I cannot call
> sufficient ;-). Also, history is not preserved during merging, which is
> quite fatal. And it looks to me, at least from the documentation, that arch
> is still at the update-before-commit stage.
>
> * Absence of a checkin/commit distinction. File revisions and changesets
> seem to be tied together, losing some of the cute flexibility BK has.
>
> I must have missed something terribly in the documentation, given how arch
> is being recommended; please feel encouraged to correct me. But as I see
> it, most
I must have missed it too. Last time I checked I had the same impression. A
bunch of shell scripts (speed and portability, goodbye) and even
diff&patch were run as external programs. Maybe it has the right concepts
but the architecture is *at least* weak. Subversion looked (last time I
checked) like a better organized project, with *real* source code. Ok, it has
*insane* symbol names, but IMHO it's way better than the shell script
cloud.
- Davide
On Mar 16 2003, Petr wrote:
> > I think the [email protected] list would be happy to host
> > continuing discussion in this vein. Considering Larry's repeated
> > attempts to get people to look at arch as a "better fit," it seems
> > particularly appropriate.
> >
> > Of course, you'd have to tolerate "arch community" views on a lot of
> > these issues, but I suspect that might help focus the discussion.
>
> I'm not sure if arch is the right thing to base on. Its concepts are surely
> interesting; however, there are several problems (some of them may be
> subjective):
>
> * Terrible interface. Work with arch involves much more typing out of long
> commands (and sequences of these), subcommands and parameters to get
> functionality equivalent to what other SCMs provide much more simply. I see
> it is for the sake of genericity and sometimes a more sophisticated usage
> scheme, but I fear it can be a PITA in practice for daily work.
Someone made a script not long ago to create four-letter aliases of all
arch commands. Instead of `larch star-merge' you type `lstm'. Does that
sound more like what you want?
> * Awful revision names (just the unique ids format). Again, it involves
> much more typing, and after some hours of work the dashes will start to
> dance around and regroup at random places in front of your eyes. The
> concepts behind it (like seamless division into multiple archives; I can't
> say I see sense in categories) are intriguing, but the result again doesn't
> seem very practical.
Choose shorter names ;p
> * Evil directory naming. {arch} seems much more visible than CVS/ and SCCS/,
> particularly as it gets sorted last in a directory, so you see it at the
> bottom of ls output.
echo "alias ls='ls --ignore {arch}'" >> .bashrc
Funnily enough, {arch} lists _first_ in ls output here. That was the
idea behind the curly braces in the first place too afaik.
> Also it's a PITA with bash, as stuff starting with '=' (which arch
> likes to spawn as well) is.
No it doesn't. Tom, the main author of arch, likes files starting with
`='. The rest of us are not so sure ;) Off the top of my head I cannot
think of any file users should have to touch which has a name starting
with `='.
> The files starting with '+' are a problem for vi, which is kind of a flaw
> when they are probably the only arch files dedicated for editing by the
> user (they are supposed to contain log messages).
This is a known issue and is being looked into afaik.
I for one agree completely with this point.
> * Cloud of shell scripts. It poses a lot of limitations which are a pain to
> work around (including speed, two-field version numbers [eek] and I can
> imagine several others; I'm not sure about these though, so I won't name
> further; you can possibly imagine something by yourself).
Arch being a bunch of shell scripts:
http://arch.fifthvision.net/bin/view/Main/ArchMyths
Three-field version names are being worked on, IIRC.
> Also, history is not preserved during merging, which is quite fatal.
Not true. Any merge will include patch logs for the merged-in patches.
> And it looks to me, at least from the documentation, that arch is still
> at the update-before-commit stage.
Have you looked at the --out-of-date-ok flag to commit? (not that I
understand why you would want to use that...)
> rewritten sooner or later anyway. The backend history format doesn't appear to
> be particularly great either. Dunno. What's so special about arch then?
This says it so much better than I can:
http://arch.fifthvision.net/bin/view/Main/WhyArch
Stig
--
brautaset.org
I'm not sure if arch is the right thing to base on. Its
concepts are surely interesting; however, there are several
problems (some of them may be subjective):
Let's see.
I'll say at the outset: you've named a bunch of things that do give a
bad first impression to many users. None of these issues go "deep"
into arch -- there's lots of room, and even some actual work, towards
changing some of what you're complaining about. If the question is
"is arch a good starting point" -- the fact that all of these are
fairly minor issues reinforces the answer "yes", even if people insist
that there be changes related to these issues.
* Terrible interface. Work with arch involves much more typing
out of long commands (and sequences of these), subcommands and
parameters to get functionality equivalent to what other SCMs
provide much more simply. I see it is for the sake of
genericity and sometimes a more sophisticated usage scheme,
but I fear it can be a PITA in practice for daily work.
Perhaps so. But the question is "Is arch the right starting point
from which to build a system for Linux kernel developers?" If we
agree that what you describe is a problem, it seems to me that the
solution (at least to long command names and options) is _trivial_:
write some front-end scripts. That would be easy to do, wouldn't take
much code, and if a winning convenience layer emerged from that, I'm
sure we'd be happy to add it to arch (possibly via a more general
"alias" mechanism for creating short names for commands with default
option values).
But then there's revision names:
* Awful revision names (just the unique ids format). Again, it
involves much more typing, and after some hours of work the
dashes will start to dance around and regroup at random
places in front of your eyes.
In practice, that hasn't been a problem. Instead, what people
who use arch to do real work complain about is:
1) two-component version numbers, major.minor
Several people want three-component. Everyone agrees that
n-component (user's choice) is best. We have good practical
reasons for making the change to n-component versions slowly and
carefully, but it is not a major change. Again, I'm assuming that
the question is "is arch the best starting point".
2) ordering of components
Arch unique ids say:
<category>--<branch>--<version>
and some users would rather have:
<category>--<version>--<branch>
which better matches the naming scheme currently used for the Linux
kernel.
So far, there really aren't sufficiently vociferous+convincing
requests to make any changes in this area -- but again, in the big
picture, no matter what happens in this area -- it's a minor point.
> The concepts behind it (like seamless division into multiple
> archives; I can't say I see sense in categories) are
> intriguing, but the result again doesn't seem very practical.
Don't have much to say about that. It's been quite practical
for me, at least, in practice.
* Evil directory naming. {arch} seems much more visible than
CVS/ and SCCS/, particularly as it gets sorted last in a
directory, so you see it at the bottom of ls output. Also
it's a PITA with bash, as stuff starting with '=' (which arch
likes to spawn as well) is. The files starting with '+' are a
problem for vi, which is kind of a flaw when they are probably
the only arch files dedicated for editing by the user (they
are supposed to contain log messages).
Yet again: these are minor complaints.
`+'-named log message files _are_ going to change to something more
vi-friendly. My bad. I'm both an emacs user and a
unix-traditionalist. I didn't initially notice the problem and my
reaction on hearing about it was "Well, vi is broken" -- but as a
practical matter, arch does need to change in that area.
arch itself does not generate `=' files in source trees -- I use
them in the arch source code; they do appear in archives and under
{arch} where you'll nearly never need to interact with them via
bash. Incidentally, `bash' has recently been patched (not sure if
it's released yet) to make it deal properly, or at least better,
with `=' files. ("Well, bash is broken." :-)
I'm not sure why you think that "{arch}" is bad. There's _one_ of
those per controlled _tree_, while there's one CVS/ per _directory_.
I'm not sure why you think the sort-order of {arch} is bad -- I
think it's a feature because it puts that directory "out of sight;
out of mind" when I use my outline-style directory editor (if you
are an Emacs user, would you like a copy of my tree editor,
"monkey"?). I'm not sure why you think it's a PITA wrt bash -- I
use bash interactively and have never had any problem with {arch}.
But you know, again, this is a shallow issue. Practically speaking,
changing that name to something else is relatively low impact
(though, to be sure, a tedious change that would take several entire
_hours_ to make + a few days to figure out how to deal with existing
archives).
* Cloud of shell scripts. It poses a lot of limitations which
are a pain to work around (including speed, two-field version
numbers [eek] and I can imagine several others; I'm not sure
about these though, so I won't name further; you can possibly
imagine something by yourself).
You'll be happier with n-component versions when we have a variation
on sort(1) that handles them with ease -- and that's where we're
going.
Robert Anderson has posted on the wiki a tasty defense of the choice
of `sh' for the first implementation. He also pointed out some great
URLs from the SCSH site in the discussion on kerneltrap.org. (Sorry
for the meta-URLs here....)
Some people complain about `sh' as an implementation language. Beyond
defending that choice, let me say this: arch is a design and a set of
file and directory formats to go with that design. It _invites_
reimplementation. The other day, I played around with some of those
horrible shell tools and figured out that the meat of the sh scripts
in arch is just a bit over 20K LOC -- think you can rewrite that in
another language without too much cost? There's another 10-15K LOC
which is nothing but printf(1) statements for `--help' options,
copyright comments, and boilerplate loops that read command line
options and assign their values to variables.
In other words, look beyond just any one implementation -- arch is a
set of concepts; a set of interop standards just ripe to be written;
and a revision control system design that is simultaneously extremely
powerful, yet utterly trivial to implement or reimplement. Forget
about the problem of tying Linux kernel development to a proprietary
tool -- arch can help you avoid tying it to a single implementation
of a free revision control system.
* Absence of sufficient merging ability, at least that's the
impression I got from the documentation. Merging at the *.rej
files level I cannot call sufficient ;-). Also, history is not
preserved during merging, which is quite fatal. And it looks
to me, at least from the documentation, that arch is still at
the update-before-commit stage.
You are partially misinformed.
Merge history in arch is preserved in excruciating detail. That
history is used smartly in some very common cases (like a tree of
trusted lieutenants) to eliminate some of the most common sources of
merge conflicts.
Yes, when conflicts occur, arch currently represents these via the
".rej" mechanism. Yes, that's low-level and, at least arguably, icky.
Yet, again, that's not a "deep" issue in the sense that changing that
behavior leaves unaffected "99%" of arch. So, again, is arch the
right _starting point_ for displacing BK?
* Absence of a checkin/commit distinction. File revisions and
changesets seem to be tied together, losing some of the cute
flexibility BK has.
Yes, I've noted that from the lkml discussion. My impression so far
is that layering that functionality over the existing core of arch is
straightforward.
I must have missed something terribly in the documentation,
given how arch is being recommended,
Larry, and, increasingly, some of the arch-users members, are revision
control experts. Your complaints about arch express, no offense, a
fairly superficial (yet valid, yet easy to deal with) end-user
perspective. I think that the recommendations come from the
perspective of "Hey, this is a decent foundation and if you don't
appreciate that, you're probably not qualified to do revision control
design", while the complaints come from a perspective of "Ick,
there's about 2-dozen tweaks to the UI on this thing that I can't
possibly live without and I won't use your system unless you start to
accommodate those."
The two perspectives are compatible and complementary. Thank you for
your feedback.
But as I see it, most of the juicy stuff is missing (although I
really like the concept of configurations and especially the
concept of caching --- mainly that you do not _have_ to pull
all the stuff from the repository you cloned from, which can be
a pain with a poorer internet connection;
That's a deep point -- thank you for noticing.
also, if you aren't doing any big changes and you're
confident that the remote repository is going to stay there,
it is less expensive to talk with the repository over the network)
which arch does perfectly well.
and the existing stuff is mostly in the form of shell scripts,
which will have to be left behind and rewritten sooner or later anyway.
Parts of it, sure. All of it? Well, I hope there are multiple
reimplementations of this tiny-yet-powerful system -- but I think the
sh-based version is more viable than some people think.
The backend history format doesn't appear to be particularly
great either.
I can't respond to such a vague statement. Details, if you please.
Dunno. What's so special about arch then?
Superb design. Tiny yet powerful implementation. Unprecedented
features. Based, deeply, on "what changesets mean" -- thus it handily
adapts to a very wide range of usage scenarios. Software tools.
Also, mostly outside of the scope of Linux kernel foo -- the design
considers "programming in the large". In other words, it takes into
account the problem of managing a complete system, not just the
kernel, in a commercial context with competing but related
distributions. Its "scope of concern" is much larger than just the
lkml crowd.
-t
Pavel Machek <[email protected]> said:
[...]
> Some kind of better-patch is badly needed.
>
> What kind of data would have to be in soft-changeset?
> * unique id of changeset
If you want...
> * unique id of previous changeset
What is "previous"?
> (two previous if it is merge)
And if they are merges themselves? Or if it is a 3-way merge? Etc? How do I
get the original patches (if wanted)?
> ? or would it be better to have here
> whole path to first change?
> * commit comment
Right.
> * for each file:
> ** diff -u of change
> ** file's unique id
What is that? If I moved the file away and created a new one? Other moving
around stuff?
> ** in case of rename: new name (delete is rename to special dir)
> ** in case of chmod/chown: new permissions
> ** per-file comment
Much more important: how do you merge a conflicting patch sanely? This is
perhaps the worst stumbling block with plain patches.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
Tom Lord ([email protected]) wrote:
> `+'-named log message files _are_ going to change to something more
> vi-friendly. My bad. I'm both an emacs user and a
> unix-traditionalist. I didn't initially notice the problem and my
> reaction on hearing about it was "Well, vi is broken" -- but as a
> practical matter, arch does need to change in that area.
Not that you need any more prodding in this direction, but it's worth
noting that both more(1) and less(1) suffer from this problem too.
$ touch +foo
$ more +foo
usage: more [-dflpcsu] [+linenum | +/pattern] name1 name2 ...
$ less +foo
Missing filename ("less --help" for help)
hey guys, the suggestion to move to another list for this discussion was
to reduce traffic on the kernel list, not add a bunch of arch discussions
to the bitkeeper discussions.
pick one list and use it, don't use both.
David Lang
On Sat, 2003-03-15 at 16:18, Petr Baudis wrote:
> Dear diary, on Sat, Mar 15, 2003 at 10:53:34PM CET, I got a letter,
> where Robert Anderson <[email protected]> told me, that...
> > On Sat, 2003-03-15 at 13:25, Daniel Phillips wrote:
> > > On Sat 15 Mar 03 17:21, Horst von Brand wrote:
> > > > Daniel Phillips <[email protected]> said:
> > > > > On Thu 13 Mar 03 01:52, Horst von Brand wrote:
> ..snip..
> > > > > Does anybody have a convenient mailing list for this design discussion?
> > > >
> > > > Good idea to move this off LKML
> > >
> > > Yup, but nobody has offered one yet, so...
> >
> > I think the [email protected] list would be happy to host
> > continuing discussion in this vein. Considering Larry's repeated
> > attempts to get people to look at arch as a "better fit," it seems
> > particularly appropriate.
> >
> > Of course, you'd have to tolerate "arch community" views on a lot of
> > these issues, but I suspect that might help focus the discussion.
>
> I'm not sure if arch is the right thing to base on. Its concepts are surely
> interesting; however, there are several problems (some of them may be
> subjective):
I, for one, was not necessarily interested in "basing on" arch. I
think what the arch crowd would like to see first is what kernel developers
are asking for, and then potentially to relate those needs to arch.
But, let me address some of your points anyway:
> * Terrible interface. Working with arch involves much more typing out of long
> commands (and sequences of these), subcommands, and parameters to get
> functionality equivalent to what other SCMs provide much more simply. I see
> it is for the sake of genericity and a sometimes more sophisticated usage
> scheme, but I fear it can be a PITA in practice for daily work.
The commands are verbose, but they are verbose for the simple reason
that the command set for arch is very rich, and verbosity is somewhat
necessary to avoid ambiguity. I would certainly recommend the use of
completion facilities to use the command set as it exists natively. If
you are a bash user, try the bash completion code here:
[email protected]
http://rwa.homelinux.net/{public-archives}/rwa-2003
Certainly robust completion would mitigate much of the "typing problem."
But, there is also an alternate solution to this which consists of
having aliased "short forms" of the commands. Some work has also been
done in this area to provide complete, unambiguous, and easy to type
short forms. Search the arch mailing list archive for "short forms" for
the discussion and results so far.
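As a very small sketch of the completion idea, assuming bash's programmable
completion and an illustrative (not actual) larch subcommand list:
# hypothetical completion for larch; the word list is made up
_larch() {
    local cur=${COMP_WORDS[COMP_CWORD]}
    COMPREPLY=( $(compgen -W "make-log commit update get tag-source" -- "$cur") )
}
complete -F _larch larch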
> * Awful revision names (just unique ids format). Again, it involves much more
> typing
There will always be a tension between clarity and terseness of names in
general. arch tends to the side of clarity. You seem to favor
terseness for reasons of typing effort. That tension can be mitigated
in any number of ways; completion probably being the most pragmatic.
> and after some hours of work, the dashes will start to dance around and
> regroup at random places in front of your eyes.
I don't think I've ever seen a complaint about "dashes dancing around in
front of people's eyes" on the mailing list since its inception. In
fact, I've started using the double-dash separator in a number of other
contexts since growing accustomed to it as a "hard break" in a name.
> The concepts behind (like
> seamless division into multiple archives; I can't say I see the sense in categories)
You can't see a sense in categories? That statement is hard to fathom.
Possibly you mean you don't see a sense in separate branch and version
qualifiers, and that's a more legitimate question in my view.
> are intriguing, but the result again doesn't seem very practical.
> * Evil directory naming. {arch} seems much more visible than CVS/ and SCCS/,
Well, {arch} does not litter every directory like CVS/ does. It marks
the root of a project tree, and therefore it's actually a _nice_ thing
to have be visible. That's the point of giving it a noticeable name.
There's nothing "evil" about it from my perspective.
> particularly as it gets sorted as last in a directory, thus you see it at the
> bottom of ls output. Also it's a PITA with bash, as the stuff starting by '='
> (arch likes to spawn that as well)
No, it doesn't.
> is. The files starting with '+' are a problem
> for vi, which is kind of a flaw when they are probably the only arch files
> dedicated for editing by the user (they are supposed to contain log messages).
The output of make-log is now prefixed with an absolute path; the ++
does not cause a problem in that context anymore, even with vi, i.e.:
vi `larch make-log`
is fine now.
While I think both the ++ and = convention should be reconsidered as
users almost uniformly resist them when first getting used to arch, I
don't think this is much of a substantive problem. It's just a
character or two, after all.
> * Cloud of shell scripts. It poses a lot of limitations which are a pain to work
> around (including speed, two-field version numbers [eek], and I can imagine
> several others; I'm not sure about these though, so I won't name more; you
> can probably imagine some yourself).
Out of curiosity, what is your favorite language that you would like to
see arch implemented in? Some of the usual concerns regarding shell
scripts have been addressed on the wiki under "ArchMyths."
> * Absence of sufficient merging ability,
With all due respect, I think this reveals your level of familiarity
with arch to be, umm, not high. I'm not aware of any revision control
system that has the depth of capability that arch has with respect to
merging. BitKeeper may; I'm not a BitKeeper expert, but I'm pretty sure
nothing else comes close.
> at least the impression I got from the
> documentation. Merging on the *.rej files level I cannot call sufficient ;-).
I think you've misunderstood something fairly basic.
> Also, history is not preserved during merging, which is quite fatal.
Which documentation were you reading again? I'm not being facetious;
there are several versions of various forms of documentation still around,
and I'd like to know which one gave you that impression.
> And it
> looks to me at least from the documentation that arch is still in the
> update-before-commit stage.
I'm not sure what the "update-before-commit stage" is. Can you clarify?
> * Absence of checkin/commit distinction. File revisions and changesets seem to
> be tied together, losing some of the cute flexibility BK has.
I'm not aware of any such thing as a "file revision" in arch. Perhaps
you could expand on what that is and why you think you need such a
thing.
> I must have missed terribly something in the documentation
Yes, I believe you did.
> given how arch is
> being recommended, please feel encouraged to correct me. But as I see it, most
> of the juicy stuff is missing
Let's start with the above problems with your first reading of the docs,
then we'll move onto the "juicy stuff."
> (although I really like the concept of
> configurations and especially the concept of caching --- mainly that you do not
> _have_ to pull all the stuff from the clonee repository, which can be a pain
> with a poorer internet connection; then also if you aren't doing any changes
> that big and you're confident that the remote repository is going to stay there,
> it is less expensive to talk with the repository over network)
Signs of life... :)
> and the existing
> stuff is mostly in the form of shell scripts, which it has to leave behind
> and be rewritten sooner or later anyway.
Most of us would probably like to see that, but I think "has to" is
debatable.
> The backend history format doesn't appear to
> be particularly great either. Dunno. What's so special about arch then?
Let's talk about what kernel developers think they need, then we can
frame "what is so special about arch" in terms of that. I think that's
a reasonable way to frame the discussion.
Bob
Dear diary, on Sun, Mar 16, 2003 at 01:18:40AM CET, I got a letter,
where Petr Baudis <[email protected]> told me, that...
> Dear diary, on Sat, Mar 15, 2003 at 10:53:34PM CET, I got a letter,
> where Robert Anderson <[email protected]> told me, that...
> > On Sat, 2003-03-15 at 13:25, Daniel Phillips wrote:
> > > On Sat 15 Mar 03 17:21, Horst von Brand wrote:
> > > > Daniel Phillips <[email protected]> said:
> > > > > On Thu 13 Mar 03 01:52, Horst von Brand wrote:
> ..snip..
> > > > > Does anybody have a convenient mailing list for this design discussion?
> > > >
> > > > Good idea to move this off LKML
> > >
> > > Yup, but nobody has offered one yet, so...
> >
> > I think the [email protected] list would be happy to host
> > continuing discussion in this vein. Considering Larry's repeated
> > attempts to get people to look at arch as a "better fit," it seems
> > particularly appropriate.
> >
> > Of course, you'd have to tolerate "arch community" views on a lot of
> > these issues, but I suspect that might help focus the discussion.
>
> I'm not sure if arch is the right thing to base on. Its concepts are surely
> interesting; however, there are several problems (some of them may be
> subjective):
..rant..
Ok, with the perspective of a few hours I think it's a good idea to really move
this discussion to arch-users, although the resulting SCM may not
necessarily be arch. I will strip lkml from the recipients list in my further
mails replying to this sub-thread, and I would like to ask the others to do the
same.
Kind regards,
--
Petr "Pasky" Baudis
.
The pure and simple truth is rarely pure and never simple.
-- Oscar Wilde
.
Stuff: http://pasky.ji.cz/
(Please strip lkml from the cc list when replying.)
Dear diary, on Mon, Mar 10, 2003 at 01:02:33AM CET, I got a letter,
where Petr Baudis <[email protected]> told me, that...
> Dear diary, on Sun, Mar 09, 2003 at 03:45:22AM CET, I got a letter,
> where Zack Brown <[email protected]> told me, that...
..snip..
> > * Merging.
> >
> > * The graph structure.
>
> About these two, it could be worth noting how BK works now, how looks branching
> and merging and how could it be done internally.
>
> When you want to branch, you just clone the repository --- each clone
> automatically creates a branch of the parent repository. Similarly, you merge
> the branch by simply pulling the "branch repository" back. This way the
> distributed repositories concept is tightly connected with the branch&merge
> concept. When I talk about merging below, it doesn't matter whether it happens
> from a cloned repository just one directory away or over the network
> from a different machine.
>
> [note that the following is figured out from various resources but not from the
> documentation where I didn't find it; thus I may be well completely wrong here;
> please substitute "is" by "looks to be", "i think" etc in the following text]
>
> BK works with a DAG (Directed Acyclic Graph) formed from the versions; however,
> the graph looks different from each repository (diagrams show ChangeSet
> numbers).
>
> From the imaginary Linus' branch, it looks like:
>
> linus 1.1 -> 1.2 -> 1.3 -----> 1.4 -> 1.5 -----> 1.6 -----> 1.7
> \ / \ /
> alan \-> 1.2.1.1 --/---\-> 1.2.1.2 -> 1.2.1.3 --/
>
> But from Alan's branch, it looks like:
>
> linus 1.1 -> 1.2 -> 1.2.1.1 -> 1.2.1.2 -> 1.2.1.3 -> 1.2.1.4 -> 1.2.1.5
> \ / \ /
> alan \-> 1.3 ------/---\-----> 1.4 -----> 1.5 ------/
>
> But now, how does merging happen? One of the goals is to preserve history even
> when merging. Thus you merge individual per-file checkins of the changeset
> one-by-one, each checkin receiving its own revision in the target tree as well ---
> this means the revision numbers of individual checkins change during merge if
> there were other checkins to the file between fork and merge.
>
> But it's a bit more complicated: the ChangeSet revision number is not globally
> unique either, and it changes. You cannot make it globally unique at
> clone time, because then you would have to increase the branch identifier
> after each clone (most clones are probably just read-only). Thus in the cloned
> repository, you work as if you were continuing in the branch you just cloned,
> and the branch number is assigned during merge.
>
> A virtual branch (used only to track ChangeSets, not per-file revisions) is
> created in the parent repository during merge, where the merged changesets
> receive new numbers appropriate for the branch. However the branch is really
> only virtual and there is still only one line of development in the repository.
> If you want to see the ChangeSets in the order they were applied and are
> present in the files, you have to sort them not by revision, but by merge
> time. Thus the order in which they are applied to the files is (from Linus' POV):
>
> 1.1 1.2 1.3 1.2.1.1 1.4 1.5 1.6 1.2.1.2 1.2.1.3 1.7
..snip..
I probably didn't explain (or, in fact, understand) this well enough
initially, as several people asked me about it privately. Thus I decided it
would be worth elaborating on the "virtual branching" concept (which turns out
not to be *that* virtual after all) and on alternative solutions. While the
current operation may be quite obvious to regular bk users, it probably isn't
to others, and it could be worth documenting. And deciding (on arch-list,
please) how to actually do it for ourselves ;-).
Let's sketch a very simple example DAG here, but in much more detail
about revisions. First, the basic situation:
Linus +-----+ +-----+ +-----+
BASE ,-->| 1.2 |---->| 1.3 |---->| 1.4 |--.
+-----+ / +-----+ +-----+ +-----+ \ +-------+
| 1.1 |--< >--| MERGE |
+-----+ \ +-----+ +-----+ / +-------+
`------>| 1.2 |-------->| 1.3 |------'
Alan +-----+ +-----+
At the merge point, Linus will merge Alan's changesets committed after the
fork. However, do we want to do a flat-merge, accumulating the changesets into
one big ball and placing it as 1.5? Or rather take the changesets and commit
them separately?
Let's see what BitKeeper appears to do. It will take the common ancestor of
the branches, that is, 1.1 here. Then, it will pull from the branch being
merged, fork an internal branch at 1.1 and stuff the pulled changesets there.
Thus the result will be the classical image of branches in RCS-alike systems
(Linus' perspective):
Linus +-----+ +-----+ +-----+
BASE ,-->| 1.2 |---->| 1.3 |---->| 1.4 |--.
+-----+ / +-----+ +-----+ +-----+ \ +-------+
| 1.1 |--< >--| MERGE |
+-----+ \ +---------+ +---------+ / +-------+
`--->| 1.1.1.1 |------>| 1.1.1.2 |---'
Alan +---------+ +---------+
So BK *does* appear to have a "classical branching" capability, despite the
impression to the contrary, although it seems not to be available for regular
usage but rather only for internal purposes.
After this, the merge itself (done by "bk resolve", judging from the
documentation) will do some magical operation, which _probably_ looks like:
* combine all these changesets into one big diff (note that in practice you
probably don't do such a silly thing, but rather just hijack the development
line to include the branched changesets at the right point and only skip the
conflicting delta fragments; however, it is easiest to illustrate as if we
did it this way ;)
* apply this big diff on top of 1.4
* check it in and hide it from some eyes (it certainly doesn't appear in the
mails bk emits to our popular mailing lists; I didn't check exactly how
these merges are presented on the web interface and have zero clue about how
they are presented by "bk log"-alikes; I'd say the listing is sorted by the
date of merge, which is why the order "1.1 1.2 1.3 1.4 1.1.1.1
1.1.1.2" was presented in the previous mail; also, the merge changeset
certainly has to contain information about the branch being merged, so the
log can have info about 1.1.1.1 and 1.1.1.2 inserted between 1.4 and 1.5).
* note that under some conditions the file revision numbers are branchized as
well; it probably happens when merging branches, but I didn't investigate this yet
* don't check in the parts that were conflicting; leave them and let the user
resolve them --- these changes won't be hidden, and they will appear as the
diff attached to the "merge changeset"
* bundle all these changes and present them as changeset 1.5, called "Merge
alan with linus" or so ;) :
Linus +-----+ +-----+ +-----+
BASE ,-->| 1.2 |---->| 1.3 |---->| 1.4 |--. MERGE
+-----+ / +-----+ +-----+ +-----+ \ +-----+
| 1.1 |--< 1.4 + 1.1.1.2 >--| 1.5 |
+-----+ \ +---------+ +---------+ / +-----+
`--->| 1.1.1.1 |------>| 1.1.1.2 |---'
Alan +---------+ +---------+
Now this is certainly an interesting concept, and it is nice for users
especially because they have to resolve conflicts only _once_, when applying
the combined delta.
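As a rough illustration of this "one big ball" model -- using nothing but
plain patch(1), with hypothetical paths, and ignoring all the metadata bk
actually manages:
$ cat ../alan/csets/1.1.1.1.diff ../alan/csets/1.1.1.2.diff > big.diff
$ patch -p1 < big.diff           # conflicting hunks land in *.rej files
$ # resolve the rejects by hand, once, then check the result in as 1.5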
However, it is next to impossible to "cherrypick" changes, i.e. to
select only some changesets to merge. The problem is that you have to
spawn the branch with this one changeset, but what if you later want to
import a changeset from _before_ this one? In order to preserve the order of
things, you would have to move the changeset on the branch forward to the next
revision number and push the new one into its place. Aside from the fact that
revision numbers changing within one repository is a weird thought, not only
backwards but even _forward_ conflicts could happen. So, the question is: is it
mandatory to preserve the order of changesets? If the changesets appeared in
the branch in the order they were merged, cherrypicking shouldn't be a problem,
IMHO.
Another thing is that the merging procedure is kind of weird: there has to be
some "hijacking" of the development line to get the order right. Otherwise,
however, the concept is not all that bad and it is full of nice ideas. Can we
do better?
There is an alternative approach, which takes the changesets from the mergee
and applies them one by one, stopping the moment there is a conflict. The user
is then left to resolve it, after which the step-by-step merge can continue.
This results in the merged changesets being committed directly to the tree as
regular changesets, only with some additional info noting that they were
merged (a rough sketch follows the diagram and notes below):
Linus +-----+ +-----+ +-----+
BASE ,-->| 1.2 |---->| 1.3 |---->| 1.4 |--.
+-----+ / +-----+ +-----+ +-----+ )
| 1.1 |--' ,------------------------------------' .-- . . .
+-----+ ( +-----+ +-----+ /
`------>| 1.5 |-------->| 1.6 |------'
Alan +-----+ +-----+
Note that there is no special changeset dedicated to the merge; all the
conflicts are resolved in the individual changesets, thus all the diffs there
are "real". Also, you do no hijacking of the line, and all the changesets are
ordinary, general ones with almost no special attributes. The underlying
versioning system doesn't even have to know about branches ;-). However, the
changesets are mirrored in modified form, you may have to resolve conflicts
multiple times during a merge, and it is not clear from the revision number
which branch a changeset was originally forked from.
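A rough sketch of this step-by-step model, again with nothing but patch(1);
the csets directory and the .applied record are hypothetical, and a real
tool would commit after every successful step:
for cs in ../alan/csets/*.diff; do
    # try each changeset in order; stop at the first conflict
    if ! patch -p1 --dry-run < "$cs" >/dev/null 2>&1; then
        echo "conflict at $cs -- resolve it, commit, then resume"
        break
    fi
    patch -p1 < "$cs"
    echo "$cs" >> .applied       # record where each changeset came from
done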
Which model do you think is better? Or do you have yet another idea how to do
this? (given that we _do_ have to do this somehow)
Have fun,
--
Petr "Pasky" Baudis
.
The pure and simple truth is rarely pure and never simple.
-- Oscar Wilde
.
Stuff: http://pasky.ji.cz/