LinuxLists.cc - Yet another linux filesytem: with version control

2001-07-23 21:04:15

Subject: Yet another linux filesytem: with version control

Hi all,

Handling multiple versions is a tough challenge (...even in the linux
kernel). Working under software configuration management (SCM) helps,
but with some overhead, and it works only if everybody supports it.

From CVS to ClearCase, I haven't seen any easy tool. I feel a real
need to handle SCM simply.

The multiple version filesystem (mvfs) of ClearCase gives
transparent access to the data. I found this feature cool, but the
overall system is too complex. I would like to write an extension
module for the linux kernel to handle version control in a simple way.

Here are the main features:

-no check-out/check-in
-labeling
-private copies
-transparent access to data
-configuration selected with a single environment variable
-a mix of normal files (handled by the base FS) and files managed
under version control (C-files) in the same filesystem.

Here's how I see it working:

When a C-file is created, the label "init" is put on it. The first
write to a C-file creates a private copy for the user who runs the
process. This C-file is added to a "User File List" (UFL). The
private copy is now selected by the FS in place of version "init".
Each user can start his own private copy by writing to a C-file.

When a developer has reached a milestone and would like to share his
work, he creates a new label. This label is put on every private copy
listed in the UFL, and the UFL is zeroed. Those new versions
are now public. They are viewed by setting $CONFIGURATION to the new
label. New development can start from this label.
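
As a rough illustration of the labeling step described above, here is a minimal C sketch. It is not part of the proposal: `struct private_copy`, `struct ufl`, `ufl_add`, `UFL_MAX` and the function form of `mklabel` are hypothetical names, and the sketch assumes that the new label becomes a child of the label the work started from.

```c
#include <assert.h>
#include <stddef.h>

#define UFL_MAX 16   /* hypothetical fixed capacity, to keep the sketch short */

struct label {
    int id;
    const char *name;
    struct label *parent;
};

/* Hypothetical private copy of a C-file. */
struct private_copy {
    const char *path;
    const struct label *base;       /* label the copy was branched from */
    const struct label *published;  /* NULL while the copy is still private */
};

/* Hypothetical per-user "User File List". */
struct ufl {
    struct private_copy *entries[UFL_MAX];
    size_t count;
};

/* Called on the first write to a C-file: remember the new private copy. */
int ufl_add(struct ufl *u, struct private_copy *pc)
{
    assert(u != NULL && pc != NULL);
    if (u->count == UFL_MAX)
        return -1;
    u->entries[u->count++] = pc;
    return 0;
}

/* Put new_label on every private copy in the UFL, then zero the UFL.
 * Returns the number of copies that became public. */
size_t mklabel(struct ufl *u, struct label *new_label, struct label *parent)
{
    assert(u != NULL && new_label != NULL);
    size_t n = u->count;

    new_label->parent = parent;     /* assumption: labels chain to their origin */
    for (size_t i = 0; i < n; i++) {
        u->entries[i]->published = new_label;
        u->entries[i] = NULL;
    }
    u->count = 0;
    return n;
}
```

After `mklabel` returns, every file the user had touched carries the new label and the user's list is empty again, so the next write starts a fresh private copy.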

The label "init" is predefined. Labels will be organized in a tree,
and the structure will look like this:

struct label {
    int id;                 /* label identifier */
    char *name;             /* label name, e.g. "init" */
    struct label *parent;   /* parent label in the tree */
};

When we access a C-file with a "read" or a "write", the extension
module selects one version using the following rules:

First, if the C-file is in the UFL, we have a private copy to
select. Otherwise, we choose the version labeled by "$CONFIGURATION". If
no such version exists, we look for the version marked by the
nearest "parent" label (at the very least, the label "init" matches).
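
The selection rules above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `struct version`, `struct cfile`, `find_version` and `select_version` are hypothetical names, the private copy is modeled as a single pointer rather than a real UFL lookup, and versions are plain strings instead of file data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Label tree node; the root label "init" has parent == NULL. */
struct label {
    int id;
    const char *name;
    struct label *parent;
};

/* Hypothetical: one public version of a C-file, tagged with a label. */
struct version {
    const struct label *label;
    const char *data;
};

/* Hypothetical: a C-file as seen by one user. */
struct cfile {
    const struct version *versions;
    size_t nversions;
    const char *private_copy;   /* NULL when the file is not in the user's UFL */
};

/* Return the public version carrying exactly this label, or NULL. */
static const struct version *find_version(const struct cfile *f,
                                          const struct label *l)
{
    for (size_t i = 0; i < f->nversions; i++)
        if (f->versions[i].label == l)
            return &f->versions[i];
    return NULL;
}

/* Rule 1: a private copy wins.
 * Rule 2: otherwise take the version labeled by $CONFIGURATION.
 * Rule 3: otherwise walk up the label tree to the nearest labeled ancestor. */
const char *select_version(const struct cfile *f, const struct label *config)
{
    assert(f != NULL && config != NULL);
    if (f->private_copy != NULL)
        return f->private_copy;
    for (const struct label *l = config; l != NULL; l = l->parent) {
        const struct version *v = find_version(f, l);
        if (v != NULL)
            return v->data;
    }
    return NULL;   /* not reached if every C-file has an "init" version */
}
```

Rules 2 and 3 collapse into one loop because the walk starts at the configured label itself before moving on to its parents.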

On the kernel side, we need to manage the following structures:
-a tree of versions for each C-file.
-a tree of labels.
-a UFL for each developer.

In userland, we need:
-a "mklabel" tool.
-a "CONFIGURATION" environment variable.
-an existing tool for "merge" operations.

If my design matches your needs and there is enough feedback, I will
start this project. As I'm not a super kernel hacker, I need your help.

Any volunteers are welcome!

j.

--
Jerome de Vivie jerome . de - vivie @ wanadoo . fr

2001-07-23 21:18:27

by Larry McVoy

Subject: Re: Yet another linux filesytem: with version control

> The multiple version filesystem (mvfs) of ClearCase gives a
> transparent acces to the data. I found this feature cool, but the
> overall system is too complex. I would like to write an extension
> module for the linux kernel to handle version control in a simply way.

Having been through this a time or two, a few points to consider:

a) This is a hard area to get right. I've done it twice, I told Linus that
I could do it the second time in 6 months, and that was 3 years ago and
we're up to 6 full time people working on this. Your mileage may vary.
b) Filesystem support for SCM is really a flawed approach. No matter how
much you hate all SCM systems out there, shoving the problem into the
kernel isn't the answer. All that means is that you have an ongoing
battle to keep your VFS up to date with the kernel. Ask Rational
how much fun that is...
c) If you have to do a file system, may I suggest that you clone the SunOS
4.x TFS (translucent file system)? It's a useful model, you "stack" a
directory on top of a directory and you can see through to the underlying
directory. When you write to a file, the file is copied forward to the
top directory. So a hack attack is

mount -t TFS my_linux /usr/src/linux
cd my_linux
hack hack hack
... many hours later
cd ..
umount my_linux
find . -type f -print # this is your list of modified files

It's a cool thing but only semi needed - most serious programmers already
know how to do the same thing with hard links.
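
One common way to do this with hard links is to make a hard-linked copy of the tree (for example with GNU `cp -al`) and to give a file its own inode just before it is modified. A minimal C sketch of that "break the link before writing" step follows, assuming a POSIX system; `break_hard_link` and the `.cow` temporary suffix are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Give `path` a private inode if it is still hard-linked elsewhere.
 * Returns 1 if the link was broken, 0 if the file was already private,
 * -1 on error. */
int break_hard_link(const char *path)
{
    struct stat st;
    char tmp[4096];
    char buf[8192];
    ssize_t n;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    if (st.st_nlink < 2)
        return 0;                       /* nothing shares this inode */

    if (snprintf(tmp, sizeof tmp, "%s.cow", path) >= (int)sizeof tmp)
        return -1;

    int in = open(path, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(tmp, O_WRONLY | O_CREAT | O_EXCL, st.st_mode & 0777);
    if (out < 0) {
        close(in);
        return -1;
    }

    /* Copy the shared contents into the fresh file. */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n) {
            n = -1;                     /* treat a short write as an error */
            break;
        }
    }
    close(in);
    if (close(out) != 0)
        n = -1;

    /* Replacing the name leaves the other link and its data untouched. */
    if (n < 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 1;
}
```

In the simple two-tree case, the files in the working tree whose link count has dropped to 1 are then the modified ones.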

More brains are better than less brains, so welcome to the SCM mess...
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2001-07-23 21:52:20

by Rik van Riel

Subject: Re: Yet another linux filesytem: with version control

On Mon, 23 Jul 2001, Larry McVoy wrote:

> b) Filesystem support for SCM is really a flawed approach.

Agreed. I mean, how can you cleanly group changesets and
versions with a filesystem level "transparent" SCM ?

The goal of an SCM is to _manage_ versions and changesets,
if it doesn't do that we're back at CVS's "every file its
own versioning and to hell with manageability" ...

regards,

Rik
--
Executive summary of a recent Microsoft press release:
"we are concerned about the GNU General Public License (GPL)"

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com/

2001-07-23 21:58:40

by Jerome de Vivie

Subject: Re: Yet another linux filesytem: with version control

Larry McVoy wrote:

> Having been through this a time or two, a few points to consider:
>
> a) This is a hard area to get right. I've done it twice, I told Linus that
> I could do it the second time in 6 months, and that was 3 years ago and
> we're up to 6 full time people working on this. Your mileage may vary.

Yeah, I'm not alone!

I have absolutely no idea how much work it is. Will you work on this
topic again?
You + me + the 5 people who work with you = 7 people.

If we need 50 people, there is room enough for 43 volunteers!

> b) Filesystem support for SCM is really a flawed approach. No matter how
> much you hate all SCM systems out there, shoving the problem into the
> kernel isn't the answer. All that means is that you have an ongoing

A filesystem seems to be the best place to store files. My first
intent was to get rid of additional layers and to be able to use all
UNIX tools directly on the data. As I said, I have only one idea in my
head: "keep it simple"!

> battle to keep your VFS up to date with the kernel. Ask Rational
> how much fun that is...
>
> c) If you have to do a file system, may I suggest that you clone the SunOS
> 4.x TFS (translucent file system)? It's a useful model, you "stack" a
> directory on top of a directory and you can see through to the underlying
> directory. When you write to a file, the file is copied forward to the
> top directory. So a hack attack is
>
> mount -t TFS my_linux /usr/src/linux
> cd my_linux
> hack hack hack
> ... many hours later
> cd ..
> umount my_linux
> find . -type f -print # this is your list of modified files
>
> It's a cool thing but only semi needed - most serious programmers already
> know how to do the same thing with hard links.

I have already done this kind of solution:
-copy every directory and sub-directory of v1/ into v2/
-create a symlink from v2 to v1 for each file.
-protect v1/

To work on a file, we just break the link and copy the file. But I don't
see how to work with 2 versions of the same file with hard links.

>
> More brains are better than less brains, so welcome to the SCM mess...

Yeah, it's a real mess!

j.

--
Jerome de Vivie jerome . de - vivie @ wanadoo . fr

2001-07-23 22:15:20

by Larry McVoy

Subject: Re: Yet another linux filesytem: with version control

On Tue, Jul 24, 2001 at 12:00:53AM +0200, Jerome de Vivie wrote:
> I absolutely don't know how much work it is. Will you work again on this
> topic ?

Err, I've got a young but healthy company that is already doing it. I'm
happy to offer what advice I can to help you but I can't really commit
substantial resources towards this. I make my living off of my company
and that has to come first. That said, it's an interesting area and it's
nice to see others take an interest, so I'll help a little...

> To work on a file, we just break and copy the link. But, i don't see how
> to work with 2 versions of the same file with hard link.

You don't want to do so. You save little by doing so. Please tell me you
weren't going to version control at the block level, therein lies the path
to insanity. Getting it right at the file boundary is hard enough.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2001-07-23 22:17:00

by Jerome de Vivie

Subject: Re: Yet another linux filesytem: with version control

Rik van Riel wrote:
>
> On Mon, 23 Jul 2001, Larry McVoy wrote:
>
> > b) Filesystem support for SCM is really a flawed approach.
>
> Agreed. I mean, how can you cleanly group changesets and
> versions with a filesystem level "transparent" SCM ?

With labels!

In my initial post, I explained that labels are used to
identify individual files AND are also used to select, for
each file of a set, one version (= select a configuration).
It works!

>
> The goal of an SCM is to _manage_ versions and changesets,
> if it doesn't do that we're back at CVS's "every file its
> own versioning and to hell with manageability" ...

Versioning is already a first step.

j.

--
Jerome de Vivie jerome . de - vivie @ wanadoo . fr

2001-07-23 22:24:50

by Jerome de Vivie

Subject: Re: Yet another linux filesytem: with version control

Larry McVoy wrote:
>
> On Tue, Jul 24, 2001 at 12:00:53AM +0200, Jerome de Vivie wrote:
> > I absolutely don't know how much work it is. Will you work again on this
> > topic ?
>
> Err, I've got a young but healthy company that is already doing it. I'm
> happy to offer what advice I can to help you but I can't really commit
> substantial resources towards this. I make my living off of my company
> and that has to come first. That said, it's an interesting area and it's
> nice to see others take an interest, so I'll help a little...

Ok, thanks !

>
> > To work on a file, we just break and copy the link. But, i don't see how
> > to work with 2 versions of the same file with hard link.
>
> You don't want to do so. You save little by doing so. Please tell me you
> weren't going to version control at the block level, therein lies the path
> to insanity. Getting it right at the file boundary is hard enough.

Yes, it was block-level version control, but it fit our needs (I
scattered files across directories when there were no dependencies).

j.

--
Jerome de Vivie jerome . de - vivie @ wanadoo . fr

2001-07-23 22:30:00

by Rik van Riel

Subject: Re: Yet another linux filesytem: with version control

On Tue, 24 Jul 2001, Jerome de Vivie wrote:
> Rik van Riel wrote:
> > On Mon, 23 Jul 2001, Larry McVoy wrote:
> >
> > > b) Filesystem support for SCM is really a flawed approach.
> >
> > Agreed. I mean, how can you cleanly group changesets and
> > versions with a filesystem level "transparent" SCM ?
>
> With label !
>
> In my initial post, i have explain that labels are used to
> identify individual files AND are also uses to select for
> each files of a set, one version (= select a configuration).
> It works !

Hmmmm, so it's not completely transparent. Good.

Now if you want to make this kernel-accessible, why
not make a userland NFS daemon which uses something
like bitkeeper or PRCS as its backend ?

The system would then look like this:

_____ _______ _____ _____
| | | | | | | |
| SCM |--| UNFSD |--| NET |--| NFS |
|_____| |_______| |_____| |_____|

And there, you have a transparent SCM filesystem
that works over the network ... without ever having
to modify the kernel or implement SCM.

> versioning is yet a first step.

And I'm not convinced it is even needed. All you
really need is the glue layer between the SCM
system and the kernel. A user level NFS server
will do this just fine.

regards,

Rik
--
Executive summary of a recent Microsoft press release:
"we are concerned about the GNU General Public License (GPL)"

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com/

2001-07-23 22:51:03

by Florin Iucha

Subject: Re: Yet another linux filesytem: with version control

Check out the katie project:

http://www.netcraft.com.au/geoffrey/katie/

florin

--

"If it's not broken, is because you are not fixing it enough."

41A9 2BDE 8E11 F1C5 87A6 03EE 34B3 E075 3B90 DFE4

2001-07-23 23:03:43

by Jerome de Vivie

Subject: Re: Yet another linux filesytem: with version control

Rik van Riel wrote:
>
> On Tue, 24 Jul 2001, Jerome de Vivie wrote:
> > Rik van Riel wrote:
> > > On Mon, 23 Jul 2001, Larry McVoy wrote:
> > >
> > > > b) Filesystem support for SCM is really a flawed approach.
> > >
> > > Agreed. I mean, how can you cleanly group changesets and
> > > versions with a filesystem level "transparent" SCM ?
> >
> > With label !
> >
> > In my initial post, i have explain that labels are used to
> > identify individual files AND are also uses to select for
> > each files of a set, one version (= select a configuration).
> > It works !
>
> Hmmmm, so it's not completely transparent. Good.

You only set a global variable to select which configuration
you want to work on. You can't make it simpler, Rik: everything else
is transparent: read, write, ...!

>
> Now if you want to make this kernel-accessible, why
> not make a userland NFS daemon which uses something
> like bitkeeper or PRCS as its backend ?
>
> The system would then look like this:
>
> _____ _______ _____ _____
> | | | | | | | |
> | SCM |--| UNFSD |--| NET |--| NFS |
> |_____| |_______| |_____| |_____|

Your architecture is too complex for me.

>
> And there, you have a transparent SCM filesystem
> that works over the network ... without ever having
> to modify the kernel or implement SCM.
>

I can't do it outside the kernel. There is one important
feature I mentioned: I would like to mix files from the
"base" filesystem and files which are managed under
configuration. Why is this feature really important?
Because in the product, there are two kinds of files:
-source files (leaves of the dependency tree)
-and generated files.
As you know, in SCM generated files are not identified by a version
number but by a configuration (a set with one version for each
dependency). So there is no need to manage all objects of a
partition under version control.

j.

--
Jerome de Vivie jerome . de - vivie @ wanadoo . fr

2001-07-23 23:15:15

by Larry McVoy

Subject: Re: Yet another linux filesytem: with version control

On Mon, Jul 23, 2001 at 07:29:36PM -0300, Rik van Riel wrote:
> Now if you want to make this kernel-accessible, why
> not make a userland NFS daemon which uses something
> like bitkeeper or PRCS as its backend ?
>
> The system would then look like this:
>
> _____ _______ _____ _____
> | | | | | | | |
> | SCM |--| UNFSD |--| NET |--| NFS |
> |_____| |_______| |_____| |_____|
>
>
> And there, you have a transparent SCM filesystem
> that works over the network ... without ever having
> to modify the kernel or implement SCM.

I like the way you think, Rik. About 2 years ago I did a very quick and ugly
version of exactly this, just as a proof of concept. You could mount old
versions of the repositories and diff them, etc. Quite cool. It's long
since out of date and it adds a layer of caching and performance loss that
I wasn't willing to live with, but it's a cool idea. When we have more time
than problems I might get back to that. I think it is the right approach.

As to the comments he made about mixing files, that's not a problem. You
do need some way to tell UNFSD that this file is to be revision controlled
and that one is not, but with that you can let .o's be created and just
managed in the backing file system. Works fine. The interface to
revision control stuff seems ugly because you have to be explicit, but that
can be made nice. Suppose we used fake subdirectories as a way of doing
operations, such that

mv *.c ./.checkin

does a checkin, etc. That's not so bad and you need the interface anyway
to tell the system you are ready to check things in. You don't want it to
check in a new version every time you modify the file, that's excessive.
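
To make the fake-subdirectory idea a little more concrete: a userland NFS server could look at the destination of each rename and treat a move into the magic directory as a check-in request. The C sketch below shows only that path test; `rename_is_checkin` and `CHECKIN_DIR` are hypothetical names, and what the server does once the test succeeds is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name of the fake subdirectory. */
#define CHECKIN_DIR ".checkin"

/* Return true if `dst` names a file directly inside the magic directory,
 * e.g. "./.checkin/foo.c" as produced by "mv foo.c ./.checkin". */
bool rename_is_checkin(const char *dst)
{
    assert(dst != NULL);

    const char *slash = strrchr(dst, '/');
    if (slash == NULL)
        return false;                   /* no directory part at all */

    /* Walk back to the start of the parent directory component. */
    const char *p = slash;
    while (p > dst && p[-1] != '/')
        p--;

    size_t len = (size_t)(slash - p);
    return len == strlen(CHECKIN_DIR) && strncmp(p, CHECKIN_DIR, len) == 0;
}
```

Any other rename would be passed straight through to the backing filesystem, which is how unversioned files such as .o's could keep working unchanged.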
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2001-07-23 23:31:05

by Rik van Riel

Subject: Re: Yet another linux filesytem: with version control

On Tue, 24 Jul 2001, Jerome de Vivie wrote:
> Rik van Riel wrote:

> > Hmmmm, so it's not completely transparent. Good.
>
> You only set a global variable to select on which configuration
> you want to work. You can't do it simplier Rik: everything else
> is transparent: read, write, ... !

*nod*

Sounds like a great idea indeed.

> > Now if you want to make this kernel-accessible, why
> > not make a userland NFS daemon which uses something
> > like bitkeeper or PRCS as its backend ?
> >
> > The system would then look like this:
> >
> > _____ _______ _____ _____
> > | | | | | | | |
> > | SCM |--| UNFSD |--| NET |--| NFS |
> > |_____| |_______| |_____| |_____|
>
> Your architecture is too complex for me.

But you only have to implement 10% of it, the rest already
exists.

You already have:
1) Source Control Management system (SCM)
2) Userland NFS daemon (UNFSD)
3) network layer
4) NFS filesystem support (for every OS!)

All you need is a backend for the NFS server daemon to
get its files from a version control system (the SCM)
instead of from disk.

> > And there, you have a transparent SCM filesystem
> > that works over the network ... without ever having
> > to modify the kernel or implement SCM.
>
> I can't do it outside the kernel.

So choose the appropriate "magic directories" for the
NFS daemon ... maybe even "magic mount paths" ?

You're looking at reimplementing the 90% which is
already there (the versioning and the filesystem code)
while leaving the other 10% (the management code) for
a later date ;)

regards,

Rik
--
Executive summary of a recent Microsoft press release:
"we are concerned about the GNU General Public License (GPL)"

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com/

2001-07-24 02:14:01

by Keith Owens

Subject: Re: Yet another linux filesytem: with version control

On Mon, 23 Jul 2001 23:06:34 +0200,
Jerome de Vivie wrote:
>Handling multiples versions is a tough challenge (...even in the linux
>kernel). Working under software configuration management (SCM) helps
>but with some overhead; and it works only if everybody support it.

FYI, you do not need this for the kernel. kbuild 2.5 already supports
multiple source trees for building the linux kernel. Current beta is
http://prdownloads.sourceforge.net/kbuild/kbuild-2.5-2.4.7-2.gz, read
Documentation/kbuild/kbuild-2.5.txt.

2001-07-24 05:25:13

by Albert D. Cahalan

Subject: Re: Yet another linux filesytem: with version control

Larry McVoy writes:

> b) Filesystem support for SCM is really a flawed approach. No matter how
> much you hate all SCM systems out there, shoving the problem into the
> kernel isn't the answer. All that means is that you have an ongoing
> battle to keep your VFS up to date with the kernel. Ask Rational
> how much fun that is...

I'm sure it is a pain to maintain, but consider recovery
with revision control in your root filesystem:

LILO: linux init=/bin/sh rootfsopts=ver:/bin/sh@@/main/1

Nice, isn't it? You can trash /bin/* all you want.

Distributed filesystems like Coda seem to get pretty close
to having revision control anyway. They need something like
it for conflict resolution.

The traditional revision control approach seems to get pretty
wasteful as well. Maybe you have a few dozen developers, each
with a few files checked out of a multi-gigabyte source tree.
The kernel solution has less trouble sharing resources among
all the developers, especially when people share a machine.

2001-07-24 05:34:35

by Larry McVoy

Subject: Re: Yet another linux filesytem: with version control

> > b) Filesystem support for SCM is really a flawed approach. No matter how
> > much you hate all SCM systems out there, shoving the problem into the
> > kernel isn't the answer. All that means is that you have an ongoing
> > battle to keep your VFS up to date with the kernel. Ask Rational
> > how much fun that is...
>
> I'm sure it is a pain to maintain, but consider recovery
> with revision control in your root filesystem:
>
> LILO: linux init=/bin/sh rootfsopts=ver:/bin/sh@@/main/1
>
> Nice, isn't it? You can trash /bin/* all you want.

Yeah, that's cool. I'm with you in spirit on this one Albert, I've long
promoted that we use revision control for all the config files (stuff
like /etc/sendmail.cf, etc).

And we have customers who use BitKeeper to manage their entire OS, I mean
all the binaries are in there.

That said, I'd really urge people to listen to Rik, he has the right idea
with the user level NFS idea. There is no good reason and a lot of bad
reasons to put this stuff in the kernel.

I realize that since this is our business that my credibility is low,
you'll expect that I'm pushing this because it somehow benefits us (how,
I'm not sure, but I have faith that someone will think that). Anyway,
that's not the case, this is purely from a kernel point of view, I think
this is a dead end.

Useful stuff would be the copy on write file system, that's good for SCM
and other things. And the user level NFS approach. That way if you hate
the BK license you can plug PRCS or CVS or my-favorite-SCM system into the
back end. I'd much rather see that than BK in the kernel. Yuck.

> Distributed filesystems like Coda seem to get pretty close
> to having revision control anyway. They need something like
> it for conflict resolution.

Yeah! No kidding. If Coda had this I think there is a reasonable chance
that most SCM systems would go away. Certainly the trivial ones would.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2001-07-24 06:06:27

by Alexander Viro

Subject: Re: Yet another linux filesytem: with version control

On Mon, 23 Jul 2001, Larry McVoy wrote:

> That said, I'd really urge people to listen to Rik, he has the right idea
> with the user level NFS idea. There is no good reason and a lot of bad
> reasons to put this stuff in the kernel.

> > Distributed filesystems like Coda seem to get pretty close
> > to having revision control anyway. They need something like
> > it for conflict resolution.
>
> Yeah! No kidding. If Coda had this I think there is a reasonable chance
> that most SCM systems would go away. Certainly the trivial ones would.

CODA servers tend to be simpler than NFS ones (stateful protocol,
commit-on-close, all file IO handled by local fs code, you name it).
Full-blown Venus is, indeed, a lurking horror from beyond, but that's a
different story - nightmarish stuff is in the distributed fs part. As a
glue for userland fs CODA wins hands down (BTW, that goes not only for
simplicity of code, but for performance and deadlock avoidance reasons).

There's a whole shitcan of worms around the semantics of versioned
fs, though - e.g. what happens if you create a link to an old version of
file? What happens if you rename an old version away? What happens if you
rename _over_ it? There are obvious answers to that (e.g. all versions
except the last one are read-only and can be freely moved around or removed;
all association between them is semblance of names), but I doubt that
any of the easy variants will satisfy those who want that stuff. Personally,
I'd go for "you can take a read-only snapshot of a subtree and then bind
its parts anywhere you want", but that's not the only variant and I really
doubt that _any_ variant would satisfy everyone.

No matter what implementation you choose, semantics will be a fscking
minefield and I'd rather _not_ see that flamewar on l-k. If somebody cares
to set a maillist - great, but let's keep it separate from l-k. This stuff
has a potential for flamewar worse than devfs, forked-files, bk licensing
and CML2 ones combined (and is very likely to resurrect the first two, in
bargain).

2001-07-24 09:32:05

by Padraig Brady

Subject: Re: Yet another linux filesytem: with version control

Larry McVoy wrote:

>
> Useful stuff would be the copy on write file system, that's good for SCM
> and other things

Ooh ooh! Are there any filesystems at present that support copy on write?
It seems like a very useful feature that would be relatively easy to
implement (just store a hash for each file in its inode). With the amount
of duplicate files on my system (see freshmeat.net/projects/fslint) it
would be very useful. write() already supports ENOSPC because of holes in
files etc. There would be a large overhead, though, as the hash for a file
would have to be generated on each write()? For a "revision control"
filesystem it would probably be more appropriate to work at the block
level instead? Hmm, would snapFS be appropriate for this?
http://uwsg.iu.edu/hypermail/linux/kernel/0103.0/0436.html
Sorry, just thinking out loud..

Padraig.

2001-07-24 13:17:46

by Andrew Pimlott

Subject: Re: Yet another linux filesytem: with version control

On Mon, Jul 23, 2001 at 11:06:34PM +0200, Jerome de Vivie wrote:
> From CVS to ClearCase, i haven't seen any easy tool. I feel a real
> need to handle SCM simply.

I think your approach is too simple. ClearCase is a monster, but at
the core is conceptually sound (assuming the goal is file-based
control, not change-set-based; Rational has tried to layer a
change-set-based product on top of ClearCase, and I hear it is a
mess). By comparison you are missing some important things, some of
which I will try to point out.

> Here's the main features:
>
> -no check-out/check-in

(You do have check-in, you just call it something else.)

> When a C-file is created,

Presumably this is an explicit operation? What system call?

> the label "init" is put onto. The first
> write on a C-file create a private copy for the user who run the
> process. This C-file is added to a "User File List" (UFL). This
> private copy is now selected by the FS in place of version "init".
> Each user can start his own private copy by writting into a C-file.

per-user? So how do I let another developer look at what I'm
working on? In ClearCase, it's one private version per-view, which
is much more flexible.

Does the private copy know which label it was branched from? This
is essential.

> When a developper has reach a step and, would like to share his work;
> he creates a new label.

Ie, check-in by a different name. What system call?

> This label will be put on every private copy
> listed in the UFL and, the UFL is zeroed.

If I have to check in all files at once, it is even more important
that I be able to have multiple "views". What if, in the middle of
a big change, I make a small fix that I want to check in
independently?

> First, if the C-file is into the UFL, we have a private copy to
> select. Else, we choose the version labeled by "$CONFIGURATION". If
> such version does not exist, we search the version marked by the
> nearest "parent" label (at least, label "init" match).

You just threw away the most useful feature of filesystem
integration: comparing different versions. How do I do this if
everything is keyed off $CONFIGURATION?

I really don't see what you've gained over CVS. (Once you add in
all the little things you didn't mention: setting up the filesystem,
adding files to version control, etc, I don't think you can argue
that your system is simpler.)

Also, what if you create a label, but forget to update
$CONFIGURATION, and start to make more changes? You can just say
"stupid user", but the fact that this failure mode exists is a wart.

> In userland, we need:
> -a "mklabel" tool.
> -use a "CONFIGURATION" environment variable.
> -use existing tool for "merge" operations.

- setup filesystem
- add file to version control
- list labels, private files (what system calls?)

How will the existing merge tool work, if a single process can only
see one $CONFIGURATION?

Here's my conclusion: The overall semantics of a version control
system are non-trivial and should be kept out of the kernel. The
real win with kernel integration is transparent, flexible, read-only
access to versions. Your scheme puts unnecessary stuff in the
kernel, without getting the most important thing right.

(The only other potential win I see with kernel integration is
check-out-on-write, but that doesn't sound like a big deal to me.)

Andrew

2001-07-24 13:30:18

by Olivier Galibert

Subject: Re: Yet another linux filesytem: with version control

On Mon, Jul 23, 2001 at 08:30:48PM -0300, Rik van Riel wrote:
> You already have:
> 1) Source Control Management system (SCM)
> 2) Userland NFS daemon (UNFSD)
> 3) network layer
> 4) NFS filesystem support (for every OS!)
>
> All you need is a backend for the NFS server daemon to
> get its files from a version control system (the SCM)
> instead of from disk.

Stupid question maybe, but if you already have knfsd running on the
box serving perfectly normal directories (not unheard of with servers
after all), how do you tie in your own userland nfs server?

OG.

2001-07-24 16:41:01

by Jerome de Vivie

Subject: Re: Yet another linux filesytem: with version control

Hi Rik,

Rik van Riel wrote:
>
> On Tue, 24 Jul 2001, Jerome de Vivie wrote:
> > You only set a global variable to select on which configuration
> > you want to work. You can't do it simplier Rik: everything else
> > is transparent: read, write, ... !
>
> *nod*
>
> Sounds like a great idea indeed.

thx ;-)

>
> > > Now if you want to make this kernel-accessible, why
> > > not make a userland NFS daemon which uses something
> > > like bitkeeper or PRCS as its backend ?
> > >
> > > The system would then look like this:
> > >
> > > _____ _______ _____ _____
> > > | | | | | | | |
> > > | SCM |--| UNFSD |--| NET |--| NFS |
> > > |_____| |_______| |_____| |_____|
> >
> > Your architecture is too complex for me.

I've re-thought my draft and... your architecture is
not so complex!

Here are the pros of a userland SCM:
-easier to write
-easier to maintain (and no synchronization with kernel development)
-works on top of every type of FS
-portable
-forces me not to touch the FS and to write a proper interface between
the SCM extension and the FS.

And the cons:
-multiple entry points to access the data ( => risk of inconsistency)
-perhaps a filesystem is the best place to put files (...even
multiple-version files)

As A. Viro mentioned, doing it in the kernel may lead
to "devfs like" problems (...even after big simplifications
like "one node for all versions of a file").

I've changed my opinion a bit: I'm not sure that userland
is the best place (...because there are cons pending), but
I'm now closer to the userland solution of "hacking an nfsd".

j.

--
Jerome de Vivie jerome . de - vivie @ wanadoo . fr

2001-07-24 17:11:52

by Jerome de Vivie

Subject: Re: Yet another linux filesytem: with version control

Andrew Pimlott wrote:
>
> On Mon, Jul 23, 2001 at 11:06:34PM +0200, Jerome de Vivie wrote:
> > From CVS to ClearCase, i haven't seen any easy tool. I feel a real
> > need to handle SCM simply.
>
> I think your approach is too simple. ClearCase is a monster, but at
> the core is conceptually sound (assuming the goal is file-based
> control, not change-set-based; Rational has tried to layer a
> change-set-based product on top of ClearCase, and I hear it is a
> mess). By comparison you are missing some important things, some of
> which I will try to point out.
>
> > Here's the main features:
> >
> > -no check-out/check-in
>
> (You do have check-in, you just call it something else.)

Yes: co is now a "copy on write", so it's automatic.

>
> > When a C-file is created,
>
> Presumably this is an explicit operation? What system call?

Yes, it's explicit. I am now thinking about a userspace solution, but
I would have added an "O_CREAT-like" flag on open, or used an ioctl.

> per-user? So how do I let another developer look at what I'm
> working on? In ClearCase, it's one private version per-view, which
> is much more flexible.

No.

> Does the private copy know which label it was branched from? This
> is essential.

Yes.

>
> > When a developper has reach a step and, would like to share his work;
> > he creates a new label.
>
> Ie, check-in by a different name. What system call?

Yes. Probably with an ioctl (but now with a user command !)

>
> > This label will be put on every private copy
> > listed in the UFL and, the UFL is zeroed.
>
> If I have to check in all files at once, it is even more important
> that I be able to have multiple "views". What if, in the middle of
> a big change, I make a small fix that I want to check in
> independently?

It's impossible. If you want to go back, you have to put a label on
each step you want and set $CONFIGURATION to this label.

>
> > First, if the C-file is into the UFL, we have a private copy to
> > select. Else, we choose the version labeled by "$CONFIGURATION". If
> > such version does not exist, we search the version marked by the
> > nearest "parent" label (at least, label "init" match).
>
> You just threw away the most useful feature of filesystem
> integration: comparing different versions. How do I do this if
> everything is keyed off $CONFIGURATION?

With two processes and shared memory, it should be possible, but I
haven't thought about it more deeply.
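
A minimal sketch of the selection rule quoted above (private copy first, then the version labeled by $CONFIGURATION, then the nearest parent label), written as plain userspace C. Only the label tree comes from the original proposal; `struct version`, `struct cfile`, the `private_copy` field and `select_version()` are hypothetical names used for illustration, and file contents are stood in for by strings.

```c
#include <stddef.h>

/* Label tree as described in the proposal; "init" is the root. */
struct label {
    int id;
    const char *name;
    const struct label *parent;   /* NULL for the root label "init" */
};

/* Hypothetical: one public version of a C-file and the label put on it. */
struct version {
    const struct label *label;
    const char *data;             /* stand-in for the file contents */
};

/* Hypothetical: a C-file as seen by one user. */
struct cfile {
    const struct version *versions;  /* public, labeled versions */
    size_t nversions;
    const char *private_copy;        /* non-NULL if the file is in this user's UFL */
};

/* Return the contents visible under `config`, or NULL if nothing matches. */
const char *select_version(const struct cfile *f, const struct label *config)
{
    /* Rule 1: a private copy listed in the UFL always wins. */
    if (f->private_copy != NULL)
        return f->private_copy;

    /* Rules 2 and 3: try the configured label, then each ancestor in turn. */
    for (const struct label *l = config; l != NULL; l = l->parent) {
        for (size_t i = 0; i < f->nversions; i++) {
            if (f->versions[i].label == l)
                return f->versions[i].data;
        }
    }
    return NULL;
}
```

With a label chain init <- A <- B and a file that only has versions labeled "init" and "A", selecting under B falls back to the version labeled A. Because the result depends on a single `config` argument, one caller sees one version at a time, which is the limitation raised about diff.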

>
> I really don't see what you've gained over CVS. (Once you add in
> all the little things you didn't mention: setting up the filesystem,
> adding files to version control, etc, I don't think you can argue
> that your system is simpler.)

A developer has a minimum of operations to do:
-set his configuration
-commit his work

That's all ! No branch, no config-spec, no view server, no vob server,
no registry server, no ci, no co, ...

>
> Also, what if you create a label, but forget to update
> $CONFIGURATION, and start to make more changes? You can just say
> "stupid user", but the fact that this failure mode exists is a wart.

1. You stop working on this new "branch".
2. You commit your work with a new label.
3. You set $CONFIGURATION to the right label and merge the previous
work into it.

>
> How will the existing merge tool work, if a single process can only
> see one $CONFIGURATION?

Same as for diff (...but now, obsolete)

>
> Here's my conclusion: The overall semantics of a version control
> system are non-trivial and should be kept out of the kernel. The
> real win with kernel integration is transparent, flexible, read-only
> access to versions. Your scheme puts unnecessary stuff in the
> kernel, without getting the most important thing right.
>
> (The only other potential win I see with kernel integration is
> check-out-on-write, but that doesn't sound like a big deal to me.)

Copy-on-write was the first new idea. Using the same system
(labelization) to identify both individual versions and configurations
is also a neat idea. The last one is "hacking the nfsd" (thx Rik !)
I'm sure that we can handle SCM differently.

regards,

j.

--
Jerome de Vivie jerome . de - vivie @ wanadoo . fr

2001-07-24 19:08:12

by Jan Harkes

Subject: Re: Yet another linux filesytem: with version control

On Tue, Jul 24, 2001 at 01:24:57AM -0400, Albert D. Cahalan wrote:
> The traditional revision control approach seems to get pretty
> wasteful as well. Maybe you have a few dozen developers, each
> with a few files checked out of a multi-gigabyte source tree.

Ouch, but that is a lot more difficult in kernel space than it sounds.
Every developer would have his own personal view on the same filesystem.

One problem is how to identify a developer. By his uids/gids? This is
either not fine-grained enough, or breaks with setuid/setgid processes.
By the process group id or session id? These are already used by shells
for signal handling and typically don't follow a user's identity. AFS
uses yet another 'session identifier', the process authentication group.

Maybe some of the session information can be stored in the vfsmount
structure, or it might already be solved by Al's namespaces patch and
can be 'set' by remounting a file system. Perhaps the security module
work will give the stuff to track actions of a specific user.

Then the various versions/views of a file need to be kept separate
from each other in the pagecache, which involves having a separate
inode/address_space for each filehandle. On the other hand, when two
developers are working with the same revision they expect UNIX sharing
semantics, so in these cases at least the address_space does need to be
shared.

This actually should work as a result of how Coda handles container
files as long as we aggressively unhash dentries and have iget return new
inodes each time, a checked-out revision can then be stored in a
separate container file. But as a result there would be many more
upcalls to userspace, i.e. a serious performance penalty.

Jan

2001-07-24 19:15:52

by Andrew Pimlott

Subject: Re: Yet another linux filesytem: with version control

On Tue, Jul 24, 2001 at 07:14:02PM +0200, Jerome de Vivie wrote:
> Andrew Pimlott wrote:
> > per-user? So how do I let another developer look at what I'm
> > working on? In ClearCase, it's one private version per-view, which
> > is much more flexible.
>
> No.

So you're saying if I have a file on my UFL, there's no way anyone
else can see it unless I copy it to another filesystem?

> > If I have to check in all files at once, it is even more important
> > that I be able to have multiple "views". What if, in the middle of
> > a big change, I make a small fix that I want to check in
> > independently?
>
> It's impossible. If you want to go back, you have to put a label on
> each step you want and, set the $CONFIGURATION to this label.

Again, this seems exceedingly restrictive.

> > You just threw away the most useful feature of filesystem
> > integration: comparing different versions. How do I do this if
> > everything is keyed off $CONFIGURATION?
>
> With 2 process and shared memory, it should be possible but i haven't
> though deeper.

Standard tools, please. (Can I tell you how painful I would find
ClearCase if I had to use their diff instead of GNU diff?)

> > I really don't see what you've gained over CVS. (Once you add in
> > all the little things you didn't mention: setting up the filesystem,
> > adding files to version control, etc, I don't think you can argue
> > that your system is simpler.)
>
> A developper has a minimum operation to do:
> -set his configuration
> -commit his work
>
> That's all ! No branch, no config-spec, no view server, no vob server,
> no registery server, no ci, no co, ...

I said, compared to CVS, not ClearCase! The analog in CVS is
- cvs checkout
- cvs update

The only advantages you have are 1) you don't have to specify the
repository/modules and 2) you're faster.

Also, you have left out at least one important step. Say I set
CONFIGURATION=A, do my work, and label it with B. How do other
developers know to switch to B? What if they're already working
off A--how do they merge up their private copies?

If you say your system is not intended for concurrent development, I
think it is not worth doing. And from what I can see, you're
building in restrictions that would make concurrent development
hard.

> > How will the existing merge tool work, if a single process can only
> > see one $CONFIGURATION?
>
> Same as for diff (...but now, obolete)

But you said "existing" merge tool.

What do you mean by obsolete? You don't mean to say that the need
for merging is eliminated, do you?

> Using the same system
> (labelization) to identify both individual version and configuration
> is also a neat idea.

It is neat, but eventually will become a pain in the neck. You'll
need a way to come up with a unique label for every checkin, so you
will inevitably just decide to use incrementing numbers, so pretty
soon you will end up with files having versions 1, 5, 329, and
18473. Ugh.

Andrew

2001-07-24 23:12:44

by Jerome de Vivie

Subject: Re: Yet another linux filesytem: with version control

Andrew Pimlott wrote:
>
> On Tue, Jul 24, 2001 at 07:14:02PM +0200, Jerome de Vivie wrote:
> > Andrew Pimlott wrote:
> > > per-user? So how do I let another developer look at what I'm
> > > working on? In ClearCase, it's one private version per-view, which
> > > is much more flexible.
> >
> > No.
>
> So you're saying if I have a file on my UFL, there's no way anyone
> else can see it unless I copy it to another filesystem?

Yes, that's exactly what I want: a working file must be private unless
the developer has decided to share it. An individual developer is not
impacted by external changes (as with the "LATEST" rule in ClearCase)
and doesn't interact with other developers. That's very important in
SCM !

> > > If I have to check in all files at once, it is even more important
> > > that I be able to have multiple "views". What if, in the middle of
> > > a big change, I make a small fix that I want to check in
> > > independently?
> >
> > It's impossible. If you want to go back, you have to put a label on
> > each step you want and, set the $CONFIGURATION to this label.
>
> Again, this seems exceedingly restrictive.

Regression is exactly what we try to avoid when we work under SCM. What
is done is done. If you really want, you can labelize after each write,
but you must NOT modify the past !

Labelizing is the same thing as doing ci/co, but at a coarser grain.
This level must exactly match your needs ( ... and with less overhead).

> > > You just threw away the most useful feature of filesystem
> > > integration: comparing different versions. How do I do this if
> > > everything is keyed off $CONFIGURATION?
> >
> > With 2 process and shared memory, it should be possible but i haven't
> > though deeper.
>
> Standard tools, please. (Can I tell you how painful I would find
> ClearCase if I had to use their diff instead of GNU diff?)

Ok, I am now leaning more toward a userspace SCM. Perhaps I will use a
naming convention a la ClearCase (i.e. filename@@label) and, with this
namespace, you will be able to use all your favourite UNIX tools.
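
As a rough illustration of the filename@@label convention, a userspace server would first have to split an incoming name at the "@@" separator before looking up the version. The sketch below assumes the first "@@" in the name is the separator and that a name without one means "use the current $CONFIGURATION"; the function name `split_versioned_name()` and its return codes are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "path@@label" into its two parts.
 * Returns  1 when a "@@" separator is present (both buffers filled),
 *          0 when the name is a plain path (label is set to ""),
 *         -1 when one of the output buffers is too small.
 */
int split_versioned_name(const char *name,
                         char *path, size_t pathsz,
                         char *label, size_t labelsz)
{
    const char *sep = strstr(name, "@@");          /* first separator, if any */
    size_t plen = sep ? (size_t)(sep - name) : strlen(name);
    const char *lab = sep ? sep + 2 : "";
    size_t llen = strlen(lab);

    /* Each buffer needs room for its string plus the terminating NUL. */
    if (plen >= pathsz || llen >= labelsz)
        return -1;

    memcpy(path, name, plen);
    path[plen] = '\0';
    memcpy(label, lab, llen + 1);                  /* copies the NUL as well */
    return sep ? 1 : 0;
}
```

For "foo.c@@A" this yields the path "foo.c" and the label "A", so a command such as diff could be handed two such names and see two versions from one process.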

> I said, compared to CVS, not ClearCase! The analog in CVS is
> - cvs checkout
> - cvs update
>
> The only advantages your have are 1) you don't have to specify the
> repository/modules and 2) you're faster.

CVS deals with versioning and not configuration management, so you
can't compare them.

>
> Also, you have left out at least one important step. Say I set
> CONFIGURATION=A, do my work, and label it with B. How do other
> developers know to switch to B?

Labels are public, and I hope there are meetings organized between
developers !

> What if they're already working
> off A--how do they merge up their private copies?

With the naming scheme above:
$merge filename@@A filename@@B

>
> If you say your system is not intended for concurrent development, I
> think it is not worth doing. And from what I can see, you're
> building in restrictions that would make concurrent development
> hard.

??????????????????????????
? Where have I said this ?
??????????????????????????

> > Using the same system
> > (labelization) to identify both individual version and configuration
> > is also a neat idea.

>
> It is neat, but eventually will become a pain in the neck. You'll
> need a way to come up with a unique label for every checkin, so you
> will inevitably just decide to use incrementing numbers, so pretty
> soon you will end up with files having versions 1, 5, 329, and
> 18473. Ugh.

The first goal of SCM is to physically identify your software. This
goal is achieved. After that, it's up to you to choose a good naming
convention for labels. And yes, it's neat ;-)

regards,

j.

--
Jerome de Vivie jerome . de - vivie @ wanadoo . fr

2001-07-24 23:58:22

by Peter A. Castro

Subject: Re: Yet another linux filesytem: with version control

On Tue, 24 Jul 2001, Jerome de Vivie wrote:

> Rik van Riel wrote:
> >
> > On Mon, 23 Jul 2001, Larry McVoy wrote:
> >
> > > b) Filesystem support for SCM is really a flawed approach.
> >
> > Agreed. I mean, how can you cleanly group changesets and
> > versions with a filesystem level "transparent" SCM ?
>
> With label !
>
> In my initial post, i have explain that labels are used to
> identify individual files AND are also uses to select for
> each files of a set, one version (= select a configuration).
> It works !

.. and essentially you've re-created Rational's ClearCase implementation.
The problem becomes: how will you specify that label for file version
selection? Will it be part of the filename? Will it be implied in a
configuration specifier (config spec)? Will that config spec be global
to the system, local to the user or just that session? Will it be stored
in a file or part of the filesystem's mount parameters?

These are the same problems Rational faced with ClearCase and its mvfs.
To maintain a config spec design you'll need essentially a database to
contain the labels and their relationship to a given version & branch of
a particular file. So, suddenly it's not just a filesystem, it's now a
database with external chunks of data.

> > The goal of an SCM is to _manage_ versions and changesets,
> > if it doesn't do that we're back at CVS's "every file its
> > own versioning and to hell with manageability" ...

Really, the whole of the problem needs to be reviewed, not just the
individual parts. I seem to recall someone implementing a filesystem
that stored the files in a Postgres database that did versioning of files
in a simple way. I thought that was rather novel, at the time. You
really need to think out the unifying mechanism first. The storage of
versions of each file will be an end result. Think more about how the
user will actually use it and manipulate the selection.

> versioning is yet a first step.
>
> j.

--
Peter A. Castro <[email protected]> or <[email protected]>
"Cats are just autistic Dogs" -- Dr. Tony Attwood

2001-07-25 00:50:21

by Andrew Pimlott

Subject: Re: Yet another linux filesytem: with version control

[ This will probably be my last message to include linux lists; tell
me if you want a Cc: ]

On Wed, Jul 25, 2001 at 01:14:57AM +0200, Jerome de Vivie wrote:
> Andrew Pimlott wrote:
> > So you're saying if I have a file on my UFL, there's no way anyone
> > else can see it unless I copy it to another filesystem?
>
> Yes, that's exactly what i want: a working file must be private unless
> the developper as decide to share it . An individual developper is not
> impacted by external changes (like with the "LATEST" rule in clearcase)
> and doesn't interact with other developpers. That's very important in
> SCM !

Of course, there must be isolation, I'm just saying you picked the
wrong level at which to do it. In ClearCase, I have a view that
only I usually work on, but I can still ask some other developer to
look at the changes I'm making.

> > > > If I have to check in all files at once, it is even more important
> > > > that I be able to have multiple "views". What if, in the middle of
> > > > a big change, I make a small fix that I want to check in
> > > > independently?
> > >
> > > It's impossible. If you want to go back, you have to put a label on
> > > each step you want and, set the $CONFIGURATION to this label.
> >
> > Again, this seems exceedingly restrictive.
>
> Regression is exactly what we try to avoid when we work under SCM. What
> is done is done. If you really want, you can labelize after each write
> but your must NOT modify the past !

I must not have been clear. What I'm saying is that your scheme
makes it impossible for one user to have multiple independent
working branches at the same time. In ClearCase, I can have one
view for my big project that I won't check in (or at least, won't
merge into a common branch) for a month, but another view on which I
make bug fixes that should go quickly into the mainstream.

> Ok, now i am more oriented throw a userspace SCM. Perhaps i will use a
> naming convention a la clearcase (ie: filename@@label ) and, with this
> namespace, you will be able to use all your favourite UNIX tools.

Cool! Of course, now you have non-standard filesystem semantics; I
don't mind, but I don't know about the VFS guys :-) BTW, in
ClearCase, it's filename@@/label, and filename@@ is a directory that
you can chdir into (but that doesn't show up in directory listings).

> > I said, compared to CVS, not ClearCase! The analog in CVS is
> > - cvs checkout
> > - cvs update
> >
> > The only advantages your have are 1) you don't have to specify the
> > repository/modules and 2) you're faster.
>
> CVS deals with versionning and not configuration management, so you
> can't compare them.

Oh, come on. "Configuration management" is at most a thin layer
over version control (and at least a fancy term for the same thing).
At least, according to any definition I've ever seen. What
definition do you use? Anyway, ClearCase is certainly no more
"configuration management" than CVS. If you're talking about
"change set" stuff (ie, Rational's "Unified Change Management"),
then compare to "something like CVS, except that works in change
sets".

What specifically is not comparable in my example? If I had added
"cvs tag", would that be better?

> > Also, you have left out at least one important step. Say I set
> > CONFIGURATION=A, do my work, and label it with B. How do other
> > developers know to switch to B?
>
> Labels are public and i hope there are meeting organized between
> developpers !

Put it this way: In your scheme, every checkin implicitly and
automatically creates a branch (right?). So there is significant
branch management to do, and you haven't given any hints as to how
to do it, which makes me skeptical, especially since branches aren't
first-class objects. But maybe an example would help me.

Here is another issue: say A and B are labels, and I set
CONFIGURATION=A and change file a. Now, I set CONFIGURATION=B,
change file b, and try to create a new label. Presumably this
should fail, but how exactly? I think this will be hard to do
cleanly at the kernel level. In order to get reasonable
diagnostics, you'll need user-space tools that can do all the same
logic, which suggests that this should all be user-space to begin
with.
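
To make the failure case concrete, here is a small userspace sketch of the check a labeling command could run before committing the UFL. It assumes, as stated earlier in the thread, that each private copy remembers the label it was branched from; `struct ufl_entry`, `base_label_id` and `ufl_common_base()` are hypothetical names, not part of the proposal.

```c
#include <stddef.h>

/* Hypothetical UFL entry: a private copy and the label it was branched from. */
struct ufl_entry {
    const char *path;
    int base_label_id;
};

/*
 * Check that every private copy in the UFL shares one base label.
 * Returns  0 and stores that label in *base on success,
 *         -1 if the list is empty,
 *          or the 1-based position of the first entry whose base
 *          differs from the first entry's base (so a tool can name
 *          the offending file in its diagnostic).
 */
int ufl_common_base(const struct ufl_entry *ufl, size_t n, int *base)
{
    if (n == 0)
        return -1;

    for (size_t i = 1; i < n; i++) {
        if (ufl[i].base_label_id != ufl[0].base_label_id)
            return (int)i + 1;
    }

    *base = ufl[0].base_label_id;
    return 0;
}
```

In the scenario above, file a (branched from A) and file b (branched from B) would make the check return 2, and the tool could report which file blocks the new label instead of a bare error code from the kernel.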

> > If you say your system is not intended for concurrent development, I
> > think it is not worth doing. And from what I can see, you're
> > building in restrictions that would make concurrent development
> > hard.
>
> ??????????????????????????
> ? Where have I said this ?
> ??????????????????????????

Of course, you haven't said this, but I think you've created design
limitations that imply it. Things like views only visible to one
user; one user can have only one view; can't have one command access
multiple versions (fixed with "version-extended" names); and the
branching issue I mentioned; all make it seem unsuitable for
large-scale development.

I hope you can change this impression!

Andrew