LinuxLists.cc - Versioning File Systems?

2002-04-18 15:06:00

Subject: Versioning File Systems?

I just read an article mentioned on Slashdot,
<http://www.sigmaxi.org/amsci/Issues/Comsci02/Compsci2002-05.html>.

It is a fascinating short summary of the history of hard disks (they
still use the same fundamental design as the very first one) and an
update on current technology (disks are no longer aluminum). It also
looks at today's 120 gigabyte disk and muses over the question of how
we might ever put an imagined 120 terabyte disk to use. And that got
me thinking various thoughts, one of which turns into a question for this list:
Is there any work going on to make a versioning file system?

I remember in VMS that I could accumulate "myfile.txt;1",
"myfile.txt;2", etc., until the local admin got pissed at me for using
up all the disk space with my several megabytes of redundant files.
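
A minimal sketch of the VMS-style numbering recalled above, assuming plain
files named "name;N" in one directory. The helper names (list_versions,
save_version, purge) are hypothetical and are not taken from VMS or from
any existing Linux tool.

```python
import os
import re


def list_versions(directory, name):
    """Return the sorted version numbers present for 'name' (files 'name;N')."""
    pattern = re.compile(re.escape(name) + r";(\d+)")
    versions = []
    for entry in os.listdir(directory):
        match = pattern.fullmatch(entry)
        if match:
            versions.append(int(match.group(1)))
    return sorted(versions)


def save_version(directory, name, data):
    """Write data as the next 'name;N' file and return the new version number."""
    version = max(list_versions(directory, name), default=0) + 1
    with open(os.path.join(directory, f"{name};{version}"), "w") as fh:
        fh.write(data)
    return version


def purge(directory, name, keep=1):
    """Delete all but the newest 'keep' versions (what the admin wanted)."""
    versions = list_versions(directory, name)
    doomed = versions[:max(len(versions) - keep, 0)]
    for version in doomed:
        os.remove(os.path.join(directory, f"{name};{version}"))
```

Every save costs a full copy here, which is exactly the disk-space problem
described; a real design would presumably store deltas or share blocks.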

It is time for Linux to start figuring out ways to use all the disk
space that is on the horizon! In a few weeks the sweet spot will be
to buy a pair of 80 GB disks. Disks are outpacing even Red Hat's
"everything" install.

Seriously, I have a server in the basement with a pair of 60 GB RAID 1
disks that protect me against likely hardware failure, but they don't
protect me against: "# rm -rf /*". They don't even let me easily back
out a bad RPM from Red Hat.

I guess I am suggesting that the (more constructive) discussions over
desirable BitKeeper and CVS features consider what it would mean for a
filesystem to absorb some of the key underlying features of each.

As a first crack, I am imagining a file system that records every (or
nearly every) change to every file with time stamps and sequence
numbering. I don't know what all the primitives would be. It seems
obvious that much of making sense of it all would have to happen in
userland. Making this too powerful almost brings up some science
fiction problems of time travel through parallel universes, but I
think it could be kept grounded by looking at it as a powerful version
of existing backup systems: they don't have such problems because they
are too cumbersome for them to arise very often.

-kb, the Kent who thinks his journaled filesystem on redundant disks
next needs a better memory.

2002-04-18 15:20:27

by Larry McVoy

Subject: Re: Versioning File Systems?

On Thu, Apr 18, 2002 at 11:05:58AM -0400, Kent Borg wrote:
> Seriously, I have a server in the basement with a pair of 60 GB RAID 1
> disks that protect me against likely hardware failure, but they don't
> protect me against: "# rm -rf /*". They don't even let me easily back
> out a bad RPM from Red Hat.

To protect against rm -rf /, you need backups, not RAID. We do that here
with scripts which just mirror the whole file system to a different drive
every night. Saves us a ton of grief and gives us a very simplistic
version control system. I do stuff like

diff foo.c /nightly/$PWD

all the time for data which isn't in a version control system.
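
The nightly scripts themselves are not shown in the thread. As a rough
sketch of the idea only, the following mirrors a tree and diffs a live file
against last night's copy. The function names and the full-copy strategy
are assumptions; a real setup would more likely use an incremental tool.

```python
import difflib
import os
import shutil


def mirror(source_root, nightly_root):
    """Replace the nightly mirror with a fresh full copy of source_root."""
    if os.path.exists(nightly_root):
        shutil.rmtree(nightly_root)
    shutil.copytree(source_root, nightly_root, symlinks=True)


def diff_against_mirror(path, source_root, nightly_root):
    """Return unified-diff lines: last night's copy of path versus the live file."""
    relative = os.path.relpath(path, source_root)
    old_path = os.path.join(nightly_root, relative)
    with open(old_path) as old, open(path) as new:
        return list(difflib.unified_diff(old.readlines(), new.readlines(),
                                         fromfile=old_path, tofile=path))
```

Note that nightly_root must live outside source_root, otherwise the copy
would recurse into itself.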

> I guess I am suggesting the (more constructive) discussions over
> desirable Bitkeeper and CVS features consider what it would mean for a
> filesystem to absorb some of the key underlying features of each.

It's certainly a fun space, file system hacking is always fun. There
doesn't seem to be a good match between file system operations and
SCM operations, especially stuff like checkin. write != checkin.
But you can handle that with

echo "I'm done" >> foo.c/checkin

i.e., when the file is treated as a directory, use the rest of the
pathname as the operation. Could be cool.
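
To make the pathname trick concrete, here is a small sketch of how a lookup
routine might split such a path. It is illustrative only: split_operation
and the is_regular_file callback are hypothetical names, and a real
implementation would live in the filesystem's lookup code, not in Python.

```python
def split_operation(path, is_regular_file):
    """Split 'foo.c/checkin' into ('foo.c', 'checkin').

    is_regular_file is a callable that says whether a path prefix names an
    ordinary file. Whatever follows the first such prefix is treated as the
    operation. Returns (path, None) when no prefix is a regular file.
    """
    parts = path.split("/")
    for i in range(1, len(parts)):
        prefix = "/".join(parts[:i])
        if prefix and is_regular_file(prefix):
            return prefix, "/".join(parts[i:])
    return path, None
```

With a file src/foo.c present, "src/foo.c/checkin" splits into the file and
the operation "checkin", while "src/foo.c" on its own has no operation.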

One other thing you might consider is gluing an SCM system into
the user-level NFS server. That has the nice attribute that you
can export your file system/SCM system. And/or Samba.

The real issue with all of this is that you can make it work
locally by extending your pathname semantics or some other
thing, but I've never figured out how to make it work remotely
without hacking the remote OS. Cross platform is important,
at least it is commercially.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2002-04-18 15:28:15

by Lars Marowsky-Bree

Subject: Re: Versioning File Systems?

On 2002-04-18T08:20:25,
Larry McVoy <[email protected]> said:

> It's certainly a fun space, file system hacking is always fun. There
> doesn't seem to be a good match between file system operations and
> SCM operations, especially stuff like checkin. write != checkin.
> But you can handle that with

Either that, or heuristics - file not written to / opened for writing in x
minutes -> commit.
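
A sketch of that idle-time heuristic, with time passed in explicitly so it
is easy to reason about. The class and method names are made up for
illustration, and "commit" here just means recording the path.

```python
class IdleCommitter:
    """Commit a file once it has gone idle_seconds without a write."""

    def __init__(self, idle_seconds):
        self.idle_seconds = idle_seconds
        self.last_write = {}   # path -> time of the most recent write
        self.committed = []    # (path, commit time) pairs, in commit order

    def note_write(self, path, now):
        """Record that path was written at time 'now' (restarts its idle timer)."""
        self.last_write[path] = now

    def tick(self, now):
        """Commit every file that has been quiet long enough; return their paths."""
        quiet = sorted(p for p, t in self.last_write.items()
                       if now - t >= self.idle_seconds)
        for path in quiet:
            del self.last_write[path]
            self.committed.append((path, now))
        return quiet
```

A file that keeps being written never commits under this rule, which is
one reason the choice of x matters.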

That would actually be pretty interesting because it might also allow you to
back out editor screwups ;-)

However, deducing change sets is more difficult.

Sincerely,
Lars Marowsky-Brée <[email protected]>

--
Immortality is an adequate definition of high availability for me.
--- Gregory F. Pfister

2002-04-18 16:51:27

by Kerl, John

Subject: RE: Versioning File Systems?

Is it just me or is this sounding a lot like
ClearCase? In their filesystem (I don't know
if they implement it in user space or kernel
space, but I do remember ClearCase on Solaris
did do some kernel mods), file names are really
directories, e.g. foo.c is current; foo.c/main/3
is a (perhaps different) specified version.

& for recovering from editor screwups, one could
easily imagine "vi foo.c/-3" to recover the file
from 3 saves ago, etc.

By "deducing change sets", is the question, how
to associate various versions of *different* files?
I.e. recovering an editor screw-up of a single
file is easy, but how do you back out that RPM
you just installed, which might have affected
many files? Here ClearCase uses "labels",
each of which associates *one* name with the specified
versions of many files. So you could set your
"view" (in ClearCase terms) to /tuesday, etc.
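
The two ideas here, "N saves ago" and a label naming one version of many
files, can be sketched with an in-memory store. This is not how ClearCase
is implemented; VersionStore and its methods are hypothetical names chosen
for the example.

```python
class VersionStore:
    """Toy per-file history with relative lookup and cross-file labels."""

    def __init__(self):
        self.history = {}   # file name -> list of contents, oldest first
        self.labels = {}    # label -> {file name: version number}

    def save(self, name, content):
        """Append a new version and return its number (numbering starts at 1)."""
        self.history.setdefault(name, []).append(content)
        return len(self.history[name])

    def read(self, name, version=None, back=0):
        """Read a given version, or the one 'back' saves before the current one."""
        versions = self.history[name]
        if version is None:
            version = len(versions) - back
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]

    def label(self, label):
        """Attach one name to the current version of every file."""
        self.labels[label] = {n: len(v) for n, v in self.history.items()}

    def view(self, label):
        """Return {file name: content} as it was when the label was applied."""
        return {n: self.history[n][ver - 1]
                for n, ver in self.labels[label].items()}
```

Backing out a bad RPM then amounts to reading files through the label
applied before the install, for example view("tuesday").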

When I used ClearCase in prior jobs, I loved
it -- it was a joy *because* it looked like
a plain old filesystem (e.g. vi foo.c) when you
wanted to think of it that way, but it also
had full-featured version control.

Is the idea being discussed to open-source
something of that nature, and make it into
a filesystem?

2002-04-18 16:55:32

by Kent Borg

Subject: Re: Versioning File Systems?

On Thu, Apr 18, 2002 at 05:27:58PM +0200, Lars Marowsky-Bree wrote:
> Either that, or heuristics - file not written to / opened for writing in x
> minutes -> commit.

Something like that.

We already have a hierarchy of degrees of saving:

1. live state - the state of a program's data, possibly extended by
undo/redo features.

2. file - saved file, possibly extended by features like emacs'
"file.c~"

3. revision - revision checked into some revision control system

4. checkpoint or tag - revision branded with a symbolic name in a
revision control system

I am envisioning a richer version of the file stage. Just as users
currently decide when to check in a version and when to checkpoint
versions, I am imagining that sort of decision would still be made,
but there would be a lower level of granularity that could be looked
at if desired. Big infrequent changes to a file would all be
recorded, and frequent little changes would be subject to some
heuristic. It doesn't make sense to record a file's state so often
that it isn't even self-consistent. For example, recording all the
changes over the course of the save of a big Star Office drawing would
be silly, most would be intermediate and dependent on the changing
ephemeral internal state of Star Office. I don't know the details of
a reasonable heuristic other than obvious things such as when a file
is flushed or closed or not touched for some significant time.

> That would actually be pretty interesting because it might also allow you to
> back out editor screwups ;-)

Writing an editor to take advantage of such underlying features would
be pretty interesting too, it could be integrated into undo/redo
features.

Navigating such an historical fabric turns into a really interesting
user interface problem.

> However, deducing change sets is more difficult.

I think change sets for source code would still be based on versions
declared by a human to be of some specific interest. But change sets
for a computer's configuration might be implicit in the running of rpm
or chkconfig, or reboots of the system, or saved edits to
configuration files. Etc.
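
One way to picture implicit change sets: every file change gets a sequence
number, and anything recorded while a named activity (an rpm run, say) is
in progress is tagged with that activity. This is a sketch under that
assumption; ChangeLog and its methods are hypothetical names.

```python
from contextlib import contextmanager


class ChangeLog:
    """Sequence-numbered change records, optionally grouped into change sets."""

    def __init__(self):
        self.changes = []     # (sequence number, path, change set name or None)
        self._current = None  # name of the change set in progress, if any

    @contextmanager
    def change_set(self, name):
        """Tag every change recorded inside the 'with' block with this name."""
        previous, self._current = self._current, name
        try:
            yield
        finally:
            self._current = previous

    def record(self, path):
        """Record one change to path under the current change set, if any."""
        self.changes.append((len(self.changes) + 1, path, self._current))

    def paths_in(self, name):
        """List the paths changed as part of the named change set."""
        return [p for _, p, cs in self.changes if cs == name]
```

Changes made outside any such block stay ungrouped, which matches the idea
that source code change sets would still be declared by a human.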

Certainly what I am envisioning would have immediate use in looking at
changes to specific files, but would require more structure imposed to
be useful as a system configuration management tool or source code
control system.

I do point out that recently Microsoft announced some sort of feature
to let users back out system changes. It sounds useful to me and I run
Linux, but shouldn't that have some basic system support and not be
kludged in? (For example, such a feature could be added to rpm, but
it would only be good at capturing things done by rpm.) Would a
versioning filesystem be part of doing it the right way?

-kb

2002-04-18 17:03:56

by Joshua MacDonald

Subject: Re: Versioning File Systems?

On Thu, Apr 18, 2002 at 12:55:30PM -0400, Kent Borg wrote:
> On Thu, Apr 18, 2002 at 05:27:58PM +0200, Lars Marowsky-Bree wrote:
>
> > That would actually be pretty interesting because it might also allow you to
> > back out editor screwups ;-)
>
> Writing an editor to take advantage of such underlying features would
> be pretty interesting too, it could be integrated into undo/redo
> features.
>
> Navigating such an historical fabric turns into a really interesting
> user interface problem.

There was a paper presented at SCM8 on just such a system. They used Emacs.

Multi-Grain Version Control in the Historian System
Makram Abu-Shakra and Gene L. Fisher
California Polytechnic State University, USA

This paper describes Historian, a version control system that supports
comprehensive versioning and features to aid history
navigation. Comprehensive versioning is supported through frequent and
automated creation of versions which typically results in a large number
of versions. To reduce user overhead in history navigation, the
hierarchical structure present in most documents is utilized to support
fine-grained version control. The series of document editing operations is
also organized hierarchically and can be used for navigation as well.

-josh

2002-04-18 17:24:20

by Florin Iucha

Subject: Re: Versioning File Systems?

http://www.netcraft.com.au/geoffrey/katie/

florin

On Thu, Apr 18, 2002 at 09:51:13AM -0700, Kerl, John wrote:
>
> {SNIP}
> Is the idea being discussed to open-source
> something of that nature, and make it into
> a filesystem?

--

"If it's not broken, let's fix it till it is."

41A9 2BDE 8E11 F1C5 87A6 03EE 34B3 E075 3B90 DFE4

2002-04-18 18:14:58

by Kent Borg

Subject: Re: Versioning File Systems?

On Thu, Apr 18, 2002 at 12:24:19PM -0500, Florin Iucha wrote:
> http://www.netcraft.com.au/geoffrey/katie/

Very interesting.

Looking at the docs that come in the sources, Katie appears to be
(mostly) Perl code that stores its data in PostgreSQL and uses NFS to
loop it back as a filesystem of normal looking files, hidden directories
for access to old versions, and a command line program for doing all
other CVS-ish functions.

Glad to see there is such a nice conceptual testbed for what I was
looking for, but this isn't it directly.

Am I crazy or would it be possible to create a versioning file system
on the model of the canonical ext2? It would sit on top of a rather
stupid block device and present something that, at first glance, looks
like a traditional filesystem. A complete superset: create a file by
creating a file, read a file by reading a file, delete a file by
deleting a file, and make it all happen at a low enough level to even
boot from it.

The extra features would, of course, need additional means for access;
I don't know the ramifications of such a complete filesystem having
such things as extra hidden-ish directories for accessing old
versions. (I worry about standard utilities tripping over virtual
contents--I know that /proc and /dev do strange things when I forget
and pretend they are simply files.)

-kb

2002-04-18 18:11:59

by Jeremy Jackson

Subject: Re: Versioning File Systems?

For the RPM case, where the RPM db can be out of sync
with the filesystem if the rpm command is interrupted
(not to mention db corruption), simply take an LVM or EVMS
snapshot of the fs before doing anything. I haven't tried this yet,
but am working towards this:

eg:

init 1        # go to single user mode
# take a snapshot of /, /usr, /var etc. - everything rpm touches
rpm -Fvh *
# oops, power cable came out in the middle
# restore the snapshot to be the live version (how?)
init 3        # go back about your business, nothing to see here.

Jeremy

2002-04-18 23:25:30

by Stephen Oberholtzer

Subject: Re: Versioning File Systems?

At 08:20 AM 4/18/2002 -0700, Larry McVoy wrote:
>It's certainly a fun space, file system hacking is always fun. There
>doesn't seem to be a good match between file system operations and
>SCM operations, especially stuff like checkin. write != checkin.
>But you can handle that with

How about
fsync(fd) || close(fd) == checkin?
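
Read as "either an fsync or a close counts as a checkin", the rule could be
mocked up in user space like this. It is only a sketch: CheckinOnClose is a
hypothetical wrapper, and a real versioning filesystem would hook the
kernel's fsync and release paths instead.

```python
import os


class CheckinOnClose:
    """Wrap a writable file so that each fsync() or close() records a version."""

    def __init__(self, path, versions):
        self.path = path
        self.versions = versions    # list that receives one snapshot per checkin
        self._fh = open(path, "w")
        self._dirty = False         # True when written since the last checkin

    def write(self, data):
        self._fh.write(data)
        self._dirty = True

    def _checkin(self):
        """Snapshot the file contents, but only if something changed."""
        if not self._dirty:
            return
        self._fh.flush()
        with open(self.path) as snapshot:
            self.versions.append(snapshot.read())
        self._dirty = False

    def fsync(self):
        self._fh.flush()
        os.fsync(self._fh.fileno())
        self._checkin()

    def close(self):
        self._checkin()
        self._fh.close()
```

The dirty flag keeps an fsync followed directly by a close from producing
two identical versions.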

--
Stevie-O

Real programmers use COPY CON PROGRAM.EXE

2002-04-19 04:17:03

by Mark Mielke

Subject: Re: Versioning File Systems?

On Thu, Apr 18, 2002 at 07:19:47PM -0400, Stevie O wrote:
> At 08:20 AM 4/18/2002 -0700, Larry McVoy wrote:
> >It's certainly a fun space, file system hacking is always fun. There
> >doesn't seem to be a good match between file system operations and
> >SCM operations, especially stuff like checkin. write != checkin.
> >But you can handle that with
> How about
> fsync(fd) || close(fd) == checkin?

Source management systems usually work much better given explicit
control for the user.

ClearCase has MVFS to do what is being suggested. Compare:

cat a.c # currently selected version of a.c
cat a.c@@/main/5 # version 5 on the main branch

cat a.c@@LINUX_2.4.18 # the version of a.c selected by the
# label 'LINUX_2.4.18'
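
For readers unfamiliar with the "@@" notation, this sketch parses just the
three forms shown above. It is an illustration, not ClearCase's own parser:
parse_version_path is a hypothetical name, and it assumes the last path
component after a branch is a plain number.

```python
def parse_version_path(path):
    """Parse 'a.c', 'a.c@@/main/5' or 'a.c@@LABEL' into its components."""
    name, sep, selector = path.partition("@@")
    result = {"file": name, "branch": None, "version": None, "label": None}
    if not sep:
        return result                      # plain name: currently selected version
    if selector.startswith("/"):
        branch, _, number = selector.rpartition("/")
        result["branch"] = branch          # e.g. '/main'
        result["version"] = int(number)    # e.g. 5
    else:
        result["label"] = selector         # e.g. 'LINUX_2.4.18'
    return result
```

In every case the user names the version explicitly, which is the contrast
being drawn with a filesystem that creates versions implicitly.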

Having a file system that implicitly performs these operations is
not very useful.

mark

--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

2002-04-20 08:44:02

by Thomas Zimmerman

Subject: Re: Versioning File Systems?

On 18-Apr 12:55, Kent Borg wrote:
[snip]
> I am envisioning a richer version of the file stage. Just as users
> currently decide when to check in a version and when to checkpoint
> versions, I am imagining that sort of decision would still be made,
> but there would be a lower level of granularity that could be looked
> at if desired. Big infrequent changes to a file would all be
> recorded, and frequent little changes would be subject to some
> heuristic. It doesn't make sense to record a file's state so often
> that it isn't even self-consistent. For example, recording all the
> changes over the course of the save of a big Star Office drawing would
> be silly, most would be intermediate and dependent on the changing
> ephemeral internal state of Star Office. I don't know the details of
> a reasonable heuristic other than obvious things such as when a file
> is flushed or closed or not touched for some significant time.

Why not commit versions on sync and close? That would seem to carry the least
surprise for the user. When I sync a filesystem/dir, that would seem like a time
to make sure any changes make it to disk. And when a file is closed you don't
expect any more changes to that file.

>
> > That would actually be pretty interesting because it might also allow you to
> > back out editor screwup ;-)
>
> Writing an editor to take advantage of such underlying features would
> be pretty interesting too, it could be integrated into undo/redo
> features.
>
> Navigating such an historical fabric turns into a really interesting
> user interface problem.

Why teach current tools anything about it at all? Make this a tool you run on
the filesystem. If you _need_ to see earlier versions, it is far past time to be
hoping emacs did the right thing.

> > However, deducing change sets is more difficult.
>
> I think change sets for source code would still be based on versions
> declared by a human to be of some specific interest. But change sets
> for a computer's configuration might be implicit in the running of rpm
> or chkconfig, or reboots of the system, or saved edits to
> configuration files. Etc.
>
> Certainly what I am envisioning would have immediate use in looking at
> changes to specific files, but would require more structure imposed to
> be useful as a system configuration management tool or source code
> control system.
[snip MS vaporware envy]

Thomas

2002-04-23 22:45:47

by Bill Davidsen

Subject: Re: Versioning File Systems?

On Thu, 18 Apr 2002, Kent Borg wrote:

> I just read an article mentioned on Slashdot,
> <http://www.sigmaxi.org/amsci/Issues/Comsci02/Compsci2002-05.html>.
>
> It is a fascinating short summary of the history of hard disks (they
> still use the same fundamental design as the very first one) and an
> update on current technology (disks are no longer aluminum). It also
> looks at today's 120 gigabyte disk and muses over the question of how
> we might ever put an imagined 120 terabyte disk to use. And that got
> me thinking various thoughts, one of which turns into a question for this list:
> Is there any work going on to make a versioning file system?
>
> I remember in VMS that I could accumulate "myfile.txt;1",
> "myfile.txt;2", etc., until the local admin got pissed at me for using
> up all the disk space with my several megabytes of redundant files.

I seem to remember that some CD filesystem does that, and you can see
the versions with Linux if you mount with the right options.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.