2004-03-10 19:34:51

by Jörn Engel

Subject: [PATCH for testing] cow behaviour for hard links

Hi!

<disclaimer>
This is ugly, unfinished and some may consider it pure evil. So what!
</disclaimer>

Some months ago, I claimed that both filesystems and the buffer cache
should try to identify identical blocks, merge them to save space and
break them up again when one copy gets written to. This is commonly
known as copy-on-write, or cow.

Yeah, well, here it is, sort of. It works at file granularity instead
of block granularity, and doesn't do the cow part inside the kernel
(userspace gets an error and has to do it). But it works for ext2 and
ext3 and is relatively short.

Internals:
I introduced a new flag for inodes, switching between normal behaviour
and cow for hard links. The flag can be changed and queried via fcntl().
Ext[23] needed a bit of tweaking to write this flag to disk. open()
will fail when a) the cowlink flag is set, b) the inode has more than
one link and c) write access is requested.

The directory handling deserves some discussion. The flag has no
meaning per se for directories, but it is inherited by new files. Also,
moving or linking a non-cow file into a cow directory, or vice versa,
will fail. Should it fail? I don't know, but it made some sense to me.
If you have a good reason either way, please tell me.
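
Both the inheritance and the link restriction come down to bit operations, as in the namei.c hunks of the patch. A new file ORs in its directory's S_COWLINK bit. A link is refused when `(dir ^ inode) & S_COWLINK` is non-zero, which happens when exactly one of the two has the bit set. The sketch below reproduces that logic on plain integers. The value 256 mirrors the patch's S_COWLINK, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define S_COWLINK 256u	/* same bit value as in the patch */

/* Hypothetical helper: a new file or directory picks up the parent's bit. */
static unsigned inherit_flags(unsigned dir_flags, unsigned new_flags)
{
	return new_flags | (dir_flags & S_COWLINK);
}

/*
 * Hypothetical helper: linking is allowed only when directory and inode
 * agree on the cowlink bit. XOR leaves the bit set exactly on mismatch.
 */
static bool cowlink_flags_match(unsigned dir_flags, unsigned inode_flags)
{
	return ((dir_flags ^ inode_flags) & S_COWLINK) == 0;
}
```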

So here it is, please flame now!

Jörn

--
Fantasy is more important than knowledge. Knowledge is limited,
while fantasy embraces the whole world.
-- Albert Einstein


Makefile | 2 +-
fs/ext2/inode.c | 3 ++-
fs/ext3/inode.c | 4 +++-
fs/fcntl.c | 18 ++++++++++++++++++
fs/namei.c | 32 ++++++++++++++++++++++++++++++++
fs/open.c | 29 +++++++++++++++++++++++++++++
include/linux/fcntl.h | 3 +++
include/linux/fs.h | 3 +++
8 files changed, 91 insertions(+), 3 deletions(-)

--- linux-2.6.1/Makefile~cowlink 2004-03-08 00:47:30.000000000 +0100
+++ linux-2.6.1/Makefile 2004-03-08 00:47:45.000000000 +0100
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 1
-EXTRAVERSION =
+EXTRAVERSION = moo
 
 # *DOCUMENTATION*
 # To see a list of typical targets execute "make help"
--- linux-2.6.1/include/linux/fcntl.h~cowlink 2004-03-08 00:47:30.000000000 +0100
+++ linux-2.6.1/include/linux/fcntl.h 2004-03-08 00:47:45.000000000 +0100
@@ -23,6 +23,9 @@
 #define DN_ATTRIB 0x00000020 /* File changed attibutes */
 #define DN_MULTISHOT 0x80000000 /* Don't remove notifier */
 
+#define F_SETCOW (F_LINUX_SPECIFIC_BASE+3)
+#define F_GETCOW (F_LINUX_SPECIFIC_BASE+4)
+
 #ifdef __KERNEL__
 
 #if BITS_PER_LONG == 32
--- linux-2.6.1/include/linux/fs.h~cowlink 2004-03-08 00:47:30.000000000 +0100
+++ linux-2.6.1/include/linux/fs.h 2004-03-08 00:47:45.000000000 +0100
@@ -137,6 +137,9 @@
 #define S_DEAD 32 /* removed, but still open directory */
 #define S_NOQUOTA 64 /* Inode is not counted to quota */
 #define S_DIRSYNC 128 /* Directory modifications are synchronous */
+#define S_COWLINK 256 /* Hard links have copy on write semantics.
+                       * This flag has no meaning for directories,
+                       * but is inherited to directory children */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
--- linux-2.6.1/fs/fcntl.c~cowlink 2004-03-08 00:47:30.000000000 +0100
+++ linux-2.6.1/fs/fcntl.c 2004-03-08 01:18:59.000000000 +0100
@@ -282,6 +282,17 @@
 
 EXPORT_SYMBOL(f_delown);
 
+static long fcntl_setcow(struct file *filp, unsigned long arg)
+{
+	struct inode *inode = filp->f_dentry->d_inode;
+	if (arg)
+		inode->i_flags |= S_COWLINK;
+	else
+		inode->i_flags &= ~S_COWLINK;
+	inode->i_sb->s_op->write_inode(inode, 0);
+	return 0;
+}
+
 static long do_fcntl(unsigned int fd, unsigned int cmd,
 		unsigned long arg, struct file * filp)
 {
@@ -346,6 +357,13 @@
 		case F_NOTIFY:
 			err = fcntl_dirnotify(fd, filp, arg);
 			break;
+		case F_SETCOW:
+			err = fcntl_setcow(filp, arg);
+			break;
+		case F_GETCOW:
+			err = (filp->f_dentry->d_inode->i_flags & S_COWLINK) /
+				S_COWLINK;
+			break;
 		default:
 			break;
 	}
--- linux-2.6.1/fs/namei.c~cowlink 2004-03-08 00:47:30.000000000 +0100
+++ linux-2.6.1/fs/namei.c 2004-03-09 10:58:24.000000000 +0100
@@ -1141,6 +1141,8 @@
 	if (!error) {
 		inode_dir_notify(dir, DN_CREATE);
 		security_inode_post_create(dir, dentry, mode);
+		dentry->d_inode->i_flags |= dir->i_flags & S_COWLINK;
+		dentry->d_inode->i_sb->s_op->write_inode(dentry->d_inode, 0);
 	}
 	return error;
 }
@@ -1516,6 +1518,7 @@
 	if (!error) {
 		inode_dir_notify(dir, DN_CREATE);
 		security_inode_post_mkdir(dir,dentry, mode);
+		dentry->d_inode->i_flags |= dir->i_flags & S_COWLINK;
 	}
 	return error;
 }
@@ -1814,6 +1817,13 @@
 		return -EXDEV;
 
 	/*
+	 * Cowlink attribute is inherited from directory, but here,
+	 * the inode already has one. If they don't match, bail out.
+	 */
+	if ((dir->i_flags ^ old_dentry->d_inode->i_flags) & S_COWLINK)
+		return -EMLINK;
+
+	/*
 	 * A link to an append-only or immutable file cannot be created.
 	 */
 	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
@@ -1991,6 +2001,24 @@
 	return error;
 }
 
+static int cow_allow_rename(struct inode *old_dir, struct dentry *old_dentry,
+		struct inode *new_dir)
+{
+	/* source and target share directory: allow */
+	if (old_dir == new_dir)
+		return 0;
+	/* source and target directory have identical cowlink flag: allow */
+	if (! ((old_dentry->d_inode->i_flags ^ new_dir->i_flags) & S_COWLINK))
+		return 0;
+	/* We could always fail here, but cowlink flag is only defined for
+	 * files and directories, so let's allow special files */
+	if (S_ISREG(old_dentry->d_inode->i_mode))
+		return -EMLINK;
+	if (S_ISDIR(old_dentry->d_inode->i_mode))
+		return -EMLINK;
+	return 0;
+}
+
 int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	       struct inode *new_dir, struct dentry *new_dentry)
 {
@@ -2014,6 +2042,10 @@
 	if (!old_dir->i_op || !old_dir->i_op->rename)
 		return -EPERM;
 
+	error = cow_allow_rename(old_dir, old_dentry, new_dir);
+	if (error)
+		return error;
+
 	DQUOT_INIT(old_dir);
 	DQUOT_INIT(new_dir);
 
--- linux-2.6.1/fs/open.c~cowlink 2004-03-08 00:47:30.000000000 +0100
+++ linux-2.6.1/fs/open.c 2004-03-10 21:26:24.000000000 +0100
@@ -757,6 +757,33 @@
 
 EXPORT_SYMBOL(filp_open);
 
+/*
+ * Files with the S_COWLINK flag set cannot be written to, if more
+ * than one hard link to them exists. Ultimately, this function
+ * should copy the inode, assign the copy to the dentry and lower use
+ * count of the old inode - one day.
+ * For now, it is sufficient to return an error and let userspace
+ * deal with the messy part. Not exactly the meaning of
+ * copy-on-write, but much better than writing to fifty files at once
+ * and noticing months later.
+ *
+ * Yes, this breaks the kernel interface and is simply wrong. This
+ * is intended behaviour, so Linus will not merge the code before
+ * it is complete. Or will he?
+ */
+static int break_cow_link(struct inode *inode)
+{
+	if (!(inode->i_flags & S_COWLINK))
+		return 0;
+	if (!S_ISREG(inode->i_mode))
+		return 0;
+	if (inode->i_nlink < 2)
+		return 0;
+	/* TODO: As soon as sendfile can do normal file copies, use that
+	 * and always return 0 */
+	return -EMLINK;
+}
+
 struct file *dentry_open(struct dentry *dentry, struct vfsmount *mnt, int flags)
 {
 	struct file * f;
@@ -772,6 +799,8 @@
 	inode = dentry->d_inode;
 	if (f->f_mode & FMODE_WRITE) {
-		error = get_write_access(inode);
+		error = break_cow_link(inode);
+		if (!error)
+			error = get_write_access(inode);
 		if (error)
 			goto cleanup_file;
 	}
--- linux-2.6.1/fs/ext2/inode.c~cowlink 2004-03-08 00:47:30.000000000 +0100
+++ linux-2.6.1/fs/ext2/inode.c 2004-03-08 00:47:45.000000000 +0100
@@ -1020,6 +1020,7 @@
 {
 	unsigned int flags = EXT2_I(inode)->i_flags;
 
+	inode->i_flags = flags;
 	inode->i_flags &= ~(S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC);
 	if (flags & EXT2_SYNC_FL)
 		inode->i_flags |= S_SYNC;
@@ -1191,7 +1192,7 @@
 
 	raw_inode->i_blocks = cpu_to_le32(inode->i_blocks);
 	raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
-	raw_inode->i_flags = cpu_to_le32(ei->i_flags);
+	raw_inode->i_flags = cpu_to_le32(inode->i_flags);
 	raw_inode->i_faddr = cpu_to_le32(ei->i_faddr);
 	raw_inode->i_frag = ei->i_frag_no;
 	raw_inode->i_fsize = ei->i_frag_size;
--- linux-2.6.1/fs/ext3/inode.c~cowlink 2004-03-08 00:47:30.000000000 +0100
+++ linux-2.6.1/fs/ext3/inode.c 2004-03-08 00:47:45.000000000 +0100
@@ -2447,6 +2447,7 @@
 {
 	unsigned int flags = EXT3_I(inode)->i_flags;
 
+	inode->i_flags = flags;
 	inode->i_flags &= ~(S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC);
 	if (flags & EXT3_SYNC_FL)
 		inode->i_flags |= S_SYNC;
@@ -2629,7 +2630,8 @@
 	raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec);
 	raw_inode->i_blocks = cpu_to_le32(inode->i_blocks);
 	raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
-	raw_inode->i_flags = cpu_to_le32(ei->i_flags);
+	raw_inode->i_flags = cpu_to_le32((ei->i_flags & ~S_COWLINK) |
+					 (inode->i_flags & S_COWLINK));
 #ifdef EXT3_FRAGMENTS
 	raw_inode->i_faddr = cpu_to_le32(ei->i_faddr);
 	raw_inode->i_frag = ei->i_frag_no;
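
The TODO in break_cow_link() waits for sendfile to handle ordinary file copies. Later kernels do allow this: since Linux 2.6.33, sendfile(2) accepts any file as its output descriptor. The following is a hedged userspace sketch of that copy step, assuming a Linux system that provides <sys/sendfile.h>. The name `copy_with_sendfile` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy all of in_fd to out_fd with sendfile(2).
 * Needs Linux >= 2.6.33, where out_fd may be a regular file.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int copy_with_sendfile(int in_fd, int out_fd)
{
	struct stat st;
	off_t off = 0;	/* sendfile advances this, not in_fd's offset */

	if (fstat(in_fd, &st) < 0)
		return -1;
	while (off < st.st_size) {
		ssize_t n = sendfile(out_fd, in_fd, &off, st.st_size - off);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)	/* source shrank underneath us */
			break;
	}
	return 0;
}
```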


2004-03-10 19:36:35

by Jörn Engel

Subject: [Program for testing] cow behaviour for hard links

And here is the userspace program to fiddle with the cowlink flag.

Jörn

--
"Security vulnerabilities are here to stay."
-- Scott Culp, Manager of the Microsoft Security Response Center, 2001

/**
 * cowlink - set, unset and query the cowlink flag on files
 *
 * Copyright (C) 2004 Jörn Engel <[email protected]>
 *
 * This is *not* open source. The rights granted to anyone by the
 * author are the rights to
 * - look at the code,
 * - make verbatim copies and distribute them,
 * - compile it,
 * - send patches to the author.
 *
 * Nothing else. If you don't like the license, feel free to discuss it
 * over a beer. You can send me beer and discuss via email, if you like. ;)
 *
 * Seriously, this program shouldn't exist at all. It would make more sense
 * to merge it into chmod, at least in the author's opinion. We'll see...
 *
 * Oh yeah, the compiled binary is free, there are no strings attached to
 * it whatsoever.
 */
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* fcntl commands added by the cowlink kernel patch */
#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE 1024
#endif
#define F_SETCOW (F_LINUX_SPECIFIC_BASE+3)
#define F_GETCOW (F_LINUX_SPECIFIC_BASE+4)

static long mode = -1;		/* -1: leave flag alone, 0: clear, 1: set */
static long get = 0;		/* print the flag after any change */
static int recursive = 0;

static void do_file(const char *name)
{
	int fd, ret;

	fd = open(name, O_RDONLY);
	if (fd < 0) {
		perror(name);
		return;
	}

	if (mode == 0 || mode == 1) {
		ret = fcntl(fd, F_SETCOW, mode);
		if (ret) {
			perror(name);
			goto out;
		}
	}

	if (get) {
		ret = fcntl(fd, F_GETCOW);
		if (ret < 0) {
			perror(name);
			goto out;
		}
		printf("%d %s\n", ret, name);
	}
out:
	close(fd);
}

static void do_dir(const char *name)
{
	do_file(name);

	if (!recursive)
		return;

	DIR *dir = opendir(name);
	if (!dir) {
		/* Not a directory: do_file() above already handled it. */
		if (errno != ENOTDIR)
			perror(name);
		return;
	}

	/* Scratch buffer holding "name/<entry>" */
	size_t len = strlen(name);
	char *newname = malloc(len + 4096);
	if (!newname) {
		perror(name);
		closedir(dir);
		return;
	}
	strcpy(newname, name);
	newname[len] = '/';
	char *end = newname + len + 1;

	for (struct dirent *de = readdir(dir); de; de = readdir(dir)) {
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		strcpy(end, de->d_name);

		struct stat status;
		if (lstat(newname, &status)) {
			perror(newname);
			continue;
		}
		if (S_ISDIR(status.st_mode))
			do_dir(newname);
		else if (S_ISREG(status.st_mode))
			do_file(newname);
	}
	free(newname);
	closedir(dir);
}

int main(int argc, char **argv)
{
	static const char short_opts[] = "cgRrs";
	static const struct option long_opts[] = {
		{"clear",     0, 0, 'c'},
		{"get",       0, 0, 'g'},
		{"recursive", 0, 0, 'r'},
		{"set",       0, 0, 's'},
		{0, 0, 0, 0}
	};

	for (;;) {
		int oi = 0;
		int c = getopt_long(argc, argv, short_opts, long_opts, &oi);
		if (c == -1)
			break;
		switch (c) {
		case 'c':
			mode = 0;
			break;
		case 'g':
			get = 1;
			break;
		case 'R':	/* fall through */
		case 'r':
			recursive = 1;
			break;
		case 's':
			mode = 1;
			break;
		default:
			/* getopt_long() has already reported the bad option */
			fprintf(stderr, "usage: %s [-c|-s] [-g] [-r] file...\n",
				argv[0]);
			exit(EXIT_FAILURE);
		}
	}

	while (optind < argc)
		do_dir(argv[optind++]);

	return 0;
}

2004-03-10 21:34:30

by Jamie Lokier

Subject: Re: [PATCH for testing] cow behaviour for hard links

Jörn Engel wrote:
> Yeah, well, here it is, sort of. It works at file granularity instead
> of block granularity, and doesn't do the cow part inside the kernel
> (userspace gets an error and has to do it). But it works for ext2 and
> ext3 and is relatively short.
>
> Internals:
> I introduced a new flag for inodes, switching between normal behaviour
> and cow for hard links. The flag can be changed and queried via fcntl().
> Ext[23] needed a bit of tweaking to write this flag to disk. open()
> will fail when a) the cowlink flag is set, b) the inode has more than
> one link and c) write access is requested.

I like the idea!

I keep many hard-linked kernel trees, and local version management is
done by "cp -rl" to make new trees and then change a few files in
those trees, compile, test etc. To prevent changes in one tree
accidentally affecting other trees, I "chmod -R a-r" all but the tree
I'm currently working on.

That works quite nicely, but it'd be even nicer to not need the
"chmod", and just be confident that writes won't clobber files in
another tree by accident.

-- Jamie

2004-03-10 22:17:43

by Jörn Engel

Subject: Re: [PATCH for testing] cow behaviour for hard links

On Wed, 10 March 2004 21:34:27 +0000, Jamie Lokier wrote:
>
> I like the idea!

Thanks!

> I keep many hard-linked kernel trees, and local version management is
> done by "cp -rl" to make new trees and then change a few files in
> those trees, compile, test etc. To prevent changes in one tree
> accidentally affecting other trees, I "chmod -R a-r" all but the tree
> I'm currently working on.
>
> That works quite nicely, but it'd be even nicer to not need the
> "chmod", and just be confident that writes won't clobber files in
> another tree by accident.

Same here, that was my main motivation. Ultimately I'd like to see a
lot of SCM functionality inside regular filesystems and this is just
the first step.

Jörn

--
Good warriors cause others to come to them and do not go to others.
-- Sun Tzu

2004-03-10 23:08:13

by Sytse Wielinga

Subject: Re: [PATCH for testing] cow behaviour for hard links

On Wednesday 10 March 2004 20:34, Jörn Engel wrote:
> Internals:
> I introduced a new flag for inodes, switching between normal behaviour
> and cow for hard links. The flag can be changed and queried via fcntl().
> Ext[23] needed a bit of tweaking to write this flag to disk. open()
> will fail when a) the cowlink flag is set, b) the inode has more than
> one link and c) write access is requested.

I really like this! It makes me wonder why nobody has come up with this
idea before...

Unfortunately, it only solves half of the problem: it does make sure you
can't make any mistakes anymore, but it doesn't break the files up when a
process wants to write to a linked file. You still have to copy and move
every time you wish to edit a file. I'd like the kernel to do just that.
I'll have a look at whether I can write a patch, but I don't promise
anything, as I don't have much time :-)
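
The "copy and move" step can be done in two stages. First, copy the contents into a temporary file in the same directory. Then rename() that file over the name you want to edit. The rename atomically gives that name its own inode, while the other links keep the old one. This is a minimal POSIX sketch. The name `break_hard_link` is hypothetical, and the helper copies permission bits but not ownership, extended attributes or timestamps.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: give `path` a private inode by copying it to a
 * temp file in the same directory and renaming that over `path`.
 * Returns 0 on success (or if nothing was shared), -1 on error.
 */
static int break_hard_link(const char *path)
{
	struct stat st;
	char tmp[4096], buf[65536];
	ssize_t n;
	int len, in = -1, out = -1;

	if (stat(path, &st) < 0)
		return -1;
	if (!S_ISREG(st.st_mode) || st.st_nlink < 2)
		return 0;		/* not shared, nothing to break */

	/* Same directory as path, so rename() stays on one filesystem. */
	len = snprintf(tmp, sizeof(tmp), "%s.cowXXXXXX", path);
	if (len < 0 || (size_t)len >= sizeof(tmp))
		return -1;
	out = mkstemp(tmp);
	if (out < 0)
		return -1;
	in = open(path, O_RDONLY);
	if (in < 0)
		goto fail;

	while ((n = read(in, buf, sizeof(buf))) > 0) {
		for (char *p = buf; n > 0; ) {
			ssize_t w = write(out, p, (size_t)n);
			if (w < 0)
				goto fail;
			p += w;
			n -= w;
		}
	}
	if (n < 0 || fchmod(out, st.st_mode & 07777) < 0)
		goto fail;

	close(in);
	in = -1;
	if (close(out) < 0) {
		out = -1;
		goto fail;
	}
	out = -1;
	/* Atomically re-point the name; other links keep the old inode. */
	if (rename(tmp, path) < 0)
		goto fail;
	return 0;

fail:
	if (in >= 0)
		close(in);
	if (out >= 0)
		close(out);
	unlink(tmp);
	return -1;
}
```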

Sytse

2004-03-10 23:46:14

by Sytse Wielinga

Subject: Re: [PATCH for testing] cow behaviour for hard links

Oh, sorry, I've actually read your patch now and see that you already
have the function outline, nice remarks and all... I'll try to get the
link-breaking part done now, if you don't mind :-)

Sytse

2004-03-12 17:49:30

by Sytse Wielinga

Subject: Re: [PATCH for testing] cow behaviour for hard links

Hi,

I'm sorry to say this, but I stumbled upon a prohibitive problem...

The problem is that if a hard link were broken up, one of the dentries
would have to point to a new inode with a new inode number. This means
that, right under the app's nose, the file suddenly gets a new inode
number. Apps don't like that. If anyone has a suggestion that might make
this possible, please say so, but I don't see it.
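
The inode-number problem is easy to reproduce from userspace. After a name has been re-pointed at a fresh inode, a process that already holds a descriptor still refers to the old inode. Programs notice this by comparing (st_dev, st_ino) pairs. The helper names in this sketch are hypothetical, and the "new inode" is simulated by renaming another file over the name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: two stat results name the same file iff dev+ino match. */
static bool same_inode(const struct stat *x, const struct stat *y)
{
	return x->st_dev == y->st_dev && x->st_ino == y->st_ino;
}

/* Hypothetical helper: does open descriptor fd still match what path names? */
static bool fd_matches_path(int fd, const char *path)
{
	struct stat by_fd, by_path;

	if (fstat(fd, &by_fd) < 0 || stat(path, &by_path) < 0)
		return false;
	return same_inode(&by_fd, &by_path);
}
```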

I have made some pretty thorough changes to your patch though. You can find
the patch attached to this email.
Things I've changed:

- moved break_cow_link from dentry_open in open.c to get_write_access in
namei.c. Putting it in dentry_open thoroughly breaks things, as it's too late
to save files from being truncated, for example.
- made something from the mess you made of ext2/ext3 inode flags :-P
- removed inheritance, as it's not useful in any way, not expected and breaks
linking of files with S_COWLINK set.
- had a go at supporting reiserfs, but failed... my changes are in the
patch; could somebody please have a look and tell me what I've missed?
- fcntl_setcow now takes a spinlock

Sytse


Attachments:
(No filename) (1.11 kB)
cowlink.diff (9.56 kB)

2004-03-12 18:19:15

by Jörn Engel

Subject: Re: [PATCH for testing] cow behaviour for hard links

On Fri, 12 March 2004 18:48:57 +0100, Sytse Wielinga wrote:
>
> I'm sorry to say this, but I stumbled upon a prohibitive problem...
>
> The problem is that if a hard link were broken up, one of the dentries
> would have to point to a new inode with a new inode number. This means
> that, right under the app's nose, the file suddenly gets a new inode
> number. Apps don't like that. If anyone has a suggestion that might make
> this possible, please say so, but I don't see it.

Different design. How about this:
- Files with just one link remain as-is.
- Linking a file:
- Create a new inode and move all data into new inode.
- Make old inode a pointer to new inode.
- Create a second pointer to new inode.
- Unlinking a file:
- Unlink pointer inode
- Unlink target inode
- Writing to a pointer inode:
- Make pointer inode a regular one.
- Copy target inode data into former pointer inode.
- Unlink target inode
- If target count was 1, we don't even need to copy.

Or in ascii art:

Regular file:

inode 1

Second link:

inode 1 ---> inode 2
             ^
inode 3 -----|

Write to inode 1:

inode 1

inode 3 ---> inode 2

Unlink of inode 3:

inode 1
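
The pointer-inode scheme can be tried out as a toy in-memory model before touching a filesystem. Everything in this sketch is hypothetical. The struct and function names are invented here, the data is a fixed-size string, and `refs` stands in for the target's link count. It follows the steps listed above: linking moves the data into a shared target, writing through a pointer copies the data back out, and unlinking drops the target's count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy inode: regular if target == NULL, else a pointer. */
struct toy_inode {
	struct toy_inode *target;	/* non-NULL: data lives in *target */
	int refs;			/* for targets: pointer inodes using it */
	char data[64];			/* contents of a regular inode */
};

static struct toy_inode *toy_new(const char *data)
{
	struct toy_inode *i = calloc(1, sizeof(*i));

	assert(i != NULL);
	strncpy(i->data, data, sizeof(i->data) - 1);
	return i;
}

/* Drop one pointer's reference; free the target once nobody uses it. */
static void toy_put_target(struct toy_inode *t)
{
	if (--t->refs == 0)
		free(t);
}

/* Linking: move data into a new target, point both names at it. */
static struct toy_inode *toy_link(struct toy_inode *old)
{
	struct toy_inode *second;

	if (old->target == NULL) {
		old->target = toy_new(old->data);
		old->target->refs = 1;
		old->data[0] = '\0';
	}
	second = toy_new("");
	second->target = old->target;
	second->target->refs++;
	return second;
}

static const char *toy_read(const struct toy_inode *i)
{
	return i->target ? i->target->data : i->data;
}

/* Writing to a pointer inode: make it regular again, then modify it. */
static void toy_append(struct toy_inode *i, const char *s)
{
	if (i->target != NULL) {
		struct toy_inode *t = i->target;

		/* With t->refs == 1 the data could be moved, not copied. */
		memcpy(i->data, t->data, sizeof(i->data));
		i->target = NULL;
		toy_put_target(t);
	}
	strncat(i->data, s, sizeof(i->data) - 1 - strlen(i->data));
}

/* Unlinking: drop the name's inode and, if any, its target reference. */
static void toy_unlink(struct toy_inode *i)
{
	if (i->target != NULL)
		toy_put_target(i->target);
	free(i);
}
```

In the toy, "inode 2" is the shared target. Appending through inode 1 gives it its own data again, and unlinking inode 3 frees the target.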


It's not quite as simple and straightforward as my first design, but it
has some advantages. It would even be possible to extend it to allow
links across different filesystems.

Jörn

--
A quarrel is quickly settled when deserted by one party; there is
no battle unless there be two.
-- Seneca

2004-03-12 18:29:19

by Jörn Engel

Subject: Re: [PATCH for testing] cow behaviour for hard links

This assumes we keep the current design.

On Fri, 12 March 2004 18:48:57 +0100, Sytse Wielinga wrote:
>
> I have made some pretty thorough changes to your patch though. You can find
> the patch attached to this email.
> Things I've changed:
>
> - moved break_cow_link from dentry_open in open.c to get_write_access in
> namei.c. Putting it in dentry_open thoroughly breaks things, as it's too late
> to save files from being truncated, for example.

True, good catch.

> - made something from the mess you made of ext2/ext3 inode flags :-P

Good. Both variants of my mess worked, so I left it for the moment.

> - removed inheritance, as it's not useful in any way, not expected and breaks
> linking of files with S_COWLINK set.

Not useful? Without inheritance, I have to add the flag manually for
every file/directory I add. Each time I forget, writes go to both
files and I notice the mess weeks later. Nah, that's where we are now,
and why I created the patch in the first place.

What we do need, though, is a new errno. -EMLINK is close, but still
wrong.

> - had a go at supporting reiserfs, but failed... my changes are in the
> patch; could somebody please have a look and tell me what I've missed?

No clue, don't care. :)

> - fcntl_setcow now takes a spinlock

Not the only lock I missed. Good.


Jörn

--
He who knows that enough is enough will always have enough.
-- Lao Tsu

2004-03-13 19:24:51

by Pavel Machek

Subject: Re: [PATCH for testing] cow behaviour for hard links

Hi!

> > - removed inheritance, as it's not useful in any way, not expected and breaks
> > linking of files with S_COWLINK set.
>
> Not useful? Without inheritance, I have to add the flag manually for
> every file/directory I add. Each time I forget, writes go to both
> files and I notice the mess weeks later. Nah, that's where we are now,
> and why I created the patch in the first place.
>
> What we do need, though, is a new errno. -EMLINK is close, but still
> wrong.

I do not know your current design, but...

In an ideal world there would be no COW links. The system would
magically detect that you are doing cp -a, and would link at the
individual block level.

Well, that would probably be too fs-specific. But introducing a copyfile()
syscall, which would just link the inodes if the underlying fs
supported it, might be a good start. On the first write into one of
the linked files, the copy would be done...
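
No copyfile() syscall existed at the time. On present-day Linux, the idea can be approximated with the FICLONE ioctl, which `<linux/fs.h>` has defined since Linux 4.5. FICLONE asks a filesystem that supports shared extents (for example Btrfs, or XFS with reflink) to make the destination share the source's blocks, with copy-on-write on the next modification. The following is a hedged sketch. The name `copyfile_clone` is hypothetical, and the helper falls back to a plain byte copy wherever cloning is unsupported.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef FICLONE
#define FICLONE _IOW(0x94, 9, int)	/* value from <linux/fs.h> */
#endif

/*
 * Hypothetical helper: make dst_fd a copy of src_fd.
 * Returns 1 if blocks are shared via FICLONE, 0 if the bytes were
 * copied, -1 on error.
 */
static int copyfile_clone(int src_fd, int dst_fd)
{
	char buf[65536];
	ssize_t n;

	if (ioctl(dst_fd, FICLONE, src_fd) == 0)
		return 1;
	/* EOPNOTSUPP, EXDEV, EINVAL, ...: fall back to copying bytes */
	if (lseek(src_fd, 0, SEEK_SET) < 0)
		return -1;
	while ((n = read(src_fd, buf, sizeof(buf))) > 0) {
		for (char *p = buf; n > 0; ) {
			ssize_t w = write(dst_fd, p, (size_t)n);
			if (w < 0)
				return -1;
			p += w;
			n -= w;
		}
	}
	return n < 0 ? -1 : 0;
}
```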

The only disadvantage I see is that such links would not survive a
tar backup...
Pavel

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-03-13 19:48:35

by Jörn Engel

Subject: Re: [PATCH for testing] cow behaviour for hard links

On Sat, 13 March 2004 14:43:30 +0100, Pavel Machek wrote:
>
> I do not know your current design, but...
>
> In an ideal world there would be no COW links. The system would
> magically detect that you are doing cp -a, and would link at the
> individual block level.
>
> Well, that would probably be too fs-specific. But introducing a copyfile()
> syscall, which would just link the inodes if the underlying fs
> supported it, might be a good start. On the first write into one of
> the linked files, the copy would be done...

Agreed.

> The only disadvantage I see is that such links would not survive a
> tar backup...

That's not a problem either. Have a userspace program that checks all
files and hints for identical ones (new syscall, copyfile() cannot do
this without races). Depending on fs size, the necessary data can
grow into the gigabytes, but the code is just 200 lines.
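
The userspace side of this, finding candidate files to merge, can start as simply as comparing sizes and then bytes. The following is a minimal sketch. The name `files_identical` is hypothetical. A real tool would first group files by size and hash instead of comparing every pair. Any such check is also racy, because a file can change after it has been compared, so the kernel would have to re-verify the contents before merging.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: true if both are regular files with equal bytes. */
static bool files_identical(const char *a, const char *b)
{
	struct stat sa, sb;
	char ba[4096], bb[4096];
	bool same = false;
	FILE *fa, *fb;

	if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
		return false;
	if (!S_ISREG(sa.st_mode) || !S_ISREG(sb.st_mode) ||
	    sa.st_size != sb.st_size)
		return false;	/* cheap rejection before reading data */

	fa = fopen(a, "rb");
	fb = fopen(b, "rb");
	if (fa && fb) {
		for (;;) {
			size_t na = fread(ba, 1, sizeof(ba), fa);
			size_t nb = fread(bb, 1, sizeof(bb), fb);

			if (na != nb || memcmp(ba, bb, na) != 0)
				break;
			if (na == 0) {	/* both ended together */
				same = !ferror(fa) && !ferror(fb);
				break;
			}
		}
	}
	if (fa)
		fclose(fa);
	if (fb)
		fclose(fb);
	return same;
}
```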

Or did you mean the problem of tar backups growing *much* larger than
the real filesystem? Yes, tar becomes useless for backups then. :)

Jörn

--
Fancy algorithms are buggier than simple ones, and they're much harder
to implement. Use simple algorithms as well as simple data structures.
-- Rob Pike

2004-03-13 21:03:25

by Pavel Machek

Subject: Re: [PATCH for testing] cow behaviour for hard links

On Sat 13-03-04 20:48:27, Jörn Engel wrote:
> On Sat, 13 March 2004 14:43:30 +0100, Pavel Machek wrote:
> >
> > I do not know your current design, but...
> >
> > In an ideal world there would be no COW links. The system would
> > magically detect that you are doing cp -a, and would link at the
> > individual block level.
> >
> > Well, that would probably be too fs-specific. But introducing a copyfile()
> > syscall, which would just link the inodes if the underlying fs
> > supported it, might be a good start. On the first write into one of
> > the linked files, the copy would be done...
>
> Agreed.
>
> > The only disadvantage I see is that such links would not survive a
> > tar backup...
>
> That's not a problem either. Have a userspace program that checks all
> files and hints for identical ones (new syscall, copyfile() cannot do
> this without races). Depending on fs size, the necessary data can
> grow into the gigabytes, but the code is just 200 lines.

Hmm, I don't quite like a "copyfile if not modified" syscall, but even
without that it is useful...

> Or did you mean the problem of tar backups growing *much* larger than
> the real filesystem? Yes, tar becomes useless for backups then. :)

Yep, this is what I meant.
Pavel

--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2004-03-13 22:14:37

by Jörn Engel

Subject: Re: [PATCH for testing] cow behaviour for hard links

On Sat, 13 March 2004 14:43:30 +0100, Pavel Machek wrote:
>
> I do not know your current design, but...
>
> In an ideal world there would be no COW links. The system would
> magically detect that you are doing cp -a, and would link at the
> individual block level.
>
> Well, that would probably be too fs-specific. But introducing a copyfile()
> syscall, which would just link the inodes if the underlying fs
> supported it, might be a good start. On the first write into one of
> the linked files, the copy would be done...

On second thought, I have already introduced copyfile(), in a way: the
link() syscall is hijacked for exactly that purpose, and my ugly flag
multiplexes between the two behaviours. Yeah, ugly as hell, but enough
to get started and test things.

Once the data structures are clear and at least one filesystem can
deal with it, things should be stable enough to think about the user
interface more seriously.

Jörn

--
The only real mistake is the one from which we learn nothing.
-- John Powell

2004-03-15 07:48:09

by Jamie Lokier

Subject: Re: [PATCH for testing] cow behaviour for hard links

Pavel Machek wrote:
> > Or did you mean the problem of tar backups growing *much* larger than
> > the real filesystem? Yes, tar becomes useless for backups then. :)
>
> Yep, this is what I meant.

A different but related problem: rsync cannot back up my kernel
development directory from one hard disk to another, because it
contains lots of kernel trees mostly hard linked to each other. rsync
falls over trying to keep track of the roughly half a million links.

You might see similar problems trying to back up a heavily "copyfile"'d
filesystem.

-- Jamie

2004-03-15 10:27:17

by Jörn Engel

Subject: Re: [PATCH for testing] cow behaviour for hard links

On Mon, 15 March 2004 07:45:58 +0000, Jamie Lokier wrote:
> Pavel Machek wrote:
> > > Or did you mean the problem of tar backups growing *much* larger than
> > > the real filesystem? Yes, tar becomes useless for backups then. :)
> >
> > Yep, this is what I meant.
>
> A different but related problem: rsync cannot back up my kernel
> development directory from one hard disk to another, because it
> contains lots of kernel trees mostly hard linked to each other. rsync
> falls over trying to keep track of the roughly half a million links.
>
> You might see similar problems trying to back up a heavily "copyfile"'d
> filesystem.

And both are easily fixable, if you don't mind using 16 bytes of RAM
per inode. At least for rsync this should be a piece of cake compared
to the amount of memory already used. :)
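
That bookkeeping can be sketched as a set of (st_dev, st_ino) pairs. With 8-byte dev_t and ino_t, as on typical 64-bit Linux systems, that is presumably where the 16 bytes per inode come from. Each new multiply-linked file is checked against the set. The name `seen_inode` and the fixed-size open-addressing table are assumptions made for this example. They do not describe how tar or rsync actually implement it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

#define INODE_TABLE_SIZE 4096	/* hypothetical fixed capacity */

/* One entry per multiply-linked inode. */
struct inode_key {
	dev_t dev;
	ino_t ino;
};

struct inode_set {
	struct inode_key keys[INODE_TABLE_SIZE];
	bool used[INODE_TABLE_SIZE];
	size_t count;
};

/*
 * Hypothetical helper: returns true if st names an inode already
 * recorded, i.e. this path is another hard link to something we have
 * handled. Single-link files can never repeat, so they are not stored.
 */
static bool seen_inode(struct inode_set *set, const struct stat *st)
{
	uint64_t h;
	size_t i;

	if (st->st_nlink < 2)
		return false;

	h = ((uint64_t)st->st_ino ^ ((uint64_t)st->st_dev << 32)) *
	    0x9e3779b97f4a7c15ULL;
	i = (size_t)(h % INODE_TABLE_SIZE);

	while (set->used[i]) {		/* linear probing */
		if (set->keys[i].dev == st->st_dev &&
		    set->keys[i].ino == st->st_ino)
			return true;
		i = (i + 1) % INODE_TABLE_SIZE;
	}
	/* This toy table never grows; keep a free slot so probing ends. */
	assert(set->count < INODE_TABLE_SIZE - 1);
	set->used[i] = true;
	set->keys[i].dev = st->st_dev;
	set->keys[i].ino = st->st_ino;
	set->count++;
	return false;
}
```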

Jörn

--
When in doubt, use brute force.
-- Ken Thompson