2010-06-09 15:03:57

by Nikanth Karthikesan

Subject: [PATCH][RFC] Complex filesystem operations: split and join


I had a need to split a file into smaller files on a thumb drive that had no
free space left on it, or anywhere else in the system. When the filesystem
supports sparse files (truncate_range), I could create the new files while
punching holes in the original file. But when the underlying fs is FAT, I
couldn't. Besides, why should we do needless I/O when all I want is to
split/join files? All the data is already on the disk, under the same
filesystem; I just want to make some metadata changes.

So I added two inode operations, namely split and join, that let me tell the
OS that all I want is metadata changes, so that the filesystem can avoid
doing lots of I/O when only metadata changes are needed.

sys_split(fd1, n, fd2)
1. Attach the data of fd1 beyond the first n bytes to fd2.
2. Truncate fd1 to n bytes.

Roughly equivalent to the following commands (bs=1 so that skip counts
bytes):
1. dd if=file1 of=file2 bs=1 skip=n
2. truncate -c -s n file1
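For clarity, this is what the split step looks like when emulated with plain
userspace POSIX I/O, i.e. the slow path sys_split is meant to avoid (the
function name and buffer size here are mine, purely illustrative):

```c
#include <fcntl.h>
#include <unistd.h>

/* Userspace emulation of sys_split(fd1, n, fd2): copy everything after
 * offset n from fd1 into fd2, then truncate fd1 to n bytes. Unlike the
 * proposed syscall, every tail byte is read and written once, and the
 * tail needs temporary free space while both copies exist. */
static int split_emulated(int fd1, off_t n, int fd2)
{
	char buf[4096];
	ssize_t r;

	if (lseek(fd1, n, SEEK_SET) == (off_t)-1)
		return -1;
	while ((r = read(fd1, buf, sizeof(buf))) > 0)
		if (write(fd2, buf, (size_t)r) != r)
			return -1;
	if (r < 0)
		return -1;
	return ftruncate(fd1, n);
}
```

This loop is essentially what read/write-based tools such as GNU split do;
the proposed syscall replaces it with a cluster-chain relink.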

sys_join(fd1, fd2)
1. Extend fd1 with the data of fd2.
2. Truncate fd2 to 0.

Roughly equivalent to the following commands:
1. dd if=file2 of=file1 bs=1 seek=`filesize file1`
2. truncate -c -s 0 file2
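And the corresponding userspace emulation of the join step, again with
illustrative names of my own choosing:

```c
#include <fcntl.h>
#include <unistd.h>

/* Userspace emulation of sys_join(fd1, fd2): append all of fd2's data
 * to the end of fd1, then truncate fd2 to zero. Every byte of fd2 is
 * read and written once; the proposed syscall turns this whole loop
 * into a metadata-only cluster-chain append on FAT. */
static int join_emulated(int fd1, int fd2)
{
	char buf[4096];
	ssize_t r;

	if (lseek(fd1, 0, SEEK_END) == (off_t)-1)
		return -1;
	if (lseek(fd2, 0, SEEK_SET) == (off_t)-1)
		return -1;
	while ((r = read(fd2, buf, sizeof(buf))) > 0)
		if (write(fd1, buf, (size_t)r) != r)
			return -1;
	if (r < 0)
		return -1;
	return ftruncate(fd2, 0);
}
```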

Attached is the patch that adds these new syscalls and support for them
to the FAT filesystem.

I guess this approach could be extended to a splice()-like call between
files, instead of pipes. On a COW filesystem, such a splice could simply set
up blocks as shared between files instead of doing I/O. It would be a kind of
explicit online data deduplication. Later, when a file modifies any of those
blocks, we copy them, i.e., COW.

Thanks
Nikanth

p.s: Strangely, fibrils and syslets came to my mind when thinking along these
lines. But I guess fibrils and syslets are not really related to this.



From: Nikanth Karthikesan <[email protected]>
Subject: vfs and vfat: add filesystem operations: split and join

Add two new inode operations, namely split and join, exposed through the
syscalls sys_split and sys_join, with the following semantics.

sys_split(fd1, n, fd2)
1. Attach the data of fd1 beyond the first n bytes to fd2.
2. Truncate fd1 to n bytes.

sys_join(fd1, fd2)
1. Extend fd1 with the data of fd2.
2. Truncate fd2 to 0.

These avoid the unnecessary I/O that would be needed if the same were
accomplished using only read, write, and truncate. Using read/write would
also require temporary additional free space on filesystems that do not
support sparse files. The two files must belong to the same super block.

The split must happen on a cluster boundary, i.e., the split offset must be a
multiple of the cluster size (the filesystem block size). Likewise, for join,
the size of the destination file must be a multiple of the FAT cluster size.
Also, the syscalls are wired up only on x86_64 for now.
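The boundary rule is a simple alignment test; a sketch of the caller-side
validation, assuming the cluster size is known (the function name and the
4096-byte example value are mine, not part of the patch):

```c
/* A split offset is acceptable to the FAT implementation only if it
 * lands on a cluster boundary, i.e. is a multiple of the cluster size.
 * Cluster sizes are powers of two, so a mask test is enough; this
 * mirrors the nr_cluster shift round-trip check in fat_split(). */
static int offset_cluster_aligned(long long offset, unsigned long cluster_size)
{
	return (offset & (long long)(cluster_size - 1)) == 0;
}
```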

Some performance numbers for splitting a file into half of its size and then
concatenating it back together, with and without these syscalls (all times in
seconds):

filesize    sys_split & sys_join    read/write
1GB                0.080                56.557
2GB                0.117               116.140
3GB                0.112               144.658

Signed-off-by: Nikanth Karthikesan <[email protected]>

---


diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index ff4307b..0b9bdf8 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -663,6 +663,10 @@ __SYSCALL(__NR_rt_tgsigqueueinfo, sys_rt_tgsigqueueinfo)
__SYSCALL(__NR_perf_event_open, sys_perf_event_open)
#define __NR_recvmmsg 299
__SYSCALL(__NR_recvmmsg, sys_recvmmsg)
+#define __NR_split 300
+__SYSCALL(__NR_split, sys_split)
+#define __NR_join 301
+__SYSCALL(__NR_join, sys_join)

#ifndef __NO_STUBS
#define __ARCH_WANT_OLD_READDIR
diff --git a/fs/fat/file.c b/fs/fat/file.c
index 990dfae..81e426c 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -453,7 +453,118 @@ out:
}
EXPORT_SYMBOL_GPL(fat_setattr);

+/*
+ * Join the cluster chain of tail_inode to the end of head_inode.
+ */
+int fat_join(struct inode *head_inode, struct inode *tail_inode)
+{
+ struct super_block *sb = head_inode->i_sb;
+ struct msdos_sb_info *sbi = MSDOS_SB(sb);
+ int nr_cluster;
+ int ret = 0;
+
+ nr_cluster = head_inode->i_size >> sbi->cluster_bits;
+ if (nr_cluster << sbi->cluster_bits != head_inode->i_size) {
+ return -EINVAL;
+ }
+
+ nr_cluster = tail_inode->i_size >> sbi->cluster_bits;
+
+ fat_cache_inval_inode(head_inode);
+ fat_cache_inval_inode(tail_inode);
+
+ ret = fat_chain_add(head_inode, MSDOS_I(tail_inode)->i_start, nr_cluster);
+ if (ret)
+ goto out;
+
+ MSDOS_I(tail_inode)->i_start = MSDOS_I(tail_inode)->i_logstart = 0;
+ ret = simple_setsize(head_inode, head_inode->i_size + tail_inode->i_size);
+ if (ret)
+ goto out;
+ head_inode->i_blocks = (head_inode->i_size >> sbi->cluster_bits) << (sbi->cluster_bits - 9);
+ ret = simple_setsize(tail_inode, 0);
+ tail_inode->i_blocks = 0;
+out:
+ mark_inode_dirty(head_inode);
+ mark_inode_dirty(tail_inode);
+
+ return ret;
+}
+
+/*
+ * Split the cluster chain of head_inode after length/cluster_size clusters
+ * And attach the remaining chain to tail_inode.
+ */
+int fat_split(struct inode *head_inode, loff_t new_length, struct inode *tail_inode)
+{
+ struct super_block *sb = head_inode->i_sb;
+ struct msdos_sb_info *sbi = MSDOS_SB(sb);
+ int ret = 0, new_fclus, last;
+ int nr_cluster;
+ struct fat_entry fatent;
+
+ nr_cluster = new_length >> sbi->cluster_bits;
+ if (nr_cluster << sbi->cluster_bits != new_length)
+ return -EINVAL;
+
+ fat_cache_inval_inode(head_inode);
+ fat_cache_inval_inode(tail_inode);
+
+ last = new_fclus = 0;
+ if (MSDOS_I(head_inode)->i_start) {
+ int fclus, dclus, last_clus;
+ loff_t oldsize, newsize;
+ struct msdos_sb_info *sbi = MSDOS_SB(sb);
+
+ ret = fat_get_cluster(head_inode, nr_cluster - 1, &fclus, &dclus);
+ last_clus = dclus;
+
+ if (ret < 0)
+ goto out;
+
+ if (ret == FAT_ENT_EOF) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = fat_get_cluster(head_inode, nr_cluster, &fclus, &dclus);
+
+ if (ret < 0)
+ goto out;
+
+ if (ret == FAT_ENT_EOF) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ oldsize = head_inode->i_size;
+ newsize = (loff_t)nr_cluster << sbi->cluster_bits;
+
+ fatent_init(&fatent);
+ ret = fat_ent_read(head_inode, &fatent, last_clus);
+ ret = fat_ent_write(head_inode, &fatent, FAT_ENT_EOF, 1);
+ fatent_brelse(&fatent);
+
+ ret = simple_setsize(head_inode, newsize);
+ head_inode->i_blocks = nr_cluster << (sbi->cluster_bits - 9);
+
+ MSDOS_I(tail_inode)->i_logstart =
+ MSDOS_I(tail_inode)->i_start = cpu_to_le32(dclus);
+ ret = simple_setsize(tail_inode, oldsize - newsize);
+ tail_inode->i_blocks = ((oldsize - newsize) >> sbi->cluster_bits) << (sbi->cluster_bits - 9);
+
+ ret = 0;
+ }
+out:
+ mark_inode_dirty(head_inode);
+ mark_inode_dirty(tail_inode);
+
+ return ret;
+}
+
const struct inode_operations fat_file_inode_operations = {
.setattr = fat_setattr,
.getattr = fat_getattr,
+ .split = fat_split,
+ .join = fat_join,
};
diff --git a/fs/open.c b/fs/open.c
index 5463266..0d1bfc0 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -938,6 +938,146 @@ SYSCALL_DEFINE2(creat, const char __user *, pathname, int, mode)
#endif

/*
+ * Only vfat supports this, so interface might need changes.
+ *
+ * -1: length should be a multiple of filesystem block size
+ * i.e., vfat cluster size.
+ * -2: Hacky whacky code. (Hackweek.. Yay)
+ * -3: Error paths not verified.
+ * ...
+ * -∞: Idea not validated with experts
+ */
+SYSCALL_DEFINE2(join, unsigned int, fd_dst, unsigned int, fd_src)
+{
+ int ret = 0;
+ struct inode *src_inode;
+ struct inode *dst_inode;
+ struct file *src_f;
+ struct file *dst_f;
+
+ src_f = fget(fd_src);
+ if (!src_f)
+ return -EINVAL;
+ dst_f = fget(fd_dst);
+ if (!dst_f) {
+ fput(src_f);
+ return -EINVAL;
+ }
+
+ src_inode = src_f->f_path.dentry->d_inode;
+ dst_inode = dst_f->f_path.dentry->d_inode;
+
+ if (!dst_inode->i_op->join) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
+ if (src_inode->i_ino < dst_inode->i_ino) {
+ mutex_lock(&src_inode->i_mutex);
+ mutex_lock(&dst_inode->i_mutex);
+ } else {
+ mutex_lock(&dst_inode->i_mutex);
+ mutex_lock(&src_inode->i_mutex);
+ }
+
+ if (!(src_f->f_mode & FMODE_WRITE) || !(dst_f->f_mode & FMODE_WRITE)) {
+ ret = -EPERM;
+ goto out;
+ }
+
+ if (dst_inode->i_sb != src_inode->i_sb) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
+ ret = dst_inode->i_op->join(dst_inode, src_inode);
+out:
+ if (src_inode->i_ino < dst_inode->i_ino) {
+ mutex_unlock(&dst_inode->i_mutex);
+ mutex_unlock(&src_inode->i_mutex);
+ } else {
+ mutex_unlock(&src_inode->i_mutex);
+ mutex_unlock(&dst_inode->i_mutex);
+ }
+ fput(src_f);
+ fput(dst_f);
+ return ret;
+}
+
+/*
+ * Only vfat supports this, so interface might need changes.
+ *
+ * -1: length should be a multiple of filesystem block size
+ * i.e., vfat cluster size.
+ * -2: Hacky whacky code. (Hackweek.. Yay)
+ * -3: Error paths not verified.
+ * ...
+ * -∞: Idea not validated with experts
+ */
+SYSCALL_DEFINE3(split, unsigned int, fd_src, loff_t, length, unsigned int, fd_dst)
+{
+ int ret = 0;
+ struct inode *head_inode;
+ struct inode *tail_inode;
+ struct file *f1;
+ struct file *f2;
+
+ f1 = fget(fd_src);
+ if (!f1)
+ return -EINVAL;
+ f2 = fget(fd_dst);
+ if (!f2) {
+ fput(f1);
+ return -EINVAL;
+ }
+
+ head_inode = f1->f_path.dentry->d_inode;
+ tail_inode = f2->f_path.dentry->d_inode;
+
+ if (head_inode->i_ino < tail_inode->i_ino) {
+ mutex_lock(&head_inode->i_mutex);
+ mutex_lock(&tail_inode->i_mutex);
+ } else {
+ mutex_lock(&tail_inode->i_mutex);
+ mutex_lock(&head_inode->i_mutex);
+ }
+
+ if (!head_inode->i_op->split) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
+ if (!(f2->f_mode & FMODE_WRITE) || !(f1->f_mode & FMODE_WRITE)) {
+ ret = -EPERM;
+ goto out;
+ }
+
+ if (head_inode->i_size < length || tail_inode->i_size != 0) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (head_inode->i_sb != tail_inode->i_sb) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
+ ret = head_inode->i_op->split(head_inode, length, tail_inode);
+out:
+ if (head_inode->i_ino < tail_inode->i_ino) {
+ mutex_unlock(&tail_inode->i_mutex);
+ mutex_unlock(&head_inode->i_mutex);
+ } else {
+ mutex_unlock(&head_inode->i_mutex);
+ mutex_unlock(&tail_inode->i_mutex);
+ }
+
+ fput(f1);
+ fput(f2);
+ return ret;
+}
+
+/*
* "id" is the POSIX thread ID. We use the
* files pointer for this..
*/
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3428393..4206bb8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1539,6 +1539,8 @@ struct inode_operations {
loff_t len);
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
u64 len);
+ int (*split)(struct inode *, loff_t, struct inode *);
+ int (*join)(struct inode *, struct inode *);
};

struct seq_file;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a1a86a5..b71e81a 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -512,6 +512,8 @@ asmlinkage long sys_sendfile64(int out_fd, int in_fd,
asmlinkage long sys_readlink(const char __user *path,
char __user *buf, int bufsiz);
asmlinkage long sys_creat(const char __user *pathname, int mode);
+asmlinkage long sys_split(unsigned int fd_src, loff_t length, unsigned int fd_dst);
+asmlinkage long sys_join(unsigned int fd_dst, unsigned int fd_src);
asmlinkage long sys_open(const char __user *filename,
int flags, int mode);
asmlinkage long sys_close(unsigned int fd);


2010-06-13 11:43:09

by OGAWA Hirofumi

Subject: Re: [PATCH][RFC] Complex filesystem operations: split and join

Nikanth Karthikesan <[email protected]> writes:

> I had a need to split a file into smaller files on a thumb drive with no
> free space on it or anywhere else in the system. When the filesystem
> supports sparse files(truncate_range), I could create files, while
> punching holes in the original file. But when the underlying fs is FAT,
> I couldn't. Also why should we do needless I/O, when all I want is to
> split/join files. i.e., all the data are already on the disk, under the
> same filesystem. I just want to do some metadata changes.
>
> So, I added two inode operations, namely split and join, that lets me
> tell the OS, that all I want is meta-data changes. And the file-system
> can avoid doing lots of I/O, when only metadata changes are needed.
>
> sys_split(fd1, n, fd2)
> 1. Attach the data of file after n bytes in fd1 to fd2.
> 2. Truncate fd1 to n bytes.
>
> Roughly can be thought of as equivalent of following commands:
> 1. dd if=file1 of=file2 skip=n
> 2. truncate -c -s n file1
>
> sys_join(fd1, fd2)
> 1. Extend fd1 with data of fd2
> 2. Truncate fd2 to 0.
>
> Roughly can be thought of as equivalent of following commands:
> 1. dd if=file2 of=file1 seek=`filesize file1`
> 2. truncate -c -s 0 file2
>
> Attached is the patch that adds these new syscalls and support for them
> to the FAT filesystem.
>
> I guess, this approach can be extended to splice() kind of call, between
> files, instead of pipes. On a COW fs, splice could simply setup blocks
> as shared between files, instead of doing I/O. It would be a kind of
> explicit online data-deduplication. Later when a file modifies any of
> those blocks, we copy blocks. i.e., COW.

[I'll just ignore the implementation for now, because the patch totally
ignores cache management.]

I have no objection to such operations (likewise making holes, truncating any
range, etc.). However, only if someone has enough motivation to implement and
maintain them, AND there are real users (i.e., a real, sane use case).

Otherwise, IMO it would be worse than nothing. Because, of course, once such
code exists, we can't ignore it anymore until it is removed completely, e.g.
for security reasons. And IMHO, the cache management for such operations is
not so easy.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2010-06-15 10:39:40

by Nikanth Karthikesan

Subject: Re: [PATCH][RFC] Complex filesystem operations: split and join

Hi OGAWA Hirofumi

Thanks a lot for looking at this and replying.

On Sunday 13 June 2010 17:12:57 OGAWA Hirofumi wrote:
> Nikanth Karthikesan <[email protected]> writes:
> > I had a need to split a file into smaller files on a thumb drive with no
> > free space on it or anywhere else in the system. When the filesystem
> > supports sparse files(truncate_range), I could create files, while
> > punching holes in the original file. But when the underlying fs is FAT,
> > I couldn't. Also why should we do needless I/O, when all I want is to
> > split/join files. i.e., all the data are already on the disk, under the
> > same filesystem. I just want to do some metadata changes.
> >
> > So, I added two inode operations, namely split and join, that lets me
> > tell the OS, that all I want is meta-data changes. And the file-system
> > can avoid doing lots of I/O, when only metadata changes are needed.
> >
> > sys_split(fd1, n, fd2)
> > 1. Attach the data of file after n bytes in fd1 to fd2.
> > 2. Truncate fd1 to n bytes.
> >
> > Roughly can be thought of as equivalent of following commands:
> > 1. dd if=file1 of=file2 skip=n
> > 2. truncate -c -s n file1
> >
> > sys_join(fd1, fd2)
> > 1. Extend fd1 with data of fd2
> > 2. Truncate fd2 to 0.
> >
> > Roughly can be thought of as equivalent of following commands:
> > 1. dd if=file2 of=file1 seek=`filesize file1`
> > 2. truncate -c -s 0 file2
> >
> > Attached is the patch that adds these new syscalls and support for them
> > to the FAT filesystem.
> >
> > I guess, this approach can be extended to splice() kind of call, between
> > files, instead of pipes. On a COW fs, splice could simply setup blocks
> > as shared between files, instead of doing I/O. It would be a kind of
> > explicit online data-deduplication. Later when a file modifies any of
> > those blocks, we copy blocks. i.e., COW.
>
> [I'll just ignore implementation for now.., because the patch is totally
> ignoring cache management.]
>

Ok.

> I have no objections to such those operations (likewise make hole,
> truncate any range, etc. etc.).

As far as FAT is concerned, wouldn't sparse files break the on-disk format?

> However, only if someone have enough
> motivation to implement/maintain those operations, AND there are real
> users (i.e. real sane usecase).

I had a one-off use case, where I had no free space, which made me think
along these lines.

1. We have the GNU split tool, for example, which I guess many of us use to
split larger files to be transferred via smaller thumb drives. We cat the
many files back into one afterwards. [For this use case, one could simply dd
with seek and skip and avoid split/cat completely, but we don't.]

2. It could be useful for multimedia editing software that converts frames
into video/animation and vice versa.

3. It could be useful for archiving solutions.

4. It would make it easier to implement simple databases, and might even help
avoid needing a database at times. For example, to delete a row, split before
and after that row, and join the two parts, leaving the row out.

So I thought this could be useful generally.

I was also thinking of facilities to add/remove bytes from/at any position in
the file. As you said, truncate any range, but also one that can increase the
file size, adding blocks even in between.

IMO it is a kind of chicken-and-egg problem, where applications will start
using these only if they are available.

>
> Otherwise, IMO it would be bad than nothing. Because, of course, if
> there are such codes, we can't ignore those anymore until remove
> codes completely for e.g. security reasons. And IMHO, those cache
> managements to such operations are not so easy.
>

Agreed.

Again, thanks for the comments.

Thanks
Nikanth

2010-06-15 12:01:30

by OGAWA Hirofumi

Subject: Re: [PATCH][RFC] Complex filesystem operations: split and join

Nikanth Karthikesan <[email protected]> writes:

>> I have no objections to such those operations (likewise make hole,
>> truncate any range, etc. etc.).
>
> As far as FAT is concerned, Sparse files would break the on-disk format?

Yes. In the case of making a hole on FAT, I guess it would have to return an
error, or emulate the hole by zero-filling.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2010-06-15 15:36:37

by David Pottage

Subject: Re: [PATCH][RFC] Complex filesystem operations: split and join

On 15/06/10 11:41, Nikanth Karthikesan wrote:

> I had a one-off use-case, where I had no free-space, which made me
> think along this line.
>
> 1. We have the GNU split tool for example, which I guess, many of us
> use to split larger files to be transfered via smaller thumb drives,
> for example. We do cat many files into one, afterwards. [For this
> usecase, one can simply dd with seek and skip and avoid split/cat
> completely, but we dont.]

I am not sure what you gain here, as either way you have to do I/O to get the
split files onto and off the thumb drive. It might make sense if the thumb
drive is formatted with btrfs and the file needs to be copied to another
filesystem that can't handle large files (e.g. FAT-16), but I would say that
is unlikely.

> 2. It could be useful for multimedia editing softwares, that converts
> frames into video/animation and vice versa.

Agreed, it would be very useful in this case, as it would save a lot of
I/O and time.

Video files are very big, so a simple edit that removes a few minutes here
and there from an hour-long HD recording will involve copying many gigabytes
from one file to another. Imagine the time and disc space saved if you could
just make a COW copy of your source file(s), cut out the portions you don't
want, and join the parts you do want together.

Your final edited file would take no extra disc space compared with your
source files, and though it would be fragmented, the fragments would still be
large compared with most files, so the performance penalty of reading the
file sequentially to play it would be small. Once you decide you are happy
with the final cut, you can delete the source files and let some background
defrag daemon tidy up the final file.

> 3. It could be useful for archiving solutions.

Agreed.

> 4. It would make it easier to implement simple databases. Even help
> avoid needing databases at times. For example, to delete a row, split
> before & after that row, and join leaving it.

I am not sure it would be useful in practice, as these days, if you need a
simple DB in a programming project, you just use SQLite (which has an
extremely liberal licence) and let it figure out how to store your data on
disc.

On the other hand, perhaps databases such as SQLite or MySQL would benefit
from this feature for improving their backend storage, especially if large
amounts of BLOB data are inserted or deleted?

> So I thought this could be useful generally.

Agreed. I think this would be very useful.

I have proposed this kind of thing in the past, and been shouted down and
told that it should be implemented in the userland program. However, I think
it is anachronistic that Unix filesystems have supported sparse files since
the dawn of time, originally to suit a particular way of storing fixed-size
records, but do not support growing or truncating files except at the end.

> I was also thinking of facilities to add/remove bytes from/at any
> position in the file. As you said truncate any range, but one which
> can also increase the filesize, adding blocks even in between.
>
> IMO It is kind of Chicken-and-egg problem, where applications will
> start using these, only, if it would be available.

I agree that it is a chicken-and-egg problem, but I think the advantages for
video editing are so large that the feature could become a killer app there,
as it would improve performance so much.

--
David Pottage

Error compiling committee.c To many arguments to function.

2010-06-17 15:11:03

by Hubert Kario

Subject: Re: [PATCH][RFC] Complex filesystem operations: split and join

On Tuesday 15 June 2010 17:16:06 David Pottage wrote:
> On 15/06/10 11:41, Nikanth Karthikesan wrote:
> > I had a one-off use-case, where I had no free-space, which made me
> > think along this line.
> >
> > 1. We have the GNU split tool for example, which I guess, many of us
> > use to split larger files to be transfered via smaller thumb drives,
> > for example. We do cat many files into one, afterwards. [For this
> > usecase, one can simply dd with seek and skip and avoid split/cat
> > completely, but we dont.]
>
> I am not sure how you gain here as either way you have to do I/O to get
> the split files on and off the thumb drive. It might make sense if the
> thumb drive is formated with btrfs, and the file needs to be copied to
> another filling system that can't handle large files (eg FAT-16), but I
> would say that is unlikely.
>

But you only have to do half as much I/O with those features implemented.

The old way is:
1. Have a file.
2. Split the file (in effect using twice as much drive space).
3. Copy the fragments to flash disks.

The btrfs way would be:
1. Have a file.
2. Split the file by using COW and referencing blocks in the original file
(in effect using only a little more space after splitting).
3. Copy the fragments to flash disks.

The amount of I/O in the second case is limited to metadata operations; in
the first case, all the data must be duplicated.

--
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
http://www.qbs.com.pl

System Zarządzania Jakością
zgodny z normą ISO 9001:2000

2010-06-22 01:34:16

by Stewart Smith

Subject: Re: [PATCH][RFC] Complex filesystem operations: split and join

On Tue, 15 Jun 2010 16:16:06 +0100, "David Pottage" <[email protected]> wrote:
> On the other hand, perhaps databases such as SQLite or MySQL would
> benifit from this feature for improving their backend storage, especaly
> if large amounts of BLOB data is inserted or deleted?

Unlikely.

1) these operations aren't available everywhere.
2) they are kind of incompatible with how things are stored now.
3) no doubt incredibly fun with high concurrency (readers streaming out
BLOBs from later in the file while you go and change everything).
4) a background thread that goes around repacking things if you need to
trim the end off the file just isn't that bad.
(Sure, a nice API call to 'instantly' copy/swap parts of a file could
be fun... but then it's relying on the OS and filesystem to get this
right... which is almost always a bad idea.)

cheers,

Stewart "spending way too much time inside databases to maintain sanity" Smith