2012-02-04 20:04:04

by Eric Sandeen

Subject: sparsify - utility to punch out blocks of 0s in a file

Now that ext4, xfs, & ocfs2 can support punch hole, a tool to
"re-sparsify" a file by punching out ranges of 0s might be in order.

I whipped this up fast, it probably has bugs & off-by-ones but thought
I'd send it out. It's not terribly efficient doing 4k reads by default
I suppose.

I'll see if util-linux wants it after it gets beat into shape.
(or did a tool like this already exist and I missed it?)

(Another mode which does a file copy, possibly from stdin
might be good, like e2fsprogs/contrib/make-sparse.c ? Although
that can be hacked up with cp already).

It works like this:

[root@inode sparsify]# ./sparsify -h
Usage: sparsify [-m min hole size] [-o offset] [-l length] filename

[root@inode sparsify]# dd if=/dev/zero of=fsfile bs=1M count=512
[root@inode sparsify]# mkfs.xfs fsfile >/dev/null
[root@inode sparsify]# du -hc fsfile
512M fsfile
512M total
[root@inode sparsify]# ./sparsify fsfile
punching out holes of minimum size 4096 in range 0-536870912
[root@inode sparsify]# du -hc fsfile
129M fsfile
129M total
[root@inode sparsify]# xfs_repair fsfile
Phase 1 - find and verify superblock...
<snip>
Phase 7 - verify and correct link counts...
done
[root@inode sparsify]# echo $?
0
[root@inode sparsify]#

/*
* sparsify - utility to punch out blocks of 0s in a file
*
* Copyright (C) 2011 Red Hat, Inc. All rights reserved.
* Written by Eric Sandeen <[email protected]>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it would be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software Foundation,
* Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/

#include <sys/stat.h>
#include <sys/statvfs.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <ctype.h>
#include <string.h>

#include <linux/falloc.h>

#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02 /* de-allocates range */
#endif

void usage(void)
{
printf("Usage: sparsify [-m min hole size] [-o offset] [-l length] filename\n");
exit(EXIT_FAILURE);
}

#define EXABYTES(x) ((long long)(x) << 60)
#define PETABYTES(x) ((long long)(x) << 50)
#define TERABYTES(x) ((long long)(x) << 40)
#define GIGABYTES(x) ((long long)(x) << 30)
#define MEGABYTES(x) ((long long)(x) << 20)
#define KILOBYTES(x) ((long long)(x) << 10)

#define __round_mask(x, y) ((__typeof__(x))((y)-1))
#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
#define round_down(x, y) ((x) & ~__round_mask(x, y))

int debug;

long long
cvtnum(char *s)
{
long long i;
char *sp;
int c;

i = strtoll(s, &sp, 0);
if (i == 0 && sp == s)
return -1LL;
if (*sp == '\0')
return i;
if (sp[1] != '\0')
return -1LL;

c = tolower(*sp);
switch (c) {
case 'k':
return KILOBYTES(i);
case 'm':
return MEGABYTES(i);
case 'g':
return GIGABYTES(i);
case 't':
return TERABYTES(i);
case 'p':
return PETABYTES(i);
case 'e':
return EXABYTES(i);
}

return -1LL;
}

int punch_hole(int fd, off_t offset, off_t len)
{
int error = 0;

if (debug)
printf("punching at %lld len %lld\n", offset, len);
//error = fallocate(fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE,
// offset, len);
if (error < 0) {
perror("punch failed");
exit(EXIT_FAILURE);
}
}

int main(int argc, char **argv)
{
int fd;
char *fname;
int opt;
loff_t min_hole = 0;
loff_t punch_range_start = 0;
loff_t punch_range_len = 0;
loff_t punch_range_end = 0;
loff_t cur_offset = 0;
unsigned long blocksize;
struct statvfs statvfsbuf;
struct stat statbuf;
ssize_t ret;
off_t punch_offset, punch_len;
char *readbuf, *zerobuf;

while ((opt = getopt(argc, argv, "m:l:o:vh")) != -1) {
switch(opt) {
case 'm':
min_hole = cvtnum(optarg);
break;
case 'o':
punch_range_start = cvtnum(optarg);
break;
case 'l':
punch_range_len = cvtnum(optarg);
break;
case 'v':
debug++;
break;
case 'h':
default:
usage();
}
}

if (min_hole < 0) {
printf("Error: invalid min hole value specified\n");
usage();
}

if (punch_range_len < 0) {
printf("Error: invalid length value specified\n");
usage();
}

if (punch_range_start < 0) {
printf("Error: invalid offset value specified\n");
usage();
}

if (optind == argc) {
printf("Error: no filename specified\n");
usage();
}

fname = argv[optind++];

fd = open(fname, O_RDWR);
if (fd < 0) {
perror("Error opening file");
exit(EXIT_FAILURE);
}

if (fstat(fd, &statbuf) < 0) {
perror("Error stat-ing file");
exit(EXIT_FAILURE);
}

if (fstatvfs(fd, &statvfsbuf) < 0) {
perror("Error stat-ing fs");
exit(EXIT_FAILURE);
}

blocksize = statvfsbuf.f_bsize;
if (debug)
printf("blocksize is %lu\n", blocksize);

/* default range end is end of file */
if (!punch_range_len)
punch_range_end = statbuf.st_size;
else
punch_range_end = punch_range_start + punch_range_len;

if (punch_range_end > statbuf.st_size) {
printf("Error: range extends past EOF\n");
exit(EXIT_FAILURE);
}

if (debug)
printf("orig start/end %lld/%lld/%lld\n", punch_range_start, punch_range_end, min_hole);

/*
* Normalize to blocksize-aligned range:
* round start down, round end up - get all blocks including the range specified
*/

punch_range_start = round_down(punch_range_start, blocksize);
punch_range_end = round_up(punch_range_end, blocksize);
min_hole = round_up(min_hole, blocksize);
if (!min_hole)
min_hole = blocksize;

if (debug)
printf("new start/end/min %lld/%lld/%lld\n", punch_range_start, punch_range_end, min_hole);

if (punch_range_end <= punch_range_start) {
printf("Range too small, nothing to do\n");
exit(0);
}

readbuf = malloc(min_hole);
zerobuf = malloc(min_hole);

if (!readbuf || !zerobuf) {
perror("buffer allocation failed");
exit(EXIT_FAILURE);
}

memset(zerobuf, 0, min_hole);

punch_offset = -1;
punch_len = 0;

/* Move to the start of our requested range */
if (punch_range_start)
lseek(fd, punch_range_start, SEEK_SET);
cur_offset = punch_range_start;

printf("punching out holes of minimum size %lld in range %lld-%lld\n",
min_hole, punch_range_start, punch_range_end);

/*
* Read through the file, finding block-aligned regions of 0s.
* If the region is at least min_hole, punch it out.
* This should be starting at a block-aligned offset
*/

while ((ret = read(fd, readbuf, min_hole)) > 0) {

if (!memcmp(readbuf, zerobuf, min_hole)) {
/* Block of zeros, so extend punch range */
if (punch_offset < 0)
punch_offset = cur_offset;
punch_len += min_hole;
if (debug > 1)
printf("found zeros at %lld, hole len now %lld\n", cur_offset, punch_len);
} else if (punch_offset > 0) {
/* Found nonzero byte; punch accumulated hole if it's big enough */
if (punch_len >= min_hole)
punch_hole(fd, punch_offset, punch_len);
else if (debug > 1)
printf("skipping hole of insufficient size %lld\n", punch_len);

/* reset punch range */
punch_offset = -1;
punch_len = 0;
}

cur_offset += ret;
/* Quit if we've moved beyond the specified range to punch */
if (cur_offset >= punch_range_end) {
/* punch out last hole in range if needed */
if (punch_offset > 0 && punch_len >= min_hole)
punch_hole(fd, punch_offset, punch_len);
break;
}
}

if (ret < 0) {
perror("read failed");
exit(EXIT_FAILURE);
}

free(readbuf);
free(zerobuf);
close(fd);
return 0;
}



2012-02-04 20:10:34

by Eric Sandeen

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 2/4/12 2:04 PM, Eric Sandeen wrote:
> Now that ext4, xfs, & ocfs2 can support punch hole, a tool to
> "re-sparsify" a file by punching out ranges of 0s might be in order.

Gah, of course I sent the version with the actual hole punch commented out ;)
Try this one.

[root@inode sparsify]# ./sparsify -v fsfile
blocksize is 4096
orig start/end 0/536870912/0
new start/end/min 0/536870912/4096
punching out holes of minimum size 4096 in range 0-536870912
punching at 16384 len 16384
punching at 49152 len 134168576
punching at 134234112 len 134201344
punching at 268455936 len 134197248
punching at 402669568 len 134201344
[root@inode sparsify]#

Hm but something is weird, right after the punch-out xfs says
it uses 84K:

[root@inode sparsify]# du -hc fsfile
84K fsfile
84K total

but then after an xfs_repair it looks saner:
# du -hc fsfile
4.8M fsfile
4.8M total

something to look into I guess... weird.

/*
* sparsify - utility to punch out blocks of 0s in a file
*
* Copyright (C) 2011 Red Hat, Inc. All rights reserved.
* Written by Eric Sandeen <[email protected]>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it would be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software Foundation,
* Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/

#define _GNU_SOURCE /* glibc only declares fallocate() in <fcntl.h> with this set */
#include <sys/stat.h>
#include <sys/statvfs.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <ctype.h>
#include <string.h>

#include <linux/falloc.h>

#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02 /* de-allocates range */
#endif

void usage(void)
{
printf("Usage: sparsify [-m min hole size] [-o offset] [-l length] filename\n");
exit(EXIT_FAILURE);
}

#define EXABYTES(x) ((long long)(x) << 60)
#define PETABYTES(x) ((long long)(x) << 50)
#define TERABYTES(x) ((long long)(x) << 40)
#define GIGABYTES(x) ((long long)(x) << 30)
#define MEGABYTES(x) ((long long)(x) << 20)
#define KILOBYTES(x) ((long long)(x) << 10)

#define __round_mask(x, y) ((__typeof__(x))((y)-1))
#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
#define round_down(x, y) ((x) & ~__round_mask(x, y))

int debug;

long long
cvtnum(char *s)
{
long long i;
char *sp;
int c;

i = strtoll(s, &sp, 0);
if (i == 0 && sp == s)
return -1LL;
if (*sp == '\0')
return i;
if (sp[1] != '\0')
return -1LL;

c = tolower(*sp);
switch (c) {
case 'k':
return KILOBYTES(i);
case 'm':
return MEGABYTES(i);
case 'g':
return GIGABYTES(i);
case 't':
return TERABYTES(i);
case 'p':
return PETABYTES(i);
case 'e':
return EXABYTES(i);
}

return -1LL;
}

int punch_hole(int fd, off_t offset, off_t len)
{
int error = 0;

if (debug)
printf("punching at %lld len %lld\n", offset, len);
error = fallocate(fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE,
offset, len);
if (error < 0) {
perror("punch failed");
exit(EXIT_FAILURE);
}
return 0;
}

int main(int argc, char **argv)
{
int fd;
char *fname;
int opt;
loff_t min_hole = 0;
loff_t punch_range_start = 0;
loff_t punch_range_len = 0;
loff_t punch_range_end = 0;
loff_t cur_offset = 0;
unsigned long blocksize;
struct statvfs statvfsbuf;
struct stat statbuf;
ssize_t ret;
off_t punch_offset, punch_len;
char *readbuf, *zerobuf;

while ((opt = getopt(argc, argv, "m:l:o:vh")) != -1) {
switch(opt) {
case 'm':
min_hole = cvtnum(optarg);
break;
case 'o':
punch_range_start = cvtnum(optarg);
break;
case 'l':
punch_range_len = cvtnum(optarg);
break;
case 'v':
debug++;
break;
case 'h':
default:
usage();
}
}

if (min_hole < 0) {
printf("Error: invalid min hole value specified\n");
usage();
}

if (punch_range_len < 0) {
printf("Error: invalid length value specified\n");
usage();
}

if (punch_range_start < 0) {
printf("Error: invalid offset value specified\n");
usage();
}

if (optind == argc) {
printf("Error: no filename specified\n");
usage();
}

fname = argv[optind++];

fd = open(fname, O_RDWR);
if (fd < 0) {
perror("Error opening file");
exit(EXIT_FAILURE);
}

if (fstat(fd, &statbuf) < 0) {
perror("Error stat-ing file");
exit(EXIT_FAILURE);
}

if (fstatvfs(fd, &statvfsbuf) < 0) {
perror("Error stat-ing fs");
exit(EXIT_FAILURE);
}

blocksize = statvfsbuf.f_bsize;
if (debug)
printf("blocksize is %lu\n", blocksize);

/* default range end is end of file */
if (!punch_range_len)
punch_range_end = statbuf.st_size;
else
punch_range_end = punch_range_start + punch_range_len;

if (punch_range_end > statbuf.st_size) {
printf("Error: range extends past EOF\n");
exit(EXIT_FAILURE);
}

if (debug)
printf("orig start/end %lld/%lld/%lld\n", punch_range_start, punch_range_end, min_hole);

/*
* Normalize to blocksize-aligned range:
* round start down, round end up - get all blocks including the range specified
*/

punch_range_start = round_down(punch_range_start, blocksize);
punch_range_end = round_up(punch_range_end, blocksize);
min_hole = round_up(min_hole, blocksize);
if (!min_hole)
min_hole = blocksize;

if (debug)
printf("new start/end/min %lld/%lld/%lld\n", punch_range_start, punch_range_end, min_hole);

if (punch_range_end <= punch_range_start) {
printf("Range too small, nothing to do\n");
exit(0);
}

readbuf = malloc(min_hole);
zerobuf = malloc(min_hole);

if (!readbuf || !zerobuf) {
perror("buffer allocation failed");
exit(EXIT_FAILURE);
}

memset(zerobuf, 0, min_hole);

punch_offset = -1;
punch_len = 0;

/* Move to the start of our requested range */
if (punch_range_start)
lseek(fd, punch_range_start, SEEK_SET);
cur_offset = punch_range_start;

printf("punching out holes of minimum size %lld in range %lld-%lld\n",
min_hole, punch_range_start, punch_range_end);

/*
* Read through the file, finding block-aligned regions of 0s.
* If the region is at least min_hole, punch it out.
* This should be starting at a block-aligned offset
*/

while ((ret = read(fd, readbuf, min_hole)) > 0) {

if (!memcmp(readbuf, zerobuf, min_hole)) {
/* Block of zeros, so extend punch range */
if (punch_offset < 0)
punch_offset = cur_offset;
punch_len += min_hole;
if (debug > 1)
printf("found zeros at %lld, hole len now %lld\n", cur_offset, punch_len);
} else if (punch_offset >= 0) { /* >= 0: a run of zeros may start at offset 0 */
/* Found nonzero byte; punch accumulated hole if it's big enough */
if (punch_len >= min_hole)
punch_hole(fd, punch_offset, punch_len);
else if (debug > 1)
printf("skipping hole of insufficient size %lld\n", punch_len);

/* reset punch range */
punch_offset = -1;
punch_len = 0;
}

cur_offset += ret;
/* Quit if we've moved beyond the specified range to punch */
if (cur_offset >= punch_range_end) {
/* punch out last hole in range if needed */
if (punch_offset >= 0 && punch_len >= min_hole)
punch_hole(fd, punch_offset, punch_len);
break;
}
}

if (ret < 0) {
perror("read failed");
exit(EXIT_FAILURE);
}

free(readbuf);
free(zerobuf);
close(fd);
return 0;
}




2012-02-04 20:17:33

by Eric Sandeen

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 2/4/12 2:10 PM, Eric Sandeen wrote:

> Hm but something is weird, right after the punch-out xfs says
> it uses 84K:
>
> [root@inode sparsify]# du -hc fsfile
> 84K fsfile
> 84K total
>
> but then after an xfs_repair it looks saner:
> # du -hc fsfile
> 4.8M fsfile
> 4.8M total
>
> something to look into I guess... weird.

nvm that's just xfs_repair zeroing the log & reinstating the blocks.
Sorry for the noise - Ok, back to my Saturday.

-Eric
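
For reference, du derives its figure from the number of blocks allocated to the file rather than from the file length. A minimal sketch of reading that value, assuming st_blocks is counted in 512-byte units as it is on Linux; allocated_bytes() is a made-up helper name, not part of sparsify:

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return the number of bytes actually allocated
 * to a file, or -1 on error. st_blocks is in 512-byte units on Linux,
 * independent of the filesystem block size.
 */
long long allocated_bytes(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	return (long long)st.st_blocks * 512;
}
```

Comparing the result with st_size shows how sparse a file currently is.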


2012-02-05 09:40:44

by Ron Yorston

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

Eric Sandeen wrote:

>Now that ext4, xfs, & ocfs2 can support punch hole, a tool to
>"re-sparsify" a file by punching out ranges of 0s might be in order.
>
>I'll see if util-linux wants it after it gets beat into shape.
>(or did a tool like this already exist and I missed it?)

Way ahead of you. I wrote my sparsify utility for ext2 in 2004:

http://intgat.tigress.co.uk/rmy/uml/sparsify.html

It's mostly of historical interest now, I suppose. The sparsify utility
doesn't work on ext4 and I long since gave up maintaining the kernel
patch. I still use the zerofree utility, though.

It would be nice to have a modern version of sparsify. I'll try it out.

Ron

2012-02-05 15:05:51

by Raghavendra D Prabhu

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

Hi,


* On Sat, Feb 04, 2012 at 02:10:30PM -0600, Eric Sandeen <[email protected]> wrote:
>On 2/4/12 2:04 PM, Eric Sandeen wrote:
>> Now that ext4, xfs, & ocfs2 can support punch hole, a tool to
>> "re-sparsify" a file by punching out ranges of 0s might be in order.
>
>Gah, of course I sent the version with the actual hole punch commented out ;)
>Try this one.
>
>[root@inode sparsify]# ./sparsify -v fsfile
>blocksize is 4096
>orig start/end 0/536870912/0
>new start/end/min 0/536870912/4096
>punching out holes of minimum size 4096 in range 0-536870912
>punching at 16384 len 16384
>punching at 49152 len 134168576
>punching at 134234112 len 134201344
>punching at 268455936 len 134197248
>punching at 402669568 len 134201344
>[root@inode sparsify]#
>
>Hm but something is weird, right after the punch-out xfs says
>it uses 84K:
>
>[root@inode sparsify]# du -hc fsfile
>84K fsfile
>84K total
>
>but then after an xfs_repair it looks saner:
># du -hc fsfile
>4.8M fsfile
>4.8M total
>
>something to look into I guess... weird.
>
>
>
>
>_______________________________________________
>xfs mailing list
>[email protected]
>http://oss.sgi.com/mailman/listinfo/xfs


So I tried with both resparsify and with cp --sparse; the results
before xfs_repair look different (5 extents vs 1), but after it
they look similar (5 extents vs 4).


Regards,
--
Raghavendra Prabhu
GPG Id : 0xD72BE977
Fingerprint: B93F EBCB 8E05 7039 CD3C A4B8 A616 DCA1 D72B E977
www: wnohang.net



2012-02-05 16:36:43

by Eric Sandeen

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 2/5/12 3:33 AM, Ron Yorston wrote:
> Eric Sandeen wrote:
>
>> Now that ext4, xfs, & ocfs2 can support punch hole, a tool to
>> "re-sparsify" a file by punching out ranges of 0s might be in order.
>>
>> I'll see if util-linux wants it after it gets beat into shape.
>> (or did a tool like this already exist and I missed it?)
>
> Way ahead of you. I wrote my sparsify utility for ext2 in 2004:
>
> http://intgat.tigress.co.uk/rmy/uml/sparsify.html

Cool, I had not known about that one. But that one is a bit less generic -
ext2-specific and requiring an unmounted fs, right?

> It's mostly of historical interest now, I suppose. The sparsify utility
> doesn't work on ext4 and I long since gave up maintaining the kernel
> patch. I still use the zerofree utility, though.
>
> It would be nice to have a modern version of sparsify. I'll try it out.

Thanks!

Matthias' suggestion of adding SEEK_HOLE/SEEK_DATA makes very good sense too.
I should also untie the read/zero buffer size from the minimum hole size,
we should do optimal IO sizes regardless of the minimum hole size desired...

-Eric

> Ron


2012-02-05 16:55:31

by Andreas Dilger

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 2012-02-05, at 9:36, Eric Sandeen <[email protected]> wrote:
> On 2/5/12 3:33 AM, Ron Yorston wrote:
>> Eric Sandeen wrote:
>>> Now that ext4, xfs, & ocfs2 can support punch hole, a tool to
>>> "re-sparsify" a file by punching out ranges of 0s might be in order.
>>>
>>> I'll see if util-linux wants it after it gets beat into shape.
>>> (or did a tool like this already exist and I missed it?)
>
> Matthias' suggestion of adding SEEK_HOLE/SEEK_DATA makes very good sense too.

I thought about this, but if SEEK_HOLE/SEEK_DATA (or FIEMAP) worked, then the file would already be sparse, so I don't think that will help in this case...

> I should also untie the read/zero buffer size from the minimum hole size,
> we should do optimal IO sizes regardless of the minimum hole size desired...

Definitely. 4kB IO is a killer for large files.

Cheers, Andreas
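
One way to decouple the two sizes is to read a large chunk at a time and scan it in filesystem-block units, so that the I/O size no longer depends on the minimum hole size. The following is only a sketch of that scanning step, with made-up helper names; it is not code from the posted tool:

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the len bytes at buf are all zero (len must be > 0). */
static int block_is_zero(const char *buf, size_t len)
{
	/* buf[0] is zero and every byte equals its neighbour => all zero */
	return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
}

/*
 * Count consecutive all-zero blocks in buf, starting at block index
 * 'start'. buf holds nblocks blocks of blocksize bytes each.
 */
size_t zero_run_blocks(const char *buf, size_t nblocks,
		       size_t blocksize, size_t start)
{
	size_t n = 0;

	while (start + n < nblocks &&
	       block_is_zero(buf + (start + n) * blocksize, blocksize))
		n++;
	return n;
}
```

The caller would accumulate runs across chunk boundaries and punch only once a run reaches the minimum hole size.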

2012-02-05 17:20:21

by Ron Yorston

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

OK, I tried it out for my use case of flinging VM filesystem images around
on ext4 and it seems to do the job. I don't have any 64-bit systems
here at home so I used my feeble 32-bit netbook. Since sizeof(off_t) !=
sizeof(long long) the debug output was all wrong:

punching at 8989607068975104 len -4635819229210214401

but the image file and the host filesystem both survived the ordeal.

Ron
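
The garbage comes from passing a 32-bit off_t where printf expects a long long for %lld. One portable fix, sketched here rather than taken from any later version of the tool, is to cast explicitly; format_punch_msg() is an illustrative name only. Building with -D_FILE_OFFSET_BITS=64 would additionally make off_t 64 bits wide on 32-bit Linux.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Format the debug message into out. off_t may be 32 or 64 bits wide
 * depending on the platform and build flags, so cast to long long to
 * match the %lld conversion.
 */
int format_punch_msg(char *out, size_t outlen, off_t offset, off_t len)
{
	return snprintf(out, outlen, "punching at %lld len %lld",
			(long long)offset, (long long)len);
}
```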

2012-02-05 17:21:31

by Eric Sandeen

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 2/5/12 11:19 AM, Ron Yorston wrote:
> OK, I tried it out for my use case of flinging VM filesystem images around
> on ext4 and it seems to do the job. I don't have any 64-bit systems
> here at home so I used my feeble 32-bit netbook. Since sizeof(off_t) !=
> sizeof(long long) the debug output was all wrong:
>
> punching at 8989607068975104 len -4635819229210214401

whoops, I'll fix that thanks.

This is the problem when I start something as a hack and then expose it
to the light of day. ;)

-Eric

> but the image file and the host filesystem both survived the ordeal.
>
> Ron


2012-02-05 17:23:25

by Eric Sandeen

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 2/5/12 10:55 AM, Andreas Dilger wrote:
> On 2012-02-05, at 9:36, Eric Sandeen <[email protected]> wrote:
>> On 2/5/12 3:33 AM, Ron Yorston wrote:
>>> Eric Sandeen wrote:
>>>> Now that ext4, xfs, & ocfs2 can support punch hole, a tool to
>>>> "re-sparsify" a file by punching out ranges of 0s might be in order.
>>>>
>>>> I'll see if util-linux wants it after it gets beat into shape.
>>>> (or did a tool like this already exist and I missed it?)
>>
>> Matthias' suggestion of adding SEEK_HOLE/SEEK_DATA makes very good sense too.
>
> I thought about this, but if SEEK_HOLE/SEEK_DATA (or FIEMAP) worked,
> then the file would already be sparse, so I don't think that will
> help in this case...

But only if other tools originally used them, and there will probably be plenty
of cases where they don't, or legacy files, or ....

>> I should also untie the read/zero buffer size from the minimum hole size,
>> we should do optimal IO sizes regardless of the minimum hole size desired...
>
> Definitely. 4kB IO is a killer for large files.

yeah, it was a quick hack, I'll try to fix that up.

(OTOH for large files you may not want a 4k hole granularity either)

-Eric

> Cheers, Andreas


2012-02-05 17:23:02

by Matthias Schniedermeyer

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 05.02.2012 09:55, Andreas Dilger wrote:

> > Matthias' suggestion of adding SEEK_HOLE/SEEK_DATA makes very good sense too.
>
> I thought about this, but if SEEK_HOLE/SEEK_DATA (or FIEMAP) worked, then the file would already be sparse, so I don't think that will help in this case...

With that argumentation you wouldn't need the tool in the first place.

"How can a bunch of zeros be in a file in the first place?"
"Can only be because of the deficiency of another program."

And who is to say that you wouldn't want to repeat such a thing from
time to time? Without SEEK_HOLE/SEEK_DATA you MAY crunch through big
regions of zeros for no gain at all.



Bis denn

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.


2012-02-05 19:24:28

by Andreas Dilger

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 2012-02-05, at 10:23 AM, Eric Sandeen wrote:
> On 2/5/12 10:55 AM, Andreas Dilger wrote:
>> On 2012-02-05, at 9:36, Eric Sandeen <[email protected]> wrote:
>>> On 2/5/12 3:33 AM, Ron Yorston wrote:
>>>> Eric Sandeen wrote:
>>>>> Now that ext4, xfs, & ocfs2 can support punch hole, a tool to
>>>>> "re-sparsify" a file by punching out ranges of 0s might be in order.
>>>>>
>>>>> I'll see if util-linux wants it after it gets beat into shape.
>>>>> (or did a tool like this already exist and I missed it?)
>>>
>>> Matthias' suggestion of adding SEEK_HOLE/SEEK_DATA makes very good sense too.
>>
>> I thought about this, but if SEEK_HOLE/SEEK_DATA (or FIEMAP) worked,
>> then the file would already be sparse, so I don't think that will
>> help in this case...
>
> But only if other tools originally used them, and there will probably be plenty
> of cases where they don't, or legacy files, or ....

I was thinking that the suggestion was to use SEEK_HOLE/SEEK_DATA to find
the holes in the file... Of course, it makes a lot of sense if you use
them to skip the existing holes, and only look for strings of zeros in the
data parts...

Cheers, Andreas
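
A rough sketch of that idea: walk the file with lseek(), visiting only the regions the filesystem reports as data, and scan just those for zeros. next_data_extent() is a hypothetical helper name; the fallback values for SEEK_DATA and SEEK_HOLE are the Linux ones and are only used if the headers do not provide them.

```c
#define _GNU_SOURCE		/* for SEEK_DATA / SEEK_HOLE in glibc */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3		/* Linux values, used only as a fallback */
#define SEEK_HOLE 4
#endif

/*
 * Find the next allocated region at or after pos.
 * Returns 1 and fills *start / *end (end is exclusive) if data was
 * found, 0 if there is no more data before EOF, -1 on error.
 */
int next_data_extent(int fd, off_t pos, off_t *start, off_t *end)
{
	off_t data, hole;

	data = lseek(fd, pos, SEEK_DATA);
	if (data < 0)
		return errno == ENXIO ? 0 : -1;	/* ENXIO: no data past pos */

	hole = lseek(fd, data, SEEK_HOLE);	/* EOF counts as a hole */
	if (hole < 0)
		return -1;

	*start = data;
	*end = hole;
	return 1;
}
```

On a filesystem without real SEEK_DATA support the whole file is reported as a single data region, so the scan degrades to reading everything, as the tool does today.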






2012-02-05 23:44:46

by Michael Tokarev

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 05.02.2012 00:10, Eric Sandeen wrote:
[]

Just a very quick look:

> * sparsify - utility to punch out blocks of 0s in a file
> int main(int argc, char **argv)
> {
[]
> if (optind == argc) {
> printf("Error: no filename specified\n");
> usage();
> }
>
> fname = argv[optind++];

There's no handling of the case where more than one file is
specified on the command line.


> /*
> * Normalize to blocksize-aligned range:
> * round start down, round end up - get all blocks including the range specified
> */
>
> punch_range_start = round_down(punch_range_start, blocksize);
> punch_range_end = round_up(punch_range_end, blocksize);
> min_hole = round_up(min_hole, blocksize);
> if (!min_hole)
> min_hole = blocksize;

I think this deserves some bold warning if punch_range_start
or punch_range_end is not a multiple of blocksize.

[]
> /*
> * Read through the file, finding block-aligned regions of 0s.
> * If the region is at least min_hole, punch it out.
> * This should be starting at a block-aligned offset
> */
>
> while ((ret = read(fd, readbuf, min_hole)) > 0) {
>
> if (!memcmp(readbuf, zerobuf, min_hole)) {

Now this is interesting. Can ret be < min_hole? Can a read
in the middle of a file be shorter than specified?

How will it work together with some other operation being done
on the same file -- ftruncate anyone?

Thanks!

/mjt

2012-02-05 23:55:59

by Eric Sandeen

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 2/5/12 5:44 PM, Michael Tokarev wrote:
> On 05.02.2012 00:10, Eric Sandeen wrote:
> []
>
> Just a very quick look:
>
>> * sparsify - utility to punch out blocks of 0s in a file
>> int main(int argc, char **argv)
>> {
> []
>> if (optind == argc) {
>> printf("Error: no filename specified\n");
>> usage();
>> }
>>
>> fname = argv[optind++];
>
> There's no handling of the case where more than one file is
> specified on the command line.

ok

>
>> /*
>> * Normalize to blocksize-aligned range:
>> * round start down, round end up - get all blocks including the range specified
>> */
>>
>> punch_range_start = round_down(punch_range_start, blocksize);
>> punch_range_end = round_up(punch_range_end, blocksize);
>> min_hole = round_up(min_hole, blocksize);
>> if (!min_hole)
>> min_hole = blocksize;
>
> I think this deserves some bold warning if punch_range_start
> or punch_range_end is not a multiple of blocksize.

well, we can only punch on block boundaries. But I suppose I should swap
round_up and round_down, so that we never punch anything that isn't *inside*
the specified range.
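
Swapping the two roundings shrinks the range to the blocks that lie entirely inside it. A small sketch of that arithmetic, assuming blocksize is a power of two (which the round_up/round_down macros in the patch already require); shrink_to_blocks() is a made-up name:

```c
/*
 * Shrink [*start, *end) to the largest block-aligned range inside it.
 * blocksize must be a power of two. Returns 1 if at least one whole
 * block remains, 0 otherwise.
 */
int shrink_to_blocks(long long *start, long long *end, long long blocksize)
{
	long long mask = blocksize - 1;

	*start = (*start + mask) & ~mask;	/* round start up */
	*end = *end & ~mask;			/* round end down */
	return *end > *start;
}
```

With a 4096-byte block size, a requested range of 100-10000 becomes 4096-8192, and a range of 100-5000 contains no whole block at all.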

> []
>> /*
>> * Read through the file, finding block-aligned regions of 0s.
>> * If the region is at least min_hole, punch it out.
>> * This should be starting at a block-aligned offset
>> */
>>
>> while ((ret = read(fd, readbuf, min_hole)) > 0) {
>>
>> if (!memcmp(readbuf, zerobuf, min_hole)) {
>
> Now this is interesting. Can ret be < min_hole? Can a read
> in the middle of a file be shorter than specified?

yes, and yes (but unlikely i think)...
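
Since a short read is possible, the comparison should only cover bytes that were actually returned. One common approach, sketched here with a hypothetical read_full() helper rather than taken from the tool, is to loop until the buffer is full or EOF is reached:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to count bytes, retrying on short reads and EINTR.
 * Returns the number of bytes read (less than count only at EOF),
 * or -1 on error.
 */
ssize_t read_full(int fd, void *buf, size_t count)
{
	size_t done = 0;

	while (done < count) {
		ssize_t ret = read(fd, (char *)buf + done, count - done);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (ret == 0)		/* EOF */
			break;
		done += ret;
	}
	return (ssize_t)done;
}
```

The caller would then compare only the returned length against the zero buffer. This does not address a concurrent ftruncate, which can still change the file between the read and the punch.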


> How will it work together with some other operation being done
> on the same file -- ftruncate anyone?

I probably have some boundary condition & error checking to do yet :)

Thanks for the review,
-Eric

> Thanks!
>
> /mjt


2012-02-06 18:41:28

by Sunil Mushran

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 02/04/2012 12:04 PM, Eric Sandeen wrote:
> Now that ext4, xfs, & ocfs2 can support punch hole, a tool to
> "re-sparsify" a file by punching out ranges of 0s might be in order.
>
> I whipped this up fast, it probably has bugs & off-by-ones but thought
> I'd send it out. It's not terribly efficient doing 4k reads by default
> I suppose.
>
> I'll see if util-linux wants it after it gets beat into shape.
> (or did a tool like this already exist and I missed it?)
>
> (Another mode which does a file copy, possibly from stdin
> might be good, like e2fsprogs/contrib/make-sparse.c ? Although
> that can be hacked up with cp already).
>
> It works like this:
>
> [root@inode sparsify]# ./sparsify -h
> Usage: sparsify [-m min hole size] [-o offset] [-l length] filename


So I have a similar tool queued up in ocfs2-tools. Named puncher.
http://oss.oracle.com/git/?p=ocfs2-tools.git;a=shortlog;h=puncher

I'll pull it out if we get something in util-linux. But maybe you can
extract something useful from it.

Like.... maybe doing a dry run by default. It is an in-place modification
after all. Also using a large hole size as default (1MB). Overusing
hole punching will negatively affect read performance. We should make
the sane choice for the user.

On a related note, it may make sense for ext4 to populate the cluster
size (bigalloc) in stat.st_blksize.

2 cents...

2012-02-06 21:41:44

by Theodore Ts'o

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

Cool! I assume you're going to try to get this into util-linux-ng?

I'm tempted to drop it in e2fsprogs's contrib directory for now, but
I think the best home for it is util-linux-ng.

- Ted

2012-02-06 21:47:42

by Eric Sandeen

Subject: Re: sparsify - utility to punch out blocks of 0s in a file

On 2/6/12 3:41 PM, Ted Ts'o wrote:
> Cool! I assume you're going to try to get this into util-linux-ng?
>
> I'm tempted to drop it in e2fsprogs's contrib directory for now, but
> I think the best home for it is util-linux-ng.
>
> - Ted

Yep, I will do that, though it could use a fair bit of cleanup first.
kzak seemed amenable to taking it in.

-Eric