2009-05-24 17:07:27

by Thomas Glanzmann

Subject: zero out blocks of freed user data for operation a virtual machine environment

Hello Ted,
I would like to know whether there is already a mount option or feature in
ext3/ext4 that automatically overwrites freed blocks with zeros. If
not, would you consider a patch for upstream? I'm asking because I am
currently doing research on data deduplication in virtual machine
environments and the corresponding backups. Such a feature would be a
huge space saver because today's and tomorrow's backup tools for virtual
machine environments work on the block layer (VMware Consolidated Backup,
VMware Data Recovery, and NetApp Snapshots). This is true not only for
backup tools but also for running virtual machines. The case that this
feature addresses is the following: a huge file is downloaded and later
deleted. The backup and data deduplication software operating on the
block level can't identify the blocks as unused. As a result, the full
amount of data previously allocated by the file is backed up, which
introduces a performance overhead. If you're interested in real-life
data, I can provide it.

If you don't intend to have such an optional feature in ext3/ext4 I
would like to know if you know a tool that makes it possible to zero out
unused blocks?

The only reference that I found for such a tool for Linux is the
following:

#!/bin/bash
FileSystem=`grep ext /etc/mtab| awk -F" " '{ print $2 }'`

for i in $FileSystem
do
    number=`df -B 512 $i | awk -F" " '{print $4}'`
    percent=$(echo "scale=0; $number * 95 / 100" | bc )
    dd count=`echo $percent` if=/dev/zero of=`echo $i`/zf
    rm -f $i/zf
done

Source: http://blog.core-it.com.au/?p=298

Even though it certainly does the job, I would hardly recommend it to
anyone, for several obvious reasons: it causes a lot of I/O overhead that
could be avoided, scheduling it at a bad moment could lead to a full-disk
situation, and the block size is left at the default, which is way too low.

Just to be complete: for Microsoft Windows there is a tool called
sdelete which can be used to zero out unused disk blocks. It has
the same problems as the above script, but is hopefully safer to run.

Source: http://technet.microsoft.com/en-us/sysinternals/bb897443.aspx

Thomas


2009-05-24 17:15:52

by Arjan van de Ven

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

On Sun, 24 May 2009 19:00:45 +0200
Thomas Glanzmann wrote:

> Hello Ted,
> I would like to know whether there is already a mount option or feature
> in ext3/ext4 that automatically overwrites freed blocks with zeros. If
> not, would you consider a patch for upstream? I'm asking because I am
> currently doing research on data deduplication in virtual machine
> environments and the corresponding backups. Such a feature would be a
> huge space saver because today's and tomorrow's backup tools for
> virtual machine environments work on the block layer (VMware
> Consolidated Backup, VMware Data Recovery, and NetApp Snapshots). This
> is true not only for backup tools but also for running virtual
> machines. The case that this feature addresses is the following: a
> huge file is downloaded and later deleted. The backup and data
> deduplication software operating on the block level can't identify the
> blocks as unused. As a result, the full amount of data previously
> allocated by the file is backed up, which introduces a performance
> overhead. If you're interested in real-life data, I can provide it.
>
> If you don't intend to have such an optional feature in ext3/ext4 I
> would like to know if you know a tool that makes it possible to zero
> out unused blocks?
>

Wouldn't it be better if the VMs just supported the TRIM command?


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-05-24 17:39:33

by Thomas Glanzmann

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

Hello,

> > tunable feature that zeroes out free data

> Wouldn't it be better if the VMs just supported the TRIM command?

The resources available to me indicate that TRIM is a
not-yet-standardized command targeted at SSDs to indicate free disk
space. Does ext3/4 trigger a block device layer call that could result
in a TRIM command?

Thomas

2009-05-25 03:29:28

by David Newall

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

Thomas Glanzmann wrote:
> If you don't intend to have such an optional feature in ext3/ext4 I
> would like to know if you know a tool that makes it possible to zero out
> unused blocks?
>
> The only reference that I found for such a tool for Linux is the
> following:
>
Astounding use of backquote. I'm not sure about the "percent" bit. I
think it's some confusion over only 95% of total blocks being available
for allocation.

I, too, would not recommend it, but it becomes safer by repeatedly
allocating only half the remaining disk, stopping when there's only a
few blocks free, to leave some for all of the other processes.
Presumably it won't be a problem having (potentially) a few free blocks
that don't de-duplicate.

#!/bin/bash
FileSystem=`grep ext /etc/mtab| awk -F" " '{ print $2 }'`

for i in $FileSystem
do
    while number=`df -B 512 $i | awk -F" " '$4 < 10 {exit(1)} {print $4 / 2}'`
    do
        dd count=$number if=/dev/zero || break
    done > $i/zf
    rm -f $i/zf
done
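
A minimal sketch of the halving approach, written as shell functions. The names free_mib, next_chunk_mib and zero_free_space are hypothetical, as are the 64 MiB default reserve and the pass limit. It assumes `df -P` output (one line per filesystem) and skips the header line, which the scripts above appear to feed into awk as well. It also uses a 1 MiB block size instead of dd's 512-byte default.

```shell
# Sketch only: all function names and defaults here are illustrative.

# Free space (MiB) on the filesystem holding directory $1.
# -P keeps each filesystem on one line; NR == 2 skips the header line.
free_mib() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Size of the next chunk to write (MiB): half of the free space $1,
# or 0 once the free space is at or below the reserve $2.
next_chunk_mib() {
    if [ "$1" -le "$2" ]; then
        echo 0
    else
        echo $(( $1 / 2 ))
    fi
}

# Append zeros to a scratch file in directory $1, halving the free space
# on each pass, and stop once free space is at or below $2 MiB
# (default 64). The scratch file is deleted afterwards.
zero_free_space() {
    local dir="$1" reserve="${2:-64}" chunk pass=0
    local scratch="$dir/zf.$$"
    # The pass limit guards against filesystems on which writing zeros
    # does not reduce the reported free space.
    while [ "$pass" -lt 64 ]; do
        chunk=$(next_chunk_mib "$(free_mib "$dir")" "$reserve")
        [ "$chunk" -gt 0 ] || break
        dd if=/dev/zero bs=1048576 count="$chunk" >> "$scratch" 2>/dev/null || break
        pass=$(( pass + 1 ))
    done
    rm -f "$scratch"
}
```

Calling `zero_free_space /mnt/data 128` would zero most of the free space under /mnt/data (a hypothetical mount point). Because each pass writes half of what is left, the filesystem never fills up completely, but somewhere between half the reserve and the full reserve stays unzeroed.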



Are you proposing to de-duplicate a live filesystem?

2009-05-25 05:26:53

by Thomas Glanzmann

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

Hello David,

[ RESEND: CC forgotten ]

> Are you proposing to de-duplicate a live filesystem?

I do, but on the storage appliance / NFS server and not inside the VM.
Inside the VM, however, a filesystem could make the deduplication effort
much easier if it reported unused blocks to the outside world by
overwriting them with zeros. I have two scenarios in mind at the moment:

- btrfs already has checksums. I'm currently evaluating whether
  crc32 is good enough to find candidates for deduplication
  or whether a stronger checksum is required. After that, one patch
  needs to be adapted and an ioctl needs to be implemented in btrfs
  which then double-checks whether the blocks really are
  duplicates of each other and deduplicates them.

- btrfs will at some point be able to generate a list of
  blocks that have changed between two transactions. This list
  can be used to create an offsite backup.

See also: http://thread.gmane.org/gmane.comp.file-systems.btrfs/2922
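
The first scenario (a cheap checksum to find candidates, then a byte-for-byte check) can be sketched in userspace on an ordinary file. This is not btrfs code: read_block and find_duplicate_blocks are hypothetical names, the 4096-byte default block size is an assumption, and the CRC comes from the POSIX cksum utility rather than from btrfs's own checksums.

```shell
# Sketch only: userspace illustration of "checksum first, then verify".

# Write block number $3 (block size $2 bytes) of file $1 to stdout.
read_block() {
    dd if="$1" bs="$2" skip="$3" count=1 2>/dev/null
}

# Print "first later" for every block of file $1 that is byte-identical
# to the first earlier block with the same CRC.
find_duplicate_blocks() {
    local file="$1" bs="${2:-4096}" size nblocks i a b tmp
    size=$(wc -c < "$file")
    nblocks=$(( (size + bs - 1) / bs ))
    tmp=$(mktemp)
    i=0
    while [ "$i" -lt "$nblocks" ]; do
        # Cheap step: one CRC per block (cksum prints "crc length").
        echo "$(read_block "$file" "$bs" "$i" | cksum | cut -d' ' -f1) $i"
        i=$(( i + 1 ))
    done |
    # Pair each block with the first block that had the same CRC.
    awk '{ if ($1 in first) print first[$1], $2; else first[$1] = $2 }' |
    while read -r a b; do
        # Expensive step: confirm that the candidates really are identical.
        read_block "$file" "$bs" "$a" > "$tmp"
        if read_block "$file" "$bs" "$b" | cmp -s - "$tmp"; then
            echo "$a $b"
        fi
    done
    rm -f "$tmp"
}
```

The byte comparison is what makes a weak checksum acceptable: a CRC collision only costs an extra comparison, it never merges two different blocks. A stronger checksum would mainly reduce the number of wasted comparisons.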

Thomas

PS: It seems that NetApp already has the above in a product. They
have the ability to dedup blocks on WAFL, and they also have a feature
that allows offsite duplication of the filesystem.

2009-05-25 08:05:19

by Ron Yorston

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

I've written a tool to zero freed blocks in ext2/ext3 filesystems, as well
as a (half-baked) kernel patch. Details here:

http://intgat.tigress.co.uk/rmy/uml/sparsify.html

Ron

2009-05-25 10:50:36

by Thomas Glanzmann

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

Hello Ron,

* Ron Yorston [090525 09:49]:
> I've written a tool to zero freed blocks in ext2/ext3 filesystems, as well
> as a (half-baked) kernel patch. Details here:

> http://intgat.tigress.co.uk/rmy/uml/sparsify.html

Nice work! While we are talking about sparse files: do you know if there
is an option for qcow2 to reclaim zeroed-out blocks (like a sparse in
userland)? I hope that this functionality hits upstream. It could also
be used to provide secure file deletion.

Thomas

2009-05-25 12:03:44

by Theodore Ts'o

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

On Sun, May 24, 2009 at 07:39:33PM +0200, Thomas Glanzmann wrote:
> Hello,
>
> > > tunable feature that zeroes out free data
>
> > Wouldn't it be better if the VMs just supported the TRIM command?
>
> The resources available to me indicate that TRIM is a
> not-yet-standardized command targeted at SSDs to indicate free disk
> space. Does ext3/4 trigger a block device layer call that could result
> in a TRIM command?

Yes, it does, sb_issue_discard(). So if you wanted to hook into this
routine with a function which issued calls to zero out blocks, it
would be easy to create a private patch.

- Ted

2009-05-25 12:06:49

by Theodore Ts'o

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

On Sun, May 24, 2009 at 07:00:45PM +0200, Thomas Glanzmann wrote:
> Hello Ted,
> I would like to know whether there is already a mount option or feature in
> ext3/ext4 that automatically overwrites freed blocks with zeros. If
> not, would you consider a patch for upstream? I'm asking because I am
> currently doing research on data deduplication in virtual machine
> environments and the corresponding backups. Such a feature would be a
> huge space saver because today's and tomorrow's backup tools for virtual
> machine environments work on the block layer (VMware Consolidated Backup,
> VMware Data Recovery, and NetApp Snapshots). This is true not only for
> backup tools but also for running virtual machines. The case that this
> feature addresses is the following: a huge file is downloaded and later
> deleted. The backup and data deduplication software operating on the
> block level can't identify the blocks as unused. As a result, the full
> amount of data previously allocated by the file is backed up, which
> introduces a performance overhead. If you're interested in real-life
> data, I can provide it.

If you are planning to use this on production systems, forcing the
filesystem to zero out blocks to determine whether or not they are in
use is a terrible idea. The performance hit it would impose would
probably not be tolerated by most users.

It would be much better to design a system interface which allowed a
userspace program to be given a list of the blocks that are in use within
a certain block range. That way said userspace program could easily
determine whether a particular block is in use.

- Ted

2009-05-25 12:34:29

by Thomas Glanzmann

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

Hello Ted,

> Yes, it does, sb_issue_discard(). So if you wanted to hook into this
> routine with a function which issued calls to zero out blocks, it
> would be easy to create a private patch.

that sounds good because it wouldn't only target the most used
filesystem but every other filesystem that uses the interface as well.
Do you think that a tunable or configurable patch has a chance to hit
upstream as well?

Thomas

2009-05-25 13:14:02

by Goswin von Brederlow

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

Thomas Glanzmann writes:

> Hello Ted,
>
>> Yes, it does, sb_issue_discard(). So if you wanted to hook into this
>> routine with a function which issued calls to zero out blocks, it
>> would be easy to create a private patch.
>
> that sounds good because it wouldn't only target the most used
> filesystem but every other filesystem that uses the interface as well.
> Do you think that a tunable or configurable patch has a chance to hit
> upstream as well?
>
> Thomas

I could imagine a device mapper target that eats TRIM commands and
writes out zeroes instead. That should be easy to maintain outside or
inside the upstream kernel source.

MfG
Goswin

2009-05-25 14:01:50

by Thomas Glanzmann

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

Hello Goswin,

> I could imagine a device mapper target that eats TRIM commands and
> writes out zeroes instead. That should be easy to maintain outside or
> inside the upstream kernel source.

Again an interesting option, and for sure easy to handle. However, what
I'm really looking for is an option that gets upstream and will be
incorporated in major distributions, so that it is available on
every Linux distribution shipping in two years. If that won't be
the case, I'm going to consider writing a device mapper target.

Thomas

2009-05-25 17:50:44

by Chris Worley

Subject: RE: zero out blocks of freed user data for operation a virtual machine environment

On Mon, May 25, 2009 at 7:14 AM, Goswin von Brederlow wrote:
> Thomas Glanzmann writes:
> > Hello Ted,
> >
> >> Yes, it does, sb_issue_discard(). So if you wanted to hook into this
> >> routine with a function which issued calls to zero out blocks, it
> >> would be easy to create a private patch.
> >
> > that sounds good because it wouldn't only target the most used
> > filesystem but every other filesystem that uses the interface as well.
> > Do you think that a tunable or configurable patch has a chance to hit
> > upstream as well?
> >
> > Thomas
>
> I could imagine a device mapper target that eats TRIM commands and
> writes out zeroes instead. That should be easy to maintain outside or
> inside the upstream kernel source.

Why bother with a time-consuming performance-draining operation?
There are devices that already support TRIM/discard commands today,
and once you discard a block, it's completely irretrievable (you'll
just get back zeros if you try to read that block w/o writing it after
the discard).

Chris
>
>
> MfG
> Goswin

2009-05-25 21:19:29

by Bill Davidsen

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

Thomas Glanzmann wrote:
> Hello Ted,
> I would like to know whether there is already a mount option or feature in
> ext3/ext4 that automatically overwrites freed blocks with zeros. If
> not, would you consider a patch for upstream? I'm asking because I am
> currently doing research on data deduplication in virtual machine
> environments and the corresponding backups. Such a feature would be a
> huge space saver because today's and tomorrow's backup tools for virtual
> machine environments work on the block layer (VMware Consolidated Backup,
> VMware Data Recovery, and NetApp Snapshots). This is true not only for
> backup tools but also for running virtual machines. The case that this
> feature addresses is the following: a huge file is downloaded and later
> deleted. The backup and data deduplication software operating on the
> block level can't identify the blocks as unused. As a result, the full
> amount of data previously allocated by the file is backed up, which
> introduces a performance overhead. If you're interested in real-life
> data, I can provide it.
>
> If you don't intend to have such an optional feature in ext3/ext4 I
> would like to know if you know a tool that makes it possible to zero out
> unused blocks?
>
Treating blocks as unused because of their content seems a bad idea. If you
want them to be unused, look for references to TRIM; if you want this for
security, look at shred. And if you are interested in backing up sparse files,
I believe the tar "-S" option will do what you want, or provide code you can
use as a starting point for writing what you want.
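
A small sketch of the sparse-file case. The function names are hypothetical, the sparse file is created with the common `dd ... count=0 seek=N` idiom, and the archive step assumes GNU tar, where `-S` is the short form of `--sparse`.

```shell
# Sketch only: function names are illustrative.

# Create a sparse file $1 with an apparent size of $2 bytes and no data.
make_sparse() {
    dd if=/dev/zero of="$1" bs=1 count=0 seek="$2" 2>/dev/null
}

# Apparent size of file $1 in bytes.
apparent_bytes() {
    echo $(( $(wc -c < "$1") ))
}

# Space actually allocated for file $1, in KiB.
allocated_kb() {
    du -k "$1" | cut -f1
}

# Archive file $1 into $2, storing holes compactly (GNU tar assumed).
archive_sparse() {
    tar -S -cf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}
```

On a filesystem that supports holes, a file made with `make_sparse disk.img 10485760` reports an apparent size of 10 MiB while `allocated_kb` stays far below 10240.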

I don't think this is a good solution to the problem that unused space is not
accounted for as you wish it were. Most filesystems already have a bitmap to
track this; a handle on that would be more generally useful.

Deleting files is slow enough already. Identifying unused storage by content is
1950s thinking, and it also ignores the fact that new drives often don't come
zeroed and would behave badly unless you manually zeroed the unused portions.

I doubt this is the optimal solution, since you would have to read the zeros to
see whether they are present, making the backup smaller but no faster than just
doing a copy.
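
To illustrate the cost being described, here is a minimal sketch of content-based detection. The name count_zero_blocks and the 4096-byte default block size are assumptions. Every block has to be read in full before it can be classified as zeroed.

```shell
# Sketch only: count the all-zero blocks in file $1 (block size $2 bytes).
# A trailing partial block is ignored.
count_zero_blocks() {
    local file="$1" bs="${2:-4096}" size nblocks i=0 zeros=0 nonzero
    size=$(wc -c < "$file")
    nblocks=$(( size / bs ))
    while [ "$i" -lt "$nblocks" ]; do
        # Read the whole block, drop every NUL byte and count what is left.
        nonzero=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null |
                  tr -d '\000' | wc -c)
        if [ "$nonzero" -eq 0 ]; then
            zeros=$(( zeros + 1 ))
        fi
        i=$(( i + 1 ))
    done
    echo "$zeros"
}
```

The read volume equals the size of the image regardless of how many blocks turn out to be zero, so only the amount written to the backup shrinks.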

--
Bill Davidsen
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2009-05-26 04:45:58

by Thomas Glanzmann

Subject: Re: zero out blocks of freed user data for operation a virtual machine environment

Bill,
I think you didn't read what I wrote, so here it is again: my
applications are VMs. Every disk that you give to a VM is zeroed out, one
way or the other: one way is to use dd or something that has the same
effect, the other is to use a sparse file. That is guaranteed. As soon
as you start working in the VM it is no longer guaranteed, because for
real-life applications it makes limited sense to zero out freed blocks
(except maybe if you have a SAN LUN exported from a storage device that
supports data deduplication, or if you want deleted files to disappear
from your block device). Today's data deduplication and backup solutions
for VMs depend on the property that unused data blocks are zeroed out,
and I actually can't think of an easier interface. As I proposed earlier,
if you don't like it for performance reasons, that's fine, but if you
have to back up 5.6 terabytes instead of 17 terabytes, then this is a
huge space saver even with the performance overhead involved.
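
Taking the two figures above at face value, the relative saving can be worked out with a one-line helper. The name savings_percent is hypothetical, and it only does the arithmetic.

```shell
# Sketch only: percentage saved when $2 units are stored instead of $1.
savings_percent() {
    awk -v full="$1" -v kept="$2" \
        'BEGIN { printf "%.1f\n", (full - kept) / full * 100 }'
}

savings_percent 17 5.6   # prints 67.1
```

That is, backing up 5.6 terabytes instead of 17 terabytes stores roughly two thirds less data.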

Thomas