LinuxLists.cc - Re: [PATCH v1 00/30] Ext4 snapshots

2011-06-07 15:57:01

by Lukas Czerner

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

Hi Amir,

thanks very much for the resend. I'll take a look at the whole patch
series, but first I want to bring up one important thing.

While this being a huge feature for ext4 (regardless on how
intrusive it is for the usual code paths) and while we already have
patches in the list with people interesting in looking into them, you
should clearly clarify what is the gain of it, what is the use case (and
I know you have one), and why it is better than other approaches. You
know, advertise it a bit in the marketing way :).

There is some confusion among developers on what actually are benefits
of ext4 snapshots in comparison to btrfs, or in comparison to the new
dm_multisnap code. I know that you have done quite a lot of testing to
assure that it does not actually change old ext4 behavior when snapshot
disabled, and that it works well when enabled, but have you done any
performance related benchmarks ? Do you have any expectations on how it
should behave in different work loads ?

It would be great to see and be able to confirm that ext4 snapshots are
really a win, not only on the feature side, but on the performance side
as well. I know that there are people out there still undecided or
having a strange feeling about your snapshot work. But who can blame
them, when we have not seen any hard data on this matter ?

So I, for myself, and I believe there are others, would like to see some
benchmark numbers and comparison (both, features and performance) with at
least new dm-multisnap code and probably btrfs and plain ext4 as well.

Thanks!
-Lukas

On Tue, 7 Jun 2011, [email protected] wrote:

> Hi All,
>
> I am resending the snapshots patch series as per Lukas's request.
> This time, the snapshot*.c files have not been omitted, as in
> the previous posting.
>
> The series is still based on ext4 dev branch sometime in the preparation
> for 3.0 merge window. It was not yet rebased on 3.0-rc1, so punch holes
> changes have not been addressed yet.
>
> As always, I advocate online review of the patches at:
> https://github.com/amir73il/ext4-snapshots/commits/for-ext4-v1
> but if you insist on doing it the old way, I won't complain.
>
> Thanks,
> Amir.
>
> [PATCH v1 01/36] ext4: EXT4 snapshots (Experimental)
> [PATCH v1 02/36] ext4: snapshot debugging support
> [PATCH v1 03/36] ext4: snapshot hooks - inside JBD hooks
> [PATCH v1 04/36] ext4: snapshot hooks - block bitmap access
> [PATCH v1 05/36] ext4: snapshot hooks - delete blocks
> [PATCH v1 06/36] ext4: snapshot hooks - move data blocks
> [PATCH v1 07/36] ext4: snapshot hooks - direct I/O
> [PATCH v1 08/36] ext4: snapshot hooks - move extent file data blocks
> [PATCH v1 09/36] ext4: snapshot file
> [PATCH v1 10/36] ext4: snapshot file - read through to block device
> [PATCH v1 11/36] ext4: snapshot file - permissions
> [PATCH v1 12/36] ext4: snapshot file - store on disk
> [PATCH v1 13/36] ext4: snapshot file - increase maximum file size limit to 16TB
> [PATCH v1 14/36] ext4: snapshot block operations
> [PATCH v1 15/36] ext4: snapshot block operation - copy blocks to snapshot
> [PATCH v1 16/36] ext4: snapshot block operation - move blocks to snapshot
> [PATCH v1 17/36] ext4: snapshot block operation - copy block bitmap to snapshot
> [PATCH v1 18/36] ext4: snapshot control
> [PATCH v1 19/36] ext4: snapshot control - init new snapshot
> [PATCH v1 20/36] ext4: snapshot control - fix new snapshot
> [PATCH v1 21/36] ext4: snapshot control - reserve disk space for snapshot
> [PATCH v1 22/36] ext4: snapshot journaled - increase transaction credits
> [PATCH v1 23/36] ext4: snapshot journaled - implement journal_release_buffer()
> [PATCH v1 24/36] ext4: snapshot journaled - bypass to save credits
> [PATCH v1 25/36] ext4: snapshot journaled - cache last COW tid in journal_head
> [PATCH v1 26/36] ext4: snapshot journaled - trace COW/buffer credits
> [PATCH v1 27/36] ext4: snapshot list support
> [PATCH v1 28/36] ext4: snapshot list - read through to previous snapshot
> [PATCH v1 29/36] ext4: snapshot race conditions - concurrent COW bitmap operations
> [PATCH v1 30/36] ext4: snapshot race conditions - concurrent COW operations
> [PATCH v1 31/36] ext4: snapshot race conditions - tracked reads
> [PATCH v1 32/36] ext4: snapshot exclude - the exclude bitmap
> [PATCH v1 33/36] ext4: snapshot cleanup
> [PATCH v1 34/36] ext4: snapshot cleanup - shrink deleted snapshots
> [PATCH v1 35/36] ext4: snapshot cleanup - merge shrunk snapshots
> [PATCH v1 36/36] ext4: snapshot rocompat - enable rw mount
>
> fs/ext4/Kconfig | 11 +
> fs/ext4/Makefile | 3 +
> fs/ext4/balloc.c | 132 +++
> fs/ext4/ext4.h | 188 ++++-
> fs/ext4/ext4_jbd2.c | 162 ++++-
> fs/ext4/ext4_jbd2.h | 266 ++++++-
> fs/ext4/extents.c | 157 ++++-
> fs/ext4/file.c | 11 +-
> fs/ext4/ialloc.c | 19 +-
> fs/ext4/inode.c | 668 +++++++++++++--
> fs/ext4/ioctl.c | 120 +++
> fs/ext4/mballoc.c | 161 ++++-
> fs/ext4/move_extent.c | 3 +-
> fs/ext4/namei.c | 9 +
> fs/ext4/resize.c | 19 +-
> fs/ext4/snapshot.c | 1000 ++++++++++++++++++++++
> fs/ext4/snapshot.h | 690 ++++++++++++++++
> fs/ext4/snapshot_buffer.c | 393 +++++++++
> fs/ext4/snapshot_ctl.c | 2002 +++++++++++++++++++++++++++++++++++++++++++++
> fs/ext4/snapshot_debug.c | 107 +++
> fs/ext4/snapshot_debug.h | 105 +++
> fs/ext4/snapshot_inode.c | 960 ++++++++++++++++++++++
> fs/ext4/super.c | 157 ++++-
> fs/ext4/xattr.c | 4 +-
> 24 files changed, 7182 insertions(+), 165 deletions(-)
>
>

--

2011-06-07 16:32:03

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Tue, Jun 7, 2011 at 6:56 PM, Lukas Czerner <[email protected]> wrote:
> Hi Amir,
>
> thanks very much for the resend. I'll take a look at the whole patch
> series, but first I want to bring up one important thing.
>
> While this being a huge feature for ext4 (regardless on how
> intrusive it is for the usual code paths) and while we already have
> patches in the list with people interesting in looking into them, you
> should clearly clarify what is the gain of it, what is the use case (and
> I know you have one), and why it is better than other approaches. You
> know, advertise it a bit in the marketing way :).

Hi Lukas,

Thank you for pointing out the marketing aspect.

I must admit that my user-case rather speaks for itself.
CTERA develops a NAS device which is specialized for
backing up local networks and snapshots gives the NAS a time
dimension without paying for it in disk space and performance.

The reason for not going with btrfs 3 years ago is clear.
So why not go with it now instead of moving forward to
ext4 with snapshots?
Part of the answer lies in the possibility to run fsck -x,
which gets rid of the snapshots in the case of fs corruption
and gets you back to good old stable and consistent ext4.

>
> There is some confusion among developers on what actually are benefits
> of ext4 snapshots in comparison to btrfs, or in comparison to the new
> dm_multisnap code. I know that you have done quite a lot of testing to
> assure that it does not actually change old ext4 behavior when snapshot
> disabled, and that it works well when enabled, but have you done any
> performance related benchmarks ? Do you have any expectations on how it
> should behave in different work loads ?
>
> It would be great to see and be able to confirm that ext4 snapshots are
> really a win, not only on the feature side, but on the performance side
> as well. I know that there are people out there still undecided or
> having a strange feeling about your snapshot work. But who can blame
> them, when we have not seen any hard data on this matter ?

Ehm.. I did present this benchmark on LSF:
http://global.phoronix-test-suite.com/index.php?k=profile&u=amir73il-4632-11284-26560

unless you snoozed ;-)
it shows performance vs. ext4 w/o snapshots and with snapshots
and while taking snapshots.
I did not compare with btrfs, but I bet there are ext4 vs. btrfs
benchmarks out there.
dm-multisnap is better than dm-snap only when it comes to overhead
per snapshot. it still copies every written block, which is far from
being the case in ext4 snapshots.

>
> So I, for myself, and I believe there are others, would like to see some
> benchmark numbers and comparison (both, features and performance) with at
> least new dm-multisnap code and probably btrfs and plain ext4 as well.
>
> Thanks!
> -Lukas
>
>
> On Tue, 7 Jun 2011, [email protected] wrote:
>
>> Hi All,
>>
>> I am resending the snapshots patch series as per Lukas's request.
>> This time, the snapshot*.c files have not been omitted, as in
>> the previous posting.
>>
>> The series is still based on ext4 dev branch sometime in the preparation
>> for 3.0 merge window. It was not yet rebased on 3.0-rc1, so punch holes
>> changes have not been addressed yet.
>>
>> As always, I advocate online review of the patches at:
>> https://github.com/amir73il/ext4-snapshots/commits/for-ext4-v1
>> but if you insist on doing it the old way, I won't complain.
>>
>> Thanks,
>> Amir.
>>
>> [PATCH v1 01/36] ext4: EXT4 snapshots (Experimental)
>> [PATCH v1 02/36] ext4: snapshot debugging support
>> [PATCH v1 03/36] ext4: snapshot hooks - inside JBD hooks
>> [PATCH v1 04/36] ext4: snapshot hooks - block bitmap access
>> [PATCH v1 05/36] ext4: snapshot hooks - delete blocks
>> [PATCH v1 06/36] ext4: snapshot hooks - move data blocks
>> [PATCH v1 07/36] ext4: snapshot hooks - direct I/O
>> [PATCH v1 08/36] ext4: snapshot hooks - move extent file data blocks
>> [PATCH v1 09/36] ext4: snapshot file
>> [PATCH v1 10/36] ext4: snapshot file - read through to block device
>> [PATCH v1 11/36] ext4: snapshot file - permissions
>> [PATCH v1 12/36] ext4: snapshot file - store on disk
>> [PATCH v1 13/36] ext4: snapshot file - increase maximum file size limit to 16TB
>> [PATCH v1 14/36] ext4: snapshot block operations
>> [PATCH v1 15/36] ext4: snapshot block operation - copy blocks to snapshot
>> [PATCH v1 16/36] ext4: snapshot block operation - move blocks to snapshot
>> [PATCH v1 17/36] ext4: snapshot block operation - copy block bitmap to snapshot
>> [PATCH v1 18/36] ext4: snapshot control
>> [PATCH v1 19/36] ext4: snapshot control - init new snapshot
>> [PATCH v1 20/36] ext4: snapshot control - fix new snapshot
>> [PATCH v1 21/36] ext4: snapshot control - reserve disk space for snapshot
>> [PATCH v1 22/36] ext4: snapshot journaled - increase transaction credits
>> [PATCH v1 23/36] ext4: snapshot journaled - implement journal_release_buffer()
>> [PATCH v1 24/36] ext4: snapshot journaled - bypass to save credits
>> [PATCH v1 25/36] ext4: snapshot journaled - cache last COW tid in journal_head
>> [PATCH v1 26/36] ext4: snapshot journaled - trace COW/buffer credits
>> [PATCH v1 27/36] ext4: snapshot list support
>> [PATCH v1 28/36] ext4: snapshot list - read through to previous snapshot
>> [PATCH v1 29/36] ext4: snapshot race conditions - concurrent COW bitmap operations
>> [PATCH v1 30/36] ext4: snapshot race conditions - concurrent COW operations
>> [PATCH v1 31/36] ext4: snapshot race conditions - tracked reads
>> [PATCH v1 32/36] ext4: snapshot exclude - the exclude bitmap
>> [PATCH v1 33/36] ext4: snapshot cleanup
>> [PATCH v1 34/36] ext4: snapshot cleanup - shrink deleted snapshots
>> [PATCH v1 35/36] ext4: snapshot cleanup - merge shrunk snapshots
>> [PATCH v1 36/36] ext4: snapshot rocompat - enable rw mount
>>
>> ?fs/ext4/Kconfig ? ? ? ? ? | ? 11 +
>> ?fs/ext4/Makefile ? ? ? ? ?| ? ?3 +
>> ?fs/ext4/balloc.c ? ? ? ? ?| ?132 +++
>> ?fs/ext4/ext4.h ? ? ? ? ? ?| ?188 ++++-
>> ?fs/ext4/ext4_jbd2.c ? ? ? | ?162 ++++-
>> ?fs/ext4/ext4_jbd2.h ? ? ? | ?266 ++++++-
>> ?fs/ext4/extents.c ? ? ? ? | ?157 ++++-
>> ?fs/ext4/file.c ? ? ? ? ? ?| ? 11 +-
>> ?fs/ext4/ialloc.c ? ? ? ? ?| ? 19 +-
>> ?fs/ext4/inode.c ? ? ? ? ? | ?668 +++++++++++++--
>> ?fs/ext4/ioctl.c ? ? ? ? ? | ?120 +++
>> ?fs/ext4/mballoc.c ? ? ? ? | ?161 ++++-
>> ?fs/ext4/move_extent.c ? ? | ? ?3 +-
>> ?fs/ext4/namei.c ? ? ? ? ? | ? ?9 +
>> ?fs/ext4/resize.c ? ? ? ? ?| ? 19 +-
>> ?fs/ext4/snapshot.c ? ? ? ?| 1000 ++++++++++++++++++++++
>> ?fs/ext4/snapshot.h ? ? ? ?| ?690 ++++++++++++++++
>> ?fs/ext4/snapshot_buffer.c | ?393 +++++++++
>> ?fs/ext4/snapshot_ctl.c ? ?| 2002 +++++++++++++++++++++++++++++++++++++++++++++
>> ?fs/ext4/snapshot_debug.c ?| ?107 +++
>> ?fs/ext4/snapshot_debug.h ?| ?105 +++
>> ?fs/ext4/snapshot_inode.c ?| ?960 ++++++++++++++++++++++
>> ?fs/ext4/super.c ? ? ? ? ? | ?157 ++++-
>> ?fs/ext4/xattr.c ? ? ? ? ? | ? ?4 +-
>> ?24 files changed, 7182 insertions(+), 165 deletions(-)
>>
>>
>
> --
>

2011-06-08 10:09:34

by Lukas Czerner

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Tue, 7 Jun 2011, Amir G. wrote:

> On Tue, Jun 7, 2011 at 6:56 PM, Lukas Czerner <[email protected]> wrote:
> > Hi Amir,
> >
> > thanks very much for the resend. I'll take a look at the whole patch
> > series, but first I want to bring up one important thing.
> >
> > While this being a huge feature for ext4 (regardless on how
> > intrusive it is for the usual code paths) and while we already have
> > patches in the list with people interesting in looking into them, you
> > should clearly clarify what is the gain of it, what is the use case (and
> > I know you have one), and why it is better than other approaches. You
> > know, advertise it a bit in the marketing way :).
>
> Hi Lukas,
>
> Thank you for pointing out the marketing aspect.
>
> I must admit that my user-case rather speaks for itself.
> CTERA develops a NAS device which is specialized for
> backing up local networks and snapshots gives the NAS a time
> dimension without paying for it in disk space and performance.
>
> The reason for not going with btrfs 3 years ago is clear.
> So why not go with it now instead of moving forward to
> ext4 with snapshots?
> Part of the answer lies in the possibility to run fsck -x,
> which gets rid of the snapshots in the case of fs corruption
> and gets you back to good old stable and consistent ext4.

But that is not even a real reason, is it ? When you need snapshots,
well, then you just need it and do no want to get rid of it. When fs
corruption appears, then it's bad in any case and the fsck should be
able to more or less fix it.

So you're saying that when corruption appears, then you *have to* blast
all snapshots ? I am not sure how btrfs is going to deal with it, but it
does seem like an advantage at all, why are you presenting it as such ?

>
> >
> > There is some confusion among developers on what actually are benefits
> > of ext4 snapshots in comparison to btrfs, or in comparison to the new
> > dm_multisnap code. I know that you have done quite a lot of testing to
> > assure that it does not actually change old ext4 behavior when snapshot
> > disabled, and that it works well when enabled, but have you done any
> > performance related benchmarks ? Do you have any expectations on how it
> > should behave in different work loads ?
> >
> > It would be great to see and be able to confirm that ext4 snapshots are
> > really a win, not only on the feature side, but on the performance side
> > as well. I know that there are people out there still undecided or
> > having a strange feeling about your snapshot work. But who can blame
> > them, when we have not seen any hard data on this matter ?
>
> Ehm.. I did present this benchmark on LSF:
> http://global.phoronix-test-suite.com/index.php?k=profile&u=amir73il-4632-11284-26560
>
> unless you snoozed ;-)
> it shows performance vs. ext4 w/o snapshots and with snapshots
> and while taking snapshots.

I believe that you just missed the fact that not everyone has attended LSF
and your lightning talk, but that's ok.

It seems to me that random writes are usually faster with you snapshot
code regardless whether you use snapshots or not. Is that because of
non snapshot related changes you've made ?

Also random reads seems to be slower with snapshots, is suspect that
this is because of read through, so the reason for the slowdown that it
was CPU bound ? I do not see any CPU utilization data.

The postmark results seems quite odd, it is actually a lot faster with
one snapshot and a lot slower with multiple snapshots, do you have an
idea what is going on ?

> I did not compare with btrfs, but I bet there are ext4 vs. btrfs
> benchmarks out there.
> dm-multisnap is better than dm-snap only when it comes to overhead
> per snapshot. it still copies every written block, which is far from
> being the case in ext4 snapshots.

Nevertheless, I still have not seen any comparison with other
snapshotting possibilities we have. Note that ext4 to btrfs comparison
is not enough, because we do not know what is the difference between
the difference of ext4 with/without snapshots and btrfs with/without
snapshots. The reason for this is that btrfs performance is very likely
to scale up, but ext4 is pretty much done in that matter and I do not
expect any huge performance leaps in the future.

Also, rejecting dm-multisnap based on this statement is not enough, show
us some numbers.

I believe that it is not very convenient for you, because this feature
support your business case and you do not necessarily want to find out
that there might be a better way, especially after the work you have
done already.

So it might be unpleasant for you that people ask questions and delaying
the inclusion of ext4 snapshots. But what you see as obstacles people
are throwing at you is really just caution, especially when it comes to
ext4 which is seen as a simple, stable, reliable and predictable linux
filesystem, but I bet you understand.

And one last note, I also think that the snapshot format change in the
future, when we'll have snpashots with 64bit feature compatible seems
just wrong to me. Adding some features or changing the implementation a
bit is ok, but format change is different. When the code is upstream and
stable it is just wrong.

Thanks!
-Lukas

>
> >
> > So I, for myself, and I believe there are others, would like to see some
> > benchmark numbers and comparison (both, features and performance) with at
> > least new dm-multisnap code and probably btrfs and plain ext4 as well.
> >
> > Thanks!
> > -Lukas
> >

2011-06-08 14:04:58

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Wed, Jun 8, 2011 at 1:09 PM, Lukas Czerner <[email protected]> wrote:
> On Tue, 7 Jun 2011, Amir G. wrote:
>
>> On Tue, Jun 7, 2011 at 6:56 PM, Lukas Czerner <[email protected]> wrote:
>> > Hi Amir,
>> >
>> > thanks very much for the resend. I'll take a look at the whole patch
>> > series, but first I want to bring up one important thing.
>> >
>> > While this being a huge feature for ext4 (regardless on how
>> > intrusive it is for the usual code paths) and while we already have
>> > patches in the list with people interesting in looking into them, you
>> > should clearly clarify what is the gain of it, what is the use case (and
>> > I know you have one), and why it is better than other approaches. You
>> > know, advertise it a bit in the marketing way :).
>>
>> Hi Lukas,
>>
>> Thank you for pointing out the marketing aspect.
>>
>> I must admit that my user-case rather speaks for itself.
>> CTERA develops a NAS device which is specialized for
>> backing up local networks and snapshots gives the NAS a time
>> dimension without paying for it in disk space and performance.
>>
>> The reason for not going with btrfs 3 years ago is clear.
>> So why not go with it now instead of moving forward to
>> ext4 with snapshots?
>> Part of the answer lies in the possibility to run fsck -x,
>> which gets rid of the snapshots in the case of fs corruption
>> and gets you back to good old stable and consistent ext4.
>
> But that is not even a real reason, is it ? When you need snapshots,
> well, then you just need it and do no want to get rid of it. When fs
> corruption appears, then it's bad in any case and the fsck should be
> able to more or less fix it.
>
> So you're saying that when corruption appears, then you *have to* blast
> all snapshots ? I am not sure how btrfs is going to deal with it, but it
> does seem like an advantage at all, why are you presenting it as such ?
>

Hi Lukas,

First of all, thank you for being strict with me.
I admit to having lousy marketing skills...

The market I am targeting are the sys admins who
are very cautious about their 'data' and are reluctant
therefor to migrate from ext3 to ext4, not to speak of
btrfs.

To this market I say, you can have snapshots of your
'data' on ext4 without risking the proven stability of ext4.
The snapshots of the 'data' are not guarantied to be as
stable (being a new feature), but because the snapshots
are second to 'data' in ext4 snapshots, corrupted snapshots
will not risk the 'data'.

During 1 year of next3 in production systems, we found bugs.
But none of the bugs corrupted 'data'. All of the bugs which
caused file system to contain errors, the errors were restricted
to snapshot files and in those worst cases, we could always
go to emergency plan B (plan A being fsck -p) and run fsck -x
which always solved the problem.

The customer was always consulted before resorting to 'plan B'
and was given the chance to copy out 'data' from the snapshots
(it was always possible) before we discard them.

Needless to say, the said bugs were fixed and ext4 snapshots
will enjoy the stability of next3 and the 'fail safe' nature of the
solution, which was proven several times on the field.

>>
>> >
>> > There is some confusion among developers on what actually are benefits
>> > of ext4 snapshots in comparison to btrfs, or in comparison to the new
>> > dm_multisnap code. I know that you have done quite a lot of testing to
>> > assure that it does not actually change old ext4 behavior when snapshot
>> > disabled, and that it works well when enabled, but have you done any
>> > performance related benchmarks ? Do you have any expectations on how it
>> > should behave in different work loads ?
>> >
>> > It would be great to see and be able to confirm that ext4 snapshots are
>> > really a win, not only on the feature side, but on the performance side
>> > as well. I know that there are people out there still undecided or
>> > having a strange feeling about your snapshot work. But who can blame
>> > them, when we have not seen any hard data on this matter ?
>>
>> Ehm.. I did present this benchmark on LSF:
>> http://global.phoronix-test-suite.com/index.php?k=profile&u=amir73il-4632-11284-26560
>>
>> unless you snoozed ;-)
>> it shows performance vs. ext4 w/o snapshots and with snapshots
>> and while taking snapshots.
>
> I believe that you just missed the fact that not everyone has attended LSF
> and your lightning talk, but that's ok.

That's not really OK. I should have posted the results
and analysis on my wiki (the results are there).

>
> It seems to me that random writes are usually faster with you snapshot
> code regardless whether you use snapshots or not. Is that because of
> non snapshot related changes you've made ?

Not that I know of.
I can explain why random write onesnap is faster than nosnap
and why 1snappermin is faster than onesnap, but I am not
sure about nosnap vs. plain ext4.

>
> Also random reads seems to be slower with snapshots, is suspect that
> this is because of read through, so the reason for the slowdown that it
> was CPU bound ? I do not see any CPU utilization data.
>

Only the 1snappermin is slower.
I suspect it has to do with the fs freezes, but I admin I have not
looked into it.

> The postmark results seems quite odd, it is actually a lot faster with
> one snapshot and a lot slower with multiple snapshots, do you have an
> idea what is going on ?
>

The name onesnap is misleading. It should have been
existingsnaps.
The important factor is whether or not snapshots are taken during the test.
In the 1snappermin case, postmark is the only test that exposes the
weak spot of ext4 snapshots performance - deletes/truncates.
create file+delete file with existing snapshots has no overhead (no COW).
create file+take snapshot+delete file has the overhead of moving the
deleted blocks to snapshot.
With regards to speed up of onesnap, postmark is randomizing the file
creates/write so it may be a similar effect to random write.
I did not investigate this.

>> I did not compare with btrfs, but I bet there are ext4 vs. btrfs
>> benchmarks out there.
>> dm-multisnap is better than dm-snap only when it comes to overhead
>> per snapshot. it still copies every written block, which is far from
>> being the case in ext4 snapshots.
>
> Nevertheless, I still have not seen any comparison with other
> snapshotting possibilities we have. Note that ext4 to btrfs comparison
> is not enough, because we do not know what is the difference between
> the difference of ext4 with/without snapshots and btrfs with/without
> snapshots. The reason for this is that btrfs performance is very likely
> to scale up, but ext4 is pretty much done in that matter and I do not
> expect any huge performance leaps in the future.
>
> Also, rejecting dm-multisnap based on this statement is not enough, show
> us some numbers.

Well, if you come to understand the difference between fs level an dm
level snapshots, you will see why i am rejecting dm-multisnap
(performance wise only!).

Anyway #1: I have already answered this questions 2 years ago and I
think the answers are still valid both for LVM and btrfs:
http://sourceforge.net/apps/mediawiki/next3/index.php?title=FAQ#Why_use_Next3_snapshots_and_not_LVM_snapshots.3F

Anyway #2: I need to give you some numbers ;-)

>
> I believe that it is not very convenient for you, because this feature
> support your business case and you do not necessarily want to find out
> that there might be a better way, especially after the work you have
> done already.

Your analysis of my motives is correct :-)
The use of the term 'better way' I reject.
I think that ext4/btrfs/LVM snapshots are apples and oranges and hamburgers.
The question of whether the world needs ext4 snapshots is
perfectly valid, but going back to the food analogy, I think it's
a case of "the proof of the pudding is in the eating".
I have no doubt that if ext4 snapshots are merged, many people will use it.
And I think that is a good enough (if not the best)
reason for inclusion.

>
> So it might be unpleasant for you that people ask questions and delaying
> the inclusion of ext4 snapshots. But what you see as obstacles people
> are throwing at you is really just caution, especially when it comes to
> ext4 which is seen as a simple, stable, reliable and predictable linux
> filesystem, but I bet you understand.
>

Yes, I understand. As evidence, I posted the "core patches"
to get them reviewed for "safely" and "stability" rather than
"functionality". (and that didn't work out well, but I understand that as well).

> And one last note, I also think that the snapshot format change in the
> future, when we'll have snpashots with 64bit feature compatible seems
> just wrong to me. Adding some features or changing the implementation a
> bit is ok, but format change is different. When the code is upstream and
> stable it is just wrong.

What can I say, I understand why it looks bad, but is 64bit code
upstream and stable? Hell no! e2fsprogs 64bit is not out yet!
There is no reason to call it 'format change'.
It's going to be a new format used only for 64bit fs, which are not
even out there yet. And when they are finally out there, they won't
have
snapshots until the new format is implemented.

And more important, say I do implement a new 48bit logical
offsets file format, so my employer can provide snapshots on
>16TB volumes in future releases.
I will not recommend my employer to use this format on <16TB volumes,
because there is nothing wrong with staying with the simple and well
tested indirect mapped snapshot format in future releases.

Thanks for your time and patience,
Amir.

2011-06-08 14:41:46

by Eric Sandeen

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On 6/8/11 9:04 AM, Amir G. wrote:
>> And one last note, I also think that the snapshot format change in the
>> > future, when we'll have snpashots with 64bit feature compatible seems
>> > just wrong to me. Adding some features or changing the implementation a
>> > bit is ok, but format change is different. When the code is upstream and
>> > stable it is just wrong.
> What can I say, I understand why it looks bad, but is 64bit code
> upstream and stable? Hell no! e2fsprogs 64bit is not out yet!
> There is no reason to call it 'format change'.
> It's going to be a new format used only for 64bit fs, which are not
> even out there yet. And when they are finally out there, they won't
> have
> snapshots until the new format is implemented.

Well, the on-disk format for 64-bit (48-bit?) ext4 is there & fixed; it's
just that there is no released userspace which can properly handle it, right?

I don't anticipate ext4 format changes for >16T, or am I missing something?

-Eric

2011-06-08 15:01:49

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Wed, Jun 8, 2011 at 5:41 PM, Eric Sandeen <[email protected]> wrote:
> On 6/8/11 9:04 AM, Amir G. wrote:
>>> And one last note, I also think that the snapshot format change in the
>>> > future, when we'll have snpashots with 64bit feature compatible seems
>>> > just wrong to me. Adding some features or changing the implementation a
>>> > bit is ok, but format change is different. When the code is upstream and
>>> > stable it is just wrong.
>> What can I say, I understand why it looks bad, but is 64bit code
>> upstream and stable? Hell no! e2fsprogs 64bit is not out yet!
>> There is no reason to call it 'format change'.
>> It's going to be a new format used only for 64bit fs, which are not
>> even out there yet. And when they are finally out there, they won't
>> have
>> snapshots until the new format is implemented.
>
> Well, the on-disk format for 64-bit (48-bit?) ext4 is there & fixed; it's
> just that there is no released userspace which can properly handle it, right?

I don't know, you tell me.
Are there many users out there using 64bit feature, without the proper
user space tools?

>
> I don't anticipate ext4 format changes for >16T, or am I missing something?
>
> -Eric
>

Argh! I wish I hadn't missed the Monday call (it's
not in a good time for me).
This whole 'format change' has gone out of control
and I find it hard to present my case properly on scattered emails.

The message I am trying to get through is:
There is 32bit snapshot file format, which is implemented and well tested.
There is 64bit snapshot file format, which is not implemented yet, so
64bit and snapshot feature are mutually exclusive.
If and when 64bit snapshot file format will be implemented, it will be
a new type of extent mapped file (v2) with 48bit logical addresses.
Is this a 'format change'? Call it what you will, but it shouldn't
affect anything on existing structures. It should only affect the
non-existing structure of 64bit snapshot file.

Does this answer your question?

Amir.

2011-06-08 15:23:04

by Eric Sandeen

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On 6/8/11 10:01 AM, Amir G. wrote:
> On Wed, Jun 8, 2011 at 5:41 PM, Eric Sandeen <[email protected]> wrote:
>> On 6/8/11 9:04 AM, Amir G. wrote:
>>>> And one last note, I also think that the snapshot format change in the
>>>>> future, when we'll have snpashots with 64bit feature compatible seems
>>>>> just wrong to me. Adding some features or changing the implementation a
>>>>> bit is ok, but format change is different. When the code is upstream and
>>>>> stable it is just wrong.
>>> What can I say, I understand why it looks bad, but is 64bit code
>>> upstream and stable? Hell no! e2fsprogs 64bit is not out yet!
>>> There is no reason to call it 'format change'.
>>> It's going to be a new format used only for 64bit fs, which are not
>>> even out there yet. And when they are finally out there, they won't
>>> have
>>> snapshots until the new format is implemented.
>>
>> Well, the on-disk format for 64-bit (48-bit?) ext4 is there & fixed; it's
>> just that there is no released userspace which can properly handle it, right?
>
> I don't know, you tell me.
> Are there many users out there using 64bit feature, without the proper
> user space tools?

No, but that doesn't mean the disk format has to change when the tools
come out... I just don't want to confuse "there are no tools" with
"the disk format is unstable" - Andreas et. al. have been using
that format for years.

>>
>> I don't anticipate ext4 format changes for >16T, or am I missing something?
>>
>> -Eric
>>
>
> Argh! I wish I hadn't missed the Monday call (it's
> not in a good time for me).
> This whole 'format change' has gone out of control
> and I find it hard to present my case properly on scattered emails.

Sorry; I may have just misunderstood...

> The message I am trying to get through is:
> There is 32bit snapshot file format, which is implemented and well tested.
> There is 64bit snapshot file format, which is not implemented yet, so
> 64bit and snapshot feature are mutually exclusive.
> If and when 64bit snapshot file format will be implemented, it will be
> a new type of extent mapped file (v2) with 48bit logical addresses.
> Is this a 'format change'? Call it what you will, but it shouldn't
> affect anything on existing structures. It should only affect the
> non-existing structure of 64bit snapshot file.
>
> Does this answer your question?

Yes, I guess I had misunderstood your point; I thought you were
implying that ext4's format had to change to support 64-bit, so why
not change snapshots along with it....

But you're just saying that you wish to push 32-bit snapshots which only
work with certain sizes of ext4 filesystems now, and later you will
release a new snapshot format which works with the larger filesystems.
Right?

(I don't actually know if we'll ever have 64-bit ext4, though, there
are still so many scaling issues beyond just being able to mkfs,
mount, growfs etc ... it's a serious game of catch-up with xfs
in that space, IMHO, which has been doing it well for years now...)

Still, pushing snapshots upstream which will have an on-disk format
more limited than the rest of the filesystem's on-disk format
does strike me as suboptimal from a pure technical design POV.

What if we proposed, say, xattr code that could only apply xattrs
to files located in the first 16T? I don't think it'd be accepted.

I understand that you have a history and a format and a business case,
but that really should not change whether we do it right the first time,
upstream, IMHO... But I'm just the peanut gallery, here.... ;)

-Eric

> Amir.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2011-06-08 15:33:27

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Wed, Jun 8, 2011 at 6:22 PM, Eric Sandeen <[email protected]> wrote:
> On 6/8/11 10:01 AM, Amir G. wrote:
>> On Wed, Jun 8, 2011 at 5:41 PM, Eric Sandeen <[email protected]> wrote:
>>> On 6/8/11 9:04 AM, Amir G. wrote:
>>>>> And one last note, I also think that the snapshot format change in the
>>>>>> future, when we'll have snpashots with 64bit feature compatible seems
>>>>>> just wrong to me. Adding some features or changing the implementation a
>>>>>> bit is ok, but format change is different. When the code is upstream and
>>>>>> stable it is just wrong.
>>>> What can I say, I understand why it looks bad, but is 64bit code
>>>> upstream and stable? Hell no! e2fsprogs 64bit is not out yet!
>>>> There is no reason to call it 'format change'.
>>>> It's going to be a new format used only for 64bit fs, which are not
>>>> even out there yet. And when they are finally out there, they won't
>>>> have
>>>> snapshots until the new format is implemented.
>>>
>>> Well, the on-disk format for 64-bit (48-bit?) ext4 is there & fixed; it's
>>> just that there is no released userspace which can properly handle it, right?
>>
>> I don't know, you tell me.
>> Are there many users out there using 64bit feature, without the proper
>> user space tools?
>
> No, but that doesn't mean the disk format has to change when the tools
> come out... I just don't want to confuse "there are no tools" with
> "the disk format is unstable" - Andreas et. al. have been using
> that format for years.
>
>>>
>>> I don't anticipate ext4 format changes for >16T, or am I missing something?
>>>
>>> -Eric
>>>
>>
>> Argh! I wish I hadn't missed the Monday call (it's
>> not in a good time for me).
>> This whole 'format change' has gone out of control
>> and I find it hard to present my case properly on scattered emails.
>
> Sorry; I may have just misunderstood...
>
>> The message I am trying to get through is:
>> There is 32bit snapshot file format, which is implemented and well tested.
>> There is 64bit snapshot file format, which is not implemented yet, so
>> 64bit and snapshot feature are mutually exclusive.
>> If and when 64bit snapshot file format will be implemented, it will be
>> a new type of extent mapped file (v2) with 48bit logical addresses.
>> Is this a 'format change'? Call it what you will, but it shouldn't
>> affect anything on existing structures. It should only affect the
>> non-existing structure of 64bit snapshot file.
>>
>> Does this answer your question?
>
> Yes, I guess I had misunderstood your point; I thought you were
> implying that ext4's format had to change to support 64-bit, so why
> not change snapshots along with it....
>
> But you're just saying that you wish to push 32-bit snapshots which only
> work with certain sizes of ext4 filesystems now, and later you will
> release a new snapshot format which works with the larger filesystems.
> Right?

Right. Where 'Larger filesystems' := 64bit block addresses.

>
> (I don't actually know if we'll ever have 64-bit ext4, though, there
> are still so many scaling issues beyond just being able to mkfs,
> mount, growfs etc ... it's a serious game of catch-up with xfs
> in that space, IMHO, which has been doing it well for years now...)

More of a good reason to push a snapshot file format that work well
with 32bit ext4.

>
> Still, pushing snapshots upstream which will have an on-disk format
> more limited than the rest of the filesystem's on-disk format
> does strike me as suboptimal from a pure technical design POV.
>
> What if we proposed, say, xattr code that could only apply xattrs
> to files located in the first 16T? ?I don't think it'd be accepted.

That is not a correct analogy. The correct analogy is not supporting
xattrs on 64-bit ext4. Whether it makes sense or not for snapshots
depends IMHO on whether people find snapshot on 32bit ext4 only
useful or not.

I naturally think that people will find it useful.
Anyone can add snapshots to his existing 32-bit ext4,
No one can migrate the same fs to 64-bit...

>
> I understand that you have a history and a format and a business case,
> but that really should not change whether we do it right the first time,
> upstream, IMHO... ?But I'm just the peanut gallery, here.... ?;)
>
> -Eric
>
>> Amir.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>
>

2011-06-08 15:39:12

by Lukas Czerner

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Wed, 8 Jun 2011, Amir G. wrote:

> On Wed, Jun 8, 2011 at 1:09 PM, Lukas Czerner <[email protected]> wrote:
> > On Tue, 7 Jun 2011, Amir G. wrote:
> >
> >> On Tue, Jun 7, 2011 at 6:56 PM, Lukas Czerner <[email protected]> wrote:
> >> > Hi Amir,
> >> >
> >> > thanks very much for the resend. I'll take a look at the whole patch
> >> > series, but first I want to bring up one important thing.
> >> >
> >> > While this being a huge feature for ext4 (regardless on how
> >> > intrusive it is for the usual code paths) and while we already have
> >> > patches in the list with people interesting in looking into them, you
> >> > should clearly clarify what is the gain of it, what is the use case (and
> >> > I know you have one), and why it is better than other approaches. You
> >> > know, advertise it a bit in the marketing way :).
> >>
> >> Hi Lukas,
> >>
> >> Thank you for pointing out the marketing aspect.
> >>
> >> I must admit that my user-case rather speaks for itself.
> >> CTERA develops a NAS device which is specialized for
> >> backing up local networks and snapshots gives the NAS a time
> >> dimension without paying for it in disk space and performance.
> >>
> >> The reason for not going with btrfs 3 years ago is clear.
> >> So why not go with it now instead of moving forward to
> >> ext4 with snapshots?
> >> Part of the answer lies in the possibility to run fsck -x,
> >> which gets rid of the snapshots in the case of fs corruption
> >> and gets you back to good old stable and consistent ext4.
> >
> > But that is not even a real reason, is it ? When you need snapshots,
> > well, then you just need it and do no want to get rid of it. When fs
> > corruption appears, then it's bad in any case and the fsck should be
> > able to more or less fix it.
> >
> > So you're saying that when corruption appears, then you *have to* blast
> > all snapshots ? I am not sure how btrfs is going to deal with it, but it
> > does seem like an advantage at all, why are you presenting it as such ?
> >
>
> Hi Lukas,
>
> First of all, thank you for being strict with me.
> I admit to having lousy marketing skills...
>
> The market I am targeting are the sys admins who
> are very cautious about their 'data' and are reluctant
> therefor to migrate from ext3 to ext4, not to speak of
> btrfs.

Well, that's why I am concerned with merging the ext4 snapshots. This is
exactly the reason why people will get nervous when you try to push a
huge change like ext4 snapshots into the stable code base. Yes, when you
do not compile it in, it does not affect the fs very much, but try to
tell people that ext4 is not the old-good-stable-ext4 when you enable
this feature. And I do not believe that snapshot code does not interfere
with the old ext4 code paths, so there is a place for horrible bugs
waiting for us.

>
> To this market I say, you can have snapshots of your
> 'data' on ext4 without risking the proven stability of ext4.
> The snapshots of the 'data' are not guarantied to be as
> stable (being a new feature), but because the snapshots
> are second to 'data' in ext4 snapshots, corrupted snapshots
> will not risk the 'data'.
>
> During 1 year of next3 in production systems, we found bugs.
> But none of the bugs corrupted 'data'. All of the bugs which
> caused file system to contain errors, the errors were restricted
> to snapshot files and in those worst cases, we could always
> go to emergency plan B (plan A being fsck -p) and run fsck -x
> which always solved the problem.

It does not matter that much how long or how much your embedded
production systems are out there. The fact is that it is really very
limited work load variation, hence very limited testing.

>
> The customer was always consulted before resorting to 'plan B'
> and was given the chance to copy out 'data' from the snapshots
> (it was always possible) before we discard them.

So it is true, when you have an fs problem (corruption) you have to
blast off all your snapshots ?

>
> Needless to say, the said bugs were fixed and ext4 snapshots
> will enjoy the stability of next3 and the 'fail safe' nature of the
> solution, which was proven several times on the field.
>
>
> >>
> >> >
> >> > There is some confusion among developers on what actually are benefits
> >> > of ext4 snapshots in comparison to btrfs, or in comparison to the new
> >> > dm_multisnap code. I know that you have done quite a lot of testing to
> >> > assure that it does not actually change old ext4 behavior when snapshot
> >> > disabled, and that it works well when enabled, but have you done any
> >> > performance related benchmarks ? Do you have any expectations on how it
> >> > should behave in different work loads ?
> >> >
> >> > It would be great to see and be able to confirm that ext4 snapshots are
> >> > really a win, not only on the feature side, but on the performance side
> >> > as well. I know that there are people out there still undecided or
> >> > having a strange feeling about your snapshot work. But who can blame
> >> > them, when we have not seen any hard data on this matter ?
> >>
> >> Ehm.. I did present this benchmark on LSF:
> >> http://global.phoronix-test-suite.com/index.php?k=profile&u=amir73il-4632-11284-26560
> >>
> >> unless you snoozed ;-)
> >> it shows performance vs. ext4 w/o snapshots and with snapshots
> >> and while taking snapshots.
> >
> > I believe that you just missed the fact that not everyone has attended LSF
> > and your lightning talk, but that's ok.
>
> That's not really OK. I should have posted the results
> and analysis on my wiki (the results are there).
>
> >
> > It seems to me that random writes are usually faster with you snapshot
> > code regardless whether you use snapshots or not. Is that because of
> > non snapshot related changes you've made ?
>
> Not that I know of.
> I can explain why random write onesnap is faster than nosnap
> and why 1snappermin is faster than onesnap, but I am not
> sure about nosnap vs. plain ext4.
>
> >
> > Also random reads seems to be slower with snapshots, is suspect that
> > this is because of read through, so the reason for the slowdown that it
> > was CPU bound ? I do not see any CPU utilization data.
> >
>
> Only the 1snappermin is slower.
> I suspect it has to do with the fs freezes, but I admin I have not
> looked into it.
>
> > The postmark results seems quite odd, it is actually a lot faster with
> > one snapshot and a lot slower with multiple snapshots, do you have an
> > idea what is going on ?
> >
>
> The name onesnap is misleading. It should have been
> existingsnaps.
> The important factor is whether or not snapshots are taken during the test.
> In the 1snappermin case, postmark is the only test that exposes the
> weak spot of ext4 snapshots performance - deletes/truncates.
> create file+delete file with existing snapshots has no overhead (no COW).
> create file+take snapshot+delete file has the overhead of moving the
> deleted blocks to snapshot.
> With regards to speed up of onesnap, postmark is randomizing the file
> creates/write so it may be a similar effect to random write.
> I did not investigate this.
>
> >> I did not compare with btrfs, but I bet there are ext4 vs. btrfs
> >> benchmarks out there.
> >> dm-multisnap is better than dm-snap only when it comes to overhead
> >> per snapshot. it still copies every written block, which is far from
> >> being the case in ext4 snapshots.
> >
> > Nevertheless, I still have not seen any comparison with other
> > snapshotting possibilities we have. Note that ext4 to btrfs comparison
> > is not enough, because we do not know what is the difference between
> > the difference of ext4 with/without snapshots and btrfs with/without
> > snapshots. The reason for this is that btrfs performance is very likely
> > to scale up, but ext4 is pretty much done in that matter and I do not
> > expect any huge performance leaps in the future.
> >
> > Also, rejecting dm-multisnap based on this statement is not enough, show
> > us some numbers.
>
> Well, if you come to understand the difference between fs level an dm
> level snapshots, you will see why i am rejecting dm-multisnap
> (performance wise only!).

But I do understand the difference. And also, when it comes to fs level
snapshotting I would suspect that it would do something we can not do
with the current solutions, for example per-file or per-directory snapshots,
cat ext4 snapshots do that ?

>
> Anyway #1: I have already answered this questions 2 years ago and I
> think the answers are still valid both for LVM and btrfs:
> http://sourceforge.net/apps/mediawiki/next3/index.php?title=FAQ#Why_use_Next3_snapshots_and_not_LVM_snapshots.3F

But again, it was two years ago and even back then you have not had any
numbers proving your statements.

>
> Anyway #2: I need to give you some numbers ;-)

That would be great. Thanks!

>
> >
> > I believe that it is not very convenient for you, because this feature
> > support your business case and you do not necessarily want to find out
> > that there might be a better way, especially after the work you have
> > done already.
>
> Your analysis of my motives is correct :-)
> The use of the term 'better way' I reject.
> I think that ext4/btrfs/LVM snapshots are apples and oranges and hamburgers.

But they are really not, because otherwise it would complement each
other, but they are all trying to do the same thing, except btrfs has
it for free.

> The question of whether the world needs ext4 snapshots is
> perfectly valid, but going back to the food analogy, I think it's
> a case of "the proof of the pudding is in the eating".
> I have no doubt that if ext4 snapshots are merged, many people will use it.

Well, I would like to have your confidence. Why do you think so ? They
will use it for what ? Doing backups ? We can do this easily with LVM
without any risk of compromising existing filesystem at all. On desktop
? I very much doubt that since you can not do per directory (or per
file) snapshots, can you ?

> And I think that is a good enough (if not the best)
> reason for inclusion.

It would be of course, except you're the only one saying that.

Thanks!
-Lukas

2011-06-08 15:59:53

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Wed, Jun 8, 2011 at 6:38 PM, Lukas Czerner <[email protected]> wrote:
> On Wed, 8 Jun 2011, Amir G. wrote:
>
>> On Wed, Jun 8, 2011 at 1:09 PM, Lukas Czerner <[email protected]> wrote:
>> > On Tue, 7 Jun 2011, Amir G. wrote:
>> >
>> >> On Tue, Jun 7, 2011 at 6:56 PM, Lukas Czerner <[email protected]> wrote:
>> >> > Hi Amir,
>> >> >
>> >> > thanks very much for the resend. I'll take a look at the whole patch
>> >> > series, but first I want to bring up one important thing.
>> >> >
>> >> > While this being a huge feature for ext4 (regardless on how
>> >> > intrusive it is for the usual code paths) and while we already have
>> >> > patches in the list with people interesting in looking into them, you
>> >> > should clearly clarify what is the gain of it, what is the use case (and
>> >> > I know you have one), and why it is better than other approaches. You
>> >> > know, advertise it a bit in the marketing way :).
>> >>
>> >> Hi Lukas,
>> >>
>> >> Thank you for pointing out the marketing aspect.
>> >>
>> >> I must admit that my user-case rather speaks for itself.
>> >> CTERA develops a NAS device which is specialized for
>> >> backing up local networks and snapshots gives the NAS a time
>> >> dimension without paying for it in disk space and performance.
>> >>
>> >> The reason for not going with btrfs 3 years ago is clear.
>> >> So why not go with it now instead of moving forward to
>> >> ext4 with snapshots?
>> >> Part of the answer lies in the possibility to run fsck -x,
>> >> which gets rid of the snapshots in the case of fs corruption
>> >> and gets you back to good old stable and consistent ext4.
>> >
>> > But that is not even a real reason, is it ? When you need snapshots,
>> > well, then you just need it and do no want to get rid of it. When fs
>> > corruption appears, then it's bad in any case and the fsck should be
>> > able to more or less fix it.
>> >
>> > So you're saying that when corruption appears, then you *have to* blast
>> > all snapshots ? I am not sure how btrfs is going to deal with it, but it
>> > does seem like an advantage at all, why are you presenting it as such ?
>> >
>>
>> Hi Lukas,
>>
>> First of all, thank you for being strict with me.
>> I admit to having lousy marketing skills...
>>
>> The market I am targeting are the sys admins who
>> are very cautious about their 'data' and are reluctant
>> therefor to migrate from ext3 to ext4, not to speak of
>> btrfs.
>
> Well, that's why I am concerned with merging the ext4 snapshots. This is
> exactly the reason why people will get nervous when you try to push a
> huge change like ext4 snapshots into the stable code base. Yes, when you
> do not compile it in, it does not affect the fs very much, but try to
> tell people that ext4 is not the old-good-stable-ext4 when you enable
> this feature. And I do not believe that snapshot code does not interfere
> with the old ext4 code paths, so there is a place for horrible bugs
> waiting for us.
>
>>
>> To this market I say, you can have snapshots of your
>> 'data' on ext4 without risking the proven stability of ext4.
>> The snapshots of the 'data' are not guarantied to be as
>> stable (being a new feature), but because the snapshots
>> are second to 'data' in ext4 snapshots, corrupted snapshots
>> will not risk the 'data'.
>>
>> During 1 year of next3 in production systems, we found bugs.
>> But none of the bugs corrupted 'data'. All of the bugs which
>> caused file system to contain errors, the errors were restricted
>> to snapshot files and in those worst cases, we could always
>> go to emergency plan B (plan A being fsck -p) and run fsck -x
>> which always solved the problem.
>
> It does not matter that much how long or how much your embedded
> production systems are out there. The fact is that it is really very
> limited work load variation, hence very limited testing.

for the record, the embedded systems are x86_64 dual core,
but yes, it's true that the load variation is limited.
I am not saying there are no bugs, I'm just saying the 'fail safe'
always worked.

>
>>
>> The customer was always consulted before resorting to 'plan B'
>> and was given the chance to copy out 'data' from the snapshots
>> (it was always possible) before we discard them.
>
> So it is true, when you have an fs problem (corruption) you have to
> blast off all your snapshots ?

No, most of the time the problem could be solved by fsck -p
without discarding snapshots.
Only for the really hard cases, we had to discard the snapshots.

>
>>
>> Needless to say, the said bugs were fixed and ext4 snapshots
>> will enjoy the stability of next3 and the 'fail safe' nature of the
>> solution, which was proven several times on the field.
>>
>>
>> >>
>> >> >
>> >> > There is some confusion among developers on what actually are benefits
>> >> > of ext4 snapshots in comparison to btrfs, or in comparison to the new
>> >> > dm_multisnap code. I know that you have done quite a lot of testing to
>> >> > assure that it does not actually change old ext4 behavior when snapshot
>> >> > disabled, and that it works well when enabled, but have you done any
>> >> > performance related benchmarks ? Do you have any expectations on how it
>> >> > should behave in different work loads ?
>> >> >
>> >> > It would be great to see and be able to confirm that ext4 snapshots are
>> >> > really a win, not only on the feature side, but on the performance side
>> >> > as well. I know that there are people out there still undecided or
>> >> > having a strange feeling about your snapshot work. But who can blame
>> >> > them, when we have not seen any hard data on this matter ?
>> >>
>> >> Ehm.. I did present this benchmark on LSF:
>> >> http://global.phoronix-test-suite.com/index.php?k=profile&u=amir73il-4632-11284-26560
>> >>
>> >> unless you snoozed ;-)
>> >> it shows performance vs. ext4 w/o snapshots and with snapshots
>> >> and while taking snapshots.
>> >
>> > I believe that you just missed the fact that not everyone has attended LSF
>> > and your lightning talk, but that's ok.
>>
>> That's not really OK. I should have posted the results
>> and analysis on my wiki (the results are there).
>>
>> >
>> > It seems to me that random writes are usually faster with you snapshot
>> > code regardless whether you use snapshots or not. Is that because of
>> > non snapshot related changes you've made ?
>>
>> Not that I know of.
>> I can explain why random write onesnap is faster than nosnap
>> and why 1snappermin is faster than onesnap, but I am not
>> sure about nosnap vs. plain ext4.
>>
>> >
>> > Also random reads seems to be slower with snapshots, is suspect that
>> > this is because of read through, so the reason for the slowdown that it
>> > was CPU bound ? I do not see any CPU utilization data.
>> >
>>
>> Only the 1snappermin is slower.
>> I suspect it has to do with the fs freezes, but I admin I have not
>> looked into it.
>>
>> > The postmark results seems quite odd, it is actually a lot faster with
>> > one snapshot and a lot slower with multiple snapshots, do you have an
>> > idea what is going on ?
>> >
>>
>> The name onesnap is misleading. It should have been
>> existingsnaps.
>> The important factor is whether or not snapshots are taken during the test.
>> In the 1snappermin case, postmark is the only test that exposes the
>> weak spot of ext4 snapshots performance - deletes/truncates.
>> create file+delete file with existing snapshots has no overhead (no COW).
>> create file+take snapshot+delete file has the overhead of moving the
>> deleted blocks to snapshot.
>> With regards to speed up of onesnap, postmark is randomizing the file
>> creates/write so it may be a similar effect to random write.
>> I did not investigate this.
>>
>> >> I did not compare with btrfs, but I bet there are ext4 vs. btrfs
>> >> benchmarks out there.
>> >> dm-multisnap is better than dm-snap only when it comes to overhead
>> >> per snapshot. it still copies every written block, which is far from
>> >> being the case in ext4 snapshots.
>> >
>> > Nevertheless, I still have not seen any comparison with other
>> > snapshotting possibilities we have. Note that ext4 to btrfs comparison
>> > is not enough, because we do not know what is the difference between
>> > the difference of ext4 with/without snapshots and btrfs with/without
>> > snapshots. The reason for this is that btrfs performance is very likely
>> > to scale up, but ext4 is pretty much done in that matter and I do not
>> > expect any huge performance leaps in the future.
>> >
>> > Also, rejecting dm-multisnap based on this statement is not enough, show
>> > us some numbers.
>>
>> Well, if you come to understand the difference between fs level an dm
>> level snapshots, you will see why i am rejecting dm-multisnap
>> (performance wise only!).
>
> But I do understand the difference. And also, when it comes to fs level
> snapshotting I would suspect that it would do something we can not do
> with the current solutions, for example per-file or per-directory snapshots,
> cat ext4 snapshots do that ?

Nope.

>
>>
>> Anyway #1: I have already answered this questions 2 years ago and I
>> think the answers are still valid both for LVM and btrfs:
>> http://sourceforge.net/apps/mediawiki/next3/index.php?title=FAQ#Why_use_Next3_snapshots_and_not_LVM_snapshots.3F
>
> But again, it was two years ago and even back then you have not had any
> numbers proving your statements.
>
>>
>> Anyway #2: I need to give you some numbers ;-)
>
> That would be great. Thanks!
>
>>
>> >
>> > I believe that it is not very convenient for you, because this feature
>> > support your business case and you do not necessarily want to find out
>> > that there might be a better way, especially after the work you have
>> > done already.
>>
>> Your analysis of my motives is correct :-)
>> The use of the term 'better way' I reject.
>> I think that ext4/btrfs/LVM snapshots are apples and oranges and hamburgers.
>
> But they are really not, because otherwise it would complement each
> other, but they are all trying to do the same thing, except btrfs has
> it for free.

apples and oranges don't complement each other.
they are (non-equal) alternatives.

>
>> The question of whether the world needs ext4 snapshots is
>> perfectly valid, but going back to the food analogy, I think it's
>> a case of "the proof of the pudding is in the eating".
>> I have no doubt that if ext4 snapshots are merged, many people will use it.
>
> Well, I would like to have your confidence. Why do you think so ? They
> will use it for what ? Doing backups ? We can do this easily with LVM
> without any risk of compromising existing filesystem at all. On desktop

LVM snapshots are not meant to be long lived snapshots.
As temporary snapshots they are fine, but with ext4 snapshots
you can easily retain monthly/weekly snapshots without the
need to allocate the space for it in advance and without the
'vanish' quality of LVM snapshots.

> ? I very much doubt that since you can not do per directory (or per
> file) snapshots, can you ?

No, I can't.

>
>> And I think that is a good enough (if not the best)
>> reason for inclusion.
>
> It would be of course, except you're the only one saying that.
>

I had several people approaching me that found the feature interesting
for their application. Some are developers I met on LSF, some are
users that found next3 interesting. One distro (OpenNode) has even
announced support for next3.

The incremental filesystem backup (ala ZFS send/recv) is a 'killer app'
in my opinion (and in the opinion of sys admins that use ZFS).
Ext4 snapshots enables that technology.

Amir.

2011-06-08 16:20:18

by Mike Snitzer

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Wed, Jun 8, 2011 at 11:59 AM, Amir G. <[email protected]> wrote:
> On Wed, Jun 8, 2011 at 6:38 PM, Lukas Czerner <[email protected]> wrote:
>> Amir said:

>>> The question of whether the world needs ext4 snapshots is
>>> perfectly valid, but going back to the food analogy, I think it's
>>> a case of "the proof of the pudding is in the eating".
>>> I have no doubt that if ext4 snapshots are merged, many people will use it.
>>
>> Well, I would like to have your confidence. Why do you think so ? They
>> will use it for what ? Doing backups ? We can do this easily with LVM
>> without any risk of compromising existing filesystem at all. On desktop
>
> LVM snapshots are not meant to be long lived snapshots.
> As temporary snapshots they are fine, but with ext4 snapshots
> you can easily retain monthly/weekly snapshots without the
> need to allocate the space for it in advance and without the
> 'vanish' quality of LVM snapshots.

In that old sf.net wiki you say:
Why use Next3 snapshots and not LVM snapshots?
* Performance: only small overhead to write performance with snapshots

Fair claim against current LVM snapshot (but not multisnap).

In this thread you're being very terse on the performance hit you
assert multisnap has that ext4 snapshots does not. Can you please be
more specific?

In your most recent post it seems you're focusing on "LVM snapshots"
and attributing the deficiencies of old-style LVM snapshots
(non-shared exception store causing N-way copy-out) to dm-multisnap?

Again, nobody will dispute that the existing dm-snapshot target has
poor performance that requires snapshots be short-lived. But
multisnap does _not_ suffer from those performance problems.

Mike

2011-06-09 01:59:56

by Yongqiang Yang

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

> But I do understand the difference. And also, when it comes to fs level
> snapshotting I would suspect that it would do something we can not do
> with the current solutions, for example per-file or per-directory snapshots,
> cat ext4 snapshots do that ?
Hi Lukas,

I noticed that there is no answer to this question in the thread. I
can give the question the answer that ext4 can snapshot per-file or
per-directory, and can exclude some files or directories from being
snapshotted.

--
Best Wishes
Yongqiang Yang

2011-06-09 03:18:15

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, Jun 9, 2011 at 4:59 AM, Yongqiang Yang <[email protected]> wrote:
>> But I do understand the difference. And also, when it comes to fs level
>> snapshotting I would suspect that it would do something we can not do
>> with the current solutions, for example per-file or per-directory snapshots,
>> cat ext4 snapshots do that ?
> Hi Lukas,
>
> I noticed that there is no answer to this question in the thread. ?I

I think I answered this question with No it can't ;-)

> can give the question the answer that ext4 can snapshot per-file or
> per-directory, and can exclude some files or directories from being
> snapshotted.
>

So the full answer is that ext4 snapshot CAN exclude
certain files/dirs from snapshot, but this feature is not fully implemented yet
(I have it in a dev branch)

> --
> Best Wishes
> Yongqiang Yang
>

2011-06-09 03:51:11

by Yongqiang Yang

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, Jun 9, 2011 at 11:18 AM, Amir G. <[email protected]> wrote:
> On Thu, Jun 9, 2011 at 4:59 AM, Yongqiang Yang <[email protected]> wrote:
>>> But I do understand the difference. And also, when it comes to fs level
>>> snapshotting I would suspect that it would do something we can not do
>>> with the current solutions, for example per-file or per-directory snapshots,
>>> cat ext4 snapshots do that ?
>> Hi Lukas,
>>
>> I noticed that there is no answer to this question in the thread. ?I
>
> I think I answered this question with No it can't ;-)
I think this can be implemented easily by chattr and adding check in
should_snapshot() or should_move_data().

And I thought Lukas are focusing on if ext4-snapshots can do this
easily. So i said YES:-)

>
>> can give the question the answer that ext4 can snapshot per-file or
>> per-directory, and can exclude some files or directories from being
>> snapshotted.
>>
>
> So the full answer is that ext4 snapshot CAN exclude
> certain files/dirs from snapshot, but this feature is not fully implemented yet
> (I have it in a dev branch)
>
>> --
>> Best Wishes
>> Yongqiang Yang
>>
>

--
Best Wishes
Yongqiang Yang

2011-06-09 06:50:52

by Lukas Czerner

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, 9 Jun 2011, Yongqiang Yang wrote:

> On Thu, Jun 9, 2011 at 11:18 AM, Amir G. <[email protected]> wrote:
> > On Thu, Jun 9, 2011 at 4:59 AM, Yongqiang Yang <[email protected]> wrote:
> >>> But I do understand the difference. And also, when it comes to fs level
> >>> snapshotting I would suspect that it would do something we can not do
> >>> with the current solutions, for example per-file or per-directory snapshots,
> >>> cat ext4 snapshots do that ?
> >> Hi Lukas,
> >>
> >> I noticed that there is no answer to this question in the thread. ?I
> >
> > I think I answered this question with No it can't ;-)
> I think this can be implemented easily by chattr and adding check in
> should_snapshot() or should_move_data().
>
> And I thought Lukas are focusing on if ext4-snapshots can do this
> easily. So i said YES:-)

Cool, finally something interesting :). So, how it'll work ? Does that
require any format changes again:) ? Can you exclude the whole root and
then selectively pick the directories or files you are interested in ?

How does rollback work with ext4 snapshots ? Can you selectively roll
back one file, or the whole directory subtree even when you're
snapshotting more ?

You see, when it comes to the full fs snapshots I am not convinced that
it is *very* useful, yes it might have some users, but you can alway
take the safe way and do lvm snapshots (or better use the new multisnap)
for backup, without need to modify stable filesystem code.

Also, I do not buy the whole argument of "not have to create separate disk
space for snapshot". It is actually better for sysadmins, because you
have perfect control on what is going on, how much space is used for
your snapshots and how much is used by your data. You can always easily
extend the snapshot volume, or let it die silently when it is too old
and too big.

How does it actually work on ext4 snapshots ? When you're going to
rewrite a file, you will never know how much disk space it'll take in
advance, am I right ? Is the filesystem accounting for the snapshot size
as well ? or is it hidden ?

Thanks!
-Lukas

>
> >
> >> can give the question the answer that ext4 can snapshot per-file or
> >> per-directory, and can exclude some files or directories from being
> >> snapshotted.
> >>
> >
> > So the full answer is that ext4 snapshot CAN exclude
> > certain files/dirs from snapshot, but this feature is not fully implemented yet
> > (I have it in a dev branch)
> >
> >> --
> >> Best Wishes
> >> Yongqiang Yang
> >>
> >
>
>
>
>

2011-06-09 07:57:16

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, Jun 9, 2011 at 9:50 AM, Lukas Czerner <[email protected]> wrote:
> On Thu, 9 Jun 2011, Yongqiang Yang wrote:
>
>> On Thu, Jun 9, 2011 at 11:18 AM, Amir G. <[email protected]> wrote:
>> > On Thu, Jun 9, 2011 at 4:59 AM, Yongqiang Yang <[email protected]> wrote:
>> >>> But I do understand the difference. And also, when it comes to fs level
>> >>> snapshotting I would suspect that it would do something we can not do
>> >>> with the current solutions, for example per-file or per-directory snapshots,
>> >>> cat ext4 snapshots do that ?
>> >> Hi Lukas,
>> >>
>> >> I noticed that there is no answer to this question in the thread. ?I
>> >
>> > I think I answered this question with No it can't ;-)
>> I think this can be implemented easily by chattr and adding check in
>> should_snapshot() or should_move_data().
>>
>> And I thought Lukas are focusing on if ext4-snapshots can do this
>> easily. ?So i said YES:-)
>
> Cool, finally something interesting :). So, how it'll work ? Does that
> require any format changes again:) ? Can you exclude the whole root and
> then selectively pick the directories or files you are interested in ?

The design is actually very simple and not as powerful as you
probably desire.
I hate to get into the design of future features, when we haven't
even ACKed the current feature yet, but since you're the only one
did any review, I owe you that much ;-)

To exclude a file from snapshot it needs to have the NOCOW_FL flag.
Ironically, btrfs have already added that flag in parallel to me (for the
same purpose) so the flag it is already reserved in the code :-)

To avoid some transition issues and keep it really simple,
I disallow changing the NOCOW_FL
for regular file and only allow to change it for directories.
The NOCOW_FL is inherited from the parent directory,
so setting/clearing the flag on a directory means:
"All files/subdirs will be created excluded/not-excluded from now on".

Inside the snapshot image, excluded directories, which are not really
excluded, show normally, but excluded files are shown with zero length,
because making the files disappear is hard, but their blocks may have already
been reused, so we cannot allow access to them.

>
> How does rollback work with ext4 snapshots ? Can you selectively roll
> back one file, or the whole directory subtree even when you're
> snapshotting more ?

So there is actually no inherent "rollback" feature, not for a file/dir
and not for the entire fs.
It's a drawback of ext4 snapshots, but hey, cp/rsync from snapshot
still works for file/dir ;-)
As for full "fs" rollback. A revert tool has been developed (by students),
which requires an external storage to export the "revert patch".
This tool is going to be enhanced to use LVM snapshot storage
and LVM --merge option to implement ext4 "revert to snapshot" with Yum.

>
> You see, when it comes to the full fs snapshots I am not convinced that
> it is *very* useful, yes it might have some users, but you can alway
> take the safe way and do lvm snapshots (or better use the new multisnap)
> for backup, without need to modify stable filesystem code.
>

You think like a developer. Try talking to some sys admins.
Especially ones that worked with Solaris/ZFS or NetApp.
See what they think about snapshots and about the LVM alternative...
Snapshots have addictive qualities. Ones you've used them, you can't
go back to not having them.
Imagine how people used to live before the 'Undo' button and imagine
that your employer forced you to use an editor without an Undo button.
This is the kind of feedback I got from sys admins that moved from Solaris
to Linux.

> Also, I do not buy the whole argument of "not have to create separate disk
> space for snapshot". It is actually better for sysadmins, because you
> have perfect control on what is going on, how much space is used for
> your snapshots and how much is used by your data. You can always easily
> extend the snapshot volume, or let it die silently when it is too old
> and too big.
>

Seriously, Lukas, talk to sys admins.
Letting the snapshot die silently is the worst possible thing that a snapshots
implementation can do (for long lived snapshots).

> How does it actually work on ext4 snapshots ? When you're going to
> rewrite a file, you will never know how much disk space it'll take in
> advance, am I right ? Is the filesystem accounting for the snapshot size
> as well ? or is it hidden ?

It's not hidden, it's accounted for as a regular file (usually owned by root).
You need a bit of scripting to gather the disk space used by snapshots (du).

In ANY snapshots implementation, you can get ENOSPC on operations,
which traditionally could not produce this error.
This statement is also true for thin provisioning implementations.
The question is how the implementation handles these situations.

What I came to realize on LSF, is that my implementation is the only
one (of LVM and btrfs) that tries to deal with the ENOSPC issue and
does a good job most of the time.

I deal with it by reserving space for metadata COW on snapshot
take, so if a future ENOSPC during metadata COW is possible,
snapshot take will fail with ENOSPC.

As for ENOSPC during regular file rewrite, that's not such a big problem.
The application simply gets ENOSPC as if the file was sparse to begin
with. It may not be pleasant if the application have fallocated the space
and used mmap/close without msync...
The only way I see around this issue is reserving space on mmap time
(and returning ENOSPC at that time), but again, this issue is shared
with btrfs, but is easier to fix (I think) with ext4 snapshots.

>
> Thanks!
> -Lukas
>
>>
>> >
>> >> can give the question the answer that ext4 can snapshot per-file or
>> >> per-directory, and can exclude some files or directories from being
>> >> snapshotted.
>> >>
>> >
>> > So the full answer is that ext4 snapshot CAN exclude
>> > certain files/dirs from snapshot, but this feature is not fully implemented yet
>> > (I have it in a dev branch)
>> >
>> >> --
>> >> Best Wishes
>> >> Yongqiang Yang
>> >>
>> >
>>
>>
>>
>>

2011-06-09 08:13:37

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, 9 Jun 2011, Amir G. wrote:

> On Thu, Jun 9, 2011 at 9:50 AM, Lukas Czerner <[email protected]> wrote:
>> On Thu, 9 Jun 2011, Yongqiang Yang wrote:
>>
>>> On Thu, Jun 9, 2011 at 11:18 AM, Amir G. <[email protected]> wrote:
>>>> On Thu, Jun 9, 2011 at 4:59 AM, Yongqiang Yang <[email protected]> wrote:
>>
>> You see, when it comes to the full fs snapshots I am not convinced that
>> it is *very* useful, yes it might have some users, but you can alway
>> take the safe way and do lvm snapshots (or better use the new multisnap)
>> for backup, without need to modify stable filesystem code.
>>
>
> You think like a developer. Try talking to some sys admins.
> Especially ones that worked with Solaris/ZFS or NetApp.
> See what they think about snapshots and about the LVM alternative...
> Snapshots have addictive qualities. Ones you've used them, you can't
> go back to not having them.
> Imagine how people used to live before the 'Undo' button and imagine
> that your employer forced you to use an editor without an Undo button.
> This is the kind of feedback I got from sys admins that moved from Solaris
> to Linux.

as a sysadmin, it's a _wonderful_ tool to have for any system that has
people editing/saving files on directly.

>
>> Also, I do not buy the whole argument of "not have to create separate disk
>> space for snapshot". It is actually better for sysadmins, because you
>> have perfect control on what is going on, how much space is used for
>> your snapshots and how much is used by your data. You can always easily
>> extend the snapshot volume, or let it die silently when it is too old
>> and too big.
>>
>
> Seriously, Lukas, talk to sys admins.
> Letting the snapshot die silently is the worst possible thing that a snapshots
> implementation can do (for long lived snapshots).

that depends on the site policy.

sometimes it is better to loose snapshots than to run out of disk space
and halt the system, sometimes you would rather halt the system.

the policy of what happens when you run out of space should not be a
kernel decision, the desired behavior varies far too much.

this includes being able to say things like "I want to always have 10% of
my disk allocated to snapshots, but if there's more free space, go ahead
and use it, but always keep at least 10% of the disk free so that you
don't have to halt new writes while you clear space"

or

"if you run out of space, try and keep the oldest snapshot and the newest
snapshot, delete other snapshots in between before touching either of
these"

>> How does it actually work on ext4 snapshots ? When you're going to
>> rewrite a file, you will never know how much disk space it'll take in
>> advance, am I right ? Is the filesystem accounting for the snapshot size
>> as well ? or is it hidden ?
>
> It's not hidden, it's accounted for as a regular file (usually owned by root).
> You need a bit of scripting to gather the disk space used by snapshots (du).

the worst case when you re-write a file is that it will take the full
amount of space that the file currently takes (as if you wrote a new copy
of the file and some process had a filehandle open on the old copy,
preventing the space from being reclaimed, so it's far from being a new
problem)

see the note above about the need to be able to remove snapshots when you
are out of space.

since snapshots tend to be small compared to the filesystems they protect
(not in all cases, but if you are covering the entire system with one
snapshot that would be the way to bet), having the ability to put the
snapshot metadata off on a smaller/faster disk would be helpful.

having the ability to snapshot just specific files/directories would be a
killer feature IMHO

David Lang

2011-06-09 08:46:42

by Lukas Czerner

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, 9 Jun 2011, Amir G. wrote:

> On Thu, Jun 9, 2011 at 9:50 AM, Lukas Czerner <[email protected]> wrote:
> > On Thu, 9 Jun 2011, Yongqiang Yang wrote:
> >
> >> On Thu, Jun 9, 2011 at 11:18 AM, Amir G. <[email protected]> wrote:
> >> > On Thu, Jun 9, 2011 at 4:59 AM, Yongqiang Yang <[email protected]> wrote:
> >> >>> But I do understand the difference. And also, when it comes to fs level
> >> >>> snapshotting I would suspect that it would do something we can not do
> >> >>> with the current solutions, for example per-file or per-directory snapshots,
> >> >>> cat ext4 snapshots do that ?
> >> >> Hi Lukas,
> >> >>
> >> >> I noticed that there is no answer to this question in the thread. ?I
> >> >
> >> > I think I answered this question with No it can't ;-)
> >> I think this can be implemented easily by chattr and adding check in
> >> should_snapshot() or should_move_data().
> >>
> >> And I thought Lukas are focusing on if ext4-snapshots can do this
> >> easily. ?So i said YES:-)
> >
> > Cool, finally something interesting :). So, how it'll work ? Does that
> > require any format changes again:) ? Can you exclude the whole root and
> > then selectively pick the directories or files you are interested in ?
>
> The design is actually very simple and not as powerful as you
> probably desire.
> I hate to get into the design of future features, when we haven't
> even ACKed the current feature yet, but since you're the only one
> did any review, I owe you that much ;-)

Thanks Amir!

You have to understand that I am still not convinced that ext4 snapshot
in its current state is really what we want to have in ext4. Especially
given the very basic features it provides, without any knowledge on how
it can be extended (but you're slowly providing that information, so
thanks for that). And especially facing the new dm-multisnap, I really
wonder if it is worth it.

If we want filesystem level snapshotting we can try to do it right with
all the benefits that snapshots on that level brings. But what I see
now, is not even remotely the case. And I have the feeling that all the
features that might be interesting for snapshotting at file system
level, are just a hack and not inherent from the design. But that is
probably because your goal was to snapshot the whole filesystem for the
backup purposes, but that's not what I would expect from fs level
snapshots. I really hope you understand my point.

>
> To exclude a file from snapshot it needs to have the NOCOW_FL flag.
> Ironically, btrfs have already added that flag in parallel to me (for the
> same purpose) so the flag it is already reserved in the code :-)
>
> To avoid some transition issues and keep it really simple,
> I disallow changing the NOCOW_FL
> for regular file and only allow to change it for directories.
> The NOCOW_FL is inherited from the parent directory,
> so setting/clearing the flag on a directory means:
> "All files/subdirs will be created excluded/not-excluded from now on".
>
> Inside the snapshot image, excluded directories, which are not really
> excluded, show normally, but excluded files are shown with zero length,
> because making the files disappear is hard, but their blocks may have already
> been reused, so we cannot allow access to them.
>
> >
> > How does rollback work with ext4 snapshots ? Can you selectively roll
> > back one file, or the whole directory subtree even when you're
> > snapshotting more ?
>
> So there is actually no inherent "rollback" feature, not for a file/dir
> and not for the entire fs.
> It's a drawback of ext4 snapshots, but hey, cp/rsync from snapshot
> still works for file/dir ;-)
> As for full "fs" rollback. A revert tool has been developed (by students),
> which requires an external storage to export the "revert patch".
> This tool is going to be enhanced to use LVM snapshot storage
> and LVM --merge option to implement ext4 "revert to snapshot" with Yum.

And that is the problem. Because at this level you should be able to do
it without very much trouble, because being at file system level you
should have all the information. Do not get me wrong, I am not saying
that this is easy, but is should be "from design". Exporting the
"revert patch" to the external storage, or exporting snapshot to LVM
format to be able to merge it...that is all just hacks, because the
design itself does not count with that possibility.

>
> >
> > You see, when it comes to the full fs snapshots I am not convinced that
> > it is *very* useful, yes it might have some users, but you can alway
> > take the safe way and do lvm snapshots (or better use the new multisnap)
> > for backup, without need to modify stable filesystem code.
> >
>
> You think like a developer. Try talking to some sys admins.
> Especially ones that worked with Solaris/ZFS or NetApp.
> See what they think about snapshots and about the LVM alternative...
> Snapshots have addictive qualities. Ones you've used them, you can't
> go back to not having them.
> Imagine how people used to live before the 'Undo' button and imagine
> that your employer forced you to use an editor without an Undo button.
> This is the kind of feedback I got from sys admins that moved from Solaris
> to Linux.

Exactly, so if we want fs level snapshots, it should use that
privilege no hack its way to do things like roll back, or
excludes+includes. Ext4 was not meant to work that way, nor was your
snapshots designed to work that way. If we are considering backups only,
because that is what you ext4 snaphosts can provide now, I would prefer
to use LVM. But yes, we all need to know how the new multisnap works
out.

>
>
> > Also, I do not buy the whole argument of "not have to create separate disk
> > space for snapshot". It is actually better for sysadmins, because you
> > have perfect control on what is going on, how much space is used for
> > your snapshots and how much is used by your data. You can always easily
> > extend the snapshot volume, or let it die silently when it is too old
> > and too big.
> >
>
> Seriously, Lukas, talk to sys admins.
> Letting the snapshot die silently is the worst possible thing that a snapshots
> implementation can do (for long lived snapshots).

Oh, no you misunderstood. Even with your snapshots you'll have to delete
old snapshots someday, because otherwise you'll run out of space. With
LVM however, you have prereserved space for it, so even if your snapshot
volume gets full, it does not affect your filesystem what so ever. And,
as a administrator, you can decide whether to extend the snapshot volume
to let it live longer, or just let it be and it will die eventually.

And, as far as I know, the new multisnap will notify the admin when the
snapshot volume approaches the watermark the same way that for example
thinly provisioned storage would do. But again, with your snapshots it
will give you ENOSPC when the snapshot grow too big, and at the end
of the day, you need to create data to be able to backup it:), so having
snapshots separate from your fs volume makes sense.

>
>
> > How does it actually work on ext4 snapshots ? When you're going to
> > rewrite a file, you will never know how much disk space it'll take in
> > advance, am I right ? Is the filesystem accounting for the snapshot size
> > as well ? or is it hidden ?
>
> It's not hidden, it's accounted for as a regular file (usually owned by root).
> You need a bit of scripting to gather the disk space used by snapshots (du).
>
> In ANY snapshots implementation, you can get ENOSPC on operations,
> which traditionally could not produce this error.
> This statement is also true for thin provisioning implementations.
> The question is how the implementation handles these situations.
>
> What I came to realize on LSF, is that my implementation is the only
> one (of LVM and btrfs) that tries to deal with the ENOSPC issue and
> does a good job most of the time.
>
> I deal with it by reserving space for metadata COW on snapshot
> take, so if a future ENOSPC during metadata COW is possible,
> snapshot take will fail with ENOSPC.
>
> As for ENOSPC during regular file rewrite, that's not such a big problem.
> The application simply gets ENOSPC as if the file was sparse to begin
> with. It may not be pleasant if the application have fallocated the space
> and used mmap/close without msync...
> The only way I see around this issue is reserving space on mmap time
> (and returning ENOSPC at that time), but again, this issue is shared
> with btrfs, but is easier to fix (I think) with ext4 snapshots.

Yes, I do understand that ext4 snaphosts are doing well in that aspect,
but as I said, having snapshots separate from your file system gives
you advantage of not running into ENOSPC on your file system until you
really fill it with data.

Granted, I have to take a look at the multisnap code, to see what it can
do and compare it with ext4 snapshots, because really, if it is good
enough and you will be able to do snapshotting backups as you do with
your approach, I do not see the reason why to complicate our life in
ext4.

Thanks!
-Lukas

2011-06-09 10:06:41

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, Jun 9, 2011 at 11:13 AM, <[email protected]> wrote:
> On Thu, 9 Jun 2011, Amir G. wrote:
>
>> On Thu, Jun 9, 2011 at 9:50 AM, Lukas Czerner <[email protected]> wrote:
>>>
>>> On Thu, 9 Jun 2011, Yongqiang Yang wrote:
>>>
>>>> On Thu, Jun 9, 2011 at 11:18 AM, Amir G.
>>>> <[email protected]> wrote:
>>>>>
>>>>> On Thu, Jun 9, 2011 at 4:59 AM, Yongqiang Yang <[email protected]>
>>>>> wrote:
>>>
>>> You see, when it comes to the full fs snapshots I am not convinced that
>>> it is *very* useful, yes it might have some users, but you can alway
>>> take the safe way and do lvm snapshots (or better use the new multisnap)
>>> for backup, without need to modify stable filesystem code.
>>>
>>
>> You think like a developer. Try talking to some sys admins.
>> Especially ones that worked with Solaris/ZFS or NetApp.
>> See what they think about snapshots and about the LVM alternative...
>> Snapshots have addictive qualities. Ones you've used them, you can't
>> go back to not having them.
>> Imagine how people used to live before the 'Undo' button and imagine
>> that your employer forced you to use an editor without an Undo button.
>> This is the kind of feedback I got from sys admins that moved from Solaris
>> to Linux.
>
> as a sysadmin, it's a _wonderful_ tool to have for any system that has
> people editing/saving files on directly.

Thank you david. Finally some positive feedback from the people
for whom the feature is intended for :-)

>
>>
>>> Also, I do not buy the whole argument of "not have to create separate
>>> disk
>>> space for snapshot". It is actually better for sysadmins, because you
>>> have perfect control on what is going on, how much space is used for
>>> your snapshots and how much is used by your data. You can always easily
>>> extend the snapshot volume, or let it die silently when it is too old
>>> and too big.
>>>
>>
>> Seriously, Lukas, talk to sys admins.
>> Letting the snapshot die silently is the worst possible thing that a
>> snapshots
>> implementation can do (for long lived snapshots).
>
> that depends on the site policy.
>
> sometimes it is better to loose snapshots than to run out of disk space and
> halt the system, sometimes you would rather halt the system.
>
> the policy of what happens when you run out of space should not be a kernel
> decision, the desired behavior varies far too much.
>
> this includes being able to say things like "I want to always have 10% of my
> disk allocated to snapshots, but if there's more free space, go ahead and
> use it, but always keep at least 10% of the disk free so that you don't have
> to halt new writes while you clear space"
>
> or
>
> "if you run out of space, try and keep the oldest snapshot and the newest
> snapshot, delete other snapshots in between before touching either of these"
>

I fully agree.
AFAIK, there is no user space tool to manage snapshots to this level for Linux.
The only snapshot manager I know about is snapper:
http://en.opensuse.org/Portal:Snapper, which we are working on adding
ext4 snapshots support to.
Snapper does not have the free space based policy to the best of my knowledge,
but it could be improved to monitor free disk space.

A tool like that does not need any further kernel changes from
ext4 and btrfs to implement the policies suggested above.
However, with LVM snapshots, some of these policies (like use whatever space you
have free in the filesystem) are simply not possible.

>>> How does it actually work on ext4 snapshots ? When you're going to
>>> rewrite a file, you will never know how much disk space it'll take in
>>> advance, am I right ? Is the filesystem accounting for the snapshot size
>>> as well ? or is it hidden ?
>>
>> It's not hidden, it's accounted for as a regular file (usually owned by
>> root).
>> You need a bit of scripting to gather the disk space used by snapshots
>> (du).
>
> the worst case when you re-write a file is that it will take the full amount
> of space that the file currently takes (as if you wrote a new copy of the
> file and some process had a filehandle open on the old copy, preventing the
> space from being reclaimed, so it's far from being a new problem)

No. it's a new problem.
When you have a large db, which does random writes to an exiting db file,
it does not expect ENOSPC, when updating an existing record or index.
Only by keeping enough free disk space in the system at all times, can you
avoid this kind of problems.

>
> see the note above about the need to be able to remove snapshots when you
> are out of space.
>
> since snapshots tend to be small compared to the filesystems they protect
> (not in all cases, but if you are covering the entire system with one
> snapshot that would be the way to bet), having the ability to put the
> snapshot metadata off on a smaller/faster disk would be helpful.

Helpful for which workload?
For reading from snapshots? Yes, that would be faster.
For writing to the filesystem? I demonstrated that the performance
overhead is near zero.

>
> having the ability to snapshot just specific files/directories would be a
> killer feature IMHO

I agree to that, but I don't think the ext4 will be able to provide
that to the full extent.

>
> David Lang
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

2011-06-09 10:17:42

by Lukas Czerner

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, 9 Jun 2011, Amir G. wrote:

> On Thu, Jun 9, 2011 at 11:13 AM, <[email protected]> wrote:
> > On Thu, 9 Jun 2011, Amir G. wrote:
> >
> >> On Thu, Jun 9, 2011 at 9:50 AM, Lukas Czerner <[email protected]> wrote:
> >>>
> >>> On Thu, 9 Jun 2011, Yongqiang Yang wrote:
> >>>
> >>>> On Thu, Jun 9, 2011 at 11:18 AM, Amir G.
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> On Thu, Jun 9, 2011 at 4:59 AM, Yongqiang Yang <[email protected]>
> >>>>> wrote:
> >>>
> >>> You see, when it comes to the full fs snapshots I am not convinced that
> >>> it is *very* useful, yes it might have some users, but you can alway
> >>> take the safe way and do lvm snapshots (or better use the new multisnap)
> >>> for backup, without need to modify stable filesystem code.
> >>>
> >>
> >> You think like a developer. Try talking to some sys admins.
> >> Especially ones that worked with Solaris/ZFS or NetApp.
> >> See what they think about snapshots and about the LVM alternative...
> >> Snapshots have addictive qualities. Ones you've used them, you can't
> >> go back to not having them.
> >> Imagine how people used to live before the 'Undo' button and imagine
> >> that your employer forced you to use an editor without an Undo button.
> >> This is the kind of feedback I got from sys admins that moved from Solaris
> >> to Linux.
> >
> > as a sysadmin, it's a _wonderful_ tool to have for any system that has
> > people editing/saving files on directly.
>
> Thank you david. Finally some positive feedback from the people
> for whom the feature is intended for :-)

No one is arguing about the advantages of snapshots. I think that we are
all clear on this. Snapshots are useful.

>
> >
> >>
> >>> Also, I do not buy the whole argument of "not have to create separate
> >>> disk
> >>> space for snapshot". It is actually better for sysadmins, because you
> >>> have perfect control on what is going on, how much space is used for
> >>> your snapshots and how much is used by your data. You can always easily
> >>> extend the snapshot volume, or let it die silently when it is too old
> >>> and too big.
> >>>
> >>
> >> Seriously, Lukas, talk to sys admins.
> >> Letting the snapshot die silently is the worst possible thing that a
> >> snapshots
> >> implementation can do (for long lived snapshots).
> >
> > that depends on the site policy.
> >
> > sometimes it is better to loose snapshots than to run out of disk space and
> > halt the system, sometimes you would rather halt the system.
> >
> > the policy of what happens when you run out of space should not be a kernel
> > decision, the desired behavior varies far too much.
> >
> > this includes being able to say things like "I want to always have 10% of my
> > disk allocated to snapshots, but if there's more free space, go ahead and
> > use it, but always keep at least 10% of the disk free so that you don't have
> > to halt new writes while you clear space"
> >
> > or
> >
> > "if you run out of space, try and keep the oldest snapshot and the newest
> > snapshot, delete other snapshots in between before touching either of these"
> >
>
> I fully agree.
> AFAIK, there is no user space tool to manage snapshots to this level for Linux.
> The only snapshot manager I know about is snapper:
> http://en.opensuse.org/Portal:Snapper, which we are working on adding
> ext4 snapshots support to.
> Snapper does not have the free space based policy to the best of my knowledge,
> but it could be improved to monitor free disk space.
>
> A tool like that does not need any further kernel changes from
> ext4 and btrfs to implement the policies suggested above.
> However, with LVM snapshots, some of these policies (like use whatever space you
> have free in the filesystem) are simply not possible.

And why is that ? With LVM you can shrink or extent volumes at will, I
do not think this is a problem at all, moreover, you can always add more
drives to resize your existing volumes to.

>
>
> >>> How does it actually work on ext4 snapshots ? When you're going to
> >>> rewrite a file, you will never know how much disk space it'll take in
> >>> advance, am I right ? Is the filesystem accounting for the snapshot size
> >>> as well ? or is it hidden ?
> >>
> >> It's not hidden, it's accounted for as a regular file (usually owned by
> >> root).
> >> You need a bit of scripting to gather the disk space used by snapshots
> >> (du).
> >
> > the worst case when you re-write a file is that it will take the full amount
> > of space that the file currently takes (as if you wrote a new copy of the
> > file and some process had a filehandle open on the old copy, preventing the
> > space from being reclaimed, so it's far from being a new problem)
>
> No. it's a new problem.
> When you have a large db, which does random writes to an exiting db file,
> it does not expect ENOSPC, when updating an existing record or index.
> Only by keeping enough free disk space in the system at all times, can you
> avoid this kind of problems.

You can very well avoid this kind of problems when you separate
filesystem and snapshots, that is what LVM can do easily.

>
> >
> > see the note above about the need to be able to remove snapshots when you
> > are out of space.
> >
> > since snapshots tend to be small compared to the filesystems they protect
> > (not in all cases, but if you are covering the entire system with one
> > snapshot that would be the way to bet), having the ability to put the
> > snapshot metadata off on a smaller/faster disk would be helpful.

Very easy to do with dm-multisnap.

>
> Helpful for which workload?
> For reading from snapshots? Yes, that would be faster.
> For writing to the filesystem? I demonstrated that the performance
> overhead is near zero.
>
> >
> > having the ability to snapshot just specific files/directories would be a
> > killer feature IMHO
>
> I agree to that, but I don't think the ext4 will be able to provide
> that to the full extent.

And that is for being fs level snapshots a huge drawback.

>
> >
> > David Lang
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at ?http://vger.kernel.org/majordomo-info.html
> >
>

2011-06-09 10:54:17

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, Jun 9, 2011 at 11:46 AM, Lukas Czerner <[email protected]> wrote:
> On Thu, 9 Jun 2011, Amir G. wrote:
>
>> On Thu, Jun 9, 2011 at 9:50 AM, Lukas Czerner <[email protected]> wrote:
>> > On Thu, 9 Jun 2011, Yongqiang Yang wrote:
>> >
>> >> On Thu, Jun 9, 2011 at 11:18 AM, Amir G. <[email protected]> wrote:
>> >> > On Thu, Jun 9, 2011 at 4:59 AM, Yongqiang Yang <[email protected]> wrote:
>> >> >>> But I do understand the difference. And also, when it comes to fs level
>> >> >>> snapshotting I would suspect that it would do something we can not do
>> >> >>> with the current solutions, for example per-file or per-directory snapshots,
>> >> >>> cat ext4 snapshots do that ?
>> >> >> Hi Lukas,
>> >> >>
>> >> >> I noticed that there is no answer to this question in the thread. ?I
>> >> >
>> >> > I think I answered this question with No it can't ;-)
>> >> I think this can be implemented easily by chattr and adding check in
>> >> should_snapshot() or should_move_data().
>> >>
>> >> And I thought Lukas are focusing on if ext4-snapshots can do this
>> >> easily. ?So i said YES:-)
>> >
>> > Cool, finally something interesting :). So, how it'll work ? Does that
>> > require any format changes again:) ? Can you exclude the whole root and
>> > then selectively pick the directories or files you are interested in ?
>>
>> The design is actually very simple and not as powerful as you
>> probably desire.
>> I hate to get into the design of future features, when we haven't
>> even ACKed the current feature yet, but since you're the only one
>> did any review, I owe you that much ;-)
>
> Thanks Amir!
>
> You have to understand that I am still not convinced that ext4 snapshot
> in its current state is really what we want to have in ext4. Especially
> given the very basic features it provides, without any knowledge on how
> it can be extended (but you're slowly providing that information, so
> thanks for that). And especially facing the new dm-multisnap, I really
> wonder if it is worth it.

Did you not see my post on LVM vs. Ext4 snapshots?
https://lkml.org/lkml/2011/6/8/296
dm-multisnap is much better than dm-snap, but it's not perfect.
And ext4 snapshots aren't perfect either, but they do bring some
new interesting options for sys admins.

>
> If we want filesystem level snapshotting we can try to do it right with
> all the benefits that snapshots on that level brings. But what I see
> now, is not even remotely the case. And I have the feeling that all the
> features that might be interesting for snapshotting at file system
> level, are just a hack and not inherent from the design. But that is
> probably because your goal was to snapshot the whole filesystem for the
> backup purposes, but that's not what I would expect from fs level
> snapshots. I really hope you understand my point.
>

I think I understand the point. The reason that ext4 snapshots are
less powerful then, say, btrfs snapshots, is not because of my design,
it is because I was building on top a 20 year old on-disk format (ext2), which
was extended 2 times already, but remained mostly backwards compatible.
There is only so much you can do without block reference counts and this
is all that I was trying to do.

>>
>> To exclude a file from snapshot it needs to have the NOCOW_FL flag.
>> Ironically, btrfs have already added that flag in parallel to me (for the
>> same purpose) so the flag it is already reserved in the code :-)
>>
>> To avoid some transition issues and keep it really simple,
>> I disallow changing the NOCOW_FL
>> for regular file and only allow to change it for directories.
>> The NOCOW_FL is inherited from the parent directory,
>> so setting/clearing the flag on a directory means:
>> "All files/subdirs will be created excluded/not-excluded from now on".
>>
>> Inside the snapshot image, excluded directories, which are not really
>> excluded, show normally, but excluded files are shown with zero length,
>> because making the files disappear is hard, but their blocks may have already
>> been reused, so we cannot allow access to them.
>>
>> >
>> > How does rollback work with ext4 snapshots ? Can you selectively roll
>> > back one file, or the whole directory subtree even when you're
>> > snapshotting more ?
>>
>> So there is actually no inherent "rollback" feature, not for a file/dir
>> and not for the entire fs.
>> It's a drawback of ext4 snapshots, but hey, cp/rsync from snapshot
>> still works for file/dir ;-)
>> As for full "fs" rollback. A revert tool has been developed (by students),
>> which requires an external storage to export the "revert patch".
>> This tool is going to be enhanced to use LVM snapshot storage
>> and LVM --merge option to implement ext4 "revert to snapshot" with Yum.
>
> And that is the problem. Because at this level you should be able to do
> it without very much trouble, because being at file system level you
> should have all the information. Do not get me wrong, I am not saying
> that this is easy, but is should be "from design". Exporting the
> "revert patch" to the external storage, or exporting snapshot to LVM
> format to be able to merge it...that is all just hacks, because the
> design itself does not count with that possibility.
>

The design makes a conscious choice to keep snapshots *inside*
the filesystem.
This is both an advantage (no need to change on-disk format and checking tools)
and disadvantage (you cannot mount a snapshot without mounting the fs first).

>>
>> >
>> > You see, when it comes to the full fs snapshots I am not convinced that
>> > it is *very* useful, yes it might have some users, but you can alway
>> > take the safe way and do lvm snapshots (or better use the new multisnap)
>> > for backup, without need to modify stable filesystem code.
>> >
>>
>> You think like a developer. Try talking to some sys admins.
>> Especially ones that worked with Solaris/ZFS or NetApp.
>> See what they think about snapshots and about the LVM alternative...
>> Snapshots have addictive qualities. Ones you've used them, you can't
>> go back to not having them.
>> Imagine how people used to live before the 'Undo' button and imagine
>> that your employer forced you to use an editor without an Undo button.
>> This is the kind of feedback I got from sys admins that moved from Solaris
>> to Linux.
>
> Exactly, so if we want fs level snapshots, it should use that
> privilege no hack its way to do things like roll back, or
> excludes+includes. Ext4 was not meant to work that way, nor was your
> snapshots designed to work that way. If we are considering backups only,
> because that is what you ext4 snaphosts can provide now, I would prefer
> to use LVM. But yes, we all need to know how the new multisnap works
> out.
>

Why do you keep saying 'backup only'?
There is a huge difference between having long lived snapshots,
like CTERA products have, and temporary snapshot for backup
purpose (for which LVM is adequate).

>>
>>
>> > Also, I do not buy the whole argument of "not have to create separate disk
>> > space for snapshot". It is actually better for sysadmins, because you
>> > have perfect control on what is going on, how much space is used for
>> > your snapshots and how much is used by your data. You can always easily
>> > extend the snapshot volume, or let it die silently when it is too old
>> > and too big.
>> >
>>
>> Seriously, Lukas, talk to sys admins.
>> Letting the snapshot die silently is the worst possible thing that a snapshots
>> implementation can do (for long lived snapshots).
>
> Oh, no you misunderstood. Even with your snapshots you'll have to delete
> old snapshots someday, because otherwise you'll run out of space. With
> LVM however, you have prereserved space for it, so even if your snapshot
> volume gets full, it does not affect your filesystem what so ever. And,
> as a administrator, you can decide whether to extend the snapshot volume
> to let it live longer, or just let it be and it will die eventually.
>
> And, as far as I know, the new multisnap will notify the admin when the
> snapshot volume approaches the watermark the same way that for example
> thinly provisioned storage would do. But again, with your snapshots it
> will give you ENOSPC when the snapshot grow too big, and at the end
> of the day, you need to create data to be able to backup it:), so having
> snapshots separate from your fs volume makes sense.
>

Yes, one day you will run out of space and will be getting a warning
before that, if you are using a CTERA product.
You won't be getting the warning from the kernel snapshots code, but from
disk space monitoring daemon.
And when you get the warning (or ENOSPC if you ignored the warnings)
you will have 2 options:
1. add disks and resize the fs
2. delete some snapshots

When using a CTERA product, you not have to pre-partition your disk
space between fs and snapshots - they are thinly provisioned, which
is a big advantage for a product which does not require being an IT expert to
operate it.

>>
>>
>> > How does it actually work on ext4 snapshots ? When you're going to
>> > rewrite a file, you will never know how much disk space it'll take in
>> > advance, am I right ? Is the filesystem accounting for the snapshot size
>> > as well ? or is it hidden ?
>>
>> It's not hidden, it's accounted for as a regular file (usually owned by root).
>> You need a bit of scripting to gather the disk space used by snapshots (du).
>>
>> In ANY snapshots implementation, you can get ENOSPC on operations,
>> which traditionally could not produce this error.
>> This statement is also true for thin provisioning implementations.
>> The question is how the implementation handles these situations.
>>
>> What I came to realize on LSF, is that my implementation is the only
>> one (of LVM and btrfs) that tries to deal with the ENOSPC issue and
>> does a good job most of the time.
>>
>> I deal with it by reserving space for metadata COW on snapshot
>> take, so if a future ENOSPC during metadata COW is possible,
>> snapshot take will fail with ENOSPC.
>>
>> As for ENOSPC during regular file rewrite, that's not such a big problem.
>> The application simply gets ENOSPC as if the file was sparse to begin
>> with. It may not be pleasant if the application have fallocated the space
>> and used mmap/close without msync...
>> The only way I see around this issue is reserving space on mmap time
>> (and returning ENOSPC at that time), but again, this issue is shared
>> with btrfs, but is easier to fix (I think) with ext4 snapshots.
>
> Yes, I do understand that ext4 snaphosts are doing well in that aspect,
> but as I said, having snapshots separate from your file system gives
> you advantage of not running into ENOSPC on your file system until you
> really fill it with data.

It should be, as David wrote, a choice to the sys admin.
Because ext4 snapshots are thinly provisioned, you can always say
"use 10% for snapshots and 90% for data" (like you would with LVM),
But you cannot say "reserve 10% for snapshots 50% for data and the
rest to either" when you administer LVM snapshots.

You are confusing user functionality with functionality provided by the kernel.
LVM happens to check water marks in the kernel because of it's design.
That doesn't mean that the same thing cannot be accomplished for ext4
snapshots by user tools.

>
> Granted, I have to take a look at the multisnap code, to see what it can
> do and compare it with ext4 snapshots, because really, if it is good
> enough and you will be able to do snapshotting backups as you do with
> your approach, I do not see the reason why to complicate our life in
> ext4.
>

I don't know how you intend to determine if dm-multisnap is 'good enough'.
I don't claim to have the capability myself to determine if ext4 snapshots
are 'good enough'.
I just try to present the technical differences between the 3 solutions
(LVM,ext4,btrfs) and claim that each have their advantages and disadvantages
over others.
I wish more sys admins and end users would provide feedback, though I don't
know how many of them are following LKML.

Amir.

2011-06-09 13:00:06

by Lukas Czerner

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, 9 Jun 2011, Amir G. wrote:

> On Thu, Jun 9, 2011 at 11:46 AM, Lukas Czerner <[email protected]> wrote:
> > On Thu, 9 Jun 2011, Amir G. wrote:
> >
> >> On Thu, Jun 9, 2011 at 9:50 AM, Lukas Czerner <[email protected]> wrote:
> >> > On Thu, 9 Jun 2011, Yongqiang Yang wrote:
> >> >
> >> >> On Thu, Jun 9, 2011 at 11:18 AM, Amir G. <[email protected]> wrote:
> >> >> > On Thu, Jun 9, 2011 at 4:59 AM, Yongqiang Yang <[email protected]> wrote:
> >> >> >>> But I do understand the difference. And also, when it comes to fs level
> >> >> >>> snapshotting I would suspect that it would do something we can not do
> >> >> >>> with the current solutions, for example per-file or per-directory snapshots,
> >> >> >>> cat ext4 snapshots do that ?
> >> >> >> Hi Lukas,
> >> >> >>
> >> >> >> I noticed that there is no answer to this question in the thread. ?I
> >> >> >
> >> >> > I think I answered this question with No it can't ;-)
> >> >> I think this can be implemented easily by chattr and adding check in
> >> >> should_snapshot() or should_move_data().
> >> >>
> >> >> And I thought Lukas are focusing on if ext4-snapshots can do this
> >> >> easily. ?So i said YES:-)
> >> >
> >> > Cool, finally something interesting :). So, how it'll work ? Does that
> >> > require any format changes again:) ? Can you exclude the whole root and
> >> > then selectively pick the directories or files you are interested in ?
> >>
> >> The design is actually very simple and not as powerful as you
> >> probably desire.
> >> I hate to get into the design of future features, when we haven't
> >> even ACKed the current feature yet, but since you're the only one
> >> did any review, I owe you that much ;-)
> >
> > Thanks Amir!
> >
> > You have to understand that I am still not convinced that ext4 snapshot
> > in its current state is really what we want to have in ext4. Especially
> > given the very basic features it provides, without any knowledge on how
> > it can be extended (but you're slowly providing that information, so
> > thanks for that). And especially facing the new dm-multisnap, I really
> > wonder if it is worth it.
>
> Did you not see my post on LVM vs. Ext4 snapshots?
> https://lkml.org/lkml/2011/6/8/296
> dm-multisnap is much better than dm-snap, but it's not perfect.
> And ext4 snapshots aren't perfect either, but they do bring some
> new interesting options for sys admins.
>
> >
> > If we want filesystem level snapshotting we can try to do it right with
> > all the benefits that snapshots on that level brings. But what I see
> > now, is not even remotely the case. And I have the feeling that all the
> > features that might be interesting for snapshotting at file system
> > level, are just a hack and not inherent from the design. But that is
> > probably because your goal was to snapshot the whole filesystem for the
> > backup purposes, but that's not what I would expect from fs level
> > snapshots. I really hope you understand my point.
> >
>
> I think I understand the point. The reason that ext4 snapshots are
> less powerful then, say, btrfs snapshots, is not because of my design,
> it is because I was building on top a 20 year old on-disk format (ext2), which
> was extended 2 times already, but remained mostly backwards compatible.
> There is only so much you can do without block reference counts and this
> is all that I was trying to do.

And I can imagine it works well enough. But given that we have better,
more generic solution, which does not require hacking stable filesystem
I am becoming more and more against ext4 snapshots to be merged. And if
anyone wishes to have some fancy fs level snapshoting features (which
ext4 snapshots can no provide from the resons you have pointed out), you
can always turn to btrfs, which has been designed that way unlike ext4.

>
>
> >>
> >> To exclude a file from snapshot it needs to have the NOCOW_FL flag.
> >> Ironically, btrfs have already added that flag in parallel to me (for the
> >> same purpose) so the flag it is already reserved in the code :-)
> >>
> >> To avoid some transition issues and keep it really simple,
> >> I disallow changing the NOCOW_FL
> >> for regular file and only allow to change it for directories.
> >> The NOCOW_FL is inherited from the parent directory,
> >> so setting/clearing the flag on a directory means:
> >> "All files/subdirs will be created excluded/not-excluded from now on".
> >>
> >> Inside the snapshot image, excluded directories, which are not really
> >> excluded, show normally, but excluded files are shown with zero length,
> >> because making the files disappear is hard, but their blocks may have already
> >> been reused, so we cannot allow access to them.
> >>
> >> >
> >> > How does rollback work with ext4 snapshots ? Can you selectively roll
> >> > back one file, or the whole directory subtree even when you're
> >> > snapshotting more ?
> >>
> >> So there is actually no inherent "rollback" feature, not for a file/dir
> >> and not for the entire fs.
> >> It's a drawback of ext4 snapshots, but hey, cp/rsync from snapshot
> >> still works for file/dir ;-)
> >> As for full "fs" rollback. A revert tool has been developed (by students),
> >> which requires an external storage to export the "revert patch".
> >> This tool is going to be enhanced to use LVM snapshot storage
> >> and LVM --merge option to implement ext4 "revert to snapshot" with Yum.
> >
> > And that is the problem. Because at this level you should be able to do
> > it without very much trouble, because being at file system level you
> > should have all the information. Do not get me wrong, I am not saying
> > that this is easy, but is should be "from design". Exporting the
> > "revert patch" to the external storage, or exporting snapshot to LVM
> > format to be able to merge it...that is all just hacks, because the
> > design itself does not count with that possibility.
> >
>
> The design makes a conscious choice to keep snapshots *inside*
> the filesystem.
> This is both an advantage (no need to change on-disk format and checking tools)
> and disadvantage (you cannot mount a snapshot without mounting the fs first).

And thats where ext4 snapshots loose. With dm you do not need to change
on-disk format, tools or filesystem itself, and you can mount the snapshot
without also mounting the origin.

>
>
> >>
> >> >
> >> > You see, when it comes to the full fs snapshots I am not convinced that
> >> > it is *very* useful, yes it might have some users, but you can alway
> >> > take the safe way and do lvm snapshots (or better use the new multisnap)
> >> > for backup, without need to modify stable filesystem code.
> >> >
> >>
> >> You think like a developer. Try talking to some sys admins.
> >> Especially ones that worked with Solaris/ZFS or NetApp.
> >> See what they think about snapshots and about the LVM alternative...
> >> Snapshots have addictive qualities. Ones you've used them, you can't
> >> go back to not having them.
> >> Imagine how people used to live before the 'Undo' button and imagine
> >> that your employer forced you to use an editor without an Undo button.
> >> This is the kind of feedback I got from sys admins that moved from Solaris
> >> to Linux.
> >
> > Exactly, so if we want fs level snapshots, it should use that
> > privilege no hack its way to do things like roll back, or
> > excludes+includes. Ext4 was not meant to work that way, nor was your
> > snapshots designed to work that way. If we are considering backups only,
> > because that is what you ext4 snaphosts can provide now, I would prefer
> > to use LVM. But yes, we all need to know how the new multisnap works
> > out.
> >
>
> Why do you keep saying 'backup only'?
> There is a huge difference between having long lived snapshots,
> like CTERA products have, and temporary snapshot for backup
> purpose (for which LVM is adequate).

dm's multisnapshots are designed to be long lived and can be used as
such.

>
> >>
> >>
> >> > Also, I do not buy the whole argument of "not have to create separate disk
> >> > space for snapshot". It is actually better for sysadmins, because you
> >> > have perfect control on what is going on, how much space is used for
> >> > your snapshots and how much is used by your data. You can always easily
> >> > extend the snapshot volume, or let it die silently when it is too old
> >> > and too big.
> >> >
> >>
> >> Seriously, Lukas, talk to sys admins.
> >> Letting the snapshot die silently is the worst possible thing that a snapshots
> >> implementation can do (for long lived snapshots).
> >
> > Oh, no you misunderstood. Even with your snapshots you'll have to delete
> > old snapshots someday, because otherwise you'll run out of space. With
> > LVM however, you have prereserved space for it, so even if your snapshot
> > volume gets full, it does not affect your filesystem what so ever. And,
> > as a administrator, you can decide whether to extend the snapshot volume
> > to let it live longer, or just let it be and it will die eventually.
> >
> > And, as far as I know, the new multisnap will notify the admin when the
> > snapshot volume approaches the watermark the same way that for example
> > thinly provisioned storage would do. But again, with your snapshots it
> > will give you ENOSPC when the snapshot grow too big, and at the end
> > of the day, you need to create data to be able to backup it:), so having
> > snapshots separate from your fs volume makes sense.
> >
>
> Yes, one day you will run out of space and will be getting a warning
> before that, if you are using a CTERA product.
> You won't be getting the warning from the kernel snapshots code, but from
> disk space monitoring daemon.
> And when you get the warning (or ENOSPC if you ignored the warnings)
> you will have 2 options:
> 1. add disks and resize the fs
> 2. delete some snapshots
>
> When using a CTERA product, you not have to pre-partition your disk
> space between fs and snapshots - they are thinly provisioned, which
> is a big advantage for a product which does not require being an IT expert to
> operate it.

dm multisnapshot code is using thin provisioning, you just have to pick
the volume and that's it.

>
>
> >>
> >>
> >> > How does it actually work on ext4 snapshots ? When you're going to
> >> > rewrite a file, you will never know how much disk space it'll take in
> >> > advance, am I right ? Is the filesystem accounting for the snapshot size
> >> > as well ? or is it hidden ?
> >>
> >> It's not hidden, it's accounted for as a regular file (usually owned by root).
> >> You need a bit of scripting to gather the disk space used by snapshots (du).
> >>
> >> In ANY snapshots implementation, you can get ENOSPC on operations,
> >> which traditionally could not produce this error.
> >> This statement is also true for thin provisioning implementations.
> >> The question is how the implementation handles these situations.
> >>
> >> What I came to realize on LSF, is that my implementation is the only
> >> one (of LVM and btrfs) that tries to deal with the ENOSPC issue and
> >> does a good job most of the time.
> >>
> >> I deal with it by reserving space for metadata COW on snapshot
> >> take, so if a future ENOSPC during metadata COW is possible,
> >> snapshot take will fail with ENOSPC.
> >>
> >> As for ENOSPC during regular file rewrite, that's not such a big problem.
> >> The application simply gets ENOSPC as if the file was sparse to begin
> >> with. It may not be pleasant if the application have fallocated the space
> >> and used mmap/close without msync...
> >> The only way I see around this issue is reserving space on mmap time
> >> (and returning ENOSPC at that time), but again, this issue is shared
> >> with btrfs, but is easier to fix (I think) with ext4 snapshots.
> >
> > Yes, I do understand that ext4 snaphosts are doing well in that aspect,
> > but as I said, having snapshots separate from your file system gives
> > you advantage of not running into ENOSPC on your file system until you
> > really fill it with data.
>
> It should be, as David wrote, a choice to the sys admin.
> Because ext4 snapshots are thinly provisioned, you can always say
> "use 10% for snapshots and 90% for data" (like you would with LVM),
> But you cannot say "reserve 10% for snapshots 50% for data and the
> rest to either" when you administer LVM snapshots.

I am not sure how can this be managed with multisnap target, but I do
not see a reason why it can not be done, given that both data and
snapshots can be allocated from within the same pool.

>
> You are confusing user functionality with functionality provided by the kernel.
> LVM happens to check water marks in the kernel because of it's design.
> That doesn't mean that the same thing cannot be accomplished for ext4
> snapshots by user tools.

That was not my point, I was simply saying that it is not ext4 snapshots
advantage.

>
>
> >
> > Granted, I have to take a look at the multisnap code, to see what it can
> > do and compare it with ext4 snapshots, because really, if it is good
> > enough and you will be able to do snapshotting backups as you do with
> > your approach, I do not see the reason why to complicate our life in
> > ext4.
> >
>
> I don't know how you intend to determine if dm-multisnap is 'good enough'.
> I don't claim to have the capability myself to determine if ext4 snapshots
> are 'good enough'.
> I just try to present the technical differences between the 3 solutions
> (LVM,ext4,btrfs) and claim that each have their advantages and disadvantages
> over others.
> I wish more sys admins and end users would provide feedback, though I don't
> know how many of them are following LKML.

I do. When it can do long lived snapshots without any obvious headaches
it is good enough. Your only contra argument was that lvm snapshotting
is slow, which is not that big argument now when we have multisnap
almost ready. I am not even talking about features, because clearly
mutlisnap has superset of the features that ext4 does - no I am not
counting per-file or per-directory snapshotting because clearly those
are just hacks and it was not designed that way.

-Lukas

2011-06-10 07:07:04

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, Jun 9, 2011 at 3:59 PM, Lukas Czerner <[email protected]> wrote:
> On Thu, 9 Jun 2011, Amir G. wrote:
>
>> On Thu, Jun 9, 2011 at 11:46 AM, Lukas Czerner <[email protected]> wrote:
>> > On Thu, 9 Jun 2011, Amir G. wrote:
>> >
>> >> On Thu, Jun 9, 2011 at 9:50 AM, Lukas Czerner <[email protected]> wrote:
>> >> > On Thu, 9 Jun 2011, Yongqiang Yang wrote:
>> >> >
>> >> >> On Thu, Jun 9, 2011 at 11:18 AM, Amir G. <[email protected]> wrote:
>> >> >> > On Thu, Jun 9, 2011 at 4:59 AM, Yongqiang Yang <[email protected]> wrote:
>> >> >> >>> But I do understand the difference. And also, when it comes to fs level
>> >> >> >>> snapshotting I would suspect that it would do something we can not do
>> >> >> >>> with the current solutions, for example per-file or per-directory snapshots,
>> >> >> >>> cat ext4 snapshots do that ?
>> >> >> >> Hi Lukas,
>> >> >> >>
>> >> >> >> I noticed that there is no answer to this question in the thread. ?I
>> >> >> >
>> >> >> > I think I answered this question with No it can't ;-)
>> >> >> I think this can be implemented easily by chattr and adding check in
>> >> >> should_snapshot() or should_move_data().
>> >> >>
>> >> >> And I thought Lukas are focusing on if ext4-snapshots can do this
>> >> >> easily. ?So i said YES:-)
>> >> >
>> >> > Cool, finally something interesting :). So, how it'll work ? Does that
>> >> > require any format changes again:) ? Can you exclude the whole root and
>> >> > then selectively pick the directories or files you are interested in ?
>> >>
>> >> The design is actually very simple and not as powerful as you
>> >> probably desire.
>> >> I hate to get into the design of future features, when we haven't
>> >> even ACKed the current feature yet, but since you're the only one
>> >> did any review, I owe you that much ;-)
>> >
>> > Thanks Amir!
>> >
>> > You have to understand that I am still not convinced that ext4 snapshot
>> > in its current state is really what we want to have in ext4. Especially
>> > given the very basic features it provides, without any knowledge on how
>> > it can be extended (but you're slowly providing that information, so
>> > thanks for that). And especially facing the new dm-multisnap, I really
>> > wonder if it is worth it.
>>
>> Did you not see my post on LVM vs. Ext4 snapshots?
>> https://lkml.org/lkml/2011/6/8/296
>> dm-multisnap is much better than dm-snap, but it's not perfect.
>> And ext4 snapshots aren't perfect either, but they do bring some
>> new interesting options for sys admins.
>>
>> >
>> > If we want filesystem level snapshotting we can try to do it right with
>> > all the benefits that snapshots on that level brings. But what I see
>> > now, is not even remotely the case. And I have the feeling that all the
>> > features that might be interesting for snapshotting at file system
>> > level, are just a hack and not inherent from the design. But that is
>> > probably because your goal was to snapshot the whole filesystem for the
>> > backup purposes, but that's not what I would expect from fs level
>> > snapshots. I really hope you understand my point.
>> >
>>
>> I think I understand the point. The reason that ext4 snapshots are
>> less powerful then, say, btrfs snapshots, is not because of my design,
>> it is because I was building on top a 20 year old on-disk format (ext2), which
>> was extended 2 times already, but remained mostly backwards compatible.
>> There is only so much you can do without block reference counts and this
>> is all that I was trying to do.
>
> And I can imagine it works well enough. But given that we have better,
> more generic solution, which does not require hacking stable filesystem
> I am becoming more and more against ext4 snapshots to be merged. And if
> anyone wishes to have some fancy fs level snapshoting features (which
> ext4 snapshots can no provide from the resons you have pointed out), you
> can always turn to btrfs, which has been designed that way unlike ext4.
>
>>
>>
>> >>
>> >> To exclude a file from snapshot it needs to have the NOCOW_FL flag.
>> >> Ironically, btrfs have already added that flag in parallel to me (for the
>> >> same purpose) so the flag it is already reserved in the code :-)
>> >>
>> >> To avoid some transition issues and keep it really simple,
>> >> I disallow changing the NOCOW_FL
>> >> for regular file and only allow to change it for directories.
>> >> The NOCOW_FL is inherited from the parent directory,
>> >> so setting/clearing the flag on a directory means:
>> >> "All files/subdirs will be created excluded/not-excluded from now on".
>> >>
>> >> Inside the snapshot image, excluded directories, which are not really
>> >> excluded, show normally, but excluded files are shown with zero length,
>> >> because making the files disappear is hard, but their blocks may have already
>> >> been reused, so we cannot allow access to them.
>> >>
>> >> >
>> >> > How does rollback work with ext4 snapshots ? Can you selectively roll
>> >> > back one file, or the whole directory subtree even when you're
>> >> > snapshotting more ?
>> >>
>> >> So there is actually no inherent "rollback" feature, not for a file/dir
>> >> and not for the entire fs.
>> >> It's a drawback of ext4 snapshots, but hey, cp/rsync from snapshot
>> >> still works for file/dir ;-)
>> >> As for full "fs" rollback. A revert tool has been developed (by students),
>> >> which requires an external storage to export the "revert patch".
>> >> This tool is going to be enhanced to use LVM snapshot storage
>> >> and LVM --merge option to implement ext4 "revert to snapshot" with Yum.
>> >
>> > And that is the problem. Because at this level you should be able to do
>> > it without very much trouble, because being at file system level you
>> > should have all the information. Do not get me wrong, I am not saying
>> > that this is easy, but is should be "from design". Exporting the
>> > "revert patch" to the external storage, or exporting snapshot to LVM
>> > format to be able to merge it...that is all just hacks, because the
>> > design itself does not count with that possibility.
>> >
>>
>> The design makes a conscious choice to keep snapshots *inside*
>> the filesystem.
>> This is both an advantage (no need to change on-disk format and checking tools)
>> and disadvantage (you cannot mount a snapshot without mounting the fs first).
>
> And thats where ext4 snapshots loose. With dm you do not need to change
> on-disk format, tools or filesystem itself, and you can mount the snapshot
> without also mounting the origin.
>
>>
>>
>> >>
>> >> >
>> >> > You see, when it comes to the full fs snapshots I am not convinced that
>> >> > it is *very* useful, yes it might have some users, but you can alway
>> >> > take the safe way and do lvm snapshots (or better use the new multisnap)
>> >> > for backup, without need to modify stable filesystem code.
>> >> >
>> >>
>> >> You think like a developer. Try talking to some sys admins.
>> >> Especially ones that worked with Solaris/ZFS or NetApp.
>> >> See what they think about snapshots and about the LVM alternative...
>> >> Snapshots have addictive qualities. Ones you've used them, you can't
>> >> go back to not having them.
>> >> Imagine how people used to live before the 'Undo' button and imagine
>> >> that your employer forced you to use an editor without an Undo button.
>> >> This is the kind of feedback I got from sys admins that moved from Solaris
>> >> to Linux.
>> >
>> > Exactly, so if we want fs level snapshots, it should use that
>> > privilege no hack its way to do things like roll back, or
>> > excludes+includes. Ext4 was not meant to work that way, nor was your
>> > snapshots designed to work that way. If we are considering backups only,
>> > because that is what you ext4 snaphosts can provide now, I would prefer
>> > to use LVM. But yes, we all need to know how the new multisnap works
>> > out.
>> >
>>
>> Why do you keep saying 'backup only'?
>> There is a huge difference between having long lived snapshots,
>> like CTERA products have, and temporary snapshot for backup
>> purpose (for which LVM is adequate).
>
> dm's multisnapshots are designed to be long lived and can be used as
> such.
>
>>
>> >>
>> >>
>> >> > Also, I do not buy the whole argument of "not have to create separate disk
>> >> > space for snapshot". It is actually better for sysadmins, because you
>> >> > have perfect control on what is going on, how much space is used for
>> >> > your snapshots and how much is used by your data. You can always easily
>> >> > extend the snapshot volume, or let it die silently when it is too old
>> >> > and too big.
>> >> >
>> >>
>> >> Seriously, Lukas, talk to sys admins.
>> >> Letting the snapshot die silently is the worst possible thing that a snapshots
>> >> implementation can do (for long lived snapshots).
>> >
>> > Oh, no you misunderstood. Even with your snapshots you'll have to delete
>> > old snapshots someday, because otherwise you'll run out of space. With
>> > LVM however, you have prereserved space for it, so even if your snapshot
>> > volume gets full, it does not affect your filesystem what so ever. And,
>> > as a administrator, you can decide whether to extend the snapshot volume
>> > to let it live longer, or just let it be and it will die eventually.
>> >
>> > And, as far as I know, the new multisnap will notify the admin when the
>> > snapshot volume approaches the watermark the same way that for example
>> > thinly provisioned storage would do. But again, with your snapshots it
>> > will give you ENOSPC when the snapshot grow too big, and at the end
>> > of the day, you need to create data to be able to backup it:), so having
>> > snapshots separate from your fs volume makes sense.
>> >
>>
>> Yes, one day you will run out of space and will be getting a warning
>> before that, if you are using a CTERA product.
>> You won't be getting the warning from the kernel snapshots code, but from
>> disk space monitoring daemon.
>> And when you get the warning (or ENOSPC if you ignored the warnings)
>> you will have 2 options:
>> 1. add disks and resize the fs
>> 2. delete some snapshots
>>
>> When using a CTERA product, you not have to pre-partition your disk
>> space between fs and snapshots - they are thinly provisioned, which
>> is a big advantage for a product which does not require being an IT expert to
>> operate it.
>
> dm multisnapshot code is using thin provisioning, you just have to pick
> the volume and that's it.
>
>>
>>
>> >>
>> >>
>> >> > How does it actually work on ext4 snapshots ? When you're going to
>> >> > rewrite a file, you will never know how much disk space it'll take in
>> >> > advance, am I right ? Is the filesystem accounting for the snapshot size
>> >> > as well ? or is it hidden ?
>> >>
>> >> It's not hidden, it's accounted for as a regular file (usually owned by root).
>> >> You need a bit of scripting to gather the disk space used by snapshots (du).
>> >>
>> >> In ANY snapshots implementation, you can get ENOSPC on operations,
>> >> which traditionally could not produce this error.
>> >> This statement is also true for thin provisioning implementations.
>> >> The question is how the implementation handles these situations.
>> >>
>> >> What I came to realize on LSF, is that my implementation is the only
>> >> one (of LVM and btrfs) that tries to deal with the ENOSPC issue and
>> >> does a good job most of the time.
>> >>
>> >> I deal with it by reserving space for metadata COW on snapshot
>> >> take, so if a future ENOSPC during metadata COW is possible,
>> >> snapshot take will fail with ENOSPC.
>> >>
>> >> As for ENOSPC during regular file rewrite, that's not such a big problem.
>> >> The application simply gets ENOSPC as if the file was sparse to begin
>> >> with. It may not be pleasant if the application have fallocated the space
>> >> and used mmap/close without msync...
>> >> The only way I see around this issue is reserving space on mmap time
>> >> (and returning ENOSPC at that time), but again, this issue is shared
>> >> with btrfs, but is easier to fix (I think) with ext4 snapshots.
>> >
>> > Yes, I do understand that ext4 snaphosts are doing well in that aspect,
>> > but as I said, having snapshots separate from your file system gives
>> > you advantage of not running into ENOSPC on your file system until you
>> > really fill it with data.
>>
>> It should be, as David wrote, a choice to the sys admin.
>> Because ext4 snapshots are thinly provisioned, you can always say
>> "use 10% for snapshots and 90% for data" (like you would with LVM),
>> But you cannot say "reserve 10% for snapshots 50% for data and the
>> rest to either" when you administer LVM snapshots.
>
> I am not sure how can this be managed with multisnap target, but I do
> not see a reason why it can not be done, given that both data and
> snapshots can be allocated from within the same pool.
>
>>
>> You are confusing user functionality with functionality provided by the kernel.
>> LVM happens to check water marks in the kernel because of it's design.
>> That doesn't mean that the same thing cannot be accomplished for ext4
>> snapshots by user tools.
>
> That was not my point, I was simply saying that it is not ext4 snapshots
> advantage.
>
>>
>>
>> >
>> > Granted, I have to take a look at the multisnap code, to see what it can
>> > do and compare it with ext4 snapshots, because really, if it is good
>> > enough and you will be able to do snapshotting backups as you do with
>> > your approach, I do not see the reason why to complicate our life in
>> > ext4.
>> >
>>
>> I don't know how you intend to determine if dm-multisnap is 'good enough'.
>> I don't claim to have the capability myself to determine if ext4 snapshots
>> are 'good enough'.
>> I just try to present the technical differences between the 3 solutions
>> (LVM,ext4,btrfs) and claim that each have their advantages and disadvantages
>> over others.
>> I wish more sys admins and end users would provide feedback, though I don't
>> know how many of them are following LKML.
>
> I do. When it can do long lived snapshots without any obvious headaches
> it is good enough. Your only contra argument was that lvm snapshotting
> is slow, which is not that big argument now when we have multisnap
> almost ready. I am not even talking about features, because clearly
> mutlisnap has superset of the features that ext4 does - no I am not
> counting per-file or per-directory snapshotting because clearly those
> are just hacks and it was not designed that way.
>

Hi Lukas,

I am very glad to have you as my reviewer and critic :-)
I am saying that with all honesty, because I know that you are impartial
and have no anti-ext4 agenda.

LVM multisnap does look like a big leap forward, but you should not
be blinded by the promised feature, before you inspect the implementation,
the same as you are doing to ext4 snapshots now...

I could suggest that you put your root fs on a QCOW2 file exported as NBD.
That would give you both thin provisioning and snapshots, but you know
perfectly well, that this is not a 'good enough' solution.
I'm not saying that LVM is comparable to QCOW2 virtual volume.
I'm just saying we (included myself) should carefully examine the alternatives
before make a ruling against one of them.

Amir.

2011-06-10 09:35:33

by Lukas Czerner

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Fri, 10 Jun 2011, Amir G. wrote:

--snip--
> >
> >>
> >>
> >> >
> >> > Granted, I have to take a look at the multisnap code, to see what it can
> >> > do and compare it with ext4 snapshots, because really, if it is good
> >> > enough and you will be able to do snapshotting backups as you do with
> >> > your approach, I do not see the reason why to complicate our life in
> >> > ext4.
> >> >
> >>
> >> I don't know how you intend to determine if dm-multisnap is 'good enough'.
> >> I don't claim to have the capability myself to determine if ext4 snapshots
> >> are 'good enough'.
> >> I just try to present the technical differences between the 3 solutions
> >> (LVM,ext4,btrfs) and claim that each have their advantages and disadvantages
> >> over others.
> >> I wish more sys admins and end users would provide feedback, though I don't
> >> know how many of them are following LKML.
> >
> > I do. When it can do long lived snapshots without any obvious headaches
> > it is good enough. Your only contra argument was that lvm snapshotting
> > is slow, which is not that big argument now when we have multisnap
> > almost ready. I am not even talking about features, because clearly
> > mutlisnap has superset of the features that ext4 does - no I am not
> > counting per-file or per-directory snapshotting because clearly those
> > are just hacks and it was not designed that way.
> >
>
> Hi Lukas,
>
> I am very glad to have you as my reviewer and critic :-)
> I am saying that with all honesty, because I know that you are impartial
> and have no anti-ext4 agenda.
>
> LVM multisnap does look like a big leap forward, but you should not
> be blinded by the promised feature, before you inspect the implementation,
> the same as you are doing to ext4 snapshots now...
>
> I could suggest that you put your root fs on a QCOW2 file exported as NBD.
> That would give you both thin provisioning and snapshots, but you know
> perfectly well, that this is not a 'good enough' solution.
> I'm not saying that LVM is comparable to QCOW2 virtual volume.
> I'm just saying we (included myself) should carefully examine the alternatives
> before make a ruling against one of them.
>
> Amir.
>

Hi Amir,

that is why I spoke with several dm people and all of them had the same
opinion. When you are not using the advantage of being at fs level,
there is no reason to have shapshoting at this level.

And no, I am not blinded. I am trying to understand why is multisnap a
huge win everyone is saying, so I already asked ejt to step in and
give us an overview on how dm-multisnap works and why is it better
than the old implementation. Also I am trying it myslef, and so far
it works quite well. I might have some numbers later.

Thanks!
-Lukas

2011-06-10 12:02:48

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Fri, Jun 10, 2011 at 12:00 PM, Lukas Czerner <[email protected]> wrote:
> On Fri, 10 Jun 2011, Amir G. wrote:
>
> --snip--
>> >
>> >>
>> >>
>> >> >
>> >> > Granted, I have to take a look at the multisnap code, to see what it can
>> >> > do and compare it with ext4 snapshots, because really, if it is good
>> >> > enough and you will be able to do snapshotting backups as you do with
>> >> > your approach, I do not see the reason why to complicate our life in
>> >> > ext4.
>> >> >
>> >>
>> >> I don't know how you intend to determine if dm-multisnap is 'good enough'.
>> >> I don't claim to have the capability myself to determine if ext4 snapshots
>> >> are 'good enough'.
>> >> I just try to present the technical differences between the 3 solutions
>> >> (LVM,ext4,btrfs) and claim that each have their advantages and disadvantages
>> >> over others.
>> >> I wish more sys admins and end users would provide feedback, though I don't
>> >> know how many of them are following LKML.
>> >
>> > I do. When it can do long lived snapshots without any obvious headaches
>> > it is good enough. Your only contra argument was that lvm snapshotting
>> > is slow, which is not that big argument now when we have multisnap
>> > almost ready. I am not even talking about features, because clearly
>> > mutlisnap has superset of the features that ext4 does - no I am not
>> > counting per-file or per-directory snapshotting because clearly those
>> > are just hacks and it was not designed that way.
>> >
>>
>> Hi Lukas,
>>
>> I am very glad to have you as my reviewer and critic :-)
>> I am saying that with all honesty, because I know that you are impartial
>> and have no anti-ext4 agenda.
>>
>> LVM multisnap does look like a big leap forward, but you should not
>> be blinded by the promised feature, before you inspect the implementation,
>> the same as you are doing to ext4 snapshots now...
>>
>> I could suggest that you put your root fs on a QCOW2 file exported as NBD.
>> That would give you both thin provisioning and snapshots, but you know
>> perfectly well, that this is not a 'good enough' solution.
>> I'm not saying that LVM is comparable to QCOW2 virtual volume.
>> I'm just saying we (included myself) should carefully examine the alternatives
>> before make a ruling against one of them.
>>
>> Amir.
>>
>
> Hi Amir,
>
> that is why I spoke with several dm people and all of them had the same
> opinion. When you are not using the advantage of being at fs level,
> there is no reason to have shapshoting at this level.
>
> And no, I am not blinded. I am trying to understand why is multisnap a
> huge win everyone is saying, so I already asked ejt to step in and
> give us an overview on how dm-multisnap works and why is it better
> than the old implementation. Also I am trying it myslef, and so far
> it works quite well. I might have some numbers later.
>
> Thanks!
> -Lukas
>

Wow, if you can provide numbers that would be great!
If you can also run the same tests on the same machine with my
ext4dev module that would be awesome!
the module on next3.sf.net is for kernel 2.6.38, but I can send you
a module for kernel 2.6.39 or 3.0-rc1 if you like.

Thanks!
Amir.

2011-06-10 22:52:19

by Valdis Klētnieks

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Thu, 09 Jun 2011 13:54:13 +0300, "Amir G." said:

> Why do you keep saying 'backup only'?
> There is a huge difference between having long lived snapshots,
> like CTERA products have, and temporary snapshot for backup
> purpose (for which LVM is adequate).

I must have blinked somewhere - I'm not convinced LVM is even "adequate" for
backup purposes. In particular, how does an LVM-level snapshot deal with the
"metadata in memory" problem (basically the exact same problem as running fsck
on a disk partition that is already mounted)?

Attachments:

(No filename) (227.00 B)

2011-06-11 01:09:32

[permalink] [raw]

Subject: Re: [PATCH v1 00/30] Ext4 snapshots

On Sat, Jun 11, 2011 at 1:51 AM, <[email protected]> wrote:
> On Thu, 09 Jun 2011 13:54:13 +0300, "Amir G." said:
>
>> Why do you keep saying 'backup only'?
>> There is a huge difference between having long lived snapshots,
>> like CTERA products have, and temporary snapshot for backup
>> purpose (for which LVM is adequate).
>
> I must have blinked somewhere - I'm not convinced LVM is even "adequate" for
> backup purposes. ?In particular, how does an LVM-level snapshot deal with the
> "metadata in memory" problem (basically the exact same problem as running fsck
> on a disk partition that is already mounted)?
>
>

It uses the filesystem freeze API.
Same as ext4 snapshots.

Amir.