Hi,
I've been working for the past several months to address review comments
and to keep solving the problems users raised on the mailing list
(thanks to everyone for the comments and feedback!). I've also tried to
stabilize the API and disk format to limit further changes and
ensure backward compatibility.
Apart from adding ioctl commands for user-requested features, I think the
interface is almost stable. Now I'd like to ask for mainline
inclusion so that more people can try this in the tree.
This file system, nilfs2, provides continuous snapshotting; it gives
not only versioning capability of the entire file system but also
retroactively selectable and recoverable snapshots.
With nilfs2, users can even restore files and namespace mistakenly
overwritten or destroyed just a few seconds ago.
The restoration with nilfs snapshots is, for example, done as follows:
# lscp
(list checkpoints)
  CNO  DATE        TIME      MODE  FLG  NBLKINC  ICNT
33338  2009-03-08  14:45:49  cp    -         11     3
33339  2009-03-08  14:50:22  cp    -     200523    81
33340  2009-03-08  20:40:34  cp    -        136    61
33341  2009-03-08  20:41:20  cp    -     187666  1604
33342  2009-03-08  20:41:42  cp    -         51  1634
...
# chcp ss 33339 ; lscp
(select an existing checkpoint and change it into a snapshot)
  CNO  DATE        TIME      MODE  FLG  NBLKINC  ICNT
33338  2009-03-08  14:45:49  cp    -         11     3
33339  2009-03-08  14:50:22  ss    -     200523    81
33340  2009-03-08  20:40:34  cp    -        136    61
33341  2009-03-08  20:41:20  cp    -     187666  1604
33342  2009-03-08  20:41:42  cp    -         51  1634
...
# mount -t nilfs2 -r -o cp=33339 /dev/sdb1 /snap
(mount the snapshot, then it will become accessible on the mount point)
Snapshots remain on disk, whereas plain checkpoints are reclaimed
automatically after a certain (guaranteed) period.
These snapshot operations complete quickly.
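The retention rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct checkpoint`, `cp_reclaimable()` and the `protection_period` parameter are hypothetical names chosen for this sketch, not the nilfs2 kernel or cleaner API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical in-memory view of one lscp row; not the nilfs2 on-disk layout. */
enum cp_mode { CP_PLAIN, CP_SNAPSHOT };

struct checkpoint {
    uint64_t cno;      /* checkpoint number (CNO column) */
    time_t created;    /* creation time (DATE and TIME columns) */
    enum cp_mode mode; /* "cp" or "ss" in the MODE column */
};

/*
 * A snapshot is never reclaimed. A plain checkpoint becomes reclaimable
 * only after it has outlived the protection period, which is an assumed
 * parameter standing in for the "certain (guaranteed) period".
 */
bool cp_reclaimable(const struct checkpoint *cp, time_t now,
                    time_t protection_period)
{
    assert(cp != NULL);
    if (cp->mode == CP_SNAPSHOT)
        return false;
    return now - cp->created > protection_period;
}
```

Under this rule, running `chcp ss` on a checkpoint simply removes it from the set the cleaner is allowed to reclaim.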
There is no limit on the number of checkpoints and snapshots until the
volume gets full. We have about eleven months of experience running
nilfs2 on a server, where several thousand checkpoints are handled
constantly.
On the other hand, performance tuning has been left on the back burner. I
feel the B-tree lookup routines and the log writer have room for improvement
to get better read/write throughput. In addition, defragmentation should be
applied, maybe with help from the userland cleaner daemon; a recent
measurement on the above aged partition shows 20-25% degradation.
These are todo items.
So, we'd like to address these remaining challenges while continuing to
resolve user issues and simplifying the implementation at every
opportunity.
Nilfs userland tools:
http://www.nilfs.org/en/download.html
General nilfs information:
http://www.nilfs.org/en/
Nilfs usage:
http://www.nilfs.org/en/about_nilfs.html
TODO items:
http://www.nilfs.org/en/current_status.html
User Mailing list:
https://www.nilfs.org/mailman/listinfo/users
Thanks,
Ryusuke Konishi
Hello,
On Wed, Mar 11, 2009 at 01:55:42AM +0900, Ryusuke Konishi wrote:
> # lscp
> (list checkpoints)
>
>   CNO  DATE        TIME      MODE  FLG  NBLKINC  ICNT
> 33338  2009-03-08  14:45:49  cp    -         11     3
> 33339  2009-03-08  14:50:22  cp    -     200523    81
> 33340  2009-03-08  20:40:34  cp    -        136    61
> 33341  2009-03-08  20:41:20  cp    -     187666  1604
> 33342  2009-03-08  20:41:42  cp    -         51  1634
> ...
Is there a cheaty way to get at checkpointed files (and the list of
checkpoints) through some sort of magic dot directory system (so you
wouldn't need the userland tools and the remounting)?
--
Sitsofe | http://sucs.org/~sits/
On Wed, Mar 11, 2009 at 01:55:42AM +0900, Ryusuke Konishi wrote:
> Hi,
>
> I've been working for the past several months to take review comments
> and to continually solve users' problems come up in mailing list
> (thanks for all giving comments and feedbacks!). Also, I've tried to
> stabilize API and disk format to restrict additional changes and
> ensure backward compatibility.
>
> Except adding ioctl commands for user-demanded features, I think the
> interface was almost stabilized. Now I'd like to ask for mainline
> inclusion so that more people can try this in the tree.
Then submit some patches, as documented in
Documentation/SubmittingPatches :)
thanks,
greg k-h
Hi,
On Tue, 10 Mar 2009 17:46:27 +0000, Sitsofe Wheeler wrote:
> Hello,
>
> On Wed, Mar 11, 2009 at 01:55:42AM +0900, Ryusuke Konishi wrote:
> > # lscp
> > (list checkpoints)
> >
> >   CNO  DATE        TIME      MODE  FLG  NBLKINC  ICNT
> > 33338  2009-03-08  14:45:49  cp    -         11     3
> > 33339  2009-03-08  14:50:22  cp    -     200523    81
> > 33340  2009-03-08  20:40:34  cp    -        136    61
> > 33341  2009-03-08  20:41:20  cp    -     187666  1604
> > 33342  2009-03-08  20:41:42  cp    -         51  1634
> > ...
>
> Is there a cheaty way to get at checkpointed files (and the list of
> checkpoints) through some sort of magic dot directory system (so you
> wouldn't need the userland tools and the remounting)?
No, we don't provide a special namespace extension in nilfs2.
Instead, we have adopted an out-of-filesystem solution, that is, using
autofs to present checkpoints in a namespace outside nilfs.
Autofs is useful for presenting many snapshots or checkpoints
automatically, and it allows flexible naming rules.
Maybe we can natively support the magic dot directory, and we may do
that if many users demand it or we hit some kind of scalability
issue. But so far it seems to work well enough.
Regards,
Ryusuke
Hi!
On Tue, 10 Mar 2009 11:25:56 -0700, Greg KH wrote:
> On Wed, Mar 11, 2009 at 01:55:42AM +0900, Ryusuke Konishi wrote:
> > Hi,
> >
> > I've been working for the past several months to take review comments
> > and to continually solve users' problems come up in mailing list
> > (thanks for all giving comments and feedbacks!). Also, I've tried to
> > stabilize API and disk format to restrict additional changes and
> > ensure backward compatibility.
> >
> > Except adding ioctl commands for user-demanded features, I think the
> > interface was almost stabilized. Now I'd like to ask for mainline
> > inclusion so that more people can try this in the tree.
>
> Then submit some patches, as documented in
> Documentation/SubmittingPatches :)
>
> thanks,
Sorry for my insufficient explanation.
In my case, the patches are maintained in the -mm tree :)
http://userweb.kernel.org/~akpm/mmotm/
My original post can be found at
http://marc.info/?l=linux-fsdevel&m=122141951118758
and
http://marc.info/?l=linux-fsdevel&m=121920195516073 (undivided first post)
Additional patches have been posted to these mailing lists, too.
Thanks,
Ryusuke
On Wed, 11 Mar 2009 01:55:42 +0900 (JST)
Ryusuke Konishi <[email protected]> wrote:
> I've been working for the past several months to take review comments
> and to continually solve users' problems come up in mailing list
Yes, the maintenance has been impressive.
> (thanks for all giving comments and feedbacks!). Also, I've tried to
> stabilize API and disk format to restrict additional changes and
> ensure backward compatibility.
Well. From the point of view of mainline linux, there is no
back-compatibility issue, because the fs hasn't been merged yet.
You perhaps have back-compatibility concerns for existing users of the
out-of-tree patch, but I'd encourage you to not worry about that too
much - there will be fairly few users and they are probably pretty
technical and will be able to cope with a migration. It's a _bit_ hard
on them but on the other hand, omitting back-compatibility code leads
to a better implementation for the long term.
What you should be more concerned about is forward-compatibility. What
arrangements do you presently have in place to be able to later alter the
on-disk format without causing too much disruption? Having a strong
design here will make changes easier to do and will lead to a better
filesystem.
Also.. Don't get _too_ concerned about freezing the on-disk format at
this time. You could put in a mount-time printk("the nilfs on-disk
format may change at any time - do not place critical data on a nilfs
filesystem") and we leave that in place for a few months while things
stabilise.
And yes, I was planning on sending nilfs in to Linus for 2.6.30 unless
someone has decent-sounding reasons to hold it back.
Hi Andrew,
On Tue, 10 Mar 2009 15:54:59 -0700, Andrew Morton wrote:
> On Wed, 11 Mar 2009 01:55:42 +0900 (JST)
> Ryusuke Konishi <[email protected]> wrote:
>
> > I've been working for the past several months to take review comments
> > and to continually solve users' problems come up in mailing list
>
> Yes, the maintenance has been impressive.
Thanks for always picking up the additional patches!
> > (thanks for all giving comments and feedbacks!). Also, I've tried to
> > stabilize API and disk format to restrict additional changes and
> > ensure backward compatibility.
>
> Well. From the point of view of mainline linux, there is no
> back-compatibility issue, because the fs hasn't been merged yet.
Oops, sorry, my mistake; I meant forward compatibility.
My recent patch series and the recent userland work were intended to
give better support for future extensions, though I also tried to keep
backward compatibility.
> You perhaps have back-compatibility concerns for existing users of the
> out-of-tree patch, but I'd encourage you to not worry about that too
> much - there will be fairly few users and they are probably pretty
> technical and will be able to cope with a migration. It's a _bit_ hard
> on them but on the other hand, omitting back-compatibility code leads
> to a better implementation for the long term.
Thanks for the advice. I'll keep this comment in mind and will handle
matters more flexibly, weighing long-term merits and trade-offs.
> What you should be more concerned about is forward-compatibility. What
> arrangements do you presently have in place to be able to later alter the
> on-disk format without causing too much disruption? Having a strong
> design here will make changes easier to do and will lead to a better
> filesystem.
Yes, I recognize the importance of this. For example, I've carefully
converted both userland and kernel code so that they refer to ``size''
fields embedded in on-disk structures instead of calculating with
sizeof(). In addition, I took care to initialize reserved fields so as
not to break forward compatibility.
We arranged various size fields: the size of the super block, the size of
the segment header, and the size of on-disk items in metadata files, such
as the inode in the ifile, the checkpoint entry in the cpfile, the segment
usage entry in the sufile, and so forth. All those fields are for handling
possible future extensions.
Further, each log of nilfs2 is designed so that it can have additional
blocks at the tail. They may be used, for instance, to write
additional copies of the superblock.
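To illustrate the size-field approach described above, here is a minimal sketch, assuming a made-up record layout. `struct entry_v1` and `read_entry()` are hypothetical and do not correspond to actual nilfs2 structures; byte-order conversion, which a real on-disk reader needs, is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "version 1" record as this reader knows it. */
struct entry_v1 {
    uint64_t id;
    uint32_t flags;
    uint32_t reserved; /* written as zero so later versions can assign meaning */
};

/*
 * Copy the idx-th record out of a buffer whose records are rec_size bytes
 * apart. rec_size is taken from an on-disk size field, not from
 * sizeof(struct entry_v1), so a newer format with larger records is still
 * walked with the correct stride. Returns 0 on success, -1 on error.
 */
int read_entry(const unsigned char *buf, size_t buf_len, size_t rec_size,
               size_t idx, struct entry_v1 *out)
{
    assert(buf != NULL && out != NULL);
    if (rec_size < sizeof(*out))
        return -1; /* record too small to hold the fields we know */
    if (idx >= buf_len / rec_size)
        return -1; /* index past the last complete record */
    /* Read only the leading fields this version understands. */
    memcpy(out, buf + idx * rec_size, sizeof(*out));
    return 0;
}
```

An old reader built against the 16-byte layout can then walk a file whose records have grown to, say, 24 bytes, ignoring the trailing bytes it does not understand.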
> Also.. Don't get _too_ concerned about freezing the on-disk format at
> this time. You could put in a mount-time printk("the nilfs on-disk
> format may change at any time - do not place critical data on a nilfs
> filesystem") and we leave that in place for a few months while things
> stabilise.
Got it. I will add the message :)
> And yes, I was planning on sending nilfs in to Linus for 2.6.30 unless
> someone has decent-sounding reasons to hold it back.
Great!
Regards,
Ryusuke Konishi
On Wed, 11 Mar 2009 17:35:48 +0900 (JST), Ryusuke Konishi wrote:
> On Tue, 10 Mar 2009 15:54:59 -0700, Andrew Morton wrote:
> > Also.. Don't get _too_ concerned about freezing the on-disk format at
> > this time. You could put in a mount-time printk("the nilfs on-disk
> > format may change at any time - do not place critical data on a nilfs
> > filesystem") and we leave that in place for a few months while things
> > stabilise.
>
> Got it. So, I will do the message insertion :)
I've done this in the mount helper program of nilfs2.
Regards,
Ryusuke
2009/3/10 Ryusuke Konishi <[email protected]>:
>
> # mount -t nilfs2 -r -o cp=33339 /dev/sdb1 /snap
> (mount the snapshot, then it will become accessible on the mount point)
>
Is it possible to mount a snapshot rw to make it a branch?
Hi!
On Fri, 13 Mar 2009 09:05:53 +0100, Alex Riesen wrote:
> 2009/3/10 Ryusuke Konishi <[email protected]>:
> >
> > # mount -t nilfs2 -r -o cp=33339 /dev/sdb1 /snap
> > (mount the snapshot, then it will become accessible on the mount point)
>
> Is it possible to mount a snapshot rw to make it a branch?
No, writable snapshots are not supported.
(That is why a read-only option appears in the example.)
The current design may be unfit for efficient branching because
nilfs identifies checkpoints with linear numbers and manages the lifetime
of each on-disk block with a range of those numbers.
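The range-based lifetime tracking mentioned above might look like the following sketch. The structure, the function name and the half-open interval convention are assumptions made for illustration, not the actual nilfs2 on-disk definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lifetime record: the block belongs to checkpoints in [start, end). */
struct block_lifetime {
    uint64_t start; /* first checkpoint number that contains the block */
    uint64_t end;   /* first checkpoint number that no longer contains it */
};

/* Return true if the block is part of the checkpoint numbered cno. */
bool block_in_checkpoint(const struct block_lifetime *lt, uint64_t cno)
{
    assert(lt->start <= lt->end);
    return lt->start <= cno && cno < lt->end;
}
```

A single range like this works only while checkpoint numbers form one line of history. A writable branch forked from checkpoint 33339 would produce checkpoints that are not ordered relative to 33340 and later, so one [start, end) pair could no longer say which branch a block belongs to.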
We might be able to add a feature like replacing the latest filesystem
state with a past snapshot, but that has not been decided yet.
Cheers,
Ryusuke