2010-11-05 23:09:35

by Yehuda Sadeh Weinraub

Subject: [RFC] rbd sysfs interface

The rbd module that was recently merged into the Linux kernel in the
2.6.37-rc1 merge window is based on the osdblk device driver. Other
than being somewhat confusing (as in ceph we use the term osd too), it
has been a valuable resource and jump-started the development effort.
One thing that we inherited from osdblk is its sysfs class interface.
Generally, there is a single flat class control
directory that allows adding, removing and listing of devices. For rbd
we expanded this interface to include all snapshot operations too.
This, however, might not be completely suitable for rbd. First, there
might be many different devices, so having a single control for
all of them is cumbersome. The problem is exacerbated by rbd snapshots, as
there can be many snapshots of a single device, so using a single
control for all devices and snapshots doesn't scale.

Another point to consider is the integration with udev, so that
devices can be created and mapped automatically to /dev.
We'd like to replace the current rbd class interface, and if possible
do it asap, so that the current interface won't be set in stone
once 2.6.37 is out. At this point we want to do something like the
following:

Under /sys/class/rbd we'll keep the 'add' entry that adds rbd devices:

# echo "10.0.0.1 name=admin rbd myimage" > /sys/class/rbd/add

The devices that are created will still be enumerated, and there'll be a
subdirectory under rbd/ for each (actually a soft link to
/sys/devices/virtual/rbd/<id>). For each device we'll have multiple
read-only properties (name, pool, size, client_id, major, cur_snap,
snapshots) and a few control entries that'll allow controlling it
(e.g., remove, refresh, snap_create, snap_rollback).

We're not sure whether the available snapshots should go under the device
(e.g., rbd/<id>/snaps/...) or just be kept in a single 'snapshots'
entry.
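
A minimal sketch of how userspace might read the proposed per-device
properties, assuming each one ends up as a plain one-value file under
rbd/<id>/. Nothing here exists yet: the directory is mocked under a temp
dir, and make_mock_device, show_device and all the values are
hypothetical names and placeholders used only for illustration.

```shell
#!/bin/sh
# Sketch only: attribute names are taken from the proposal above; the
# layout is built in a temp dir because the real /sys/class/rbd/<id>/
# entries are not implemented at this point.

# make_mock_device DIR: populate DIR with placeholder attribute files.
make_mock_device() {
    mkdir -p "$1"
    printf '%s\n' myimage > "$1/name"
    printf '%s\n' rbd > "$1/pool"
    printf '%s\n' 1073741824 > "$1/size"
    printf '%s\n' snap1 > "$1/cur_snap"
}

# show_device DIR: print "attr=value" for each read-only property that
# is present and readable; missing ones are silently skipped.
show_device() {
    for attr in name pool size client_id major cur_snap; do
        if [ -r "$1/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$1/$attr")"
        fi
    done
}

mock=$(mktemp -d)
make_mock_device "$mock/0"
show_device "$mock/0"
rm -rf "$mock"
```

With one value per file, a udev rule or a script can pick out a single
property without parsing a combined listing.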

Does this seem sane? Any comments would be greatly appreciated.

Yehuda


2010-11-06 05:07:06

by Greg KH

Subject: Re: [RFC] rbd sysfs interface

On Fri, Nov 05, 2010 at 04:09:31PM -0700, Yehuda Sadeh Weinraub wrote:
> The rbd module that was recently merged into the linux kernel on the
> 2.6.37-rc1 merge window is based on the osdblk device driver. Other
> than being somewhat confusing (as in ceph we use the term osd too) it
> had been a valuable resource and jump started the development effort.
> One thing that we inherited from osdblk is the sysfs class interface
> that was used there. Generally, there is a single flat class control
> directory that allows adding, removing and listing of devices. For rbd
> we expanded this interface to include all snapshots operations too.
> This, however, might not be completely suitable to rbd. First, there
> might be many different devices, so that having a single control for
> all is cumbersome. The problem is exacerbated with rbd snapshots, as
> there can be many snapshots to a single device, so using a single
> control for all devices and snapshots doesn't scale.
>
> Another point to consider is the integration with udev, so that
> devices can be created and mapped automatically to /dev.
> We'd like to replace the current rbd class interface, and if possible
> to make it asap, so that the current interfaces won't be set in stone
> once 2.6.37 is out. At this point we wanted to do something like the
> following:
>
> Under /sys/class/rbd we'll keep the 'add' entry that adds rbd devices:
>
> # echo "10.0.0.1 name=admin rbd myimage" > /sys/class/rbd/add
>
> The devices that are created will still be enumerated, and there'll be a
> subdirectory under rbd/ for each (actually a soft link to
> /sys/devices/virtual/rbd/<id>). For each device we'll have multiple
> read-only properties (name, pool, size, client_id, major, cur_snap,
> snapshots) and a few control entries that'll allow controlling it
> (e.g., remove, refresh, snap_create, snap_rollback).
>
> We're not sure whether the available snapshots go under the device
> (e.g., rbd/<id>/snaps/...) or just keep it on a single 'snapshots'
> entry.
>
> Does this seem sane? Any comments would be greatly appreciated.

It sounds like you need to use configfs instead of sysfs, as your model
was the reason it was created.

Have you tried that?

thanks,

greg k-h

2010-11-06 05:51:30

by Yehuda Sadeh Weinraub

Subject: Re: [RFC] rbd sysfs interface

On Fri, Nov 5, 2010 at 10:07 PM, Greg KH <[email protected]> wrote:
> On Fri, Nov 05, 2010 at 04:09:31PM -0700, Yehuda Sadeh Weinraub wrote:
>>
>> Does this seem sane? Any comments would be greatly appreciated.
>
> It sounds like you need to use configfs instead of sysfs, as your model
> was the reason it was created.
>
> Have you tried that?

Oh, will look at it now. With ceph (although for a different purpose)
we went through proc -> sysfs -> debugfs, however, it seems that we've
missed at least one userspace-kernel channel.

Thanks,
Yehuda

2010-11-10 19:21:53

by Yehuda Sadeh Weinraub

Subject: Re: [RFC] rbd sysfs interface

On Fri, Nov 5, 2010 at 10:51 PM, Yehuda Sadeh Weinraub
<[email protected]> wrote:
> On Fri, Nov 5, 2010 at 10:07 PM, Greg KH <[email protected]> wrote:
>> On Fri, Nov 05, 2010 at 04:09:31PM -0700, Yehuda Sadeh Weinraub wrote:
>>>
>>> Does this seem sane? Any comments would be greatly appreciated.
>>
>> It sounds like you need to use configfs instead of sysfs, as your model
>> was the reason it was created.
>>
>> Have you tried that?
>
> Oh, will look at it now. With ceph (although for a different purpose)
> we went through proc -> sysfs -> debugfs, however, it seems that we've
> missed at least one userspace-kernel channel.
>

Well, we looked a bit at what configfs does, and from what we see it
doesn't really fit our needs. Configfs is more suited to
configuring a static system than to controlling a dynamic one. The main
problem is that item creation is driven only by userspace. That would
be ok if we had a static mapping of the images and snapshots; however,
we don't. We need the system to reflect any state change in the
running configuration (e.g., a new snapshot was created by a different
client), and that doesn't seem possible with configfs as long as item
creation is driven only by userspace operations. We need a system that
can reflect changes caused by some external operation, and configfs
doesn't seem to be one.

There is a second issue: committable items are not implemented
there yet, so the interface itself would be a bit weird. E.g., had
committable items been implemented, we would have done something like
the following:

/config/rbd# mkdir pending/myimage
/config/rbd# echo foo > pending/myimage/name
/config/rbd# cat ~/mykey > pending/myimage/key
/config/rbd# echo 10.0.0.1 > pending/myimage/addr
...
/config/rbd# mv pending/myimage live/

and that would do what we need in terms of initial configuration.
However, as this is not really implemented yet, there is no
distinction between images that are pending and images that are live,
so configuration would look something like:

/config/rbd# mkdir myimage
/config/rbd# echo foo > myimage/name
/config/rbd# cat ~/mykey > myimage/key
/config/rbd# echo 10.0.0.1 > myimage/addr
...
/config/rbd# echo 1 > myimage/go

And with that, the myimage/ directory will still hold all those
config options that are moot once the image has gone live. It doesn't
seem to offer a significant improvement over the current sysfs
one-liner configuration, and with sysfs we can have it reflect any dynamic
change that occurs within the system. So we tend to opt for an
improved sysfs solution, similar to the one I described before.

Any thoughts? Am I completely off the tracks?

Thanks,
Yehuda

2010-11-11 01:07:55

by Greg KH

Subject: Re: [RFC] rbd sysfs interface

On Wed, Nov 10, 2010 at 11:21:49AM -0800, Yehuda Sadeh Weinraub wrote:
> On Fri, Nov 5, 2010 at 10:51 PM, Yehuda Sadeh Weinraub
> <[email protected]> wrote:
> > On Fri, Nov 5, 2010 at 10:07 PM, Greg KH <[email protected]> wrote:
> >> On Fri, Nov 05, 2010 at 04:09:31PM -0700, Yehuda Sadeh Weinraub wrote:
> >>>
> >>> Does this seem sane? Any comments would be greatly appreciated.
> >>
> >> It sounds like you need to use configfs instead of sysfs, as your model
> >> was the reason it was created.
> >>
> >> Have you tried that?
> >
> > Oh, will look at it now. With ceph (although for a different purpose)
> > we went through proc -> sysfs -> debugfs, however, it seems that we've
> > missed at least one userspace-kernel channel.
> >
>
> Well, we looked a bit at what configfs does, and from what we see it
> doesn't really fit our needs. Configfs would be more suitable to
> configuring a static system than to control a dynamic one. The main
> problem is that items creation is only driven by userspace. That would
> be ok if we had a static mapping of the images and snapshots, however,
> we don't. We need the system to reflect any state change with the
> running configuration (e.g., a new snapshot was created by a different
> client), and it doesn't seem possible with configfs as long as items
> creation is only driven by userspace operations. We need a system that
> would be able to reflect changes that happened due to some external
> operation, and this doesn't seem to be the case here.
>
> There is second issue and that's committable items are not implemented
> there yet. So the interface itself would be a bit weird. E.g., had
> committable items been implemented we would have done something like
> the following:
>
> /config/rbd# mkdir pending/myimage
> /config/rbd# echo foo > pending/myimage/name
> /config/rbd# cat ~/mykey > pending/myimage/key
> /config/rbd# echo 10.0.0.1 > pending/myimage/addr
> ...
> /config/rbd# mv pending/myimage live/
>
> and that would do what we need in terms of initial configuration.
> However, as this is not really implemented yet, there is no
> distinction between images that are pending and images that are live,
> so configuration would look something like:
> /config/rbd# mkdir myimage
> /config/rbd# echo foo > myimage/name
> /config/rbd# cat ~/mykey > myimage/key
> /config/rbd# echo 10.0.0.1 > myimage/addr
> ...
> /config/rbd# echo 1 > myimage/go
>
> And having that, the myimage/ directory will still hold all those
> config options that are moot after the image went live. It doesn't
> seem to offer a significant improvement over the current sysfs one
> liner configuration and with sysfs we can have it reflect any dynamic
> change that occurred within the system. So we tend to opt for an
> improved sysfs solution, similar to the one I described before.

Ok, that makes sense as to why configfs would not work (I really wish
someone would add the commit stuff to configfs, as you aren't the first
ones to want that.)

So, back to sysfs. But I can't recall what your sysfs interface looked
like, do you have Documentation/ABI/ files that show what it does? If
not, you are required to, so you might as well write them now :)

thanks,

greg k-h

2010-11-11 05:16:51

by Yehuda Sadeh Weinraub

Subject: Re: [RFC] rbd sysfs interface

On Wed, Nov 10, 2010 at 5:08 PM, Greg KH <[email protected]> wrote:
> On Wed, Nov 10, 2010 at 11:21:49AM -0800, Yehuda Sadeh Weinraub wrote:
>> On Fri, Nov 5, 2010 at 10:51 PM, Yehuda Sadeh Weinraub
>> <[email protected]> wrote:
>> > On Fri, Nov 5, 2010 at 10:07 PM, Greg KH <[email protected]> wrote:
>> >> On Fri, Nov 05, 2010 at 04:09:31PM -0700, Yehuda Sadeh Weinraub wrote:
>> >>>
>> >>> Does this seem sane? Any comments would be greatly appreciated.
>> >>
>> >> It sounds like you need to use configfs instead of sysfs, as your model
>> >> was the reason it was created.
>> >>
>> >> Have you tried that?
>> >
>> > Oh, will look at it now. With ceph (although for a different purpose)
>> > we went through proc -> sysfs -> debugfs, however, it seems that we've
>> > missed at least one userspace-kernel channel.
>> >
>>
>> Well, we looked a bit at what configfs does, and from what we see it
>> doesn't really fit our needs. Configfs would be more suitable to
>> configuring a static system than to control a dynamic one. The main
>> problem is that items creation is only driven by userspace. That would
>> be ok if we had a static mapping of the images and snapshots, however,
>> we don't. We need the system to reflect any state change with the
>> running configuration (e.g., a new snapshot was created by a different
>> client), and it doesn't seem possible with configfs as long as items
>> creation is only driven by userspace operations. We need a system that
>> would be able to reflect changes that happened due to some external
>> operation, and this doesn't seem to be the case here.
>>
>> There is second issue and that's committable items are not implemented
>> there yet. So the interface itself would be a bit weird. E.g., had
>> committable items been implemented we would have done something like
>> the following:
>>
>> /config/rbd# mkdir pending/myimage
>> /config/rbd# echo foo > pending/myimage/name
>> /config/rbd# cat ~/mykey > pending/myimage/key
>> /config/rbd# echo 10.0.0.1 > pending/myimage/addr
>> ...
>> /config/rbd# mv pending/myimage live/
>>
>> and that would do what we need in terms of initial configuration.
>> However, as this is not really implemented yet, there is no
>> distinction between images that are pending and images that are live,
>> so configuration would look something like:
>> /config/rbd# mkdir myimage
>> /config/rbd# echo foo > myimage/name
>> /config/rbd# cat ~/mykey > myimage/key
>> /config/rbd# echo 10.0.0.1 > myimage/addr
>> ...
>> /config/rbd# echo 1 > myimage/go
>>
>> And having that, the myimage/ directory will still hold all those
>> config options that are moot after the image went live. It doesn't
>> seem to offer a significant improvement over the current sysfs one
>> liner configuration and with sysfs we can have it reflect any dynamic
>> change that occurred within the system. So we tend to opt for an
>> improved sysfs solution, similar to the one I described before.
>
> Ok, that makes sense as to why configfs would not work (I really wish
> someone would add the commit stuff to configfs, as you aren't the first
> ones to want that.)
>
> So, back to sysfs. But I can't recall what your sysfs interface looked
> like, do you have Documentation/ABI/ files that show what it does? If
> not, you are required to, so you might as well write them now :)
>

The original sysfs interface is described in the comments at the top
of rbd.c, which we can copy to Documentation/ABI without much pain.
However, we were just thinking of modifying it a bit, as described
previously in my first email. The hierarchy will look like this:

rbd/
   add
   remove
   <id>/
     name
     pool
     size
     ..
     snap_add
     snap_remove
     snap_rollback
     <snap_name>/
       size

The 'add' entry will be used to add a device (as before):

# echo "10.0.0.1 name=admin rbd myimage" > /sys/class/rbd/add

The devices that are created will still be enumerated, and there'll be a
subdirectory under rbd/ for each (actually a soft link to
/sys/devices/virtual/rbd/<id>). For each device we'll have multiple
read-only properties (name, pool, size, client_id, major, cur_snap)
and a few control entries (e.g., snap_add, snap_remove, etc.)

There will be a subdirectory per snapshot under each device, and all
of the snapshot's properties will be kept there.
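
A rough sketch of how a script could walk this hierarchy, assuming
snapshots show up as subdirectories of rbd/<id>/ that each contain a
'size' file. The tree is mocked in a temp dir since the interface is
still only a proposal; make_mock_tree, list_snaps and the sizes are
hypothetical.

```shell
#!/bin/sh
# Sketch only: mirrors the proposed rbd/<id>/<snap_name>/size layout in
# a temp dir; names and sizes are placeholders.

# make_mock_tree ROOT: one device (id 0) with two snapshots.
make_mock_tree() {
    mkdir -p "$1/0/snap1" "$1/0/snap2"
    printf '%s\n' myimage > "$1/0/name"
    printf '%s\n' 1048576 > "$1/0/snap1/size"
    printf '%s\n' 2097152 > "$1/0/snap2/size"
}

# list_snaps DEVDIR: print "<snap_name> <size>" per snapshot directory.
# Assumption: any subdirectory holding a 'size' file is a snapshot.
list_snaps() {
    for d in "$1"/*/; do
        [ -r "$d/size" ] || continue
        printf '%s %s\n' "$(basename "$d")" "$(cat "$d/size")"
    done
}

mock=$(mktemp -d)
make_mock_tree "$mock"
for dev in "$mock"/*/; do
    printf 'device %s (%s)\n' "$(basename "$dev")" "$(cat "$dev/name")"
    list_snaps "$dev" | sed 's/^/  /'
done
rm -rf "$mock"
```

The 'size' check matters because a real sysfs device directory usually
also contains entries such as power/ and a subsystem link, so a walker
cannot treat every subdirectory as a snapshot.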

Thanks,
Yehuda

2010-11-12 17:49:50

by Yehuda Sadeh Weinraub

Subject: Re: [RFC] rbd sysfs interface

On Wed, Nov 10, 2010 at 9:16 PM, Yehuda Sadeh Weinraub
<[email protected]> wrote:
> On Wed, Nov 10, 2010 at 5:08 PM, Greg KH <[email protected]> wrote:
>>
>> So, back to sysfs. But I can't recall what your sysfs interface looked
>> like, do you have Documentation/ABI/ files that show what it does? If
>> not, you are required to, so you might as well write them now :)
>>
>
> The original sysfs interface is described in the rbd.c prefix
> comments, which we can copy to Documentation/ABI without much pain.
> However, we were just thinking of modifying it a bit, as described
> previously in my first email. The hierarchy will look like this:
>
> rbd/
>    add
>    remove
>    <id>/
>      name
>      pool
>      size
>      ..
>      snap_add
>      snap_remove
>      snap_rollback
>      <snap_name>/
>        size
>
> The 'add' entry will be used to add a device (as before):
>
> # echo "10.0.0.1 name=admin rbd myimage" > /sys/class/rbd/add
>
> The devices that are created will still be enumerated, and there'll be a
> subdirectory under rbd/ for each (actually a soft link to
> /sys/devices/virtual/rbd/<id>). For each device we'll have multiple
> read-only properties (name, pool, size, client_id, major, cur_snap)
> and a few control entries (e.g., snap_add, snap_remove, etc.)
>
> There will be a subdirectory per snapshot under each device, and all
> the snapshots properties will be kept there.
>

Unless I hear otherwise, I'm going to assume that this proposed
interface is an improvement over the existing osdblk-based one and get
this upstream asap.

Thanks,
Yehuda