Here's the header file for the the proposed new ioctl interface for
dm. We've tried to change as little as possible to minimise code
changes in LVM2 and EVMS.
The changes address the following problems:
o) Alignment mistakes in the ioctl structures
o) Provide a way for the userland tools to back out of a series of
table updates without having to reload the orginal tables (remember
that an LV is often composed of a stack of 2 or more dm devices).
o) Pre-loading of a new table may now occur without suspending the
device. (Table loading can potentially allocate a lot of memory,
eg. mirror buffers).
o) Minimise the amount of time that a device is suspended. eg. if a
simple linear mapping is being extended, there is no longer a need to
call the suspend ioctl in between the table load and the resume.
Instead the resume (which swaps in the new table), will do the right
thing.
- Joe
On Thursday 05 June 2003 04:39, Joe Thornber wrote:
> Here's the header file for the the proposed new ioctl interface for
> dm. We've tried to change as little as possible to minimise code
> changes in LVM2 and EVMS.
>
> The changes address the following problems:
>
> o) Alignment mistakes in the ioctl structures
>
> o) Provide a way for the userland tools to back out of a series of
> table updates without having to reload the orginal tables (remember
> that an LV is often composed of a stack of 2 or more dm devices).
This should be very helpful for error recovery when setting up snapshots.
> o) Pre-loading of a new table may now occur without suspending the
> device. (Table loading can potentially allocate a lot of memory,
> eg. mirror buffers).
Cool.
> o) Minimise the amount of time that a device is suspended. eg. if a
> simple linear mapping is being extended, there is no longer a need to
> call the suspend ioctl in between the table load and the resume.
> Instead the resume (which swaps in the new table), will do the right
> thing.
Also very nice. This ought to elimate the possibility of deadlock due to
suspending the device from user-space. I really like this solution!
I'll bring up a couple of the points Alasdair and I were discussing on irc
yesterday:
1) Snapshots. Currently, the snapshot module, when it constructs a new table,
reads the header and existing exception tables from disk to determine the
initial state of the snapshot. With this new scheme, this setup really
shouldn't happen until the device is resumed (if it is done when the
"inactive" table is created, an existing "active" table could change the
on-disk information before the tables are switched). This kind of implies a
new entry-point into the target module will be required.
2) Removing suspended devices. The current code (2.5.70) does not allow a
suspended device to be removed/unlinked from the ioctl interface, since
removing it would leave you with no way to resume it (and hence flush any
pending I/Os). Alasdair mentioned a couple of new ideas. One would be to
reload the device with an error-map and force it to resume, thus erroring any
pending I/Os and allowing the device to be removed. This seems a bit
heavy-handed. Another idea was to allow removing suspended-devices which
aren't open. But that seems a bit racy. The current code doesn't seem to
coordinate removal of a device from the interface with opening of the device.
So you could potentially be opening the device at the same time you are
removing it from the interface. Personally, I think the current behavior is
fine. User-space can use the new list-devices command along with the
dev-status command to determine if a device should be resumed before trying
to remove it.
Any thoughts?
--
Kevin Corry
[email protected]
http://evms.sourceforge.net/
On Thursday 05 June 2003 18:47, Kevin Corry wrote:
> 2) Removing suspended devices. The current code (2.5.70) does not allow a
> suspended device to be removed/unlinked from the ioctl interface, since
> removing it would leave you with no way to resume it (and hence flush any
> pending I/Os). Alasdair mentioned a couple of new ideas. One would be to
> reload the device with an error-map and force it to resume, thus erroring
> any pending I/Os and allowing the device to be removed. This seems a bit
> heavy-handed.
Which is the heavy-handed part?
Regards,
Daniel
On Thursday 05 June 2003 12:00, Daniel Phillips wrote:
> On Thursday 05 June 2003 18:47, Kevin Corry wrote:
> > 2) Removing suspended devices. The current code (2.5.70) does not allow a
> > suspended device to be removed/unlinked from the ioctl interface, since
> > removing it would leave you with no way to resume it (and hence flush any
> > pending I/Os). Alasdair mentioned a couple of new ideas. One would be to
> > reload the device with an error-map and force it to resume, thus erroring
> > any pending I/Os and allowing the device to be removed. This seems a bit
> > heavy-handed.
>
> Which is the heavy-handed part?
The part about automatically reloading the table with an error map and forcing
it to resume. It just seemed to me that user-space ought to be able to gather
enough information to determine that a device needed to be resumed before it
could be removed. Thus the kernel driver wouldn't be forced to implement such
a policy.
Talking with Alasadair again, he mentioned a case I hadn't considered. Devices
would now be created without a mapping and initially suspended. If some other
error occurred, and you decided to just delete the device before loading a
mapping, it would fail. And having to resume a device with no mapping just
to be able to delete it definitely seems odd.
So, it's not like I'm dead-set against this idea. I was just curious what the
reasoning was behind this change.
--
Kevin Corry
[email protected]
http://evms.sourceforge.net/
On Thursday 05 June 2003 19:50, Kevin Corry wrote:
> On Thursday 05 June 2003 12:00, Daniel Phillips wrote:
> > On Thursday 05 June 2003 18:47, Kevin Corry wrote:
> > > 2) Removing suspended devices. The current code (2.5.70) does not allow
> > > a suspended device to be removed/unlinked from the ioctl interface,
> > > since removing it would leave you with no way to resume it (and hence
> > > flush any pending I/Os). Alasdair mentioned a couple of new ideas. One
> > > would be to reload the device with an error-map and force it to resume,
> > > thus erroring any pending I/Os and allowing the device to be removed.
> > > This seems a bit heavy-handed.
> >
> > Which is the heavy-handed part?
>
> The part about automatically reloading the table with an error map and
> forcing it to resume. It just seemed to me that user-space ought to be able
> to gather enough information to determine that a device needed to be
> resumed before it could be removed. Thus the kernel driver wouldn't be
> forced to implement such a policy.
I didn't see anything about doing that in-kernel.
> Talking with Alasadair again, he mentioned a case I hadn't considered.
> Devices would now be created without a mapping and initially suspended. If
> some other error occurred, and you decided to just delete the device before
> loading a mapping, it would fail. And having to resume a device with no
> mapping just to be able to delete it definitely seems odd.
>
> So, it's not like I'm dead-set against this idea. I was just curious what
> the reasoning was behind this change.
It's similar to the way a lot of things work in Linux: you have to let
operations run to completion so they can let go of resources. One day we'll
be able to shoot down transfers in mid-flight, but I doubt that's going to
happen in this cycle.
So in general, the idea is: let any outstanding operations complete, but feed
them errors. What else can we do?
I don't see this as heavyweight at all. Policy stays in user space, and a
lightweight error path lives in the kernel.
Regards,
Daniel
On Thu, Jun 05, 2003 at 11:47:10AM -0500, Kevin Corry wrote:
> 1) Snapshots. Currently, the snapshot module, when it constructs a new table,
> reads the header and existing exception tables from disk to determine the
> initial state of the snapshot. With this new scheme, this setup really
> shouldn't happen until the device is resumed (if it is done when the
> "inactive" table is created, an existing "active" table could change the
> on-disk information before the tables are switched). This kind of implies a
> new entry-point into the target module will be required.
See the suspend and resume target methods in my recent dev tree.
We'll have to delay the metadata reading for both the snapshots and
mirror to the first 'resume'.
> 2) Removing suspended devices. The current code (2.5.70) does not allow a
> suspended device to be removed/unlinked from the ioctl interface, since
> removing it would leave you with no way to resume it (and hence flush any
> pending I/Os).
I think removing a device that has deferred io against it should not
be possible, since it can only be in that state if the device is open.
We shouldn't start ripping devices out from under people.
The one place where we do want to do this is for the DM_REMOVE_ALL
ioctl cmd. Which is really an emergency panic button. I'll just
error any deferred io in this case.
- Joe
On Thursday 05 June 2003 14:41, Joe Thornber wrote:
> On Thu, Jun 05, 2003 at 11:47:10AM -0500, Kevin Corry wrote:
> > 1) Snapshots. Currently, the snapshot module, when it constructs a new
> > table, reads the header and existing exception tables from disk to
> > determine the initial state of the snapshot. With this new scheme, this
> > setup really shouldn't happen until the device is resumed (if it is done
> > when the "inactive" table is created, an existing "active" table could
> > change the on-disk information before the tables are switched). This kind
> > of implies a new entry-point into the target module will be required.
>
> See the suspend and resume target methods in my recent dev tree.
> We'll have to delay the metadata reading for both the snapshots and
> mirror to the first 'resume'.
Where is your dev tree located? I've checked your website on
people.sistina.com and the various Sistina CVS trees, but I can't really find
anything (regarding suspend and resume target methods) that's very recent. Do
you have another ftp server somewhere?
> > 2) Removing suspended devices. The current code (2.5.70) does not allow a
> > suspended device to be removed/unlinked from the ioctl interface, since
> > removing it would leave you with no way to resume it (and hence flush any
> > pending I/Os).
>
> I think removing a device that has deferred io against it should not
> be possible, since it can only be in that state if the device is open.
> We shouldn't start ripping devices out from under people.
Right.
So are you saying it would be alright to remove a suspended device that has no
pending I/O or isn't open? If so, the current code (in 2.5.70) doesn't seem
to coordinate the removal of such a device with another process trying to
open it or submit new I/O. Some new locking of the device would be necessary
to prevent a device which is being removed from being opened at the same
time.
Sorry if it sounds like I'm harping on this issue - I don't mean to. :) Just
interested in the details behind some of the proposed changes and some of the
affects the changes might have. It will probably be much easier to just wait
to see your new code, which will definitively answer these questions.
> The one place where we do want to do this is for the DM_REMOVE_ALL
> ioctl cmd. Which is really an emergency panic button. I'll just
> error any deferred io in this case.
Ok, this seems reasonable.
--
Kevin Corry
[email protected]
http://evms.sourceforge.net/
On Thu, Jun 05, 2003 at 10:39:43AM +0100, Joe Thornber wrote:
> Here's the header file for the the proposed new ioctl interface for
> dm. We've tried to change as little as possible to minimise code
> changes in LVM2 and EVMS.
Minor comment:
- please do not use uint_32t types in kernel header files. Use
the proper __u32 type which is guarenteed to be the proper
size across the user/kernel boundry.
thanks,
greg k-h
On Fri, Jun 06, 2003 at 10:17:00AM -0700, Greg KH wrote:
> Minor comment:
> - please do not use uint_32t types in kernel header files. Use
> the proper __u32 type which is guarenteed to be the proper
> size across the user/kernel boundry.
So is uint32_t:
linux/types.h: typedef __u32 uint32_t;
I think this comment is about style conformity i.e. most linux kernel
developers avoid these (not-so-)new and more-portable C99 types.
Alasdair
--
[email protected]
Refs. http://www.xml.com/ldd/chapter/book/ch10.html
http://www.unix-systems.org/whitepapers/64bit.html
On Friday 06 June 2003 19:17, Greg KH wrote:
> On Thu, Jun 05, 2003 at 10:39:43AM +0100, Joe Thornber wrote:
> > Here's the header file for the the proposed new ioctl interface for
> > dm. We've tried to change as little as possible to minimise code
> > changes in LVM2 and EVMS.
>
> Minor comment:
> - please do not use uint_32t types in kernel header files. Use
> the proper __u32 type which is guarenteed to be the proper
> size across the user/kernel boundry.
We even have a flavor without the __'s:
http://lxr.linux.no/source/include/asm-i386/types.h?v=2.5.56#L47
A little easier on the eyes, imho.
Regards,
Daniel
On Mon, Jun 09, 2003 at 10:03:50PM +0200, Daniel Phillips wrote:
> On Friday 06 June 2003 19:17, Greg KH wrote:
> > On Thu, Jun 05, 2003 at 10:39:43AM +0100, Joe Thornber wrote:
> > > Here's the header file for the the proposed new ioctl interface for
> > > dm. We've tried to change as little as possible to minimise code
> > > changes in LVM2 and EVMS.
> >
> > Minor comment:
> > - please do not use uint_32t types in kernel header files. Use
> > the proper __u32 type which is guarenteed to be the proper
> > size across the user/kernel boundry.
>
> We even have a flavor without the __'s:
>
> http://lxr.linux.no/source/include/asm-i386/types.h?v=2.5.56#L47
>
> A little easier on the eyes, imho.
But they are not the same.
- "__" means this variable will be the same size across the
kernel/userspace boundry.
- without the "__" means this variable will only be this size within
the kernel.
Use "__" for variables being seen by userspace. Otherwise use the
types without "__".
Hope this helps,
greg k-h
On Monday 09 June 2003 22:39, Greg KH wrote:
> On Mon, Jun 09, 2003 at 10:03:50PM +0200, Daniel Phillips wrote:
> > On Friday 06 June 2003 19:17, Greg KH wrote:
> > > On Thu, Jun 05, 2003 at 10:39:43AM +0100, Joe Thornber wrote:
> > > > Here's the header file for the the proposed new ioctl interface for
> > > > dm. We've tried to change as little as possible to minimise code
> > > > changes in LVM2 and EVMS.
> > >
> > > Minor comment:
> > > - please do not use uint_32t types in kernel header files. Use
> > > the proper __u32 type which is guarenteed to be the proper
> > > size across the user/kernel boundry.
> >
> > We even have a flavor without the __'s:
> >
> > http://lxr.linux.no/source/include/asm-i386/types.h?v=2.5.56#L47
> >
> > A little easier on the eyes, imho.
>
> But they are not the same.
> - "__" means this variable will be the same size across the
> kernel/userspace boundry.
> - without the "__" means this variable will only be this size within
> the kernel.
>
> Use "__" for variables being seen by userspace. Otherwise use the
> types without "__".
Indeed, and (to restate the obvious) ioctl interfaces will need the __ form.
On the other hand, a quick tour through the current tree shows the __ form
being used in a lot more places than it's strictly needed. Although there's
no entry in the style guide for this, perhaps we should consider one.
Regards,
Daniel
On Friday 06 June 2003 19:17, Greg KH wrote:
> On Thu, Jun 05, 2003 at 10:39:43AM +0100, Joe Thornber wrote:
> > Here's the header file for the the proposed new ioctl interface for
> > dm. We've tried to change as little as possible to minimise code
> > changes in LVM2 and EVMS.
>
> Minor comment:
> - please do not use uint_32t types in kernel header files. Use
> the proper __u32 type which is guarenteed to be the proper
> size across the user/kernel boundry.
I'm not sure what we were both smoking, but obviously all flavors of u32
should be the same size regardless of where they are located. As I
understand it, the only interesting difference between __u32 and u32 is that
the former is standard C while the latter is Linux's (sensible) local
dialect.
Regards,
Daniel