On Sun, Apr 30, 2006 at 01:18:46AM -0400, Kyle Moffett wrote:
> On Apr 29, 2006, at 17:55:01, Greg KH wrote:
> >relayfs is for that. You can now put relayfs files in any ram
> >based file system (procfs, ramfs, sysfs, debugfs, etc.)
>
> But you can't twiddle relayfs with echo and cat; it's more suited to
> high-bandwidth transfers than anything else, no?
Yes.
On May 1, 2006, at 16:38:15, Greg KH wrote:
> On Sun, Apr 30, 2006 at 01:18:46AM -0400, Kyle Moffett wrote:
>> On Apr 29, 2006, at 17:55:01, Greg KH wrote:
>>> relayfs is for that. You can now put relayfs files in any ram
>>> based file system (procfs, ramfs, sysfs, debugfs, etc.)
>>
>> But you can't twiddle relayfs with echo and cat; it's more suited
>> to high-bandwidth transfers than anything else, no?
>
> Yes.
So my question stands: What is the _recommended_ way to handle
simple data types in low-bandwidth/frequency multiple-valued
transactions to hardware? Examples include reading/modifying
framebuffer settings (currently done through ioctls), s390 current
state (up for discussion), etc. In these cases there needs to be an
atomic snapshot or write of multiple values at the same time. Given
the situation it would be _nice_ to use sysfs so the admin can do it
by hand; makes things shell scriptable and reduces the number of
binary compatibility issues.
Cheers,
Kyle Moffett
On Mon, May 01, 2006 at 07:29:23PM -0400, Kyle Moffett wrote:
> On May 1, 2006, at 16:38:15, Greg KH wrote:
> >On Sun, Apr 30, 2006 at 01:18:46AM -0400, Kyle Moffett wrote:
> >>On Apr 29, 2006, at 17:55:01, Greg KH wrote:
> >>>relayfs is for that. You can now put relayfs files in any ram
> >>>based file system (procfs, ramfs, sysfs, debugfs, etc.)
> >>
> >>But you can't twiddle relayfs with echo and cat; it's more suited
> >>to high-bandwidth transfers than anything else, no?
> >
> >Yes.
>
> So my question stands: What is the _recommended_ way to handle
> simple data types in low-bandwidth/frequency multiple-valued
> transactions to hardware? Examples include reading/modifying
> framebuffer settings (currently done through IOCTLS), s390 current
> state (up for discussion), etc. In these cases there needs to be an
> atomic snapshot or write of multiple values at the same time. Given
> the situation it would be _nice_ to use sysfs so the admin can do it
> by hand; makes things shell scriptable and reduces the number of
> binary compatibility issues.
I really don't know of a way to use sysfs for this currently, and hence,
am not complaining too much about the different /proc files that have
this kind of information in them at the moment.
If you or someone else wants to come up with some kind of solution for
it, I'm sure that many people would be very happy to see it.
thanks,
greg k-h
On Mon, May 01, 2006 at 09:00:53PM -0700, Greg KH wrote:
> On Mon, May 01, 2006 at 07:29:23PM -0400, Kyle Moffett wrote:
> > On May 1, 2006, at 16:38:15, Greg KH wrote:
> > >On Sun, Apr 30, 2006 at 01:18:46AM -0400, Kyle Moffett wrote:
> > >>On Apr 29, 2006, at 17:55:01, Greg KH wrote:
> > >>>relayfs is for that. You can now put relayfs files in any ram
> > >>>based file system (procfs, ramfs, sysfs, debugfs, etc.)
> > >>
> > >>But you can't twiddle relayfs with echo and cat; it's more suited
> > >>to high-bandwidth transfers than anything else, no?
> > >
> > >Yes.
> >
> > So my question stands: What is the _recommended_ way to handle
> > simple data types in low-bandwidth/frequency multiple-valued
> > transactions to hardware? Examples include reading/modifying
> > framebuffer settings (currently done through IOCTLS), s390 current
> > state (up for discussion), etc. In these cases there needs to be an
> > atomic snapshot or write of multiple values at the same time. Given
> > the situation it would be _nice_ to use sysfs so the admin can do it
> > by hand; makes things shell scriptable and reduces the number of
> > binary compatibility issues.
>
> I really don't know of a way to use sysfs for this currently, and hence,
> am not complaining too much about the different /proc files that have
> this kind of information in it at the moment.
>
> If you or someone else wants to come up with some kind of solution for
> it, I'm sure that many people would be very happy to see it.
If the count of values handled in a transaction is not too high and it
makes sense to group these values logically, why not just create an
attribute group for every transaction, which creates dummy attributes
to fill the values in, and use an "action" file in that group, that
commits all the values at once to whatever target? That should fit into
the ioctl use pattern, right?
Kay
On Tue, May 02, 2006 at 07:23:41AM +0200, Kay Sievers wrote:
> If the count of values handled in a transaction is not to high and it
> makes sense to group these values logically, why not just create an
> attribute group for every transaction, which creates dummy attributes
> to fill the values in, and use an "action" file in that group, that
> commits all the values at once to whatever target? That should fit into
> the ioctl use pattern, right?
That's something configfs can handle more easily. I think the issue is
getting stuff from the kernel in one atomic snapshot (all the different
file values from the same point in time).
thanks,
greg k-h
On May 2, 2006, at 00:00:53, Greg KH wrote:
> On Mon, May 01, 2006 at 07:29:23PM -0400, Kyle Moffett wrote:
>> So my question stands: What is the _recommended_ way to handle
>> simple data types in low-bandwidth/frequency multiple-valued
>> transactions to hardware? Examples include reading/modifying
>> framebuffer settings (currently done through IOCTLS), s390 current
>> state (up for discussion), etc. In these cases there needs to be
>> an atomic snapshot or write of multiple values at the same time.
>> Given the situation it would be _nice_ to use sysfs so the admin
>> can do it by hand; makes things shell scriptable and reduces the
>> number of binary compatibility issues.
>
> I really don't know of a way to use sysfs for this currently, and
> hence, am not complaining too much about the different /proc files
> that have this kind of information in it at the moment.
>
> If you or someone else wants to come up with some kind of solution
> for it, I'm sure that many people would be very happy to see it.
Hmm, ok; I'll see what I can come up with. Would anybody object to
this kind of API (as in my previous email) that uses an open fd as a
transaction "handle"?
Example script:
> ## Associate this process with an atomic snapshot
> ## of the /sys/hypervisor/s390 filesystem tree.
> exec 3>/sys/hypervisor/s390/transaction
>
> ## Read data from /sys/hypervisor/s390 without
> ## worrying about atomicity; as that's guaranteed
> ## by the open FD 3.
> ls /sys/hypervisor/s390/cpus
> cat /sys/hypervisor/s390/some_data_file
>
> ## Create another reference in this process to the
> ## _same_ atomic snapshot
> exec 4>&3
>
> ## Does *not* close out the atomic snapshot
> exec 3>&-
>
> ## Yet another ref; still the _same_ snapshot
> exec 6>/sys/hypervisor/s390/transaction
> exec 4>&-
>
> ## Regardless of what has changed in the meantime,
> ## our filesystem tree still looks the same
> ls /sys/hypervisor/s390/cpus
>
> ## Write out values
> echo some_state >/sys/hypervisor/s390/statefile
>
> ## Decide we don't like the changes and abort
> echo reset >&6
>
> ## Release the last copy of the snapshot and
> ## commit modified values
> exec 6>&-
This would allow usage like the following:
> exec 3>/sys/hypervisor/s390/transaction
> /bin/s390_change_hypervisor_state
> ## Look at new state; decide if we like it or not
> if [ -z "$I_LIKE_THE_STATE" ]; then
> echo reset >&3
> fi
> exec 3>&-
For actually implementing this; I'm considering a design which hangs
a transaction off of a "struct file" such that fork() and clone()
preserve the same transaction. When a new process obtains an FD with
the given transaction it would add that process' current pointer to a
hash-table referencing the transaction data structure so that the
open() call could look up the transaction for a given task in the hash
table and use the data specified in the transaction. When a
transaction is opened it would read the data atomically from the
hardware or in-kernel data structures and store an "initial" copy as
well as a "current" copy in per-transaction memory. As a user could
theoretically pin NPROC * size_of_transaction_data * 2 bytes of kernel
memory, transaction files should have fairly strict file modes or
some sort of resource-accounting semantic. On a "reset" operation
the "initial" copy would be used to overwrite the "current" copy
again, and a changed bit would be unset. Changes would result in the
changed bit being set. When the transaction is closed, if the
changed bit is set then the data would be committed atomically, then
all the memory would be freed and the transaction removed from the
hash table.
Anything that sounds broken/fishy/"No that's impossible because..."
in there? I appreciate your input; if this sounds feasible I'll try
to hack up a patch.
Cheers,
Kyle Moffett
On Mon, May 01, 2006 at 10:37:03PM -0700, Greg KH wrote:
> On Tue, May 02, 2006 at 07:23:41AM +0200, Kay Sievers wrote:
> > If the count of values handled in a transaction is not to high and it
> > makes sense to group these values logically, why not just create an
> > attribute group for every transaction, which creates dummy attributes
> > to fill the values in, and use an "action" file in that group, that
> > commits all the values at once to whatever target? That should fit into
> > the ioctl use pattern, right?
>
> That's what configfs can handle easier. I think the issue is getting
> stuff from the kernel in one atomic snapshot (all the different file
> values from the same point in time.)
Sure, but just like an ioctl, the kernel could return the values after
writing to the "action" file in the dummy attributes. That would be
something like a snapshot, right?
Kay
On Tue, May 02, 2006 at 01:46:03PM +0200, Kay Sievers wrote:
> On Mon, May 01, 2006 at 10:37:03PM -0700, Greg KH wrote:
> > On Tue, May 02, 2006 at 07:23:41AM +0200, Kay Sievers wrote:
> > > If the count of values handled in a transaction is not to high and it
> > > makes sense to group these values logically, why not just create an
> > > attribute group for every transaction, which creates dummy attributes
> > > to fill the values in, and use an "action" file in that group, that
> > > commits all the values at once to whatever target? That should fit into
> > > the ioctl use pattern, right?
> >
> > That's what configfs can handle easier. I think the issue is getting
> > stuff from the kernel in one atomic snapshot (all the different file
> > values from the same point in time.)
>
> Sure, but just like an ioctl, the kernel could return the values after
> writing to the "action" file in the dummy attributes. That would be
> something like a snapshot, right?
Yes, but where would the buffer be to return the data to on a write? In
the data that the user passed to write?
thanks,
greg k-h
On Tue, May 02, 2006 at 04:48:42AM -0400, Kyle Moffett wrote:
> On May 2, 2006, at 00:00:53, Greg KH wrote:
> >On Mon, May 01, 2006 at 07:29:23PM -0400, Kyle Moffett wrote:
> >>So my question stands: What is the _recommended_ way to handle
> >>simple data types in low-bandwidth/frequency multiple-valued
> >>transactions to hardware? Examples include reading/modifying
> >>framebuffer settings (currently done through IOCTLS), s390 current
> >>state (up for discussion), etc. In these cases there needs to be
> >>an atomic snapshot or write of multiple values at the same time.
> >>Given the situation it would be _nice_ to use sysfs so the admin
> >>can do it by hand; makes things shell scriptable and reduces the
> >>number of binary compatibility issues.
> >
> >I really don't know of a way to use sysfs for this currently, and
> >hence, am not complaining too much about the different /proc files
> >that have this kind of information in it at the moment.
> >
> >If you or someone else wants to come up with some kind of solution
> >for it, I'm sure that many people would be very happy to see it.
>
> Hmm, ok; I'll see what I can come up with. Would anybody object to
> this kind of API (as in my previous email) that uses an open fd as a
> transaction "handle"?
No, I think Kay played around with something like using the open fd of
the directory as such a lock (or was he using flock on it, I can't
remember now...)
> Example script:
> >## Associate this process with an atomic snapshot
> >## of the /sys/hypervisor/s390 filesystem tree.
> >exec 3>/sys/hypervisor/s390/transaction
> >
> >## Read data from /sys/hypervisor/s390 without
> >## worrying about atomicity; as that's guaranteed
> >## by the open FD 3.
> >ls /sys/hypervisor/s390/cpus
> >cat /sys/hypervisor/s390/some_data_file
> >
> >## Create another reference in this process to the
> >## _same_ atomic snapshot
> >exec 4>&3
> >
> >## Does *not* close out the atomic snapshot
> >exec 3>&-
> >
> >## Yet another ref; still the _same_ snapshot
> >exec 6>/sys/hypervisor/s390/transaction
> >exec 4>&-
> >
> >## Regardless of what has changed in the meantime,
> >## our filesystem tree still looks the same
> >ls /sys/hypervisor/s390/cpus
> >
> >## Write out values
> >echo some_state >/sys/hypervisor/s390/statefile
> >
> >## Decide we don't like the changes and abort
> >echo reset >&6
> >
> >## Release the last copy of the snapshot and
> >## commit modified values
> >exec 6>&-
>
>
> This would allow usages like the following:
> >exec 3>/sys/hypervisor/s390/transaction
> >/bin/s390_change_hypervisor_state
> >## Look at new state; decide if we like it or not
> >if [ -z "$I_LIKE_THE_STATE" ]; then
> > echo reset >&3
> >fi
> >exec 3>&-
>
>
> For actually implementing this; I'm considering a design which hangs
> a transaction off of a "struct file" such that fork() and clone()
> preserve the same transaction. When a new process obtains an FD with
> the given transaction it would add that process' current pointer to a
> hash-table referencing the transaction data structure so that the
> open() call could look up the transaction for a given task in the hash
> table and use the data specified in the transaction. When a
> transaction is opened it would read the data atomically from the
> hardware or in-kernel data structures and store an "initial" copy as
> well as a "current" copy in per-transaction memory. As a user could
> theoretically pin NPROC * size_of_transaction_data * 2 of kernel
> memory, transaction files should have fairly strict file modes or
> some sort of resource-accounting semantic. On a "reset" operation
> the "initial" copy would be used to overwrite the "current" copy
> again, and a changed bit would be unset. Changes would result in the
> changed bit being set. When the transaction is closed, if the
> changed bit is set then the data would be committed atomically, then
> all the memory would be freed and the transaction removed from the
> hash table.
>
> Anything that sounds broken/fishy/"No that's impossible because..."
> in there? I appreciate your input; if this sounds feasable I'll try
> to hack up a patch.
Sounds a bit complex. Try looking at flock and see if you can pass that
info back to the sysfs attribute owners.
thanks,
greg k-h
On Tue, May 02, 2006 at 02:28:45PM -0700, Greg KH wrote:
> On Tue, May 02, 2006 at 01:46:03PM +0200, Kay Sievers wrote:
> > On Mon, May 01, 2006 at 10:37:03PM -0700, Greg KH wrote:
> > > On Tue, May 02, 2006 at 07:23:41AM +0200, Kay Sievers wrote:
> > > > If the count of values handled in a transaction is not to high and it
> > > > makes sense to group these values logically, why not just create an
> > > > attribute group for every transaction, which creates dummy attributes
> > > > to fill the values in, and use an "action" file in that group, that
> > > > commits all the values at once to whatever target? That should fit into
> > > > the ioctl use pattern, right?
> > >
> > > That's what configfs can handle easier. I think the issue is getting
> > > stuff from the kernel in one atomic snapshot (all the different file
> > > values from the same point in time.)
> >
> > Sure, but just like an ioctl, the kernel could return the values after
> > writing to the "action" file in the dummy attributes. That would be
> > something like a snapshot, right?
>
> Yes, but where would the buffer be to return the data to on a write? In
> the data that the user passed to write?
In the "dummy attribute", allocated by the device instance.
Kay
On Tue, May 02, 2006 at 02:30:43PM -0700, Greg KH wrote:
> On Tue, May 02, 2006 at 04:48:42AM -0400, Kyle Moffett wrote:
> > On May 2, 2006, at 00:00:53, Greg KH wrote:
> > >On Mon, May 01, 2006 at 07:29:23PM -0400, Kyle Moffett wrote:
> > >>So my question stands: What is the _recommended_ way to handle
> > >>simple data types in low-bandwidth/frequency multiple-valued
> > >>transactions to hardware? Examples include reading/modifying
> > >>framebuffer settings (currently done through IOCTLS), s390 current
> > >>state (up for discussion), etc. In these cases there needs to be
> > >>an atomic snapshot or write of multiple values at the same time.
> > >>Given the situation it would be _nice_ to use sysfs so the admin
> > >>can do it by hand; makes things shell scriptable and reduces the
> > >>number of binary compatibility issues.
> > >
> > >I really don't know of a way to use sysfs for this currently, and
> > >hence, am not complaining too much about the different /proc files
> > >that have this kind of information in it at the moment.
> > >
> > >If you or someone else wants to come up with some kind of solution
> > >for it, I'm sure that many people would be very happy to see it.
> >
> > Hmm, ok; I'll see what I can come up with. Would anybody object to
> > this kind of API (as in my previous email) that uses an open fd as a
> > transaction "handle"?
>
> No, I think Kay played around with something like using the open fd of
> the directory as such a lock (or was he using flock on it, I can't
> remember now...)
If you can assume that processes accessing the values are cooperative,
it already works without any changes:
$ time flock /sys/class/firmware echo 1 > /sys/class/firmware/timeout
real 0m0.005s
$ flock /sys/class/firmware sleep 5&
[1] 6468
$ time flock /sys/class/firmware echo 1 > /sys/class/firmware/timeout
real 0m3.558s
Kay
On Tue, May 02, 2006 at 11:33:52PM +0200, Kay Sievers wrote:
> On Tue, May 02, 2006 at 02:28:45PM -0700, Greg KH wrote:
> > On Tue, May 02, 2006 at 01:46:03PM +0200, Kay Sievers wrote:
> > > On Mon, May 01, 2006 at 10:37:03PM -0700, Greg KH wrote:
> > > > On Tue, May 02, 2006 at 07:23:41AM +0200, Kay Sievers wrote:
> > > > > If the count of values handled in a transaction is not to high and it
> > > > > makes sense to group these values logically, why not just create an
> > > > > attribute group for every transaction, which creates dummy attributes
> > > > > to fill the values in, and use an "action" file in that group, that
> > > > > commits all the values at once to whatever target? That should fit into
> > > > > the ioctl use pattern, right?
> > > >
> > > > That's what configfs can handle easier. I think the issue is getting
> > > > stuff from the kernel in one atomic snapshot (all the different file
> > > > values from the same point in time.)
> > >
> > > Sure, but just like an ioctl, the kernel could return the values after
> > > writing to the "action" file in the dummy attributes. That would be
> > > something like a snapshot, right?
> >
> > Yes, but where would the buffer be to return the data to on a write? In
> > the data that the user passed to write?
>
> In the "dummy attribute", allocated by the device instance.
Ok, I'm totally confused and don't understand anymore. Care to walk
this through again as to how it would work?
sorry,
greg k-h
On May 2, 2006, at 17:49:08, Kay Sievers wrote:
> If you can assume that processes accessing the values are
> cooperative, it already works without any changes:
>
> $ time flock /sys/class/firmware echo 1 > /sys/class/firmware/timeout
> real 0m0.005s
>
> $ flock /sys/class/firmware sleep 5&
> [1] 6468
>
> $ time flock /sys/class/firmware echo 1 > /sys/class/firmware/timeout
> real 0m3.558s
But that doesn't solve the problem for framebuffer devices or for the
s390 code. Such transactions have one or more of the following
properties:
(1) A read operation is _expensive_ or adds unacceptable latencies
and should be done as rarely as possible.
(2) All the data must be written to hardware simultaneously by the
kernel; a partial update does not make sense and would cause
undesired operation from the hardware.
The idea with the transactions would be to create a kernel-memory
buffer-layer of sorts on top of the underlying sysfs tree to cache
the read data and collect writes for an atomic commit. I'll see if I
can make something work.
Cheers,
Kyle Moffett