2006-05-03 09:32:56

by Michael Holzheu

Subject: Re: [PATCH] s390: Hypervisor File System

Kyle Moffett <[email protected]> wrote on 04/29/2006 10:41:05 AM:

[snip]

> It sounds like a lot of things need some kind of shell-scriptable
> transaction interface for sysfs files. You don't want to have more
> than one value per file, but reading or writing of some values must
> be done together for consistency reasons. Is there any way to
> implement something like this? This would work for the framebuffer
> people and solve the needs of a lot of the people who still want
> ioctls or some other atomic-multivalued transfer that would otherwise
> be a great sysfs candidate.
>

Martin told me that you will probably kill me for the following,
but I would nevertheless like to post my suggestion:

All the complicated mechanisms with filesystem trees
to obtain consistent data and transaction functionality
could be avoided if we used single files that
contain all the data. When the file is opened, a snapshot
is created and attached to the struct file.

As far as I know, the common consensus is to avoid that, because
parsing such files in user space is ugly and error prone.

But if we provided a standard tag language (I am not saying XML here)
for all sorts of kernel data that is exported to user space, one
standard parser could be used to obtain the data. This is not error
prone, since you can always use the standard parser to access the
data. It is still human readable, which is an advantage over binary
files. You can either read the file directly or use the parser to
get the tree structure in a more readable form. E.g. in our s390
hypfs example you could call:

standardparser < /sys/hypervisor/s390/accountingdata

which prints a nice tree to stdout. Or you could use the
parser in shell scripts to obtain specific attributes.
E.g. you could call:

standardparser < .../accountingdata systems.lparxy.cpu0.time

which prints e.g. "32432515" to stdout.
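A minimal Python sketch of what such a "standardparser" lookup could look like. The tag syntax here is an assumption borrowed from the example Michael gives further down in this thread (`<name>` opens a node, `<\name>` closes it, `<key = value>` is a leaf); the thread never pins the format down, and `parse` and `lookup` are hypothetical names. Whitespace and indentation between tags are ignored on purpose.

```python
import re

# Hypothetical tag syntax (taken from the thread's example, not a real spec):
#   <name>          opens a node
#   <\name>         closes the node opened by <name>
#   <key = value>   leaf attribute of the innermost open node
TAG = re.compile(r"<([^<>]*)>")


def parse(text):
    """Parse tagged text into nested dicts; text between tags is ignored."""
    root = {}
    stack = [("", root)]  # (tag name, dict) for every currently open node
    for match in TAG.finditer(text):
        body = match.group(1).strip()
        if body.startswith("\\"):
            name = body[1:].strip()
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"unmatched closing tag <\\{name}>")
            stack.pop()
        elif "=" in body:
            key, _, value = body.partition("=")
            stack[-1][1][key.strip()] = value.strip()
        else:
            node = {}
            stack[-1][1][body] = node
            stack.append((body, node))
    if len(stack) != 1:
        raise ValueError(f"unclosed tag <{stack[-1][0]}>")
    return root


def lookup(tree, path):
    """Resolve a dotted path such as 'cpu.0.onlinetime'."""
    node = tree
    for part in path.split("."):
        node = node[part]
    return node
```

With a file laid out like the hypfs example, `lookup(parse(text), "systems.lparxy.cpu0.time")` would return the value as a string; converting it to a number is left to the caller.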

This approach would solve our atomicity problem and
also the problem of parsing complicated sysfs files.

Putting the tree into one file using a tag language is from
a logical point of view exactly the same as creating a
directory tree within a filesystem.

In terms of kernel-space performance, creating the tree in one file
is no more effort than creating the tree within a filesystem. It is
even less effort, since user space does not need hundreds of system
calls (open, read, close) to obtain the data.

If you think that this topic has already been discussed too often
on this mailing list and is not worth discussing further, please
just ignore this posting!

Michael


2006-05-03 09:42:06

by Pekka Enberg

Subject: Re: [PATCH] s390: Hypervisor File System

On Wed, 3 May 2006, Michael Holzheu wrote:
> All the complicated mechanisms with filesystem trees
> to obtain consistent data and transaction functionality
> could be avoided if we used single files that
> contain all the data. When the file is opened, a snapshot
> is created and attached to the struct file.

If we're going to add new infrastructure, I'd vote for adding snapshotting
capability to filesystems. We need it for stuff like LogFS anyway.

Pekka

2006-05-03 10:01:39

by Jörn Engel

Subject: Re: [PATCH] s390: Hypervisor File System

On Wed, 3 May 2006 11:33:01 +0200, Michael Holzheu wrote:
>
> All the complicated mechanisms with filesystem trees
> to obtain consistent data and transaction functionality
> could be avoided if we used single files that
> contain all the data. When the file is opened, a snapshot
> is created and attached to the struct file.

s/single file/single entity/ and this may be useful. Your filesystem
exports a directory tree, which is nice and easily parsable. The
problem is that it is a single resource for everyone. If different
users could have their own views of this filesystem, each with a
private snapshot, many problems would be solved.

Spufs might have something similar already. Istr something about
returning a directory fd and then using openat(2) and friends.
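A rough Python illustration of the openat(2) access pattern mentioned here: open the directory once, then open each attribute relative to that directory fd (Python's `dir_fd` argument maps to openat on Linux). This only shows the access pattern. It does not by itself give a private snapshot; that would require the filesystem to bind a snapshot to the directory fd at open time, which is the hypothetical part. `read_attrs` is an illustrative name.

```python
import os


def read_attrs(dirpath, names):
    """Read several attribute files relative to one directory fd."""
    # One fd for the directory; every attribute is resolved against it,
    # so renames of the path after this point do not affect the lookups.
    dfd = os.open(dirpath, os.O_RDONLY | os.O_DIRECTORY)
    try:
        values = {}
        for name in names:
            fd = os.open(name, os.O_RDONLY, dir_fd=dfd)  # openat(dfd, name, ...)
            try:
                values[name] = os.read(fd, 4096).decode().strip()
            finally:
                os.close(fd)
        return values
    finally:
        os.close(dfd)
```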

Jörn

--
To my face you have the audacity to advise me to become a thief - the worst
kind of thief that is conceivable, a thief of spiritual things, a thief of
ideas! It is insufferable, intolerable!
-- M. Binet in Scaramouche

2006-05-03 12:11:30

by Michael Holzheu

Subject: Re: [PATCH] s390: Hypervisor File System


Pekka J Enberg <[email protected]> wrote on 05/03/2006 11:42:01 AM:
> On Wed, 3 May 2006, Michael Holzheu wrote:
> > All the complicated mechanisms with filesystem trees
> > to obtain consistent data and transaction functionality
> > could be avoided if we used single files that
> > contain all the data. When the file is opened, a snapshot
> > is created and attached to the struct file.
>
> If we're going to add new infrastructure, I'd vote for adding snapshotting
> capability to filesystems. We need it for stuff like LogFS anyway.
>

Maybe we need that, too. But I think the advantage of the
one-file solution is that it moves the complexity from the kernel
to userspace.

Michael

2006-05-03 12:34:19

by Jörn Engel

Subject: Re: [PATCH] s390: Hypervisor File System

On Wed, 3 May 2006 14:11:36 +0200, Michael Holzheu wrote:
>
> Maybe we need that, too. But I think the advantage of the
> one-file solution is that it moves the complexity from the kernel
> to userspace.

Now might be a time to come back to Martin's prediction. ;)

Having a weird format in some file does _not_ move complexity from the
kernel. It may make the userspace more complex, granted. But once
you try to change something, you need to keep the ABI stable. And
part of the ABI is your file format.

Applications will depend on some arcane detail of your format. They
will depend on exactly five spaces in "foo bar". It does not even
matter if you documented "any amount of whitespace". The application
knows that it was five spaces and doesn't care. And once you change
it, the blame will be on you, because you broke existing userspace.

If that does not make the kernel complex, I don't know what does.

Jörn

--
It does not matter how slowly you go, so long as you do not stop.
-- Confucius

2006-05-03 12:51:47

by Michael Holzheu

Subject: Re: [PATCH] s390: Hypervisor File System

Hi Jörn,

Jörn Engel <[email protected]> wrote on 05/03/2006 02:33:39 PM:
> On Wed, 3 May 2006 14:11:36 +0200, Michael Holzheu wrote:
> >
> > Maybe we need that, too. But I think the advantage of the
> > one-file solution is that it moves the complexity from the kernel
> > to userspace.
>
> Now might be a time to come back to Martin's prediction. ;)
>
> Having a weird format in some file does _not_ move complexity from the
> kernel. It may make the userspace more complex, granted. But once
> you try to change something, you need to keep the ABI stable. And
> part of the ABI is your file format.

This is also true if you use a filesystem tree. The tree structure and
the content of the files are part of your ABI. There is no difference
between a standard tag-based file (all kernel files should use the
same format, of course!) and a filesystem tree.

> Applications will depend on some arcane detail of your format. They
> will depend on exactly five spaces in "foo bar". It does not even
> matter if you documented "any amount of whitespace". The application
> knows that it was five spaces and doesn't care. And once you change
> it, the blame will be on you, because you broke existing userspace.

Again, logically there is no difference between the two solutions. It does
not matter whether you have one file with:

<cpu>
<0>
<onlinetime = 4711>
<\0>
<\cpu>

... or whatever the standard kernel format will be
... and a filesystem tree with:

+cpu
+ 0
+ onlinetime


If you implement a standard userspace parser, you can access the
attributes in one file as easily as the attributes in a filesystem tree.
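To make the "logically equivalent" claim concrete, here is a hedged Python sketch that flattens a directory tree into the same dotted-path keys a one-file parser would produce (for the example above, `{"cpu.0.onlinetime": "4711"}`). Note that every attribute is a separate open/read/close, so nothing makes the values mutually consistent; that is exactly the atomicity gap the one-file proposal is trying to close. `tree_from_dir` is a hypothetical helper.

```python
import os


def tree_from_dir(root):
    """Flatten a directory tree into {'cpu.0.onlinetime': '4711', ...}.

    Every attribute is a separate open/read/close, so values read late
    in the walk may be newer than values read early.
    """
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        prefix = "" if rel == os.curdir else rel.replace(os.sep, ".") + "."
        for name in filenames:
            with open(os.path.join(dirpath, name)) as f:
                result[prefix + name] = f.read().strip()
    return result
```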

Michael

2006-05-03 13:01:32

by Jörn Engel

Subject: Re: [PATCH] s390: Hypervisor File System

On Wed, 3 May 2006 14:51:53 +0200, Michael Holzheu wrote:
> Jörn Engel <[email protected]> wrote on 05/03/2006 02:33:39 PM:
>
> > Applications will depend on some arcane detail of your format. They
> > will depend on exactly five spaces in "foo bar". It does not even
> > matter if you documented "any amount of whitespace". The application
> > knows that it was five spaces and doesn't care. And once you change
> > it, the blame will be on you, because you broke existing userspace.
>
> Again, logically there is no difference between the two solutions. It does
> not matter whether you have one file with:
>
> <cpu>
> <0>
> <onlinetime = 4711>
> <\0>
> <\cpu>

Userspace can make your life hell by depending on indentation via 4
spaces. The problem is that you don't necessarily know that it does
until you managed to change indentation.

In a filesystem tree, it is fairly hard to make assumptions that are
later broken. It is by no means impossible, agreed. But the
"indentation" doesn't exist anymore. A file is part of a subdirectory
or it isn't. Opening tags without matching closing tags don't exist
either. The list goes on.

In the end, both formats can get abused in ways you'd never foresee.
But the directory tree considerably raises the barrier.

Jörn

--
He who knows that enough is enough will always have enough.
-- Lao Tsu

2006-05-03 13:18:35

by Michael Holzheu

Subject: Re: [PATCH] s390: Hypervisor File System

Hi Jörn,

Jörn Engel <[email protected]> wrote on 05/03/2006 03:00:43 PM:
> On Wed, 3 May 2006 14:51:53 +0200, Michael Holzheu wrote:
> > Jörn Engel <[email protected]> wrote on 05/03/2006 02:33:39 PM:
> >
> > > Applications will depend on some arcane detail of your format. They
> > > will depend on exactly five spaces in "foo bar". It does not even
> > > matter if you documented "any amount of whitespace". The application
> > > knows that it was five spaces and doesn't care. And once you change
> > > it, the blame will be on you, because you broke existing userspace.
> >
> > Again, logically there is no difference between the two solutions. It does
> > not matter whether you have one file with:
> >
> > <cpu>
> > <0>
> > <onlinetime = 4711>
> > <\0>
> > <\cpu>
>
> Userspace can make your life hell by depending on indentation via 4
> spaces. The problem is that you don't necessarily know that it does
> until you managed to change indentation.

Of course! But the convention must be that if userspace wants to
access the data, it has to use our standard Linux
parser. If it accesses the data directly, that is broken.
This ensures that whitespace does not matter at all! And as
I said before, if you use the parser, there is no
difference from the filesystem solution from a logical
perspective.

Michael

2006-05-03 13:23:30

by Jörn Engel

Subject: Re: [PATCH] s390: Hypervisor File System

On Wed, 3 May 2006 15:18:41 +0200, Michael Holzheu wrote:
>
> Of course! But the convention must be that if userspace wants to
> access the data, it has to use our standard Linux
> parser. If it accesses the data directly, that is broken.
> This ensures that whitespace does not matter at all! And as
> I said before, if you use the parser, there is no
> difference from the filesystem solution from a logical
> perspective.

o People are not forced to follow the convention. If they don't and
you break an existing application, you get the blame.
o Now you have a dependency on the standard parser, which is in
userspace. Any bug in any version of the standard parser and...

Jörn

--
There's nothing better for promoting creativity in a medium than
making an audience feel "Hmm - I could do better than that!"
-- Douglas Adams in a slashdot interview

2006-05-03 13:38:14

by Michael Holzheu

Subject: Re: [PATCH] s390: Hypervisor File System

Hi Jörn,

Jörn Engel <[email protected]> wrote on 05/03/2006 03:22:39 PM:
> On Wed, 3 May 2006 15:18:41 +0200, Michael Holzheu wrote:
> >
> o People are not forced to follow the convention. If they don't and
> you break an existing application, you get the blame.

Sure, but this is really just a matter of definition. The kernel defines
the ABI, right? User space has to follow the rules. If it breaks the
rules, that's bad luck for the userspace tools. Currently you can also
get kernel information directly from /dev/mem. If an application
does that, nobody would say that we are not allowed to change
kernel data structures because of that user space application.

> o Now you have a dependency on the standard parser, which is in
> userspace. Any bug in any version of the standard parser and...

At least this parser should be well tested, if everybody uses it.

But maybe I am completely wrong here ....

Michael

2006-05-03 14:17:34

by Martin Schwidefsky

Subject: Re: [PATCH] s390: Hypervisor File System

On Wed, 2006-05-03 at 15:38 +0200, Michael Holzheu wrote:
> > On Wed, 3 May 2006 15:18:41 +0200, Michael Holzheu wrote:
> > >
> > o People are not forced to follow the convention. If they don't and
> > you break an existing application, you get the blame.
>
> Sure, but this is really just a matter of definition. The kernel defines
> the ABI, right? User space has to follow the rules. If it breaks the
> rules, that's bad luck for the userspace tools. Currently you can also
> get kernel information directly from /dev/mem. If an application
> does that, nobody would say that we are not allowed to change
> kernel data structures because of that user space application.

The kernel defines the ABI, but what IS the ABI? Is a single space that
the current implementation delivers an indication that there needs to be
a single space between two values, or could there be an arbitrary number
of spaces and tabs? You certainly can't conclude that from the code.
For /proc files people tended to use sscanf to read lines from the
output. Is the format of a line fixed, or can it be extended by
additional fields? Do certain fields have to start at a specific
offset or not? How long can the different fields get? And so on.
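A small Python illustration of this point, using a made-up /proc-style line `cpu0 4711` (neither the line format nor the function names come from the kernel). A reader that hard-codes column offsets keeps "working" after the kernel widens a field or changes spacing, but silently returns wrong numbers. A reader that only assumes whitespace-separated fields in a fixed order, roughly what an `sscanf("%s %d")`-style parse does, survives both changes.

```python
def parse_fixed(line):
    # Assumes the value always sits in columns 5..8 (exactly 4 characters).
    return int(line[5:9])


def parse_fields(line):
    # Assumes only whitespace-separated fields in a fixed order;
    # extra trailing fields are tolerated and ignored.
    _name, value, *_extra = line.split()
    return int(value)
```

For example, `parse_fixed("cpu0 47110")` silently returns 4711, and `parse_fixed("cpu0  4711 99")` returns 471, while `parse_fields` returns 47110 and 4711. Even the tolerant reader still encodes assumptions (field order, integer type), which is exactly the question of what the ABI actually is.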

> > o Now you have a dependency on the standard parser, which is in
> > userspace. Any bug in any version of the standard parser and...
>
> At least this parser should be well tested, if everybody uses it.

And the user space then uses the parser only? Is now the parser
interface the "ABI" or the kernel interface that is in turn used by the
parser? And what happens if somebody comes up with a "better" parser
that does things subtly different?

In short: keep the kernel interface as simple as you possibly can. That
is why the single value approach has been invented. A text file that
needs to get parsed is certainly not simple.

--
blue skies,
Martin.

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

"Reality continues to ruin my life." - Calvin.


2006-05-03 14:23:43

by Michael Holzheu

Subject: Re: [PATCH] s390: Hypervisor File System

[email protected] wrote on 05/03/2006 04:17:40 PM:
> And the user space then uses the parser only? Is now the parser
> interface the "ABI" or the kernel interface that is in turn used by the
> parser? And what happens if somebody comes up with a "better" parser
> that does things subtly different?

The ABI is not defined by the parser. You have to specify the
tag language, which is part of the ABI. Any parser that is compliant
with the specification of the tag language can be used.


2006-05-03 14:58:04

by Martin Schwidefsky

Subject: Re: [PATCH] s390: Hypervisor File System

On Wed, 2006-05-03 at 16:23 +0200, Michael Holzheu wrote:
> [email protected] wrote on 05/03/2006 04:17:40 PM:
> > And the user space then uses the parser only? Is now the parser
> > interface the "ABI" or the kernel interface that is in turn used by the
> > parser? And what happens if somebody comes up with a "better" parser
> > that does things subtly different?
>
> The ABI is not defined by the parser. You have to specify the
> tag language, which is part of the ABI. Any parser that is compliant
> with the specification of the tag language can be used.

Optimist.



2006-05-03 15:22:35

by Michael Holzheu

Subject: Re: [PATCH] s390: Hypervisor File System

[email protected] wrote on 05/03/2006 04:58:06 PM:
> On Wed, 2006-05-03 at 16:23 +0200, Michael Holzheu wrote:
> > [email protected] wrote on 05/03/2006 04:17:40 PM:
> > > And the user space then uses the parser only? Is now the parser
> > > interface the "ABI" or the kernel interface that is in turn used by the
> > > parser? And what happens if somebody comes up with a "better" parser
> > > that does things subtly different?
> >
> > The ABI is not defined by the parser. You have to specify the
> > tag language, which is part of the ABI. Any parser that is compliant
> > with the specification of the tag language can be used.
>
> Optimist.

One very last comment: for our problem of ensuring consistency of
hypervisor data, where an application always wants to get the
complete set of information, I think the "one file" solution with
a fully specified ASCII tag language looks like the easiest way
to implement it. And I think that if one decides to use one file
to provide all the information, it is better to have a standard
data format than to keep inventing new formats. And a standard
ASCII format is, in my eyes, also better than a binary interface.

If everybody says that it is in principle not a good idea
to use one file, then a snapshot mechanism
for filesystems is probably a good way to provide
consistency. And this is definitely not my decision...

Michael

2006-05-03 15:54:29

by Valdis Klētnieks

Subject: Re: [PATCH] s390: Hypervisor File System

On Wed, 03 May 2006 15:18:41 +0200, Michael Holzheu said:
> Of course! But the convention must be that if userspace wants to
> access the data, it has to use our standard Linux
> parser. If it accesses the data directly, that is broken.

Yet another case of Eternal Optimism flying in the face of Reality... ;)

a) you can't *force* the use of your parser.
b) this creates a userspace dependency that can get messy if the parser is buggy
or requires modification to deal with a kernel change.

