2019-12-28 14:39:08

by Pali Rohár

[permalink] [raw]
Subject: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL

Hello!

I see that you have introduced in commit 62750d0 two new IOCTLs for
filesyetems: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL.

I would like to ask, are these two new ioctls mean to be generic way for
userspace to get or set fs label independently of which filesystem is
used? Or are they only for btrfs?

Because I was not able to find any documentation for it, what is format
of passed buffer... null-term string? fixed-length? and in which
encoding? utf-8? latin1? utf-16? or filesystem dependent?

--
Pali Rohár
[email protected]


Attachments:
(No filename) (558.00 B)
signature.asc (201.00 B)
Download all attachments

2019-12-28 19:16:16

by Andreas Dilger

[permalink] [raw]
Subject: Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL


> On Dec 28, 2019, at 7:36 AM, Pali Rohár <[email protected]> wrote:
>
> Hello!
>
> I see that you have introduced in commit 62750d0 two new IOCTLs for
> filesyetems: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL.
>
> I would like to ask, are these two new ioctls mean to be generic way for
> userspace to get or set fs label independently of which filesystem is
> used? Or are they only for btrfs?
>
> Because I was not able to find any documentation for it, what is format
> of passed buffer... null-term string? fixed-length? and in which
> encoding? utf-8? latin1? utf-16? or filesystem dependent?

It seems that SETFSLABEL is supported by BtrFS and XFS, and GETFSLABEL
also by GFS2. We were just discussing recently about adding it to ext4,
so if you wanted to submit a patch for that it would be welcome.

It looks like the label is a NUL-terminated string, up to the length
allowed by the various filesystems. That said, it seems like a bit of
a bug that the kernel will return -EFAULT if a string shorter than the
maximum size is supplied (256 chars for btrfs).

The copy_{to,from}_user() function will (I think) return the number of
bytes remaining to be copied, so IMHO it would make sense that this is
compared to the string length of the label, and not return -EFAULT if
the buffer is large enough to hold the actual label. Otherwise, the
caller has to magically know the maximum label size that is returned
from any filesystem and/or allocate specific buffer sizes for different
filesystem types, which makes it not very useful as a generic interface.


Cheers, Andreas






Attachments:
signature.asc (890.00 B)
Message signed with OpenPGP

2019-12-28 20:07:05

by Pali Rohár

[permalink] [raw]
Subject: Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL

On Saturday 28 December 2019 12:14:57 Andreas Dilger wrote:
> > On Dec 28, 2019, at 7:36 AM, Pali Rohár <[email protected]> wrote:
> >
> > Hello!
> >
> > I see that you have introduced in commit 62750d0 two new IOCTLs for
> > filesyetems: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL.
> >
> > I would like to ask, are these two new ioctls mean to be generic way for
> > userspace to get or set fs label independently of which filesystem is
> > used? Or are they only for btrfs?
> >
> > Because I was not able to find any documentation for it, what is format
> > of passed buffer... null-term string? fixed-length? and in which
> > encoding? utf-8? latin1? utf-16? or filesystem dependent?
>
> It seems that SETFSLABEL is supported by BtrFS and XFS, and GETFSLABEL
> also by GFS2. We were just discussing recently about adding it to ext4,
> so if you wanted to submit a patch for that it would be welcome.

I have in-progress patches for FAT with following ioctls:

#define FAT_IOCTL_GET_VOLUME_LABEL(len) _IOC(_IOC_READ, 'r', 0x14, len) /* variable length */
#define FAT_IOCTL_SET_VOLUME_LABEL _IOC(_IOC_WRITE, 'r', 0x15, 0) /* variable length, nul term string */

GET ioctl macro takes length of buffer and pass it via standard ioctl
bits into kernel would know how many bytes can copy to userspace.

And via SET ioctl is passed null term string, so kernel copy from
userspace whole null term string (with some upper limit to page size or
1024 bytes).

I plan to finish FAT patches in next month.

> It looks like the label is a NUL-terminated string, up to the length
> allowed by the various filesystems. That said, it seems like a bit of
> a bug that the kernel will return -EFAULT if a string shorter than the
> maximum size is supplied (256 chars for btrfs).

FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls now seems like a nice
general / unified approach, but it has problems which I already mention
in questions in previous email.

There are basically two kinds of filesystems when looking how filename /
label should be represented.

First group: file name / label is sequence of bytes (possible null
termined) without any representation. It can be in any encoding, it is
up to the user. There is no mapping to Unicode. Examples: ext4

Second group: file name / label is stored in exact encoding (defined
internally or externally by e.g. code page). There is exact translate
function to Unicode. For these file system there is either iocharset= or
nls= mount option which maps name / label to bytes. Examples: vfat
(UTF-16LE), ntfs (UTF-16LE), udf (Latin1 and UTF-16BE), iso9660 with
joliet (UTF-16LE), apfs (UTF-8).

All mentioned file systems by you are in first group, so you probably
have not hit this "implementation" problem for FS_IOC_GETFSLABEL yet.

And question is, what should FS_IOC_GETFSLABEL returns for file systems
from second group?

In my implementation for FAT_IOCTL_GET_VOLUME_LABEL I'm reading label
buffer from filesyetem, then converting this buffer from encoding
specified in codepage= mount option to Unicode and then from Unicode I'm
converting it to encoding specified in iocharset= mount option. Why?
Because same procedure is used for reading file names from vfat when
corresponding file name does not have long UTF-16LE vfat entry (label
on vfat really is not in UTF-16LE but in OEM code page). And I think
that volume label should be handled in same way like file names.

Is above behavior "label in ioctl would be retrieved in same encoding as
file names" correct / expected also for FS_IOC_GETFSLABEL?

Or do you have better idea how should filesystems from second group
implement FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL? (E.g. always in UTF-8
independently of mount options? Or in filesystem native encoding, so
sometimes in UTF-16LE, sometimes in UTF-16BE, sometimes in DOS OEM code
page, sometimes in Latin1? Or something different?)

> The copy_{to,from}_user() function will (I think) return the number of
> bytes remaining to be copied, so IMHO it would make sense that this is
> compared to the string length of the label, and not return -EFAULT if
> the buffer is large enough to hold the actual label. Otherwise, the
> caller has to magically know the maximum label size that is returned
> from any filesystem and/or allocate specific buffer sizes for different
> filesystem types, which makes it not very useful as a generic interface.

I think that my approach with specifying length in ioctl bits for GET
ioctl is more flexible. But I do not know if it is even possible to
change as API is already in kernel.

Also I see there more important bug in that API:

#define FSLABEL_MAX 256 /* Max chars for the interface; each fs may differ */

git show 62750d040bd13

"This retains 256 chars as the maximum size through the interface, which
is the btrfs limit and AFAIK exceeds any other filesystem's maximum
label size."

It has some artificial limit for volume label length. Which filesystems
was checked that maximal label length cannot be larger?

E.g. on UDF filesystem (supported by Linux kernel for a long time) is
label stored in 128bit buffer. First byte specifies Unicode encoding,
last byte specifies used length of buffer. Two possible encodings are
currently defined: Latin1 and UTF-16BE. Therefore if FS_IOC_GETFSLABEL
for UDF would use UTF-8 encoding, then maximal length for label is:
(128-2)*2 = 252 plus null byte (*2 because all non-ASCII Latin1
characters which are one-byte are encoded in UTF-8 by two bytes). Which
is almost upper limit of API. And this is just first filesytem which
I checked. I would not be surprised if there is already filesystem which
maximal length exceed current limit defined by kernel API.

--
Pali Rohár
[email protected]


Attachments:
(No filename) (5.75 kB)
signature.asc (201.00 B)
Download all attachments

2019-12-31 22:57:58

by Eric Sandeen

[permalink] [raw]
Subject: Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL

On 12/28/19 6:36 AM, Pali Rohár wrote:
> Hello!
>
> I see that you have introduced in commit 62750d0 two new IOCTLs for
> filesyetems: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL.
>
> I would like to ask, are these two new ioctls mean to be generic way for
> userspace to get or set fs label independently of which filesystem is
> used? Or are they only for btrfs?

The reason it was lifted out of btrfs to the vfs is so that other filesystems
can use the same interface. However, it is up to each filesystem to implement
it (and to interpret what's been written to or read from disk.)

> Because I was not able to find any documentation for it, what is format
> of passed buffer... null-term string? fixed-length? and in which
> encoding? utf-8? latin1? utf-16? or filesystem dependent?

It simply copies the bits from the memory location you pass in, it knows
nothing of encodings.

For the most part it's up to the filesystem's own utilities to do any
interpretation of the resulting bits on disk, null-terminating maximal-length
label strings, etc.

-Eric

2020-01-01 18:13:26

by Theodore Ts'o

[permalink] [raw]
Subject: Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL

On Tue, Dec 31, 2019 at 04:54:18PM -0600, Eric Sandeen wrote:
> > Because I was not able to find any documentation for it, what is format
> > of passed buffer... null-term string? fixed-length? and in which
> > encoding? utf-8? latin1? utf-16? or filesystem dependent?
>
> It simply copies the bits from the memory location you pass in, it knows
> nothing of encodings.
>
> For the most part it's up to the filesystem's own utilities to do any
> interpretation of the resulting bits on disk, null-terminating maximal-length
> label strings, etc.

I'm not sure this is going to be the best API design choice. The
blkid library interprets the on disk format for each file syustem
knowing what is the "native" format for that particular file system.
This is mainly an issue only for the non-Linux file systems; for the
Linux file system, the party line has historically been that we don't
get involved with character encoding, but in practice, what that has
evolved into is that userspace has standardized on UTF-8, and that's
what we pass into the kernel from userspace by convention.

But the problem is that if the goal is to make FS_IOC_GETFSLABEL and
FS_IOC_SETFSLABEL work without the calling program knowing what file
system type a particular pathname happens to be, then it would be
easist for the userspace program if it can expect that it can always
pass in a null-terminated UTF-8 string, and get back a null-terminated
UTF-8. I bet that in practice, that is what most userspace programs
are going to be do anyway, since it works that way for all other file
system syscalls.

So for a file system which is a non-Linux-native file system, if it
happens to store the its label using utf-16, or some other
Windows-system-silliness, it would work a lot better if it assumed
that it was passed in utf-8, and stored in the the Windows file system
using whatever crazy encoding Windows wants to use. Otherwise, why
bother uplifting the ioctl to one which is file system independent, if
the paramters are defined to be file system *dependent*?

- Ted

2020-01-01 18:40:14

by Pali Rohár

[permalink] [raw]
Subject: Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL

On Wednesday 01 January 2020 13:10:54 Theodore Y. Ts'o wrote:
> On Tue, Dec 31, 2019 at 04:54:18PM -0600, Eric Sandeen wrote:
> > > Because I was not able to find any documentation for it, what is format
> > > of passed buffer... null-term string? fixed-length? and in which
> > > encoding? utf-8? latin1? utf-16? or filesystem dependent?
> >
> > It simply copies the bits from the memory location you pass in, it knows
> > nothing of encodings.
> >
> > For the most part it's up to the filesystem's own utilities to do any
> > interpretation of the resulting bits on disk, null-terminating maximal-length
> > label strings, etc.
>
> I'm not sure this is going to be the best API design choice. The
> blkid library interprets the on disk format for each file syustem
> knowing what is the "native" format for that particular file system.
> This is mainly an issue only for the non-Linux file systems; for the
> Linux file system, the party line has historically been that we don't
> get involved with character encoding, but in practice, what that has
> evolved into is that userspace has standardized on UTF-8, and that's
> what we pass into the kernel from userspace by convention.
>
> But the problem is that if the goal is to make FS_IOC_GETFSLABEL and
> FS_IOC_SETFSLABEL work without the calling program knowing what file
> system type a particular pathname happens to be, then it would be
> easist for the userspace program if it can expect that it can always
> pass in a null-terminated UTF-8 string, and get back a null-terminated
> UTF-8. I bet that in practice, that is what most userspace programs
> are going to be do anyway, since it works that way for all other file
> system syscalls.
>
> So for a file system which is a non-Linux-native file system, if it
> happens to store the its label using utf-16, or some other
> Windows-system-silliness, it would work a lot better if it assumed
> that it was passed in utf-8, and stored in the the Windows file system
> using whatever crazy encoding Windows wants to use. Otherwise, why
> bother uplifting the ioctl to one which is file system independent, if
> the paramters are defined to be file system *dependent*?

Exactly. In another email I wrote that for those non-Linux-native
filesystem could be used encoding specified in iocharset= mount
parameter. I think it is better as usage of one fixing encoding (e.g.
UTF-8) if other filesystem strings are propagated to userspace in other
encoding (as specified by iocharset=).

--
Pali Rohár
[email protected]


Attachments:
(No filename) (2.52 kB)
signature.asc (201.00 B)
Download all attachments

2020-01-02 21:59:11

by Darrick J. Wong

[permalink] [raw]
Subject: Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL

On Wed, Jan 01, 2020 at 07:39:20PM +0100, Pali Roh?r wrote:
> On Wednesday 01 January 2020 13:10:54 Theodore Y. Ts'o wrote:
> > On Tue, Dec 31, 2019 at 04:54:18PM -0600, Eric Sandeen wrote:
> > > > Because I was not able to find any documentation for it, what is format
> > > > of passed buffer... null-term string? fixed-length? and in which
> > > > encoding? utf-8? latin1? utf-16? or filesystem dependent?
> > >
> > > It simply copies the bits from the memory location you pass in, it knows
> > > nothing of encodings.
> > >
> > > For the most part it's up to the filesystem's own utilities to do any
> > > interpretation of the resulting bits on disk, null-terminating maximal-length
> > > label strings, etc.
> >
> > I'm not sure this is going to be the best API design choice. The
> > blkid library interprets the on disk format for each file syustem
> > knowing what is the "native" format for that particular file system.
> > This is mainly an issue only for the non-Linux file systems; for the
> > Linux file system, the party line has historically been that we don't
> > get involved with character encoding, but in practice, what that has
> > evolved into is that userspace has standardized on UTF-8, and that's
> > what we pass into the kernel from userspace by convention.
> >
> > But the problem is that if the goal is to make FS_IOC_GETFSLABEL and
> > FS_IOC_SETFSLABEL work without the calling program knowing what file
> > system type a particular pathname happens to be, then it would be
> > easist for the userspace program if it can expect that it can always
> > pass in a null-terminated UTF-8 string, and get back a null-terminated
> > UTF-8. I bet that in practice, that is what most userspace programs
> > are going to be do anyway, since it works that way for all other file
> > system syscalls.

"Null terminated sequence of bytes*" is more or less what xfsprogs do,
and it looks like btrfs does that as well.

(* with the idiotic exception that if the label is exactly 256 bytes long
then the array is not required to have a null terminator, because btrfs
encoded that quirk of their ondisk format into the API. <grumble>)

So for VFAT, I think you can use the same code that does the name
encoding transformations for iocharset= to handle labels, right?

> > So for a file system which is a non-Linux-native file system, if it
> > happens to store the its label using utf-16, or some other
> > Windows-system-silliness, it would work a lot better if it assumed
> > that it was passed in utf-8, and stored in the the Windows file system
> > using whatever crazy encoding Windows wants to use. Otherwise, why
> > bother uplifting the ioctl to one which is file system independent, if
> > the paramters are defined to be file system *dependent*?
>
> Exactly. In another email I wrote that for those non-Linux-native
> filesystem could be used encoding specified in iocharset= mount
> parameter. I think it is better as usage of one fixing encoding (e.g.
> UTF-8) if other filesystem strings are propagated to userspace in other
> encoding (as specified by iocharset=).

I'm confused by this statement... but I think we're saying the same
thing?

--D

>
> --
> Pali Roh?r
> [email protected]


2020-01-02 22:08:58

by Pali Rohár

[permalink] [raw]
Subject: Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL

On Thursday 02 January 2020 13:57:54 Darrick J. Wong wrote:
> On Wed, Jan 01, 2020 at 07:39:20PM +0100, Pali Rohár wrote:
> > On Wednesday 01 January 2020 13:10:54 Theodore Y. Ts'o wrote:
> > > On Tue, Dec 31, 2019 at 04:54:18PM -0600, Eric Sandeen wrote:
> > > > > Because I was not able to find any documentation for it, what is format
> > > > > of passed buffer... null-term string? fixed-length? and in which
> > > > > encoding? utf-8? latin1? utf-16? or filesystem dependent?
> > > >
> > > > It simply copies the bits from the memory location you pass in, it knows
> > > > nothing of encodings.
> > > >
> > > > For the most part it's up to the filesystem's own utilities to do any
> > > > interpretation of the resulting bits on disk, null-terminating maximal-length
> > > > label strings, etc.
> > >
> > > I'm not sure this is going to be the best API design choice. The
> > > blkid library interprets the on disk format for each file syustem
> > > knowing what is the "native" format for that particular file system.
> > > This is mainly an issue only for the non-Linux file systems; for the
> > > Linux file system, the party line has historically been that we don't
> > > get involved with character encoding, but in practice, what that has
> > > evolved into is that userspace has standardized on UTF-8, and that's
> > > what we pass into the kernel from userspace by convention.
> > >
> > > But the problem is that if the goal is to make FS_IOC_GETFSLABEL and
> > > FS_IOC_SETFSLABEL work without the calling program knowing what file
> > > system type a particular pathname happens to be, then it would be
> > > easist for the userspace program if it can expect that it can always
> > > pass in a null-terminated UTF-8 string, and get back a null-terminated
> > > UTF-8. I bet that in practice, that is what most userspace programs
> > > are going to be do anyway, since it works that way for all other file
> > > system syscalls.
>
> "Null terminated sequence of bytes*" is more or less what xfsprogs do,
> and it looks like btrfs does that as well.
>
> (* with the idiotic exception that if the label is exactly 256 bytes long
> then the array is not required to have a null terminator, because btrfs
> encoded that quirk of their ondisk format into the API. <grumble>)
>
> So for VFAT, I think you can use the same code that does the name
> encoding transformations for iocharset= to handle labels, right?

Yes I can! But I need to process also codepage= transformation (details
in email <20191228200523.eaxpwxkpswzuihow@pali>). And I already have
this implementation in progress.

> > > So for a file system which is a non-Linux-native file system, if it
> > > happens to store the its label using utf-16, or some other
> > > Windows-system-silliness, it would work a lot better if it assumed
> > > that it was passed in utf-8, and stored in the the Windows file system
> > > using whatever crazy encoding Windows wants to use. Otherwise, why
> > > bother uplifting the ioctl to one which is file system independent, if
> > > the paramters are defined to be file system *dependent*?
> >
> > Exactly. In another email I wrote that for those non-Linux-native
> > filesystem could be used encoding specified in iocharset= mount
> > parameter. I think it is better as usage of one fixing encoding (e.g.
> > UTF-8) if other filesystem strings are propagated to userspace in other
> > encoding (as specified by iocharset=).
>
> I'm confused by this statement... but I think we're saying the same
> thing?

Theodore suggested to use UTF-8 encoding for FS_IOC_GETFSLABEL. And I
suggested to use iocharset= encoding for FS_IOC_GETFSLABEL. You said to
use for VFAT "same code that does the name encoding", so if I'm
understanding correctly, yes it is the same thing (as VFAT use
iocharset= and codepage= mount options for name encoding). Right?

--
Pali Rohár
[email protected]


Attachments:
(No filename) (3.88 kB)
signature.asc (201.00 B)
Download all attachments

2020-01-07 23:16:36

by Darrick J. Wong

[permalink] [raw]
Subject: Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL

On Thu, Jan 02, 2020 at 11:08:00PM +0100, Pali Roh?r wrote:
> On Thursday 02 January 2020 13:57:54 Darrick J. Wong wrote:
> > On Wed, Jan 01, 2020 at 07:39:20PM +0100, Pali Roh?r wrote:
> > > On Wednesday 01 January 2020 13:10:54 Theodore Y. Ts'o wrote:
> > > > On Tue, Dec 31, 2019 at 04:54:18PM -0600, Eric Sandeen wrote:
> > > > > > Because I was not able to find any documentation for it, what is format
> > > > > > of passed buffer... null-term string? fixed-length? and in which
> > > > > > encoding? utf-8? latin1? utf-16? or filesystem dependent?
> > > > >
> > > > > It simply copies the bits from the memory location you pass in, it knows
> > > > > nothing of encodings.
> > > > >
> > > > > For the most part it's up to the filesystem's own utilities to do any
> > > > > interpretation of the resulting bits on disk, null-terminating maximal-length
> > > > > label strings, etc.
> > > >
> > > > I'm not sure this is going to be the best API design choice. The
> > > > blkid library interprets the on disk format for each file syustem
> > > > knowing what is the "native" format for that particular file system.
> > > > This is mainly an issue only for the non-Linux file systems; for the
> > > > Linux file system, the party line has historically been that we don't
> > > > get involved with character encoding, but in practice, what that has
> > > > evolved into is that userspace has standardized on UTF-8, and that's
> > > > what we pass into the kernel from userspace by convention.
> > > >
> > > > But the problem is that if the goal is to make FS_IOC_GETFSLABEL and
> > > > FS_IOC_SETFSLABEL work without the calling program knowing what file
> > > > system type a particular pathname happens to be, then it would be
> > > > easist for the userspace program if it can expect that it can always
> > > > pass in a null-terminated UTF-8 string, and get back a null-terminated
> > > > UTF-8. I bet that in practice, that is what most userspace programs
> > > > are going to be do anyway, since it works that way for all other file
> > > > system syscalls.
> >
> > "Null terminated sequence of bytes*" is more or less what xfsprogs do,
> > and it looks like btrfs does that as well.
> >
> > (* with the idiotic exception that if the label is exactly 256 bytes long
> > then the array is not required to have a null terminator, because btrfs
> > encoded that quirk of their ondisk format into the API. <grumble>)
> >
> > So for VFAT, I think you can use the same code that does the name
> > encoding transformations for iocharset= to handle labels, right?
>
> Yes I can! But I need to process also codepage= transformation (details
> in email <20191228200523.eaxpwxkpswzuihow@pali>). And I already have
> this implementation in progress.

<nod>

> > > > So for a file system which is a non-Linux-native file system, if it
> > > > happens to store the its label using utf-16, or some other
> > > > Windows-system-silliness, it would work a lot better if it assumed
> > > > that it was passed in utf-8, and stored in the the Windows file system
> > > > using whatever crazy encoding Windows wants to use. Otherwise, why
> > > > bother uplifting the ioctl to one which is file system independent, if
> > > > the paramters are defined to be file system *dependent*?
> > >
> > > Exactly. In another email I wrote that for those non-Linux-native
> > > filesystem could be used encoding specified in iocharset= mount
> > > parameter. I think it is better as usage of one fixing encoding (e.g.
> > > UTF-8) if other filesystem strings are propagated to userspace in other
> > > encoding (as specified by iocharset=).
> >
> > I'm confused by this statement... but I think we're saying the same
> > thing?
>
> Theodore suggested to use UTF-8 encoding for FS_IOC_GETFSLABEL. And I
> suggested to use iocharset= encoding for FS_IOC_GETFSLABEL. You said to
> use for VFAT "same code that does the name encoding", so if I'm
> understanding correctly, yes it is the same thing (as VFAT use
> iocharset= and codepage= mount options for name encoding). Right?

Right.

--D

> --
> Pali Roh?r
> [email protected]