Hi,
I came across an interesting issue recently.
Suppose you have a filesystem which supports case-insensitive file names
(like VFAT, but in this particular case 'NSS' - Novell Storage Services).
And support you export some subdirectory (or sub-volume) using a
non-canonical name.
i.e. you "mkdir /path/export", the put "/path/EXPORT" in /etc/exports and
mount server:/path/EXPORT from some client.
This all works until the export cache times out and the kernel asks mountd
if "/path/export" is exported. mountd says "no" and suddenly all accesses
fail.
I don't think much of case-insensitive file names, but I suspect we should
either make this work, it issue a warning as to why it is failing.
A simple work around is to export the canonical name and use it when
mounting. But if the sysadmin doesn't know they need to, they are unlikely
to guess.
I don't think there is any API to test if a name is canonical, or to get the
canonical name, so I cannot see any way in advance to see if this problem
situation has arisen. So the only option I can think of is to fix it.
The following patch (or something much like it for an older nfs-utils) seems
to do the trick.
What do people think? Is this a reasonable thing to do? Is it likely to
have negative consequences that I haven't thought of?
Thanks,
NeilBrown
diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
index 9a1bb2767ac2..2d91db76b867 100644
--- a/utils/mountd/cache.c
+++ b/utils/mountd/cache.c
@@ -377,6 +377,28 @@ static char *next_mnt(void **v, char *p)
return me->mnt_dir;
}
+static int same_path(char *child, char *parent, int len)
+{
+ static char p[PATH_MAX];
+ struct stat sc, sp;
+ if (len <= 0)
+ len = strlen(child);
+ strncpy(p, child, len);
+ p[len] = 0;
+ if (strcmp(p, parent) == 0)
+ return 1;
+
+ if (lstat(p, &sc) != 0)
+ return 0;
+ if (lstat(parent, &sp) != 0)
+ return 0;
+ if (sc.st_dev != sp.st_dev)
+ return 0;
+ if (sc.st_ino != sp.st_ino)
+ return 0;
+ return 1;
+}
+
static int is_subdirectory(char *child, char *parent)
{
/* Check is child is strictly a subdirectory of
@@ -387,7 +409,7 @@ static int is_subdirectory(char *child, char *parent)
if (strcmp(parent, "/") == 0 && child[1] != 0)
return 1;
- return (strncmp(child, parent, l) == 0 && child[l] == '/');
+ return (same_path(child, parent, l) && child[l] == '/');
}
static int path_matches(nfs_export *exp, char *path)
@@ -396,7 +418,7 @@ static int path_matches(nfs_export *exp, char *path)
* exact match, or does the export have CROSSMOUNT, and path
* is a descendant?
*/
- return strcmp(path, exp->m_export.e_path) == 0
+ return same_path(path, exp->m_export.e_path, 0)
|| ((exp->m_export.e_flags & NFSEXP_CROSSMOUNT)
&& is_subdirectory(path, exp->m_export.e_path));
}
On Thu, 3 Apr 2014 17:36:10 -0400 "J. Bruce Fields" <[email protected]>
wrote:
> (BTW: I also noticed the other day that systemd is calling
> name_to_handle_at to get a mount id. Seems like overkill in both
> cases--shouldn't there be a simpler way to get just the mount id?)
Yes, it is called "xstat" and is an extensible version of stat which even
includes a 'query' bitmap to list which things you want.
Unfortunately it never made it to mainline. We should submit a patch to
systemd to get it to use xstat instead of name_to_handle_at. That would
ensure that the kernel implemented it double-quick :-)
NeilBrown
On Fri, Apr 04, 2014 at 08:21:54AM +1100, NeilBrown wrote:
> On Thu, 3 Apr 2014 13:24:31 -0400 "J. Bruce Fields" <[email protected]>
> wrote:
>
> > On Thu, Apr 03, 2014 at 01:16:25PM -0400, J. Bruce Fields wrote:
> > > In the presence of bind mounts it's definitely possible for two
> > > different paths to reach the same directory.
> > >
> > > I don't know if that's actually a problem here....
> >
> > Without testing: if we've got the same filesystem mounted at /foo/bar
> > and /foo/baz, but only export /foo/bar, will this change make that
> > export show up at /foo/baz as well?
> >
> > (Since we'll now consider /foo/bar and /foo/baz to match where
> > previously they didn't?)
>
> Good question. I should test, but I suspect "yes". I'm not really happy
> about that.
Yeah.
> Maybe we could use name_to_handle_at(). That returns the mnt_id
> which is different for different bind mounts.
> So if the mnt_id and the handle are the same, it is the same directory. If
> not, then not.
>
> I did worry a bit about all the extra stat calls, but as you say: they are
> cached so it shouldn't be a big problem.
>
> It would make sense to stat (or name_to_handle_at) the name from the kernel
> just once, then compare the value against everything in the export table.
>
> We could possible store handles in the export table, but then we would need
> to check for changes to mountinfo (I think 'poll' can do that) and clear out
> the cache whenever a mount changes.
>
> I might play with some code if I find time...
Sounds reasonable. But also getting a bit more involved for an uncommon
case (path to the root on a case-insensitive filesystem).
--b.
(BTW: I also noticed the other day that systemd is calling
name_to_handle_at to get a mount id. Seems like overkill in both
cases--shouldn't there be a simpler way to get just the mount id?)
On Fri, Apr 04, 2014 at 09:13:24AM +1100, NeilBrown wrote:
> On Thu, 3 Apr 2014 17:36:10 -0400 "J. Bruce Fields" <[email protected]>
> wrote:
>
> > (BTW: I also noticed the other day that systemd is calling
> > name_to_handle_at to get a mount id. Seems like overkill in both
> > cases--shouldn't there be a simpler way to get just the mount id?)
>
> Yes, it is called "xstat" and is an extensible version of stat which even
> includes a 'query' bitmap to list which things you want.
Yes.
I forget that encoding the filehandle is typically just reading inode
and generation number and encoding the results in a funny way, so it's
not particularly heavy-weight.
But lumping the two together does mean that your request can fail just
because the filesystem doesn't support nfs exports.
> Unfortunately it never made it to mainline. We should submit a patch to
> systemd to get it to use xstat instead of name_to_handle_at. That would
> ensure that the kernel implemented it double-quick :-)
Hm....
--b.
On Thu, Apr 03, 2014 at 07:02:49PM +0300, Boaz Harrosh wrote:
> On 04/03/2014 04:09 PM, J. Bruce Fields wrote:
> > On Thu, Apr 03, 2014 at 04:46:52PM +1100, NeilBrown wrote:
> >> diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
> >> index 9a1bb2767ac2..2d91db76b867 100644
> >> --- a/utils/mountd/cache.c
> >> +++ b/utils/mountd/cache.c
> >> @@ -377,6 +377,28 @@ static char *next_mnt(void **v, char *p)
> >> return me->mnt_dir;
> >> }
> >>
> >> +static int same_path(char *child, char *parent, int len)
> >> +{
> >> + static char p[PATH_MAX];
> >> + struct stat sc, sp;
> >> + if (len <= 0)
> >> + len = strlen(child);
> >> + strncpy(p, child, len);
> >> + p[len] = 0;
> >> + if (strcmp(p, parent) == 0)
> >> + return 1;
> >> +
>
> Addressing Bruce concern, perhaps you would like to
> also stricmp the names, so to not lstat on all export
> entries.
If we try to guess the filesystem's case rules we'll go mad.
> + if (stricmp(p, parent) =! 0)
> + return 0;
>
> >> + if (lstat(p, &sc) != 0)
> >> + return 0;
> >> + if (lstat(parent, &sp) != 0)
> >> + return 0;
> >> + if (sc.st_dev != sp.st_dev)
> >> + return 0;
> >> + if (sc.st_ino != sp.st_ino)
> >> + return 0;
>
> Hard links on directories any one ? ;-)
In the presence of bind mounts it's definitely possible for two
different paths to reach the same directory.
I don't know if that's actually a problem here....
--b.
On Thu, Apr 03, 2014 at 04:46:52PM +1100, NeilBrown wrote:
>
> Hi,
> I came across an interesting issue recently.
>
> Suppose you have a filesystem which supports case-insensitive file names
> (like VFAT, but in this particular case 'NSS' - Novell Storage Services).
>
> And support you export some subdirectory (or sub-volume) using a
> non-canonical name.
> i.e. you "mkdir /path/export", the put "/path/EXPORT" in /etc/exports and
> mount server:/path/EXPORT from some client.
>
> This all works until the export cache times out and the kernel asks mountd
> if "/path/export" is exported. mountd says "no" and suddenly all accesses
> fail.
>
> I don't think much of case-insensitive file names, but I suspect we should
> either make this work, it issue a warning as to why it is failing.
> A simple work around is to export the canonical name and use it when
> mounting. But if the sysadmin doesn't know they need to, they are unlikely
> to guess.
>
> I don't think there is any API to test if a name is canonical, or to get the
> canonical name, so I cannot see any way in advance to see if this problem
> situation has arisen. So the only option I can think of is to fix it.
>
> The following patch (or something much like it for an older nfs-utils) seems
> to do the trick.
>
> What do people think? Is this a reasonable thing to do? Is it likely to
> have negative consequences that I haven't thought of?
Neat-o.
Are we adding a stat of every export in the export table where there
wasn't one before? But those should typically be cached, so maybe
there's no problem.
I'm not super-fond of the combining the comparison of two paths and
truncating of one into the same function, though I see why it's
convenient.
--b.
>
> Thanks,
> NeilBrown
>
>
>
> diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
> index 9a1bb2767ac2..2d91db76b867 100644
> --- a/utils/mountd/cache.c
> +++ b/utils/mountd/cache.c
> @@ -377,6 +377,28 @@ static char *next_mnt(void **v, char *p)
> return me->mnt_dir;
> }
>
> +static int same_path(char *child, char *parent, int len)
> +{
> + static char p[PATH_MAX];
> + struct stat sc, sp;
> + if (len <= 0)
> + len = strlen(child);
> + strncpy(p, child, len);
> + p[len] = 0;
> + if (strcmp(p, parent) == 0)
> + return 1;
> +
> + if (lstat(p, &sc) != 0)
> + return 0;
> + if (lstat(parent, &sp) != 0)
> + return 0;
> + if (sc.st_dev != sp.st_dev)
> + return 0;
> + if (sc.st_ino != sp.st_ino)
> + return 0;
> + return 1;
> +}
> +
> static int is_subdirectory(char *child, char *parent)
> {
> /* Check is child is strictly a subdirectory of
> @@ -387,7 +409,7 @@ static int is_subdirectory(char *child, char *parent)
> if (strcmp(parent, "/") == 0 && child[1] != 0)
> return 1;
>
> - return (strncmp(child, parent, l) == 0 && child[l] == '/');
> + return (same_path(child, parent, l) && child[l] == '/');
> }
>
> static int path_matches(nfs_export *exp, char *path)
> @@ -396,7 +418,7 @@ static int path_matches(nfs_export *exp, char *path)
> * exact match, or does the export have CROSSMOUNT, and path
> * is a descendant?
> */
> - return strcmp(path, exp->m_export.e_path) == 0
> + return same_path(path, exp->m_export.e_path, 0)
> || ((exp->m_export.e_flags & NFSEXP_CROSSMOUNT)
> && is_subdirectory(path, exp->m_export.e_path));
> }
On Thu, Apr 03, 2014 at 01:16:25PM -0400, J. Bruce Fields wrote:
> In the presence of bind mounts it's definitely possible for two
> different paths to reach the same directory.
>
> I don't know if that's actually a problem here....
Without testing: if we've got the same filesystem mounted at /foo/bar
and /foo/baz, but only export /foo/bar, will this change make that
export show up at /foo/baz as well?
(Since we'll now consider /foo/bar and /foo/baz to match where
previously they didn't?)
--b.
On 04/03/2014 04:09 PM, J. Bruce Fields wrote:
> On Thu, Apr 03, 2014 at 04:46:52PM +1100, NeilBrown wrote:
>>
>> Hi,
>> I came across an interesting issue recently.
>>
>> Suppose you have a filesystem which supports case-insensitive file names
>> (like VFAT, but in this particular case 'NSS' - Novell Storage Services).
>>
>> And support you export some subdirectory (or sub-volume) using a
>> non-canonical name.
>> i.e. you "mkdir /path/export", the put "/path/EXPORT" in /etc/exports and
>> mount server:/path/EXPORT from some client.
>>
>> This all works until the export cache times out and the kernel asks mountd
>> if "/path/export" is exported. mountd says "no" and suddenly all accesses
>> fail.
>>
>> I don't think much of case-insensitive file names, but I suspect we should
>> either make this work, it issue a warning as to why it is failing.
>> A simple work around is to export the canonical name and use it when
>> mounting. But if the sysadmin doesn't know they need to, they are unlikely
>> to guess.
>>
>> I don't think there is any API to test if a name is canonical, or to get the
>> canonical name, so I cannot see any way in advance to see if this problem
>> situation has arisen. So the only option I can think of is to fix it.
>>
>> The following patch (or something much like it for an older nfs-utils) seems
>> to do the trick.
>>
>> What do people think? Is this a reasonable thing to do? Is it likely to
>> have negative consequences that I haven't thought of?
>
> Neat-o.
>
> Are we adding a stat of every export in the export table where there
> wasn't one before? But those should typically be cached, so maybe
> there's no problem.
>
> I'm not super-fond of the combining the comparison of two paths and
> truncating of one into the same function, though I see why it's
> convenient.
>
> --b.
>
>>
>> Thanks,
>> NeilBrown
>>
>>
>>
>> diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
>> index 9a1bb2767ac2..2d91db76b867 100644
>> --- a/utils/mountd/cache.c
>> +++ b/utils/mountd/cache.c
>> @@ -377,6 +377,28 @@ static char *next_mnt(void **v, char *p)
>> return me->mnt_dir;
>> }
>>
>> +static int same_path(char *child, char *parent, int len)
>> +{
>> + static char p[PATH_MAX];
>> + struct stat sc, sp;
>> + if (len <= 0)
>> + len = strlen(child);
>> + strncpy(p, child, len);
>> + p[len] = 0;
>> + if (strcmp(p, parent) == 0)
>> + return 1;
>> +
Addressing Bruce concern, perhaps you would like to
also stricmp the names, so to not lstat on all export
entries.
+ if (stricmp(p, parent) =! 0)
+ return 0;
>> + if (lstat(p, &sc) != 0)
>> + return 0;
>> + if (lstat(parent, &sp) != 0)
>> + return 0;
>> + if (sc.st_dev != sp.st_dev)
>> + return 0;
>> + if (sc.st_ino != sp.st_ino)
>> + return 0;
Hard links on directories any one ? ;-)
Thanks
Boaz
>> + return 1;
>> +}
>> +
>> static int is_subdirectory(char *child, char *parent)
>> {
>> /* Check is child is strictly a subdirectory of
>> @@ -387,7 +409,7 @@ static int is_subdirectory(char *child, char *parent)
>> if (strcmp(parent, "/") == 0 && child[1] != 0)
>> return 1;
>>
>> - return (strncmp(child, parent, l) == 0 && child[l] == '/');
>> + return (same_path(child, parent, l) && child[l] == '/');
>> }
>>
>> static int path_matches(nfs_export *exp, char *path)
>> @@ -396,7 +418,7 @@ static int path_matches(nfs_export *exp, char *path)
>> * exact match, or does the export have CROSSMOUNT, and path
>> * is a descendant?
>> */
>> - return strcmp(path, exp->m_export.e_path) == 0
>> + return same_path(path, exp->m_export.e_path, 0)
>> || ((exp->m_export.e_flags & NFSEXP_CROSSMOUNT)
>> && is_subdirectory(path, exp->m_export.e_path));
>> }
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On Thu, 3 Apr 2014 13:24:31 -0400 "J. Bruce Fields" <[email protected]>
wrote:
> On Thu, Apr 03, 2014 at 01:16:25PM -0400, J. Bruce Fields wrote:
> > In the presence of bind mounts it's definitely possible for two
> > different paths to reach the same directory.
> >
> > I don't know if that's actually a problem here....
>
> Without testing: if we've got the same filesystem mounted at /foo/bar
> and /foo/baz, but only export /foo/bar, will this change make that
> export show up at /foo/baz as well?
>
> (Since we'll now consider /foo/bar and /foo/baz to match where
> previously they didn't?)
Good question. I should test, but I suspect "yes". I'm not really happy
about that.
Maybe we could use name_to_handle_at(). That returns the mnt_id
which is different for different bind mounts.
So if the mnt_id and the handle are the same, it is the same directory. If
not, then not.
I did worry a bit about all the extra stat calls, but as you say: they are
cached so it shouldn't be a big problem.
It would make sense to stat (or name_to_handle_at) the name from the kernel
just once, then compare the value against everything in the export table.
We could possible store handles in the export table, but then we would need
to check for changes to mountinfo (I think 'poll' can do that) and clear out
the cache whenever a mount changes.
I might play with some code if I find time...
Thanks,
NeilBrown