I discovered this problem by accident while doing some testing of the
Ganesha user space server. It was producing garbage fileids that
happened to have bit 31 set (0x80000000).
The telldir special test from cthon04 would fail. Investigating, I found
that it appeared that the getdents() was returning EOVERFLOW. It wasn't
too hard to track that down to the following code:
static int fillonedir(void * __buf, const char * name, int namlen, loff_t offset,
u64 ino, unsigned int d_type)
{
struct readdir_callback * buf = (struct readdir_callback *) __buf;
struct old_linux_dirent __user * dirent;
unsigned long d_ino;
if (buf->result)
return -EINVAL;
d_ino = ino;
if (sizeof(d_ino) < sizeof(ino) && d_ino != ino)
return -EOVERFLOW;
It took adding some debug code to track the problem down to this
function:
u64 nfs_compat_user_ino64(u64 fileid)
{
int ino;
if (enable_ino64)
return fileid;
ino = fileid;
if (sizeof(ino) < sizeof(fileid))
ino ^= fileid >> (sizeof(fileid)-sizeof(ino)) * 8;
return ino;
}
In trying to reduce a 64 bit fileid to 32 bits, it produces a SIGNED 32
bit int! When this is passed to fillonedir as a uint64, a negative
number is sign extended. bit 31 of ino will be set if bit 31 OR bit 63
(but not both) is set in the fileid.
Turns out the fix is simple! Change ino to an unsigned int.
In order to test my fix in an orderly fashion, I used a simple process
to modify the fileids produced by the kernel server:
u64 warp_fileid(u64 fileid)
{
return (fileid & 0xffffffff7fffffefLL) | ((fileid & 0x10LL) << 27) | ((fileid &0x80000000LL) >> 27);
}
This means that every 16 inode numbers, bit 31 will be flipped,
producing plenty of problem fileids. The telldir test case fails with
this hacked kernel server.
Of course if anyone has a real file system with > 2G inodes, they could
see the problem for real, but I don't have a big enough file system...
Signed-off-by: Frank Filz <[email protected]>
---
diff -X ignore.patcher -ruNp linux-2.6.18-194.el5/fs/nfs/inode.c linux-2.6.18-194.ff/fs/nfs/inode.c
--- linux-2.6.18-194.el5/fs/nfs/inode.c 2010-12-01 15:52:11.000000000 -0800
+++ linux-2.6.18-194.ff/fs/nfs/inode.c 2010-12-01 16:53:28.000000000 -0800
@@ -71,7 +71,7 @@ static kmem_cache_t * nfs_inode_cachep;
*/
u64 nfs_compat_user_ino64(u64 fileid)
{
- int ino;
+ unsigned int ino;
if (enable_ino64)
return fileid;
On Wed, 2010-12-01 at 22:13 -0500, Trond Myklebust wrote:
> I'm suggesting it should rather match the compat_ulong_t, since that is
> what 64-bit kernels will need to deal with if running a 32-bit
> userspace. For 64-bit kernels that have no 32-bit userspace emulation
> layer, why would we care about returning a truncated 32-bit fileid?
>
> Conversely, if running a 32-bit kernel, then 'unsigned long' will
> directly match the types used by fs/readdir.c:filldir()
Ah, got it now... Sorry for denseness...
Hmm, what would be ideal is for nfs_compat_user_ino64() to know if it's
value will be passed to compat_filldir or filldir. If we use
compat_ulong_t when CONFIG_COMPAT is defined, and enable_ino64 = 0, it
will always reduce the fileid to 32 bits, even if being passed to
filldir on a 64 bit machine which is prepared to take a 64 bit fileid.
On the other hand, if enable_ino64 = 1, then fileids >= 2^32 will cause
an error if a 32 bit user space application is being invoked.
With enable_ino64 = 1 on a machine otherwise enabled for 64 bit inode
numbers, it will definitely work like any local file system with 64 bit
inode numebers.
With enable_ino64 = 0, it will always force fileid into 32 bits if there
is any 32 bit user space, which I suppose is good behavior.
I will submit a revised patch.
Thanks
Frank
On Wed, 2010-12-01 at 20:36 -0500, Trond Myklebust wrote:
> On Wed, 2010-12-01 at 17:03 -0800, Frank Filz wrote:
> > Signed-off-by: Frank Filz <[email protected]>
> > ---
> > diff -X ignore.patcher -ruNp linux-2.6.18-194.el5/fs/nfs/inode.c linux-2.6.18-194.ff/fs/nfs/inode.c
> > --- linux-2.6.18-194.el5/fs/nfs/inode.c 2010-12-01 15:52:11.000000000 -0800
> > +++ linux-2.6.18-194.ff/fs/nfs/inode.c 2010-12-01 16:53:28.000000000 -0800
> > @@ -71,7 +71,7 @@ static kmem_cache_t * nfs_inode_cachep;
> > */
> > u64 nfs_compat_user_ino64(u64 fileid)
> > {
> > - int ino;
> > + unsigned int ino;
>
> Shouldn't this just be of type 'compat_ulong_t' if CONFIG_COMPAT is
> defined, and of type 'unsigned long' if not?
The full (patched) function is:
u64 nfs_compat_user_ino64(u64 fileid)
{
unsigned int ino;
if (enable_ino64)
return fileid;
ino = fileid;
if (sizeof(ino) < sizeof(fileid))
ino ^= fileid >> (sizeof(fileid)-sizeof(ino)) * 8;
return ino;
}
ino is only used if the function is expected to return a 32 bit fileid,
so no need for it to be anything other than an unsigned int. I suppose
it should actually be a uint32.
Frank
On Wed, 2010-12-01 at 19:01 -0800, Frank Filz wrote:
> On Wed, 2010-12-01 at 20:36 -0500, Trond Myklebust wrote:
> > On Wed, 2010-12-01 at 17:03 -0800, Frank Filz wrote:
> > > Signed-off-by: Frank Filz <[email protected]>
> > > ---
> > > diff -X ignore.patcher -ruNp linux-2.6.18-194.el5/fs/nfs/inode.c linux-2.6.18-194.ff/fs/nfs/inode.c
> > > --- linux-2.6.18-194.el5/fs/nfs/inode.c 2010-12-01 15:52:11.000000000 -0800
> > > +++ linux-2.6.18-194.ff/fs/nfs/inode.c 2010-12-01 16:53:28.000000000 -0800
> > > @@ -71,7 +71,7 @@ static kmem_cache_t * nfs_inode_cachep;
> > > */
> > > u64 nfs_compat_user_ino64(u64 fileid)
> > > {
> > > - int ino;
> > > + unsigned int ino;
> >
> > Shouldn't this just be of type 'compat_ulong_t' if CONFIG_COMPAT is
> > defined, and of type 'unsigned long' if not?
>
> The full (patched) function is:
>
> u64 nfs_compat_user_ino64(u64 fileid)
> {
> unsigned int ino;
>
> if (enable_ino64)
> return fileid;
> ino = fileid;
> if (sizeof(ino) < sizeof(fileid))
> ino ^= fileid >> (sizeof(fileid)-sizeof(ino)) * 8;
> return ino;
> }
>
> ino is only used if the function is expected to return a 32 bit fileid,
> so no need for it to be anything other than an unsigned int. I suppose
> it should actually be a uint32.
I'm suggesting it should rather match the compat_ulong_t, since that is
what 64-bit kernels will need to deal with if running a 32-bit
userspace. For 64-bit kernels that have no 32-bit userspace emulation
layer, why would we care about returning a truncated 32-bit fileid?
Conversely, if running a 32-bit kernel, then 'unsigned long' will
directly match the types used by fs/readdir.c:filldir()
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com
On Wed, 2010-12-01 at 17:03 -0800, Frank Filz wrote:
>
>
> I discovered this problem by accident while doing some testing of the
> Ganesha user space server. It was producing garbage fileids that
> happened to have bit 31 set (0x80000000).
>
> The telldir special test from cthon04 would fail. Investigating, I found
> that it appeared that the getdents() was returning EOVERFLOW. It wasn't
> too hard to track that down to the following code:
>
> static int fillonedir(void * __buf, const char * name, int namlen, loff_t offset,
> u64 ino, unsigned int d_type)
> {
> struct readdir_callback * buf = (struct readdir_callback *) __buf;
> struct old_linux_dirent __user * dirent;
> unsigned long d_ino;
>
> if (buf->result)
> return -EINVAL;
> d_ino = ino;
> if (sizeof(d_ino) < sizeof(ino) && d_ino != ino)
> return -EOVERFLOW;
>
> It took adding some debug code to track the problem down to this
> function:
>
> u64 nfs_compat_user_ino64(u64 fileid)
> {
> int ino;
>
> if (enable_ino64)
> return fileid;
> ino = fileid;
> if (sizeof(ino) < sizeof(fileid))
> ino ^= fileid >> (sizeof(fileid)-sizeof(ino)) * 8;
> return ino;
> }
>
> In trying to reduce a 64 bit fileid to 32 bits, it produces a SIGNED 32
> bit int! When this is passed to fillonedir as a uint64, a negative
> number is sign extended. bit 31 of ino will be set if bit 31 OR bit 63
> (but not both) is set in the fileid.
>
> Turns out the fix is simple! Change ino to an unsigned int.
>
> In order to test my fix in an orderly fashion, I used a simple process
> to modify the fileids produced by the kernel server:
>
> u64 warp_fileid(u64 fileid)
> {
> return (fileid & 0xffffffff7fffffefLL) | ((fileid & 0x10LL) << 27) | ((fileid &0x80000000LL) >> 27);
> }
>
> This means that every 16 inode numbers, bit 31 will be flipped,
> producing plenty of problem fileids. The telldir test case fails with
> this hacked kernel server.
>
> Of course if anyone has a real file system with > 2G inodes, they could
> see the problem for real, but I don't have a big enough file system...
>
>
>
>
>
> Signed-off-by: Frank Filz <[email protected]>
> ---
> diff -X ignore.patcher -ruNp linux-2.6.18-194.el5/fs/nfs/inode.c linux-2.6.18-194.ff/fs/nfs/inode.c
> --- linux-2.6.18-194.el5/fs/nfs/inode.c 2010-12-01 15:52:11.000000000 -0800
> +++ linux-2.6.18-194.ff/fs/nfs/inode.c 2010-12-01 16:53:28.000000000 -0800
> @@ -71,7 +71,7 @@ static kmem_cache_t * nfs_inode_cachep;
> */
> u64 nfs_compat_user_ino64(u64 fileid)
> {
> - int ino;
> + unsigned int ino;
Shouldn't this just be of type 'compat_ulong_t' if CONFIG_COMPAT is
defined, and of type 'unsigned long' if not?
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com