2010-06-29 20:03:16

by David Howells

Subject: [PATCH 0/3] Extended file stat functions

Implement a pair of new system calls to provide extended and further extensible
stat functions.

The third of the associated patches provides these new system calls:

struct xstat_dev {
unsigned int major;
unsigned int minor;
};

struct xstat_time {
unsigned long long tv_sec;
unsigned long long tv_nsec;
};

struct xstat {
unsigned int struct_version;
#define XSTAT_STRUCT_VERSION 0
unsigned int st_mode;
unsigned int st_nlink;
unsigned int st_uid;
unsigned int st_gid;
unsigned int st_blksize;
struct xstat_dev st_rdev;
struct xstat_dev st_dev;
unsigned long long st_ino;
unsigned long long st_size;
struct xstat_time st_atime;
struct xstat_time st_mtime;
struct xstat_time st_ctime;
struct xstat_time st_crtime;
unsigned long long st_blocks;
unsigned long long st_inode_version;
unsigned long long st_data_version;
unsigned long long query_flags;
#define XSTAT_QUERY_CREATION_TIME 0x00000001ULL
#define XSTAT_QUERY_INODE_VERSION 0x00000002ULL
#define XSTAT_QUERY_DATA_VERSION 0x00000004ULL
unsigned long long extra_results[0];
};

ssize_t ret = xstat(int dfd,
const char *filename,
unsigned atflag,
struct xstat *buffer,
size_t buflen);

ssize_t ret = fxstat(int fd,
struct xstat *buffer,
size_t buflen);

which are more fully documented in that patch's description.

The bonuses of these new stat functions are:

(1) The fields in the xstat struct are cleaned up. There are no split or
duplicated fields.

(2) Some extra information is made available (file creation time, inode
version number and data version number) where provided by the underlying
filesystem.

These are implemented here for Ext4 and AFS, but could also be provided
for CIFS, NTFS and BtrFS and probably others.

(3) The structure is versioned and extensible, meaning that further new system
calls shouldn't be required.
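
As an illustration of point (1), here is a minimal userspace sketch of how a
packed dev_t might be split into the separate major/minor pair that struct
xstat_dev carries. The helper name to_xstat_dev() is made up for this example;
major(), minor() and makedev() are the glibc macros from <sys/sysmacros.h>,
not the kernel's MAJOR()/MINOR().

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/* Local copy of the structure from the patch description. */
struct xstat_dev {
	unsigned int major;
	unsigned int minor;
};

/* Hypothetical helper: split a userspace dev_t into its two halves so that
 * the caller never has to know how dev_t packs them. */
static struct xstat_dev to_xstat_dev(dev_t dev)
{
	struct xstat_dev d;

	d.major = major(dev);
	d.minor = minor(dev);

	/* Sanity check: the pair must describe the same device number. */
	assert(makedev(d.major, d.minor) == dev);
	return d;
}
```

With this, to_xstat_dev(makedev(8, 6)) yields major 8 and minor 6, which is the
pair the test program would print as "08:06".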

Note that no lstat() equivalent is required as that can be implemented through
xstat() with atflag == AT_SYMLINK_NOFOLLOW; passing atflag == 0 gives stat()
behaviour.
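
The atflag argument is handed to vfs_fstatat() in patch 3, so it should behave
like the flags argument of fstatat(): 0 follows a trailing symlink and
AT_SYMLINK_NOFOLLOW reports on the link itself. The sketch below shows that
difference using fstatat(), since xstat() is only proposed here and has no C
library wrapper. The function names and the demo link path are invented for
the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

/* stat()-like behaviour: follow a trailing symlink (flags == 0). */
static int stat_like(const char *path, struct stat *st)
{
	return fstatat(AT_FDCWD, path, st, 0);
}

/* lstat()-like behaviour: report on the link itself. */
static int lstat_like(const char *path, struct stat *st)
{
	return fstatat(AT_FDCWD, path, st, AT_SYMLINK_NOFOLLOW);
}

/* Demo: create a symlink to "/" at 'linkpath', query it both ways, then
 * remove it.  Returns 0 if the two queries differ in the expected way. */
static int demo_follow_vs_nofollow(const char *linkpath)
{
	struct stat followed, link;
	int ok;

	unlink(linkpath);		/* ignore failure: it may not exist */
	if (symlink("/", linkpath) < 0)
		return -1;

	ok = stat_like(linkpath, &followed) == 0 &&
	     lstat_like(linkpath, &link) == 0 &&
	     S_ISDIR(followed.st_mode) &&	/* saw the target directory */
	     S_ISLNK(link.st_mode);		/* saw the link itself */

	unlink(linkpath);
	return ok ? 0 : -1;
}
```

The test program attached to patch 3 uses the same convention: it defaults to
AT_SYMLINK_NOFOLLOW and switches to 0 when given -L.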


The first patch makes const a bunch of system call userspace string/buffer
arguments. I can then make sys_xstat()'s filename pointer const too (though
the entire first patch is not required for that).

The second patch makes the AFS filesystem use i_generation for the vnode ID
uniquifier rather than i_version, and assigns i_version to hold the AFS data
version number, making them more logical for when I want to get at them from
afs_getattr().


There's a test program attached to the description for patch 3. It can be run
as follows:

[root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/i386/repodata/
xstat(/afs/archive/linuxdev/fedora9/i386/repodata/) = 152
sv=0 qf=6 cr=0.0 iv=7a5 dv=5
Size: 2048 Blocks: 0 IO Block: 4096 directory
Device: 00:13 Inode: 83 Links: 2
Access: (0755/drwxr-xr-x) Uid: 75338 Gid: 0
Access: 2008-11-05 20:00:12.000000000+0000
Modify: 2008-11-05 20:00:12.000000000+0000
Change: 2008-11-05 20:00:12.000000000+0000
Inode version: 7a5h
Data version: 5h


Things that need consideration:

(1) Is it worth retaining the ability to arbitrarily add extra bits onto the
end of the stat buffer? And what's the best way to do this?

I've defined a way that from userspace involves assigning bits in
query_flags to extra results that you might want. But this could instead
be done, say, by just upping the struct version number any time we want to
pass back more information. Alternatively, we could go for a tagged data
method, perhaps using the same format as the recvmsg() control message
field.

If we use tagged data then rather than being selective, we could just
return as many tagged data items as we feel the user might want and we can
cram into the buffer. That could be rather slow, though.

(2) What extra bits of information might we like to see available through the
stat interface? Security labels? NFS file IDs? Xattrs?

If we went for a tagged data method, xstat() could be modified to take a
list of tags as an argument, and could then return arbitrarily-sized
tagged results, including fs-specific stuff.

(3) Does st_blksize really need to be 64 bits on a 64-bit system? Or can it
be 32 bits? Are we really likely to see something with a 4GB+ blocksize?

(4) Should the inode number and data version number fields be 128-bit?
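
To make the tagged-data alternative from (1) and (2) more concrete, here is a
rough sketch of what such a buffer could look like, loosely in the spirit of
the cmsghdr records used by recvmsg(). Nothing below is part of the patches:
the record header, the xtag_* names and the 8-byte padding rule are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; nothing like this exists in the patch. */
struct xtag_hdr {
	uint32_t tag;	/* which item this is */
	uint32_t len;	/* payload bytes, excluding header and padding */
};

#define XTAG_ALIGN(n)	(((n) + 7u) & ~(size_t)7u)

/* Append one record at offset *off; returns 0, or -1 if it does not fit. */
static int xtag_put(unsigned char *buf, size_t buflen, size_t *off,
		    uint32_t tag, const void *data, uint32_t len)
{
	struct xtag_hdr hdr = { tag, len };
	size_t need;

	assert(*off % 8 == 0);
	if (len > buflen || *off > buflen)
		return -1;
	need = XTAG_ALIGN(sizeof(hdr) + len);
	if (need > buflen - *off)
		return -1;

	memset(buf + *off, 0, need);	/* keep padding bytes deterministic */
	memcpy(buf + *off, &hdr, sizeof(hdr));
	if (len)
		memcpy(buf + *off + sizeof(hdr), data, len);
	*off += need;
	return 0;
}

/* Find the first record with 'tag' in buf[0..used).  Returns a pointer to
 * its payload and stores the length, or NULL if absent or malformed. */
static const void *xtag_find(const unsigned char *buf, size_t used,
			     uint32_t tag, uint32_t *len)
{
	size_t off = 0;

	while (off < used && used - off >= sizeof(struct xtag_hdr)) {
		struct xtag_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len > used - off - sizeof(hdr))
			return NULL;	/* truncated record */
		if (hdr.tag == tag) {
			*len = hdr.len;
			return buf + off + sizeof(hdr);
		}
		off += XTAG_ALIGN(sizeof(hdr) + hdr.len);
	}
	return NULL;
}
```

The reader has to walk every record to find one tag, which is one way the
"rather slow" concern in (1) could show up if many items are returned.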

David
---

David Howells (3):
Add a pair of system calls to make extended file stats available
AFS: Use i_generation not i_version for the vnode uniquifier
Mark arguments to certain syscalls as being const


arch/alpha/kernel/osf_sys.c | 6 +
arch/alpha/kernel/process.c | 2
arch/arm/kernel/sys_arm.c | 4 -
arch/arm/kernel/sys_oabi-compat.c | 6 +
arch/avr32/include/asm/syscalls.h | 2
arch/avr32/kernel/process.c | 3 -
arch/blackfin/kernel/process.c | 2
arch/frv/kernel/process.c | 3 -
arch/h8300/kernel/process.c | 2
arch/ia64/include/asm/unistd.h | 2
arch/ia64/kernel/process.c | 2
arch/m32r/kernel/process.c | 3 -
arch/m68k/kernel/process.c | 2
arch/m68knommu/kernel/process.c | 2
arch/microblaze/kernel/sys_microblaze.c | 2
arch/mips/kernel/syscall.c | 2
arch/mn10300/kernel/process.c | 2
arch/parisc/hpux/fs.c | 7 +
arch/powerpc/kernel/process.c | 2
arch/powerpc/kernel/sys_ppc32.c | 2
arch/s390/kernel/compat_linux.c | 10 +-
arch/s390/kernel/compat_linux.h | 10 +-
arch/s390/kernel/entry.h | 2
arch/s390/kernel/process.c | 2
arch/sh/include/asm/syscalls_32.h | 2
arch/sh/include/asm/syscalls_64.h | 2
arch/sh/kernel/process_64.c | 2
arch/sparc/kernel/sys_sparc32.c | 7 +
arch/um/kernel/exec.c | 6 +
arch/um/kernel/internal.h | 2
arch/um/kernel/syscall.c | 2
arch/x86/ia32/sys_ia32.c | 14 +--
arch/x86/include/asm/sys_ia32.h | 12 +-
arch/x86/include/asm/syscalls.h | 2
arch/x86/include/asm/unistd_32.h | 4 +
arch/x86/include/asm/unistd_64.h | 4 +
arch/x86/kernel/entry_64.S | 4 -
arch/x86/kernel/process.c | 2
arch/xtensa/kernel/process.c | 2
fs/afs/dir.c | 8 +-
fs/afs/fsclient.c | 3 -
fs/afs/inode.c | 22 ++--
fs/compat.c | 23 +++--
fs/ext4/ext4.h | 2
fs/ext4/file.c | 2
fs/ext4/inode.c | 27 +++++
fs/ext4/namei.c | 2
fs/ext4/symlink.c | 2
fs/stat.c | 154 ++++++++++++++++++++++++++++---
fs/utimes.c | 7 +
include/linux/compat.h | 6 +
include/linux/fs.h | 6 +
include/linux/stat.h | 46 +++++++++
include/linux/syscalls.h | 25 +++--
include/linux/time.h | 2
55 files changed, 353 insertions(+), 133 deletions(-)


2010-06-29 20:03:31

by David Howells

Subject: [PATCH 2/3] AFS: Use i_generation not i_version for the vnode uniquifier

Store the AFS vnode uniquifier in the i_generation field, not the i_version
field of the inode struct. i_version can then be given the AFS data version
number.
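
The effect of the split can be sketched with a simplified userspace model. The
model_* types below are stand-ins invented for this example (they are not
kernel structures); the comparison mirrors what afs_iget5_test() does after
this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the three struct inode fields involved. */
struct model_inode {
	unsigned long	i_ino;		/* AFS vnode ID */
	uint32_t	i_generation;	/* AFS vnode uniquifier (new home) */
	uint64_t	i_version;	/* AFS data version (new use) */
};

/* Simplified stand-in for the vnode/unique part of an AFS file ID. */
struct model_fid {
	uint32_t vnode;
	uint32_t unique;
};

/* Identity is the (vnode, uniquifier) pair; the data version plays no part
 * in it, so it is free to change as the file contents change. */
static int model_iget5_test(const struct model_inode *inode,
			    const struct model_fid *fid)
{
	return inode->i_ino == fid->vnode &&
	       inode->i_generation == fid->unique;
}
```

In this model a change to i_version leaves the match intact, whereas a
different uniquifier on the same vnode ID is treated as a different file.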

Signed-off-by: David Howells <[email protected]>
---

fs/afs/dir.c | 8 ++++----
fs/afs/fsclient.c | 3 ++-
fs/afs/inode.c | 10 +++++-----
3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index b42d5cc..afb9ff8 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -542,11 +542,11 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
dentry->d_op = &afs_fs_dentry_operations;

d_add(dentry, inode);
- _leave(" = 0 { vn=%u u=%u } -> { ino=%lu v=%llu }",
+ _leave(" = 0 { vn=%u u=%u } -> { ino=%lu v=%u }",
fid.vnode,
fid.unique,
dentry->d_inode->i_ino,
- (unsigned long long)dentry->d_inode->i_version);
+ dentry->d_inode->i_generation);

return NULL;
}
@@ -626,10 +626,10 @@ static int afs_d_revalidate(struct dentry *dentry, struct nameidata *nd)
* been deleted and replaced, and the original vnode ID has
* been reused */
if (fid.unique != vnode->fid.unique) {
- _debug("%s: file deleted (uq %u -> %u I:%llu)",
+ _debug("%s: file deleted (uq %u -> %u I:%u)",
dentry->d_name.name, fid.unique,
vnode->fid.unique,
- (unsigned long long)dentry->d_inode->i_version);
+ dentry->d_inode->i_generation);
spin_lock(&vnode->lock);
set_bit(AFS_VNODE_DELETED, &vnode->flags);
spin_unlock(&vnode->lock);
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 4bd0218..346e328 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -89,7 +89,7 @@ static void xdr_decode_AFSFetchStatus(const __be32 **_bp,
i_size_write(&vnode->vfs_inode, size);
vnode->vfs_inode.i_uid = status->owner;
vnode->vfs_inode.i_gid = status->group;
- vnode->vfs_inode.i_version = vnode->fid.unique;
+ vnode->vfs_inode.i_generation = vnode->fid.unique;
vnode->vfs_inode.i_nlink = status->nlink;

mode = vnode->vfs_inode.i_mode;
@@ -102,6 +102,7 @@ static void xdr_decode_AFSFetchStatus(const __be32 **_bp,
vnode->vfs_inode.i_ctime.tv_sec = status->mtime_server;
vnode->vfs_inode.i_mtime = vnode->vfs_inode.i_ctime;
vnode->vfs_inode.i_atime = vnode->vfs_inode.i_ctime;
+ vnode->vfs_inode.i_version = data_version;
}

expected_version = status->data_version;
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index d00b312..ee3190a 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -73,7 +73,8 @@ static int afs_inode_map_status(struct afs_vnode *vnode, struct key *key)
inode->i_ctime.tv_nsec = 0;
inode->i_atime = inode->i_mtime = inode->i_ctime;
inode->i_blocks = 0;
- inode->i_version = vnode->fid.unique;
+ inode->i_generation = vnode->fid.unique;
+ inode->i_version = vnode->status.data_version;
inode->i_mapping->a_ops = &afs_fs_aops;

/* check to see whether a symbolic link is really a mountpoint */
@@ -98,7 +99,7 @@ static int afs_iget5_test(struct inode *inode, void *opaque)
struct afs_iget_data *data = opaque;

return inode->i_ino == data->fid.vnode &&
- inode->i_version == data->fid.unique;
+ inode->i_generation == data->fid.unique;
}

/*
@@ -110,7 +111,7 @@ static int afs_iget5_set(struct inode *inode, void *opaque)
struct afs_vnode *vnode = AFS_FS_I(inode);

inode->i_ino = data->fid.vnode;
- inode->i_version = data->fid.unique;
+ inode->i_generation = data->fid.unique;
vnode->fid = data->fid;
vnode->volume = data->volume;

@@ -306,8 +307,7 @@ int afs_getattr(struct vfsmount *mnt, struct dentry *dentry,

inode = dentry->d_inode;

- _enter("{ ino=%lu v=%llu }", inode->i_ino,
- (unsigned long long)inode->i_version);
+ _enter("{ ino=%lu v=%u }", inode->i_ino, inode->i_generation);

generic_fillattr(inode, stat);
return 0;

2010-06-29 20:03:40

by David Howells

Subject: [PATCH 3/3] Add a pair of system calls to make extended file stats available

Add a pair of system calls to make extended file stats available, including
file creation time, inode version and data version where available through the
underlying filesystem:

struct xstat_dev {
unsigned int major;
unsigned int minor;
};

struct xstat_time {
unsigned long long tv_sec;
unsigned long long tv_nsec;
};

struct xstat {
unsigned int struct_version;
#define XSTAT_STRUCT_VERSION 0
unsigned int st_mode;
unsigned int st_nlink;
unsigned int st_uid;
unsigned int st_gid;
unsigned int st_blksize;
struct xstat_dev st_rdev;
struct xstat_dev st_dev;
unsigned long long st_ino;
unsigned long long st_size;
struct xstat_time st_atime;
struct xstat_time st_mtime;
struct xstat_time st_ctime;
struct xstat_time st_crtime;
unsigned long long st_blocks;
unsigned long long st_inode_version;
unsigned long long st_data_version;
unsigned long long query_flags;
#define XSTAT_QUERY_CREATION_TIME 0x00000001ULL
#define XSTAT_QUERY_INODE_VERSION 0x00000002ULL
#define XSTAT_QUERY_DATA_VERSION 0x00000004ULL
unsigned long long extra_results[0];
};

ssize_t ret = xstat(int dfd,
const char *filename,
unsigned atflag,
struct xstat *buffer,
size_t buflen);

ssize_t ret = fxstat(int fd,
struct xstat *buffer,
size_t buflen);


The dfd, filename, atflag and fd parameters indicate the file to query. There
is no equivalent of lstat() as that can be emulated with xstat() by passing
AT_SYMLINK_NOFOLLOW as atflag; passing 0 follows symlinks as stat() does.

When the system call is executed, the struct_version ID and query_flags bitmask
are read from the buffer to work out what the user is requesting.

If the structure version specified is not supported, the system call will
return ENOTSUPP. The above structure is version 0.

The query_flags should be set by the caller to specify extra results that the
caller may desire. These come in two classes:

(1) Creation time, Inode version and Data version.

These will be returned if available whether the caller asked for them or
not. The corresponding bits in query_flags will be set or cleared as
appropriate to indicate their presence.

Query Flag Field
=============================== ================
XSTAT_QUERY_CREATION_TIME st_crtime
XSTAT_QUERY_INODE_VERSION st_inode_version
XSTAT_QUERY_DATA_VERSION st_data_version

(2) Extra results.

These will only be returned if the caller asked for them by setting their
bits in query_flags. They will be placed in the buffer after the xstat
struct in ascending query_flags bit order. Any bit set in query_flags
mask will be left set if the result is available and cleared otherwise.

The pointer into the results list will be rounded up to the nearest 8-byte
boundary after each result is written in. The size of each extra result
is specific to the definition for that result.

No extra results are currently defined.
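
Since no extra results are defined yet, the following is only a sketch of how
a caller might locate one under the layout rules in (2): results appear in
ascending query_flags bit order and the write pointer is rounded up to 8 bytes
after each. The bit numbers and sizes in extra_size[] are made up for the
example, as is the function name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROUND_UP_8(n)	(((n) + 7) & ~(size_t)7)

/* Hypothetical: size in bytes of the extra result for each query_flags bit.
 * 0 means the bit has no extra result (bits 0-2 are the fixed fields). */
static const size_t extra_size[64] = {
	[3] = 4,	/* made-up 4-byte result */
	[4] = 20,	/* made-up 20-byte result */
	[5] = 8,	/* made-up 8-byte result */
};

/* Offset of the result for 'bit' from the start of extra_results[], given
 * the query_flags handed back by the call; -1 if that result is absent. */
static long extra_result_offset(uint64_t result_flags, unsigned int bit)
{
	size_t off = 0;
	unsigned int i;

	assert(bit < 64);
	if (!(result_flags & (1ULL << bit)) || extra_size[bit] == 0)
		return -1;

	/* Skip over every lower-numbered result that is actually present. */
	for (i = 0; i < bit; i++)
		if ((result_flags & (1ULL << i)) && extra_size[i])
			off = ROUND_UP_8(off + extra_size[i]);
	return (long)off;
}
```

Because absent results take no space, the offsets depend on the returned
query_flags rather than on what was requested, so the caller has to recompute
them after every call.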

If the buffer is too small, the syscall returns the amount of space it would
need to write the complete result set and writes nothing.

If successful, the amount of data written into the buffer will be returned.
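
The return convention above can be modelled in userspace as follows. This is a
sketch only: mock_xstat() is a stand-in for the real system call, struct
mock_xstat is an abbreviated structure, and MOCK_ENOTSUPP stands in for the
kernel-internal ENOTSUPP value, which userspace <errno.h> does not provide. The
mock returns negative errors directly, kernel style, whereas a syscall()
wrapper would return -1 and set errno. The order of checks (buffer size first,
then struct version) follows xstat_check_param() in the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define MOCK_ENOTSUPP 524	/* kernel-internal value, not in <errno.h> */

/* Abbreviated stand-in for struct xstat. */
struct mock_xstat {
	unsigned int		struct_version;
	unsigned int		st_mode;
	unsigned long long	st_size;
	unsigned long long	query_flags;
};

/* Stand-in for the system call, returning canned values. */
static ssize_t mock_xstat(struct mock_xstat *buffer, size_t buflen)
{
	/* Too small: report the size needed and touch nothing. */
	if (buflen < sizeof(*buffer))
		return sizeof(*buffer);
	if (buffer->struct_version != 0)
		return -MOCK_ENOTSUPP;

	memset(buffer, 0, sizeof(*buffer));
	buffer->st_mode = 0040755;
	buffer->st_size = 2048;
	buffer->query_flags = 0x6;
	return sizeof(*buffer);
}

/* Returns 0 on success, 1 if the buffer must be grown to *need bytes, or a
 * negative error code. */
static int call_checked(struct mock_xstat *buffer, size_t buflen, size_t *need)
{
	ssize_t ret = mock_xstat(buffer, buflen);

	if (ret < 0)
		return (int)ret;
	if ((size_t)ret > buflen) {
		*need = (size_t)ret;
		return 1;
	}
	return 0;
}
```

A positive return value is ambiguous on its own, so the caller compares it
against the buffer length it passed in to tell "written" from "needed".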

At the moment, this will only work on x86_64 as it requires system calls to be
wired up.


===========
FILESYSTEMS
===========

Ext4 is modified to make use of this facility. It will return the creation
time for all files and the inode version number for all files except the
filesystem root inode. It will, however, only return the data version number
for directories as i_version is only maintained for them.

AFS is modified to make use of this facility too. It will return the vnode ID
uniquifier as the inode version and the AFS data version number as the data
version. There is no file creation time available.
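
Because the set of optional fields differs per filesystem, a caller should
look at the returned query_flags before trusting st_crtime, st_inode_version
or st_data_version. A minimal sketch of that check follows; the function name
describe_optional() and its output format are invented for the example.

```c
#include <assert.h>
#include <stdio.h>

#define XSTAT_QUERY_CREATION_TIME	0x00000001ULL
#define XSTAT_QUERY_INODE_VERSION	0x00000002ULL
#define XSTAT_QUERY_DATA_VERSION	0x00000004ULL

/* Write the names of the optional fields that are present into 'out' and
 * return how many there are. */
static int describe_optional(unsigned long long query_flags,
			     char *out, size_t outlen)
{
	int have_cr = (query_flags & XSTAT_QUERY_CREATION_TIME) != 0;
	int have_iv = (query_flags & XSTAT_QUERY_INODE_VERSION) != 0;
	int have_dv = (query_flags & XSTAT_QUERY_DATA_VERSION) != 0;

	assert(out != NULL && outlen > 0);
	snprintf(out, outlen, "%s%s%s",
		 have_cr ? "crtime " : "",
		 have_iv ? "inode_version " : "",
		 have_dv ? "data_version " : "");
	return have_cr + have_iv + have_dv;
}
```

For a returned query_flags of 0x6 (inode and data version, no creation time,
as described for AFS above) this reports two fields; for 0x7 it reports all
three.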


=======
TESTING
=======

The following test program can be used to test the xstat system call:

#define _GNU_SOURCE
#define _ATFILE_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>
#include <sys/syscall.h>
#include <sys/stat.h>
#include <sys/types.h>

struct xstat_dev {
unsigned int major;
unsigned int minor;
};

struct xstat_time {
unsigned long long tv_sec;
unsigned long long tv_nsec;
};

struct xstat {
unsigned int struct_version;
#define XSTAT_STRUCT_VERSION 0
unsigned int st_mode;
unsigned int st_nlink;
unsigned int st_uid;
unsigned int st_gid;
unsigned int st_blksize;
struct xstat_dev st_rdev;
struct xstat_dev st_dev;
unsigned long long st_ino;
unsigned long long st_size;
struct xstat_time st_atim;
struct xstat_time st_mtim;
struct xstat_time st_ctim;
struct xstat_time st_crtim;
unsigned long long st_blocks;
unsigned long long st_inode_version;
unsigned long long st_data_version;
unsigned long long query_flags;
#define XSTAT_QUERY_CREATION_TIME 0x00000001ULL
#define XSTAT_QUERY_INODE_VERSION 0x00000002ULL
#define XSTAT_QUERY_DATA_VERSION 0x00000004ULL
unsigned long long extra_results[0];
};

#define __NR_xstat 300
#define __NR_fxstat 301

static __attribute__((unused))
ssize_t xstat(int dfd, const char *filename, int atflag,
struct xstat *buffer, size_t bufsize)
{
return syscall(__NR_xstat, dfd, filename, atflag, buffer, bufsize);
}

static __attribute__((unused))
ssize_t fxstat(int fd, struct xstat *buffer, size_t bufsize)
{
return syscall(__NR_fxstat, fd, buffer, bufsize);
}

static void print_time(const struct xstat_time *xstm)
{
struct tm tm;
time_t tim;
char buffer[100];
int len;

tim = xstm->tv_sec;
if (!localtime_r(&tim, &tm)) {
perror("localtime_r");
exit(1);
}
len = strftime(buffer, 100, "%F %T", &tm);
if (len == 0) {
perror("strftime");
exit(1);
}
fwrite(buffer, 1, len, stdout);
printf(".%09llu", xstm->tv_nsec);
len = strftime(buffer, 100, "%z", &tm);
if (len == 0) {
perror("strftime2");
exit(1);
}
fwrite(buffer, 1, len, stdout);
}

static void dump_xstat(struct xstat *xst)
{
char buffer[256], ft;

printf(" Size: %-15llu Blocks: %-10llu IO Block: %-6u ",
xst->st_size, xst->st_blocks, xst->st_blksize);
switch (xst->st_mode & S_IFMT) {
case S_IFIFO: printf("FIFO\n"); ft = 'p'; break;
case S_IFCHR: printf("character special file\n"); ft = 'c'; break;
case S_IFDIR: printf("directory\n"); ft = 'd'; break;
case S_IFBLK: printf("block special file\n"); ft = 'b'; break;
case S_IFREG: printf("regular file\n"); ft = '-'; break;
case S_IFLNK: printf("symbolic link\n"); ft = 'l'; break;
case S_IFSOCK: printf("socket\n"); ft = 's'; break;
default:
printf("unknown type (%o)\n", xst->st_mode & S_IFMT);
ft = '?';
break;
}

sprintf(buffer, "%02x:%02x", xst->st_dev.major, xst->st_dev.minor);
printf("Device: %-15s Inode: %-11llu Links: %u\n",
buffer, xst->st_ino, xst->st_nlink);

printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c) ",
xst->st_mode & 07777,
ft,
xst->st_mode & S_IRUSR ? 'r' : '-',
xst->st_mode & S_IWUSR ? 'w' : '-',
xst->st_mode & S_IXUSR ? 'x' : '-',
xst->st_mode & S_IRGRP ? 'r' : '-',
xst->st_mode & S_IWGRP ? 'w' : '-',
xst->st_mode & S_IXGRP ? 'x' : '-',
xst->st_mode & S_IROTH ? 'r' : '-',
xst->st_mode & S_IWOTH ? 'w' : '-',
xst->st_mode & S_IXOTH ? 'x' : '-');
printf("Uid: %u Gid: %u\n", xst->st_uid, xst->st_gid);

printf("Access: "); print_time(&xst->st_atim); printf("\n");
printf("Modify: "); print_time(&xst->st_mtim); printf("\n");
printf("Change: "); print_time(&xst->st_ctim); printf("\n");
if (xst->query_flags & XSTAT_QUERY_CREATION_TIME) {
printf("Create: "); print_time(&xst->st_crtim); printf("\n");
}

if (xst->query_flags & XSTAT_QUERY_INODE_VERSION)
printf("Inode version: %llxh\n", xst->st_inode_version);
if (xst->query_flags & XSTAT_QUERY_DATA_VERSION)
printf("Data version: %llxh\n", xst->st_data_version);
}

int main(int argc, char **argv)
{
struct xstat xst;
int ret, atflag = AT_SYMLINK_NOFOLLOW;

for (argv++; *argv; argv++) {
if (strcmp(*argv, "-L") == 0) {
atflag = 0;
continue;
}

memset(&xst, 0xbf, sizeof(xst));
xst.struct_version = 0;
xst.query_flags = XSTAT_QUERY_CREATION_TIME |
XSTAT_QUERY_INODE_VERSION |
XSTAT_QUERY_DATA_VERSION;
ret = xstat(AT_FDCWD, *argv, atflag, &xst, sizeof(xst));
printf("xstat(%s) = %d\n", *argv, ret);
if (ret < 0) {
perror(*argv);
exit(1);
}

dump_xstat(&xst);
}
return 0;
}

Just compile and run, passing it paths to the files you want to examine:

[root@andromeda ~]# /tmp/xstat /var/cache/fscache/cache/
xstat(/var/cache/fscache/cache/) = 152
Size: 4096 Blocks: 16 IO Block: 4096 directory
Device: 08:06 Inode: 130561 Links: 3
Access: (0700/drwx------) Uid: 0 Gid: 0
Access: 2010-06-29 18:16:33.680703545+0100
Modify: 2010-06-29 18:16:20.132786632+0100
Change: 2010-06-29 18:16:20.132786632+0100
Create: 2010-06-25 15:17:39.471199293+0100
Inode version: f585ab70h
Data version: 2h
[root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/i386/repodata/
xstat(/afs/archive/linuxdev/fedora9/i386/repodata/) = 152
Size: 2048 Blocks: 0 IO Block: 4096 directory
Device: 00:13 Inode: 83 Links: 2
Access: (0755/drwxr-xr-x) Uid: 75338 Gid: 0
Access: 2008-11-05 20:00:12.000000000+0000
Modify: 2008-11-05 20:00:12.000000000+0000
Change: 2008-11-05 20:00:12.000000000+0000
Inode version: 7a5h
Data version: 5h


Signed-off-by: David Howells <[email protected]>
---

arch/x86/include/asm/unistd_32.h | 4 +
arch/x86/include/asm/unistd_64.h | 4 +
fs/afs/inode.c | 12 ++--
fs/ext4/ext4.h | 2 +
fs/ext4/file.c | 2 -
fs/ext4/inode.c | 27 +++++++-
fs/ext4/namei.c | 2 +
fs/ext4/symlink.c | 2 +
fs/stat.c | 125 +++++++++++++++++++++++++++++++++++++-
include/linux/stat.h | 46 ++++++++++++++
include/linux/syscalls.h | 5 ++
11 files changed, 217 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index beb9b5f..a9953cc 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -343,10 +343,12 @@
#define __NR_rt_tgsigqueueinfo 335
#define __NR_perf_event_open 336
#define __NR_recvmmsg 337
+#define __NR_xstat 338
+#define __NR_fxstat 339

#ifdef __KERNEL__

-#define NR_syscalls 338
+#define NR_syscalls 340

#define __ARCH_WANT_IPC_PARSE_VERSION
#define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index ff4307b..c90d240 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -663,6 +663,10 @@ __SYSCALL(__NR_rt_tgsigqueueinfo, sys_rt_tgsigqueueinfo)
__SYSCALL(__NR_perf_event_open, sys_perf_event_open)
#define __NR_recvmmsg 299
__SYSCALL(__NR_recvmmsg, sys_recvmmsg)
+#define __NR_xstat 300
+__SYSCALL(__NR_xstat, sys_xstat)
+#define __NR_fxstat 301
+__SYSCALL(__NR_fxstat, sys_fxstat)

#ifndef __NO_STUBS
#define __ARCH_WANT_OLD_READDIR
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index ee3190a..1b5b4c8 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -300,16 +300,18 @@ error_unlock:
/*
* read the attributes of an inode
*/
-int afs_getattr(struct vfsmount *mnt, struct dentry *dentry,
- struct kstat *stat)
+int afs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
{
- struct inode *inode;
-
- inode = dentry->d_inode;
+ struct inode *inode = dentry->d_inode;

_enter("{ ino=%lu v=%u }", inode->i_ino, inode->i_generation);

generic_fillattr(inode, stat);
+
+ stat->result_flags |=
+ XSTAT_QUERY_INODE_VERSION | XSTAT_QUERY_DATA_VERSION;
+ stat->inode_version = inode->i_generation;
+ stat->data_version = inode->i_version;
return 0;
}

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 19a4de5..96823f3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1571,6 +1571,8 @@ extern int ext4_write_inode(struct inode *, struct writeback_control *);
extern int ext4_setattr(struct dentry *, struct iattr *);
extern int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
struct kstat *stat);
+extern int ext4_file_getattr(struct vfsmount *mnt, struct dentry *dentry,
+ struct kstat *stat);
extern void ext4_delete_inode(struct inode *);
extern int ext4_sync_inode(handle_t *, struct inode *);
extern void ext4_dirty_inode(struct inode *);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 5313ae4..18c29ab 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -150,7 +150,7 @@ const struct file_operations ext4_file_operations = {
const struct inode_operations ext4_file_inode_operations = {
.truncate = ext4_truncate,
.setattr = ext4_setattr,
- .getattr = ext4_getattr,
+ .getattr = ext4_file_getattr,
#ifdef CONFIG_EXT4_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 42272d6..8e374f3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5550,12 +5550,33 @@ err_out:
int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
struct kstat *stat)
{
- struct inode *inode;
- unsigned long delalloc_blocks;
+ struct inode *inode = dentry->d_inode;

- inode = dentry->d_inode;
generic_fillattr(inode, stat);

+ stat->result_flags |= XSTAT_QUERY_CREATION_TIME;
+ stat->crtime.tv_sec = EXT4_I(inode)->i_crtime.tv_sec;
+ stat->crtime.tv_nsec = EXT4_I(inode)->i_crtime.tv_nsec;
+
+ if (inode->i_ino != EXT4_ROOT_INO) {
+ stat->result_flags |= XSTAT_QUERY_INODE_VERSION;
+ stat->inode_version = inode->i_generation;
+ }
+ if (S_ISDIR(inode->i_mode)) {
+ stat->result_flags |= XSTAT_QUERY_DATA_VERSION;
+ stat->data_version = inode->i_version;
+ }
+ return 0;
+}
+
+int ext4_file_getattr(struct vfsmount *mnt, struct dentry *dentry,
+ struct kstat *stat)
+{
+ struct inode *inode = dentry->d_inode;
+ unsigned long delalloc_blocks;
+
+ ext4_getattr(mnt, dentry, stat);
+
/*
* We can't update i_blocks if the block allocation is delayed
* otherwise in the case of system crash before the real block
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a43e661..0f776c7 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2542,6 +2542,7 @@ const struct inode_operations ext4_dir_inode_operations = {
.mknod = ext4_mknod,
.rename = ext4_rename,
.setattr = ext4_setattr,
+ .getattr = ext4_getattr,
#ifdef CONFIG_EXT4_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
@@ -2554,6 +2555,7 @@ const struct inode_operations ext4_dir_inode_operations = {

const struct inode_operations ext4_special_inode_operations = {
.setattr = ext4_setattr,
+ .getattr = ext4_getattr,
#ifdef CONFIG_EXT4_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index ed9354a..d8fe7fb 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -35,6 +35,7 @@ const struct inode_operations ext4_symlink_inode_operations = {
.follow_link = page_follow_link_light,
.put_link = page_put_link,
.setattr = ext4_setattr,
+ .getattr = ext4_getattr,
#ifdef CONFIG_EXT4_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
@@ -47,6 +48,7 @@ const struct inode_operations ext4_fast_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = ext4_follow_link,
.setattr = ext4_setattr,
+ .getattr = ext4_getattr,
#ifdef CONFIG_EXT4_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
diff --git a/fs/stat.c b/fs/stat.c
index 12e90e2..5edb63a 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -115,7 +115,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
{
static int warncount = 5;
struct __old_kernel_stat tmp;
-
+
if (warncount > 0) {
warncount--;
printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n",
@@ -140,7 +140,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
#if BITS_PER_LONG == 32
if (stat->size > MAX_NON_LFS)
return -EOVERFLOW;
-#endif
+#endif
tmp.st_size = stat->size;
tmp.st_atime = stat->atime.tv_sec;
tmp.st_mtime = stat->mtime.tv_sec;
@@ -222,7 +222,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
#if BITS_PER_LONG == 32
if (stat->size > MAX_NON_LFS)
return -EOVERFLOW;
-#endif
+#endif
tmp.st_size = stat->size;
tmp.st_atime = stat->atime.tv_sec;
tmp.st_mtime = stat->mtime.tv_sec;
@@ -408,6 +408,125 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename,
}
#endif /* __ARCH_WANT_STAT64 */

+/*
+ * check the input parameters in the xstat struct
+ */
+static noinline int xstat_check_param(struct xstat __user *buffer, size_t bufsize,
+ struct kstat *stat)
+{
+ u32 struct_version;
+ int ret;
+
+ /* if the buffer isn't large enough, return how much we wanted to
+ * write, but otherwise do nothing */
+ if (bufsize < sizeof(struct xstat))
+ return sizeof(struct xstat);
+
+ ret = get_user(struct_version, &buffer->struct_version);
+ if (ret < 0)
+ return ret;
+ if (struct_version != 0)
+ return -ENOTSUPP;
+
+ memset(stat, 0xde, sizeof(*stat));
+
+ ret = get_user(stat->query_flags, &buffer->query_flags);
+ if (ret < 0)
+ return ret;
+
+ /* nothing outside this set has a defined purpose */
+ stat->query_flags &= (XSTAT_QUERY_CREATION_TIME |
+ XSTAT_QUERY_INODE_VERSION |
+ XSTAT_QUERY_DATA_VERSION);
+
+ /* the user gets these whatever */
+ stat->query_flags |= (XSTAT_QUERY_CREATION_TIME |
+ XSTAT_QUERY_INODE_VERSION |
+ XSTAT_QUERY_DATA_VERSION);
+ stat->result_flags = 0;
+ return 0;
+}
+
+/*
+ * copy the extended stats to userspace and return the amount of data written
+ * into the buffer
+ */
+static noinline long xstat_set_result(struct kstat *stat,
+ struct xstat __user *buffer, size_t bufsize)
+{
+ struct xstat tmp;
+
+ memset(&tmp, 0, sizeof(tmp));
+ tmp.struct_version = XSTAT_STRUCT_VERSION;
+ tmp.query_flags = stat->result_flags;
+ tmp.st_dev.major = MAJOR(stat->dev);
+ tmp.st_dev.minor = MINOR(stat->dev);
+ tmp.st_rdev.major = MAJOR(stat->rdev);
+ tmp.st_rdev.minor = MINOR(stat->rdev);
+ tmp.st_ino = stat->ino;
+ tmp.st_mode = stat->mode;
+ tmp.st_nlink = stat->nlink;
+ tmp.st_uid = stat->uid;
+ tmp.st_gid = stat->gid;
+ tmp.st_atime.tv_sec = stat->atime.tv_sec;
+ tmp.st_atime.tv_nsec = stat->atime.tv_nsec;
+ tmp.st_mtime.tv_sec = stat->mtime.tv_sec;
+ tmp.st_mtime.tv_nsec = stat->mtime.tv_nsec;
+ tmp.st_ctime.tv_sec = stat->ctime.tv_sec;
+ tmp.st_ctime.tv_nsec = stat->ctime.tv_nsec;
+ tmp.st_size = stat->size;
+ tmp.st_blocks = stat->blocks;
+ tmp.st_blksize = stat->blksize;
+
+ if (stat->result_flags & XSTAT_QUERY_CREATION_TIME) {
+ tmp.st_crtime.tv_sec = stat->crtime.tv_sec;
+ tmp.st_crtime.tv_nsec = stat->crtime.tv_nsec;
+ }
+ if (stat->result_flags & XSTAT_QUERY_INODE_VERSION)
+ tmp.st_inode_version = stat->inode_version;
+ if (stat->result_flags & XSTAT_QUERY_DATA_VERSION)
+ tmp.st_data_version = stat->data_version;
+
+ return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : sizeof(tmp);
+}
+
+/*
+ * System call to get extended stats by path
+ */
+SYSCALL_DEFINE5(xstat,
+ int, dfd, const char __user *, filename, unsigned, atflag,
+ struct xstat __user *, buffer, size_t, bufsize)
+{
+ struct kstat stat;
+ int error;
+
+ error = xstat_check_param(buffer, bufsize, &stat);
+ if (error != 0)
+ return error;
+ error = vfs_fstatat(dfd, filename, &stat, atflag);
+ if (error)
+ return error;
+ return xstat_set_result(&stat, buffer, bufsize);
+}
+
+/*
+ * System call to get extended stats by file descriptor
+ */
+SYSCALL_DEFINE3(fxstat, int, fd, struct xstat __user *, buffer, size_t, bufsize)
+{
+ struct kstat stat;
+ int error;
+
+ error = xstat_check_param(buffer, bufsize, &stat);
+ if (error != 0)
+ return error;
+ error = vfs_fstat(fd, &stat);
+ if (error)
+ return error;
+
+ return xstat_set_result(&stat, buffer, bufsize);
+}
+
/* Caller is here responsible for sufficient locking (ie. inode->i_lock) */
void __inode_add_bytes(struct inode *inode, loff_t bytes)
{
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 611c398..d48bb5d 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -46,6 +46,45 @@

#endif

+/*
+ * Extended stat structures
+ */
+struct xstat_dev {
+ unsigned int major;
+ unsigned int minor;
+};
+
+struct xstat_time {
+ unsigned long long tv_sec;
+ unsigned long long tv_nsec;
+};
+
+struct xstat {
+ unsigned int struct_version;
+#define XSTAT_STRUCT_VERSION 0
+ unsigned int st_mode;
+ unsigned int st_nlink;
+ unsigned int st_uid;
+ unsigned int st_gid;
+ unsigned int st_blksize;
+ struct xstat_dev st_rdev;
+ struct xstat_dev st_dev;
+ unsigned long long st_ino;
+ unsigned long long st_size;
+ struct xstat_time st_atime;
+ struct xstat_time st_mtime;
+ struct xstat_time st_ctime;
+ struct xstat_time st_crtime;
+ unsigned long long st_blocks;
+ unsigned long long st_inode_version;
+ unsigned long long st_data_version;
+ unsigned long long query_flags;
+#define XSTAT_QUERY_CREATION_TIME 0x00000001ULL
+#define XSTAT_QUERY_INODE_VERSION 0x00000002ULL
+#define XSTAT_QUERY_DATA_VERSION 0x00000004ULL
+ unsigned long long extra_results[0];
+};
+
#ifdef __KERNEL__
#define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO)
#define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO)
@@ -68,11 +107,16 @@ struct kstat {
gid_t gid;
dev_t rdev;
loff_t size;
- struct timespec atime;
+ struct timespec atime;
struct timespec mtime;
struct timespec ctime;
+ struct timespec crtime;
unsigned long blksize;
unsigned long long blocks;
+ u64 query_flags; /* what extras the user asked for */
+ u64 result_flags; /* what extras the user got */
+ u64 inode_version;
+ u64 data_version;
};

#endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 8812a63..760a303 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -44,6 +44,7 @@ struct shmid_ds;
struct sockaddr;
struct stat;
struct stat64;
+struct xstat;
struct statfs;
struct statfs64;
struct __sysctl_args;
@@ -824,4 +825,8 @@ asmlinkage long sys_mmap_pgoff(unsigned long addr, unsigned long len,
unsigned long fd, unsigned long pgoff);
asmlinkage long sys_old_mmap(struct mmap_arg_struct __user *arg);

+asmlinkage long sys_xstat(int, const char __user *, unsigned,
+ struct xstat __user *, size_t);
+asmlinkage long sys_fxstat(int, struct xstat __user *, size_t);
+
#endif

2010-06-29 20:03:55

by David Howells

Subject: [PATCH 1/3] Mark arguments to certain syscalls as being const

Mark arguments to certain system calls as being const where they should be but
aren't. The list includes:

(*) The filename arguments of various stat syscalls, execve(), various utimes
syscalls and some mount syscalls.

(*) The filename arguments of some syscall helpers relating to the above.

(*) The buffer argument of various write syscalls.
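
As a rough userspace illustration of why this matters (not code from the patch): once a helper takes a const-qualified pointer, callers that hold a const string can pass it straight through, which is what lets the diff drop casts such as the `(char *)filename` in kernel_execve(). The names `path_length()` and `checked_path_length()` are hypothetical stand-ins.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a helper that only reads its path argument.
 * Declaring the parameter const documents that and lets the compiler
 * reject accidental writes through it.
 */
static size_t path_length(const char *filename)
{
	return strlen(filename);
}

/*
 * A caller that itself holds a const pointer can forward it with no
 * cast.  If path_length() took a plain 'char *', this call would need
 * a (char *) cast that discards the qualifier.
 */
static size_t checked_path_length(const char *filename)
{
	return filename ? path_length(filename) : 0;
}
```

Non-const buffers still convert implicitly to `const char *`, so existing callers that pass writable strings are unaffected.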

Signed-off-by: David Howells <[email protected]>
---

arch/alpha/kernel/osf_sys.c | 6 +++---
arch/alpha/kernel/process.c | 2 +-
arch/arm/kernel/sys_arm.c | 4 ++--
arch/arm/kernel/sys_oabi-compat.c | 6 +++---
arch/avr32/include/asm/syscalls.h | 2 +-
arch/avr32/kernel/process.c | 3 ++-
arch/blackfin/kernel/process.c | 2 +-
arch/frv/kernel/process.c | 3 ++-
arch/h8300/kernel/process.c | 2 +-
arch/ia64/include/asm/unistd.h | 2 +-
arch/ia64/kernel/process.c | 2 +-
arch/m32r/kernel/process.c | 3 ++-
arch/m68k/kernel/process.c | 2 +-
arch/m68knommu/kernel/process.c | 2 +-
arch/microblaze/kernel/sys_microblaze.c | 2 +-
arch/mips/kernel/syscall.c | 2 +-
arch/mn10300/kernel/process.c | 2 +-
arch/parisc/hpux/fs.c | 7 ++++---
arch/powerpc/kernel/process.c | 2 +-
arch/powerpc/kernel/sys_ppc32.c | 2 +-
arch/s390/kernel/compat_linux.c | 10 +++++-----
arch/s390/kernel/compat_linux.h | 10 +++++-----
arch/s390/kernel/entry.h | 2 +-
arch/s390/kernel/process.c | 2 +-
arch/sh/include/asm/syscalls_32.h | 2 +-
arch/sh/include/asm/syscalls_64.h | 2 +-
arch/sh/kernel/process_64.c | 2 +-
arch/sparc/kernel/sys_sparc32.c | 7 ++++---
arch/um/kernel/exec.c | 6 +++---
arch/um/kernel/internal.h | 2 +-
arch/um/kernel/syscall.c | 2 +-
arch/x86/ia32/sys_ia32.c | 14 +++++++-------
arch/x86/include/asm/sys_ia32.h | 12 ++++++------
arch/x86/include/asm/syscalls.h | 2 +-
arch/x86/kernel/entry_64.S | 4 ++--
arch/x86/kernel/process.c | 2 +-
arch/xtensa/kernel/process.c | 2 +-
fs/compat.c | 23 +++++++++++++----------
fs/stat.c | 29 ++++++++++++++++++-----------
fs/utimes.c | 7 ++++---
include/linux/compat.h | 6 +++---
include/linux/fs.h | 6 +++---
include/linux/syscalls.h | 20 ++++++++++----------
include/linux/time.h | 2 +-
44 files changed, 125 insertions(+), 109 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index de9d397..1719fe3 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -244,7 +244,7 @@ do_osf_statfs(struct dentry * dentry, struct osf_statfs __user *buffer,
return error;
}

-SYSCALL_DEFINE3(osf_statfs, char __user *, pathname,
+SYSCALL_DEFINE3(osf_statfs, const char __user *, pathname,
struct osf_statfs __user *, buffer, unsigned long, bufsiz)
{
struct path path;
@@ -358,7 +358,7 @@ osf_procfs_mount(char *dirname, struct procfs_args __user *args, int flags)
return do_mount("", dirname, "proc", flags, NULL);
}

-SYSCALL_DEFINE4(osf_mount, unsigned long, typenr, char __user *, path,
+SYSCALL_DEFINE4(osf_mount, unsigned long, typenr, const char __user *, path,
int, flag, void __user *, data)
{
int retval;
@@ -932,7 +932,7 @@ SYSCALL_DEFINE3(osf_setitimer, int, which, struct itimerval32 __user *, in,

}

-SYSCALL_DEFINE2(osf_utimes, char __user *, filename,
+SYSCALL_DEFINE2(osf_utimes, const char __user *, filename,
struct timeval32 __user *, tvs)
{
struct timespec tv[2];
diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 395a464..88e608a 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -387,7 +387,7 @@ EXPORT_SYMBOL(dump_elf_task_fp);
* sys_execve() executes a new program.
*/
asmlinkage int
-do_sys_execve(char __user *ufilename, char __user * __user *argv,
+do_sys_execve(const char __user *ufilename, char __user * __user *argv,
char __user * __user *envp, struct pt_regs *regs)
{
int error;
diff --git a/arch/arm/kernel/sys_arm.c b/arch/arm/kernel/sys_arm.c
index c235018..5b7c541 100644
--- a/arch/arm/kernel/sys_arm.c
+++ b/arch/arm/kernel/sys_arm.c
@@ -62,7 +62,7 @@ asmlinkage int sys_vfork(struct pt_regs *regs)
/* sys_execve() executes a new program.
* This is called indirectly via a small wrapper
*/
-asmlinkage int sys_execve(char __user *filenamei, char __user * __user *argv,
+asmlinkage int sys_execve(const char __user *filenamei, char __user * __user *argv,
char __user * __user *envp, struct pt_regs *regs)
{
int error;
@@ -84,7 +84,7 @@ int kernel_execve(const char *filename, char *const argv[], char *const envp[])
int ret;

memset(&regs, 0, sizeof(struct pt_regs));
- ret = do_execve((char *)filename, (char __user * __user *)argv,
+ ret = do_execve(filename, (char __user * __user *)argv,
(char __user * __user *)envp, &regs);
if (ret < 0)
goto out;
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 33ff678..4ad8da1 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -141,7 +141,7 @@ static long cp_oldabi_stat64(struct kstat *stat,
return copy_to_user(statbuf,&tmp,sizeof(tmp)) ? -EFAULT : 0;
}

-asmlinkage long sys_oabi_stat64(char __user * filename,
+asmlinkage long sys_oabi_stat64(const char __user * filename,
struct oldabi_stat64 __user * statbuf)
{
struct kstat stat;
@@ -151,7 +151,7 @@ asmlinkage long sys_oabi_stat64(char __user * filename,
return error;
}

-asmlinkage long sys_oabi_lstat64(char __user * filename,
+asmlinkage long sys_oabi_lstat64(const char __user * filename,
struct oldabi_stat64 __user * statbuf)
{
struct kstat stat;
@@ -172,7 +172,7 @@ asmlinkage long sys_oabi_fstat64(unsigned long fd,
}

asmlinkage long sys_oabi_fstatat64(int dfd,
- char __user *filename,
+ const char __user *filename,
struct oldabi_stat64 __user *statbuf,
int flag)
{
diff --git a/arch/avr32/include/asm/syscalls.h b/arch/avr32/include/asm/syscalls.h
index 66a1972..ab608b7 100644
--- a/arch/avr32/include/asm/syscalls.h
+++ b/arch/avr32/include/asm/syscalls.h
@@ -21,7 +21,7 @@ asmlinkage int sys_clone(unsigned long, unsigned long,
unsigned long, unsigned long,
struct pt_regs *);
asmlinkage int sys_vfork(struct pt_regs *);
-asmlinkage int sys_execve(char __user *, char __user *__user *,
+asmlinkage int sys_execve(const char __user *, char __user *__user *,
char __user *__user *, struct pt_regs *);

/* kernel/signal.c */
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index 2d76515..e5daddf 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -383,7 +383,8 @@ asmlinkage int sys_vfork(struct pt_regs *regs)
0, NULL, NULL);
}

-asmlinkage int sys_execve(char __user *ufilename, char __user *__user *uargv,
+asmlinkage int sys_execve(const char __user *ufilename,
+ char __user *__user *uargv,
char __user *__user *uenvp, struct pt_regs *regs)
{
int error;
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 93ec07d..a566f61 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -209,7 +209,7 @@ copy_thread(unsigned long clone_flags,
/*
* sys_execve() executes a new program.
*/
-asmlinkage int sys_execve(char __user *name, char __user * __user *argv, char __user * __user *envp)
+asmlinkage int sys_execve(const char __user *name, char __user * __user *argv, char __user * __user *envp)
{
int error;
char *filename;
diff --git a/arch/frv/kernel/process.c b/arch/frv/kernel/process.c
index 21d0fd1..428931c 100644
--- a/arch/frv/kernel/process.c
+++ b/arch/frv/kernel/process.c
@@ -250,7 +250,8 @@ int copy_thread(unsigned long clone_flags,
/*
* sys_execve() executes a new program.
*/
-asmlinkage int sys_execve(char __user *name, char __user * __user *argv, char __user * __user *envp)
+asmlinkage int sys_execve(const char __user *name, char __user * __user *argv,
+ char __user * __user *envp)
{
int error;
char * filename;
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index 8c8b0ff..8b7b78d 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -212,7 +212,7 @@ int copy_thread(unsigned long clone_flags,
/*
* sys_execve() executes a new program.
*/
-asmlinkage int sys_execve(char *name, char **argv, char **envp,int dummy,...)
+asmlinkage int sys_execve(const char *name, char **argv, char **envp,int dummy,...)
{
int error;
char * filename;
diff --git a/arch/ia64/include/asm/unistd.h b/arch/ia64/include/asm/unistd.h
index bb8b0ff..46f36fc 100644
--- a/arch/ia64/include/asm/unistd.h
+++ b/arch/ia64/include/asm/unistd.h
@@ -353,7 +353,7 @@ asmlinkage unsigned long sys_mmap2(
int fd, long pgoff);
struct pt_regs;
struct sigaction;
-long sys_execve(char __user *filename, char __user * __user *argv,
+long sys_execve(const char __user *filename, char __user * __user *argv,
char __user * __user *envp, struct pt_regs *regs);
asmlinkage long sys_ia64_pipe(void);
asmlinkage long sys_rt_sigaction(int sig,
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 53f1648..a879c03 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -633,7 +633,7 @@ dump_fpu (struct pt_regs *pt, elf_fpregset_t dst)
}

long
-sys_execve (char __user *filename, char __user * __user *argv, char __user * __user *envp,
+sys_execve (const char __user *filename, char __user * __user *argv, char __user * __user *envp,
struct pt_regs *regs)
{
char *fname;
diff --git a/arch/m32r/kernel/process.c b/arch/m32r/kernel/process.c
index bc8c8c1..8665a4d 100644
--- a/arch/m32r/kernel/process.c
+++ b/arch/m32r/kernel/process.c
@@ -288,7 +288,8 @@ asmlinkage int sys_vfork(unsigned long r0, unsigned long r1, unsigned long r2,
/*
* sys_execve() executes a new program.
*/
-asmlinkage int sys_execve(char __user *ufilename, char __user * __user *uargv,
+asmlinkage int sys_execve(const char __user *ufilename,
+ char __user * __user *uargv,
char __user * __user *uenvp,
unsigned long r3, unsigned long r4, unsigned long r5,
unsigned long r6, struct pt_regs regs)
diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
index 1a6be27..221d0b7 100644
--- a/arch/m68k/kernel/process.c
+++ b/arch/m68k/kernel/process.c
@@ -315,7 +315,7 @@ EXPORT_SYMBOL(dump_fpu);
/*
* sys_execve() executes a new program.
*/
-asmlinkage int sys_execve(char __user *name, char __user * __user *argv, char __user * __user *envp)
+asmlinkage int sys_execve(const char __user *name, char __user * __user *argv, char __user * __user *envp)
{
int error;
char * filename;
diff --git a/arch/m68knommu/kernel/process.c b/arch/m68knommu/kernel/process.c
index 6aa6613..6350f68 100644
--- a/arch/m68knommu/kernel/process.c
+++ b/arch/m68knommu/kernel/process.c
@@ -350,7 +350,7 @@ void dump(struct pt_regs *fp)
/*
* sys_execve() executes a new program.
*/
-asmlinkage int sys_execve(char *name, char **argv, char **envp)
+asmlinkage int sys_execve(const char *name, char **argv, char **envp)
{
int error;
char * filename;
diff --git a/arch/microblaze/kernel/sys_microblaze.c b/arch/microblaze/kernel/sys_microblaze.c
index f4e00b7..6abab6e 100644
--- a/arch/microblaze/kernel/sys_microblaze.c
+++ b/arch/microblaze/kernel/sys_microblaze.c
@@ -47,7 +47,7 @@ asmlinkage long microblaze_clone(int flags, unsigned long stack, struct pt_regs
return do_fork(flags, stack, regs, 0, NULL, NULL);
}

-asmlinkage long microblaze_execve(char __user *filenamei, char __user *__user *argv,
+asmlinkage long microblaze_execve(const char __user *filenamei, char __user *__user *argv,
char __user *__user *envp, struct pt_regs *regs)
{
int error;
diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c
index dd81b0f..6322c39 100644
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -207,7 +207,7 @@ asmlinkage int sys_execve(nabi_no_regargs struct pt_regs regs)
int error;
char * filename;

- filename = getname((char __user *) (long)regs.regs[4]);
+ filename = getname((const char __user *) (long)regs.regs[4]);
error = PTR_ERR(filename);
if (IS_ERR(filename))
goto out;
diff --git a/arch/mn10300/kernel/process.c b/arch/mn10300/kernel/process.c
index 82b817c..762eb32 100644
--- a/arch/mn10300/kernel/process.c
+++ b/arch/mn10300/kernel/process.c
@@ -268,7 +268,7 @@ asmlinkage long sys_vfork(void)
0, NULL, NULL);
}

-asmlinkage long sys_execve(char __user *name,
+asmlinkage long sys_execve(const char __user *name,
char __user * __user *argv,
char __user * __user *envp)
{
diff --git a/arch/parisc/hpux/fs.c b/arch/parisc/hpux/fs.c
index 6935123..1444875 100644
--- a/arch/parisc/hpux/fs.c
+++ b/arch/parisc/hpux/fs.c
@@ -36,7 +36,7 @@ int hpux_execve(struct pt_regs *regs)
int error;
char *filename;

- filename = getname((char __user *) regs->gr[26]);
+ filename = getname((const char __user *) regs->gr[26]);
error = PTR_ERR(filename);
if (IS_ERR(filename))
goto out;
@@ -169,7 +169,7 @@ static int cp_hpux_stat(struct kstat *stat, struct hpux_stat64 __user *statbuf)
return copy_to_user(statbuf,&tmp,sizeof(tmp)) ? -EFAULT : 0;
}

-long hpux_stat64(char __user *filename, struct hpux_stat64 __user *statbuf)
+long hpux_stat64(const char __user *filename, struct hpux_stat64 __user *statbuf)
{
struct kstat stat;
int error = vfs_stat(filename, &stat);
@@ -191,7 +191,8 @@ long hpux_fstat64(unsigned int fd, struct hpux_stat64 __user *statbuf)
return error;
}

-long hpux_lstat64(char __user *filename, struct hpux_stat64 __user *statbuf)
+long hpux_lstat64(const char __user *filename,
+ struct hpux_stat64 __user *statbuf)
{
struct kstat stat;
int error = vfs_lstat(filename, &stat);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 773424d..3ef6ed4 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -991,7 +991,7 @@ int sys_execve(unsigned long a0, unsigned long a1, unsigned long a2,
int error;
char *filename;

- filename = getname((char __user *) a0);
+ filename = getname((const char __user *) a0);
error = PTR_ERR(filename);
if (IS_ERR(filename))
goto out;
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index 19471a1..20fd701 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -546,7 +546,7 @@ compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, compat_siz
return sys_pread64(fd, ubuf, count, ((loff_t)poshi << 32) | poslo);
}

-compat_ssize_t compat_sys_pwrite64(unsigned int fd, char __user *ubuf, compat_size_t count,
+compat_ssize_t compat_sys_pwrite64(unsigned int fd, const char __user *ubuf, compat_size_t count,
u32 reg6, u32 poshi, u32 poslo)
{
return sys_pwrite64(fd, ubuf, count, ((loff_t)poshi << 32) | poslo);
diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 73b624e..1e6449c 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -436,7 +436,7 @@ sys32_rt_sigqueueinfo(int pid, int sig, compat_siginfo_t __user *uinfo)
* sys32_execve() executes a new program after the asm stub has set
* things up for us. This should basically do what I want it to.
*/
-asmlinkage long sys32_execve(char __user *name, compat_uptr_t __user *argv,
+asmlinkage long sys32_execve(const char __user *name, compat_uptr_t __user *argv,
compat_uptr_t __user *envp)
{
struct pt_regs *regs = task_pt_regs(current);
@@ -570,7 +570,7 @@ static int cp_stat64(struct stat64_emu31 __user *ubuf, struct kstat *stat)
return copy_to_user(ubuf,&tmp,sizeof(tmp)) ? -EFAULT : 0;
}

-asmlinkage long sys32_stat64(char __user * filename, struct stat64_emu31 __user * statbuf)
+asmlinkage long sys32_stat64(const char __user * filename, struct stat64_emu31 __user * statbuf)
{
struct kstat stat;
int ret = vfs_stat(filename, &stat);
@@ -579,7 +579,7 @@ asmlinkage long sys32_stat64(char __user * filename, struct stat64_emu31 __user
return ret;
}

-asmlinkage long sys32_lstat64(char __user * filename, struct stat64_emu31 __user * statbuf)
+asmlinkage long sys32_lstat64(const char __user * filename, struct stat64_emu31 __user * statbuf)
{
struct kstat stat;
int ret = vfs_lstat(filename, &stat);
@@ -597,7 +597,7 @@ asmlinkage long sys32_fstat64(unsigned long fd, struct stat64_emu31 __user * sta
return ret;
}

-asmlinkage long sys32_fstatat64(unsigned int dfd, char __user *filename,
+asmlinkage long sys32_fstatat64(unsigned int dfd, const char __user *filename,
struct stat64_emu31 __user* statbuf, int flag)
{
struct kstat stat;
@@ -655,7 +655,7 @@ asmlinkage long sys32_read(unsigned int fd, char __user * buf, size_t count)
return sys_read(fd, buf, count);
}

-asmlinkage long sys32_write(unsigned int fd, char __user * buf, size_t count)
+asmlinkage long sys32_write(unsigned int fd, const char __user * buf, size_t count)
{
if ((compat_ssize_t) count < 0)
return -EINVAL;
diff --git a/arch/s390/kernel/compat_linux.h b/arch/s390/kernel/compat_linux.h
index cb97afc..9635d75 100644
--- a/arch/s390/kernel/compat_linux.h
+++ b/arch/s390/kernel/compat_linux.h
@@ -193,7 +193,7 @@ long sys32_rt_sigprocmask(int how, compat_sigset_t __user *set,
compat_sigset_t __user *oset, size_t sigsetsize);
long sys32_rt_sigpending(compat_sigset_t __user *set, size_t sigsetsize);
long sys32_rt_sigqueueinfo(int pid, int sig, compat_siginfo_t __user *uinfo);
-long sys32_execve(char __user *name, compat_uptr_t __user *argv,
+long sys32_execve(const char __user *name, compat_uptr_t __user *argv,
compat_uptr_t __user *envp);
long sys32_init_module(void __user *umod, unsigned long len,
const char __user *uargs);
@@ -207,16 +207,16 @@ long sys32_sendfile(int out_fd, int in_fd, compat_off_t __user *offset,
size_t count);
long sys32_sendfile64(int out_fd, int in_fd, compat_loff_t __user *offset,
s32 count);
-long sys32_stat64(char __user * filename, struct stat64_emu31 __user * statbuf);
-long sys32_lstat64(char __user * filename,
+long sys32_stat64(const char __user * filename, struct stat64_emu31 __user * statbuf);
+long sys32_lstat64(const char __user * filename,
struct stat64_emu31 __user * statbuf);
long sys32_fstat64(unsigned long fd, struct stat64_emu31 __user * statbuf);
-long sys32_fstatat64(unsigned int dfd, char __user *filename,
+long sys32_fstatat64(unsigned int dfd, const char __user *filename,
struct stat64_emu31 __user* statbuf, int flag);
unsigned long old32_mmap(struct mmap_arg_struct_emu31 __user *arg);
long sys32_mmap2(struct mmap_arg_struct_emu31 __user *arg);
long sys32_read(unsigned int fd, char __user * buf, size_t count);
-long sys32_write(unsigned int fd, char __user * buf, size_t count);
+long sys32_write(unsigned int fd, const char __user * buf, size_t count);
long sys32_fadvise64(int fd, loff_t offset, size_t len, int advise);
long sys32_fadvise64_64(struct fadvise64_64_args __user *args);
long sys32_sigaction(int sig, const struct old_sigaction32 __user *act,
diff --git a/arch/s390/kernel/entry.h b/arch/s390/kernel/entry.h
index eb15c12..e2c048b 100644
--- a/arch/s390/kernel/entry.h
+++ b/arch/s390/kernel/entry.h
@@ -42,7 +42,7 @@ long sys_clone(unsigned long newsp, unsigned long clone_flags,
int __user *parent_tidptr, int __user *child_tidptr);
long sys_vfork(void);
void execve_tail(void);
-long sys_execve(char __user *name, char __user * __user *argv,
+long sys_execve(const char __user *name, char __user * __user *argv,
char __user * __user *envp);
long sys_sigsuspend(int history0, int history1, old_sigset_t mask);
long sys_sigaction(int sig, const struct old_sigaction __user *act,
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 1039fde..7eafaf2 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -267,7 +267,7 @@ asmlinkage void execve_tail(void)
/*
* sys_execve() executes a new program.
*/
-SYSCALL_DEFINE3(execve, char __user *, name, char __user * __user *, argv,
+SYSCALL_DEFINE3(execve, const char __user *, name, char __user * __user *, argv,
char __user * __user *, envp)
{
struct pt_regs *regs = task_pt_regs(current);
diff --git a/arch/sh/include/asm/syscalls_32.h b/arch/sh/include/asm/syscalls_32.h
index 8b30200..be201fd 100644
--- a/arch/sh/include/asm/syscalls_32.h
+++ b/arch/sh/include/asm/syscalls_32.h
@@ -19,7 +19,7 @@ asmlinkage int sys_clone(unsigned long clone_flags, unsigned long newsp,
asmlinkage int sys_vfork(unsigned long r4, unsigned long r5,
unsigned long r6, unsigned long r7,
struct pt_regs __regs);
-asmlinkage int sys_execve(char __user *ufilename, char __user * __user *uargv,
+asmlinkage int sys_execve(const char __user *ufilename, char __user * __user *uargv,
char __user * __user *uenvp, unsigned long r7,
struct pt_regs __regs);
asmlinkage int sys_sigsuspend(old_sigset_t mask, unsigned long r5,
diff --git a/arch/sh/include/asm/syscalls_64.h b/arch/sh/include/asm/syscalls_64.h
index 751fd88..ee519f4 100644
--- a/arch/sh/include/asm/syscalls_64.h
+++ b/arch/sh/include/asm/syscalls_64.h
@@ -21,7 +21,7 @@ asmlinkage int sys_vfork(unsigned long r2, unsigned long r3,
unsigned long r4, unsigned long r5,
unsigned long r6, unsigned long r7,
struct pt_regs *pregs);
-asmlinkage int sys_execve(char *ufilename, char **uargv,
+asmlinkage int sys_execve(const char *ufilename, char **uargv,
char **uenvp, unsigned long r5,
unsigned long r6, unsigned long r7,
struct pt_regs *pregs);
diff --git a/arch/sh/kernel/process_64.c b/arch/sh/kernel/process_64.c
index d4ca648..68d128d 100644
--- a/arch/sh/kernel/process_64.c
+++ b/arch/sh/kernel/process_64.c
@@ -483,7 +483,7 @@ asmlinkage int sys_vfork(unsigned long r2, unsigned long r3,
/*
* sys_execve() executes a new program.
*/
-asmlinkage int sys_execve(char *ufilename, char **uargv,
+asmlinkage int sys_execve(const char *ufilename, char **uargv,
char **uenvp, unsigned long r5,
unsigned long r6, unsigned long r7,
struct pt_regs *pregs)
diff --git a/arch/sparc/kernel/sys_sparc32.c b/arch/sparc/kernel/sys_sparc32.c
index c0ca875..e6375a7 100644
--- a/arch/sparc/kernel/sys_sparc32.c
+++ b/arch/sparc/kernel/sys_sparc32.c
@@ -162,7 +162,7 @@ static int cp_compat_stat64(struct kstat *stat,
return err;
}

-asmlinkage long compat_sys_stat64(char __user * filename,
+asmlinkage long compat_sys_stat64(const char __user * filename,
struct compat_stat64 __user *statbuf)
{
struct kstat stat;
@@ -173,7 +173,7 @@ asmlinkage long compat_sys_stat64(char __user * filename,
return error;
}

-asmlinkage long compat_sys_lstat64(char __user * filename,
+asmlinkage long compat_sys_lstat64(const char __user * filename,
struct compat_stat64 __user *statbuf)
{
struct kstat stat;
@@ -195,7 +195,8 @@ asmlinkage long compat_sys_fstat64(unsigned int fd,
return error;
}

-asmlinkage long compat_sys_fstatat64(unsigned int dfd, char __user *filename,
+asmlinkage long compat_sys_fstatat64(unsigned int dfd,
+ const char __user *filename,
struct compat_stat64 __user * statbuf, int flag)
{
struct kstat stat;
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index 97974c1..59b20d9 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -44,7 +44,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
PT_REGS_SP(regs) = esp;
}

-static long execve1(char *file, char __user * __user *argv,
+static long execve1(const char *file, char __user * __user *argv,
char __user *__user *env)
{
long error;
@@ -61,7 +61,7 @@ static long execve1(char *file, char __user * __user *argv,
return error;
}

-long um_execve(char *file, char __user *__user *argv, char __user *__user *env)
+long um_execve(const char *file, char __user *__user *argv, char __user *__user *env)
{
long err;

@@ -71,7 +71,7 @@ long um_execve(char *file, char __user *__user *argv, char __user *__user *env)
return err;
}

-long sys_execve(char __user *file, char __user *__user *argv,
+long sys_execve(const char __user *file, char __user *__user *argv,
char __user *__user *env)
{
long error;
diff --git a/arch/um/kernel/internal.h b/arch/um/kernel/internal.h
index 3bda43c..1303a10 100644
--- a/arch/um/kernel/internal.h
+++ b/arch/um/kernel/internal.h
@@ -1 +1 @@
-extern long um_execve(char *file, char __user *__user *argv, char __user *__user *env);
+extern long um_execve(const char *file, char __user *__user *argv, char __user *__user *env);
diff --git a/arch/um/kernel/syscall.c b/arch/um/kernel/syscall.c
index 4393173..7427c0b 100644
--- a/arch/um/kernel/syscall.c
+++ b/arch/um/kernel/syscall.c
@@ -58,7 +58,7 @@ int kernel_execve(const char *filename, char *const argv[], char *const envp[])

fs = get_fs();
set_fs(KERNEL_DS);
- ret = um_execve((char *)filename, (char __user *__user *)argv,
+ ret = um_execve(filename, (char __user *__user *)argv,
(char __user *__user *) envp);
set_fs(fs);

diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
index 626be15..1baddad 100644
--- a/arch/x86/ia32/sys_ia32.c
+++ b/arch/x86/ia32/sys_ia32.c
@@ -51,7 +51,7 @@
#define AA(__x) ((unsigned long)(__x))


-asmlinkage long sys32_truncate64(char __user *filename,
+asmlinkage long sys32_truncate64(const char __user *filename,
unsigned long offset_low,
unsigned long offset_high)
{
@@ -96,7 +96,7 @@ static int cp_stat64(struct stat64 __user *ubuf, struct kstat *stat)
return 0;
}

-asmlinkage long sys32_stat64(char __user *filename,
+asmlinkage long sys32_stat64(const char __user *filename,
struct stat64 __user *statbuf)
{
struct kstat stat;
@@ -107,7 +107,7 @@ asmlinkage long sys32_stat64(char __user *filename,
return ret;
}

-asmlinkage long sys32_lstat64(char __user *filename,
+asmlinkage long sys32_lstat64(const char __user *filename,
struct stat64 __user *statbuf)
{
struct kstat stat;
@@ -126,7 +126,7 @@ asmlinkage long sys32_fstat64(unsigned int fd, struct stat64 __user *statbuf)
return ret;
}

-asmlinkage long sys32_fstatat(unsigned int dfd, char __user *filename,
+asmlinkage long sys32_fstatat(unsigned int dfd, const char __user *filename,
struct stat64 __user *statbuf, int flag)
{
struct kstat stat;
@@ -408,8 +408,8 @@ asmlinkage long sys32_pread(unsigned int fd, char __user *ubuf, u32 count,
((loff_t)AA(poshi) << 32) | AA(poslo));
}

-asmlinkage long sys32_pwrite(unsigned int fd, char __user *ubuf, u32 count,
- u32 poslo, u32 poshi)
+asmlinkage long sys32_pwrite(unsigned int fd, const char __user *ubuf,
+ u32 count, u32 poslo, u32 poshi)
{
return sys_pwrite64(fd, ubuf, count,
((loff_t)AA(poshi) << 32) | AA(poslo));
@@ -449,7 +449,7 @@ asmlinkage long sys32_sendfile(int out_fd, int in_fd,
return ret;
}

-asmlinkage long sys32_execve(char __user *name, compat_uptr_t __user *argv,
+asmlinkage long sys32_execve(const char __user *name, compat_uptr_t __user *argv,
compat_uptr_t __user *envp, struct pt_regs *regs)
{
long error;
diff --git a/arch/x86/include/asm/sys_ia32.h b/arch/x86/include/asm/sys_ia32.h
index 3ad4217..c8a052a 100644
--- a/arch/x86/include/asm/sys_ia32.h
+++ b/arch/x86/include/asm/sys_ia32.h
@@ -18,13 +18,13 @@
#include <asm/ia32.h>

/* ia32/sys_ia32.c */
-asmlinkage long sys32_truncate64(char __user *, unsigned long, unsigned long);
+asmlinkage long sys32_truncate64(const char __user *, unsigned long, unsigned long);
asmlinkage long sys32_ftruncate64(unsigned int, unsigned long, unsigned long);

-asmlinkage long sys32_stat64(char __user *, struct stat64 __user *);
-asmlinkage long sys32_lstat64(char __user *, struct stat64 __user *);
+asmlinkage long sys32_stat64(const char __user *, struct stat64 __user *);
+asmlinkage long sys32_lstat64(const char __user *, struct stat64 __user *);
asmlinkage long sys32_fstat64(unsigned int, struct stat64 __user *);
-asmlinkage long sys32_fstatat(unsigned int, char __user *,
+asmlinkage long sys32_fstatat(unsigned int, const char __user *,
struct stat64 __user *, int);
struct mmap_arg_struct32;
asmlinkage long sys32_mmap(struct mmap_arg_struct32 __user *);
@@ -49,12 +49,12 @@ asmlinkage long sys32_rt_sigpending(compat_sigset_t __user *, compat_size_t);
asmlinkage long sys32_rt_sigqueueinfo(int, int, compat_siginfo_t __user *);

asmlinkage long sys32_pread(unsigned int, char __user *, u32, u32, u32);
-asmlinkage long sys32_pwrite(unsigned int, char __user *, u32, u32, u32);
+asmlinkage long sys32_pwrite(unsigned int, const char __user *, u32, u32, u32);

asmlinkage long sys32_personality(unsigned long);
asmlinkage long sys32_sendfile(int, int, compat_off_t __user *, s32);

-asmlinkage long sys32_execve(char __user *, compat_uptr_t __user *,
+asmlinkage long sys32_execve(const char __user *, compat_uptr_t __user *,
compat_uptr_t __user *, struct pt_regs *);
asmlinkage long sys32_clone(unsigned int, unsigned int, struct pt_regs *);

diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index 5c044b4..feb2ff9 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -23,7 +23,7 @@ long sys_iopl(unsigned int, struct pt_regs *);
/* kernel/process.c */
int sys_fork(struct pt_regs *);
int sys_vfork(struct pt_regs *);
-long sys_execve(char __user *, char __user * __user *,
+long sys_execve(const char __user *, char __user * __user *,
char __user * __user *, struct pt_regs *);
long sys_clone(unsigned long, unsigned long, void __user *,
void __user *, struct pt_regs *);
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 0697ff1..77f5986 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1185,13 +1185,13 @@ END(kernel_thread_helper)
* execve(). This function needs to use IRET, not SYSRET, to set up all state properly.
*
* C extern interface:
- * extern long execve(char *name, char **argv, char **envp)
+ * extern long execve(const char *name, char **argv, char **envp)
*
* asm input arguments:
* rdi: name, rsi: argv, rdx: envp
*
* We want to fallback into:
- * extern long sys_execve(char *name, char **argv,char **envp, struct pt_regs *regs)
+ * extern long sys_execve(const char *name, char **argv,char **envp, struct pt_regs *regs)
*
* do_sys_execve asm fallback arguments:
* rdi: name, rsi: argv, rdx: envp, rcx: fake frame on the stack
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e7e3521..f5c816e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -300,7 +300,7 @@ EXPORT_SYMBOL(kernel_thread);
/*
* sys_execve() executes a new program.
*/
-long sys_execve(char __user *name, char __user * __user *argv,
+long sys_execve(const char __user *name, char __user * __user *argv,
char __user * __user *envp, struct pt_regs *regs)
{
long error;
diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
index f167e0f..7c2f38f 100644
--- a/arch/xtensa/kernel/process.c
+++ b/arch/xtensa/kernel/process.c
@@ -318,7 +318,7 @@ long xtensa_clone(unsigned long clone_flags, unsigned long newsp,
*/

asmlinkage
-long xtensa_execve(char __user *name, char __user * __user *argv,
+long xtensa_execve(const char __user *name, char __user * __user *argv,
char __user * __user *envp,
long a3, long a4, long a5,
struct pt_regs *regs)
diff --git a/fs/compat.c b/fs/compat.c
index 6490d21..d72591a 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -76,7 +76,8 @@ int compat_printk(const char *fmt, ...)
* Not all architectures have sys_utime, so implement this in terms
* of sys_utimes.
*/
-asmlinkage long compat_sys_utime(char __user *filename, struct compat_utimbuf __user *t)
+asmlinkage long compat_sys_utime(const char __user *filename,
+ struct compat_utimbuf __user *t)
{
struct timespec tv[2];

@@ -90,7 +91,7 @@ asmlinkage long compat_sys_utime(char __user *filename, struct compat_utimbuf __
return do_utimes(AT_FDCWD, filename, t ? tv : NULL, 0);
}

-asmlinkage long compat_sys_utimensat(unsigned int dfd, char __user *filename, struct compat_timespec __user *t, int flags)
+asmlinkage long compat_sys_utimensat(unsigned int dfd, const char __user *filename, struct compat_timespec __user *t, int flags)
{
struct timespec tv[2];

@@ -105,7 +106,7 @@ asmlinkage long compat_sys_utimensat(unsigned int dfd, char __user *filename, st
return do_utimes(dfd, filename, t ? tv : NULL, flags);
}

-asmlinkage long compat_sys_futimesat(unsigned int dfd, char __user *filename, struct compat_timeval __user *t)
+asmlinkage long compat_sys_futimesat(unsigned int dfd, const char __user *filename, struct compat_timeval __user *t)
{
struct timespec tv[2];

@@ -124,7 +125,7 @@ asmlinkage long compat_sys_futimesat(unsigned int dfd, char __user *filename, st
return do_utimes(dfd, filename, t ? tv : NULL, 0);
}

-asmlinkage long compat_sys_utimes(char __user *filename, struct compat_timeval __user *t)
+asmlinkage long compat_sys_utimes(const char __user *filename, struct compat_timeval __user *t)
{
return compat_sys_futimesat(AT_FDCWD, filename, t);
}
@@ -168,7 +169,7 @@ static int cp_compat_stat(struct kstat *stat, struct compat_stat __user *ubuf)
return err;
}

-asmlinkage long compat_sys_newstat(char __user * filename,
+asmlinkage long compat_sys_newstat(const char __user * filename,
struct compat_stat __user *statbuf)
{
struct kstat stat;
@@ -180,7 +181,7 @@ asmlinkage long compat_sys_newstat(char __user * filename,
return cp_compat_stat(&stat, statbuf);
}

-asmlinkage long compat_sys_newlstat(char __user * filename,
+asmlinkage long compat_sys_newlstat(const char __user * filename,
struct compat_stat __user *statbuf)
{
struct kstat stat;
@@ -193,7 +194,8 @@ asmlinkage long compat_sys_newlstat(char __user * filename,
}

#ifndef __ARCH_WANT_STAT64
-asmlinkage long compat_sys_newfstatat(unsigned int dfd, char __user *filename,
+asmlinkage long compat_sys_newfstatat(unsigned int dfd,
+ const char __user *filename,
struct compat_stat __user *statbuf, int flag)
{
struct kstat stat;
@@ -836,9 +838,10 @@ static int do_nfs4_super_data_conv(void *raw_data)
#define NCPFS_NAME "ncpfs"
#define NFS4_NAME "nfs4"

-asmlinkage long compat_sys_mount(char __user * dev_name, char __user * dir_name,
- char __user * type, unsigned long flags,
- void __user * data)
+asmlinkage long compat_sys_mount(const char __user * dev_name,
+ const char __user * dir_name,
+ const char __user * type, unsigned long flags,
+ const void __user * data)
{
char *kernel_type;
unsigned long data_page;
diff --git a/fs/stat.c b/fs/stat.c
index c4ecd52..12e90e2 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -68,7 +68,8 @@ int vfs_fstat(unsigned int fd, struct kstat *stat)
}
EXPORT_SYMBOL(vfs_fstat);

-int vfs_fstatat(int dfd, char __user *filename, struct kstat *stat, int flag)
+int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
+ int flag)
{
struct path path;
int error = -EINVAL;
@@ -91,13 +92,13 @@ out:
}
EXPORT_SYMBOL(vfs_fstatat);

-int vfs_stat(char __user *name, struct kstat *stat)
+int vfs_stat(const char __user *name, struct kstat *stat)
{
return vfs_fstatat(AT_FDCWD, name, stat, 0);
}
EXPORT_SYMBOL(vfs_stat);

-int vfs_lstat(char __user *name, struct kstat *stat)
+int vfs_lstat(const char __user *name, struct kstat *stat)
{
return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW);
}
@@ -147,7 +148,8 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
return copy_to_user(statbuf,&tmp,sizeof(tmp)) ? -EFAULT : 0;
}

-SYSCALL_DEFINE2(stat, char __user *, filename, struct __old_kernel_stat __user *, statbuf)
+SYSCALL_DEFINE2(stat, const char __user *, filename,
+ struct __old_kernel_stat __user *, statbuf)
{
struct kstat stat;
int error;
@@ -159,7 +161,8 @@ SYSCALL_DEFINE2(stat, char __user *, filename, struct __old_kernel_stat __user *
return cp_old_stat(&stat, statbuf);
}

-SYSCALL_DEFINE2(lstat, char __user *, filename, struct __old_kernel_stat __user *, statbuf)
+SYSCALL_DEFINE2(lstat, const char __user *, filename,
+ struct __old_kernel_stat __user *, statbuf)
{
struct kstat stat;
int error;
@@ -234,7 +237,8 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
return copy_to_user(statbuf,&tmp,sizeof(tmp)) ? -EFAULT : 0;
}

-SYSCALL_DEFINE2(newstat, char __user *, filename, struct stat __user *, statbuf)
+SYSCALL_DEFINE2(newstat, const char __user *, filename,
+ struct stat __user *, statbuf)
{
struct kstat stat;
int error = vfs_stat(filename, &stat);
@@ -244,7 +248,8 @@ SYSCALL_DEFINE2(newstat, char __user *, filename, struct stat __user *, statbuf)
return cp_new_stat(&stat, statbuf);
}

-SYSCALL_DEFINE2(newlstat, char __user *, filename, struct stat __user *, statbuf)
+SYSCALL_DEFINE2(newlstat, const char __user *, filename,
+ struct stat __user *, statbuf)
{
struct kstat stat;
int error;
@@ -257,7 +262,7 @@ SYSCALL_DEFINE2(newlstat, char __user *, filename, struct stat __user *, statbuf
}

#if !defined(__ARCH_WANT_STAT64) || defined(__ARCH_WANT_SYS_NEWFSTATAT)
-SYSCALL_DEFINE4(newfstatat, int, dfd, char __user *, filename,
+SYSCALL_DEFINE4(newfstatat, int, dfd, const char __user *, filename,
struct stat __user *, statbuf, int, flag)
{
struct kstat stat;
@@ -355,7 +360,8 @@ static long cp_new_stat64(struct kstat *stat, struct stat64 __user *statbuf)
return copy_to_user(statbuf,&tmp,sizeof(tmp)) ? -EFAULT : 0;
}

-SYSCALL_DEFINE2(stat64, char __user *, filename, struct stat64 __user *, statbuf)
+SYSCALL_DEFINE2(stat64, const char __user *, filename,
+ struct stat64 __user *, statbuf)
{
struct kstat stat;
int error = vfs_stat(filename, &stat);
@@ -366,7 +372,8 @@ SYSCALL_DEFINE2(stat64, char __user *, filename, struct stat64 __user *, statbuf
return error;
}

-SYSCALL_DEFINE2(lstat64, char __user *, filename, struct stat64 __user *, statbuf)
+SYSCALL_DEFINE2(lstat64, const char __user *, filename,
+ struct stat64 __user *, statbuf)
{
struct kstat stat;
int error = vfs_lstat(filename, &stat);
@@ -388,7 +395,7 @@ SYSCALL_DEFINE2(fstat64, unsigned long, fd, struct stat64 __user *, statbuf)
return error;
}

-SYSCALL_DEFINE4(fstatat64, int, dfd, char __user *, filename,
+SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename,
struct stat64 __user *, statbuf, int, flag)
{
struct kstat stat;
diff --git a/fs/utimes.c b/fs/utimes.c
index e4c75db..179b586 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -126,7 +126,8 @@ out:
* must be owner or have write permission.
* Else, update from *times, must be owner or super user.
*/
-long do_utimes(int dfd, char __user *filename, struct timespec *times, int flags)
+long do_utimes(int dfd, const char __user *filename, struct timespec *times,
+ int flags)
{
int error = -EINVAL;

@@ -170,7 +171,7 @@ out:
return error;
}

-SYSCALL_DEFINE4(utimensat, int, dfd, char __user *, filename,
+SYSCALL_DEFINE4(utimensat, int, dfd, const char __user *, filename,
struct timespec __user *, utimes, int, flags)
{
struct timespec tstimes[2];
@@ -188,7 +189,7 @@ SYSCALL_DEFINE4(utimensat, int, dfd, char __user *, filename,
return do_utimes(dfd, filename, utimes ? tstimes : NULL, flags);
}

-SYSCALL_DEFINE3(futimesat, int, dfd, char __user *, filename,
+SYSCALL_DEFINE3(futimesat, int, dfd, const char __user *, filename,
struct timeval __user *, utimes)
{
struct timeval times[2];
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 168f7da..9ddc878 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -331,7 +331,7 @@ asmlinkage long compat_sys_epoll_pwait(int epfd,
const compat_sigset_t __user *sigmask,
compat_size_t sigsetsize);

-asmlinkage long compat_sys_utimensat(unsigned int dfd, char __user *filename,
+asmlinkage long compat_sys_utimensat(unsigned int dfd, const char __user *filename,
struct compat_timespec __user *t, int flags);

asmlinkage long compat_sys_signalfd(int ufd,
@@ -348,9 +348,9 @@ asmlinkage long compat_sys_move_pages(pid_t pid, unsigned long nr_page,
const int __user *nodes,
int __user *status,
int flags);
-asmlinkage long compat_sys_futimesat(unsigned int dfd, char __user *filename,
+asmlinkage long compat_sys_futimesat(unsigned int dfd, const char __user *filename,
struct compat_timeval __user *t);
-asmlinkage long compat_sys_newfstatat(unsigned int dfd, char __user * filename,
+asmlinkage long compat_sys_newfstatat(unsigned int dfd, const char __user * filename,
struct compat_stat __user *statbuf,
int flag);
asmlinkage long compat_sys_openat(unsigned int dfd, const char __user *filename,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7c443c3..a18bcea 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2339,10 +2339,10 @@ void inode_set_bytes(struct inode *inode, loff_t bytes);

extern int vfs_readdir(struct file *, filldir_t, void *);

-extern int vfs_stat(char __user *, struct kstat *);
-extern int vfs_lstat(char __user *, struct kstat *);
+extern int vfs_stat(const char __user *, struct kstat *);
+extern int vfs_lstat(const char __user *, struct kstat *);
extern int vfs_fstat(unsigned int, struct kstat *);
-extern int vfs_fstatat(int , char __user *, struct kstat *, int);
+extern int vfs_fstatat(int , const char __user *, struct kstat *, int);

extern int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
unsigned long arg);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 7f614ce..8812a63 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -393,7 +393,7 @@ asmlinkage long sys_umount(char __user *name, int flags);
asmlinkage long sys_oldumount(char __user *name);
asmlinkage long sys_truncate(const char __user *path, long length);
asmlinkage long sys_ftruncate(unsigned int fd, unsigned long length);
-asmlinkage long sys_stat(char __user *filename,
+asmlinkage long sys_stat(const char __user *filename,
struct __old_kernel_stat __user *statbuf);
asmlinkage long sys_statfs(const char __user * path,
struct statfs __user *buf);
@@ -402,21 +402,21 @@ asmlinkage long sys_statfs64(const char __user *path, size_t sz,
asmlinkage long sys_fstatfs(unsigned int fd, struct statfs __user *buf);
asmlinkage long sys_fstatfs64(unsigned int fd, size_t sz,
struct statfs64 __user *buf);
-asmlinkage long sys_lstat(char __user *filename,
+asmlinkage long sys_lstat(const char __user *filename,
struct __old_kernel_stat __user *statbuf);
asmlinkage long sys_fstat(unsigned int fd,
struct __old_kernel_stat __user *statbuf);
-asmlinkage long sys_newstat(char __user *filename,
+asmlinkage long sys_newstat(const char __user *filename,
struct stat __user *statbuf);
-asmlinkage long sys_newlstat(char __user *filename,
+asmlinkage long sys_newlstat(const char __user *filename,
struct stat __user *statbuf);
asmlinkage long sys_newfstat(unsigned int fd, struct stat __user *statbuf);
asmlinkage long sys_ustat(unsigned dev, struct ustat __user *ubuf);
#if BITS_PER_LONG == 32
-asmlinkage long sys_stat64(char __user *filename,
+asmlinkage long sys_stat64(const char __user *filename,
struct stat64 __user *statbuf);
asmlinkage long sys_fstat64(unsigned long fd, struct stat64 __user *statbuf);
-asmlinkage long sys_lstat64(char __user *filename,
+asmlinkage long sys_lstat64(const char __user *filename,
struct stat64 __user *statbuf);
asmlinkage long sys_truncate64(const char __user *path, loff_t length);
asmlinkage long sys_ftruncate64(unsigned int fd, loff_t length);
@@ -756,7 +756,7 @@ asmlinkage long sys_linkat(int olddfd, const char __user *oldname,
int newdfd, const char __user *newname, int flags);
asmlinkage long sys_renameat(int olddfd, const char __user * oldname,
int newdfd, const char __user * newname);
-asmlinkage long sys_futimesat(int dfd, char __user *filename,
+asmlinkage long sys_futimesat(int dfd, const char __user *filename,
struct timeval __user *utimes);
asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode);
asmlinkage long sys_fchmodat(int dfd, const char __user * filename,
@@ -765,13 +765,13 @@ asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
gid_t group, int flag);
asmlinkage long sys_openat(int dfd, const char __user *filename, int flags,
int mode);
-asmlinkage long sys_newfstatat(int dfd, char __user *filename,
+asmlinkage long sys_newfstatat(int dfd, const char __user *filename,
struct stat __user *statbuf, int flag);
-asmlinkage long sys_fstatat64(int dfd, char __user *filename,
+asmlinkage long sys_fstatat64(int dfd, const char __user *filename,
struct stat64 __user *statbuf, int flag);
asmlinkage long sys_readlinkat(int dfd, const char __user *path, char __user *buf,
int bufsiz);
-asmlinkage long sys_utimensat(int dfd, char __user *filename,
+asmlinkage long sys_utimensat(int dfd, const char __user *filename,
struct timespec __user *utimes, int flags);
asmlinkage long sys_unshare(unsigned long unshare_flags);

diff --git a/include/linux/time.h b/include/linux/time.h
index ea3559f..16346c0 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -135,7 +135,7 @@ extern void do_gettimeofday(struct timeval *tv);
extern int do_settimeofday(struct timespec *tv);
extern int do_sys_settimeofday(struct timespec *tv, struct timezone *tz);
#define do_posix_clock_monotonic_gettime(ts) ktime_get_ts(ts)
-extern long do_utimes(int dfd, char __user *filename, struct timespec *times, int flags);
+extern long do_utimes(int dfd, const char __user *filename, struct timespec *times, int flags);
struct itimerval;
extern int do_setitimer(int which, struct itimerval *value,
struct itimerval *ovalue);

2010-06-29 20:23:13

by Steve French

Subject: Re: [PATCH 0/3] Extended file stat functions

On Tue, Jun 29, 2010 at 3:02 PM, David Howells <[email protected]> wrote:
> Implement a pair of new system calls to provide extended and further extensible
> stat functions.
>
> The third of the associated patches provides these new system calls:
>
>        struct xstat_dev {
>                unsigned int    major;
>                unsigned int    minor;
>        };
>
>        struct xstat_time {
>                unsigned long long      tv_sec;
>                unsigned long long      tv_nsec;
>        };
>
>        struct xstat {
>                unsigned int            struct_version;
>        #define XSTAT_STRUCT_VERSION    0
>                unsigned int            st_mode;
>                unsigned int            st_nlink;
>                unsigned int            st_uid;
>                unsigned int            st_gid;
>                unsigned int            st_blksize;
>                struct xstat_dev        st_rdev;
>                struct xstat_dev        st_dev;
>                unsigned long long      st_ino;
>                unsigned long long      st_size;
>                struct xstat_time       st_atime;
>                struct xstat_time       st_mtime;
>                struct xstat_time       st_ctime;
>                struct xstat_time       st_crtime;
>                unsigned long long      st_blocks;
>                unsigned long long      st_inode_version;
>                unsigned long long      st_data_version;
>                unsigned long long      query_flags;
>        #define XSTAT_QUERY_CREATION_TIME       0x00000001ULL
>        #define XSTAT_QUERY_INODE_VERSION       0x00000002ULL
>        #define XSTAT_QUERY_DATA_VERSION        0x00000004ULL
>                unsigned long long      extra_results[0];
>        };
>
>        ssize_t ret = xstat(int dfd,
>                            const char *filename,
>                            unsigned atflag,
>                            struct xstat *buffer,
>                            size_t buflen);
>
>        ssize_t ret = fxstat(int fd,
>                             struct xstat *buffer,
>                             size_t buflen);
>
> which are more fully documented in that patch's description.
>
> The bonuses of these new stat functions are:
>
>  (1) The fields in the xstat struct are cleaned up.  There are no split or
>      duplicated fields.
>
>  (2) Some extra information is made available (file creation time, inode
>      version number and data version number) where provided by the underlying
>      filesystem.
>
>      These are implemented here for Ext4 and AFS, but could also be provided
>      for CIFS, NTFS and BtrFS and probably others.

The NFSv4 protocol also has a "recommended attribute" for creation time that
servers should return if possible (which Linux servers could presumably now
return):

   time_create   50   nfstime4   R/W   The time of creation of the object.

The SMB2 protocol also returns the equivalent.

>  (3) The structure is versioned and extensible, meaning that further new system
>      calls shouldn't be required.

How does a filesystem return an "unknown" value for a field
(e.g. the version field)? 0, or -1, or ...?


>  (2) What extra bits of information might we like to see available through the
>      stat interface?  Security labels?  NFS file IDs?  Xattrs?

The list of mandatory ones for NFS is fairly small; the list of recommended
ones for NFSv4 is larger (see e.g. page 44ff of
http://www.ietf.org/rfc/rfc3530.txt).

One hole that this reminded me about is how to return the superblock
time granularity (for NFSv4 this is attribute 51 "time_delta" which
is called on a superblock not on a file). We run into time rounding
issues with Samba too.

>
>  (4) Should the inode number and data version number fields be 128-bit?

This is tricky for SMB2. If you can also provide a device ID (or an object ID of
some sort for the superblock), then a 64-bit inode number is OK.


--
Thanks,

Steve

2010-06-29 20:28:57

by Trond Myklebust

Subject: Re: [PATCH 0/3] Extended file stat functions

On Tue, 2010-06-29 at 21:02 +0100, David Howells wrote:
> Implement a pair of new system calls to provide extended and further extensible
> stat functions.
>
> The third of the associated patches provides these new system calls:
>
> struct xstat_dev {
> unsigned int major;
> unsigned int minor;
> };
>
> struct xstat_time {
> unsigned long long tv_sec;
> unsigned long long tv_nsec;
> };
>
> struct xstat {
> unsigned int struct_version;
> #define XSTAT_STRUCT_VERSION 0
> unsigned int st_mode;
> unsigned int st_nlink;
> unsigned int st_uid;
> unsigned int st_gid;
> unsigned int st_blksize;
> struct xstat_dev st_rdev;
> struct xstat_dev st_dev;
> unsigned long long st_ino;
> unsigned long long st_size;
> struct xstat_time st_atime;
> struct xstat_time st_mtime;
> struct xstat_time st_ctime;
> struct xstat_time st_crtime;
> unsigned long long st_blocks;
> unsigned long long st_inode_version;
> unsigned long long st_data_version;
> unsigned long long query_flags;
> #define XSTAT_QUERY_CREATION_TIME 0x00000001ULL
> #define XSTAT_QUERY_INODE_VERSION 0x00000002ULL
> #define XSTAT_QUERY_DATA_VERSION 0x00000004ULL
> unsigned long long extra_results[0];
> };
>
> ssize_t ret = xstat(int dfd,
> const char *filename,
> unsigned atflag,
> struct xstat *buffer,
> size_t buflen);
>
> ssize_t ret = fxstat(int fd,
> struct xstat *buffer,
> size_t buflen);
>
> which are more fully documented in that patch's description.
>
> The bonuses of these new stat functions are:
>
> (1) The fields in the xstat struct are cleaned up. There are no split or
> duplicated fields.
>
> (2) Some extra information is made available (file creation time, inode
> version number and data version number) where provided by the underlying
> filesystem.
>
> These are implemented here for Ext4 and AFS, but could also be provided
> for CIFS, NTFS and BtrFS and probably others.
>
> (3) The structure is versioned and extensible, meaning that further new system
> calls shouldn't be required.
>
> Note that no lstat() equivalent is required as that can be implemented through
> xstat() with atflag == 0.
>
>
> The first patch makes const a bunch of system call userspace string/buffer
> arguments. I can then make sys_xstat()'s filename pointer const too (though
> the entire first patch is not required for that).
>
> The second patch makes the AFS filesystem use i_generation for the vnode ID
> uniquifier rather than i_version, and assigns i_version to hold the AFS data
> version number, making them more logical for when I want to get at them from
> afs_getattr().
>
>
> There's a test program attached to the description for patch 3. It can be run
> as follows:
>
> [root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/i386/repodata/
> xstat(/afs/archive/linuxdev/fedora9/i386/repodata/) = 152
> sv=0 qf=6 cr=0.0 iv=7a5 dv=5
> Size: 2048 Blocks: 0 IO Block: 4096 directory
> Device: 00:13 Inode: 83 Links: 2
> Access: (0755/drwxr-xr-x) Uid: 75338 Gid: 0
> Access: 2008-11-05 20:00:12.000000000+0000
> Modify: 2008-11-05 20:00:12.000000000+0000
> Change: 2008-11-05 20:00:12.000000000+0000
> Inode version: 7a5h
> Data version: 5h
>
>
> Things that need consideration:
>
> (1) Is it worth retaining the ability to arbitrarily add extra bits onto the
> end of the stat buffer? And what's the best way to do this?
>
> I've defined a way that from userspace involves assigning bits in
> query_flags to extra results that you might want. But this could instead
> be done, say, by just upping the struct version number any time we want to
> pass back more information. Alternatively, we could go for a tagged data
> method, perhaps using the same format as the recvmsg() control message
> field.
>
> If we use tagged data then rather than being selective, we could just
> return as many tagged data items as we feel the user might want and we can
> cram into the buffer. That could be rather slow, though.
>
> (2) What extra bits of information might we like to see available through the
> stat interface? Security labels? NFS file IDs? Xattrs?
>
> If we went for a tagged data method, xstat() could be modified to take a
> list of tags as an argument, and could then return arbitrarily-sized
> tagged results, including fs-specific stuff.
>
> (3) Does st_blksize really need to be 64 bits on a 64-bit system? Or can it
> be 32-bits? Are we really likely to see something with a 4Gb+ blocksize?
>
> (4) Should the inode number and data version number fields be 128-bit?

There has been a lot of interest in allowing the user to specify exactly
which fields they want the filesystem to return, and whether or not the
kernel can use cached data. The main use is to allow specification of a
'stat light' that could help speed up "readdir()+multiple stat()" type
queries. At last year's Filesystem and Storage Workshop, Mark Fasheh
actually came up with an initial design:

http://www.kerneltrap.com/mailarchive/linux-fsdevel/2009/4/7/5427274

If we're going to add in a whole new syscall for stat, should we perhaps
revisit this discussion?

Cheers
Trond

2010-06-29 20:41:15

by David Howells

Subject: Re: [PATCH 0/3] Extended file stat functions

Steve French <[email protected]> wrote:

> How does a fs return an "unknown" value for one
> (e.g. version field) ... 0 or -1 or ...

Well, for the new creation time, inode version and data version fields, the
query_flags field has a bit for each that's set if the field contains a value,
and is clear if it doesn't.

See the test program on patch 3.
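
As a rough illustration of the validity-bit scheme described above (not the
test program from patch 3), a userspace check might look like the sketch
below. The flag values are copied from the cover letter; struct xstat_min and
both helper functions are hypothetical names invented for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag values as given in the cover letter. */
#define XSTAT_QUERY_CREATION_TIME 0x00000001ULL
#define XSTAT_QUERY_INODE_VERSION 0x00000002ULL
#define XSTAT_QUERY_DATA_VERSION  0x00000004ULL

/* Hypothetical cut-down stand-in for struct xstat: only the fields read here. */
struct xstat_min {
	unsigned long long st_inode_version;
	unsigned long long st_data_version;
	unsigned long long query_flags;
};

/* Hypothetical helper: true if the filesystem marked the given field as valid. */
static bool xstat_has(const struct xstat_min *st, unsigned long long flag)
{
	return (st->query_flags & flag) != 0;
}

/* Hypothetical helper: copy out the data version only when its bit is set. */
static bool xstat_data_version(const struct xstat_min *st,
			       unsigned long long *out)
{
	if (!xstat_has(st, XSTAT_QUERY_DATA_VERSION))
		return false;
	*out = st->st_data_version;
	return true;
}
```

With the values from the sample output in the cover letter (qf=6), the inode
and data version bits are set and the creation time bit is clear, so a caller
would ignore st_crtime rather than interpret a 0 or -1 sentinel.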

> One hole that this reminded me about is how to return the superblock
> time granularity (for NFSv4 this is attribute 51 "time_delta" which
> is called on a superblock not on a file). We run into time rounding
> issues with Samba too.

That sounds like something that should be accessible through statfs. But it
could be made accessible here too. It would also apply to FAT, which I
believe has a 2s granularity.
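
To make the rounding issue concrete, here is a minimal sketch of what a
caller could do if the granularity were exposed somehow. Nothing in the
patches provides this; xstat_time_truncate() and its gran_ns parameter are
hypothetical, and the sketch assumes a granularity of one second or more is a
whole number of seconds.

```c
/* Same layout as struct xstat_time in the cover letter. */
struct xstat_time {
	unsigned long long tv_sec;
	unsigned long long tv_nsec;
};

/*
 * Hypothetical helper: round a timestamp down to a multiple of the
 * filesystem's granularity, given in nanoseconds (must be nonzero).
 */
static struct xstat_time xstat_time_truncate(struct xstat_time t,
					     unsigned long long gran_ns)
{
	const unsigned long long nsec_per_sec = 1000000000ULL;

	if (gran_ns >= nsec_per_sec) {
		/* Assumption: coarse granularities are whole seconds. */
		unsigned long long gran_s = gran_ns / nsec_per_sec;

		t.tv_sec -= t.tv_sec % gran_s;
		t.tv_nsec = 0;
	} else {
		t.tv_nsec -= t.tv_nsec % gran_ns;
	}
	return t;
}
```

Two timestamps would then be compared only after both have been truncated to
the coarser of the two filesystems' granularities.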

> >  (4) Should the inode number and data version number fields be 128-bit?
> This is tricky for SMB2, if you can also provide a device id (or an object
> id of some sort for the superblock) then 64 bit inode number is ok.

A remote device ID? That would be possible. That could be used by AFS to
return the numeric volume ID (32 bits) and by NFS to return the FSID (128
bits). Would you be using the VolumeGUID (128 bits) for SMB2?
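
One way to picture such a remote device ID is a fixed 128-bit field that
smaller IDs are padded into. This is only a sketch of the idea under that
assumption; struct remote_file_id and both functions are hypothetical and are
not part of the posted patches.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical identity: a 128-bit volume/FSID field plus a 64-bit inode. */
struct remote_file_id {
	unsigned char volume[16];	/* wide enough for a 128-bit FSID or GUID */
	unsigned long long ino;
};

/* Store a 32-bit volume ID (as described for AFS) big-endian in the low bytes. */
static void remote_id_from_u32(struct remote_file_id *id,
			       unsigned int volume, unsigned long long ino)
{
	memset(id->volume, 0, sizeof(id->volume));
	id->volume[12] = (unsigned char)(volume >> 24);
	id->volume[13] = (unsigned char)(volume >> 16);
	id->volume[14] = (unsigned char)(volume >> 8);
	id->volume[15] = (unsigned char)volume;
	id->ino = ino;
}

/* Two files are the same object only if both volume and inode match. */
static bool remote_id_equal(const struct remote_file_id *a,
			    const struct remote_file_id *b)
{
	return a->ino == b->ino &&
	       memcmp(a->volume, b->volume, sizeof(a->volume)) == 0;
}
```

With the volume ID in the comparison, a 64-bit inode number only has to be
unique within one volume.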


David

2010-06-29 20:51:07

by David Howells

Subject: Re: [PATCH 0/3] Extended file stat functions

Trond Myklebust <[email protected]> wrote:

> There has been a lot of interest in allowing the user to specify exactly
> which fields they want the filesystem to return, and whether or not the
> kernel can use cached data or not. The main use is to allow
> specification of a 'stat light' that could help speed up
> "readdir()+multiple stat()" type queries. At last year's Filesystem and
> Storage Workshop, Mark Fasheh actually came up with an initial design:
>
> http://www.kerneltrap.com/mailarchive/linux-fsdevel/2009/4/7/5427274
>
> If we're going to add in a whole new syscall for stat, should we perhaps
> revisit this discussion?

I could certainly absorb that patch.

One further consideration following on from what you said: Is it worth having
an extended getdents() that can return stat data too? That might be useful
for NFS.

David

2010-06-29 21:07:47

by Bernd Schubert

Subject: Re: [PATCH 0/3] Extended file stat functions

On Tuesday, June 29, 2010, David Howells wrote:
> Implement a pair of new system calls to provide extended and further
> extensible stat functions.

Is there any chance we can take this opportunity to also add a field

unsigned long long st_gen

to struct xstat? Inode generation numbers would be really useful for
userspace NFS servers and some FUSE filesystems.


Thanks,
Bernd

2010-06-29 21:11:36

by David Howells

Subject: Re: [PATCH 0/3] Extended file stat functions

Bernd Schubert <[email protected]> wrote:

> Is there any chance we can use that chance and also add a field
>
> unsigned long long st_gen
>
> to struct_ xstat? Inode generation numbers really would be useful for
> userspace NFS servers and some fuse filesystems.

That would be st_inode_version (equivalent to i_generation internally).

David

2010-06-29 21:24:37

by Bernd Schubert

Subject: Re: [PATCH 0/3] Extended file stat functions

On Tuesday, June 29, 2010, David Howells wrote:
> Bernd Schubert <[email protected]> wrote:
> > Is there any chance we can use that chance and also add a field
> >
> > unsigned long long st_gen
> >
> > to struct_ xstat? Inode generation numbers really would be useful for
> > userspace NFS servers and some fuse filesystems.
>
> That would be st_inode_version (equivalent to i_generation internally).

Ah, great, so it is already there :) I was looking for st_gen, as that is what
it is called on BSD. And as BSD has had it for a long time, shouldn't Linux use
the BSD identifier?


Thanks,
Bernd

2010-06-29 21:28:29

by David Howells

Subject: Re: [PATCH 0/3] Extended file stat functions

Bernd Schubert <[email protected]> wrote:

> Ah, great, so already there :) I was looking for st_gen, as it is called
> that way on BSD. And as BSD already has it for a long time, shouldn't linux
> use the BSD identifier?

Sure. I guess you'd also want it to be a u64?

David

2010-06-29 21:53:08

by Bernd Schubert

Subject: Re: [PATCH 0/3] Extended file stat functions

On Tuesday, June 29, 2010, David Howells wrote:
> Bernd Schubert <[email protected]> wrote:
> > Ah, great, so already there :) I was looking for st_gen, as it is called
> > that way on BSD. And as BSD already has it for a long time, shouldn't
> > linux use the BSD identifier?
>
> Sure. I guess you'd also want it to be a u64?

Hmm, as far as I can see, BSD has u32. I only need it to check for recycled
inodes, and at least for me the probability of hitting a recycled inode whose
32-bit generation number has wrapped around to exactly the value the previous
inode had is sufficiently small.
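
For readers unfamiliar with the use case, the recycled-inode check amounts to
storing the generation alongside the inode number in a handle and comparing
both later. A minimal sketch, assuming a 32-bit generation as discussed;
struct handle_sketch and handle_matches() are hypothetical names:

```c
#include <stdbool.h>

/* Hypothetical handle a userspace NFS server might give to a client. */
struct handle_sketch {
	unsigned long long ino;	/* inode number when the handle was issued */
	unsigned int gen;	/* inode generation when the handle was issued */
};

/*
 * True if the handle still refers to the same file. A matching inode
 * number with a different generation means the inode was recycled.
 */
static bool handle_matches(const struct handle_sketch *h,
			   unsigned long long cur_ino, unsigned int cur_gen)
{
	return h->ino == cur_ino && h->gen == cur_gen;
}
```

A false result with an equal inode number is the stale-handle case; the only
undetected failure is the wraparound collision described above.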


Thanks a lot for your work on this,
Bernd

2010-06-29 22:13:26

by Ulrich Drepper

Subject: Re: [PATCH 3/3] Add a pair of system calls to make extended file stats available

On Tue, Jun 29, 2010 at 13:03, David Howells <[email protected]> wrote:
> Add a pair of system calls to make extended file stats available, including
> file creation time, inode version and data version where available through the
> underlying filesystem:

If you add something like this you might want to integrate another
extension. This was discussed a long time ago. In almost no
situation is all the information needed. Some of the pieces of
information returned by the syscall might be harder to collect than
others. It makes sense in such a situation to allow the caller to
specify what she is interested in: a bitmask of some sort. This was
brought up by the HPC people with gigantic filesystems.

For this the syscall interface should have a parameter to specify what
is requested and the stat-like structure should have a field
specifying what is actually present. The latter bitmask must be a
superset of the former.
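
The superset rule can be written down as a one-line mask test. The sketch
below is illustrative only: the function names are hypothetical, and the
masks could be any bit assignment such as the XSTAT_QUERY_* values.

```c
#include <stdbool.h>

/* True if every bit the caller requested is reported as present. */
static bool xstat_request_satisfied(unsigned long long requested,
				    unsigned long long returned)
{
	return (requested & ~returned) == 0;
}

/* The requested bits that the kernel did not report as present. */
static unsigned long long xstat_missing(unsigned long long requested,
					unsigned long long returned)
{
	return requested & ~returned;
}
```

The kernel may return more than was asked for (extra bits in the returned
mask are harmless), but under this rule a nonzero xstat_missing() result
would indicate a violation of the contract.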

Previous discussions centered around reusing the stat data structure
and somehow making it work. But no clean solution was found. If a new
structure is added anyway this could solve the issue.


And while you're at it, some spare fields at the end would be nice.

2010-06-29 22:34:00

by Steve French

Subject: Re: [PATCH 3/3] Add a pair of system calls to make extended file stats available

On Tue, Jun 29, 2010 at 5:13 PM, Ulrich Drepper <[email protected]> wrote:
> On Tue, Jun 29, 2010 at 13:03, David Howells <[email protected]> wrote:
>> Add a pair of system calls to make extended file stats available, including
>> file creation time, inode version and data version where available through the
>> underlying filesystem:
>
> If you add something like this you might want to integrate another
> extension.  This has been discussed a long time ago.  In almost no
> situation all the information is needed.  Some of the pieces of
> information returned by the syscall might be harder to collect than
> other.  It makes sense in such a situation to allow the caller to
> specify what she is interested in.  A bitmask of some sort.  This was
> brought up by the HPC people with gigantic filesystems.
>
> For this the syscall interface should have a parameter to specify what
> is requested and the stat-like structure should have a field
> specifying what is actually present.  The latter bitmask must be a
> superset of the former.
>
> Previous discussions centered around reusing the stat data structure
> and somehow make it work.  But no clean solution was found.  If a new
> structure is added anyway this could solve the issue.

That makes sense, especially for network file systems. The NFSv4
protocol spec anticipates that:

"With the NFS version 4 protocol, the client is able query what attributes
the server supports and construct requests with only those supported
attributes (or a subset thereof)."

and we were talking about something similar for SMB2 Unix Extensions
(POSIX extensions) at the last plugfest (for SMB2 kernel client to Samba)
and testing events.
--
Thanks,

Steve

2010-06-29 22:37:10

by David Howells

Subject: Re: [PATCH 3/3] Add a pair of system calls to make extended file stats available

Ulrich Drepper <[email protected]> wrote:

> On Tue, Jun 29, 2010 at 13:03, David Howells <[email protected]> wrote:
> > Add a pair of system calls to make extended file stats available,
> > including file creation time, inode version and data version where
> > available through the underlying filesystem:
>
> If you add something like this you might want to integrate another
> extension. This has been discussed a long time ago. In almost no
> situation all the information is needed. Some of the pieces of
> information returned by the syscall might be harder to collect than
> other.

Trond mentioned this:

There has been a lot of interest in allowing the user to specify
exactly which fields they want the filesystem to return, and whether
or not the kernel can use cached data or not. The main use is to allow
specification of a 'stat light' that could help speed up
"readdir()+multiple stat()" type queries. At last year's Filesystem
and Storage Workshop, Mark Fasheh actually came up with an initial
design:

http://www.kerneltrap.com/mailarchive/linux-fsdevel/2009/4/7/5427274

It'd be easy enough to absorb the functionality from that patch.

> It makes sense in such a situation to allow the caller to specify what she
> is interested in. A bitmask of some sort.

I have one of those. See the query_flags field. One question, though, is how
to break things down. Obvious groupings of the already extant stat stuff
might be:

- st_dev, st_ino, st_mode, st_nlink, st_uid, st_gid, st_rdev, st_size
- st_blocks, st_blksize
- st_atime, st_mtime, st_ctime

However, what seems obvious to me might not be for some netfs or other.
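
The grouping above could be expressed as one request bit per group. The
sketch below is purely illustrative: the XSTAT_GROUP_* names and values and
both lookup functions are hypothetical and do not appear in the patches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical group bits, one per grouping listed above. */
#define XSTAT_GROUP_BASIC	0x1ULL
#define XSTAT_GROUP_BLOCKS	0x2ULL
#define XSTAT_GROUP_TIMES	0x4ULL

struct field_group {
	const char *field;
	unsigned long long group;
};

/* Field-to-group table following the three groupings in the mail. */
static const struct field_group field_groups[] = {
	{ "st_dev", XSTAT_GROUP_BASIC },   { "st_ino", XSTAT_GROUP_BASIC },
	{ "st_mode", XSTAT_GROUP_BASIC },  { "st_nlink", XSTAT_GROUP_BASIC },
	{ "st_uid", XSTAT_GROUP_BASIC },   { "st_gid", XSTAT_GROUP_BASIC },
	{ "st_rdev", XSTAT_GROUP_BASIC },  { "st_size", XSTAT_GROUP_BASIC },
	{ "st_blocks", XSTAT_GROUP_BLOCKS }, { "st_blksize", XSTAT_GROUP_BLOCKS },
	{ "st_atime", XSTAT_GROUP_TIMES }, { "st_mtime", XSTAT_GROUP_TIMES },
	{ "st_ctime", XSTAT_GROUP_TIMES },
};

/* Return the group bit for a field name, or 0 if the name is unknown. */
static unsigned long long xstat_group_for_field(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(field_groups) / sizeof(field_groups[0]); i++)
		if (strcmp(field_groups[i].field, name) == 0)
			return field_groups[i].group;
	return 0;
}

/* OR together the groups needed to cover a list of field names. */
static unsigned long long xstat_mask_for_fields(const char *const *names,
						size_t count)
{
	unsigned long long mask = 0;
	size_t i;

	for (i = 0; i < count; i++)
		mask |= xstat_group_for_field(names[i]);
	return mask;
}
```

A caller that only needs st_mode and st_mtime would then request the basic
and time groups and leave the block accounting group out, which is where a
network filesystem might save a round trip.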

> For this the syscall interface should have a parameter to specify what
> is requested and the stat-like structure should have a field
> specifying what is actually present. The latter bitmask must be a
> superset of the former.

Got that.

> Previous discussions centered around reusing the stat data structure
> and somehow making it work. But no clean solution was found. If a new
> structure is added anyway this could solve the issue.

That's what I thought. Linux has a tangled mess of stat structs:-/

> And while you're at it, maybe some spare fields at the end are nice.

I made it so that the syscall can return variable length data: the main xstat
struct, plus extra records yet to be defined. They could even be variable
length and assembled/disassembled with something like the control message
macros for recvmsg().

David
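
The control-message comparison above can be made concrete with a minimal sketch in C of walking length-prefixed extra records that would follow the main xstat struct. Everything in it is an assumption made for illustration: struct xrec_hdr, XREC_ALIGN, xrec_at() and xrec_find() are hypothetical names, and the posted patch defines no record layout at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; the posted patch defines nothing like it. */
struct xrec_hdr {
	uint32_t len;	/* length of this record, header included, padding excluded */
	uint32_t type;	/* hypothetical record type identifier */
};

/* Round a length up to 8 bytes, in the spirit of CMSG_ALIGN(). */
#define XREC_ALIGN(n) (((size_t)(n) + 7u) & ~(size_t)7u)

/* Copy out the header at offset off; return 0 if no valid record is there. */
static int xrec_at(const unsigned char *buf, size_t buflen, size_t off,
		   struct xrec_hdr *out)
{
	if (off > buflen || buflen - off < sizeof(*out))
		return 0;
	memcpy(out, buf + off, sizeof(*out));
	if (out->len < sizeof(*out) || out->len > buflen - off)
		return 0;
	return 1;
}

/*
 * Find the first record of the given type. Returns a pointer to its
 * payload and stores the payload length in *paylen, or returns NULL.
 */
static const unsigned char *xrec_find(const unsigned char *buf, size_t buflen,
				      uint32_t type, size_t *paylen)
{
	struct xrec_hdr h;
	size_t off = 0;

	while (xrec_at(buf, buflen, off, &h)) {
		if (h.type == type) {
			*paylen = h.len - sizeof(h);
			return buf + off + sizeof(h);
		}
		off += XREC_ALIGN(h.len);	/* skip payload and padding */
	}
	return NULL;
}
```

A caller would pass the bytes that follow the fixed struct (the extra_results area) and the length the syscall reported for them. Unknown record types are simply skipped, which is what lets old callers ignore new records.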

2010-06-29 22:47:40

by Sage Weil

Subject: Re: [PATCH 3/3] Add a pair of system calls to make extended file stats available

On Tue, 29 Jun 2010, David Howells wrote:
> Ulrich Drepper wrote:
>
> > On Tue, Jun 29, 2010 at 13:03, David Howells wrote:
> > > Add a pair of system calls to make extended file stats available,
> > > including file creation time, inode version and data version where
> > > available through the underlying filesystem:
> >
> > If you add something like this you might want to integrate another
> > extension. This was discussed a long time ago. In almost no
> > situation is all the information needed. Some of the pieces of
> > information returned by the syscall might be harder to collect than
> > others.
>
> Trond mentioned this:
>
> There has been a lot of interest in allowing the user to specify
> exactly which fields they want the filesystem to return, and whether
> or not the kernel can use cached data. The main use is to allow
> specification of a 'stat light' that could help speed up
> "readdir()+multiple stat()" type queries. At last year's Filesystem
> and Storage Workshop, Mark Fasheh actually came up with an initial
> design:
>
> http://www.kerneltrap.com/mailarchive/linux-fsdevel/2009/4/7/5427274
>
> It'd be easy enough to absorb the functionality from that patch.

That would be nice. HPC folks have been looking for this functionality
for some time now.

> > It makes sense in such a situation to allow the caller to specify what she
> > is interested in. A bitmask of some sort.
>
> I have one of those. See the query_flags field. One question, though, is how
> to break things down. Obvious groupings of the already extant stat stuff
> might be:
>
> - st_dev, st_ino, st_mode, st_nlink, st_uid, st_gid, st_rdev, st_size
> - st_blocks, st_blksize
> - st_atime, st_mtime, st_ctime
>
> However, what seems obvious to me might not be for some netfs or other.

The problem is that groupings that may seem logical now may not match
reality for some specific file system for various implementation reasons.
IMO a bit per field makes the most sense, with some simple way to include
all fields (-1 or 0). A mask argument that is separate from flags might
make that simpler?

sage
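
The bit-per-field suggestion, combined with the superset rule quoted earlier in the thread, might look like the following C sketch. The XREQ_* names and bit positions are hypothetical and appear nowhere in the posted patch; the sketch picks -1 rather than 0 as the "all fields" value, which is only one of the two options mentioned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-bit-per-field request flags; not from the posted patch. */
#define XREQ_MODE	(UINT32_C(1) << 0)
#define XREQ_NLINK	(UINT32_C(1) << 1)
#define XREQ_UID	(UINT32_C(1) << 2)
#define XREQ_GID	(UINT32_C(1) << 3)
#define XREQ_SIZE	(UINT32_C(1) << 4)
#define XREQ_BLOCKS	(UINT32_C(1) << 5)
#define XREQ_ATIME	(UINT32_C(1) << 6)
#define XREQ_MTIME	(UINT32_C(1) << 7)
#define XREQ_CTIME	(UINT32_C(1) << 8)

/* Every bit this sketch defines. */
#define XREQ_KNOWN	((UINT32_C(1) << 9) - 1)

/* "Give me everything": the -1 convention. */
#define XREQ_ALL	(~UINT32_C(0))

/* Drop undefined bits, so that XREQ_ALL means "all known fields". */
static uint32_t xreq_normalize(uint32_t request)
{
	return request & XREQ_KNOWN;
}

/* Superset rule: every requested (known) bit must be set in the result. */
static int xreq_satisfied(uint32_t request, uint32_t result)
{
	return (xreq_normalize(request) & ~result) == 0;
}

/* Example: a caller that only needs ownership information. */
static void xreq_demo(void)
{
	uint32_t want = XREQ_UID | XREQ_GID;
	uint32_t got = want | XREQ_MODE;	/* one extra field came back */

	assert(xreq_satisfied(want, got));	/* extras are allowed */
	assert(!xreq_satisfied(XREQ_SIZE, got));	/* size is missing */
}
```

Keeping the mask separate from the AT_* flags, as suggested, means the "all fields" value cannot collide with a lookup flag.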

2010-06-29 22:51:17

by Joel Becker

Subject: Re: [PATCH 3/3] Add a pair of system calls to make extended file stats available

On Tue, Jun 29, 2010 at 11:36:56PM +0100, David Howells wrote:
> Ulrich Drepper wrote:
> > And while you're at it, maybe some spare fields at the end are nice.
>
> I made it so that the syscall can return variable length data: the main xstat
> struct, plus extra records yet to be defined. They could even be variable
> length and assembled/disassembled with something like the control message
> macros for recvmsg().

The less variable length stuff the better, I think. At least,
for the stuff stat(2) already returns, you should have a fixed-size
structure. Even if I only pass the GIVE_ME_UIDS flag, I don't want to
have to deal with the variable size stuff until I've actually asked for
esoteric things. I'll know that the non-UIDS fields are garbage by the
fact that I didn't ask for them.

Joel

--

"Time is an illusion, lunchtime doubly so."
-Douglas Adams

Joel Becker
Consulting Software Developer
Oracle
Phone: (650) 506-8127

2010-06-29 23:00:00

by Maciej W. Rozycki

Subject: Re: [PATCH 0/3] Extended file stat functions

On Tue, 29 Jun 2010, David Howells wrote:

> > Ah, great, so already there :) I was looking for st_gen, as it is called
> > that way on BSD. And as BSD already has it for a long time, shouldn't linux
> > use the BSD identifier?
>
> Sure. I guess you'd also want it to be a u64?

Note the Alpha port has had an st_gen member reserved in its struct stat
for many years now ;) -- which could have been DEC OSF/1 legacy. I'm glad
to see this member seriously considered after these many years and
previously rejected proposals.

Maciej

2010-06-29 23:30:11

by David Howells

Subject: Re: [PATCH 3/3] Add a pair of system calls to make extended file stats available

Joel Becker wrote:

> The less variable length stuff the better, I think. At least,
> for the stuff stat(2) already returns, you should have a fixed-size
> structure. Even if I only pass the GIVE_ME_UIDS flag, I don't want to
> have to deal with the variable size stuff until I've actually asked for
> esoteric things. I'll know that the non-UIDS fields are garbage by the
> fact that I didn't ask for them.

I was thinking of the fixed length xstat struct plus appendable extensions to
be defined later.

I could live with each defined extension being of a fixed length, so for
example, you set bit 20, and it adds, say, a 16-byte volume ID in the
appropriate order, padded out appropriately for the filesystem.

David
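
One way to read the fixed-length extension idea is that each set bit appends a block of known size after the main struct, so a caller can compute where a given extension starts from the returned mask alone. The C sketch below assumes ascending bit order for "the appropriate order". Only bit 20 and its 16-byte size come from the example above; the other sizes, XEXT_FIRST_BIT, XEXT_NBITS and xext_offset() are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XEXT_FIRST_BIT	20	/* the example bit: a 16-byte volume ID */
#define XEXT_NBITS	4	/* hypothetical number of extension bits */
#define XEXT_ABSENT	((size_t)-1)

/*
 * Fixed size of each extension, indexed by (bit - XEXT_FIRST_BIT).
 * Only the first entry reflects the example; the rest are made up.
 */
static const size_t xext_size[XEXT_NBITS] = { 16, 8, 32, 8 };

/*
 * Return the offset of extension `bit`, measured from the end of the
 * fixed struct, or XEXT_ABSENT if that extension was not returned.
 */
static size_t xext_offset(uint64_t mask, unsigned int bit)
{
	size_t off = 0;
	unsigned int b;

	if (bit < XEXT_FIRST_BIT || bit >= XEXT_FIRST_BIT + XEXT_NBITS)
		return XEXT_ABSENT;
	if (!(mask & (UINT64_C(1) << bit)))
		return XEXT_ABSENT;

	/* Sum the sizes of all lower-numbered extensions that are present. */
	for (b = XEXT_FIRST_BIT; b < bit; b++)
		if (mask & (UINT64_C(1) << b))
			off += xext_size[b - XEXT_FIRST_BIT];
	return off;
}
```

With fixed sizes there is no per-record header to parse, which is the property asked for earlier in the thread: a caller that sets no extension bits never touches anything beyond the fixed struct.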

2010-06-30 08:21:14

by Arnd Bergmann

Subject: Re: [PATCH 3/3] Add a pair of system calls to make extended file stats available

On Tuesday 29 June 2010 22:03:15 David Howells wrote:
> ssize_t ret = xstat(int dfd,
> const char *filename,
> unsigned atflag,
> struct xstat *buffer,
> size_t buflen);
>
> ssize_t ret = fxstat(int fd,
> struct xstat *buffer,
> size_t buflen);
>
>
> The dfd, filename, atflag and fd parameters indicate the file to query. There
> is no equivalent of lstat() as that can be emulated with xstat(), passing 0
> instead of AT_SYMLINK_NOFOLLOW as atflag.

Do we actually need the fxstat variant? IIRC, some *at syscalls just
operate on dfd when filename==NULL, which would be trivial to do here.

Arnd
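
For comparison with the atflag description quoted above, the existing fstatat(2) call follows a trailing symlink when its flag argument is 0 (like stat()) and reports on the link itself when AT_SYMLINK_NOFOLLOW is passed (like lstat()). The C sketch below checks that behaviour for fstatat() only; whether xstat() as posted uses the same convention is not established here. The helper names sees_symlink(), make_fixture() and fstatat_demo() and the /tmp path are assumptions of this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Report whether `name`, looked up relative to dfd with the given
 * atflag, is seen as a symlink: 1 if so, 0 if not, -1 on error.
 */
static int sees_symlink(int dfd, const char *name, int atflag)
{
	struct stat st;

	if (fstatat(dfd, name, &st, atflag) < 0)
		return -1;
	return S_ISLNK(st.st_mode) ? 1 : 0;
}

/*
 * Create a scratch directory holding a regular file "target" and a
 * symlink "link" pointing at it. Returns a directory fd, or -1 on
 * error. dir_template must end in "XXXXXX", as mkdtemp(3) requires.
 */
static int make_fixture(char *dir_template)
{
	int dfd, fd;

	if (mkdtemp(dir_template) == NULL)
		return -1;
	dfd = open(dir_template, O_RDONLY | O_DIRECTORY);
	if (dfd < 0)
		return -1;
	fd = openat(dfd, "target", O_CREAT | O_WRONLY, 0600);
	if (fd < 0 || close(fd) < 0 || symlinkat("target", dfd, "link") < 0) {
		close(dfd);
		return -1;
	}
	return dfd;
}

/* Look the same symlink up both ways, then remove the scratch files. */
static void fstatat_demo(void)
{
	char dir[] = "/tmp/xstat-demo-XXXXXX";
	int dfd = make_fixture(dir);

	assert(dfd >= 0);
	assert(sees_symlink(dfd, "link", 0) == 0);	/* followed, like stat() */
	assert(sees_symlink(dfd, "link", AT_SYMLINK_NOFOLLOW) == 1);	/* like lstat() */
	unlinkat(dfd, "link", 0);
	unlinkat(dfd, "target", 0);
	close(dfd);
	rmdir(dir);
}
```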

2010-06-30 08:59:56

by David Howells

Subject: Re: [PATCH 3/3] Add a pair of system calls to make extended file stats available

Arnd Bergmann wrote:

> Do we actually need the fxstat variant? IIRC, some *at syscalls just
> operate on dfd when filename==NULL, which would be trivial to do here.

user_path_at() doesn't seem to work like that, so fstatat() doesn't. It's a
possibility though.

David

2010-07-01 01:12:53

by Joel Becker

Subject: Re: [PATCH 3/3] Add a pair of system calls to make extended file stats available

On Wed, Jun 30, 2010 at 12:29:52AM +0100, David Howells wrote:
> Joel Becker wrote:
>
> > The less variable length stuff the better, I think. At least,
> > for the stuff stat(2) already returns, you should have a fixed-size
> > structure. Even if I only pass the GIVE_ME_UIDS flag, I don't want to
> > have to deal with the variable size stuff until I've actually asked for
> > esoteric things. I'll know that the non-UIDS fields are garbage by the
> > fact that I didn't ask for them.
>
> I was thinking of the fixed length xstat struct plus appendable extensions to
> be defined later.

I meant this.

Joel

--

Life's Little Instruction Book #267

"Lie on your back and look at the stars."

Joel Becker
Consulting Software Developer
Oracle
Phone: (650) 506-8127