2002-01-03 12:20:51

by Urban Widmark

Subject: [PATCH] smbfs fsx'ed


Hello

Here are some updates for smbfs:
+ Drops kmap/kunmap from prepare_write/commit_write
+ Fixes two problems spotted by fsx-linux, leaving one unfixed
- Shared mmaps must be written back before closing, as smb does not allow
writes after the file is closed. Flushing dirty pages before closing
makes smbfs do what the application expects.
- On truncate, dirty pages need to be written out first because of the
order of smb_proc_trunc and vmtruncate. I believe this is the same
issue NFS had.
- I can still see one problem, but only with Windows servers. When
creating a hole in a file, the hole does not always contain all zeroes.
As far as I can tell the truncate command is sent to the server, and
then the page fsx errors on is re-read.
A server issue? I don't know, but I can't trigger it with a Samba server.
+ René Scharfe has added options display for /proc/mounts.

Patch vs 2.5.2-pre5, but should work for 2.5.2-pre6 and 2.5.1-dj11.
Please apply.

/Urban


diff -urN -X exclude linux-2.5.2-pre5-orig/fs/smbfs/ChangeLog linux-2.5.2-pre5-smbfs/fs/smbfs/ChangeLog
--- linux-2.5.2-pre5-orig/fs/smbfs/ChangeLog Thu Nov 8 19:01:00 2001
+++ linux-2.5.2-pre5-smbfs/fs/smbfs/ChangeLog Thu Jan 3 11:33:52 2002
@@ -1,5 +1,15 @@
ChangeLog for smbfs.

+2001-12-31 René Scharfe <[email protected]>
+
+ * inode.c: added smb_show_options to show mount options in /proc/mounts
+ * inode.c, getopt.c, getopt.h: merged flag and has_arg in struct option
+ * inode.c: use S_IRWXUGO where appropriate
+
+2001-12-22 Urban Widmark <[email protected]>
+
+ * file.c, proc.c: Fix problems triggered by the "fsx test"
+
2001-09-17 Urban Widmark <[email protected]>

* proc.c: Use 4096 (was 512) as the blocksize for better write
diff -urN -X exclude linux-2.5.2-pre5-orig/fs/smbfs/file.c linux-2.5.2-pre5-smbfs/fs/smbfs/file.c
--- linux-2.5.2-pre5-orig/fs/smbfs/file.c Thu Nov 8 19:01:00 2001
+++ linux-2.5.2-pre5-smbfs/fs/smbfs/file.c Wed Jan 2 21:11:40 2002
@@ -270,7 +270,6 @@
static int smb_prepare_write(struct file *file, struct page *page,
unsigned offset, unsigned to)
{
- kmap(page);
return 0;
}

@@ -283,7 +282,6 @@
lock_kernel();
status = smb_updatepage(file, page, offset, to-offset);
unlock_kernel();
- kunmap(page);
return status;
}

@@ -349,8 +347,14 @@
smb_file_release(struct inode *inode, struct file * file)
{
lock_kernel();
- if (!--inode->u.smbfs_i.openers)
+ if (!--inode->u.smbfs_i.openers) {
+ /* We must flush any dirty pages now as we won't be able to
+ write anything after close. mmap can trigger this.
+ "openers" should perhaps include mmap'ers ... */
+ filemap_fdatasync(inode->i_mapping);
+ filemap_fdatawait(inode->i_mapping);
smb_close(inode);
+ }
unlock_kernel();
return 0;
}
diff -urN -X exclude linux-2.5.2-pre5-orig/fs/smbfs/getopt.c linux-2.5.2-pre5-smbfs/fs/smbfs/getopt.c
--- linux-2.5.2-pre5-orig/fs/smbfs/getopt.c Sun Aug 19 12:08:09 2001
+++ linux-2.5.2-pre5-smbfs/fs/smbfs/getopt.c Thu Jan 3 11:32:54 2002
@@ -46,7 +46,7 @@

for (i = 0; opts[i].name != NULL; i++) {
if (!strcmp(opts[i].name, token)) {
- if (opts[i].has_arg && (!val || !*val)) {
+ if (!opts[i].flag && (!val || !*val)) {
printk("%s: the %s option requires an argument\n",
caller, token);
return -1;
diff -urN -X exclude linux-2.5.2-pre5-orig/fs/smbfs/getopt.h linux-2.5.2-pre5-smbfs/fs/smbfs/getopt.h
--- linux-2.5.2-pre5-orig/fs/smbfs/getopt.h Mon Aug 14 22:31:10 2000
+++ linux-2.5.2-pre5-smbfs/fs/smbfs/getopt.h Thu Jan 3 11:32:54 2002
@@ -3,7 +3,6 @@

struct option {
const char *name;
- int has_arg;
unsigned long flag;
int val;
};
diff -urN -X exclude linux-2.5.2-pre5-orig/fs/smbfs/inode.c linux-2.5.2-pre5-smbfs/fs/smbfs/inode.c
--- linux-2.5.2-pre5-orig/fs/smbfs/inode.c Thu Nov 8 19:01:00 2001
+++ linux-2.5.2-pre5-smbfs/fs/smbfs/inode.c Thu Jan 3 11:41:05 2002
@@ -22,6 +22,7 @@
#include <linux/dcache.h>
#include <linux/smp_lock.h>
#include <linux/nls.h>
+#include <linux/seq_file.h>

#include <linux/smb_fs.h>
#include <linux/smbno.h>
@@ -41,9 +42,12 @@
#define SMB_NLS_REMOTE ""
#endif

+#define SMB_TTL_DEFAULT 1000
+
static void smb_delete_inode(struct inode *);
static void smb_put_super(struct super_block *);
static int smb_statfs(struct super_block *, struct statfs *);
+static int smb_show_options(struct seq_file *, struct vfsmount *);

static struct super_operations smb_sops =
{
@@ -51,6 +55,7 @@
delete_inode: smb_delete_inode,
put_super: smb_put_super,
statfs: smb_statfs,
+ show_options: smb_show_options,
};


@@ -259,21 +264,20 @@
clear_inode(ino);
}

-/* FIXME: flags and has_arg could probably be merged. */
static struct option opts[] = {
- { "version", 1, 0, 'v' },
- { "win95", 0, SMB_MOUNT_WIN95, 1 },
- { "oldattr", 0, SMB_MOUNT_OLDATTR, 1 },
- { "dirattr", 0, SMB_MOUNT_DIRATTR, 1 },
- { "case", 0, SMB_MOUNT_CASE, 1 },
- { "uid", 1, 0, 'u' },
- { "gid", 1, 0, 'g' },
- { "file_mode", 1, 0, 'f' },
- { "dir_mode", 1, 0, 'd' },
- { "iocharset", 1, 0, 'i' },
- { "codepage", 1, 0, 'c' },
- { "ttl", 1, 0, 't' },
- { NULL, 0, 0, 0}
+ { "version", 0, 'v' },
+ { "win95", SMB_MOUNT_WIN95, 1 },
+ { "oldattr", SMB_MOUNT_OLDATTR, 1 },
+ { "dirattr", SMB_MOUNT_DIRATTR, 1 },
+ { "case", SMB_MOUNT_CASE, 1 },
+ { "uid", 0, 'u' },
+ { "gid", 0, 'g' },
+ { "file_mode", 0, 'f' },
+ { "dir_mode", 0, 'd' },
+ { "iocharset", 0, 'i' },
+ { "codepage", 0, 'c' },
+ { "ttl", 0, 't' },
+ { NULL, 0, 0}
};

static int
@@ -310,12 +314,10 @@
mnt->gid = value;
break;
case 'f':
- mnt->file_mode = value & (S_IRWXU | S_IRWXG | S_IRWXO);
- mnt->file_mode |= S_IFREG;
+ mnt->file_mode = (value & S_IRWXUGO) | S_IFREG;
break;
case 'd':
- mnt->dir_mode = value & (S_IRWXU | S_IRWXG | S_IRWXO);
- mnt->dir_mode |= S_IFDIR;
+ mnt->dir_mode = (value & S_IRWXUGO) | S_IFDIR;
break;
case 'i':
strncpy(mnt->codepage.local_name, optarg,
@@ -338,6 +340,45 @@
return c;
}

+/*
+ * smb_show_options() is for displaying mount options in /proc/mounts.
+ * It tries to avoid showing settings that were not changed from their
+ * defaults.
+ */
+static int
+smb_show_options(struct seq_file *s, struct vfsmount *m)
+{
+ struct smb_mount_data_kernel *mnt = m->mnt_sb->u.smbfs_sb.mnt;
+ int i;
+
+ for (i = 0; opts[i].name != NULL; i++)
+ if (mnt->flags & opts[i].flag)
+ seq_printf(s, ",%s", opts[i].name);
+
+ if (mnt->uid != 0)
+ seq_printf(s, ",uid=%d", mnt->uid);
+ if (mnt->gid != 0)
+ seq_printf(s, ",gid=%d", mnt->gid);
+ if (mnt->mounted_uid != 0)
+ seq_printf(s, ",mounted_uid=%d", mnt->mounted_uid);
+
+ /*
+ * Defaults for file_mode and dir_mode are unknown to us; they
+ * depend on the current umask of the user doing the mount.
+ */
+ seq_printf(s, ",file_mode=%04o", mnt->file_mode & S_IRWXUGO);
+ seq_printf(s, ",dir_mode=%04o", mnt->dir_mode & S_IRWXUGO);
+
+ if (strcmp(mnt->codepage.local_name, CONFIG_NLS_DEFAULT))
+ seq_printf(s, ",iocharset=%s", mnt->codepage.local_name);
+ if (strcmp(mnt->codepage.remote_name, SMB_NLS_REMOTE))
+ seq_printf(s, ",codepage=%s", mnt->codepage.remote_name);
+
+ if (mnt->ttl != SMB_TTL_DEFAULT)
+ seq_printf(s, ",ttl=%d", mnt->ttl);
+
+ return 0;
+}

static void
smb_put_super(struct super_block *sb)
@@ -425,7 +466,7 @@
strncpy(mnt->codepage.remote_name, SMB_NLS_REMOTE,
SMB_NLS_MAXNAMELEN);

- mnt->ttl = 1000;
+ mnt->ttl = SMB_TTL_DEFAULT;
if (ver == SMB_MOUNT_OLDVERSION) {
mnt->version = oldmnt->version;

@@ -434,12 +475,8 @@
mnt->uid = oldmnt->uid;
mnt->gid = oldmnt->gid;

- mnt->file_mode =
- oldmnt->file_mode & (S_IRWXU | S_IRWXG | S_IRWXO);
- mnt->dir_mode =
- oldmnt->dir_mode & (S_IRWXU | S_IRWXG | S_IRWXO);
- mnt->file_mode |= S_IFREG;
- mnt->dir_mode |= S_IFDIR;
+ mnt->file_mode = (oldmnt->file_mode & S_IRWXUGO) | S_IFREG;
+ mnt->dir_mode = (oldmnt->dir_mode & S_IRWXUGO) | S_IFDIR;

mnt->flags = (oldmnt->file_mode >> 9);
} else {
@@ -510,7 +547,7 @@
{
struct inode *inode = dentry->d_inode;
struct smb_sb_info *server = server_from_dentry(dentry);
- unsigned int mask = (S_IFREG | S_IFDIR | S_IRWXU | S_IRWXG | S_IRWXO);
+ unsigned int mask = (S_IFREG | S_IFDIR | S_IRWXUGO);
int error, changed, refresh = 0;
struct smb_fattr fattr;

@@ -535,6 +572,10 @@
VERBOSE("changing %s/%s, old size=%ld, new size=%ld\n",
DENTRY_PATH(dentry),
(long) inode->i_size, (long) attr->ia_size);
+
+ filemap_fdatasync(inode->i_mapping);
+ filemap_fdatawait(inode->i_mapping);
+
error = smb_open(dentry, O_WRONLY);
if (error)
goto out;


2002-01-03 21:38:35

by Linus Torvalds

Subject: Re: [PATCH] smbfs fsx'ed


On Thu, 3 Jan 2002, Urban Widmark wrote:
>
> Patch vs 2.5.2-pre5, but should work for 2.5.2-pre6 and 2.5.1-dj11.
> Please apply.

Applied.

Btw, Urban, is anybody working on trying to do "{read|write}page()"
asynchronously? I assume that IO performance on smbfs must be quite
horrible with totally synchronous IO..

(Not as horrible as the NCPFS thing that doesn't understand about the page
cache at all, but still..)

Linus

2002-01-03 22:48:12

by Dan Kegel

Subject: Re: [PATCH] smbfs fsx'ed

Linus Torvalds <[email protected]> wrote:
> Btw, Urban, is anybody working on trying to do "{read|write}page()"
> asynchronously? I assume that IO performance on smbfs must be quite
> horrible with totally synchronous IO..

I use smbfs to mount a visual sourcesafe database,
and run ss via wine. The combination is very slow.
Don't know how much of it is wine, and how much is smbfs,
but any speedup would be greatly appreciated.

(Eventually, wine will bundle its own smb code, but for
now if you want to access network shares, smbfs is the only way.)
- Dan

p.s. see http://www.kegel.com/linux/vss-howto.html

2002-01-03 22:50:02

by Urban Widmark

Subject: Re: [PATCH] smbfs fsx'ed

On Thu, 3 Jan 2002, Linus Torvalds wrote:

> Btw, Urban, is anybody working on trying to do "{read|write}page()"
> asynchronously? I assume that IO performance on smbfs must be quite
> horrible with totally synchronous IO..

Yes. My current list of work:
! Large File Support
! Unicode Support
+ Async requests
+ Fcntl locking + smb "oplock" support
- Async readpage/writepage (and readahead)
- Merge read or write requests for one file into a larger request.
I am guessing that is how NFS does it.
+ smbconnect instead of smbmount. smbfs calls out to a userspace program to
set up the connection (a la modprobe) and is mounted by mount directly,
like any other fs.
- Review of smbfs BKL use. I believe a lot of it is unnecessary and just
an effect of the VFS-level changes.

! = ready, waiting for the previous patch to be accepted & released
+ = code is almost working (90% done, if that joke is familiar :)
- = not actual patches yet ... more like ideas, plus I know where to
find some code to borrow from :)


The current code synchronizes all threads on the same mount since all
threads use "server->packet" as a buffer for send and receive. I have some
code where I have tried to copy how I believe nfs does things with a
"struct request" for each caller.

The requests are filled in and put on a per-server transmit queue and the
caller goes to sleep or returns. There is a smbiod kernel thread that does
the actual send and receive and wakes up the callers. Currently just one
smbiod as that is a little bit easier than having, say, one per processor.
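
Roughly, it looks like this (just a sketch; the struct and the names
below are made up for illustration, not the actual code):

struct smb_request {
	struct list_head rq_queue;	/* per-server transmit queue */
	wait_queue_head_t rq_wait;	/* the caller sleeps here */
	unsigned long rq_flags;		/* e.g. SMB_REQ_RECEIVED */
	unsigned char *rq_buffer;	/* private buffer, replacing the
					   shared server->packet */
	int rq_len;
};

/* Caller side: queue the request, kick smbiod, sleep until the reply
   has been received. smbiod does the socket I/O and sets the flag. */
static int smb_add_request(struct smb_sb_info *server,
			   struct smb_request *req)
{
	list_add_tail(&req->rq_queue, &server->xmitq);
	wake_up(&server->smbiod_wait);
	wait_event(req->rq_wait, req->rq_flags & SMB_REQ_RECEIVED);
	return req->rq_len;
}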

I'm currently looking at a race where it hangs if I run 4 fsx-linux
processes at once. But otherwise it seems to work.

This should be a big win when more than one process at a time is doing
something on a smbfs mount. But I haven't benchmarked it yet.


Oplocks are interesting: they can cause the server to send a request
(a break) to the client, and the current smbfs sock.c code doesn't really
expect that. This is the original reason for smbiod and the async request
handling.

With async request handling in place, making readpage/writepage async
shouldn't be too horrible. NFS has some kind of callbacks ... how hard
can it be :)

/Urban

2002-01-03 22:57:12

by Petr Vandrovec

Subject: Re: [PATCH] smbfs fsx'ed

On 3 Jan 02 at 13:37, Linus Torvalds wrote:
>
> (Not as horrible as the NCPFS thing that doesn't understand about the page
> cache at all, but still..)

Unfortunately it is not easy for me to add pagecache support
to ncpfs, as a couple of ncpfs users use ncpfs in a shared environment
with database record locking, and if I now read a full 4KB instead of
a 128B record, it can clash with records locked by other clients.

I can for sure add `leases' like the Novell Client for Windows does
to allow file caching, but I'm not sure whether the amount of code
needed to support this (and to support server-driven cache
flushes) is worth the effort.
Best regards,
Petr Vandrovec
[email protected]

P.S.: And as the NCP protocol is totally synchronous (even if it uses
TCP, I explicitly asked in Utah), only local file caching can increase
ncpfs performance, as there is no such thing as asynchronous file
read/write...

2002-01-03 23:03:23

by Urban Widmark

Subject: Re: [PATCH] smbfs fsx'ed

On Thu, 3 Jan 2002, Dan Kegel wrote:

> I use smbfs to mount a visual sourcesafe database,
> and run ss via wine. The combination is very slow.
> Don't know how much of it is wine, and how much is smbfs,
> but any speedup would be greatly appreciated.

SS is really painful on a high-latency connection; CVS is a wonder of
efficiency in comparison. At least that is my experience. But that is
probably not your setup.
(and yes, I understood that you were comparing with Windows performance)

You may want to know that smbfs does not do any file locking. I don't know
if SS depends on that or not. I do know that some people have tried
running DOS-based database programs in dosemu, accessing a database over
smbfs, with database corruption as a result.


> (Eventually, wine will bundle its own smb code, but for
> now if you want to access network shares, smbfs is the only way.)
> - Dan
>
> p.s. see http://www.kegel.com/linux/vss-howto.html

Very nice. The commandline SS client would never work for me under wine.
I'll have to try that again.

/Urban

2002-01-03 23:50:00

by Urban Widmark

Subject: Re: [PATCH] smbfs fsx'ed

On Thu, 3 Jan 2002, Petr Vandrovec wrote:

> On 3 Jan 02 at 13:37, Linus Torvalds wrote:
> >
> > (Not as horrible as the NCPFS thing that doesn't understand about the page
> > cache at all, but still..)
>
> Unfortunately it is not easy for me to add pagecache support
> to ncpfs, as a couple of ncpfs users use ncpfs in a shared environment
> with database record locking, and if I now read a full 4KB instead of
> a 128B record, it can clash with records locked by other clients.

Do the locks prevent you from even looking? You could read only the
parts requested if the file has locks and fill the rest with 0, only using
the page cache if there are no locks. Not too pretty, but ...

I was thinking about writes when you wrote that.

A write of 128 bytes to a file causes a commit_write of 128 bytes, if I am
reading generic_file_write correctly. So that should not cause it to write
the full page, and that would be ok for the locking case.

You only read the 128 bytes the user has locked and requested, and then
you only write those bytes. Userspace won't care about the parts of the
page that are 0, and since the file is being shared you will have to
re-read the data on the next syscall anyway.
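
As a sketch (hypothetical: ncpfs has no commit_write today, and ncp_write
below just stands in for whatever synchronous write helper it would use),
only the dirtied bytes would go on the wire:

static int ncp_commit_write(struct file *file, struct page *page,
			    unsigned offset, unsigned to)
{
	/* File position of the first dirtied byte in this page. */
	loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + offset;
	unsigned int count = to - offset;	/* e.g. 128, not 4096 */
	char *kaddr = kmap(page);
	int result;

	result = ncp_write(file, kaddr + offset, count, &pos);
	kunmap(page);
	return result < 0 ? result : 0;
}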


> I can for sure add `leases' like the Novell Client for Windows does
> to allow file caching, but I'm not sure whether the amount of code
> needed to support this (and to support server-driven cache
> flushes) is worth the effort.

smbfs needs these for cooperating clients to work. It can only cache data
if it has a lease. If someone else is also accessing the file then each
smb_file_read must re-read the page.

Shared mmaps and SMB oplocks don't seem to mix, however. It is hard to
know when to invalidate a page ... what is caching and what is the "read"?


> P.S.: And as the NCP protocol is totally synchronous (even if it uses
> TCP, I explicitly asked in Utah), only local file caching can increase
> ncpfs performance, as there is no such thing as asynchronous file
> read/write...

SMB has no async read/write, but all requests are marked with an ID and a
client is allowed to have a certain number of simultaneous requests in
transit.

Even without multiple requests you could let ncpfs accept one read
request, send that to the server and return without waiting for the reply.
The readahead code may then queue up the next request for ncpfs, and ncpfs
could process that while the previously read page is returned to the user.

(Warning: I don't know if that is how it actually would work ...
I am looking at "readpage:" of do_generic_file_read, mm/filemap.c)
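
In sketch form (made-up names again; the hard part would be completing
the request safely from the ncpfs receive path):

/* readpage queues the request and returns without waiting. */
static int ncp_readpage_async(struct file *file, struct page *page)
{
	struct ncp_request *req = ncp_alloc_request(file, page);

	if (!req) {
		UnlockPage(page);
		return -ENOMEM;
	}
	ncp_queue_request(req);	/* sent by an ncpfs i/o thread */
	return 0;
}

/* Called when the reply arrives. */
static void ncp_read_done(struct ncp_request *req)
{
	if (req->error)
		SetPageError(req->page);
	else
		SetPageUptodate(req->page);
	UnlockPage(req->page);	/* wakes anyone in wait_on_page() */
	ncp_free_request(req);
}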


I am not saying that it would be worth the effort.

/Urban

2002-01-04 11:44:20

by Trond Myklebust

Subject: Re: [PATCH] smbfs fsx'ed

>>>>> " " == Urban Widmark <[email protected]> writes:

> The current code synchronizes all threads on the same mount
> since all threads use "server->packet" as a buffer for send and
> receive. I have some code where I have tried to copy how I
> believe nfs does things with a "struct request" for each
> caller.

All NFS does is to wrap the pages to read/write with a struct
'nfs_page' that allows us to string them together in a list. When
somebody calls sync_page() or decides to flush out the pending writes,
we collate these 'nfs_page' things into appropriately sized private
lists (NFS has a server-provided upper limit on the number of bytes
you can send) and generate an RPC call.
In addition, there is code to limit the total number of pending
nfs_page structs (in order to avoid flooding memory with cached
requests) and to manage request timeouts.

See the files include/linux/nfs_page.h, and fs/nfs/pagelist.c for details.



The struct nfs_page does contain one or two entries which are
NFS-specific (the RPC credential and commit cookie), but if you ignore
them, the rest of the machinery should be fairly easily adaptable for
reuse in the SMB code. One would perhaps have to rip out the
NFS_SERVER() stuff in pagelist.c (which is used to maintain a couple
of mount-global lists), and replace it with a slightly more generic
interface, but that's all trivial stuff.
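
Trimmed down, the wrapper looks something like this (simplified from
include/linux/nfs_page.h; several fields are left out):

struct nfs_page {
	struct list_head	wb_list;	/* read/write/commit list */
	struct file		*wb_file;
	struct inode		*wb_inode;
	struct rpc_cred		*wb_cred;	/* NFS-specific */
	struct page		*wb_page;	/* page to read/write */
	wait_queue_head_t	wb_wait;
	unsigned long		wb_timeout;	/* when to flush */
	unsigned int		wb_offset,	/* offset within the page */
				wb_bytes,	/* length of request */
				wb_count;	/* reference count */
};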

Cheers,
Trond

2002-01-04 12:24:04

by Petr Vandrovec

Subject: Re: [PATCH] smbfs fsx'ed

On 4 Jan 02 at 0:48, Urban Widmark wrote:
> On Thu, 3 Jan 2002, Petr Vandrovec wrote:
>
> > On 3 Jan 02 at 13:37, Linus Torvalds wrote:
> > >
> > > (Not as horrible as the NCPFS thing that doesn't understand about the page
> > > cache at all, but still..)
> >
> > Unfortunately it is not easy for me to add pagecache support
> > to ncpfs, as a couple of ncpfs users use ncpfs in a shared environment
> > with database record locking, and if I now read a full 4KB instead of
> > a 128B record, it can clash with records locked by other clients.
>
> Do the locks prevent you from even looking? You could read only the
> parts requested if the file has locks and fill the rest with 0, only using
> the page cache if there are no locks. Not too pretty, but ...

It results in a short read when there is a partial overlap, or an error
if the overlap is complete.

> A write of 128 bytes to a file causes a commit_write of 128 bytes, if I am
> reading generic_file_write correctly. So that should not cause it to write
> the full page, and that would be ok for the locking case.

OK. Thanks.

> > I can for sure add `leases' like the Novell Client for Windows does
> > to allow file caching, but I'm not sure whether the amount of code
> > needed to support this (and to support server-driven cache
> > flushes) is worth the effort.
>
> smbfs needs these for cooperating clients to work. It can only cache data
> if it has a lease. If someone else is also accessing the file then each
> smb_file_read must re-read the page.

It is the same for NetWare, with the minor difference that NW4 refuses
`leases' by default, and on NW5 it should be disabled by default, as
otherwise the server can go mad when certain client errors happen.

> > P.S.: And as the NCP protocol is totally synchronous (even if it uses
> > TCP, I explicitly asked in Utah), only local file caching can increase
> > ncpfs performance, as there is no such thing as asynchronous file
> > read/write...
>
> SMB has no async read/write, but all requests are marked with an ID and a
> client is allowed to have a certain number of simultaneous requests in
> transit.

That's the difference. With NCP you can have only one request in flight.
For sure on IPX and UDP, and according to Novell even on TCP.

> Even without multiple requests you could let ncpfs accept one read
> request, send that to the server and return without waiting for the reply.

On IPX the maximum compatible block size is 1024 bytes... so reading one
page requires 4 exchanges. On UDP/TCP it is easier, but unfortunately the
majority of installed servers is still IPX-only (at least that is the
feedback I get from users).

> The readahead code may then queue up the next request for ncpfs, and ncpfs
> could process that while the previously read page is returned to the user.

It would require either another kernel thread doing readahead, or I would
have to submit the new request from the IPX/UDP data_ready callbacks (for
TCP it needs another thread, no question, as the data_ready callback
documentation says that I cannot read data or send new data directly from
it... or maybe a special ncpfs bottom half).
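
For example (a sketch with invented names; as I read it, the data_ready
callback runs in softirq context and must not block or do the socket I/O
itself, so it can only wake a helper thread):

/* Installed as sock->sk->data_ready. */
static void ncp_data_ready(struct sock *sk, int bytes)
{
	struct ncp_server *server = sk->user_data;

	/* No recvmsg() here -- just wake the i/o thread, which does
	   the actual receive and completes the pending request. */
	wake_up(&server->iod_wait);
}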
Best regards,
Petr Vandrovec
[email protected]