2002-01-17 13:52:11

by Rainer Krienke

Subject: 2.4.17:Increase number of anonymous filesystems beyond 256?

Hello,

I have to increase the number of anonymous filesystems the kernel can handle.
I found the array unnamed_dev_in_use in fs/super.c and changed the array size
from the default of 256 to 1024. Testing this patch by mounting more and more
NFS filesystems, I found that still no more than 800 NFS mounts are possible.
One more mount results in the kernel saying:

Jan 17 14:03:11 gl kernel: RPC: Can't bind to reserved port (98).
Jan 17 14:03:11 gl kernel: NFS: cannot create RPC transport.
Jan 17 14:03:11 gl kernel: nfs warning: mount version older than kernel

This bug can easily be reproduced at any time. Does anyone know how to overcome
this strange limitation? I am really in need of a solution to get a server
running. Please help.

Thanks
Rainer
--
---------------------------------------------------------------------
Rainer Krienke [email protected]
Universitaet Koblenz http://www.uni-koblenz.de/~krienke
Rechenzentrum, Voice: +49 261 287 - 1312
Rheinau 1, 56075 Koblenz, Germany Fax: +49 261 287 - 1001312
---------------------------------------------------------------------


2002-01-17 17:50:17

by Andi Kleen

Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

Rainer Krienke <[email protected]> writes:

> Hello,
>
> I have to increase the number of anonymous filesystems the kernel can handle
> and found the array unnamed_dev_in_use in fs/super.c and changed the array size
> from the default of 256 to 1024. Testing this patch by mounting more and more
> NFS-filesystems I found that still no more than 800 NFS mounts are possible.
> One more mount results in the kernel saying:
>
> Jan 17 14:03:11 gl kernel: RPC: Can't bind to reserved port (98).

Some NFS servers only accept connections from 'secure ports' < 1024.
You need one local port per connection, and some ports < 1024 are already
used by other services. This is a natural limit with secure ports.

If you can configure your NFS server not to insist on secure ports
(it's usually an export option, unfortunately defaulting to on in
many OSes), you can just use any ports. With the appended patch it should
work.
[It should probably be a sysctl instead of an #ifdef, but I'm too lazy
for that right now.]

Another way, if you want to avoid source patching, would be to make
sure that different connections go via different source IP addresses,
e.g. by defining multiple IP aliases on the server and the client and
defining different routes to the server aliases, with different local
IP addresses as the preferred source address (the src option in iproute2).
This way the ports would be unique again because they exist per local
IP address; you would have 800 ports per local IP. Using the patch is
probably cleaner, though.
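
As a concrete sketch of that alias setup (all addresses, interface names,
and the server address are hypothetical; the exact syntax may vary between
iproute2 versions):

```
# on the client: add a second local address as an alias
ip addr add 10.0.0.2/24 dev eth0 label eth0:1

# route traffic to this NFS server via the alias as preferred source,
# giving that server its own pool of local reserved ports
ip route add 10.0.1.1/32 dev eth0 src 10.0.0.2
```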

Hope this helps,
-Andi

--- linux-work/net/sunrpc/xprt.c-o Mon Oct 8 21:36:07 2001
+++ linux-work/net/sunrpc/xprt.c Thu Jan 17 18:44:05 2002
@@ -1507,6 +1507,13 @@

memset(&myaddr, 0, sizeof(myaddr));
myaddr.sin_family = AF_INET;
+#define SUNRPC_NONSECURE_PORT 1
+#ifdef SUNRPC_NONSECURE_PORT
+ err = sock->ops->bind(sock, (struct sockaddr *) &myaddr,
+ sizeof(myaddr));
+ if (err < 0)
+ printk("RPC: cannot bind to a port\n");
+#else
port = 800;
do {
myaddr.sin_port = htons(port);
@@ -1516,6 +1523,9 @@

if (err < 0)
printk("RPC: Can't bind to reserved port (%d).\n", -err);
+
+#endif
+

return err;
}


-Andi

2002-01-17 18:55:23

by Pete Zaitcev

Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

>[from linux-kernel]
> I have to increase the number of anonymous filesystems the kernel can handle
> and found the array unnamed_dev_in_use in fs/super.c and changed the array size
> from the default of 256 to 1024. Testing this patch by mounting more and more
> NFS-filesystems I found that still no more than 800 NFS mounts are possible.
> One more mount results in the kernel saying:
>
> Jan 17 14:03:11 gl kernel: RPC: Can't bind to reserved port (98).
> Jan 17 14:03:11 gl kernel: NFS: cannot create RPC transport.
> Jan 17 14:03:11 gl kernel: nfs warning: mount version older than kernel

I did that. You also need a small fix to mount(8) that adds
a mount argument "-o nores". I've got an RPM at my website.

Initially I did a sysctl, but Trond M. asked for a mount
argument, in case you have to mount from several servers,
some of which require reserved ports, some do not.
Our NetApps work ok with non-reserved ports on clients.

I am surprised anyone is interested. If you need more than 800
mounts I think your system planning may be screwed.

Anyone care to review? Trond? Viro?

-- Pete

--- linux-2.4.9-unmaj-6.diff ---
Copyright 2001 Red Hat, Inc.
GPL v2 - XXX fill in the legal blob.

diff -ur -X dontdiff linux-2.4.9-18.3/Documentation/Configure.help linux-2.4.9-18.3-p3/Documentation/Configure.help
--- linux-2.4.9-18.3/Documentation/Configure.help Tue Dec 18 13:01:06 2001
+++ linux-2.4.9-18.3-p3/Documentation/Configure.help Tue Dec 18 13:53:25 2001
@@ -23926,4 +23926,13 @@
in the lm_sensors package, which you can download at
http://www.lm-sensors.nu

+Additional unnamed block majors
+CONFIG_MORE_UNNAMED_MAJORS
+ This option allows using majors 12, 14, 38, and 39 in addition to
+ major 0 for unnamed block devices, thus letting you mount 1279
+ virtual filesystems.
+
+ If unsure, answer N. Thousands of mount points are unlikely to work
+ anyway.
+
# End:
diff -ur -X dontdiff linux-2.4.9-18.3/Makefile linux-2.4.9-18.3-p3/Makefile
--- linux-2.4.9-18.3/Makefile Tue Dec 18 13:10:50 2001
+++ linux-2.4.9-18.3-p3/Makefile Thu Jan 3 17:02:41 2002
@@ -1,7 +1,7 @@
VERSION = 2
PATCHLEVEL = 4
SUBLEVEL = 9
-EXTRAVERSION = -18.3
+EXTRAVERSION = -18.3-p3

KERNELRELEASE=$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)

@@ -339,7 +339,8 @@
$(TOPDIR)/include/linux/compile.h: include/linux/compile.h

newversion:
- . scripts/mkversion > .version
+ . scripts/mkversion > .tmpversion
+ @mv -f .tmpversion .version

include/linux/compile.h: $(CONFIGURATION) include/linux/version.h newversion
@echo -n \#define UTS_VERSION \"\#`cat .version` > .ver
diff -ur -X dontdiff linux-2.4.9-18.3/fs/Config.in linux-2.4.9-18.3-p3/fs/Config.in
--- linux-2.4.9-18.3/fs/Config.in Tue Dec 18 13:00:48 2001
+++ linux-2.4.9-18.3-p3/fs/Config.in Tue Dec 18 13:58:31 2001
@@ -137,6 +137,8 @@
define_bool CONFIG_SMB_FS n
fi

+bool 'More majors for unnamed block devices' CONFIG_MORE_UNNAMED_MAJORS
+
mainmenu_option next_comment
comment 'Partition Types'
source fs/partitions/Config.in
diff -ur -X dontdiff linux-2.4.9-18.3/fs/lockd/clntproc.c linux-2.4.9-18.3-p3/fs/lockd/clntproc.c
--- linux-2.4.9-18.3/fs/lockd/clntproc.c Tue Dec 18 13:01:03 2001
+++ linux-2.4.9-18.3-p3/fs/lockd/clntproc.c Mon Jan 7 13:28:29 2002
@@ -107,6 +107,7 @@
sigset_t oldset;
unsigned long flags;
int status, proto, vers;
+ int resport;

vers = (NFS_PROTO(inode)->version == 3) ? 4 : 1;
if (NFS_PROTO(inode)->version > 3) {
@@ -116,6 +117,7 @@

/* Retrieve transport protocol from NFS client */
proto = NFS_CLIENT(inode)->cl_xprt->prot;
+ resport = NFS_CLIENT(inode)->cl_xprt->resport;

if (!(host = nlmclnt_lookup_host(NFS_ADDR(inode), proto, vers)))
return -ENOLCK;
@@ -127,7 +129,7 @@

/* Bind an rpc client to this host handle (does not
* perform a portmapper lookup) */
- if (!(clnt = nlm_bind_host(host))) {
+ if (!(clnt = nlm_bind_host(host, resport))) {
status = -ENOLCK;
goto done;
}
@@ -162,6 +164,7 @@
locks_init_lock(&call->a_res.lock.fl);
}
call->a_host = host;
+ call->a_resport = resport;

/* Set up the argument struct */
nlmclnt_setlockargs(call, fl);
@@ -260,7 +263,7 @@
}

/* If we have no RPC client yet, create one. */
- if ((clnt = nlm_bind_host(host)) == NULL)
+ if ((clnt = nlm_bind_host(host, req->a_resport)) == NULL)
return -ENOLCK;

/* Perform the RPC call. If an error occurs, try again */
@@ -328,7 +331,7 @@
nlm_procname(proc), host->h_name);

/* If we have no RPC client yet, create one. */
- if ((clnt = nlm_bind_host(host)) == NULL)
+ if ((clnt = nlm_bind_host(host, req->a_resport)) == NULL)
return -ENOLCK;

/* bootstrap and kick off the async RPC call */
@@ -356,7 +359,7 @@
nlm_procname(proc), host->h_name);

/* If we have no RPC client yet, create one. */
- if ((clnt = nlm_bind_host(host)) == NULL)
+ if ((clnt = nlm_bind_host(host, req->a_resport)) == NULL)
return -ENOLCK;

/* bootstrap and kick off the async RPC call */
diff -ur -X dontdiff linux-2.4.9-18.3/fs/lockd/host.c linux-2.4.9-18.3-p3/fs/lockd/host.c
--- linux-2.4.9-18.3/fs/lockd/host.c Tue Dec 18 13:00:49 2001
+++ linux-2.4.9-18.3-p3/fs/lockd/host.c Mon Jan 7 12:34:26 2002
@@ -163,7 +163,7 @@
* Create the NLM RPC client for an NLM peer
*/
struct rpc_clnt *
-nlm_bind_host(struct nlm_host *host)
+nlm_bind_host(struct nlm_host *host, int resport)
{
struct rpc_clnt *clnt;
struct rpc_xprt *xprt;
@@ -187,15 +187,19 @@
host->h_nextrebind - jiffies);
}
} else {
- uid_t saved_fsuid = current->fsuid;
- kernel_cap_t saved_cap = current->cap_effective;
+ if (resport) {
+ uid_t saved_fsuid = current->fsuid;
+ kernel_cap_t saved_cap = current->cap_effective;

- /* Create RPC socket as root user so we get a priv port */
- current->fsuid = 0;
- cap_raise (current->cap_effective, CAP_NET_BIND_SERVICE);
- xprt = xprt_create_proto(host->h_proto, &host->h_addr, NULL);
- current->fsuid = saved_fsuid;
- current->cap_effective = saved_cap;
+ /* Create RPC socket as root user so we get a priv port */
+ current->fsuid = 0;
+ cap_raise (current->cap_effective, CAP_NET_BIND_SERVICE);
+ xprt = xprt_create_proto(host->h_proto, &host->h_addr, NULL, 1);
+ current->fsuid = saved_fsuid;
+ current->cap_effective = saved_cap;
+ } else {
+ xprt = xprt_create_proto(host->h_proto, &host->h_addr, NULL, 0);
+ }
if (xprt == NULL)
goto forgetit;

diff -ur -X dontdiff linux-2.4.9-18.3/fs/lockd/mon.c linux-2.4.9-18.3-p3/fs/lockd/mon.c
--- linux-2.4.9-18.3/fs/lockd/mon.c Tue Dec 18 13:00:49 2001
+++ linux-2.4.9-18.3-p3/fs/lockd/mon.c Sun Jan 6 01:08:03 2002
@@ -110,7 +110,7 @@
sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
sin.sin_port = 0;

- xprt = xprt_create_proto(IPPROTO_UDP, &sin, NULL);
+ xprt = xprt_create_proto(IPPROTO_UDP, &sin, NULL, 1);
if (!xprt)
goto out;

diff -ur -X dontdiff linux-2.4.9-18.3/fs/nfs/inode.c linux-2.4.9-18.3-p3/fs/nfs/inode.c
--- linux-2.4.9-18.3/fs/nfs/inode.c Tue Dec 18 13:01:21 2001
+++ linux-2.4.9-18.3-p3/fs/nfs/inode.c Mon Jan 7 13:31:06 2002
@@ -351,7 +351,7 @@

/* Now create transport and client */
xprt = xprt_create_proto(tcp? IPPROTO_TCP : IPPROTO_UDP,
- &srvaddr, &timeparms);
+ &srvaddr, &timeparms, (data->flags & NFS_MOUNT_NORES) == 0);
if (xprt == NULL)
goto out_no_xprt;

diff -ur -X dontdiff linux-2.4.9-18.3/fs/nfs/mount_clnt.c linux-2.4.9-18.3-p3/fs/nfs/mount_clnt.c
--- linux-2.4.9-18.3/fs/nfs/mount_clnt.c Tue Dec 18 13:00:49 2001
+++ linux-2.4.9-18.3-p3/fs/nfs/mount_clnt.c Sun Jan 6 01:07:12 2002
@@ -82,7 +82,7 @@
struct rpc_xprt *xprt;
struct rpc_clnt *clnt;

- if (!(xprt = xprt_create_proto(IPPROTO_UDP, srvaddr, NULL)))
+ if (!(xprt = xprt_create_proto(IPPROTO_UDP, srvaddr, NULL, 1)))
return NULL;

clnt = rpc_create_client(xprt, hostname,
diff -ur -X dontdiff linux-2.4.9-18.3/fs/super.c linux-2.4.9-18.3-p3/fs/super.c
--- linux-2.4.9-18.3/fs/super.c Tue Dec 18 13:00:59 2001
+++ linux-2.4.9-18.3-p3/fs/super.c Wed Dec 19 10:56:35 2001
@@ -516,27 +516,57 @@
* filesystems which don't use real block-devices. -- jrs
*/

-static unsigned long unnamed_dev_in_use[256/(8*sizeof(unsigned long))];
+static int unnamed_dev_majors[] = {
+ UNNAMED_MAJOR,
+#ifdef CONFIG_MORE_UNNAMED_MAJORS /* Always on, keeps Configure.help */
+ 12, 14, 38, 39,
+#endif
+};
+#define UNNAMED_NMAJ (sizeof(unnamed_dev_majors)/sizeof(int))
+
+static int unnamed_dev_nmaj = 1;
+static int unnamed_maj_in_use[UNNAMED_NMAJ] = { UNNAMED_MAJOR, };
+static unsigned long unnamed_dev_in_use[(UNNAMED_NMAJ*256)/(8*sizeof(long))];
+
+#ifdef CONFIG_MORE_UNNAMED_MAJORS
+void majorhog_init(void);
+#endif

kdev_t get_unnamed_dev(void)
{
int i;

- for (i = 1; i < 256; i++) {
+#ifdef CONFIG_MORE_UNNAMED_MAJORS
+ if (!test_and_set_bit(0, unnamed_maj_in_use)) { /* first call */
+ /* relatively SMP safe: only adds majors and does it once */
+ majorhog_init();
+ }
+#endif
+
+ /* find_first_zero_bit isn't atomic */
+ for (i = 1; i < unnamed_dev_nmaj*256; i++) {
if (!test_and_set_bit(i,unnamed_dev_in_use))
- return MKDEV(UNNAMED_MAJOR, i);
+ return MKDEV(unnamed_maj_in_use[i/256], i & 255);
}
+
return 0;
}

void put_unnamed_dev(kdev_t dev)
{
- if (!dev || MAJOR(dev) != UNNAMED_MAJOR)
- return;
- if (test_and_clear_bit(MINOR(dev), unnamed_dev_in_use))
+ int i;
+
+ if (!dev)
return;
- printk("VFS: put_unnamed_dev: freeing unused device %s\n",
- kdevname(dev));
+ for (i = 0; i < unnamed_dev_nmaj; i++) {
+ if (unnamed_maj_in_use[i] == MAJOR(dev)) {
+ if (test_and_clear_bit(i * 256 + MINOR(dev), unnamed_dev_in_use))
+ return;
+ printk("VFS: put_unnamed_dev: freeing unused device %s\n",
+ kdevname(dev));
+ return;
+ }
+ }
}

static int grab_super(struct super_block *sb)
@@ -1090,3 +1120,41 @@
return;
}
}
+
+#ifdef CONFIG_MORE_UNNAMED_MAJORS
+
+/* #include <linux/major.h> */
+/* #include <linux/errno.h> */
+/* #include <linux/fs.h> */
+#include <linux/devfs_fs_kernel.h>
+
+static int majorhog_open(struct inode *inode, struct file *file)
+{
+ return -EDOM; /* Something ridiculous for identification */
+}
+
+static struct block_device_operations majorhog_fops = {
+ open: majorhog_open,
+};
+
+void majorhog_init(void)
+{
+ int i, j;
+
+ if (unnamed_dev_nmaj != 1)
+ return;
+
+ j = 1;
+ for (i = 1; i < UNNAMED_NMAJ; i++) {
+ if (devfs_register_blkdev(unnamed_dev_majors[i],
+ "unnamed", &majorhog_fops) == 0) {
+ unnamed_maj_in_use[j++] = unnamed_dev_majors[i];
+ } else {
+ printk(KERN_WARNING "Unable to hog major number %d\n",
+ unnamed_dev_majors[i]);
+ }
+ }
+ unnamed_dev_nmaj = j;
+}
+
+#endif /* CONFIG_MORE_UNNAMED_MAJORS */
diff -ur -X dontdiff linux-2.4.9-18.3/include/linux/lockd/lockd.h linux-2.4.9-18.3-p3/include/linux/lockd/lockd.h
--- linux-2.4.9-18.3/include/linux/lockd/lockd.h Wed Aug 15 14:24:26 2001
+++ linux-2.4.9-18.3-p3/include/linux/lockd/lockd.h Mon Jan 7 12:42:58 2002
@@ -63,6 +63,7 @@
#define NLMCLNT_OHSIZE (sizeof(system_utsname.nodename)+10)
struct nlm_rqst {
unsigned int a_flags; /* initial RPC task flags */
+ int a_resport;
struct nlm_host * a_host; /* host handle */
struct nlm_args a_args; /* arguments */
struct nlm_res a_res; /* result */
@@ -144,7 +145,7 @@
struct nlm_host * nlmsvc_lookup_host(struct svc_rqst *);
struct nlm_host * nlm_lookup_host(struct svc_client *,
struct sockaddr_in *, int, int);
-struct rpc_clnt * nlm_bind_host(struct nlm_host *);
+struct rpc_clnt * nlm_bind_host(struct nlm_host *, int);
void nlm_rebind_host(struct nlm_host *);
struct nlm_host * nlm_get_host(struct nlm_host *);
void nlm_release_host(struct nlm_host *);
diff -ur -X dontdiff linux-2.4.9-18.3/include/linux/nfs_mount.h linux-2.4.9-18.3-p3/include/linux/nfs_mount.h
--- linux-2.4.9-18.3/include/linux/nfs_mount.h Tue Dec 18 13:00:52 2001
+++ linux-2.4.9-18.3-p3/include/linux/nfs_mount.h Sun Jan 6 01:10:53 2002
@@ -54,6 +54,7 @@
#define NFS_MOUNT_KERBEROS 0x0100 /* 3 */
#define NFS_MOUNT_NONLM 0x0200 /* 3 */
#define NFS_MOUNT_BROKEN_SUID 0x0400 /* 4 */
+#define NFS_MOUNT_NORES 0x0800 /* ? XXX */
#define NFS_MOUNT_FLAGMASK 0xFFFF

#endif
diff -ur -X dontdiff linux-2.4.9-18.3/include/linux/sunrpc/xprt.h linux-2.4.9-18.3-p3/include/linux/sunrpc/xprt.h
--- linux-2.4.9-18.3/include/linux/sunrpc/xprt.h Wed Aug 15 14:24:26 2001
+++ linux-2.4.9-18.3-p3/include/linux/sunrpc/xprt.h Mon Jan 7 01:48:40 2002
@@ -143,7 +143,8 @@
nocong : 1, /* no congestion control */
stream : 1, /* TCP */
tcp_more : 1, /* more record fragments */
- connecting : 1; /* being reconnected */
+ connecting : 1, /* being reconnected */
+ resport : 1; /* use reserved port */

/*
* State of TCP reply receive stuff
@@ -171,7 +172,8 @@
#ifdef __KERNEL__

struct rpc_xprt * xprt_create_proto(int proto, struct sockaddr_in *addr,
- struct rpc_timeout *toparms);
+ struct rpc_timeout *toparms,
+ int use_res_port);
int xprt_destroy(struct rpc_xprt *);
void xprt_shutdown(struct rpc_xprt *);
void xprt_default_timeout(struct rpc_timeout *, int);
diff -ur -X dontdiff linux-2.4.9-18.3/net/sunrpc/pmap_clnt.c linux-2.4.9-18.3-p3/net/sunrpc/pmap_clnt.c
--- linux-2.4.9-18.3/net/sunrpc/pmap_clnt.c Wed Jun 21 12:43:37 2000
+++ linux-2.4.9-18.3-p3/net/sunrpc/pmap_clnt.c Mon Jan 7 12:59:54 2002
@@ -189,7 +189,7 @@
struct rpc_clnt *clnt;

/* printk("pmap: create xprt\n"); */
- if (!(xprt = xprt_create_proto(proto, srvaddr, NULL)))
+ if (!(xprt = xprt_create_proto(proto, srvaddr, NULL, 0)))
return NULL;
xprt->addr.sin_port = htons(RPC_PMAP_PORT);

diff -ur -X dontdiff linux-2.4.9-18.3/net/sunrpc/xprt.c linux-2.4.9-18.3-p3/net/sunrpc/xprt.c
--- linux-2.4.9-18.3/net/sunrpc/xprt.c Tue Dec 18 13:00:54 2001
+++ linux-2.4.9-18.3-p3/net/sunrpc/xprt.c Mon Jan 7 13:26:30 2002
@@ -97,7 +97,7 @@
static void xprt_reserve_status(struct rpc_task *task);
static void xprt_disconnect(struct rpc_xprt *);
static void xprt_reconn_status(struct rpc_task *task);
-static struct socket *xprt_create_socket(int, struct rpc_timeout *);
+static struct socket *xprt_create_socket(int, struct rpc_timeout *, int);
static int xprt_bind_socket(struct rpc_xprt *, struct socket *);
static void xprt_remove_pending(struct rpc_xprt *);

@@ -434,7 +434,8 @@
status = -ENOTCONN;
if (!inet) {
/* Create an unconnected socket */
- if (!(sock = xprt_create_socket(xprt->prot, &xprt->timeout)))
+ if (!(sock = xprt_create_socket(xprt->prot, &xprt->timeout,
+ xprt->resport)))
goto defer;
xprt_bind_socket(xprt, sock);
inet = sock->sk;
@@ -1459,7 +1460,7 @@
*/
static struct rpc_xprt *
xprt_setup(struct socket *sock, int proto,
- struct sockaddr_in *ap, struct rpc_timeout *to)
+ struct sockaddr_in *ap, struct rpc_timeout *to, int use_resport)
{
struct rpc_xprt *xprt;
struct rpc_rqst *req;
@@ -1504,6 +1505,8 @@

INIT_LIST_HEAD(&xprt->rx_pending);

+ xprt->resport = use_resport;
+
dprintk("RPC: created transport %p\n", xprt);

xprt_bind_socket(xprt, sock);
@@ -1574,7 +1577,7 @@
* Create a client socket given the protocol and peer address.
*/
static struct socket *
-xprt_create_socket(int proto, struct rpc_timeout *to)
+xprt_create_socket(int proto, struct rpc_timeout *to, int resport)
{
struct socket *sock;
int type, err;
@@ -1590,7 +1593,8 @@
}

/* If the caller has the capability, bind to a reserved port */
- if (capable(CAP_NET_BIND_SERVICE) && xprt_bindresvport(sock) < 0)
+ if (resport &&
+ capable(CAP_NET_BIND_SERVICE) && xprt_bindresvport(sock) < 0)
goto failed;

return sock;
@@ -1604,17 +1608,18 @@
* Create an RPC client transport given the protocol and peer address.
*/
struct rpc_xprt *
-xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to)
+xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to,
+ int use_resport)
{
struct socket *sock;
struct rpc_xprt *xprt;

dprintk("RPC: xprt_create_proto called\n");

- if (!(sock = xprt_create_socket(proto, to)))
+ if (!(sock = xprt_create_socket(proto, to, use_resport)))
return NULL;

- if (!(xprt = xprt_setup(sock, proto, sap, to)))
+ if (!(xprt = xprt_setup(sock, proto, sap, to, use_resport)))
sock_release(sock);

return xprt;

2002-01-18 12:12:51

by Rainer Krienke

Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

On Thursday, 17. January 2002 19:55, Pete Zaitcev wrote:
> >[from linux-kernel]
> > I have to increase the number of anonymous filesystems the kernel can
> > handle and found the array unnamed_dev_in_use in fs/super.c and changed the
> > array size from the default of 256 to 1024. Testing this patch by
> > mounting more and more NFS-filesystems I found that still no more than
> > 800 NFS mounts are possible. One more mount results in the kernel saying:
...
>
> Initially I did a sysctl, but Trond M. asked for a mount
> argument, in case you have to mount from several servers,
> some of which require reserved ports, some do not.
> Our NetApps work ok with non-reserved ports on clients.
>
> I am surprised anyone is interested. If you need more than 800
> mounts I think your system planning may be screwed.
>

First of all, thank you for your answer. Well, I don't think that such a setup
is really screwed. As reasoning I can give some examples of why I think
it is basically very useful for many sites serving a large number of
users:

At our site we store all user data (~4000 users)
centrally on several NFS servers (running Solaris up to now). In order to
ease administration we chose the approach of mounting each user directory
directly (via automount, configured by NIS) on any NFS client where the user
wants to access his data. The most
important effect of this is that each user's directory is always reachable
under the path /home/<user>. This proved to be very useful (from the
administrator's point of view) when moving users from one server to another,
installing additional NFS servers, etc., because the only path the user knows
about, and sees when e.g. issuing pwd, is /home/<user>. The second advantage
is that there is no need to touch the client system: no need for NFS mounts
in /etc/fstab to mount the server's export directory, and so there are no
unneeded dependencies from any client to the NFS servers.

Now think of a setup where no per-user directory mounts are configured but the
whole export directory of an NFS server with many users is mounted. Of course
this makes things easier for the NFS system, since only one mount is needed,
but on the client you need to create link trees or something similar so the
user can still access his home under /home/<user> and not something like
/home/server1/<user>. Moreover, even if you create link trees, when you issue
commands like pwd you see the real path (e.g. /server1/<user>) instead of the
logical one (/home/<user>). Such paths are soon written into scripts etc., so
that if the user is moved some time later, things will be broken.
You simply lose a layer of abstraction if you do not mount the user's directory
directly. The only other solution I know of would be amd, which automatically
places a link. But since we come from the Sun world, we simply use Sun's
automounter, and there have been no problems up to now.

As another not uncommon setup, you might think of NAS storage in a larger
company. Again you have the choice of mounting on a per-user basis or mounting
the parent directory containing many users. In the latter case you again
lose the /home abstraction to some degree.

So I think it would be really good to have at least the option of more than
256 NFS mounts, even if one has to use insecure ports for this purpose.
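
For illustration, the per-user setup described above amounts to a wildcard
automounter map roughly like this (server and path names hypothetical; the
same map syntax works with Sun's automounter and with Linux autofs):

```
# /etc/auto.master: direct all of /home through the auto.home map
/home   auto.home

# auto.home (distributed via NIS): the key "*" matches any user name and
# "&" substitutes the matched key, so each home is mounted individually
*       server1:/export/home/&
```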

Thanks
Rainer
--
---------------------------------------------------------------------
Rainer Krienke [email protected]
Universitaet Koblenz, http://www.uni-koblenz.de/~krienke
Rechenzentrum, Voice: +49 261 287 - 1312
Rheinau 1, 56075 Koblenz, Germany Fax: +49 261 287 - 1001312
---------------------------------------------------------------------

2002-01-18 12:27:14

by Trond Myklebust

Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

>>>>> " " == Pete Zaitcev <[email protected]> writes:

> Anyone cares to review? Trond? Viro?

Have you discussed the choice of device number with hpa
([email protected])? Although the numbers you chose were marked as
'obsolete', you really ought to ensure that they get reserved.

Otherwise, things look pretty straightforward...

Cheers,
Trond

2002-01-18 19:40:55

by H. Peter Anvin

Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

Followup to: <[email protected]>
By author: Rainer Krienke <[email protected]>
In newsgroup: linux.dev.kernel
>
> Now think of a setup where no per-user directory mounts are configured but the
> whole export directory of an NFS server with many users is mounted. Of course
> this makes things easier for the NFS system, since only one mount is needed,
> but on the client you need to create link trees or something similar so the
> user can still access his home under /home/<user> and not something like
> /home/server1/<user>. Moreover, even if you create link trees, when you issue
> commands like pwd you see the real path (e.g. /server1/<user>) instead of the
> logical one (/home/<user>). Such paths are soon written into scripts etc., so
> that if the user is moved some time later, things will be broken.
> You simply lose a layer of abstraction if you do not mount the user's directory
> directly. The only other solution I know of would be amd, which automatically
> places a link. But since we come from the Sun world, we simply use Sun's
> automounter, and there have been no problems up to now.
>

This can easily be resolved with vfsbinds. Even Sun has a specific
syntax in their automounter to deal with this
(server:common_root:tail). If I ever do another autofs v3 release I
will probably try to incorporate that via vfsbinds.
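
For reference, the same effect can be had by hand with a bind mount on 2.4
(paths hypothetical):

```
# make the NFS-backed directory visible under the logical path;
# pwd and getcwd() then report /home/jane, not the server-side path
mount --bind /mnt/server1/jane /home/jane
```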

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-01-18 20:33:57

by Horst von Brand

Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

Rainer Krienke <[email protected]> said:

[...]

> At our site we store all user data (~4000 users) centrally on several NFS
> servers (running Solaris up to now). In order to ease administration we
> chose the approach of mounting each user directory directly (via automount,
> configured by NIS) on any NFS client where the user wants to access his
> data. The most important effect of this is that each user's directory is
> always reachable under the path /home/<user>. This proved to be very
> useful (from the administrator's point of view) when moving users from one
> server to another, installing additional NFS servers, etc., because the only
> path the user knows about, and sees when e.g. issuing pwd, is
> /home/<user>. The second advantage is that there is no need to touch the
> client system: no need for NFS mounts in /etc/fstab to mount the server's
> export directory, and so there are no unneeded dependencies from any
> client to the NFS servers.

This is exactly what we do with our (much more modest) 600 accounts at the
Departamento de Informatica of the UTFSM (Valparaiso, Chile).

> Now think of a setup where no per-user directory mounts are configured but
> the whole export directory of an NFS server with many users is mounted. Of
> course this makes things easier for the NFS system, since only one mount
> is needed, but on the client you need to create link trees or something
> similar so the user can still access his home under /home/<user> and not
> something like /home/server1/<user>. Moreover, even if you create link
> trees, when you issue commands like pwd you see the real path (e.g.
> /server1/<user>) instead of the logical one (/home/<user>). Such paths are
> soon written into scripts etc., so that if the user is moved some time
> later, things will be broken.

> You simply lose a layer of abstraction if you do not mount the user's
> directory directly. The only other solution I know of would be amd, which
> automatically places a link. But since we come from the Sun world, we
> simply use Sun's automounter, and there have been no problems up to now.

The SunOS automounter (which we used before Solaris) did this too. It was a
pain in the neck, as the "real" path to the home does show through, and you
get the same problems with scripts &c containing physical, not logical,
paths to files. Fixing the _users_ is much harder than fixing up the
configurations...
--
Horst von Brand http://counter.li.org # 22616

2002-01-19 22:09:39

by H. Peter Anvin

Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

Followup to: <[email protected]>
By author: Horst von Brand <[email protected]>
In newsgroup: linux.dev.kernel
>
> The SunOS automounter (which we used before Solaris) did this too. It was a
> pain in the neck, as the "real" path to the home does show through, and you
> get the same problems with scripts &c containing physical, not logical,
> paths to files. Fixing the _users_ is much harder than fixing up the
> configurations...
>

However, with vfsbinds this is not an issue.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-01-20 10:34:58

by Rainer krienke

Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

On Friday, 18 January 2002 20:40, H. Peter Anvin wrote:
> Followup to: <[email protected]>
> By author: Rainer Krienke <[email protected]>
> In newsgroup: linux.dev.kernel
>
> > Now think of a setup where no per-user directory mounts are configured but
> > the whole export directory of an NFS server with many users is mounted. Of
> > course this makes things easier for the NFS system, since only one mount
> > is needed, but on the client you need to create link trees or something
> > similar so the user can still access his home under /home/<user> and not
> > something like /home/server1/<user>. Moreover, even if you create link
> > trees, when you issue commands like pwd you see the real path (e.g.
> > /server1/<user>) instead of the logical one (/home/<user>). Such paths are
> > soon written into scripts etc., so that if the user is moved some time
> > later, things will be broken.
> > You simply lose a layer of abstraction if you do not mount the user's dir
> > directly. The only other solution I know of would be amd, which
> > automatically places a link. But since we come from the Sun world, we
> > simply use Sun's automounter, and there have been no problems up to now.
>
> This can easily be resolved with vfsbinds. Even Sun has a specific
> syntax in their automounter to deal with this
> (server:common_root:tail). If I ever do another autofs v3 release I
> will probably try to incorporate that via vfsbinds.
>
> -hpa

You mentioned you'd probably include this in another v3 release. Does this
mean that v4 can already do this? What syntax is needed? What about the
logical/physical path problem? Is it already solved by using vfsbinds in v4?

Thanks Rainer
--
Rainer Krienke, [email protected]

2002-01-20 10:38:08

by H. Peter Anvin

Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

Rainer krienke wrote:

>
> You mentioned you'd probably include this in another v3 release. Does this
> mean that v4 can already do this? What syntax is needed? What about the
> logical/physical path problem? Is it already solved by using vfsbinds in v4?
>
> Thanks Rainer
>

I don't know about v4. You have to ask Jeremy about that.

vfsbinds takes care of any "logical/physical" path problem, so it's not
an issue there.

-hpa

2002-01-21 12:41:30

by Rainer Krienke

Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

On Thursday, 17. January 2002 19:55, Pete Zaitcev wrote:
> >[from linux-kernel]
> > I have to increase the number of anonymous filesystems the kernel can
> > handle and found the array unnamed_dev_in_use in fs/super.c and changed the
> > array size from the default of 256 to 1024. Testing this patch by
> > mounting more and more NFS-filesystems I found that still no more than
> > 800 NFS mounts are possible. One more mount results in the kernel saying:
> >
> > Jan 17 14:03:11 gl kernel: RPC: Can't bind to reserved port (98).
> > Jan 17 14:03:11 gl kernel: NFS: cannot create RPC transport.
> > Jan 17 14:03:11 gl kernel: nfs warning: mount version older than kernel
>
> I did that. You also need a small fix to mount(8) that adds
> a mount argument "-o nores". I've got an RPM at my website.
>
> Initially I did a sysctl, but Trond M. asked for a mount
> argument, in case you have to mount from several servers,
> some of which require reserved ports, some do not.
> Our NetApps work ok with non-reserved ports on clients.
>

Does anyone have the patch for mount that Pete mentioned above? He mailed the
kernel patches, but I am unable to find the mount fix he talked about.

Thanks Rainer
--
---------------------------------------------------------------------
Rainer Krienke [email protected]
Universitaet Koblenz, http://www.uni-koblenz.de/~krienke
Rechenzentrum, Voice: +49 261 287 - 1312
Rheinau 1, 56075 Koblenz, Germany Fax: +49 261 287 - 1001312
---------------------------------------------------------------------

2002-01-21 22:55:06

by Pete Zaitcev

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

On Fri, Jan 18, 2002 at 01:12:16PM +0100, Rainer Krienke wrote:
> On Thursday, 17. January 2002 19:55, Pete Zaitcev wrote:
> >
> > I am surprised anyone is interested. If you need more than 800
> > mounts I think your system planning may be screwed.

> ease administration we chose the approach to mount each user directory
> directly (via automount configured by NIS) on an NFS client where the user
> wants to access his data. The most
> important effect of this is that each user's directory is always reachable
> under the path /home/<user>.

This is not an unusual setup, but normally servers and
workstations do not need to mount an enormous number of volumes.
So, I did the same because it's very useful, but I prohibited
~/.forward. Instead, requests for vacation messages were
submitted centrally and processed with the help of /etc/aliases
and automation scripts. This way mailing loops were under control,
and, as a side effect, sending something to a mailing list
did not require the mailserver to mount a gazillion home
directories in order to fetch ~/.forward for each recipient.

> So I think it would be really good to have at least the option to have more
> than 256 NFS mounts, even if one has to use insecure ports for this purpose.

Sure... The thing is, the 1279 mounts that I did are not
even close to being adequate to combat .forward, or something
like separately mounted mail spools for large ISPs. You
really need 10,000 mounts, at which point the whole idea of
anonymous device numbers falls apart.

-- Pete

2002-01-22 10:25:47

by Rainer Krienke

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

On Thursday, 17. January 2002 19:55, Pete Zaitcev wrote:
> >[from linux-kernel]
> > I have to increase the number of anonymous filesystems the kernel can
> > handle and found the array unnamed_dev_in_use fs/super.c and changed the
> > array size from the default of 256 to 1024. Testing this patch by
> > mounting more and more NFS-filesystems I found that still no more than
> > 800 NFS mounts are possible. One more mount results in the kernel saying:
> >
> > Jan 17 14:03:11 gl kernel: RPC: Can't bind to reserved port (98).
> > Jan 17 14:03:11 gl kernel: NFS: cannot create RPC transport.
> > Jan 17 14:03:11 gl kernel: nfs warning: mount version older than kernel
>
> I did that. You also need a small fix to mount(8) that adds
> a mount argument "-o nores". I've got an RPM at my website.
>
> Initially I did a sysctl, but Trond M. asked for a mount
> argument, in case you have to mount from several servers,
> some of which require reserved ports, some do not.
> Our NetApps work ok with non-reserved ports on clients.

Thanks. In the meantime I tested the patch, and it works but seems to have a
painful side effect. Using the patch I can now mount more than 256 NFS
filesystems on the patched host.

The trouble is that no other machine can NFS-mount anything from this
patched machine, because the kernel nfsd can no longer be started on the
patched server. When I try to start rpc.nfsd on this patched host, the
following appears in /var/log/messages:

portmap: connect from 127.0.0.1 to set(nfs): request from unprivileged port
rpc.nfsd: nfssvc: error Permission denied

A strace of nfsd shows the problem:
...
nfsservctl(0, 0xbfffeed8, 0) = -1 EACCES (Permission denied)
...

Anyone know how this can be fixed?

Thanks Rainer
--
---------------------------------------------------------------------
Rainer Krienke [email protected]
Universitaet Koblenz, http://www.uni-koblenz.de/~krienke
Rechenzentrum, Voice: +49 261 287 - 1312
Rheinau 1, 56075 Koblenz, Germany Fax: +49 261 287 - 1001312
---------------------------------------------------------------------

2002-01-22 10:41:33

by Trond Myklebust

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

>>>>> " " == Rainer Krienke <[email protected]> writes:


> portmap: connect from 127.0.0.1 to set(nfs): request from
> unprivileged port rpc.nfsd: nfssvc: error Permission denied

> A strace of nfsd shows the problem: ... nfsservctl(0,
> 0xbfffeed8, 0) = -1 EACCES (Permission denied) ...

'man 5 exports'

secure  This option requires that requests originate on an
        internet port less than IPPORT_RESERVED (1024).
        This option is on by default. To turn it off,
        specify insecure.

Cheers,
Trond

2002-01-22 13:08:38

by Rainer Krienke

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

On Tuesday, 22. January 2002 11:40, Trond Myklebust wrote:
> >>>>> " " == Rainer Krienke <[email protected]> writes:
> > portmap: connect from 127.0.0.1 to set(nfs): request from
> > unprivileged port rpc.nfsd: nfssvc: error Permission denied
> >
> > A strace of nfsd shows the problem: ... nfsservctl(0,
> > 0xbfffeed8, 0) = -1 EACCES (Permission denied) ...
>
> 'man 5 exports'
>
> secure  This option requires that requests originate on an
>         internet port less than IPPORT_RESERVED (1024).
>         This option is on by default. To turn it off,
>         specify insecure.

This is not the problem. The exported filesystem is marked insecure. The
problem is that on the machine running Pete's patch you cannot even start
the kernel nfsd, no matter what /etc/exports contains. If you try to start
/usr/sbin/rpc.nfsd (the kernel nfsd version), it tries to register with
portmap, and as far as I can tell the kernel denies this request with the
message above. Since no nfsd can be started, I cannot mount any filesystem
from this host.

So I still think that the reason for this is a check in the kernel that
prevents connections from ports > 1024. But where exactly is this done?

Thanks Rainer
--
---------------------------------------------------------------------
Rainer Krienke [email protected]
Universitaet Koblenz, http://www.uni-koblenz.de/~krienke
Rechenzentrum, Voice: +49 261 287 - 1312
Rheinau 1, 56075 Koblenz, Germany Fax: +49 261 287 - 1001312
---------------------------------------------------------------------

2002-01-22 13:29:14

by Trond Myklebust

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

On Tuesday 22. January 2002 14:08, you wrote:

> So I still think that the reason for this is a check in the kernel, that
> prevents connections from ports > 1024.

Nope. It's the following hunk:

diff -ur -X dontdiff linux-2.4.9-18.3/net/sunrpc/pmap_clnt.c
linux-2.4.9-18.3-p3/net/sunrpc/pmap_clnt.c
--- linux-2.4.9-18.3/net/sunrpc/pmap_clnt.c Wed Jun 21 12:43:37 2000
+++ linux-2.4.9-18.3-p3/net/sunrpc/pmap_clnt.c Mon Jan 7 12:59:54 2002
@@ -189,7 +189,7 @@
struct rpc_clnt *clnt;

/* printk("pmap: create xprt\n"); */
- if (!(xprt = xprt_create_proto(proto, srvaddr, NULL)))
+ if (!(xprt = xprt_create_proto(proto, srvaddr, NULL, 0)))
return NULL;
xprt->addr.sin_port = htons(RPC_PMAP_PORT);


The above change implies that the portmapper client can always be run from
an insecure port. That is fine if the purpose of the RPC call is to read off
a port number for an RPC service. If the idea is to register a new service,
however, then the portmapper demands that we use a secure port.

The fix would be to add an argument to the function pmap_create() in order to
allow rpc_register() to specify that the call to xprt_create_proto() should
set up the socket on a secure port.

Cheers,
Trond

2002-01-22 15:24:26

by Rainer Krienke

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

On Tuesday, 22. January 2002 14:28, Trond Myklebust wrote:
> On Tuesday 22. January 2002 14:08, you wrote:
> > So I still think that the reason for this is a check in the kernel, that
> > prevents connections from ports > 1024.
>
> Nope. It's the following hunk:
>
> diff -ur -X dontdiff linux-2.4.9-18.3/net/sunrpc/pmap_clnt.c
> linux-2.4.9-18.3-p3/net/sunrpc/pmap_clnt.c
> --- linux-2.4.9-18.3/net/sunrpc/pmap_clnt.c Wed Jun 21 12:43:37 2000
> +++ linux-2.4.9-18.3-p3/net/sunrpc/pmap_clnt.c Mon Jan 7 12:59:54 2002
> @@ -189,7 +189,7 @@
> struct rpc_clnt *clnt;
>
> /* printk("pmap: create xprt\n"); */
> - if (!(xprt = xprt_create_proto(proto, srvaddr, NULL)))
> + if (!(xprt = xprt_create_proto(proto, srvaddr, NULL, 0)))
> return NULL;
> xprt->addr.sin_port = htons(RPC_PMAP_PORT);
>
>
> The above change implies that the portmapper can always be run from an
> insecure port.
> It can if the purpose of the RPC call is trying to read off a port number
> for an RPC service. If the idea is to register a new service, however, then
> the portmapper demands that we use a secure port.
>
> The fix would be to add an argument to the function pmap_create() in order
> to allow rpc_register() to specify that the call to xprt_create_proto()
> should set up the socket on a secure port.

Thanks for the hint. I fixed pmap_create() according to your proposal and now
nfsd works again.

One more question about something I'd like to understand:
Pete's fix limits the number of anonymous mounts to 1279. There was a shorter
patch from Andi Kleen which basically just replaced the search for a secure
port from 800 downwards (in xprt.c, xprt_bindresvport()) with a bind to any
port, not just a secure one. Raising the number of elements of
unnamed_dev_in_use in fs/super.c to e.g. 4096 then made it possible to mount
that many NFS directories. Although this patch suffered from two NFS
problems (the nfsd problem just discussed, as well as a problem when
NFS-mounting from another Linux box), it showed a way to use a very large
number of NFS mounts.

Can somebody explain the major difference between the two solutions? Pete,
why did you base your patch on 4 new major device numbers, whereas Andi's
patch did not need them? Are there any major drawbacks in not doing so?

Thanks Rainer
--
---------------------------------------------------------------------
Rainer Krienke [email protected]
Universitaet Koblenz, http://www.uni-koblenz.de/~krienke
Rechenzentrum, Voice: +49 261 287 - 1312
Rheinau 1, 56075 Koblenz, Germany Fax: +49 261 287 - 1001312
---------------------------------------------------------------------

2002-01-22 15:41:08

by Trond Myklebust

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

>>>>> " " == Rainer Krienke <[email protected]> writes:

> Can somebody explain the major difference between the two
> solutions? Pete, why did you base your patch on 4 new major
> device numbers, whereas Andi's patch did not need them? Are
> there any major drawbacks in not doing so?

Both Andi and Pete solve the problem of the limit on the number of
available reserved ports.

In addition, Pete fixes a second problem. There is a limit to the
number of 'unnamed' devices that the kernel can support (see the
function get_unnamed_dev()). Since each NFS mount 'eats' one such
device, this sets an upper limit of 255 simultaneous NFS mounts,
whether or not we have enough reserved ports.

Cheers,
Trond

2002-01-22 17:45:37

by Pete Zaitcev

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

> From: Rainer Krienke <[email protected]>
> Date: Tue, 22 Jan 2002 16:23:54 +0100

> Thanks for the hint. I fixed pmap_create() according to your proposal and now
> nfsd works again.

Care to share the patch?

> Raising the count of elements of
> unnamed_dev_in_use in fs/super.c to eg 4096 resulted in the opportunity to
> mount as many NFS directories.

You did not send your patch (yet again), so there is no way
to tell precisely what you have accomplished. I suspect that it may
create pages with the same device number that belong to different
mounts. I do not pretend to understand how VFS and page cache
use device numbers. If device numbers are used for any indexing,
pages may be mixed up with resulting data corruption.
I cannot say if this scenario is likely without looking
at the VFS code. Perhaps we ought to ask Stephen, Al, or Trond
about it.

> Why did you,
> Pete, base your patch on 4 new major device numbers whereas Andi's patch did
> not need them?

He probably never tested his patch. I asked him and we'll know
soon if it was so.

-- Pete

2002-01-22 19:45:50

by Pete Zaitcev

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

> From: Trond Myklebust <[email protected]>
> Date: Tue, 22 Jan 2002 14:28:39 +0100

> The fix would be to add an argument to the function pmap_create() in order to
> allow rpc_register() to specify that the call to xprt_create_proto() should
> set up the socket on a secure port.

I am sorry I missed this in our unit test.

-- Pete

--- linux-2.4.9-unmaj-7.diff ---
Copyright 2001 Red Hat, Inc.
GPL v2 - XXX fill in the legal blob.

-7 is with pmap_clnt fix for nfsd registration with portmapper.

diff -ur -X dontdiff linux-2.4.9-18.3/Documentation/Configure.help linux-2.4.9-18.3-p3/Documentation/Configure.help
--- linux-2.4.9-18.3/Documentation/Configure.help Tue Dec 18 13:01:06 2001
+++ linux-2.4.9-18.3-p3/Documentation/Configure.help Tue Dec 18 13:53:25 2001
@@ -23926,4 +23926,13 @@
in the lm_sensors package, which you can download at
http://www.lm-sensors.nu

+Additional unnamed block majors
+CONFIG_MORE_UNNAMED_MAJORS
+ This option allows you to use majors 12, 14, 38, and 39 in addition to
+ major 0 for unnamed block devices, thus letting you mount 1279
+ virtual filesystems.
+
+ If unsure, answer N. Thousands of mount points are unlikely to work
+ anyways.
+
# End:
diff -ur -X dontdiff linux-2.4.9-18.3/Makefile linux-2.4.9-18.3-p3/Makefile
--- linux-2.4.9-18.3/Makefile Tue Dec 18 13:10:50 2001
+++ linux-2.4.9-18.3-p3/Makefile Thu Jan 3 17:02:41 2002
@@ -1,7 +1,7 @@
VERSION = 2
PATCHLEVEL = 4
SUBLEVEL = 9
-EXTRAVERSION = -18.3
+EXTRAVERSION = -18.3-p3

KERNELRELEASE=$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)

@@ -339,7 +339,8 @@
$(TOPDIR)/include/linux/compile.h: include/linux/compile.h

newversion:
- . scripts/mkversion > .version
+ . scripts/mkversion > .tmpversion
+ @mv -f .tmpversion .version

include/linux/compile.h: $(CONFIGURATION) include/linux/version.h newversion
@echo -n \#define UTS_VERSION \"\#`cat .version` > .ver
diff -ur -X dontdiff linux-2.4.9-18.3/fs/Config.in linux-2.4.9-18.3-p3/fs/Config.in
--- linux-2.4.9-18.3/fs/Config.in Tue Dec 18 13:00:48 2001
+++ linux-2.4.9-18.3-p3/fs/Config.in Tue Dec 18 13:58:31 2001
@@ -137,6 +137,8 @@
define_bool CONFIG_SMB_FS n
fi

+bool 'More majors for unnamed block devices' CONFIG_MORE_UNNAMED_MAJORS
+
mainmenu_option next_comment
comment 'Partition Types'
source fs/partitions/Config.in
diff -ur -X dontdiff linux-2.4.9-18.3/fs/lockd/clntproc.c linux-2.4.9-18.3-p3/fs/lockd/clntproc.c
--- linux-2.4.9-18.3/fs/lockd/clntproc.c Tue Dec 18 13:01:03 2001
+++ linux-2.4.9-18.3-p3/fs/lockd/clntproc.c Mon Jan 7 13:28:29 2002
@@ -107,6 +107,7 @@
sigset_t oldset;
unsigned long flags;
int status, proto, vers;
+ int resport;

vers = (NFS_PROTO(inode)->version == 3) ? 4 : 1;
if (NFS_PROTO(inode)->version > 3) {
@@ -116,6 +117,7 @@

/* Retrieve transport protocol from NFS client */
proto = NFS_CLIENT(inode)->cl_xprt->prot;
+ resport = NFS_CLIENT(inode)->cl_xprt->resport;

if (!(host = nlmclnt_lookup_host(NFS_ADDR(inode), proto, vers)))
return -ENOLCK;
@@ -127,7 +129,7 @@

/* Bind an rpc client to this host handle (does not
* perform a portmapper lookup) */
- if (!(clnt = nlm_bind_host(host))) {
+ if (!(clnt = nlm_bind_host(host, resport))) {
status = -ENOLCK;
goto done;
}
@@ -162,6 +164,7 @@
locks_init_lock(&call->a_res.lock.fl);
}
call->a_host = host;
+ call->a_resport = resport;

/* Set up the argument struct */
nlmclnt_setlockargs(call, fl);
@@ -260,7 +263,7 @@
}

/* If we have no RPC client yet, create one. */
- if ((clnt = nlm_bind_host(host)) == NULL)
+ if ((clnt = nlm_bind_host(host, req->a_resport)) == NULL)
return -ENOLCK;

/* Perform the RPC call. If an error occurs, try again */
@@ -328,7 +331,7 @@
nlm_procname(proc), host->h_name);

/* If we have no RPC client yet, create one. */
- if ((clnt = nlm_bind_host(host)) == NULL)
+ if ((clnt = nlm_bind_host(host, req->a_resport)) == NULL)
return -ENOLCK;

/* bootstrap and kick off the async RPC call */
@@ -356,7 +359,7 @@
nlm_procname(proc), host->h_name);

/* If we have no RPC client yet, create one. */
- if ((clnt = nlm_bind_host(host)) == NULL)
+ if ((clnt = nlm_bind_host(host, req->a_resport)) == NULL)
return -ENOLCK;

/* bootstrap and kick off the async RPC call */
diff -ur -X dontdiff linux-2.4.9-18.3/fs/lockd/host.c linux-2.4.9-18.3-p3/fs/lockd/host.c
--- linux-2.4.9-18.3/fs/lockd/host.c Tue Dec 18 13:00:49 2001
+++ linux-2.4.9-18.3-p3/fs/lockd/host.c Mon Jan 7 12:34:26 2002
@@ -163,7 +163,7 @@
* Create the NLM RPC client for an NLM peer
*/
struct rpc_clnt *
-nlm_bind_host(struct nlm_host *host)
+nlm_bind_host(struct nlm_host *host, int resport)
{
struct rpc_clnt *clnt;
struct rpc_xprt *xprt;
@@ -187,15 +187,19 @@
host->h_nextrebind - jiffies);
}
} else {
- uid_t saved_fsuid = current->fsuid;
- kernel_cap_t saved_cap = current->cap_effective;
+ if (resport) {
+ uid_t saved_fsuid = current->fsuid;
+ kernel_cap_t saved_cap = current->cap_effective;

- /* Create RPC socket as root user so we get a priv port */
- current->fsuid = 0;
- cap_raise (current->cap_effective, CAP_NET_BIND_SERVICE);
- xprt = xprt_create_proto(host->h_proto, &host->h_addr, NULL);
- current->fsuid = saved_fsuid;
- current->cap_effective = saved_cap;
+ /* Create RPC socket as root user so we get a priv port */
+ current->fsuid = 0;
+ cap_raise (current->cap_effective, CAP_NET_BIND_SERVICE);
+ xprt = xprt_create_proto(host->h_proto, &host->h_addr, NULL, 1);
+ current->fsuid = saved_fsuid;
+ current->cap_effective = saved_cap;
+ } else {
+ xprt = xprt_create_proto(host->h_proto, &host->h_addr, NULL, 0);
+ }
if (xprt == NULL)
goto forgetit;

diff -ur -X dontdiff linux-2.4.9-18.3/fs/lockd/mon.c linux-2.4.9-18.3-p3/fs/lockd/mon.c
--- linux-2.4.9-18.3/fs/lockd/mon.c Tue Dec 18 13:00:49 2001
+++ linux-2.4.9-18.3-p3/fs/lockd/mon.c Sun Jan 6 01:08:03 2002
@@ -110,7 +110,7 @@
sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
sin.sin_port = 0;

- xprt = xprt_create_proto(IPPROTO_UDP, &sin, NULL);
+ xprt = xprt_create_proto(IPPROTO_UDP, &sin, NULL, 1);
if (!xprt)
goto out;

diff -ur -X dontdiff linux-2.4.9-18.3/fs/nfs/inode.c linux-2.4.9-18.3-p3/fs/nfs/inode.c
--- linux-2.4.9-18.3/fs/nfs/inode.c Tue Dec 18 13:01:21 2001
+++ linux-2.4.9-18.3-p3/fs/nfs/inode.c Mon Jan 7 13:31:06 2002
@@ -351,7 +351,7 @@

/* Now create transport and client */
xprt = xprt_create_proto(tcp? IPPROTO_TCP : IPPROTO_UDP,
- &srvaddr, &timeparms);
+ &srvaddr, &timeparms, (data->flags & NFS_MOUNT_NORES) == 0);
if (xprt == NULL)
goto out_no_xprt;

diff -ur -X dontdiff linux-2.4.9-18.3/fs/nfs/mount_clnt.c linux-2.4.9-18.3-p3/fs/nfs/mount_clnt.c
--- linux-2.4.9-18.3/fs/nfs/mount_clnt.c Tue Dec 18 13:00:49 2001
+++ linux-2.4.9-18.3-p3/fs/nfs/mount_clnt.c Sun Jan 6 01:07:12 2002
@@ -82,7 +82,7 @@
struct rpc_xprt *xprt;
struct rpc_clnt *clnt;

- if (!(xprt = xprt_create_proto(IPPROTO_UDP, srvaddr, NULL)))
+ if (!(xprt = xprt_create_proto(IPPROTO_UDP, srvaddr, NULL, 1)))
return NULL;

clnt = rpc_create_client(xprt, hostname,
diff -ur -X dontdiff linux-2.4.9-18.3/fs/super.c linux-2.4.9-18.3-p3/fs/super.c
--- linux-2.4.9-18.3/fs/super.c Tue Dec 18 13:00:59 2001
+++ linux-2.4.9-18.3-p3/fs/super.c Wed Dec 19 10:56:35 2001
@@ -516,27 +516,57 @@
* filesystems which don't use real block-devices. -- jrs
*/

-static unsigned long unnamed_dev_in_use[256/(8*sizeof(unsigned long))];
+static int unnamed_dev_majors[] = {
+ UNNAMED_MAJOR,
+#ifdef CONFIG_MORE_UNNAMED_MAJORS /* Always on, keeps Configure.help */
+ 12, 14, 38, 39,
+#endif
+};
+#define UNNAMED_NMAJ (sizeof(unnamed_dev_majors)/sizeof(int))
+
+static int unnamed_dev_nmaj = 1;
+static int unnamed_maj_in_use[UNNAMED_NMAJ] = { UNNAMED_MAJOR, };
+static unsigned long unnamed_dev_in_use[(UNNAMED_NMAJ*256)/(8*sizeof(long))];
+
+#ifdef CONFIG_MORE_UNNAMED_MAJORS
+void majorhog_init(void);
+#endif

kdev_t get_unnamed_dev(void)
{
int i;

- for (i = 1; i < 256; i++) {
+#ifdef CONFIG_MORE_UNNAMED_MAJORS
+ if (!test_and_set_bit(0, unnamed_maj_in_use)) { /* first call */
+ /* relatively SMP safe: only adds majors and does it once */
+ majorhog_init();
+ }
+#endif
+
+ /* find_first_zero_bit isn't atomic */
+ for (i = 1; i < unnamed_dev_nmaj*256; i++) {
if (!test_and_set_bit(i,unnamed_dev_in_use))
- return MKDEV(UNNAMED_MAJOR, i);
+ return MKDEV(unnamed_maj_in_use[i/256], i & 255);
}
+
return 0;
}

void put_unnamed_dev(kdev_t dev)
{
- if (!dev || MAJOR(dev) != UNNAMED_MAJOR)
- return;
- if (test_and_clear_bit(MINOR(dev), unnamed_dev_in_use))
+ int i;
+
+ if (!dev)
return;
- printk("VFS: put_unnamed_dev: freeing unused device %s\n",
- kdevname(dev));
+ for (i = 0; i < unnamed_dev_nmaj; i++) {
+ if (unnamed_maj_in_use[i] == MAJOR(dev)) {
+ if (test_and_clear_bit(i * 256 + MINOR(dev), unnamed_dev_in_use))
+ return;
+ printk("VFS: put_unnamed_dev: freeing unused device %s\n",
+ kdevname(dev));
+ return;
+ }
+ }
}

static int grab_super(struct super_block *sb)
@@ -1090,3 +1120,41 @@
return;
}
}
+
+#ifdef CONFIG_MORE_UNNAMED_MAJORS
+
+/* #include <linux/major.h> */
+/* #include <linux/errno.h> */
+/* #include <linux/fs.h> */
+#include <linux/devfs_fs_kernel.h>
+
+static int majorhog_open(struct inode *inode, struct file *file)
+{
+ return -EDOM; /* Something ridiculous for identification */
+}
+
+static struct block_device_operations majorhog_fops = {
+ open: majorhog_open,
+};
+
+void majorhog_init(void)
+{
+ int i, j;
+
+ if (unnamed_dev_nmaj != 1)
+ return;
+
+ j = 1;
+ for (i = 1; i < UNNAMED_NMAJ; i++) {
+ if (devfs_register_blkdev(unnamed_dev_majors[i],
+ "unnamed", &majorhog_fops) == 0) {
+ unnamed_maj_in_use[j++] = unnamed_dev_majors[i];
+ } else {
+ printk(KERN_WARNING "Unable to hog major number %d\n",
+ unnamed_dev_majors[i]);
+ }
+ }
+ unnamed_dev_nmaj = j;
+}
+
+#endif /* CONFIG_MORE_UNNAMED_MAJORS */
diff -ur -X dontdiff linux-2.4.9-18.3/include/linux/lockd/lockd.h linux-2.4.9-18.3-p3/include/linux/lockd/lockd.h
--- linux-2.4.9-18.3/include/linux/lockd/lockd.h Wed Aug 15 14:24:26 2001
+++ linux-2.4.9-18.3-p3/include/linux/lockd/lockd.h Mon Jan 7 12:42:58 2002
@@ -63,6 +63,7 @@
#define NLMCLNT_OHSIZE (sizeof(system_utsname.nodename)+10)
struct nlm_rqst {
unsigned int a_flags; /* initial RPC task flags */
+ int a_resport;
struct nlm_host * a_host; /* host handle */
struct nlm_args a_args; /* arguments */
struct nlm_res a_res; /* result */
@@ -144,7 +145,7 @@
struct nlm_host * nlmsvc_lookup_host(struct svc_rqst *);
struct nlm_host * nlm_lookup_host(struct svc_client *,
struct sockaddr_in *, int, int);
-struct rpc_clnt * nlm_bind_host(struct nlm_host *);
+struct rpc_clnt * nlm_bind_host(struct nlm_host *, int);
void nlm_rebind_host(struct nlm_host *);
struct nlm_host * nlm_get_host(struct nlm_host *);
void nlm_release_host(struct nlm_host *);
Only in linux-2.4.9-18.3-p3/include/linux: modules
diff -ur -X dontdiff linux-2.4.9-18.3/include/linux/nfs_mount.h linux-2.4.9-18.3-p3/include/linux/nfs_mount.h
--- linux-2.4.9-18.3/include/linux/nfs_mount.h Tue Dec 18 13:00:52 2001
+++ linux-2.4.9-18.3-p3/include/linux/nfs_mount.h Sun Jan 6 01:10:53 2002
@@ -54,6 +54,7 @@
#define NFS_MOUNT_KERBEROS 0x0100 /* 3 */
#define NFS_MOUNT_NONLM 0x0200 /* 3 */
#define NFS_MOUNT_BROKEN_SUID 0x0400 /* 4 */
+#define NFS_MOUNT_NORES 0x0800 /* ? XXX */
#define NFS_MOUNT_FLAGMASK 0xFFFF

#endif
diff -ur -X dontdiff linux-2.4.9-18.3/include/linux/sunrpc/xprt.h linux-2.4.9-18.3-p3/include/linux/sunrpc/xprt.h
--- linux-2.4.9-18.3/include/linux/sunrpc/xprt.h Wed Aug 15 14:24:26 2001
+++ linux-2.4.9-18.3-p3/include/linux/sunrpc/xprt.h Mon Jan 7 01:48:40 2002
@@ -143,7 +143,8 @@
nocong : 1, /* no congestion control */
stream : 1, /* TCP */
tcp_more : 1, /* more record fragments */
- connecting : 1; /* being reconnected */
+ connecting : 1, /* being reconnected */
+ resport : 1; /* use reserved port */

/*
* State of TCP reply receive stuff
@@ -171,7 +172,8 @@
#ifdef __KERNEL__

struct rpc_xprt * xprt_create_proto(int proto, struct sockaddr_in *addr,
- struct rpc_timeout *toparms);
+ struct rpc_timeout *toparms,
+ int use_res_port);
int xprt_destroy(struct rpc_xprt *);
void xprt_shutdown(struct rpc_xprt *);
void xprt_default_timeout(struct rpc_timeout *, int);
diff -ur -X dontdiff linux-2.4.9-18.3/net/sunrpc/pmap_clnt.c linux-2.4.9-18.3-p3/net/sunrpc/pmap_clnt.c
--- linux-2.4.9-18.3/net/sunrpc/pmap_clnt.c Wed Jun 21 12:43:37 2000
+++ linux-2.4.9-18.3-p3/net/sunrpc/pmap_clnt.c Tue Jan 22 10:36:56 2002
@@ -28,7 +28,7 @@
#define PMAP_UNSET 2
#define PMAP_GETPORT 3

-static struct rpc_clnt * pmap_create(char *, struct sockaddr_in *, int);
+static struct rpc_clnt *pmap_create(char *, struct sockaddr_in *, int, int);
static void pmap_getport_done(struct rpc_task *);
extern struct rpc_program pmap_program;
spinlock_t pmap_lock = SPIN_LOCK_UNLOCKED;
@@ -60,7 +60,7 @@
spin_unlock(&pmap_lock);

task->tk_status = -EACCES; /* why set this? returns -EIO below */
- if (!(pmap_clnt = pmap_create(clnt->cl_server, sap, map->pm_prot)))
+ if (!(pmap_clnt = pmap_create(clnt->cl_server, sap, map->pm_prot, 0)))
goto bailout;
task->tk_status = 0;

@@ -101,7 +101,7 @@
NIPQUAD(sin->sin_addr.s_addr), prog, vers, prot);

strcpy(hostname, in_ntoa(sin->sin_addr.s_addr));
- if (!(pmap_clnt = pmap_create(hostname, sin, prot)))
+ if (!(pmap_clnt = pmap_create(hostname, sin, prot, 0)))
return -EACCES;

/* Setup the call info struct */
@@ -158,7 +158,8 @@

sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
- if (!(pmap_clnt = pmap_create("localhost", &sin, IPPROTO_UDP))) {
+ /* Use a privileged port or else portmapper rejects our request. */
+ if (!(pmap_clnt = pmap_create("localhost", &sin, IPPROTO_UDP, 1))) {
dprintk("RPC: couldn't create pmap client\n");
return -EACCES;
}
@@ -183,13 +184,13 @@
}

static struct rpc_clnt *
-pmap_create(char *hostname, struct sockaddr_in *srvaddr, int proto)
+pmap_create(char *hostname, struct sockaddr_in *srvaddr, int proto, int resport)
{
struct rpc_xprt *xprt;
struct rpc_clnt *clnt;

/* printk("pmap: create xprt\n"); */
- if (!(xprt = xprt_create_proto(proto, srvaddr, NULL)))
+ if (!(xprt = xprt_create_proto(proto, srvaddr, NULL, resport)))
return NULL;
xprt->addr.sin_port = htons(RPC_PMAP_PORT);

diff -ur -X dontdiff linux-2.4.9-18.3/net/sunrpc/xprt.c linux-2.4.9-18.3-p3/net/sunrpc/xprt.c
--- linux-2.4.9-18.3/net/sunrpc/xprt.c Tue Dec 18 13:00:54 2001
+++ linux-2.4.9-18.3-p3/net/sunrpc/xprt.c Mon Jan 7 13:26:30 2002
@@ -97,7 +97,7 @@
static void xprt_reserve_status(struct rpc_task *task);
static void xprt_disconnect(struct rpc_xprt *);
static void xprt_reconn_status(struct rpc_task *task);
-static struct socket *xprt_create_socket(int, struct rpc_timeout *);
+static struct socket *xprt_create_socket(int, struct rpc_timeout *, int);
static int xprt_bind_socket(struct rpc_xprt *, struct socket *);
static void xprt_remove_pending(struct rpc_xprt *);

@@ -434,7 +434,8 @@
status = -ENOTCONN;
if (!inet) {
/* Create an unconnected socket */
- if (!(sock = xprt_create_socket(xprt->prot, &xprt->timeout)))
+ if (!(sock = xprt_create_socket(xprt->prot, &xprt->timeout,
+ xprt->resport)))
goto defer;
xprt_bind_socket(xprt, sock);
inet = sock->sk;
@@ -1459,7 +1460,7 @@
*/
static struct rpc_xprt *
xprt_setup(struct socket *sock, int proto,
- struct sockaddr_in *ap, struct rpc_timeout *to)
+ struct sockaddr_in *ap, struct rpc_timeout *to, int use_resport)
{
struct rpc_xprt *xprt;
struct rpc_rqst *req;
@@ -1504,6 +1505,8 @@

INIT_LIST_HEAD(&xprt->rx_pending);

+ xprt->resport = use_resport;
+
dprintk("RPC: created transport %p\n", xprt);

xprt_bind_socket(xprt, sock);
@@ -1574,7 +1577,7 @@
* Create a client socket given the protocol and peer address.
*/
static struct socket *
-xprt_create_socket(int proto, struct rpc_timeout *to)
+xprt_create_socket(int proto, struct rpc_timeout *to, int resport)
{
struct socket *sock;
int type, err;
@@ -1590,7 +1593,8 @@
}

/* If the caller has the capability, bind to a reserved port */
- if (capable(CAP_NET_BIND_SERVICE) && xprt_bindresvport(sock) < 0)
+ if (resport &&
+ capable(CAP_NET_BIND_SERVICE) && xprt_bindresvport(sock) < 0)
goto failed;

return sock;
@@ -1604,17 +1608,18 @@
* Create an RPC client transport given the protocol and peer address.
*/
struct rpc_xprt *
-xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to)
+xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to,
+ int use_resport)
{
struct socket *sock;
struct rpc_xprt *xprt;

dprintk("RPC: xprt_create_proto called\n");

- if (!(sock = xprt_create_socket(proto, to)))
+ if (!(sock = xprt_create_socket(proto, to, use_resport)))
return NULL;

- if (!(xprt = xprt_setup(sock, proto, sap, to)))
+ if (!(xprt = xprt_setup(sock, proto, sap, to, use_resport)))
sock_release(sock);

return xprt;

2002-01-24 08:59:21

by Rainer Krienke

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

On Tuesday, 22. January 2002 18:45, Pete Zaitcev wrote:
> > From: Rainer Krienke <[email protected]>
> > Date: Tue, 22 Jan 2002 16:23:54 +0100
> >
> > Thanks for the hint. I fixed pmap_create() according to your proposal and
> > now nfsd works again.
>
> Care to share the patch?

Sorry, here it is. It's exactly what I described in my initial posting: very
simple, it only raises the limit of unnamed_dev_in_use:

diff -Naur linux-2.4.17.orig/fs/super.c linux-2.4.17/fs/super.c
--- linux-2.4.17.orig/fs/super.c Fri Dec 21 18:42:03 2001
+++ linux-2.4.17/fs/super.c Thu Jan 24 08:23:05 2002
@@ -489,13 +489,13 @@
* filesystems which don't use real block-devices. -- jrs
*/

-static unsigned long unnamed_dev_in_use[256/(8*sizeof(unsigned long))];
+static unsigned long unnamed_dev_in_use[4096/(8*sizeof(unsigned long))];

kdev_t get_unnamed_dev(void)
{
int i;

- for (i = 1; i < 256; i++) {
+ for (i = 1; i < 4096; i++) {
if (!test_and_set_bit(i,unnamed_dev_in_use))
return MKDEV(UNNAMED_MAJOR, i);
}


If you apply this patch as well as Andi's patch, the system is then capable
of mounting up to 4096 NFS dirs. I tested this and it worked without
problems. The only thing I noticed is that the automounter needs quite a
long time (several seconds) to expire such a mass of mounts. But this is
only a performance issue.

...
> You did not send your patch (yet again), so there is no way
> to tell precisely what you have accomplished. I suspect that it may
> create pages with same device number that belong to different
> mounts. I do not pretend to understand how VFS and page cache
> use device numbers. If device numbers are used for any indexing,
> pages may be mixed up with resulting data corruption.
> I cannot say if this scenario is likely without looking
> at the VFS code. Perhaps we ought to ask Stephen, Al, or Trond
> about it.
>

Andi's patch together with my small modification still has some problems
that you, Pete, already solved. On the other hand it has the advantage that
you can mount a very high number of NFS directories, limited only by the
bitmap unnamed_dev_in_use. The problems are:

1. You cannot start nfsd on a host patched with the above patches, so you
   cannot mount an exported directory on another Linux box. Since you and
   Trond already described why this happened in Pete's first patch, it
   should be possible to fix this.
2. You cannot NFS-mount any filesystem exported by this host, since Andi's
   patch does not include a mount option to select either a secure or an
   insecure port. Since a "normal" Linux kernel NFS client always wants a
   secure port, mounting is impossible.
3. You cannot NFS-mount from another Linux box; again this happens since
   there is no mount option that would allow saying that only secure ports
   should be used.

So I think if Pete's mount option "nores" were added to Andi's patch, along
with a fix so that nfsd runs again, Andi's solution would have the big
advantage of being able to mount more than 1279 NFS directories.

This was the reason why I was interested to know whether there is a major
drawback if you simply do not use more major devices, as Pete's patch does.
If there is none, then I would propose to drop the use of new major devices.

Thanks Rainer
--
---------------------------------------------------------------------
Rainer Krienke [email protected]
Universitaet Koblenz, http://www.uni-koblenz.de/~krienke
Rechenzentrum, Voice: +49 261 287 - 1312
Rheinau 1, 56075 Koblenz, Germany Fax: +49 261 287 - 1001312
---------------------------------------------------------------------

2002-01-24 17:17:14

by Pete Zaitcev

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

> From: Rainer Krienke <[email protected]>
> Date: Thu, 24 Jan 2002 09:58:48 +0100

> diff -Naur linux-2.4.17.orig/fs/super.c linux-2.4.17/fs/super.c
> --- linux-2.4.17.orig/fs/super.c Fri Dec 21 18:42:03 2001
> +++ linux-2.4.17/fs/super.c Thu Jan 24 08:23:05 2002
> @@ -489,13 +489,13 @@
> * filesystems which don't use real block-devices. -- jrs
> */
>
> -static unsigned long unnamed_dev_in_use[256/(8*sizeof(unsigned long))];
> +static unsigned long unnamed_dev_in_use[4096/(8*sizeof(unsigned long))];
>
> kdev_t get_unnamed_dev(void)
> {
> int i;
>
> - for (i = 1; i < 256; i++) {
> + for (i = 1; i < 4096; i++) {
> if (!test_and_set_bit(i,unnamed_dev_in_use))
> return MKDEV(UNNAMED_MAJOR, i);
> }
>[...]
> mount a very high number only limited by the bitmap unnamed_dev_in_use of NFS
> directories. The problems are:

Rainer, you missed the point. Nobody cares about small things
such as "cannot start nfsd" while your 4096 mounts patch
simply CORRUPTS YOUR DATA TO HELL.

If you need more than 1200 mounts, you have to add more majors
to my patch. There is a number of them between 115 and 198.
I suspect scalability problems may become evident
with this approach, but it will work.

Trond asked if I requested numbers from HPA, and the answer is no,
because I wanted the patch in circulation for a while before it
is worth bothering. Also, I heard that LANA is closed anyways.

-- Pete

2002-01-24 17:30:35

by Richard Gooch

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

Pete Zaitcev writes:
> > From: Rainer Krienke <[email protected]>
> > Date: Thu, 24 Jan 2002 09:58:48 +0100
>
> > diff -Naur linux-2.4.17.orig/fs/super.c linux-2.4.17/fs/super.c
> > --- linux-2.4.17.orig/fs/super.c Fri Dec 21 18:42:03 2001
> > +++ linux-2.4.17/fs/super.c Thu Jan 24 08:23:05 2002
> > @@ -489,13 +489,13 @@
> > * filesystems which don't use real block-devices. -- jrs
> > */
> >
> > -static unsigned long unnamed_dev_in_use[256/(8*sizeof(unsigned long))];
> > +static unsigned long unnamed_dev_in_use[4096/(8*sizeof(unsigned long))];
> >
> > kdev_t get_unnamed_dev(void)
> > {
> > int i;
> >
> > - for (i = 1; i < 256; i++) {
> > + for (i = 1; i < 4096; i++) {
> > if (!test_and_set_bit(i,unnamed_dev_in_use))
> > return MKDEV(UNNAMED_MAJOR, i);
> > }
> >[...]
> > mount a very high number only limited by the bitmap unnamed_dev_in_use of NFS
> > directories. The problems are:
>
> Rainer, you missed the point. Nobody cares about small things
> such as "cannot start nfsd" while your 4096 mounts patch
> simply CORRUPTS YOUR DATA TO HELL.
>
> If you need more than 1200 mounts, you have to add more majors
> to my patch. There is a number of them between 115 and 198.
> I suspect scalability problems may become evident
> with this approach, but it will work.
>
> Trond asked if I requested numbers from HPA, and the answer is no,
> because I wanted the patch in circulation for a while before it
> is worth bothering. Also, I heard that LANA is closed anyways.

You can use devfs_alloc_major() to safely grab unassigned majors.
If necessary, I can move this to a generic area.

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]

2002-01-25 07:28:48

by Rainer Krienke

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

...
>
> Rainer, you missed the point. Nobody cares about small things
> such as "cannot start nfsd" while your 4096 mounts patch
> simply CORRUPTS YOUR DATA TO HELL.
>

Well, I never said I really knew what I was doing :-). That's exactly why I
asked why more major devices should be used. OK, the answer to this question
seems to be that minor devices may only be 8 bits wide due to the static
nature of some kernel structures. Right?

> If you need more than 1200 mounts, you have to add more majors
> to my patch. There is a number of them between 115 and 198.
> I suspect scalability problems may become evident
> with this approach, but it will work.

The solution Richard posted seems to be interesting at this point isn't it?

Rainer
--
---------------------------------------------------------------------
Rainer Krienke [email protected]
Universitaet Koblenz, http://www.uni-koblenz.de/~krienke
Rechenzentrum, Voice: +49 261 287 - 1312
Rheinau 1, 56075 Koblenz, Germany Fax: +49 261 287 - 1001312
---------------------------------------------------------------------

2002-01-25 17:41:28

by Pete Zaitcev

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

> From: Rainer Krienke <[email protected]>
> Date: Fri, 25 Jan 2002 08:28:13 +0100

> > Rainer, you missed the point. Nobody cares about small things
> > such as "cannot start nfsd" while your 4096 mounts patch
> > simply CORRUPTS YOUR DATA TO HELL.
>
> Well I never said, I really knew what I was doing:-). Thats exacly why I
> asked about why to use more major devices? OK the anser to this question
> seems to be that minor devices may only be 8 bit due to the static nature of
> some kernel structures. Right?

Close enough... Actual reason is the implementation of MINOR().

> > If you need more than 1200 mounts, you have to add more majors
> > to my patch. There is a number of them between 115 and 198.
> > I suspect scalability problems may become evident
> > with this approach, but it will work.
>
> The solution Richard posted seems to be interesting at this point isn't it?

I thought about rgooch's suggestion; it sounds good for 2.5.
Red Hat does not ship devfs enabled currently, and I cannot use his
allocation function if someone uses static majors, or some modules
may not load. My patch does include a safety element (majorhog_xxx)
that reserves majors properly. devfs would make that unnecessary.

-- Pete

2002-01-25 18:35:33

by Richard Gooch

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

[Note: nfs-list removed from Cc: because slow^Wsourceforge has a
broken mail configuration which always bounces my email]

Pete Zaitcev writes:
> > From: Rainer Krienke <[email protected]>
> > Date: Fri, 25 Jan 2002 08:28:13 +0100
>
> > > Rainer, you missed the point. Nobody cares about small things
> > > such as "cannot start nfsd" while your 4096 mounts patch
> > > simply CORRUPTS YOUR DATA TO HELL.
> >
> > Well I never said, I really knew what I was doing:-). Thats exacly why I
> > asked about why to use more major devices? OK the anser to this question
> > seems to be that minor devices may only be 8 bit due to the static nature of
> > some kernel structures. Right?
>
> Close enough... Actual reason is the implementation of MINOR().
>
> > > If you need more than 1200 mounts, you have to add more majors
> > > to my patch. There is a number of them between 115 and 198.
> > > I suspect scalability problems may become evident
> > > with this approach, but it will work.
> >
> > The solution Richard posted seems to be interesting at this point isn't it?
>
> I thought about the rgooch's suggestion, it sounds good for 2.5.
> Red Hat do not ship devfs enabled currently, and I cannot use his
> allocation function if someone uses static majors, or some modules
> may not load. The patch does include a safety element (majorhog_xxx)
> that reserves majors properly. The devfs would make that unnecessary.

The allocation function should be safe, since it only gives majors
which are not assigned in devices.txt. Drivers which statically grab
unassigned majors are broken, and *will* trip over each other at some
point.

As I said before, I can move the major allocation function into a
generic place and not have it depend on CONFIG_DEVFS_FS. So it doesn't
have to matter if RH ship devfs or not.

BTW: please Cc: me, otherwise I may not see responses.

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]

2002-01-25 18:43:35

by Pete Zaitcev

[permalink] [raw]
Subject: Re: 2.4.17:Increase number of anonymous filesystems beyond 256?

> Date: Fri, 25 Jan 2002 11:34:59 -0700
> From: Richard Gooch <[email protected]>

> The allocation function should be safe, since it only gives majors
> which are not assigned in devices.txt. [...]

Oh, that changes things; I should have looked closer.
I am not sure the "1200 NFS mounts" case warrants the
change, though; so far we have only one active user (Rainer) :)
If ISPs and universities clamour for my patch, then sure,
we may improve it with devfs_alloc_major() in 2.4, too.
Otherwise, whatever... Thanks for the explanation, Richard,
I'll keep it in my notes.

-- Pete