Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755753AbYKMVwY (ORCPT ); Thu, 13 Nov 2008 16:52:24 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753855AbYKMVwF (ORCPT ); Thu, 13 Nov 2008 16:52:05 -0500 Received: from qb-out-0506.google.com ([72.14.204.225]:36710 "EHLO qb-out-0506.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751630AbYKMVv7 (ORCPT ); Thu, 13 Nov 2008 16:51:59 -0500 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version :content-type:content-transfer-encoding:content-disposition :references:x-google-sender-auth; b=bqJeBKVcXD3zc+U5BYuzIjGiHWyeslUvPXpygcpXqSnt80Pod3TzstHAy++JDPh0/a TgjPBi3bEHAtwaqqXBIMkm3/SDheIM9zYelW/Nme4QKTeA5CUK6OEyNg7ZONowdfS4rh +PTkuin1DVBcxrynkdgb41UI2zXENuqvJbN48= Message-ID: <517f3f820811131351l1305b2d2u43ab4e0601d97f93@mail.gmail.com> Date: Thu, 13 Nov 2008 16:51:56 -0500 From: "Michael Kerrisk" To: "Andrew Morton" Subject: Re: [PATCH] reintroduce accept4 Cc: "Subrata Modak" , linux-arch@vger.kernel.org, "Ulrich Drepper" , linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, linux-api@vger.kernel.org, linux-man@vger.kernel.org, "Davide Libenzi" , netdev , "Roland McGrath" , "Oleg Nesterov" , "Christoph Hellwig" , "David Miller" , "Alan Cox" , "Jakub Jelinek" , "Michael Kerrisk" In-Reply-To: <200810261641.m9QGfotr024285@hs20-bc2-1.build.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <200810261641.m9QGfotr024285@hs20-bc2-1.build.redhat.com> X-Google-Sender-Auth: 319d49e1f673c325 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 17890 Lines: 543 Andrew, On 10/26/08, Ulrich Drepper wrote: > This patch reintroduces accept4, replacing paccept. It's easy to see that > the patch only removes code and then redirects existing code away from the > removed functions. Since the paccept code sans signal handling was never > in question I think there is no reason to quarantine the patch first. I see you accepted this patch into -mm. I've finally got to looking at and testing this, so: Tested-by: Michael Kerrisk Acked-by: Michael Kerrisk In my tests, everything looks fine. I'll forward my test program in a follow-up mail. I think Ulrich wanted to try to see this patch in for 2.6.28; it's past the merge window of course, so it's up to you, but I have no problem with that. The API is the one that Ulrich initially proposed, before taking a detour into paccept() (http://thread.gmane.org/gmane.linux.kernel/671443 ), which I argued against (http://thread.gmane.org/gmane.linux.kernel/723952, http://thread.gmane.org/gmane.linux.network/106071/), since I (and Roland) could see no reason for the added complexity of a signal set argument (like pselect()/ppoll()/epoll_pwait()). (In any case, if someone does come up with a compelling reason to add a sigset argument, then we can add it via the use of a new flag bit.) My only argument is with the name of the new sysytem call. > I've updated the test program which now looks as follows: (I assume that there had been no testing on x86-32, since, the __i386__ ifdef's notwithstanding, the program below can't work on x86-32 -- sys_socketcall() takes its arguments packaged into an array on x86-32, not as an inline list.) Andrew, you noted a lack of explanation accompanying the original patch. Here's something to fill the gap, and which may be suitable for the changelog. == Introduce a new accept4() system call. The addition of this system call matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(), inotify_init1(), epoll_create1(), pipe2()) which added new system calls that differed from analogous traditional system calls in adding a flags argument that can be used to access additional functionality. The accept4() system call is exactly the same as accept(), except that it adds a flags bit-mask argument. Two flags are initially implemented. (Most of the new system calls in 2.6.27 also had both of these flags.) SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled for the new file descriptor returned by accept4(). This is a useful security feature to avoid leaking information in a multithreaded program where one thread is doing an accept() at the same time as another thread is doing a fork() plus exec(). (More details here: http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling", Ulrich Drepper) The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag to be enabled on the new open file description created by accept4(). (This flag is merely a convenience, saving the use of additional calls fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result.) == For the curious, some further background on accept4() and the new system calls in 2.6.27 is at http://linux-man-pages.blogspot.com/2008/10/recent-changes-in-file-descriptor.html and http://lwn.net/Articles/281965/) Cheers, Michael > #include > #include > #include > #include > #include > #include > #include > > #ifdef __x86_64__ > #define __NR_accept4 288 > #define SOCK_CLOEXEC O_CLOEXEC > #elif __i386__ > #define SYS_ACCEPT4 18 > #define USE_SOCKETCALL 1 > #define SOCK_CLOEXEC O_CLOEXEC > #else > #error "define syscall numbers for this architecture" > #endif > > #define PORT 57392 > > static pthread_barrier_t b; > > static void * > tf (void *arg) > { > pthread_barrier_wait (&b); > int s = socket (AF_INET, SOCK_STREAM, 0); > struct sockaddr_in sin; > sin.sin_family = AF_INET; > sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); > sin.sin_port = htons (PORT); > connect (s, (const struct sockaddr *) &sin, sizeof (sin)); > close (s); > pthread_barrier_wait (&b); > > pthread_barrier_wait (&b); > s = socket (AF_INET, SOCK_STREAM, 0); > sin.sin_family = AF_INET; > sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); > sin.sin_port = htons (PORT + 1); > connect (s, (const struct sockaddr *) &sin, sizeof (sin)); > close (s); > return NULL; > } > > int > main (void) > { > alarm (5); > > int status = 0; > > pthread_barrier_init (&b, NULL, 2); > > pthread_t th; > if (pthread_create (&th, NULL, tf, NULL) != 0) > { > puts ("pthread_create failed"); > status = 1; > } > else > { > int s = socket (AF_INET, SOCK_STREAM, 0); > int fl = 1; > setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &fl, sizeof (fl)); > struct sockaddr_in sin; > sin.sin_family = AF_INET; > sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); > sin.sin_port = htons (PORT); > bind (s, (struct sockaddr *) &sin, sizeof (sin)); > listen (s, SOMAXCONN); > > pthread_barrier_wait (&b); > > int s2 = accept (s, NULL, 0); > if (s2 < 0) > { > puts ("accept failed"); > status = 1; > } > else > { > int fl = fcntl (s2, F_GETFD); > if ((fl & FD_CLOEXEC) != 0) > { > puts ("accept did set CLOEXEC"); > status = 1; > } > > close (s2); > } > > close (s); > > pthread_barrier_wait (&b); > > sin.sin_family = AF_INET; > sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); > sin.sin_port = htons (PORT + 1); > s = socket (AF_INET, SOCK_STREAM, 0); > bind (s, (struct sockaddr *) &sin, sizeof (sin)); > listen (s, SOMAXCONN); > > pthread_barrier_wait (&b); > > #if USE_SOCKETCALL > s2 = syscall (__NR_socketcall, SYS_ACCEPT4, s, NULL, 0, SOCK_CLOEXEC); > #else > s2 = syscall (__NR_accept4, s, NULL, 0, SOCK_CLOEXEC); > #endif > if (s2 < 0) > { > puts ("accept4 failed"); > status = 1; > } > else > { > int fl = fcntl (s2, F_GETFD); > if ((fl & FD_CLOEXEC) == 0) > { > puts ("accept4 did not set CLOEXEC"); > status = 1; > } > > close (s2); > } > > close (s); > } > > return status; > } > > > arch/x86/include/asm/unistd_64.h | 4 - > include/linux/net.h | 6 -- > include/linux/syscalls.h | 3 - > kernel/sys_ni.c | 2 > net/compat.c | 50 ++---------------------- > net/socket.c | 80 > ++++----------------------------------- > 6 files changed, 21 insertions(+), 124 deletions(-) > > > Signed-off-by: Ulrich Drepper > > diff --git a/arch/x86/include/asm/unistd_64.h > b/arch/x86/include/asm/unistd_64.h > index 834b2c1..d2e415e 100644 > --- a/arch/x86/include/asm/unistd_64.h > +++ b/arch/x86/include/asm/unistd_64.h > @@ -639,8 +639,8 @@ __SYSCALL(__NR_fallocate, sys_fallocate) > __SYSCALL(__NR_timerfd_settime, sys_timerfd_settime) > #define __NR_timerfd_gettime 287 > __SYSCALL(__NR_timerfd_gettime, sys_timerfd_gettime) > -#define __NR_paccept 288 > -__SYSCALL(__NR_paccept, sys_paccept) > +#define __NR_accept4 288 > +__SYSCALL(__NR_accept4, sys_accept4) > #define __NR_signalfd4 289 > __SYSCALL(__NR_signalfd4, sys_signalfd4) > #define __NR_eventfd2 290 > diff --git a/include/linux/net.h b/include/linux/net.h > index 6dc14a2..4515efa 100644 > --- a/include/linux/net.h > +++ b/include/linux/net.h > @@ -40,7 +40,7 @@ > #define SYS_GETSOCKOPT 15 /* sys_getsockopt(2) */ > #define SYS_SENDMSG 16 /* sys_sendmsg(2) */ > #define SYS_RECVMSG 17 /* sys_recvmsg(2) */ > -#define SYS_PACCEPT 18 /* sys_paccept(2) */ > +#define SYS_ACCEPT4 18 /* sys_accept4(2) */ > > typedef enum { > SS_FREE = 0, /* not allocated */ > @@ -100,7 +100,7 @@ enum sock_type { > * remaining bits are used as flags. */ > #define SOCK_TYPE_MASK 0xf > > -/* Flags for socket, socketpair, paccept */ > +/* Flags for socket, socketpair, accept4 */ > #define SOCK_CLOEXEC O_CLOEXEC > #ifndef SOCK_NONBLOCK > #define SOCK_NONBLOCK O_NONBLOCK > @@ -223,8 +223,6 @@ extern int sock_map_fd(struct socket *sock, int > flags); > extern struct socket *sockfd_lookup(int fd, int *err); > #define sockfd_put(sock) fput(sock->file) > extern int net_ratelimit(void); > -extern long do_accept(int fd, struct sockaddr __user *upeer_sockaddr, > - int __user *upeer_addrlen, int flags); > > #define net_random() random32() > #define net_srandom(seed) srandom32((__force u32)seed) > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h > index d6ff145..04fb47b 100644 > --- a/include/linux/syscalls.h > +++ b/include/linux/syscalls.h > @@ -410,8 +410,7 @@ asmlinkage long sys_getsockopt(int fd, int level, int > optname, > asmlinkage long sys_bind(int, struct sockaddr __user *, int); > asmlinkage long sys_connect(int, struct sockaddr __user *, int); > asmlinkage long sys_accept(int, struct sockaddr __user *, int __user *); > -asmlinkage long sys_paccept(int, struct sockaddr __user *, int __user *, > - const __user sigset_t *, size_t, int); > +asmlinkage long sys_accept4(int, struct sockaddr __user *, int __user *, > int); > asmlinkage long sys_getsockname(int, struct sockaddr __user *, int __user > *); > asmlinkage long sys_getpeername(int, struct sockaddr __user *, int __user > *); > asmlinkage long sys_send(int, void __user *, size_t, unsigned); > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c > index a77b27b..e14a232 100644 > --- a/kernel/sys_ni.c > +++ b/kernel/sys_ni.c > @@ -31,7 +31,7 @@ cond_syscall(sys_socketpair); > cond_syscall(sys_bind); > cond_syscall(sys_listen); > cond_syscall(sys_accept); > -cond_syscall(sys_paccept); > +cond_syscall(sys_accept4); > cond_syscall(sys_connect); > cond_syscall(sys_getsockname); > cond_syscall(sys_getpeername); > diff --git a/net/compat.c b/net/compat.c > index 67fb6a3..50ce0a9 100644 > --- a/net/compat.c > +++ b/net/compat.c > @@ -725,7 +725,7 @@ EXPORT_SYMBOL(compat_mc_getsockopt); > static unsigned char nas[19]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3), > AL(3),AL(3),AL(4),AL(4),AL(4),AL(6), > AL(6),AL(2),AL(5),AL(5),AL(3),AL(3), > - AL(6)}; > + AL(4)}; > #undef AL > > asmlinkage long compat_sys_sendmsg(int fd, struct compat_msghdr __user > *msg, unsigned flags) > @@ -738,52 +738,13 @@ asmlinkage long compat_sys_recvmsg(int fd, struct > compat_msghdr __user *msg, uns > return sys_recvmsg(fd, (struct msghdr __user *)msg, flags | > MSG_CMSG_COMPAT); > } > > -asmlinkage long compat_sys_paccept(int fd, struct sockaddr __user > *upeer_sockaddr, > - int __user *upeer_addrlen, > - const compat_sigset_t __user *sigmask, > - compat_size_t sigsetsize, int flags) > -{ > - compat_sigset_t ss32; > - sigset_t ksigmask, sigsaved; > - int ret; > - > - if (sigmask) { > - if (sigsetsize != sizeof(compat_sigset_t)) > - return -EINVAL; > - if (copy_from_user(&ss32, sigmask, sizeof(ss32))) > - return -EFAULT; > - sigset_from_compat(&ksigmask, &ss32); > - > - sigdelsetmask(&ksigmask, sigmask(SIGKILL)|sigmask(SIGSTOP)); > - sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved); > - } > - > - ret = do_accept(fd, upeer_sockaddr, upeer_addrlen, flags); > - > - if (ret == -ERESTARTNOHAND) { > - /* > - * Don't restore the signal mask yet. Let do_signal() deliver > - * the signal on the way back to userspace, before the signal > - * mask is restored. > - */ > - if (sigmask) { > - memcpy(¤t->saved_sigmask, &sigsaved, > - sizeof(sigsaved)); > - set_restore_sigmask(); > - } > - } else if (sigmask) > - sigprocmask(SIG_SETMASK, &sigsaved, NULL); > - > - return ret; > -} > - > asmlinkage long compat_sys_socketcall(int call, u32 __user *args) > { > int ret; > u32 a[6]; > u32 a0, a1; > > - if (call < SYS_SOCKET || call > SYS_PACCEPT) > + if (call < SYS_SOCKET || call > SYS_ACCEPT4) > return -EINVAL; > if (copy_from_user(a, args, nas[call])) > return -EFAULT; > @@ -804,7 +765,7 @@ asmlinkage long compat_sys_socketcall(int call, u32 > __user *args) > ret = sys_listen(a0, a1); > break; > case SYS_ACCEPT: > - ret = do_accept(a0, compat_ptr(a1), compat_ptr(a[2]), 0); > + ret = sys_accept4(a0, compat_ptr(a1), compat_ptr(a[2]), 0); > break; > case SYS_GETSOCKNAME: > ret = sys_getsockname(a0, compat_ptr(a1), compat_ptr(a[2])); > @@ -844,9 +805,8 @@ asmlinkage long compat_sys_socketcall(int call, u32 > __user *args) > case SYS_RECVMSG: > ret = compat_sys_recvmsg(a0, compat_ptr(a1), a[2]); > break; > - case SYS_PACCEPT: > - ret = compat_sys_paccept(a0, compat_ptr(a1), compat_ptr(a[2]), > - compat_ptr(a[3]), a[4], a[5]); > + case SYS_ACCEPT4: > + ret = sys_accept4(a0, compat_ptr(a1), compat_ptr(a[2]), a[3]); > break; > default: > ret = -EINVAL; > diff --git a/net/socket.c b/net/socket.c > index 2b7a4b5..8e72806 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -1427,8 +1427,8 @@ asmlinkage long sys_listen(int fd, int backlog) > * clean when we restucture accept also. > */ > > -long do_accept(int fd, struct sockaddr __user *upeer_sockaddr, > - int __user *upeer_addrlen, int flags) > +asmlinkage long sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr, > + int __user *upeer_addrlen, int flags) > { > struct socket *sock, *newsock; > struct file *newfile; > @@ -1511,66 +1511,10 @@ out_fd: > goto out_put; > } > > -#if 0 > -#ifdef HAVE_SET_RESTORE_SIGMASK > -asmlinkage long sys_paccept(int fd, struct sockaddr __user *upeer_sockaddr, > - int __user *upeer_addrlen, > - const sigset_t __user *sigmask, > - size_t sigsetsize, int flags) > -{ > - sigset_t ksigmask, sigsaved; > - int ret; > - > - if (sigmask) { > - /* XXX: Don't preclude handling different sized sigset_t's. */ > - if (sigsetsize != sizeof(sigset_t)) > - return -EINVAL; > - if (copy_from_user(&ksigmask, sigmask, sizeof(ksigmask))) > - return -EFAULT; > - > - sigdelsetmask(&ksigmask, sigmask(SIGKILL)|sigmask(SIGSTOP)); > - sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved); > - } > - > - ret = do_accept(fd, upeer_sockaddr, upeer_addrlen, flags); > - > - if (ret < 0 && signal_pending(current)) { > - /* > - * Don't restore the signal mask yet. Let do_signal() deliver > - * the signal on the way back to userspace, before the signal > - * mask is restored. > - */ > - if (sigmask) { > - memcpy(¤t->saved_sigmask, &sigsaved, > - sizeof(sigsaved)); > - set_restore_sigmask(); > - } > - } else if (sigmask) > - sigprocmask(SIG_SETMASK, &sigsaved, NULL); > - > - return ret; > -} > -#else > -asmlinkage long sys_paccept(int fd, struct sockaddr __user *upeer_sockaddr, > - int __user *upeer_addrlen, > - const sigset_t __user *sigmask, > - size_t sigsetsize, int flags) > -{ > - /* The platform does not support restoring the signal mask in the > - * return path. So we do not allow using paccept() with a signal > - * mask. */ > - if (sigmask) > - return -EINVAL; > - > - return do_accept(fd, upeer_sockaddr, upeer_addrlen, flags); > -} > -#endif > -#endif > - > asmlinkage long sys_accept(int fd, struct sockaddr __user *upeer_sockaddr, > int __user *upeer_addrlen) > { > - return do_accept(fd, upeer_sockaddr, upeer_addrlen, 0); > + return sys_accept4(fd, upeer_sockaddr, upeer_addrlen, 0); > } > > /* > @@ -2097,7 +2041,7 @@ static const unsigned char nargs[19]={ > AL(0),AL(3),AL(3),AL(3),AL(2),AL(3), > AL(3),AL(3),AL(4),AL(4),AL(4),AL(6), > AL(6),AL(2),AL(5),AL(5),AL(3),AL(3), > - AL(6) > + AL(4) > }; > > #undef AL > @@ -2116,7 +2060,7 @@ asmlinkage long sys_socketcall(int call, unsigned long > __user *args) > unsigned long a0, a1; > int err; > > - if (call < 1 || call > SYS_PACCEPT) > + if (call < 1 || call > SYS_ACCEPT4) > return -EINVAL; > > /* copy_from_user should be SMP safe. */ > @@ -2144,9 +2088,8 @@ asmlinkage long sys_socketcall(int call, unsigned long > __user *args) > err = sys_listen(a0, a1); > break; > case SYS_ACCEPT: > - err = > - do_accept(a0, (struct sockaddr __user *)a1, > - (int __user *)a[2], 0); > + err = sys_accept4(a0, (struct sockaddr __user *)a1, > + (int __user *)a[2], 0); > break; > case SYS_GETSOCKNAME: > err = > @@ -2193,12 +2136,9 @@ asmlinkage long sys_socketcall(int call, unsigned > long __user *args) > case SYS_RECVMSG: > err = sys_recvmsg(a0, (struct msghdr __user *)a1, a[2]); > break; > - case SYS_PACCEPT: > - err = > - sys_paccept(a0, (struct sockaddr __user *)a1, > - (int __user *)a[2], > - (const sigset_t __user *) a[3], > - a[4], a[5]); > + case SYS_ACCEPT4: > + err = sys_accept4(a0, (struct sockaddr __user *)a1, > + (int __user *)a[2], a[3]); > break; > default: > err = -EINVAL; > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Found a documentation bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/