LinuxLists.cc - [PATCH] fs: export kern_path_locked

2021-01-25 16:01:21

by Denis Kirjanov

[permalink] [raw]

Subject: [PATCH] fs: export kern_path_locked

the function is used outside and we have a prototype
defined in namei.h

Signed-off-by: Denis Kirjanov <[email protected]>
---
fs/namei.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/namei.c b/fs/namei.c
index 78443a85480a..3de3b3642302 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2450,6 +2450,7 @@ struct dentry *kern_path_locked(const char *name, struct path *path)
putname(filename);
return d;
}
+EXPORT_SYMBOL(kern_path_locked);

int kern_path(const char *name, unsigned int flags, struct path *path)
{
--
2.16.4

2021-01-28 00:10:06

by Christoph Hellwig

[permalink] [raw]

Subject: Re: [PATCH] fs: export kern_path_locked

On Mon, Jan 25, 2021 at 06:49:37PM +0300, Denis Kirjanov wrote:
> the function is used outside and we have a prototype
> defined in namei.h

Huh? It is only used in drivers/base/devtmpfs.c, and
kernel/audit_*.c. Absolutely no need to export it for that.

2021-02-14 18:21:08

[permalink] [raw]

Subject: Re: [PATCH] fs: export kern_path_locked

On Fri, Jan 29, 2021 at 01:18:55PM +0000, Christoph Hellwig wrote:
> On Fri, Jan 29, 2021 at 04:11:05PM +0300, Denis Kirjanov wrote:
> > Do you mean just:
>
> We'll still need to lock the parent inode.

Not just "lock", we wouldd need to have the lock _held_ across the
entire sequence. Without that there's no warranty that it will refer
to the same object we'd created.

In any case, unlink in any potentially public area is pretty much
never the right approach. Once mknod has happened, that's it - too
late to bail out.

IIRC, most of the PITA in that area is due to unix_autobind()
iteractions. Basically, we try to bind() an unbound socket and
another thread does sendmsg() on the same while we are in the
middle of ->mknod(). Who should wait for whom?

->mknod() really should be a point of no return - any games with
"so we unlink it" are unreliable in the best case, and that's
only if we do _not_ unlock the parent through the entire sequence.

Seeing that we have separate bindlock and iolock now... How about
this (completely untested) delta?

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 41c3303c3357..c21038b15836 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1034,6 +1034,14 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
goto out;
addr_len = err;

+ err = mutex_lock_interruptible(&u->bindlock);
+ if (err)
+ goto out;
+
+ err = -EINVAL;
+ if (u->addr)
+ goto out_up;
+
if (sun_path[0]) {
umode_t mode = S_IFSOCK |
(SOCK_INODE(sock)->i_mode & ~current_umask());
@@ -1041,18 +1049,10 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (err) {
if (err == -EEXIST)
err = -EADDRINUSE;
- goto out;
+ goto out_up;
}
}

- err = mutex_lock_interruptible(&u->bindlock);
- if (err)
- goto out_put;
-
- err = -EINVAL;
- if (u->addr)
- goto out_up;
-
err = -ENOMEM;
addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL);
if (!addr)
@@ -1090,7 +1090,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
spin_unlock(&unix_table_lock);
out_up:
mutex_unlock(&u->bindlock);
-out_put:
if (err)
path_put(&path);
out:

2021-02-16 14:34:28

by Denis Kirjanov

[permalink] [raw]

Subject: Re: [PATCH] fs: export kern_path_locked

On 2/14/21, Al Viro <[email protected]> wrote:
> On Fri, Jan 29, 2021 at 01:18:55PM +0000, Christoph Hellwig wrote:
>> On Fri, Jan 29, 2021 at 04:11:05PM +0300, Denis Kirjanov wrote:
>> > Do you mean just:
>>
>> We'll still need to lock the parent inode.
>
> Not just "lock", we wouldd need to have the lock _held_ across the
> entire sequence. Without that there's no warranty that it will refer
> to the same object we'd created.
>
> In any case, unlink in any potentially public area is pretty much
> never the right approach. Once mknod has happened, that's it - too
> late to bail out.
>
> IIRC, most of the PITA in that area is due to unix_autobind()
> iteractions. Basically, we try to bind() an unbound socket and
> another thread does sendmsg() on the same while we are in the
> middle of ->mknod(). Who should wait for whom?
>
> ->mknod() really should be a point of no return - any games with
> "so we unlink it" are unreliable in the best case, and that's
> only if we do _not_ unlock the parent through the entire sequence.
>
> Seeing that we have separate bindlock and iolock now... How about
> this (completely untested) delta?

We had a change like that:
Author: WANG Cong <[email protected]>
Date: Mon Jan 23 11:17:35 2017 -0800

af_unix: move unix_mknod() out of bindlock

Dmitry reported a deadlock scenario:

unix_bind() path:
u->bindlock ==> sb_writer

do_splice() path:
sb_writer ==> pipe->mutex ==> u->bindlock

In the unix_bind() code path, unix_mknod() does not have to
be done with u->bindlock held, since it is a pure fs operation,
so we can just move unix_mknod() out.

>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 41c3303c3357..c21038b15836 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1034,6 +1034,14 @@ static int unix_bind(struct socket *sock, struct
> sockaddr *uaddr, int addr_len)
> goto out;
> addr_len = err;
>
> + err = mutex_lock_interruptible(&u->bindlock);
> + if (err)
> + goto out;
> +
> + err = -EINVAL;
> + if (u->addr)
> + goto out_up;
> +
> if (sun_path[0]) {
> umode_t mode = S_IFSOCK |
> (SOCK_INODE(sock)->i_mode & ~current_umask());
> @@ -1041,18 +1049,10 @@ static int unix_bind(struct socket *sock, struct
> sockaddr *uaddr, int addr_len)
> if (err) {
> if (err == -EEXIST)
> err = -EADDRINUSE;
> - goto out;
> + goto out_up;
> }
> }
>
> - err = mutex_lock_interruptible(&u->bindlock);
> - if (err)
> - goto out_put;
> -
> - err = -EINVAL;
> - if (u->addr)
> - goto out_up;
> -
> err = -ENOMEM;
> addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL);
> if (!addr)
> @@ -1090,7 +1090,6 @@ static int unix_bind(struct socket *sock, struct
> sockaddr *uaddr, int addr_len)
> spin_unlock(&unix_table_lock);
> out_up:
> mutex_unlock(&u->bindlock);
> -out_put:
> if (err)
> path_put(&path);
> out:
>

2021-02-16 18:02:42

[permalink] [raw]

Subject: Re: [PATCH] fs: export kern_path_locked

On Tue, Feb 16, 2021 at 05:31:33PM +0300, Denis Kirjanov wrote:

> We had a change like that:
> Author: WANG Cong <[email protected]>
> Date: Mon Jan 23 11:17:35 2017 -0800
>
> af_unix: move unix_mknod() out of bindlock
>
> Dmitry reported a deadlock scenario:
>
> unix_bind() path:
> u->bindlock ==> sb_writer
>
> do_splice() path:
> sb_writer ==> pipe->mutex ==> u->bindlock
>
> In the unix_bind() code path, unix_mknod() does not have to
> be done with u->bindlock held, since it is a pure fs operation,
> so we can just move unix_mknod() out.

*cringe*

I remember now... Process set:

P1: bind() of AF_UNIX socket to /mnt/sock
P2: splice() from pipe to /mnt/foo
P3: freeze /mnt
P4: splice() from pipe to AF_UNIX socket

P1 grabs ->bindlock
P2 sb_start_write() for what's on /mnt
P2 grabs rwsem shared
P3 blocks in sb_wait_write() trying to grab the same rwsem exclusive
P1 sb_start_write() blocks trying to grab the same rwsem shared
P4 calls ->splice_write(), aka generic_splice_sendpage()
P4 grabs pipe->mutex
P4 calls ->sendpage(), aka sock_no_sendpage()
P4 calls ->sendmsg(), aka unix_dgram_sendmsg()
P4 calls unix_autobind()
P4 blocks trying to grab ->bindlock
P2 ->splice_write(), aka iter_file_splice_write()
P2 blocks trying to grab pipe->mutex
DEADLOCK

Sigh... OK, so we want something like
user_path_create()
vfs_mknod()
created = true
grab bindlock
....
drop bindlock
if failed && created
vfs_unlink()
done_path_create()
in unix_bind()... That would push ->bindlock all way down in the hierarchy,
so that should be deadlock-free, but it looks like that'll be fucking ugly ;-/

Let me try and play with that a bit, maybe it can be massaged to something
relatively sane...

2021-02-19 04:13:20

[permalink] [raw]

Subject: Re: [PATCH] fs: export kern_path_locked

On Tue, Feb 16, 2021 at 06:00:34PM +0000, Al Viro wrote:

> Sigh... OK, so we want something like
> user_path_create()
> vfs_mknod()
> created = true
> grab bindlock
> ....
> drop bindlock
> if failed && created
> vfs_unlink()
> done_path_create()
> in unix_bind()... That would push ->bindlock all way down in the hierarchy,
> so that should be deadlock-free, but it looks like that'll be fucking ugly ;-/
>
> Let me try and play with that a bit, maybe it can be massaged to something
> relatively sane...

OK... Completely untested series follows. Preliminary massage in first
6 patches, then actual "add cleanup on failure", then minor followup
cleanup.
af_unix: take address assignment/hash insertion into a new helper
unix_bind(): allocate addr earlier
unix_bind(): separate BSD and abstract cases
unix_bind(): take BSD and abstract address cases into new helpers
fold unix_mknod() into unix_bind_bsd()
unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock
unix_bind_bsd(): unlink if we fail after successful mknod
__unix_find_socket_byname(): don't pass hash and type separately

Branch is in git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #misc.af_unix,
individual patches in followups

2021-02-19 04:23:06

[permalink] [raw]

Subject: [PATCH 2/8] unix_bind(): allocate addr earlier

From 24439dbb7b78cb301c73254b364020e6a3f31902 Mon Sep 17 00:00:00 2001
From: Al Viro <[email protected]>
Date: Thu, 18 Feb 2021 15:52:53 -0500
Subject: [PATCH 2/8] unix_bind(): allocate addr earlier

makes it easier to massage; we do pay for that by extra work
(kmalloc+memcpy+kfree) in some error cases, but those are not
on the hot paths anyway.

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 179b4fe837e6..451d81f405c0 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1039,6 +1039,15 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (err < 0)
goto out;
addr_len = err;
+ err = -ENOMEM;
+ addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL);
+ if (!addr)
+ goto out;
+
+ memcpy(addr->name, sunaddr, addr_len);
+ addr->len = addr_len;
+ addr->hash = hash ^ sk->sk_type;
+ refcount_set(&addr->refcnt, 1);

if (sun_path[0]) {
umode_t mode = S_IFSOCK |
@@ -1047,7 +1056,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (err) {
if (err == -EEXIST)
err = -EADDRINUSE;
- goto out;
+ goto out_addr;
}
}

@@ -1059,16 +1068,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (u->addr)
goto out_up;

- err = -ENOMEM;
- addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL);
- if (!addr)
- goto out_up;
-
- memcpy(addr->name, sunaddr, addr_len);
- addr->len = addr_len;
- addr->hash = hash ^ sk->sk_type;
- refcount_set(&addr->refcnt, 1);
-
if (sun_path[0]) {
addr->hash = UNIX_HASH_SIZE;
hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
@@ -1080,19 +1079,22 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (__unix_find_socket_byname(net, sunaddr, addr_len,
sk->sk_type, hash)) {
spin_unlock(&unix_table_lock);
- unix_release_addr(addr);
goto out_up;
}
hash = addr->hash;
}

__unix_set_addr(sk, addr, hash);
+ addr = NULL;
err = 0;
out_up:
mutex_unlock(&u->bindlock);
out_put:
if (err)
path_put(&path);
+out_addr:
+ if (addr)
+ unix_release_addr(addr);
out:
return err;
}
--
2.11.0

2021-02-19 04:23:11

[permalink] [raw]

Subject: [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper

Duplicated logics in all bind variants (autobind, bind-to-path,
bind-to-abstract) gets taken into a common helper.

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 41c3303c3357..179b4fe837e6 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -262,6 +262,16 @@ static void __unix_insert_socket(struct hlist_head *list, struct sock *sk)
sk_add_node(sk, list);
}

+static void __unix_set_addr(struct sock *sk, struct unix_address *addr,
+ unsigned hash)
+ __releases(&unix_table_lock)
+{
+ __unix_remove_socket(sk);
+ smp_store_release(&unix_sk(sk)->addr, addr);
+ __unix_insert_socket(&unix_socket_table[hash], sk);
+ spin_unlock(&unix_table_lock);
+}
+
static inline void unix_remove_socket(struct sock *sk)
{
spin_lock(&unix_table_lock);
@@ -912,10 +922,7 @@ static int unix_autobind(struct socket *sock)
}
addr->hash ^= sk->sk_type;

- __unix_remove_socket(sk);
- smp_store_release(&u->addr, addr);
- __unix_insert_socket(&unix_socket_table[addr->hash], sk);
- spin_unlock(&unix_table_lock);
+ __unix_set_addr(sk, addr, addr->hash);
err = 0;

out: mutex_unlock(&u->bindlock);
@@ -1016,7 +1023,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
int err;
unsigned int hash;
struct unix_address *addr;
- struct hlist_head *list;
struct path path = { };

err = -EINVAL;
@@ -1068,26 +1074,20 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
spin_lock(&unix_table_lock);
u->path = path;
- list = &unix_socket_table[hash];
} else {
spin_lock(&unix_table_lock);
err = -EADDRINUSE;
if (__unix_find_socket_byname(net, sunaddr, addr_len,
sk->sk_type, hash)) {
+ spin_unlock(&unix_table_lock);
unix_release_addr(addr);
- goto out_unlock;
+ goto out_up;
}
-
- list = &unix_socket_table[addr->hash];
+ hash = addr->hash;
}

+ __unix_set_addr(sk, addr, hash);
err = 0;
- __unix_remove_socket(sk);
- smp_store_release(&u->addr, addr);
- __unix_insert_socket(list, sk);
-
-out_unlock:
- spin_unlock(&unix_table_lock);
out_up:
mutex_unlock(&u->bindlock);
out_put:
--
2.11.0

2021-02-19 04:23:43

[permalink] [raw]

Subject: [PATCH 3/8] unix_bind(): separate BSD and abstract cases

We do get some duplication that way, but it's minor compared to
parts that are different. What we get is an ability to change
locking in BSD case without making failure exits very hard to
follow.

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 52 ++++++++++++++++++++++++++++++++--------------------
1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 451d81f405c0..11e18b0efbc6 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1023,7 +1023,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
int err;
unsigned int hash;
struct unix_address *addr;
- struct path path = { };

err = -EINVAL;
if (addr_len < offsetofend(struct sockaddr_un, sun_family) ||
@@ -1050,6 +1049,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
refcount_set(&addr->refcnt, 1);

if (sun_path[0]) {
+ struct path path = { };
umode_t mode = S_IFSOCK |
(SOCK_INODE(sock)->i_mode & ~current_umask());
err = unix_mknod(sun_path, mode, &path);
@@ -1058,40 +1058,52 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
err = -EADDRINUSE;
goto out_addr;
}
- }

- err = mutex_lock_interruptible(&u->bindlock);
- if (err)
- goto out_put;
+ err = mutex_lock_interruptible(&u->bindlock);
+ if (err) {
+ path_put(&path);
+ goto out_addr;
+ }

- err = -EINVAL;
- if (u->addr)
- goto out_up;
+ err = -EINVAL;
+ if (u->addr) {
+ mutex_unlock(&u->bindlock);
+ path_put(&path);
+ goto out_addr;
+ }

- if (sun_path[0]) {
addr->hash = UNIX_HASH_SIZE;
hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
spin_lock(&unix_table_lock);
u->path = path;
+ __unix_set_addr(sk, addr, hash);
+ mutex_unlock(&u->bindlock);
+ addr = NULL;
+ err = 0;
} else {
+ err = mutex_lock_interruptible(&u->bindlock);
+ if (err)
+ goto out_addr;
+
+ err = -EINVAL;
+ if (u->addr) {
+ mutex_unlock(&u->bindlock);
+ goto out_addr;
+ }
+
spin_lock(&unix_table_lock);
err = -EADDRINUSE;
if (__unix_find_socket_byname(net, sunaddr, addr_len,
sk->sk_type, hash)) {
spin_unlock(&unix_table_lock);
- goto out_up;
+ mutex_unlock(&u->bindlock);
+ goto out_addr;
}
- hash = addr->hash;
+ __unix_set_addr(sk, addr, addr->hash);
+ mutex_unlock(&u->bindlock);
+ addr = NULL;
+ err = 0;
}
-
- __unix_set_addr(sk, addr, hash);
- addr = NULL;
- err = 0;
-out_up:
- mutex_unlock(&u->bindlock);
-out_put:
- if (err)
- path_put(&path);
out_addr:
if (addr)
unix_release_addr(addr);
--
2.11.0

2021-02-19 04:25:13

[permalink] [raw]

Subject: [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers

unix_bind_bsd() and unix_bind_abstract() respectively.

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 144 +++++++++++++++++++++++++++--------------------------
1 file changed, 74 insertions(+), 70 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 11e18b0efbc6..d7aeb4827747 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1013,101 +1013,105 @@ static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
return err;
}

+static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
+{
+ struct unix_sock *u = unix_sk(sk);
+ struct path path = { };
+ umode_t mode = S_IFSOCK |
+ (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
+ unsigned int hash;
+ int err;
+
+ err = unix_mknod(addr->name->sun_path, mode, &path);
+ if (err) {
+ if (err == -EEXIST)
+ err = -EADDRINUSE;
+ return err;
+ }
+
+ err = mutex_lock_interruptible(&u->bindlock);
+ if (err) {
+ path_put(&path);
+ return err;
+ }
+
+ if (u->addr) {
+ mutex_unlock(&u->bindlock);
+ path_put(&path);
+ return -EINVAL;
+ }
+
+ addr->hash = UNIX_HASH_SIZE;
+ hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
+ spin_lock(&unix_table_lock);
+ u->path = path;
+ __unix_set_addr(sk, addr, hash);
+ mutex_unlock(&u->bindlock);
+ return 0;
+}
+
+static int unix_bind_abstract(struct sock *sk, unsigned hash,
+ struct unix_address *addr)
+{
+ struct unix_sock *u = unix_sk(sk);
+ int err;
+
+ err = mutex_lock_interruptible(&u->bindlock);
+ if (err)
+ return err;
+
+ if (u->addr) {
+ mutex_unlock(&u->bindlock);
+ return -EINVAL;
+ }
+
+ spin_lock(&unix_table_lock);
+ if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len,
+ sk->sk_type, hash)) {
+ spin_unlock(&unix_table_lock);
+ mutex_unlock(&u->bindlock);
+ return -EADDRINUSE;
+ }
+ __unix_set_addr(sk, addr, addr->hash);
+ mutex_unlock(&u->bindlock);
+ return 0;
+}
+
static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
{
struct sock *sk = sock->sk;
- struct net *net = sock_net(sk);
- struct unix_sock *u = unix_sk(sk);
struct sockaddr_un *sunaddr = (struct sockaddr_un *)uaddr;
char *sun_path = sunaddr->sun_path;
int err;
unsigned int hash;
struct unix_address *addr;

- err = -EINVAL;
if (addr_len < offsetofend(struct sockaddr_un, sun_family) ||
sunaddr->sun_family != AF_UNIX)
- goto out;
+ return -EINVAL;

- if (addr_len == sizeof(short)) {
- err = unix_autobind(sock);
- goto out;
- }
+ if (addr_len == sizeof(short))
+ return unix_autobind(sock);

err = unix_mkname(sunaddr, addr_len, &hash);
if (err < 0)
- goto out;
+ return err;
addr_len = err;
- err = -ENOMEM;
addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL);
if (!addr)
- goto out;
+ return -ENOMEM;

memcpy(addr->name, sunaddr, addr_len);
addr->len = addr_len;
addr->hash = hash ^ sk->sk_type;
refcount_set(&addr->refcnt, 1);

- if (sun_path[0]) {
- struct path path = { };
- umode_t mode = S_IFSOCK |
- (SOCK_INODE(sock)->i_mode & ~current_umask());
- err = unix_mknod(sun_path, mode, &path);
- if (err) {
- if (err == -EEXIST)
- err = -EADDRINUSE;
- goto out_addr;
- }
-
- err = mutex_lock_interruptible(&u->bindlock);
- if (err) {
- path_put(&path);
- goto out_addr;
- }
-
- err = -EINVAL;
- if (u->addr) {
- mutex_unlock(&u->bindlock);
- path_put(&path);
- goto out_addr;
- }
-
- addr->hash = UNIX_HASH_SIZE;
- hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
- spin_lock(&unix_table_lock);
- u->path = path;
- __unix_set_addr(sk, addr, hash);
- mutex_unlock(&u->bindlock);
- addr = NULL;
- err = 0;
- } else {
- err = mutex_lock_interruptible(&u->bindlock);
- if (err)
- goto out_addr;
-
- err = -EINVAL;
- if (u->addr) {
- mutex_unlock(&u->bindlock);
- goto out_addr;
- }
-
- spin_lock(&unix_table_lock);
- err = -EADDRINUSE;
- if (__unix_find_socket_byname(net, sunaddr, addr_len,
- sk->sk_type, hash)) {
- spin_unlock(&unix_table_lock);
- mutex_unlock(&u->bindlock);
- goto out_addr;
- }
- __unix_set_addr(sk, addr, addr->hash);
- mutex_unlock(&u->bindlock);
- addr = NULL;
- err = 0;
- }
-out_addr:
- if (addr)
+ if (sun_path[0])
+ err = unix_bind_bsd(sk, addr);
+ else
+ err = unix_bind_abstract(sk, hash, addr);
+ if (err)
unix_release_addr(addr);
-out:
return err;
}

--
2.11.0

2021-02-19 04:25:27

[permalink] [raw]

Subject: [PATCH 6/8] unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock

Final preparations for doing unlink on failure past the successful
mknod. We can't hold ->bindlock over ->mknod() or ->unlink(), since
either might do sb_start_write() (e.g. on overlayfs). However, we
can do it while holding filesystem and VFS locks - doing
kern_path_create()
vfs_mknod()
grab ->bindlock
if u->addr had been set
drop ->bindlock
done_path_create
return -EINVAL
else
assign the address to socket
drop ->bindlock
done_path_create
return 0
would be deadlock-free. Here we massage unix_bind_bsd() to that
form. We are still doing equivalent transformations.

Next commit will *not* be an equivalent transformation - it will
add a call of vfs_unlink() before done_path_create() in "alread bound"
case.

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 24 ++++++++++--------------
1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 36b88c8c438b..d55035a9695f 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -989,7 +989,7 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
struct unix_sock *u = unix_sk(sk);
umode_t mode = S_IFSOCK |
(SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
- struct path parent, path;
+ struct path parent;
struct dentry *dentry;
unsigned int hash;
int err;
@@ -1006,38 +1006,34 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
* All right, let's create it.
*/
err = security_path_mknod(&parent, dentry, mode, 0);
- if (!err) {
+ if (!err)
err = vfs_mknod(d_inode(parent.dentry), dentry, mode, 0);
- if (!err) {
- path.mnt = mntget(parent.mnt);
- path.dentry = dget(dentry);
- }
- }
- done_path_create(&parent, dentry);
+
if (err) {
if (err == -EEXIST)
err = -EADDRINUSE;
+ done_path_create(&parent, dentry);
return err;
}
-
err = mutex_lock_interruptible(&u->bindlock);
if (err) {
- path_put(&path);
+ done_path_create(&parent, dentry);
return err;
}
-
if (u->addr) {
mutex_unlock(&u->bindlock);
- path_put(&path);
+ done_path_create(&parent, dentry);
return -EINVAL;
}

addr->hash = UNIX_HASH_SIZE;
- hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
+ hash = d_backing_inode(dentry)->i_ino & (UNIX_HASH_SIZE - 1);
spin_lock(&unix_table_lock);
- u->path = path;
+ u->path.mnt = mntget(parent.mnt);
+ u->path.dentry = dget(dentry);
__unix_set_addr(sk, addr, hash);
mutex_unlock(&u->bindlock);
+ done_path_create(&parent, dentry);
return 0;
}

--
2.11.0

2021-02-19 04:25:56

[permalink] [raw]

Subject: [PATCH 7/8] unix_bind_bsd(): unlink if we fail after successful mknod

We can do that more or less safely, since the parent is
held locked all along. Yes, somebody might observe the
object via dcache, only to have it disappear afterwards,
but there's really no good way to prevent that. It won't
race with other bind(2) or attempts to move the sucker
elsewhere, or put something else in its place - locked
parent prevents that.

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 33 ++++++++++++++++-----------------
1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index d55035a9695f..bb4c6200953d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1001,30 +1001,19 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
dentry = kern_path_create(AT_FDCWD, addr->name->sun_path, &parent, 0);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
-
/*
* All right, let's create it.
*/
err = security_path_mknod(&parent, dentry, mode, 0);
if (!err)
err = vfs_mknod(d_inode(parent.dentry), dentry, mode, 0);
-
- if (err) {
- if (err == -EEXIST)
- err = -EADDRINUSE;
- done_path_create(&parent, dentry);
- return err;
- }
+ if (err)
+ goto out;
err = mutex_lock_interruptible(&u->bindlock);
- if (err) {
- done_path_create(&parent, dentry);
- return err;
- }
- if (u->addr) {
- mutex_unlock(&u->bindlock);
- done_path_create(&parent, dentry);
- return -EINVAL;
- }
+ if (err)
+ goto out_unlink;
+ if (u->addr)
+ goto out_unlock;

addr->hash = UNIX_HASH_SIZE;
hash = d_backing_inode(dentry)->i_ino & (UNIX_HASH_SIZE - 1);
@@ -1035,6 +1024,16 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
mutex_unlock(&u->bindlock);
done_path_create(&parent, dentry);
return 0;
+
+out_unlock:
+ mutex_unlock(&u->bindlock);
+ err = -EINVAL;
+out_unlink:
+ /* failed after successful mknod? unlink what we'd created... */
+ vfs_unlink(d_inode(parent.dentry), dentry, NULL);
+out:
+ done_path_create(&parent, dentry);
+ return err == -EEXIST ? -EADDRINUSE : err;
}

static int unix_bind_abstract(struct sock *sk, unsigned hash,
--
2.11.0

2021-02-19 04:26:07

[permalink] [raw]

Subject: [PATCH 5/8] fold unix_mknod() into unix_bind_bsd()

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 39 +++++++++++++++------------------------
1 file changed, 15 insertions(+), 24 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index d7aeb4827747..36b88c8c438b 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -984,45 +984,36 @@ static struct sock *unix_find_other(struct net *net,
return NULL;
}

-static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
+static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
{
+ struct unix_sock *u = unix_sk(sk);
+ umode_t mode = S_IFSOCK |
+ (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
+ struct path parent, path;
struct dentry *dentry;
- struct path path;
- int err = 0;
+ unsigned int hash;
+ int err;
+
/*
* Get the parent directory, calculate the hash for last
* component.
*/
- dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
- err = PTR_ERR(dentry);
+ dentry = kern_path_create(AT_FDCWD, addr->name->sun_path, &parent, 0);
if (IS_ERR(dentry))
- return err;
+ return PTR_ERR(dentry);

/*
* All right, let's create it.
*/
- err = security_path_mknod(&path, dentry, mode, 0);
+ err = security_path_mknod(&parent, dentry, mode, 0);
if (!err) {
- err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
+ err = vfs_mknod(d_inode(parent.dentry), dentry, mode, 0);
if (!err) {
- res->mnt = mntget(path.mnt);
- res->dentry = dget(dentry);
+ path.mnt = mntget(parent.mnt);
+ path.dentry = dget(dentry);
}
}
- done_path_create(&path, dentry);
- return err;
-}
-
-static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
-{
- struct unix_sock *u = unix_sk(sk);
- struct path path = { };
- umode_t mode = S_IFSOCK |
- (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
- unsigned int hash;
- int err;
-
- err = unix_mknod(addr->name->sun_path, mode, &path);
+ done_path_create(&parent, dentry);
if (err) {
if (err == -EEXIST)
err = -EADDRINUSE;
--
2.11.0

2021-02-19 04:26:30

[permalink] [raw]

Subject: [PATCH 8/8] __unix_find_socket_byname(): don't pass hash and type separately

We only care about exclusive or of those, so pass that directly.
Makes life simpler for callers as well...

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index bb4c6200953d..a7e20fcadfcc 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -288,11 +288,11 @@ static inline void unix_insert_socket(struct hlist_head *list, struct sock *sk)

static struct sock *__unix_find_socket_byname(struct net *net,
struct sockaddr_un *sunname,
- int len, int type, unsigned int hash)
+ int len, unsigned int hash)
{
struct sock *s;

- sk_for_each(s, &unix_socket_table[hash ^ type]) {
+ sk_for_each(s, &unix_socket_table[hash]) {
struct unix_sock *u = unix_sk(s);

if (!net_eq(sock_net(s), net))
@@ -307,13 +307,12 @@ static struct sock *__unix_find_socket_byname(struct net *net,

static inline struct sock *unix_find_socket_byname(struct net *net,
struct sockaddr_un *sunname,
- int len, int type,
- unsigned int hash)
+ int len, unsigned int hash)
{
struct sock *s;

spin_lock(&unix_table_lock);
- s = __unix_find_socket_byname(net, sunname, len, type, hash);
+ s = __unix_find_socket_byname(net, sunname, len, hash);
if (s)
sock_hold(s);
spin_unlock(&unix_table_lock);
@@ -900,12 +899,12 @@ static int unix_autobind(struct socket *sock)
retry:
addr->len = sprintf(addr->name->sun_path+1, "%05x", ordernum) + 1 + sizeof(short);
addr->hash = unix_hash_fold(csum_partial(addr->name, addr->len, 0));
+ addr->hash ^= sk->sk_type;

spin_lock(&unix_table_lock);
ordernum = (ordernum+1)&0xFFFFF;

- if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
- addr->hash)) {
+ if (__unix_find_socket_byname(net, addr->name, addr->len, addr->hash)) {
spin_unlock(&unix_table_lock);
/*
* __unix_find_socket_byname() may take long time if many names
@@ -920,7 +919,6 @@ static int unix_autobind(struct socket *sock)
}
goto retry;
}
- addr->hash ^= sk->sk_type;

__unix_set_addr(sk, addr, addr->hash);
err = 0;
@@ -966,7 +964,7 @@ static struct sock *unix_find_other(struct net *net,
}
} else {
err = -ECONNREFUSED;
- u = unix_find_socket_byname(net, sunname, len, type, hash);
+ u = unix_find_socket_byname(net, sunname, len, type ^ hash);
if (u) {
struct dentry *dentry;
dentry = unix_sk(u)->path.dentry;
@@ -1036,8 +1034,7 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
return err == -EEXIST ? -EADDRINUSE : err;
}

-static int unix_bind_abstract(struct sock *sk, unsigned hash,
- struct unix_address *addr)
+static int unix_bind_abstract(struct sock *sk, struct unix_address *addr)
{
struct unix_sock *u = unix_sk(sk);
int err;
@@ -1053,7 +1050,7 @@ static int unix_bind_abstract(struct sock *sk, unsigned hash,

spin_lock(&unix_table_lock);
if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len,
- sk->sk_type, hash)) {
+ addr->hash)) {
spin_unlock(&unix_table_lock);
mutex_unlock(&u->bindlock);
return -EADDRINUSE;
@@ -1095,7 +1092,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (sun_path[0])
err = unix_bind_bsd(sk, addr);
else
- err = unix_bind_abstract(sk, hash, addr);
+ err = unix_bind_abstract(sk, addr);
if (err)
unix_release_addr(addr);
return err;
--
2.11.0

2021-02-20 19:17:31

[permalink] [raw]

Subject: Re: [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper

On Thu, Feb 18, 2021 at 8:22 PM Al Viro <[email protected]> wrote:
>
> Duplicated logics in all bind variants (autobind, bind-to-path,
> bind-to-abstract) gets taken into a common helper.
>
> Signed-off-by: Al Viro <[email protected]>
> ---
> net/unix/af_unix.c | 30 +++++++++++++++---------------
> 1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 41c3303c3357..179b4fe837e6 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -262,6 +262,16 @@ static void __unix_insert_socket(struct hlist_head *list, struct sock *sk)
> sk_add_node(sk, list);
> }
>
> +static void __unix_set_addr(struct sock *sk, struct unix_address *addr,
> + unsigned hash)
> + __releases(&unix_table_lock)
> +{
> + __unix_remove_socket(sk);
> + smp_store_release(&unix_sk(sk)->addr, addr);
> + __unix_insert_socket(&unix_socket_table[hash], sk);
> + spin_unlock(&unix_table_lock);

Please take the unlock out, it is clearly an anti-pattern.

And please Cc netdev for networking changes.

Thanks.

2021-02-20 19:35:18

[permalink] [raw]

Subject: Re: [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper

On Sat, Feb 20, 2021 at 11:12:33AM -0800, Cong Wang wrote:
> On Thu, Feb 18, 2021 at 8:22 PM Al Viro <[email protected]> wrote:
> >
> > Duplicated logics in all bind variants (autobind, bind-to-path,
> > bind-to-abstract) gets taken into a common helper.
> >
> > Signed-off-by: Al Viro <[email protected]>
> > ---
> > net/unix/af_unix.c | 30 +++++++++++++++---------------
> > 1 file changed, 15 insertions(+), 15 deletions(-)
> >
> > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> > index 41c3303c3357..179b4fe837e6 100644
> > --- a/net/unix/af_unix.c
> > +++ b/net/unix/af_unix.c
> > @@ -262,6 +262,16 @@ static void __unix_insert_socket(struct hlist_head *list, struct sock *sk)
> > sk_add_node(sk, list);
> > }
> >
> > +static void __unix_set_addr(struct sock *sk, struct unix_address *addr,
> > + unsigned hash)
> > + __releases(&unix_table_lock)
> > +{
> > + __unix_remove_socket(sk);
> > + smp_store_release(&unix_sk(sk)->addr, addr);
> > + __unix_insert_socket(&unix_socket_table[hash], sk);
> > + spin_unlock(&unix_table_lock);
>
> Please take the unlock out, it is clearly an anti-pattern.

Why? "Insert into locked and unlock" is fairly common...

> And please Cc netdev for networking changes.

Sorry about that - I'd ended up emulating git send-email by hand
and missed the Cc. Sorted out by now...

2021-02-20 20:34:30

[permalink] [raw]

Subject: Re: [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper

On Sat, Feb 20, 2021 at 11:32 AM Al Viro <[email protected]> wrote:
>
> On Sat, Feb 20, 2021 at 11:12:33AM -0800, Cong Wang wrote:
> > On Thu, Feb 18, 2021 at 8:22 PM Al Viro <[email protected]> wrote:
> > >
> > > Duplicated logics in all bind variants (autobind, bind-to-path,
> > > bind-to-abstract) gets taken into a common helper.
> > >
> > > Signed-off-by: Al Viro <[email protected]>
> > > ---
> > > net/unix/af_unix.c | 30 +++++++++++++++---------------
> > > 1 file changed, 15 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> > > index 41c3303c3357..179b4fe837e6 100644
> > > --- a/net/unix/af_unix.c
> > > +++ b/net/unix/af_unix.c
> > > @@ -262,6 +262,16 @@ static void __unix_insert_socket(struct hlist_head *list, struct sock *sk)
> > > sk_add_node(sk, list);
> > > }
> > >
> > > +static void __unix_set_addr(struct sock *sk, struct unix_address *addr,
> > > + unsigned hash)
> > > + __releases(&unix_table_lock)
> > > +{
> > > + __unix_remove_socket(sk);
> > > + smp_store_release(&unix_sk(sk)->addr, addr);
> > > + __unix_insert_socket(&unix_socket_table[hash], sk);
> > > + spin_unlock(&unix_table_lock);
> >
> > Please take the unlock out, it is clearly an anti-pattern.
>
> Why? "Insert into locked and unlock" is fairly common...

Because it does not lock the lock, just compare:

lock();
__unix_set_addr();
unlock();

to:

lock();
__unix_set_addr();

Clearly the former is more readable and less error-prone. Even
if you really want to do unlock, pick a name which explicitly says
it, for example, __unix_set_addr_unlock().

Thanks.

2021-02-20 21:16:58

[permalink] [raw]

Subject: Re: [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper

On Sat, Feb 20, 2021 at 12:31:49PM -0800, Cong Wang wrote:
> Because it does not lock the lock, just compare:
>
> lock();
> __unix_set_addr();
> unlock();
>
> to:
>
> lock();
> __unix_set_addr();
>
> Clearly the former is more readable and less error-prone. Even
> if you really want to do unlock, pick a name which explicitly says
> it, for example, __unix_set_addr_unlock().

*shrug*

If anything, __unix_complete_bind() might make a better name for that,
with dropping ->bindlock also pulled in, but TBH I don't have sufficiently
strong preferences - might as well leave dropping the lock to caller.

I'll post that series to netdev tonight.

2021-02-22 19:09:23

[permalink] [raw]

Subject: [PATCHSET] making unix_bind() undo mknod on failure

On Sat, Feb 20, 2021 at 09:08:56PM +0000, Al Viro wrote:

> *shrug*
>
> If anything, __unix_complete_bind() might make a better name for that,
> with dropping ->bindlock also pulled in, but TBH I don't have sufficiently
> strong preferences - might as well leave dropping the lock to caller.
>
> I'll post that series to netdev tonight.

Took longer than I hoped... Anyway, here's the current variant;
it's 5.11-based, lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git misc.af_unix

Shortlog:
Al Viro (8):
af_unix: take address assignment/hash insertion into a new helper
unix_bind(): allocate addr earlier
unix_bind(): separate BSD and abstract cases
unix_bind(): take BSD and abstract address cases into new helpers
fold unix_mknod() into unix_bind_bsd()
unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock
unix_bind_bsd(): unlink if we fail after successful mknod
__unix_find_socket_byname(): don't pass hash and type separately

Diffstat:
net/unix/af_unix.c | 186 +++++++++++++++++++++++++++--------------------------
1 file changed, 94 insertions(+), 92 deletions(-)

The actual fix is in #7/8, the first 6 are massage in preparation to that
and #8/8 is a minor followup cleanup. Individual patches in followups.
Please, review.

2021-02-22 19:20:13

[permalink] [raw]

Subject: Re: [PATCHSET] making unix_bind() undo mknod on failure

On Mon, Feb 22, 2021 at 07:06:00PM +0000, Al Viro wrote:
> On Sat, Feb 20, 2021 at 09:08:56PM +0000, Al Viro wrote:
>
> > *shrug*
> >
> > If anything, __unix_complete_bind() might make a better name for that,
> > with dropping ->bindlock also pulled in, but TBH I don't have sufficiently
> > strong preferences - might as well leave dropping the lock to caller.
> >
> > I'll post that series to netdev tonight.
>
> Took longer than I hoped... Anyway, here's the current variant;
> it's 5.11-based, lives in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git misc.af_unix
>
> Shortlog:
> Al Viro (8):
> af_unix: take address assignment/hash insertion into a new helper
> unix_bind(): allocate addr earlier
> unix_bind(): separate BSD and abstract cases
> unix_bind(): take BSD and abstract address cases into new helpers
> fold unix_mknod() into unix_bind_bsd()
> unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock
> unix_bind_bsd(): unlink if we fail after successful mknod
> __unix_find_socket_byname(): don't pass hash and type separately
>
> Diffstat:
> net/unix/af_unix.c | 186 +++++++++++++++++++++++++++--------------------------
> 1 file changed, 94 insertions(+), 92 deletions(-)
>
> The actual fix is in #7/8, the first 6 are massage in preparation to that
> and #8/8 is a minor followup cleanup. Individual patches in followups.
> Please, review.

Argh... git send-email is playing silly buggers again ;-/

2021-02-22 19:21:06

[permalink] [raw]

Subject: [PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper

Duplicated logics in all bind variants (autobind, bind-to-path,
bind-to-abstract) gets taken into a common helper.

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 41c3303c3357..45a40cf7b6af 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -262,6 +262,14 @@ static void __unix_insert_socket(struct hlist_head *list, struct sock *sk)
sk_add_node(sk, list);
}

+static void __unix_set_addr(struct sock *sk, struct unix_address *addr,
+ unsigned hash)
+{
+ __unix_remove_socket(sk);
+ smp_store_release(&unix_sk(sk)->addr, addr);
+ __unix_insert_socket(&unix_socket_table[hash], sk);
+}
+
static inline void unix_remove_socket(struct sock *sk)
{
spin_lock(&unix_table_lock);
@@ -912,9 +920,7 @@ static int unix_autobind(struct socket *sock)
}
addr->hash ^= sk->sk_type;

- __unix_remove_socket(sk);
- smp_store_release(&u->addr, addr);
- __unix_insert_socket(&unix_socket_table[addr->hash], sk);
+ __unix_set_addr(sk, addr, addr->hash);
spin_unlock(&unix_table_lock);
err = 0;

@@ -1016,7 +1022,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
int err;
unsigned int hash;
struct unix_address *addr;
- struct hlist_head *list;
struct path path = { };

err = -EINVAL;
@@ -1068,25 +1073,20 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
spin_lock(&unix_table_lock);
u->path = path;
- list = &unix_socket_table[hash];
} else {
spin_lock(&unix_table_lock);
err = -EADDRINUSE;
if (__unix_find_socket_byname(net, sunaddr, addr_len,
sk->sk_type, hash)) {
+ spin_unlock(&unix_table_lock);
unix_release_addr(addr);
- goto out_unlock;
+ goto out_up;
}
-
- list = &unix_socket_table[addr->hash];
+ hash = addr->hash;
}

err = 0;
- __unix_remove_socket(sk);
- smp_store_release(&u->addr, addr);
- __unix_insert_socket(list, sk);
-
-out_unlock:
+ __unix_set_addr(sk, addr, hash);
spin_unlock(&unix_table_lock);
out_up:
mutex_unlock(&u->bindlock);
--
2.11.0

2021-02-22 19:21:57

[permalink] [raw]

Subject: [PATCH 8/8] __unix_find_socket_byname(): don't pass hash and type separately

We only care about exclusive or of those, so pass that directly.
Makes life simpler for callers as well...

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 8bbdcddbf598..3c1218be7165 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -286,11 +286,11 @@ static inline void unix_insert_socket(struct hlist_head *list, struct sock *sk)

static struct sock *__unix_find_socket_byname(struct net *net,
struct sockaddr_un *sunname,
- int len, int type, unsigned int hash)
+ int len, unsigned int hash)
{
struct sock *s;

- sk_for_each(s, &unix_socket_table[hash ^ type]) {
+ sk_for_each(s, &unix_socket_table[hash]) {
struct unix_sock *u = unix_sk(s);

if (!net_eq(sock_net(s), net))
@@ -305,13 +305,12 @@ static struct sock *__unix_find_socket_byname(struct net *net,

static inline struct sock *unix_find_socket_byname(struct net *net,
struct sockaddr_un *sunname,
- int len, int type,
- unsigned int hash)
+ int len, unsigned int hash)
{
struct sock *s;

spin_lock(&unix_table_lock);
- s = __unix_find_socket_byname(net, sunname, len, type, hash);
+ s = __unix_find_socket_byname(net, sunname, len, hash);
if (s)
sock_hold(s);
spin_unlock(&unix_table_lock);
@@ -898,12 +897,12 @@ static int unix_autobind(struct socket *sock)
retry:
addr->len = sprintf(addr->name->sun_path+1, "%05x", ordernum) + 1 + sizeof(short);
addr->hash = unix_hash_fold(csum_partial(addr->name, addr->len, 0));
+ addr->hash ^= sk->sk_type;

spin_lock(&unix_table_lock);
ordernum = (ordernum+1)&0xFFFFF;

- if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
- addr->hash)) {
+ if (__unix_find_socket_byname(net, addr->name, addr->len, addr->hash)) {
spin_unlock(&unix_table_lock);
/*
* __unix_find_socket_byname() may take long time if many names
@@ -918,7 +917,6 @@ static int unix_autobind(struct socket *sock)
}
goto retry;
}
- addr->hash ^= sk->sk_type;

__unix_set_addr(sk, addr, addr->hash);
spin_unlock(&unix_table_lock);
@@ -965,7 +963,7 @@ static struct sock *unix_find_other(struct net *net,
}
} else {
err = -ECONNREFUSED;
- u = unix_find_socket_byname(net, sunname, len, type, hash);
+ u = unix_find_socket_byname(net, sunname, len, type ^ hash);
if (u) {
struct dentry *dentry;
dentry = unix_sk(u)->path.dentry;
@@ -1036,8 +1034,7 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
return err;
}

-static int unix_bind_abstract(struct sock *sk, unsigned hash,
- struct unix_address *addr)
+static int unix_bind_abstract(struct sock *sk, struct unix_address *addr)
{
struct unix_sock *u = unix_sk(sk);
int err;
@@ -1053,7 +1050,7 @@ static int unix_bind_abstract(struct sock *sk, unsigned hash,

spin_lock(&unix_table_lock);
if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len,
- sk->sk_type, hash)) {
+ addr->hash)) {
spin_unlock(&unix_table_lock);
mutex_unlock(&u->bindlock);
return -EADDRINUSE;
@@ -1096,7 +1093,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (sun_path[0])
err = unix_bind_bsd(sk, addr);
else
- err = unix_bind_abstract(sk, hash, addr);
+ err = unix_bind_abstract(sk, addr);
if (err)
unix_release_addr(addr);
return err == -EEXIST ? -EADDRINUSE : err;
--
2.11.0

2021-02-22 19:22:34

[permalink] [raw]

Subject: [PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers

unix_bind_bsd() and unix_bind_abstract() respectively.

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 147 +++++++++++++++++++++++++++--------------------------
1 file changed, 74 insertions(+), 73 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 496b069c99fe..56443f05ed9d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1012,104 +1012,105 @@ static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
return err;
}

+static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
+{
+ struct unix_sock *u = unix_sk(sk);
+ struct path path = { };
+ umode_t mode = S_IFSOCK |
+ (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
+ unsigned int hash;
+ int err;
+
+ err = unix_mknod(addr->name->sun_path, mode, &path);
+ if (err)
+ return err;
+
+ err = mutex_lock_interruptible(&u->bindlock);
+ if (err) {
+ path_put(&path);
+ return err;
+ }
+
+ if (u->addr) {
+ mutex_unlock(&u->bindlock);
+ path_put(&path);
+ return -EINVAL;
+ }
+
+ addr->hash = UNIX_HASH_SIZE;
+ hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
+ spin_lock(&unix_table_lock);
+ u->path = path;
+ __unix_set_addr(sk, addr, hash);
+ spin_unlock(&unix_table_lock);
+ mutex_unlock(&u->bindlock);
+ return 0;
+}
+
+static int unix_bind_abstract(struct sock *sk, unsigned hash,
+ struct unix_address *addr)
+{
+ struct unix_sock *u = unix_sk(sk);
+ int err;
+
+ err = mutex_lock_interruptible(&u->bindlock);
+ if (err)
+ return err;
+
+ if (u->addr) {
+ mutex_unlock(&u->bindlock);
+ return -EINVAL;
+ }
+
+ spin_lock(&unix_table_lock);
+ if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len,
+ sk->sk_type, hash)) {
+ spin_unlock(&unix_table_lock);
+ mutex_unlock(&u->bindlock);
+ return -EADDRINUSE;
+ }
+ __unix_set_addr(sk, addr, addr->hash);
+ spin_unlock(&unix_table_lock);
+ mutex_unlock(&u->bindlock);
+ return 0;
+}
+
static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
{
struct sock *sk = sock->sk;
- struct net *net = sock_net(sk);
- struct unix_sock *u = unix_sk(sk);
struct sockaddr_un *sunaddr = (struct sockaddr_un *)uaddr;
char *sun_path = sunaddr->sun_path;
int err;
unsigned int hash;
struct unix_address *addr;

- err = -EINVAL;
if (addr_len < offsetofend(struct sockaddr_un, sun_family) ||
sunaddr->sun_family != AF_UNIX)
- goto out;
+ return -EINVAL;

- if (addr_len == sizeof(short)) {
- err = unix_autobind(sock);
- goto out;
- }
+ if (addr_len == sizeof(short))
+ return unix_autobind(sock);

err = unix_mkname(sunaddr, addr_len, &hash);
if (err < 0)
- goto out;
+ return err;
addr_len = err;
- err = -ENOMEM;
addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL);
if (!addr)
- goto out;
+ return -ENOMEM;

memcpy(addr->name, sunaddr, addr_len);
addr->len = addr_len;
addr->hash = hash ^ sk->sk_type;
refcount_set(&addr->refcnt, 1);

- if (sun_path[0]) {
- struct path path = { };
- umode_t mode = S_IFSOCK |
- (SOCK_INODE(sock)->i_mode & ~current_umask());
- err = unix_mknod(sun_path, mode, &path);
- if (err) {
- if (err == -EEXIST)
- err = -EADDRINUSE;
- goto out_addr;
- }
-
- err = mutex_lock_interruptible(&u->bindlock);
- if (err) {
- path_put(&path);
- goto out_addr;
- }
-
- err = -EINVAL;
- if (u->addr) {
- mutex_unlock(&u->bindlock);
- path_put(&path);
- goto out_addr;
- }
-
- addr->hash = UNIX_HASH_SIZE;
- hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
- spin_lock(&unix_table_lock);
- u->path = path;
- __unix_set_addr(sk, addr, hash);
- spin_unlock(&unix_table_lock);
- mutex_unlock(&u->bindlock);
- addr = NULL;
- err = 0;
- } else {
- err = mutex_lock_interruptible(&u->bindlock);
- if (err)
- goto out_addr;
-
- err = -EINVAL;
- if (u->addr) {
- mutex_unlock(&u->bindlock);
- goto out_addr;
- }
-
- spin_lock(&unix_table_lock);
- err = -EADDRINUSE;
- if (__unix_find_socket_byname(net, sunaddr, addr_len,
- sk->sk_type, hash)) {
- spin_unlock(&unix_table_lock);
- mutex_unlock(&u->bindlock);
- goto out_addr;
- }
- __unix_set_addr(sk, addr, addr->hash);
- spin_unlock(&unix_table_lock);
- mutex_unlock(&u->bindlock);
- addr = NULL;
- err = 0;
- }
-out_addr:
- if (addr)
+ if (sun_path[0])
+ err = unix_bind_bsd(sk, addr);
+ else
+ err = unix_bind_abstract(sk, hash, addr);
+ if (err)
unix_release_addr(addr);
-out:
- return err;
+ return err == -EEXIST ? -EADDRINUSE : err;
}

static void unix_state_double_lock(struct sock *sk1, struct sock *sk2)
--
2.11.0

2021-02-22 19:29:41

[permalink] [raw]

Subject: Re: [PATCHSET] making unix_bind() undo mknod on failure

On Mon, Feb 22, 2021 at 07:12:29PM +0000, Al Viro wrote:
> On Mon, Feb 22, 2021 at 07:06:00PM +0000, Al Viro wrote:
> > On Sat, Feb 20, 2021 at 09:08:56PM +0000, Al Viro wrote:
> >
> > > *shrug*
> > >
> > > If anything, __unix_complete_bind() might make a better name for that,
> > > with dropping ->bindlock also pulled in, but TBH I don't have sufficiently
> > > strong preferences - might as well leave dropping the lock to caller.
> > >
> > > I'll post that series to netdev tonight.
> >
> > Took longer than I hoped... Anyway, here's the current variant;
> > it's 5.11-based, lives in
> > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git misc.af_unix
> >
> > Shortlog:
> > Al Viro (8):
> > af_unix: take address assignment/hash insertion into a new helper
> > unix_bind(): allocate addr earlier
> > unix_bind(): separate BSD and abstract cases
> > unix_bind(): take BSD and abstract address cases into new helpers
> > fold unix_mknod() into unix_bind_bsd()
> > unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock
> > unix_bind_bsd(): unlink if we fail after successful mknod
> > __unix_find_socket_byname(): don't pass hash and type separately
> >
> > Diffstat:
> > net/unix/af_unix.c | 186 +++++++++++++++++++++++++++--------------------------
> > 1 file changed, 94 insertions(+), 92 deletions(-)
> >
> > The actual fix is in #7/8, the first 6 are massage in preparation to that
> > and #8/8 is a minor followup cleanup. Individual patches in followups.
> > Please, review.
>
> Argh... git send-email is playing silly buggers again ;-/

Cute - looks like having EMAIL in environment confuses the living hell out
git-send-email. Oh, well...

2021-02-22 20:54:12

[permalink] [raw]

Subject: [PATCH 6/8] unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock

Final preparations for doing unlink on failure past the successful
mknod. We can't hold ->bindlock over ->mknod() or ->unlink(), since
either might do sb_start_write() (e.g. on overlayfs). However, we
can do it while holding filesystem and VFS locks - doing
kern_path_create()
vfs_mknod()
grab ->bindlock
if u->addr had been set
drop ->bindlock
done_path_create
return -EINVAL
else
assign the address to socket
drop ->bindlock
done_path_create
return 0
would be deadlock-free. Here we massage unix_bind_bsd() to that
form. We are still doing equivalent transformations.

Next commit will *not* be an equivalent transformation - it will
add a call of vfs_unlink() before done_path_create() in "alread bound"
case.

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 26 +++++++++++---------------
1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 5e04e16e6b88..368376621111 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -988,7 +988,7 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
struct unix_sock *u = unix_sk(sk);
umode_t mode = S_IFSOCK |
(SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
- struct path parent, path;
+ struct path parent;
struct dentry *dentry;
unsigned int hash;
int err;
@@ -1005,36 +1005,32 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
* All right, let's create it.
*/
err = security_path_mknod(&parent, dentry, mode, 0);
- if (!err) {
+ if (!err)
err = vfs_mknod(d_inode(parent.dentry), dentry, mode, 0);
- if (!err) {
- path.mnt = mntget(parent.mnt);
- path.dentry = dget(dentry);
- }
- }
- done_path_create(&parent, dentry);
- if (err)
+ if (err) {
+ done_path_create(&parent, dentry);
return err;
-
+ }
err = mutex_lock_interruptible(&u->bindlock);
if (err) {
- path_put(&path);
+ done_path_create(&parent, dentry);
return err;
}
-
if (u->addr) {
mutex_unlock(&u->bindlock);
- path_put(&path);
+ done_path_create(&parent, dentry);
return -EINVAL;
}

addr->hash = UNIX_HASH_SIZE;
- hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
+ hash = d_backing_inode(dentry)->i_ino & (UNIX_HASH_SIZE - 1);
spin_lock(&unix_table_lock);
- u->path = path;
+ u->path.mnt = mntget(parent.mnt);
+ u->path.dentry = dget(dentry);
__unix_set_addr(sk, addr, hash);
spin_unlock(&unix_table_lock);
mutex_unlock(&u->bindlock);
+ done_path_create(&parent, dentry);
return 0;
}

--
2.11.0

2021-02-22 20:56:31

[permalink] [raw]

Subject: [PATCH 5/8] fold unix_mknod() into unix_bind_bsd()

Signed-off-by: Al Viro <[email protected]>
---
net/unix/af_unix.c | 39 +++++++++++++++------------------------
1 file changed, 15 insertions(+), 24 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 56443f05ed9d..5e04e16e6b88 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -983,45 +983,36 @@ static struct sock *unix_find_other(struct net *net,
return NULL;
}

-static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
+static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
{
+ struct unix_sock *u = unix_sk(sk);
+ umode_t mode = S_IFSOCK |
+ (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
+ struct path parent, path;
struct dentry *dentry;
- struct path path;
- int err = 0;
+ unsigned int hash;
+ int err;
+
/*
* Get the parent directory, calculate the hash for last
* component.
*/
- dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
- err = PTR_ERR(dentry);
+ dentry = kern_path_create(AT_FDCWD, addr->name->sun_path, &parent, 0);
if (IS_ERR(dentry))
- return err;
+ return PTR_ERR(dentry);

/*
* All right, let's create it.
*/
- err = security_path_mknod(&path, dentry, mode, 0);
+ err = security_path_mknod(&parent, dentry, mode, 0);
if (!err) {
- err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
+ err = vfs_mknod(d_inode(parent.dentry), dentry, mode, 0);
if (!err) {
- res->mnt = mntget(path.mnt);
- res->dentry = dget(dentry);
+ path.mnt = mntget(parent.mnt);
+ path.dentry = dget(dentry);
}
}
- done_path_create(&path, dentry);
- return err;
-}
-
-static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
-{
- struct unix_sock *u = unix_sk(sk);
- struct path path = { };
- umode_t mode = S_IFSOCK |
- (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
- unsigned int hash;
- int err;
-
- err = unix_mknod(addr->name->sun_path, mode, &path);
+ done_path_create(&parent, dentry);
if (err)
return err;

--
2.11.0

2021-02-24 01:15:14

by Jakub Kicinski

[permalink] [raw]

Subject: Re: [PATCHSET] making unix_bind() undo mknod on failure

On Mon, 22 Feb 2021 19:06:00 +0000 Al Viro wrote:
> On Sat, Feb 20, 2021 at 09:08:56PM +0000, Al Viro wrote:
>
> > *shrug*
> >
> > If anything, __unix_complete_bind() might make a better name for that,
> > with dropping ->bindlock also pulled in, but TBH I don't have sufficiently
> > strong preferences - might as well leave dropping the lock to caller.
> >
> > I'll post that series to netdev tonight.
>
> Took longer than I hoped... Anyway, here's the current variant;
> it's 5.11-based, lives in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git misc.af_unix
>
> Shortlog:
> Al Viro (8):
> af_unix: take address assignment/hash insertion into a new helper
> unix_bind(): allocate addr earlier
> unix_bind(): separate BSD and abstract cases
> unix_bind(): take BSD and abstract address cases into new helpers
> fold unix_mknod() into unix_bind_bsd()
> unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock
> unix_bind_bsd(): unlink if we fail after successful mknod
> __unix_find_socket_byname(): don't pass hash and type separately
>
> Diffstat:
> net/unix/af_unix.c | 186 +++++++++++++++++++++++++++--------------------------
> 1 file changed, 94 insertions(+), 92 deletions(-)
>
> The actual fix is in #7/8, the first 6 are massage in preparation to that
> and #8/8 is a minor followup cleanup. Individual patches in followups.

Dave is out this week, but this looks good to me. You said "please
review" - I'm assuming you'll send these to Linus yourself, so:

Acked-by: Jakub Kicinski <[email protected]>