Allow callers with CAP_NET_ADMIN to retrieve file descriptors from a
sockmap and sockhash. O_CLOEXEC is enforced on all fds.

Without this, it's difficult to resize or otherwise rebuild existing
sockmap or sockhashes.

Suggested-by: Jakub Sitnicki <[email protected]>
Signed-off-by: Lorenz Bauer <[email protected]>
---
 net/core/sock_map.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 03e04426cd21..3228936aa31e 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -347,12 +347,31 @@ static void *sock_map_lookup(struct bpf_map *map, void *key)
 static int __sock_map_copy_value(struct bpf_map *map, struct sock *sk,
 				 void *value)
 {
+	struct file *file;
+	int fd;
+
 	switch (map->value_size) {
 	case sizeof(u64):
 		sock_gen_cookie(sk);
 		*(u64 *)value = atomic64_read(&sk->sk_cookie);
 		return 0;
 
+	case sizeof(u32):
+		if (!capable(CAP_NET_ADMIN))
+			return -EPERM;
+
+		fd = get_unused_fd_flags(O_CLOEXEC);
+		if (unlikely(fd < 0))
+			return fd;
+
+		read_lock_bh(&sk->sk_callback_lock);
+		file = get_file(sk->sk_socket->file);
+		read_unlock_bh(&sk->sk_callback_lock);
+
+		fd_install(fd, file);
+		*(u32 *)value = fd;
+		return 0;
+
 	default:
 		return -ENOSPC;
 	}
--
2.20.1
Lorenz Bauer wrote:
> Allow callers with CAP_NET_ADMIN to retrieve file descriptors from a
> sockmap and sockhash. O_CLOEXEC is enforced on all fds.
>
> Without this, it's difficult to resize or otherwise rebuild existing
> sockmap or sockhashes.
>
> Suggested-by: Jakub Sitnicki <[email protected]>
> Signed-off-by: Lorenz Bauer <[email protected]>
> ---
> net/core/sock_map.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 03e04426cd21..3228936aa31e 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -347,12 +347,31 @@ static void *sock_map_lookup(struct bpf_map *map, void *key)
> static int __sock_map_copy_value(struct bpf_map *map, struct sock *sk,
> void *value)
> {
> + struct file *file;
> + int fd;
> +
> switch (map->value_size) {
> case sizeof(u64):
> sock_gen_cookie(sk);
> *(u64 *)value = atomic64_read(&sk->sk_cookie);
> return 0;
>
> + case sizeof(u32):
> + if (!capable(CAP_NET_ADMIN))
> + return -EPERM;
> +
> + fd = get_unused_fd_flags(O_CLOEXEC);
> + if (unlikely(fd < 0))
> + return fd;
> +
> + read_lock_bh(&sk->sk_callback_lock);
> + file = get_file(sk->sk_socket->file);
> + read_unlock_bh(&sk->sk_callback_lock);
> +
> + fd_install(fd, file);
> + *(u32 *)value = fd;
> + return 0;
> +
Hi Lorenz, can you say something about what happens if the sk
is deleted from the map or the sock is closed/unhashed? Ideally
in the commit message, so we have it for later reference. I guess
because we are in an RCU block here the sk will be OK, and the psock
reference will exist until after the RCU block at least, because
of call_rcu(). If the psock is destroyed from another path then
the fd will still point at the sock. Correct?

Thanks.
On Wed, 11 Mar 2020 at 23:27, John Fastabend <[email protected]> wrote:
>
> Lorenz Bauer wrote:
> > Allow callers with CAP_NET_ADMIN to retrieve file descriptors from a
> > sockmap and sockhash. O_CLOEXEC is enforced on all fds.
> >
> > Without this, it's difficult to resize or otherwise rebuild existing
> > sockmap or sockhashes.
> >
> > Suggested-by: Jakub Sitnicki <[email protected]>
> > Signed-off-by: Lorenz Bauer <[email protected]>
> > ---
> > net/core/sock_map.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index 03e04426cd21..3228936aa31e 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -347,12 +347,31 @@ static void *sock_map_lookup(struct bpf_map *map, void *key)
> > static int __sock_map_copy_value(struct bpf_map *map, struct sock *sk,
> > void *value)
> > {
> > + struct file *file;
> > + int fd;
> > +
> > switch (map->value_size) {
> > case sizeof(u64):
> > sock_gen_cookie(sk);
> > *(u64 *)value = atomic64_read(&sk->sk_cookie);
> > return 0;
> >
> > + case sizeof(u32):
> > + if (!capable(CAP_NET_ADMIN))
> > + return -EPERM;
> > +
> > + fd = get_unused_fd_flags(O_CLOEXEC);
> > + if (unlikely(fd < 0))
> > + return fd;
> > +
> > + read_lock_bh(&sk->sk_callback_lock);
> > + file = get_file(sk->sk_socket->file);
> > + read_unlock_bh(&sk->sk_callback_lock);
> > +
> > + fd_install(fd, file);
> > + *(u32 *)value = fd;
> > + return 0;
> > +
>
> Hi Lorenz, can you say something about what happens if the sk
> is deleted from the map or the sock is closed/unhashed? Ideally
> in the commit message, so we have it for later reference. I guess
> because we are in an RCU block here the sk will be OK, and the psock
> reference will exist until after the RCU block at least, because
> of call_rcu(). If the psock is destroyed from another path then
> the fd will still point at the sock. Correct?
This is how I understand it:
* sk is protected by rcu_read_lock (as you point out)
* sk->sk_callback_lock protects against sk->sk_socket being
modified by sock_orphan, sock_graft, etc. via sk_set_socket
* get_file increments the refcount on the file
I'm not sure how the psock figures into this, maybe you can
elaborate a little?
--
Lorenz Bauer | Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
http://www.cloudflare.com
On Tue, Mar 10, 2020 at 06:47 PM CET, Lorenz Bauer wrote:
> Allow callers with CAP_NET_ADMIN to retrieve file descriptors from a
> sockmap and sockhash. O_CLOEXEC is enforced on all fds.
>
> Without this, it's difficult to resize or otherwise rebuild existing
> sockmap or sockhashes.
>
> Suggested-by: Jakub Sitnicki <[email protected]>
> Signed-off-by: Lorenz Bauer <[email protected]>
> ---
> net/core/sock_map.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 03e04426cd21..3228936aa31e 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -347,12 +347,31 @@ static void *sock_map_lookup(struct bpf_map *map, void *key)
> static int __sock_map_copy_value(struct bpf_map *map, struct sock *sk,
> void *value)
> {
> + struct file *file;
> + int fd;
> +
> switch (map->value_size) {
> case sizeof(u64):
> sock_gen_cookie(sk);
> *(u64 *)value = atomic64_read(&sk->sk_cookie);
> return 0;
>
> + case sizeof(u32):
> + if (!capable(CAP_NET_ADMIN))
> + return -EPERM;
> +
> + fd = get_unused_fd_flags(O_CLOEXEC);
> + if (unlikely(fd < 0))
> + return fd;
> +
> + read_lock_bh(&sk->sk_callback_lock);
> + file = get_file(sk->sk_socket->file);
I think this deserves a second look.
We don't lock the sock, so what if tcp_close orphans it before we enter
this critical section? Looks like sk->sk_socket might be NULL.
I'd find a test that tries to trigger the race helpful, like:
thread A: loop in lookup FD from map
thread B: loop in insert FD into map, close FD
> + read_unlock_bh(&sk->sk_callback_lock);
> +
> + fd_install(fd, file);
> + *(u32 *)value = fd;
> + return 0;
> +
> default:
> return -ENOSPC;
> }
Jakub Sitnicki wrote:
> On Tue, Mar 10, 2020 at 06:47 PM CET, Lorenz Bauer wrote:
> > Allow callers with CAP_NET_ADMIN to retrieve file descriptors from a
> > sockmap and sockhash. O_CLOEXEC is enforced on all fds.
> >
> > Without this, it's difficult to resize or otherwise rebuild existing
> > sockmap or sockhashes.
> >
> > Suggested-by: Jakub Sitnicki <[email protected]>
> > Signed-off-by: Lorenz Bauer <[email protected]>
> > ---
> > net/core/sock_map.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index 03e04426cd21..3228936aa31e 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -347,12 +347,31 @@ static void *sock_map_lookup(struct bpf_map *map, void *key)
> > static int __sock_map_copy_value(struct bpf_map *map, struct sock *sk,
> > void *value)
> > {
> > + struct file *file;
> > + int fd;
> > +
> > switch (map->value_size) {
> > case sizeof(u64):
> > sock_gen_cookie(sk);
> > *(u64 *)value = atomic64_read(&sk->sk_cookie);
> > return 0;
> >
> > + case sizeof(u32):
> > + if (!capable(CAP_NET_ADMIN))
> > + return -EPERM;
> > +
> > + fd = get_unused_fd_flags(O_CLOEXEC);
> > + if (unlikely(fd < 0))
> > + return fd;
> > +
> > + read_lock_bh(&sk->sk_callback_lock);
> > + file = get_file(sk->sk_socket->file);
>
> I think this deserves a second look.
>
> We don't lock the sock, so what if tcp_close orphans it before we enter
> this critical section? Looks like sk->sk_socket might be NULL.
>
> I'd find a test that tries to trigger the race helpful, like:
>
> thread A: loop in lookup FD from map
> thread B: loop in insert FD into map, close FD
Agreed, this was essentially my question above as well.

When the psock is created we call sock_hold() and will only do a sock_put()
after an RCU grace period when it's removed. So at least if you have the
sock here it should have a sk_refcnt. (Note the user data is set to NULL,
so if you do reference psock you need to check it's non-NULL.)

Is that enough to ensure sk_socket? Seems not to me; tcp_close for example
will still happen and call sock_orphan(sk), based on my admittedly quick
look.

Further, even if you do check sk->sk_socket is non-NULL, what does it mean
to return a file with a socket that is closed, deleted from the sock_map
and psock removed? At this point is it just a dangling reference?

I'm also still a bit confused about what would or should happen when the
sock is closed after you have the file reference. I could probably dig up
what exactly would happen, but I think we need it in the commit message so
we understand it. I also didn't dig up the details here, but if the receiver
of the fd crashes or otherwise disappears, this hopefully all gets cleaned up?
>
> > + read_unlock_bh(&sk->sk_callback_lock);
> > +
> > + fd_install(fd, file);
> > + *(u32 *)value = fd;
> > + return 0;
> > +
> > default:
> > return -ENOSPC;
> > }