Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp682690ybi; Fri, 31 May 2019 07:26:29 -0700 (PDT) X-Google-Smtp-Source: APXvYqyh5B0ViCcsW2zdM8Gd6P9JgzQCSReY5F6rZu9qToC23IfTyWn+lqxeIiupRBn2320AgXpz X-Received: by 2002:a17:90a:2ec5:: with SMTP id h5mr9531848pjs.93.1559312788908; Fri, 31 May 2019 07:26:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1559312788; cv=none; d=google.com; s=arc-20160816; b=QgGqtQOBzmd+xvRbT3o4Rnq0Z/v64yncTJycfJ/vrTFBaRDoXybYQVyySTY9lDwHrf ZfWtLUJB8wuNRuJ8+/AG9OaQHHUZotYfqyWJrGjKoL57zZm6/EnJ88Hjswt3/01qrgg9 YmWcg2yr0wo9wVITagc4ayrEdSrPpORFaMiC9dR+D96ghXmopiuq+PliE+v64nO9FGsT CA6G7Cf98xcfXVKrBWxvLNWeNn4fEnQgNbcqlNlQxMNV5kSfH1Oa915UC5M2r9aTlTsR rfealUsH37lMVOS0EBFNdG96mQXd0HGINRwPV6x7K8B98dcZplKgZhStdyQjWf8Cd2+m B0BA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=QIFTqlspE/r0DEruCD4DATm01uaqKkNSOTeIdqXCCHU=; b=yj4u5xboYUAE4TCQV7Ki9Y/AicTesE8Unr9nZvSeehEbe4ZeMa6dhqgche3z6Kidtw yobhb+Q/5HBUrssd1DTgLW2no1UE/1dkYfTqtaua08xVApAoTP6175uJXiTNRZWmfrqU euZg5XFV2wPEuYjuqftehLN46b0d4HAO7vB01fIuF3kEq/SPDmgEuONQ6kRBeJJbsPnM tJ1K+XW90qlauVuqEX/tA47pl6smXkxWj65Qu8Ez38qvT75Cu/8whDQp54pXQZAwRqIi 2n+lworYyGW44Qe1lzlgmse5YeuFx6I672Pf6QPdUVe7blmjniEeYOaooYlY1R38CZjZ owmw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kinvolk.io header.s=google header.b=CBQuXNNH; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a11si6217102pgj.537.2019.05.31.07.26.11; Fri, 31 May 2019 07:26:28 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kinvolk.io header.s=google header.b=CBQuXNNH; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726658AbfEaOYy (ORCPT + 99 others); Fri, 31 May 2019 10:24:54 -0400 Received: from mail-lf1-f68.google.com ([209.85.167.68]:45872 "EHLO mail-lf1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726418AbfEaOYx (ORCPT ); Fri, 31 May 2019 10:24:53 -0400 Received: by mail-lf1-f68.google.com with SMTP id u10so1021630lfm.12 for ; Fri, 31 May 2019 07:24:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kinvolk.io; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=QIFTqlspE/r0DEruCD4DATm01uaqKkNSOTeIdqXCCHU=; b=CBQuXNNHE2f4EuWoG9Ch3bMq8576ALClYblADDd6k9DGuFWma33U2JgUlnIe/vTy/a t81C/gpRw5DdID43oRPTQhhTYWwWvTntKIdU2OtZN4i44z/SGdxZ4wrGwpRJJOtcKQy9 wZLfzSmtfNku/dXjhg5JbxuTaUxql77Uwvhf4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=QIFTqlspE/r0DEruCD4DATm01uaqKkNSOTeIdqXCCHU=; b=SFVuCMuq5oEPA9aNcPpVm4woyZvJYcgzvOwqmww0QOR57r8Zy5mQ5RYUxdA2BX8V5h wgXHY3xxQO+nts6ea1jlU9i3XSrrIDsDCdkoWwpSOLCH+1Zw+MdUduN+3RerFhxuTYH7 SRhnKUjGux0Kjq72yamNOYTDP7PzOU5e6BsT20EXPIA/XWpVh1xfB6xkY+NHemhjp5aM De4kwZiGyqS8uUgGopTe5mtIDd9J/7MV9EARSr2JGNHchxAW+oKJTWhpolDIO3BD5XYB 4emR4Li/8Fsz9MznAWlOwcfFKrZA+NTFEPli0ctfyBTvB7TIw3z1NEyj+PH3hYU8TU9K tPeA== X-Gm-Message-State: APjAAAVAPBSrk407SXSj1rpgEIJAhfOGhnF4Mr4VTj+4/eG+D1dKMuC+ +QZ+Ql4KSxBVuIMPde5CNgS5kJ9Ro2UeYby8r69XgvJLEMKPzQ== X-Received: by 2002:ac2:4d17:: with SMTP id r23mr212585lfi.130.1559312690587; Fri, 31 May 2019 07:24:50 -0700 (PDT) MIME-Version: 1.0 References: <20190524155931.7946-1-iago@kinvolk.io> <20190524155931.7946-2-iago@kinvolk.io> In-Reply-To: From: =?UTF-8?Q?Iago_L=C3=B3pez_Galeiras?= Date: Fri, 31 May 2019 16:24:23 +0200 Message-ID: Subject: Re: [PATCH bpf-next v4 1/4] bpf: sock ops: add netns ino and dev in bpf context To: Y Song Cc: John Fastabend , Alexei Starovoitov , Daniel Borkmann , Alban Crequy , Krzesimir Nowak , bpf , netdev , LKML Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 24, 2019 at 7:48 PM Y Song wrote: > > On Fri, May 24, 2019 at 9:01 AM Iago L=C3=B3pez Galeiras wrote: > > > > From: Alban Crequy > > > > sockops programs can now access the network namespace inode and device > > via (struct bpf_sock_ops)->netns_ino and ->netns_dev. This can be usefu= l > > to apply different policies on different network namespaces. > > > > In the unlikely case where network namespaces are not compiled in > > (CONFIG_NET_NS=3Dn), the verifier will return netns_dev as usual and wi= ll > > return 0 for netns_ino. > > > > The generated BPF bytecode for netns_ino is loading the correct inode > > number at the time of execution. > > > > However, the generated BPF bytecode for netns_dev is loading an > > immediate value determined at BPF-load-time by looking at the initial > > network namespace. In practice, this works because all netns currently > > use the same virtual device. If this was to change, this code would nee= d > > to be updated too. > > > > Co-authored-by: Iago L=C3=B3pez Galeiras > > Signed-off-by: Alban Crequy > > Signed-off-by: Iago L=C3=B3pez Galeiras > > > > --- > > > > Changes since v1: > > - add netns_dev (review from Alexei) > > > > Changes since v2: > > - replace __u64 by u64 in kernel code (review from Y Song) > > - remove unneeded #else branch: program would be rejected in > > is_valid_access (review from Y Song) > > - allow partial reads ( > > > Changes since v3: > > - return netns_dev unconditionally and set netns_ino to 0 if > > CONFIG_NET_NS is not enabled (review from Jakub Kicinski) > > - use bpf_ctx_record_field_size and bpf_ctx_narrow_access_ok instead of > > manually deal with partial reads (review from Y Song) > > - update commit message to reflect new code and remove note about > > partial reads since it was discussed in the review > > - use bpf_ctx_range() and offsetofend() > > --- > > include/uapi/linux/bpf.h | 2 ++ > > net/core/filter.c | 70 ++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 72 insertions(+) > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index 63e0cf66f01a..e64066a09a5f 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -3261,6 +3261,8 @@ struct bpf_sock_ops { > > __u32 sk_txhash; > > __u64 bytes_received; > > __u64 bytes_acked; > > + __u64 netns_dev; > > + __u64 netns_ino; > > }; > > > > /* Definitions for bpf_sock_ops_cb_flags */ > > diff --git a/net/core/filter.c b/net/core/filter.c > > index 55bfc941d17a..2b1552a8dd74 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -76,6 +76,8 @@ > > #include > > #include > > #include > > +#include > > +#include > > > > /** > > * sk_filter_trim_cap - run a packet through a socket filter > > @@ -6822,6 +6824,18 @@ static bool sock_ops_is_valid_access(int off, in= t size, > > } > > } else { > > switch (off) { > > + case bpf_ctx_range(struct bpf_sock_ops, netns_dev): > > + if (off >=3D offsetofend(struct bpf_sock_ops, n= etns_dev)) > > + return false; > > This is not needed. > #define bpf_ctx_range(TYPE, MEMBER) > \ > offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1 > off should never be >=3D offsetofend(struct bpf_sock_ops, netns_dev). > Right, I'll remove it. > > + > > + bpf_ctx_record_field_size(info, sizeof(u64)); > > + if (!bpf_ctx_narrow_access_ok(off, size, sizeof= (u64))) > > + return false; > > + break; > > + case offsetof(struct bpf_sock_ops, netns_ino): > > + if (size !=3D sizeof(u64)) > > + return false; > > + break; > > case bpf_ctx_range_till(struct bpf_sock_ops, bytes_rece= ived, > > bytes_acked): > > if (size !=3D sizeof(__u64)) > > @@ -7739,6 +7753,11 @@ static u32 sock_addr_convert_ctx_access(enum bpf= _access_type type, > > return insn - insn_buf; > > } > > > > +static struct ns_common *sockops_netns_cb(void *private_data) > > +{ > > + return &init_net.ns; > > +} > > + > > static u32 sock_ops_convert_ctx_access(enum bpf_access_type type, > > const struct bpf_insn *si, > > struct bpf_insn *insn_buf, > > @@ -7747,6 +7766,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_= access_type type, > > { > > struct bpf_insn *insn =3D insn_buf; > > int off; > > + struct inode *ns_inode; > > + struct path ns_path; > > + u64 netns_dev; > > + void *res; > > > > /* Helper macro for adding read access to tcp_sock or sock fields. */ > > #define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ) = \ > > @@ -7993,6 +8016,53 @@ static u32 sock_ops_convert_ctx_access(enum bpf_= access_type type, > > SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash, > > struct sock, type); > > break; > > + > > + case bpf_ctx_range(struct bpf_sock_ops, netns_dev): > > + /* We get the netns_dev at BPF-load-time and not at > > + * BPF-exec-time. We assume that netns_dev is a constan= t. > > + */ > > + res =3D ns_get_path_cb(&ns_path, sockops_netns_cb, NULL= ); > > + if (IS_ERR(res)) { > > + netns_dev =3D 0; > > What is the proper way to handle error here? > If we indeed get an error and netns_dev =3D 0, do this mean bpf program > should check netns_dev =3D=3D 0 and special case it? Or since this is rea= lly > a lower probability thing we can set to 0 and bpf program's logic does no= t > need to specialize this one. > > At least, maybe we need a little documentation for the field in uapi head= er > to point out this potential caveat? > As far as I can tell, this function can only error with ENOMEM when allocat= ing a new inode or dentry, which is very unlikely. I don't think bpf programs should check for this. Also, for sockops programs worst case is usually tha= t the connection goes through the usual networking stack so it shouldn't be dangerous. I can add a comment like this to the field in the uapi header file: ... /* * netns_dev might be zero if there's an error getting it * when loading the BPF program. This is very unlikely. */ __u64 netns_dev; ... What do you think? > > + } else { > > + ns_inode =3D ns_path.dentry->d_inode; > > + netns_dev =3D new_encode_dev(ns_inode->i_sb->s_= dev); > > + } > > + *target_size =3D 8; > > + *insn++ =3D BPF_MOV64_IMM(si->dst_reg, netns_dev); > > + break; > > + > > + case offsetof(struct bpf_sock_ops, netns_ino): > > +#ifdef CONFIG_NET_NS > > + /* Loading: sk_ops->sk->__sk_common.skc_net.net->ns.inu= m > > + * Type: (struct bpf_sock_ops_kern *) > > + * ->(struct sock *) > > + * ->(struct sock_common) > > + * .possible_net_t > > + * .(struct net *) > > + * ->(struct ns_common) > > + * .(unsigned int) > > + */ > > + BUILD_BUG_ON(offsetof(struct sock, __sk_common) !=3D 0)= ; > > + BUILD_BUG_ON(offsetof(possible_net_t, net) !=3D 0); > > + *insn++ =3D BPF_LDX_MEM(BPF_FIELD_SIZEOF( > > + struct bpf_sock_ops_ker= n, sk), > > + si->dst_reg, si->src_reg, > > + offsetof(struct bpf_sock_ops_kern= , sk)); > > + *insn++ =3D BPF_LDX_MEM(BPF_FIELD_SIZEOF( > > + possible_net_t, net), > > + si->dst_reg, si->dst_reg, > > + offsetof(struct sock_common, skc_= net)); > > + *insn++ =3D BPF_LDX_MEM(BPF_FIELD_SIZEOF( > > + struct ns_common, inum)= , > > + si->dst_reg, si->dst_reg, > > + offsetof(struct net, ns) + > > + offsetof(struct ns_common, inum))= ; > > +#else > > + *insn++ =3D BPF_MOV64_IMM(si->dst_reg, 0); > > +#endif > > + break; > > + > > } > > return insn - insn_buf; > > } > > -- > > 2.21.0 > > -- Iago L=C3=B3pez Galeiras Kinvolk GmbH | Adalbertstr. 6a, 10999 Berlin Gesch=C3=A4ftsf=C3=BChrer/Directors: Alban Crequy, Chris K=C3=BChl, Iago L= =C3=B3pez Galeiras Registergericht/Court of registration: Amtsgericht Charlottenburg Registernummer/Registration number: HRB 171414 B Ust-ID-Nummer/VAT ID number: DE302207000