On Fri, Dec 01, 2017 at 04:54:39AM +0000, Al Viro wrote:
> On Fri, Dec 01, 2017 at 03:48:59AM +0000, Al Viro wrote:
>
> > Something similar to get_prog_path_type() above might make for a usable
> > primitive, IMO...
>
> Incidentally, bpf_obj_get_user()/bpf_obj_do_get() should just use
> user_path(), rather than wanking with getname()+kern_path(pname->name)+putname().
> Note that kern_path() will do getname_kernel() to get struct pathname...
>
> Would cause problems for tracepoints in there, though. And that, BTW,
> is precisely why I don't want tracepoints in core VFS, TYVM - makes
> restructuring the code harder...
Egads... Contortions in bpf ->mknod() are really obnoxious.
First of all, it checks that ->d_fsdata is non-NULL and fails otherwise.
The only time ->d_fsdata gets non-NULL on that fs? In bpf_obj_do_pin(), this:
dentry->d_fsdata = raw;
ret = vfs_mknod(dir, dentry, mode, devt);
dentry->d_fsdata = NULL;
In other words, it's *not* going to work from normal mknod(2). Why go through
->mknod(), then, especially since it requires that kind of contortions to
pass the data in?
devt is 0:1 or 0:2 here. mode? Character or block device, right? Like hell -
it's a regular file. And devt is a cute way to pass a flag down into bpf_mkobj()
(aka. ->mknod()) through vfs_mknod(). No, it doesn't go into ->i_rdev...
And to make the things even more fun, the damn thing is passed to a couple
of Linux S&M hooks - security_path_mknod() and security_inode_mknod(). Oh, sorry -
three hooks. There's devcgroup_inode_mknod() as well, but that thing sees S_IFREG
in mode and buggers off quietly. Our esteemed sadomaso^Wsecurity community gets
to play, though. Without any way to see _what_ are we attaching to that place in
the bpf fs tree, but hey - it's security, it doesn't need to make sense...
What the hell? If you need a clean way to do something, why don't you describe
(on fsdevel, or in off-list mail to relevant people) what do you really want?
Sure, you can "work around" anything, but doesn't that level of perversion
strike you as a clear sign of something being not right?
For crying out loud, you are trying to pass a tagged pointer to one or another
kind of object into your own function. For that you
* use a field in a globally visible data structure as a temporary storage
for a pointer
* encode your tag (essentially a boolean) into a fucking _device_ _number_,
of all things, and shove it through, hoping that no LSM module gets weirded out by
non-zero device number combined with regular file for mode.
If that does not scream "wrong or missing primitive", I don't know what would.
You want something along the lines of "create a filesystem object at given
location, calling this function with this argument for actual object creation"?
Fair enough, but then let's add a primitive that would do just that.
And grepping around for similar sick tricks catches a slightly milder example -
mq_open(2) doesn't play with encoding stuff into dev_t, but otherwise it's very
similar and could also benefit from the same primitive.
How about something like this:
int vfs_mkobj(struct dentry *dentry, umode_t mode,
int (*f)(struct dentry *, umode_t, void *),
void *arg)
{
struct inode *dir = dentry->d_parent->d_inode;
int error = may_create(dir, dentry);
if (error)
return error;
mode &= S_IALLUGO;
mode |= S_IFREG;
error = security_inode_create(dir, dentry, mode);
if (error)
return error;
error = f(dentry, mode, arg);
if (!error)
fsnotify_create(dir, dentry);
return error;
}
exported by fs/namei.c, with your code doing
switch (type) {
case BPF_TYPE_PROG:
error = vfs_mkobj(path.dentry, mode, bpf_mkprog, raw);
break;
case BPF_TYPE_MAP:
error = vfs_mkobj(path.dentry, mode, bpf_mkmap, raw);
break;
default:
error = -EPERM;
}
instead that vfs_mknod() hack, with
static int bpf_mkprog(struct inode *dir, struct dentry *dentry,
umode_t mode, void *raw)
{
return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_prog_iops);
}
static int bpf_mkmap(struct inode *dir, struct dentry *dentry,
umode_t mode, void *raw)
{
return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_map_iops);
}
static int bpf_mkobj_ops(struct inode *dir, struct dentry *dentry,
umode_t mode, void *raw, struct inode_operations *iops)
{
struct inode *inode;
inode = bpf_get_inode(dir->i_sb, dir, mode);
if (IS_ERR(inode))
return PTR_ERR(inode);
inode->i_op = iops;
inode->i_private = raw;
bpf_dentry_finalize(dentry, inode, dir);
return 0;
}
And to hell with messing with dev_t, ->d_fsdata or having ->mknod() there at all...
Might want to replace security_path_mknod() with something saner, while we are
at it.
Objections?
PS: mqueue.c would also benefit from such primitive - do_create() there would
simply pass attr as callback's argument into vfs_mkobj(), with callback being
the guts of mqueue_create()...
On 12/01/2017 06:39 PM, Al Viro wrote:
[...]
> If that does not scream "wrong or missing primitive", I don't know what would.
> You want something along the lines of "create a filesystem object at given
> location, calling this function with this argument for actual object creation"?
> Fair enough, but then let's add a primitive that would do just that.
>
> And grepping around for similar sick tricks catches a slightly milder example -
> mq_open(2) doesn't play with encoding stuff into dev_t, but otherwise it's very
> similar and could also benefit from the same primitive.
>
> How about something like this:
> int vfs_mkobj(struct dentry *dentry, umode_t mode,
> int (*f)(struct dentry *, umode_t, void *),
> void *arg)
> {
> struct inode *dir = dentry->d_parent->d_inode;
> int error = may_create(dir, dentry);
> if (error)
> return error;
>
> mode &= S_IALLUGO;
> mode |= S_IFREG;
> error = security_inode_create(dir, dentry, mode);
> if (error)
> return error;
> error = f(dentry, mode, arg);
> if (!error)
> fsnotify_create(dir, dentry);
> return error;
> }
>
> exported by fs/namei.c, with your code doing
>
> switch (type) {
> case BPF_TYPE_PROG:
> error = vfs_mkobj(path.dentry, mode, bpf_mkprog, raw);
> break;
> case BPF_TYPE_MAP:
> error = vfs_mkobj(path.dentry, mode, bpf_mkmap, raw);
> break;
> default:
> error = -EPERM;
> }
> instead that vfs_mknod() hack, with
>
> static int bpf_mkprog(struct inode *dir, struct dentry *dentry,
> umode_t mode, void *raw)
> {
> return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_prog_iops);
> }
>
> static int bpf_mkmap(struct inode *dir, struct dentry *dentry,
> umode_t mode, void *raw)
> {
> return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_map_iops);
> }
>
> static int bpf_mkobj_ops(struct inode *dir, struct dentry *dentry,
> umode_t mode, void *raw, struct inode_operations *iops)
> {
> struct inode *inode;
>
> inode = bpf_get_inode(dir->i_sb, dir, mode);
> if (IS_ERR(inode))
> return PTR_ERR(inode);
>
> inode->i_op = iops;
> inode->i_private = raw;
>
> bpf_dentry_finalize(dentry, inode, dir);
> return 0;
> }
>
> And to hell with messing with dev_t, ->d_fsdata or having ->mknod() there at all...
> Might want to replace security_path_mknod() with something saner, while we are
> at it.
>
> Objections?
No, thanks for looking into this, and sorry for this fugly hack! :( Not
that this doesn't make it any better, but I think back then I took it
over from mqueue implementation ... should have known better and looking
into making this generic instead, sigh. The above looks good to me, so
no objections from my side and thanks for working on it!
> PS: mqueue.c would also benefit from such primitive - do_create() there would
> simply pass attr as callback's argument into vfs_mkobj(), with callback being
> the guts of mqueue_create()...
On Fri, Dec 01, 2017 at 09:47:00PM +0100, Daniel Borkmann wrote:
> > Might want to replace security_path_mknod() with something saner, while we are
> > at it.
> >
> > Objections?
>
> No, thanks for looking into this, and sorry for this fugly hack! :( Not
> that this doesn't make it any better, but I think back then I took it
> over from mqueue implementation ... should have known better and looking
> into making this generic instead, sigh. The above looks good to me, so
> no objections from my side and thanks for working on it!
>
> > PS: mqueue.c would also benefit from such primitive - do_create() there would
> > simply pass attr as callback's argument into vfs_mkobj(), with callback being
> > the guts of mqueue_create()...
OK... See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2)
passes LTP tests, but that's not saying much, and BPF side is completely
untested.
On Sat, Dec 02, 2017 at 06:48:50PM +0000, Al Viro wrote:
> On Fri, Dec 01, 2017 at 09:47:00PM +0100, Daniel Borkmann wrote:
>
> > > Might want to replace security_path_mknod() with something saner, while we are
> > > at it.
> > >
> > > Objections?
> >
> > No, thanks for looking into this, and sorry for this fugly hack! :( Not
> > that this doesn't make it any better, but I think back then I took it
> > over from mqueue implementation ... should have known better and looking
> > into making this generic instead, sigh. The above looks good to me, so
> > no objections from my side and thanks for working on it!
> >
> > > PS: mqueue.c would also benefit from such primitive - do_create() there would
> > > simply pass attr as callback's argument into vfs_mkobj(), with callback being
> > > the guts of mqueue_create()...
>
> OK... See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2)
> passes LTP tests, but that's not saying much, and BPF side is completely
> untested.
... and FWIW, completely untested patch for net/netfilter/xt_bpf.c follows:
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e55e4255a210..a7000e4775e7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -514,6 +514,9 @@ static inline struct bpf_prog *bpf_prog_get_type(u32 ufd,
return bpf_prog_get_type_dev(ufd, type, false);
}
+struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type);
+bool bpf_prog_get_ok(struct bpf_prog *, enum bpf_prog_type *, bool);
+
int bpf_prog_offload_compile(struct bpf_prog *prog);
void bpf_prog_offload_destroy(struct bpf_prog *prog);
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 2b75faccc771..9d1050dc2a7a 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -364,6 +364,45 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
}
EXPORT_SYMBOL_GPL(bpf_obj_get_user);
+static struct bpf_prog *__get_prog_inode(struct inode *inode, enum bpf_prog_type type)
+{
+ struct bpf_prog *prog;
+ int ret = inode_permission(inode, MAY_READ | MAY_WRITE);
+ if (ret)
+ return ERR_PTR(ret);
+
+ if (inode->i_op == &bpf_map_iops)
+ return ERR_PTR(-EINVAL);
+ if (inode->i_op != &bpf_prog_iops)
+ return ERR_PTR(-EACCES);
+
+ prog = inode->i_private;
+
+ ret = security_bpf_prog(prog);
+ if (ret < 0)
+ return ERR_PTR(ret);
+
+ if (!bpf_prog_get_ok(prog, &type, false))
+ return ERR_PTR(-EINVAL);
+
+ return bpf_prog_inc(prog);
+}
+
+struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type)
+{
+ struct bpf_prog *prog;
+ struct path path;
+ int ret = kern_path(name, LOOKUP_FOLLOW, &path);
+ if (ret)
+ return ERR_PTR(ret);
+ prog = __get_prog_inode(d_backing_inode(path.dentry), type);
+ if (!IS_ERR(prog))
+ touch_atime(&path);
+ path_put(&path);
+ return prog;
+}
+EXPORT_SYMBOL(bpf_prog_get_type_path);
+
static void bpf_evict_inode(struct inode *inode)
{
enum bpf_type type;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2c4cfeaa8d5e..5cb783fc8224 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1057,7 +1057,7 @@ struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog)
}
EXPORT_SYMBOL_GPL(bpf_prog_inc_not_zero);
-static bool bpf_prog_get_ok(struct bpf_prog *prog,
+bool bpf_prog_get_ok(struct bpf_prog *prog,
enum bpf_prog_type *attach_type, bool attach_drv)
{
/* not an attachment, just a refcount inc, always allow */
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
index 041da0d9c06f..fa2ca0a13619 100644
--- a/net/netfilter/xt_bpf.c
+++ b/net/netfilter/xt_bpf.c
@@ -52,18 +52,8 @@ static int __bpf_mt_check_fd(int fd, struct bpf_prog **ret)
static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret)
{
- mm_segment_t oldfs = get_fs();
- int retval, fd;
-
- set_fs(KERNEL_DS);
- fd = bpf_obj_get_user(path, 0);
- set_fs(oldfs);
- if (fd < 0)
- return fd;
-
- retval = __bpf_mt_check_fd(fd, ret);
- sys_close(fd);
- return retval;
+ *ret = bpf_prog_get_type_path(path, BPF_PROG_TYPE_SOCKET_FILTER);
+ return PTR_ERR_OR_ZERO(*ret);
}
static int bpf_mt_check(const struct xt_mtchk_param *par)
>> OK... See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2)
>> passes LTP tests, but that's not saying much, and BPF side is completely
>> untested.
>
> ... and FWIW, completely untested patch for net/netfilter/xt_bpf.c follows:
Thanks a lot for this fix.
The tree including the bpf fix passes this basic xt_bpf test:
mount -t bpf bpf /sys/fs/bpf
./pin /sys/fs/bpf/pass
iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/five -j LOG
iptables -L INPUT
iptables -F INPUT
where pin is as follows:
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index adeaa1302f34..0cd2bb8d634b 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -41,6 +41,7 @@ hostprogs-y += xdp_redirect_map
hostprogs-y += xdp_redirect_cpu
hostprogs-y += xdp_monitor
hostprogs-y += syscall_tp
+hostprogs-y += pin
# Libbpf dependencies
LIBBPF := ../../tools/lib/bpf/bpf.o
@@ -89,6 +90,7 @@ xdp_redirect_map-objs := bpf_load.o $(LIBBPF)
xdp_redirect_map_user.o
xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
+pin-objs := $(LIBBPF) pin.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)
diff --git a/samples/bpf/pin.c b/samples/bpf/pin.c
new file mode 100644
index 000000000000..826e86784edf
--- /dev/null
+++ b/samples/bpf/pin.c
@@ -0,0 +1,41 @@
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <error.h>
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include "libbpf.h"
+#include "bpf_load.h"
+
+static char log_buf[1 << 16];
+
+int main(int argc, char **argv)
+{
+ struct bpf_insn prog[] = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_EXIT_INSN(),
+ };
+ int fd;
+
+ if (argc != 2)
+ error(1, 0, "Usage: %s <filepath>\n", argv[0]);
+
+ fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog,
+ sizeof(prog) / sizeof(prog[0]),
+ "GPL", 0, log_buf, sizeof(log_buf));
+ if (fd == -1)
+ error(1, errno, "load: %s", log_buf);
+
+ if (bpf_obj_pin(fd, argv[1]))
+ error(1, errno, "pin");
+
+ if (close(fd))
+ error(1, errno, "close");
+
+ return 0;
+}
On 12/02/2017 07:48 PM, Al Viro wrote:
> On Fri, Dec 01, 2017 at 09:47:00PM +0100, Daniel Borkmann wrote:
>>> Might want to replace security_path_mknod() with something saner, while we are
>>> at it.
>>>
>>> Objections?
>>
>> No, thanks for looking into this, and sorry for this fugly hack! :( Not
>> that this doesn't make it any better, but I think back then I took it
>> over from mqueue implementation ... should have known better and looking
>> into making this generic instead, sigh. The above looks good to me, so
>> no objections from my side and thanks for working on it!
>>
>>> PS: mqueue.c would also benefit from such primitive - do_create() there would
>>> simply pass attr as callback's argument into vfs_mkobj(), with callback being
>>> the guts of mqueue_create()...
>
> OK... See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2)
> passes LTP tests, but that's not saying much, and BPF side is completely
> untested.
I pulled vfs.git#untested.mkobj into my local tree and ran tests for both
progs and maps on it, all went fine and the patch looks good to me.
For 'bpf_obj_do_pin(): switch to vfs_mkobj(), quit abusing ->mknod()' when
you push the fix to Linus, feel free to add:
Acked-by: Daniel Borkmann <[email protected]>
Thanks for your help, Al!