2019-03-20 17:35:21

by Alban Crequy

[permalink] [raw]
Subject: [PATCH bpf-next v1 1/7] tools: bpftool: fix infinite loop

From: Alban Crequy <[email protected]>

Symptoms: when forgetting to type the keyword 'type' in front of 'hash':
$ sudo bpftool map create /sys/fs/bpf/dir/foobar hash key 8 value 8 entries 128
(infinite loop, taking all the CPU)
^C

Signed-off-by: Alban Crequy <[email protected]>
---
tools/bpf/bpftool/map.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index e0c650d91784..994a7e0d16fb 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1151,6 +1151,9 @@ static int do_create(int argc, char **argv)
return -1;
}
NEXT_ARG();
+ } else {
+ p_err("unknown arg %s", *argv);
+ return -1;
}
}

--
2.20.1



2019-03-20 17:35:54

by Alban Crequy

[permalink] [raw]
Subject: [PATCH bpf-next v1 2/7] tools: bpftool: fix error message on invalid argument count

From: Alban Crequy <[email protected]>

Symptoms:
$ sudo bpftool map create /sys/fs/bpf/dir/foobar type hash key 8 value 8 entries
Error: '8' needs at least 2 arguments, 1 found

After this patch:
$ sudo bpftool map create /sys/fs/bpf/dir/foobar type hash key 8 value 8 entries
Error: can't parse max entries: missing argument

Signed-off-by: Alban Crequy <[email protected]>
---
tools/bpf/bpftool/common.c | 5 +++++
tools/bpf/bpftool/main.h | 11 +++++++----
tools/bpf/bpftool/map.c | 15 ++++++++++++---
tools/bpf/bpftool/prog.c | 2 --
4 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index f7261fad45c1..e560cd8f66bc 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -627,6 +627,11 @@ int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what)
return -1;
}

+ if (*argc == 0) {
+ p_err("can't parse %s: missing argument", what);
+ return -1;
+ }
+
*val = strtoul(**argv, &endptr, 0);
if (*endptr) {
p_err("can't parse %s as %s", **argv, what);
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index d7dd84d3c660..7fc2973446d0 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -22,14 +22,17 @@
#define NEXT_ARGP() ({ (*argc)--; (*argv)++; if (*argc < 0) usage(); })
#define BAD_ARG() ({ p_err("what is '%s'?", *argv); -1; })
#define GET_ARG() ({ argc--; *argv++; })
-#define REQ_ARGS(cnt) \
+
+#define REQ_ARGS(cnt) REQ_ARGS_GENERIC(cnt, argc, argv)
+#define REQ_ARGSP(cnt) REQ_ARGS_GENERIC(cnt, *argc, *argv)
+#define REQ_ARGS_GENERIC(cnt, ARGC, ARGV) \
({ \
int _cnt = (cnt); \
bool _res; \
\
- if (argc < _cnt) { \
- p_err("'%s' needs at least %d arguments, %d found", \
- argv[-1], _cnt, argc); \
+ if ((ARGC) < _cnt) { \
+ p_err("'%s' needs at least %d arguments, %d found", \
+ (ARGV)[-1], _cnt, (ARGC)); \
_res = false; \
} else { \
_res = true; \
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 994a7e0d16fb..18f9bc3aed4f 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -98,6 +98,9 @@ int map_parse_fd(int *argc, char ***argv)
char *endptr;

NEXT_ARGP();
+ if (!REQ_ARGSP(1)) {
+ return -1;
+ }

id = strtoul(**argv, &endptr, 0);
if (*endptr) {
@@ -114,6 +117,9 @@ int map_parse_fd(int *argc, char ***argv)
char *path;

NEXT_ARGP();
+ if (!REQ_ARGSP(1)) {
+ return -1;
+ }

path = **argv;
NEXT_ARGP();
@@ -1100,11 +1106,10 @@ static int do_create(int argc, char **argv)
pinfile = GET_ARG();

while (argc) {
- if (!REQ_ARGS(2))
- return -1;
-
if (is_prefix(*argv, "type")) {
NEXT_ARG();
+ if (!REQ_ARGS(1))
+ return -1;

if (attr.map_type) {
p_err("map type already specified");
@@ -1119,6 +1124,8 @@ static int do_create(int argc, char **argv)
NEXT_ARG();
} else if (is_prefix(*argv, "name")) {
NEXT_ARG();
+ if (!REQ_ARGS(1))
+ return -1;
attr.name = GET_ARG();
} else if (is_prefix(*argv, "key")) {
if (parse_u32_arg(&argc, &argv, &attr.key_size,
@@ -1138,6 +1145,8 @@ static int do_create(int argc, char **argv)
return -1;
} else if (is_prefix(*argv, "dev")) {
NEXT_ARG();
+ if (!REQ_ARGS(1))
+ return -1;

if (attr.map_ifindex) {
p_err("offload device already specified");
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 8ef80d65a474..e0875ef5e444 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -836,8 +836,6 @@ static int parse_attach_detach_args(int argc, char **argv, int *progfd,
}

NEXT_ARG();
- if (!REQ_ARGS(2))
- return -EINVAL;

*mapfd = map_parse_fd(&argc, &argv);
if (*mapfd < 0)
--
2.20.1


2019-03-20 17:36:22

by Alban Crequy

[permalink] [raw]
Subject: [PATCH bpf-next v1 3/7] tools: bpftool: create map of maps

From: Alban Crequy <[email protected]>

Before this patch, there was no way to fill attr.inner_map_fd, necessary
for array_of_maps or hash_of_maps.

This patch adds keyword 'innermap' to pass the innermap, either as an id
or as a pinned map.

Example of commands:

$ sudo bpftool map create /sys/fs/bpf/innermap type hash \
key 8 value 8 entries 64 name innermap flags 1
$ sudo bpftool map create /sys/fs/bpf/outermap type hash_of_maps \
innermap pinned /sys/fs/bpf/innermap key 64 value 4 \
entries 64 name myoutermap flags 1
$ sudo bpftool map show pinned /sys/fs/bpf/outermap
47: hash_of_maps name myoutermap flags 0x1
key 64B value 4B max_entries 64 memlock 12288B

Documentation and bash completion updated as well.

Signed-off-by: Alban Crequy <[email protected]>

---

Previous version of this patch was sent while bpf-next was closed.
https://marc.info/?l=linux-kernel&m=155180393501258&w=2

Since then, the following changes were done:
- error management when calling map_parse_fd (review from Jakub)
- fix documentation and bash completion (review from Quentin)
---
.../bpf/bpftool/Documentation/bpftool-map.rst | 9 ++-
tools/bpf/bpftool/bash-completion/bpftool | 9 +++
tools/bpf/bpftool/map.c | 75 +++++++++++++++++--
3 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 5c984ffc9f01..b685641bfd74 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -23,7 +23,7 @@ MAP COMMANDS

| **bpftool** **map { show | list }** [*MAP*]
| **bpftool** **map create** *FILE* **type** *TYPE* **key** *KEY_SIZE* **value** *VALUE_SIZE* \
-| **entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**dev** *NAME*]
+| **entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**dev** *NAME*] [**innermap** MAP]
| **bpftool** **map dump** *MAP*
| **bpftool** **map update** *MAP* [**key** *DATA*] [**value** *VALUE*] [*UPDATE_FLAGS*]
| **bpftool** **map lookup** *MAP* [**key** *DATA*]
@@ -60,10 +60,15 @@ DESCRIPTION
Output will start with map ID followed by map type and
zero or more named attributes (depending on kernel version).

- **bpftool map create** *FILE* **type** *TYPE* **key** *KEY_SIZE* **value** *VALUE_SIZE* **entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**dev** *NAME*]
+ **bpftool map create** *FILE* **type** *TYPE* **key** *KEY_SIZE* **value** *VALUE_SIZE* **entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**dev** *NAME*] [**innermap** MAP]
Create a new map with given parameters and pin it to *bpffs*
as *FILE*.

+ To create a map of type **array_of_maps** or
+ **hash_of_maps**, the additional parameter **innermap** needs
+ to reference an existing map with the type and size of the
+ maps in the values.
+
**bpftool map dump** *MAP*
Dump all entries in a given *MAP*.

diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index b803827d01e8..9e37de8bb227 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -461,6 +461,14 @@ _bpftool()
_sysfs_get_netdevs
return 0
;;
+ innermap)
+ COMPREPLY+=( $( compgen -W "id pinned" -- "$cur" ) )
+ return 0
+ ;;
+ id)
+ _bpftool_get_map_ids
+ return 0
+ ;;
*)
_bpftool_once_attr 'type'
_bpftool_once_attr 'key'
@@ -469,6 +477,7 @@ _bpftool()
_bpftool_once_attr 'name'
_bpftool_once_attr 'flags'
_bpftool_once_attr 'dev'
+ _bpftool_once_attr 'innermap'
return 0
;;
esac
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 18f9bc3aed4f..a576f2a019be 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1100,6 +1100,7 @@ static int do_create(int argc, char **argv)
struct bpf_create_map_attr attr = { NULL, };
const char *pinfile;
int err, fd;
+ int inner_map_fd = -1;

if (!REQ_ARGS(7))
return -1;
@@ -1108,48 +1109,75 @@ static int do_create(int argc, char **argv)
while (argc) {
if (is_prefix(*argv, "type")) {
NEXT_ARG();
- if (!REQ_ARGS(1))
+ if (!REQ_ARGS(1)) {
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
+ }

if (attr.map_type) {
p_err("map type already specified");
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
}

attr.map_type = map_type_from_str(*argv);
if ((int)attr.map_type < 0) {
p_err("unrecognized map type: %s", *argv);
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
}
NEXT_ARG();
} else if (is_prefix(*argv, "name")) {
NEXT_ARG();
- if (!REQ_ARGS(1))
+ if (!REQ_ARGS(1)) {
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
+ }
attr.name = GET_ARG();
} else if (is_prefix(*argv, "key")) {
if (parse_u32_arg(&argc, &argv, &attr.key_size,
- "key size"))
+ "key size")) {
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
+ }
} else if (is_prefix(*argv, "value")) {
if (parse_u32_arg(&argc, &argv, &attr.value_size,
- "value size"))
+ "value size")) {
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
+ }
} else if (is_prefix(*argv, "entries")) {
if (parse_u32_arg(&argc, &argv, &attr.max_entries,
- "max entries"))
+ "max entries")) {
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
+ }
} else if (is_prefix(*argv, "flags")) {
if (parse_u32_arg(&argc, &argv, &attr.map_flags,
- "flags"))
+ "flags")) {
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
+ }
} else if (is_prefix(*argv, "dev")) {
NEXT_ARG();
- if (!REQ_ARGS(1))
+ if (!REQ_ARGS(1)) {
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
+ }

if (attr.map_ifindex) {
p_err("offload device already specified");
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
}

@@ -1157,28 +1185,59 @@ static int do_create(int argc, char **argv)
if (!attr.map_ifindex) {
p_err("unrecognized netdevice '%s': %s",
*argv, strerror(errno));
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
}
NEXT_ARG();
+ } else if (is_prefix(*argv, "innermap")) {
+ NEXT_ARG();
+ if (!REQ_ARGS(1)) {
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
+ return -1;
+ }
+
+ if (inner_map_fd != -1) {
+ close(inner_map_fd);
+ p_err("innermap already specified");
+ return -1;
+ }
+
+ inner_map_fd = map_parse_fd(&argc, &argv);
+ if (inner_map_fd < 0)
+ return -1;
} else {
p_err("unknown arg %s", *argv);
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
}
}

if (!attr.name) {
p_err("map name not specified");
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
}

set_max_rlimit();

+ if (inner_map_fd != -1)
+ attr.inner_map_fd = inner_map_fd;
+
fd = bpf_create_map_xattr(&attr);
if (fd < 0) {
p_err("map create failed: %s", strerror(errno));
+ if (inner_map_fd != -1)
+ close(inner_map_fd);
return -1;
}

+ if (inner_map_fd != -1)
+ close(inner_map_fd);
+
err = do_pin_fd(fd, pinfile);
close(fd);
if (err)
@@ -1243,7 +1302,7 @@ static int do_help(int argc, char **argv)
"Usage: %s %s { show | list } [MAP]\n"
" %s %s create FILE type TYPE key KEY_SIZE value VALUE_SIZE \\\n"
" entries MAX_ENTRIES name NAME [flags FLAGS] \\\n"
- " [dev NAME]\n"
+ " [dev NAME] [innermap MAP]\n"
" %s %s dump MAP\n"
" %s %s update MAP [key DATA] [value VALUE] [UPDATE_FLAGS]\n"
" %s %s lookup MAP [key DATA]\n"
--
2.20.1


2019-03-20 17:36:30

by Alban Crequy

[permalink] [raw]
Subject: [PATCH bpf-next v1 4/7] tools: bpftool: implement map exec command

From: Alban Crequy <[email protected]>

The map exec commands allows to open an existing map and pass the file
descriptor to a child process. This enables applications to use an
existing BPF map even when they don't support bpffs.

Example of usage:
# bpftool map exec pinned /sys/fs/bpf/foo fd 99 cmd -- readlink /proc/self/fd/99
anon_inode:bpf-map

Documentation and bash completion updated as well.

Signed-off-by: Alban Crequy <[email protected]>
---
.../bpf/bpftool/Documentation/bpftool-map.rst | 11 ++++
tools/bpf/bpftool/bash-completion/bpftool | 22 +++++++-
tools/bpf/bpftool/map.c | 53 +++++++++++++++++++
3 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index b685641bfd74..dfd8352fa453 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -30,6 +30,7 @@ MAP COMMANDS
| **bpftool** **map getnext** *MAP* [**key** *DATA*]
| **bpftool** **map delete** *MAP* **key** *DATA*
| **bpftool** **map pin** *MAP* *FILE*
+| **bpftool** **map exec** *MAP* **fd** *FD* **cmd** -- *CMD*
| **bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
| **bpftool** **map peek** *MAP*
| **bpftool** **map push** *MAP* **value** *VALUE*
@@ -101,6 +102,10 @@ DESCRIPTION
contain a dot character ('.'), which is reserved for future
extensions of *bpffs*.

+ **bpftool map exec** *MAP* **fd** *FD* **cmd** -- *CMD*
+ Load map and exec an external command, passing the file
+ descriptor.
+
**bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
Read events from a BPF_MAP_TYPE_PERF_EVENT_ARRAY map.

@@ -254,6 +259,12 @@ would be lost as soon as bpftool exits).
key: 00 00 00 00 value: 22 02 00 00
Found 1 element

+**# bpftool map exec pinned /sys/fs/bpf/foo fd 99 cmd -- readlink /proc/self/fd/99**
+
+::
+
+ anon_inode:bpf-map
+
SEE ALSO
========
**bpf**\ (2),
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 9e37de8bb227..63cd53c4d3a7 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -583,6 +583,26 @@ _bpftool()
fi
return 0
;;
+ exec)
+ case $prev in
+ $command)
+ COMPREPLY=( $( compgen -W "$MAP_TYPE" -- "$cur" ) )
+ return 0
+ ;;
+ fd)
+ return 0
+ ;;
+ cmd)
+ COMPREPLY=( $(compgen -c -- "$cur" ) )
+ return 0
+ ;;
+ *)
+ _bpftool_once_attr 'fd'
+ _bpftool_once_attr 'cmd'
+ return 0
+ ;;
+ esac
+ ;;
event_pipe)
case $prev in
$command)
@@ -610,7 +630,7 @@ _bpftool()
[[ $prev == $object ]] && \
COMPREPLY=( $( compgen -W 'delete dump getnext help \
lookup pin event_pipe show list update create \
- peek push enqueue pop dequeue' -- \
+ peek push enqueue pop dequeue exec' -- \
"$cur" ) )
;;
esac
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index a576f2a019be..768497364cee 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -883,6 +883,58 @@ static int do_update(int argc, char **argv)
return err;
}

+static int do_exec(int argc, char **argv)
+{
+ struct bpf_map_info info = {};
+ __u32 len = sizeof(info);
+ int fd, ret;
+ __u32 outfd = 0;
+
+ if (argc < 2)
+ usage();
+
+ fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+ if (fd < 0)
+ return -1;
+
+ while (argc) {
+ if (is_prefix(*argv, "fd")) {
+ if (parse_u32_arg(&argc, &argv, &outfd,
+ "out fd"))
+ return -1;
+ } else if (is_prefix(*argv, "cmd")){
+ NEXT_ARG();
+ if (!REQ_ARGS(1))
+ return -1;
+
+ break;
+ } else {
+ p_err("unknown arg %s", *argv);
+ return -1;
+ }
+ }
+
+ if (outfd == 0) {
+ p_err("invalid fd");
+ return -1;
+ }
+ if (*argv == NULL) {
+ p_err("empty command");
+ return -1;
+ }
+
+
+ ret = dup2(fd, (int)outfd);
+ if (ret == -1) {
+ p_err("dup2 failed: %s", strerror(errno));
+ return -1;
+ }
+
+ execvp(argv[0], argv);
+ p_err("execvp failed: %s", strerror(errno));
+ return -1;
+}
+
static void print_key_value(struct bpf_map_info *info, void *key,
void *value)
{
@@ -1355,6 +1407,7 @@ static const struct cmd cmds[] = {
{ "enqueue", do_update },
{ "pop", do_pop_dequeue },
{ "dequeue", do_pop_dequeue },
+ { "exec", do_exec },
{ 0 }
};

--
2.20.1


2019-03-20 17:37:22

by Alban Crequy

[permalink] [raw]
Subject: [PATCH bpf-next v1 6/7] tools: bpftool: fix bpffs mount when pinning subdirectories

From: Alban Crequy <[email protected]>

When trying to pin at a path such as /sys/fs/bpf/subdir/mymap while the
bpffs is mounted at the standard path and the "subdir" directory didn't
exist, bpftool was failing and attempting to mount the bpffs on
/sys/fs/bpf/subdir/. The mount obviously failed, since "subdir" didn't
exist.

Commit 3fc27b71b894 ("tools: bpftool: try to mount bpffs if required for
pinning objects") says:

The code for mnt_bpffs() is a copy of function bpf_mnt_fs() from
iproute2 package (under lib/bpf.c, taken at commit 4b73d52f8a81), with
modifications regarding handling of error messages.

However, the function bpf_mnt_fs() from iproute2 only works when its
parameter is the root of the bpffs (e.g. "/sys/fs/bfs").

iproute2/tc actually only mounts at the standard path "/sys/fs/bfs" or
at the path from the environment variable $TC_BPF_MNT. This patch
implements a similar strategy but uses the environment variable name
$BPFTOOL_BPF_MNT.

This patch still will not do the mkdir automatically. But at least,
there will be no error message about the mount.

Fixes: 3fc27b71b894 ("tools: bpftool: try to mount bpffs if required for pinning objects")
Cc: Quentin Monnet <[email protected]>
Signed-off-by: Alban Crequy <[email protected]>
---
tools/bpf/bpftool/common.c | 20 ++++++++++++++++----
tools/bpf/bpftool/main.h | 6 ++++++
2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index e560cd8f66bc..4783cbbe8037 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -168,6 +168,8 @@ int mount_bpffs_for_pin(const char *name)
char *file;
char *dir;
int err = 0;
+ const char *mnt_env = getenv(BPF_DIR_MNT_ENV);
+ static const char *mnt;

file = malloc(strlen(name) + 1);
strcpy(file, name);
@@ -183,13 +185,23 @@ int mount_bpffs_for_pin(const char *name)
goto out_free;
}

- err = mnt_fs(dir, "bpf", err_str, ERR_MAX_LEN);
+ mnt = mnt_env ? : BPF_DIR_MNT;
+
+ // Don't try to mount if wrong prefix: we don't want a leftover mount that is
+ // not going to help.
+ if (!is_prefix(mnt, dir)) {
+ p_err("refuse to mount BPF file system (%s) to pin the object (%s): wrong directory",
+ mnt, name);
+ err = -1;
+ goto out_free;
+ }
+
+ err = mnt_fs(mnt, "bpf", err_str, ERR_MAX_LEN);
if (err) {
err_str[ERR_MAX_LEN - 1] = '\0';
- p_err("can't mount BPF file system to pin the object (%s): %s",
- name, err_str);
+ p_err("can't mount BPF file system (%s) to pin the object (%s): %s",
+ mnt, name, err_str);
}
-
out_free:
free(file);
return err;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 7fc2973446d0..a2e28e600b72 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -18,6 +18,12 @@

#define ptr_to_u64(ptr) ((__u64)(unsigned long)(ptr))

+/* If bpffs is not mounted, it can be automatically mounted at this location.
+ * The location can be changed with the environment variable $BPFTOOL_BPF_MNT.
+ */
+#define BPF_DIR_MNT "/sys/fs/bpf"
+#define BPF_DIR_MNT_ENV "BPFTOOL_BPF_MNT"
+
#define NEXT_ARG() ({ argc--; argv++; if (argc < 0) usage(); })
#define NEXT_ARGP() ({ (*argc)--; (*argv)++; if (*argc < 0) usage(); })
#define BAD_ARG() ({ p_err("what is '%s'?", *argv); -1; })
--
2.20.1


2019-03-20 17:37:52

by Alban Crequy

[permalink] [raw]
Subject: [PATCH bpf-next v1 5/7] tools: bpftool: support loading map by fd from parent process

From: Alban Crequy <[email protected]>

Using a file descriptor passed by the parent process enables
applications to fork a bpftool command to inspect a map they know by
file descriptor even when they don't support bpffs or map ids.

Documentation and bash completion updated as well.

Signed-off-by: Alban Crequy <[email protected]>
---
tools/bpf/bpftool/Documentation/bpftool-map.rst | 8 ++++++++
tools/bpf/bpftool/bash-completion/bpftool | 10 +++++-----
tools/bpf/bpftool/map.c | 13 +++++++++++++
3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index dfd8352fa453..658fe2fb8ecf 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -265,6 +265,14 @@ would be lost as soon as bpftool exits).

anon_inode:bpf-map

+**# bpftool map exec pinned /sys/fs/bpf/foo fd 99 cmd -- bpftool map show fd 99**
+
+::
+
+ 10: hash name some_map flags 0x0
+ key 4B value 8B max_entries 2048 memlock 167936B
+
+
SEE ALSO
========
**bpf**\ (2),
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 63cd53c4d3a7..9ec1833bda1e 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -253,7 +253,7 @@ _bpftool()
esac

local PROG_TYPE='id pinned tag'
- local MAP_TYPE='id pinned'
+ local MAP_TYPE='id pinned fd'
case $command in
show|list)
[[ $prev != "$command" ]] && return 0
@@ -343,7 +343,7 @@ _bpftool()
obj=${words[3]}

if [[ ${words[-4]} == "map" ]]; then
- COMPREPLY=( $( compgen -W "id pinned" -- "$cur" ) )
+ COMPREPLY=( $( compgen -W "id pinned fd" -- "$cur" ) )
return 0
fi
if [[ ${words[-3]} == "map" ]]; then
@@ -406,7 +406,7 @@ _bpftool()
esac
;;
map)
- local MAP_TYPE='id pinned'
+ local MAP_TYPE='id pinned fd'
case $command in
show|list|dump|peek|pop|dequeue)
case $prev in
@@ -462,7 +462,7 @@ _bpftool()
return 0
;;
innermap)
- COMPREPLY+=( $( compgen -W "id pinned" -- "$cur" ) )
+ COMPREPLY+=( $( compgen -W "id pinned fd" -- "$cur" ) )
return 0
;;
id)
@@ -525,7 +525,7 @@ _bpftool()
# map, depending on the type of the map to update.
case "$(_bpftool_map_guess_map_type)" in
array_of_maps|hash_of_maps)
- local MAP_TYPE='id pinned'
+ local MAP_TYPE='id pinned fd'
COMPREPLY+=( $( compgen -W "$MAP_TYPE" \
-- "$cur" ) )
return 0
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 768497364cee..9befcabc299d 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -125,6 +125,19 @@ int map_parse_fd(int *argc, char ***argv)
NEXT_ARGP();

return open_obj_pinned_any(path, BPF_OBJ_MAP);
+ } else if (is_prefix(**argv, "fd")) {
+ char *endptr;
+
+ NEXT_ARGP();
+
+ fd = strtoul(**argv, &endptr, 0);
+ if (*endptr) {
+ p_err("can't parse %s as fd", **argv);
+ return -1;
+ }
+ NEXT_ARGP();
+
+ return dup(fd);
}

p_err("expected 'id' or 'pinned', got: '%s'?", **argv);
--
2.20.1


2019-03-20 17:38:55

by Alban Crequy

[permalink] [raw]
Subject: [PATCH bpf-next v1 7/7] tools: bpftool: add error message on map pinning failure

From: Alban Crequy <[email protected]>

Symptoms, when "dir" does not exist:

$ sudo bpftool map create /sys/fs/bpf/dir/fooB type hash key 8 value 8 entries 8 name fooB flags 1
$ echo $?
255

This patch prints an error message when the map pinning fails.

Signed-off-by: Alban Crequy <[email protected]>
---
tools/bpf/bpftool/map.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 9befcabc299d..c825eb1bbf8f 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1305,8 +1305,10 @@ static int do_create(int argc, char **argv)

err = do_pin_fd(fd, pinfile);
close(fd);
- if (err)
+ if (err) {
+ p_err("failed to pin map to %s", pinfile);
return err;
+ }

if (json_output)
jsonw_null(json_wtr);
--
2.20.1


2019-03-20 19:01:56

by Quentin Monnet

[permalink] [raw]
Subject: Re: [PATCH bpf-next v1 3/7] tools: bpftool: create map of maps

Hi Alban, thanks for those patches!

2019-03-20 18:33 UTC+0100 ~ Alban Crequy <[email protected]>
> From: Alban Crequy <[email protected]>
>
> Before this patch, there was no way to fill attr.inner_map_fd, necessary
> for array_of_maps or hash_of_maps.
>
> This patch adds keyword 'innermap' to pass the innermap, either as an id
> or as a pinned map.
>
> Example of commands:
>
> $ sudo bpftool map create /sys/fs/bpf/innermap type hash \
> key 8 value 8 entries 64 name innermap flags 1
> $ sudo bpftool map create /sys/fs/bpf/outermap type hash_of_maps \
> innermap pinned /sys/fs/bpf/innermap key 64 value 4 \
> entries 64 name myoutermap flags 1
> $ sudo bpftool map show pinned /sys/fs/bpf/outermap
> 47: hash_of_maps name myoutermap flags 0x1
> key 64B value 4B max_entries 64 memlock 12288B
>
> Documentation and bash completion updated as well.
>
> Signed-off-by: Alban Crequy <[email protected]>
>
> ---
>
> Previous version of this patch was sent while bpf-next was closed.
> https://marc.info/?l=linux-kernel&m=155180393501258&w=2
>
> Since then, the following changes were done:
> - error management when calling map_parse_fd (review from Jakub)
> - fix documentation and bash completion (review from Quentin)
> ---
> .../bpf/bpftool/Documentation/bpftool-map.rst | 9 ++-
> tools/bpf/bpftool/bash-completion/bpftool | 9 +++
> tools/bpf/bpftool/map.c | 75 +++++++++++++++++--
> 3 files changed, 83 insertions(+), 10 deletions(-)
>
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> index 5c984ffc9f01..b685641bfd74 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> @@ -23,7 +23,7 @@ MAP COMMANDS
>
> | **bpftool** **map { show | list }** [*MAP*]
> | **bpftool** **map create** *FILE* **type** *TYPE* **key** *KEY_SIZE* **value** *VALUE_SIZE* \
> -| **entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**dev** *NAME*]
> +| **entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**dev** *NAME*] [**innermap** MAP]
> | **bpftool** **map dump** *MAP*
> | **bpftool** **map update** *MAP* [**key** *DATA*] [**value** *VALUE*] [*UPDATE_FLAGS*]
> | **bpftool** **map lookup** *MAP* [**key** *DATA*]
> @@ -60,10 +60,15 @@ DESCRIPTION
> Output will start with map ID followed by map type and
> zero or more named attributes (depending on kernel version).
>
> - **bpftool map create** *FILE* **type** *TYPE* **key** *KEY_SIZE* **value** *VALUE_SIZE* **entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**dev** *NAME*]
> + **bpftool map create** *FILE* **type** *TYPE* **key** *KEY_SIZE* **value** *VALUE_SIZE* **entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**dev** *NAME*] [**innermap** MAP]
> Create a new map with given parameters and pin it to *bpffs*
> as *FILE*.
>
> + To create a map of type **array_of_maps** or
> + **hash_of_maps**, the additional parameter **innermap** needs
> + to reference an existing map with the type and size of the
> + maps in the values.
> +
> **bpftool map dump** *MAP*
> Dump all entries in a given *MAP*.
>
> diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
> index b803827d01e8..9e37de8bb227 100644
> --- a/tools/bpf/bpftool/bash-completion/bpftool
> +++ b/tools/bpf/bpftool/bash-completion/bpftool
> @@ -461,6 +461,14 @@ _bpftool()
> _sysfs_get_netdevs
> return 0
> ;;
> + innermap)
> + COMPREPLY+=( $( compgen -W "id pinned" -- "$cur" ) )
> + return 0
> + ;;
> + id)
> + _bpftool_get_map_ids
> + return 0
> + ;;
> *)
> _bpftool_once_attr 'type'
> _bpftool_once_attr 'key'
> @@ -469,6 +477,7 @@ _bpftool()
> _bpftool_once_attr 'name'
> _bpftool_once_attr 'flags'
> _bpftool_once_attr 'dev'
> + _bpftool_once_attr 'innermap'
> return 0
> ;;
> esac
> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index 18f9bc3aed4f..a576f2a019be 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -1100,6 +1100,7 @@ static int do_create(int argc, char **argv)
> struct bpf_create_map_attr attr = { NULL, };
> const char *pinfile;
> int err, fd;
> + int inner_map_fd = -1;

Nit: reverse-Christmas tree for declaring the variables.

>
> if (!REQ_ARGS(7))
> return -1;
> @@ -1108,48 +1109,75 @@ static int do_create(int argc, char **argv)
> while (argc) {
> if (is_prefix(*argv, "type")) {
> NEXT_ARG();
> - if (!REQ_ARGS(1))
> + if (!REQ_ARGS(1)) {
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> + }
>
> if (attr.map_type) {
> p_err("map type already specified");
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> }
>
> attr.map_type = map_type_from_str(*argv);
> if ((int)attr.map_type < 0) {
> p_err("unrecognized map type: %s", *argv);
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> }
> NEXT_ARG();
> } else if (is_prefix(*argv, "name")) {
> NEXT_ARG();
> - if (!REQ_ARGS(1))
> + if (!REQ_ARGS(1)) {
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> + }
> attr.name = GET_ARG();
> } else if (is_prefix(*argv, "key")) {
> if (parse_u32_arg(&argc, &argv, &attr.key_size,
> - "key size"))
> + "key size")) {
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> + }
> } else if (is_prefix(*argv, "value")) {
> if (parse_u32_arg(&argc, &argv, &attr.value_size,
> - "value size"))
> + "value size")) {
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> + }
> } else if (is_prefix(*argv, "entries")) {
> if (parse_u32_arg(&argc, &argv, &attr.max_entries,
> - "max entries"))
> + "max entries")) {
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> + }
> } else if (is_prefix(*argv, "flags")) {
> if (parse_u32_arg(&argc, &argv, &attr.map_flags,
> - "flags"))
> + "flags")) {
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> + }
> } else if (is_prefix(*argv, "dev")) {
> NEXT_ARG();
> - if (!REQ_ARGS(1))
> + if (!REQ_ARGS(1)) {
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> + }
>
> if (attr.map_ifindex) {
> p_err("offload device already specified");
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> }
>
> @@ -1157,28 +1185,59 @@ static int do_create(int argc, char **argv)
> if (!attr.map_ifindex) {
> p_err("unrecognized netdevice '%s': %s",
> *argv, strerror(errno));
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> }
> NEXT_ARG();
> + } else if (is_prefix(*argv, "innermap")) {
> + NEXT_ARG();
> + if (!REQ_ARGS(1)) {
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> + return -1;
> + }
> +
> + if (inner_map_fd != -1) {
> + close(inner_map_fd);
> + p_err("innermap already specified");
> + return -1;
> + }
> +
> + inner_map_fd = map_parse_fd(&argc, &argv);
> + if (inner_map_fd < 0)
> + return -1;
> } else {
> p_err("unknown arg %s", *argv);
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> }
> }
>
> if (!attr.name) {
> p_err("map name not specified");
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;
> }
>
> set_max_rlimit();
>
> + if (inner_map_fd != -1)
> + attr.inner_map_fd = inner_map_fd;
> +
> fd = bpf_create_map_xattr(&attr);
> if (fd < 0) {
> p_err("map create failed: %s", strerror(errno));
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> return -1;

You might want to move all those

if (inner_map_fd != -1)
close(inner_map_fd);
return -1;

at the end of the function under a label, and jump there with a goto for
the 15 or so occurrences of the pattern?

> }
>
> + if (inner_map_fd != -1)
> + close(inner_map_fd);
> +
> err = do_pin_fd(fd, pinfile);
> close(fd);
> if (err)
> @@ -1243,7 +1302,7 @@ static int do_help(int argc, char **argv)
> "Usage: %s %s { show | list } [MAP]\n"
> " %s %s create FILE type TYPE key KEY_SIZE value VALUE_SIZE \\\n"
> " entries MAX_ENTRIES name NAME [flags FLAGS] \\\n"
> - " [dev NAME]\n"
> + " [dev NAME] [innermap MAP]\n"
> " %s %s dump MAP\n"
> " %s %s update MAP [key DATA] [value VALUE] [UPDATE_FLAGS]\n"
> " %s %s lookup MAP [key DATA]\n"
>


2019-03-20 19:02:14

by Quentin Monnet

[permalink] [raw]
Subject: Re: [PATCH bpf-next v1 4/7] tools: bpftool: implement map exec command

2019-03-20 18:33 UTC+0100 ~ Alban Crequy <[email protected]>
> From: Alban Crequy <[email protected]>
>
> The map exec commands allows to open an existing map and pass the file
> descriptor to a child process. This enables applications to use an
> existing BPF map even when they don't support bpffs.
>
> Example of usage:
> # bpftool map exec pinned /sys/fs/bpf/foo fd 99 cmd -- readlink /proc/self/fd/99
> anon_inode:bpf-map
>
> Documentation and bash completion updated as well.
>
> Signed-off-by: Alban Crequy <[email protected]>
> ---
> .../bpf/bpftool/Documentation/bpftool-map.rst | 11 ++++
> tools/bpf/bpftool/bash-completion/bpftool | 22 +++++++-
> tools/bpf/bpftool/map.c | 53 +++++++++++++++++++
> 3 files changed, 85 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> index b685641bfd74..dfd8352fa453 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> @@ -30,6 +30,7 @@ MAP COMMANDS
> | **bpftool** **map getnext** *MAP* [**key** *DATA*]
> | **bpftool** **map delete** *MAP* **key** *DATA*
> | **bpftool** **map pin** *MAP* *FILE*
> +| **bpftool** **map exec** *MAP* **fd** *FD* **cmd** -- *CMD*
> | **bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
> | **bpftool** **map peek** *MAP*
> | **bpftool** **map push** *MAP* **value** *VALUE*
> @@ -101,6 +102,10 @@ DESCRIPTION
> contain a dot character ('.'), which is reserved for future
> extensions of *bpffs*.
>
> + **bpftool map exec** *MAP* **fd** *FD* **cmd** -- *CMD*
> + Load map and exec an external command, passing the file
> + descriptor.
> +

Please use tabs for indent.

Could you please add a line to explain that the process spawned get the
the link to the map in the file descriptor passed as *FD*? I'm being
pedantic, I know it's not that hard to understand with the examples,
just trying to make things as clear as possible for users reading the page.

I have nothing against the double dash in the examples or the commit
log, but putting it in the description in the command itself makes it
look like it is some kind of mandatory syntax. Would you consider
removing the occurrences above?

We don't really "exec" the map... Just thinking... Would another keyword
like "pass" or "passfd" be more relevant?

> **bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
> Read events from a BPF_MAP_TYPE_PERF_EVENT_ARRAY map.
>
> @@ -254,6 +259,12 @@ would be lost as soon as bpftool exits).
> key: 00 00 00 00 value: 22 02 00 00
> Found 1 element
>
> +**# bpftool map exec pinned /sys/fs/bpf/foo fd 99 cmd -- readlink /proc/self/fd/99**
> +
> +::
> +
> + anon_inode:bpf-map
> +
> SEE ALSO
> ========
> **bpf**\ (2),
> diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
> index 9e37de8bb227..63cd53c4d3a7 100644
> --- a/tools/bpf/bpftool/bash-completion/bpftool
> +++ b/tools/bpf/bpftool/bash-completion/bpftool
> @@ -583,6 +583,26 @@ _bpftool()
> fi
> return 0
> ;;
> + exec)
> + case $prev in
> + $command)
> + COMPREPLY=( $( compgen -W "$MAP_TYPE" -- "$cur" ) )
> + return 0
> + ;;
> + fd)
> + return 0
> + ;;
> + cmd)
> + COMPREPLY=( $(compgen -c -- "$cur" ) )
> + return 0
> + ;;
> + *)
> + _bpftool_once_attr 'fd'
> + _bpftool_once_attr 'cmd'
> + return 0
> + ;;
> + esac
> + ;;
> event_pipe)
> case $prev in
> $command)
> @@ -610,7 +630,7 @@ _bpftool()
> [[ $prev == $object ]] && \
> COMPREPLY=( $( compgen -W 'delete dump getnext help \
> lookup pin event_pipe show list update create \
> - peek push enqueue pop dequeue' -- \
> + peek push enqueue pop dequeue exec' -- \
> "$cur" ) )
> ;;
> esac
> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index a576f2a019be..768497364cee 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -883,6 +883,58 @@ static int do_update(int argc, char **argv)
> return err;
> }
>
> +static int do_exec(int argc, char **argv)
> +{
> + struct bpf_map_info info = {};
> + __u32 len = sizeof(info);
> + int fd, ret;
> + __u32 outfd = 0;

Reverse-Christmas tree.

> +
> + if (argc < 2)
> + usage();
> +
> + fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
> + if (fd < 0)
> + return -1;
> +
> + while (argc) {
> + if (is_prefix(*argv, "fd")) {
> + if (parse_u32_arg(&argc, &argv, &outfd,
> + "out fd"))
> + return -1;
> + } else if (is_prefix(*argv, "cmd")){
> + NEXT_ARG();
> + if (!REQ_ARGS(1))
> + return -1;
> +
> + break;
> + } else {
> + p_err("unknown arg %s", *argv);
> + return -1;
> + }
> + }
> +
> + if (outfd == 0) {
> + p_err("invalid fd");
> + return -1;
> + }
> + if (*argv == NULL) {
> + p_err("empty command");
> + return -1;
> + }
> +
> +

Spurious blank line.

> + ret = dup2(fd, (int)outfd);
> + if (ret == -1) {
> + p_err("dup2 failed: %s", strerror(errno));
> + return -1;
> + }
> +
> + execvp(argv[0], argv);
> + p_err("execvp failed: %s", strerror(errno));
> + return -1;
> +}
> +
> static void print_key_value(struct bpf_map_info *info, void *key,
> void *value)
> {
> @@ -1355,6 +1407,7 @@ static const struct cmd cmds[] = {
> { "enqueue", do_update },
> { "pop", do_pop_dequeue },
> { "dequeue", do_pop_dequeue },
> + { "exec", do_exec },
> { 0 }
> };
>
>

Please also add the new command in the interactive help (do_help()).

By the way, I like this new command a lot!

2019-03-20 19:02:25

by Quentin Monnet

[permalink] [raw]
Subject: Re: [PATCH bpf-next v1 6/7] tools: bpftool: fix bpffs mount when pinning subdirectories

2019-03-20 18:33 UTC+0100 ~ Alban Crequy <[email protected]>
> From: Alban Crequy <[email protected]>
>
> When trying to pin at a path such as /sys/fs/bpf/subdir/mymap while the
> bpffs is mounted at the standard path and the "subdir" directory didn't
> exist, bpftool was failing and attempting to mount the bpffs on
> /sys/fs/bpf/subdir/. The mount obviously failed, since "subdir" didn't
> exist.
>
> Commit 3fc27b71b894 ("tools: bpftool: try to mount bpffs if required for
> pinning objects") says:
>
> The code for mnt_bpffs() is a copy of function bpf_mnt_fs() from
> iproute2 package (under lib/bpf.c, taken at commit 4b73d52f8a81), with
> modifications regarding handling of error messages.
>
> However, the function bpf_mnt_fs() from iproute2 only works when its
> parameter is the root of the bpffs (e.g. "/sys/fs/bfs").
For bpftool, the point was to be able to mount the bpffs at any mount
point the user wanted to use. So one way to fix that would be to simply
prevent the mount attempt if the directory does not exist. Another one
would be to attempt to create the directories (but I guess that sounds
more hazardous already). I'm not really sure we should restrict mounting
to the default /sys/fs/bpf. I understand the environment variable still
allows it, so why not, but I find it less straightforward. Maybe others
will have an opinion.

>
> iproute2/tc actually only mounts at the standard path "/sys/fs/bfs" or
> at the path from the environment variable $TC_BPF_MNT. This patch
> implements a similar strategy but uses the environment variable name
> $BPFTOOL_BPF_MNT.
>
> This patch still will not do the mkdir automatically. But at least,
> there will be no error message about the mount.
>
> Fixes: 3fc27b71b894 ("tools: bpftool: try to mount bpffs if required for pinning objects")
> Cc: Quentin Monnet <[email protected]>
> Signed-off-by: Alban Crequy <[email protected]>
> ---
> tools/bpf/bpftool/common.c | 20 ++++++++++++++++----
> tools/bpf/bpftool/main.h | 6 ++++++
> 2 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index e560cd8f66bc..4783cbbe8037 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c
> @@ -168,6 +168,8 @@ int mount_bpffs_for_pin(const char *name)
> char *file;
> char *dir;
> int err = 0;
> + const char *mnt_env = getenv(BPF_DIR_MNT_ENV);
> + static const char *mnt;

Reverse-Christmas tree style for the declarations, please.

>
> file = malloc(strlen(name) + 1);
> strcpy(file, name);
> @@ -183,13 +185,23 @@ int mount_bpffs_for_pin(const char *name)
> goto out_free;
> }
>
> - err = mnt_fs(dir, "bpf", err_str, ERR_MAX_LEN);
> + mnt = mnt_env ? : BPF_DIR_MNT;
> +
> + // Don't try to mount if wrong prefix: we don't want a leftover mount that is
> + // not going to help.

Please stick to /* */-style comments.

> + if (!is_prefix(mnt, dir)) {
> + p_err("refuse to mount BPF file system (%s) to pin the object (%s): wrong directory",

Not all users know that bpftool attempts to mount the bpffs, I fear this
message might sound a bit obscure. Maybe something along
"<dir> is not in bpffs or under an expected mountpoint, cannot pin ..."?

> + mnt, name);
> + err = -1;
> + goto out_free;
> + }
> +
> + err = mnt_fs(mnt, "bpf", err_str, ERR_MAX_LEN);
> if (err) {
> err_str[ERR_MAX_LEN - 1] = '\0';
> - p_err("can't mount BPF file system to pin the object (%s): %s",
> - name, err_str);
> + p_err("can't mount BPF file system (%s) to pin the object (%s): %s",
> + mnt, name, err_str);
> }
> -

No need to remove the empty line?

> out_free:
> free(file);
> return err;
> diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
> index 7fc2973446d0..a2e28e600b72 100644
> --- a/tools/bpf/bpftool/main.h
> +++ b/tools/bpf/bpftool/main.h
> @@ -18,6 +18,12 @@
>
> #define ptr_to_u64(ptr) ((__u64)(unsigned long)(ptr))
>
> +/* If bpffs is not mounted, it can be automatically mounted at this location.
> + * The location can be changed with the environment variable $BPFTOOL_BPF_MNT.

Could you please document that in the manual pages? Otherwise very few
people will ever discover this variable. I don't think we have mentioned
anywhere that we try to mount the file system yet, but we could have it
in both bpftool-prog and bpftool-map I suppose.

> + */
> +#define BPF_DIR_MNT "/sys/fs/bpf"
> +#define BPF_DIR_MNT_ENV "BPFTOOL_BPF_MNT"

Could we rename those macros into BPFFS_MNT_* maybe?

> +
> #define NEXT_ARG() ({ argc--; argv++; if (argc < 0) usage(); })
> #define NEXT_ARGP() ({ (*argc)--; (*argv)++; if (*argc < 0) usage(); })
> #define BAD_ARG() ({ p_err("what is '%s'?", *argv); -1; })
>


2019-03-20 19:03:51

by Quentin Monnet

[permalink] [raw]
Subject: Re: [PATCH bpf-next v1 7/7] tools: bpftool: add error message on map pinning failure

2019-03-20 18:33 UTC+0100 ~ Alban Crequy <[email protected]>
> From: Alban Crequy <[email protected]>
>
> Symptoms, when "dir" does not exist:
>
> $ sudo bpftool map create /sys/fs/bpf/dir/fooB type hash key 8 value 8 entries 8 name fooB flags 1
> $ echo $?
> 255
>
> This patch prints an error message when the map pinning fails.
>
> Signed-off-by: Alban Crequy <[email protected]>
> ---
> tools/bpf/bpftool/map.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index 9befcabc299d..c825eb1bbf8f 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -1305,8 +1305,10 @@ static int do_create(int argc, char **argv)
>
> err = do_pin_fd(fd, pinfile);
> close(fd);
> - if (err)
> + if (err) {
> + p_err("failed to pin map to %s", pinfile);
> return err;
> + }

do_pin_fd() is already susceptible to print an error message, and this
may lead to several messages (thus breaking JSON output).

Instead, you could change do_pin_fd() to store the result from
bpf_obj_pin(), compare it, and print the message there if non-0.

>
> if (json_output)
> jsonw_null(json_wtr);
>


2019-03-20 19:06:18

by Quentin Monnet

[permalink] [raw]
Subject: Re: [PATCH bpf-next v1 5/7] tools: bpftool: support loading map by fd from parent process

2019-03-20 18:33 UTC+0100 ~ Alban Crequy <[email protected]>
> From: Alban Crequy <[email protected]>
>
> Using a file descriptor passed by the parent process enables
> applications to fork a bpftool command to inspect a map they know by
> file descriptor even when they don't support bpffs or map ids.
>
> Documentation and bash completion updated as well.
>
> Signed-off-by: Alban Crequy <[email protected]>
> ---
> tools/bpf/bpftool/Documentation/bpftool-map.rst | 8 ++++++++
> tools/bpf/bpftool/bash-completion/bpftool | 10 +++++-----
> tools/bpf/bpftool/map.c | 13 +++++++++++++
> 3 files changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> index dfd8352fa453..658fe2fb8ecf 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> @@ -265,6 +265,14 @@ would be lost as soon as bpftool exits).
>
> anon_inode:bpf-map
>
> +**# bpftool map exec pinned /sys/fs/bpf/foo fd 99 cmd -- bpftool map show fd 99**
> +
> +::
> +
> + 10: hash name some_map flags 0x0
> + key 4B value 8B max_entries 2048 memlock 167936B
> +
> +

Could you please also update the description of the command higher in
that page, and in the interactive help?

2019-03-20 20:57:01

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH bpf-next v1 1/7] tools: bpftool: fix infinite loop

On Wed, 20 Mar 2019 18:33:26 +0100, Alban Crequy wrote:
> From: Alban Crequy <[email protected]>
>
> Symptoms: when forgetting to type the keyword 'type' in front of 'hash':
> $ sudo bpftool map create /sys/fs/bpf/dir/foobar hash key 8 value 8 entries 128
> (infinite loop, taking all the CPU)
> ^C
>

Please add:

Fixes: 0b592b5a01be ("tools: bpftool: add map create command")

here. And submit this one for the bpf tree.

> Signed-off-by: Alban Crequy <[email protected]>

> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index e0c650d91784..994a7e0d16fb 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -1151,6 +1151,9 @@ static int do_create(int argc, char **argv)
> return -1;
> }
> NEXT_ARG();
> + } else {
> + p_err("unknown arg %s", *argv);
> + return -1;
> }
> }
>

2019-03-20 21:03:27

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH bpf-next v1 2/7] tools: bpftool: fix error message on invalid argument count

On Wed, 20 Mar 2019 18:33:27 +0100, Alban Crequy wrote:
> From: Alban Crequy <[email protected]>
>
> Symptoms:
> $ sudo bpftool map create /sys/fs/bpf/dir/foobar type hash key 8 value 8 entries
> Error: '8' needs at least 2 arguments, 1 found
>
> After this patch:
> $ sudo bpftool map create /sys/fs/bpf/dir/foobar type hash key 8 value 8 entries
> Error: can't parse max entries: missing argument
>
> Signed-off-by: Alban Crequy <[email protected]>
> ---
> tools/bpf/bpftool/common.c | 5 +++++
> tools/bpf/bpftool/main.h | 11 +++++++----
> tools/bpf/bpftool/map.c | 15 ++++++++++++---
> tools/bpf/bpftool/prog.c | 2 --
> 4 files changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index f7261fad45c1..e560cd8f66bc 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c
> @@ -627,6 +627,11 @@ int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what)
> return -1;
> }
>
> + if (*argc == 0) {
> + p_err("can't parse %s: missing argument", what);
> + return -1;
> + }
>
> *val = strtoul(**argv, &endptr, 0);
> if (*endptr) {
> p_err("can't parse %s as %s", **argv, what);
> diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
> index d7dd84d3c660..7fc2973446d0 100644
> --- a/tools/bpf/bpftool/main.h
> +++ b/tools/bpf/bpftool/main.h
> @@ -22,14 +22,17 @@
> #define NEXT_ARGP() ({ (*argc)--; (*argv)++; if (*argc < 0) usage(); })
> #define BAD_ARG() ({ p_err("what is '%s'?", *argv); -1; })
> #define GET_ARG() ({ argc--; *argv++; })
> -#define REQ_ARGS(cnt) \
> +
> +#define REQ_ARGS(cnt) REQ_ARGS_GENERIC(cnt, argc, argv)
> +#define REQ_ARGSP(cnt) REQ_ARGS_GENERIC(cnt, *argc, *argv)
> +#define REQ_ARGS_GENERIC(cnt, ARGC, ARGV) \
> ({ \
> int _cnt = (cnt); \
> bool _res; \
> \
> - if (argc < _cnt) { \
> - p_err("'%s' needs at least %d arguments, %d found", \
> - argv[-1], _cnt, argc); \
> + if ((ARGC) < _cnt) { \
> + p_err("'%s' needs at least %d arguments, %d found", \
> + (ARGV)[-1], _cnt, (ARGC)); \
> _res = false; \
> } else { \
> _res = true; \
> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index 994a7e0d16fb..18f9bc3aed4f 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -98,6 +98,9 @@ int map_parse_fd(int *argc, char ***argv)
> char *endptr;
>
> NEXT_ARGP();
> + if (!REQ_ARGSP(1)) {
> + return -1;
> + }
>
> id = strtoul(**argv, &endptr, 0);
> if (*endptr) {
> @@ -114,6 +117,9 @@ int map_parse_fd(int *argc, char ***argv)
> char *path;
>
> NEXT_ARGP();
> + if (!REQ_ARGSP(1)) {
> + return -1;
> + }
>
> path = **argv;
> NEXT_ARGP();
> @@ -1100,11 +1106,10 @@ static int do_create(int argc, char **argv)
> pinfile = GET_ARG();
>
> while (argc) {
> - if (!REQ_ARGS(2))
> - return -1;

Seems like it'd be better to make a version of REQ_ARGS() which
uses the next arg in the error message (if it exists) rather than
pushing all the checks out?

I think the format of the parsing loop is quite canonical here, with
all arguments being in the <keyword> <value> form, therefore I think
it's worthwhile improving the REQ_ARGS(2) check.

> if (is_prefix(*argv, "type")) {
> NEXT_ARG();
> + if (!REQ_ARGS(1))
> + return -1;
>
> if (attr.map_type) {
> p_err("map type already specified");
> @@ -1119,6 +1124,8 @@ static int do_create(int argc, char **argv)
> NEXT_ARG();
> } else if (is_prefix(*argv, "name")) {
> NEXT_ARG();
> + if (!REQ_ARGS(1))
> + return -1;
> attr.name = GET_ARG();
> } else if (is_prefix(*argv, "key")) {
> if (parse_u32_arg(&argc, &argv, &attr.key_size,
> @@ -1138,6 +1145,8 @@ static int do_create(int argc, char **argv)
> return -1;
> } else if (is_prefix(*argv, "dev")) {
> NEXT_ARG();
> + if (!REQ_ARGS(1))
> + return -1;
>
> if (attr.map_ifindex) {
> p_err("offload device already specified");
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 8ef80d65a474..e0875ef5e444 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -836,8 +836,6 @@ static int parse_attach_detach_args(int argc, char **argv, int *progfd,
> }
>
> NEXT_ARG();
> - if (!REQ_ARGS(2))
> - return -EINVAL;
>
> *mapfd = map_parse_fd(&argc, &argv);
> if (*mapfd < 0)


2019-03-20 21:24:59

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH bpf-next v1 4/7] tools: bpftool: implement map exec command

On Wed, 20 Mar 2019 18:33:29 +0100, Alban Crequy wrote:
> From: Alban Crequy <[email protected]>
>
> The map exec commands allows to open an existing map and pass the file
> descriptor to a child process. This enables applications to use an
> existing BPF map even when they don't support bpffs.
>
> Example of usage:
> # bpftool map exec pinned /sys/fs/bpf/foo fd 99 cmd -- readlink /proc/self/fd/99
> anon_inode:bpf-map

Would you mind telling us a little more about the use for this feature?
It seems fairly limited. If it's about probing objects (finding out if
they are a map or a program) perhaps we can add a command just for that?

(I guess bpftool -f isn't really the cleanest way of getting at that
info.)

> Documentation and bash completion updated as well.
>
> Signed-off-by: Alban Crequy <[email protected]>

2019-03-21 09:59:20

by Alban Crequy

[permalink] [raw]
Subject: Re: [PATCH bpf-next v1 4/7] tools: bpftool: implement map exec command

On Wed, Mar 20, 2019 at 10:23 PM Jakub Kicinski
<[email protected]> wrote:
>
> On Wed, 20 Mar 2019 18:33:29 +0100, Alban Crequy wrote:
> > From: Alban Crequy <[email protected]>
> >
> > The map exec commands allows to open an existing map and pass the file
> > descriptor to a child process. This enables applications to use an
> > existing BPF map even when they don't support bpffs.
> >
> > Example of usage:
> > # bpftool map exec pinned /sys/fs/bpf/foo fd 99 cmd -- readlink /proc/self/fd/99
> > anon_inode:bpf-map
>
> Would you mind telling us a little more about the use for this feature?
> It seems fairly limited. If it's about probing objects (finding out if
> they are a map or a program) perhaps we can add a command just for that?

I needed to know the name of the map too. I was preparing a demo based
on python bcc tools (opensnoop) but with added feature that requires
using a pinned map, created and maintained externally. At the moment,
the python API for bcc does not support pinning or using external
maps. Ideally, this should be added in the python API (some discussion
on https://github.com/iovisor/bcc/issues/2223) but meanwhile, I use a
workaround by executing bpftool from the python code.

Arguably, my use case is a temporary hack until we have better support
in python bcc. But other tools implements similar commands to pass
file descriptors between processes: "ip netns exec" and "tc exec bpf".
So I think it could be useful for other scripting use cases.

In my demo, I used the two hacks:
- if the pinned map fd is not given to the python script, re-execute
itself with bpftool:
os.execvp("bpftool", ["bpftool", "map", "exec", "pinned", pin_path,
"fd", "90", "cmd", "--"] + sys.argv)
- once we have the fd 90 (number specified above) of the pinned map in
the python script, overwrite the empty fd created by bcc:
os.dup2(90, 6)
I call dup2() between the bpf map creation and the bpf program
creation. To check which map fd to overwrite, I just call
os.system("bpftool map show fd 6...").


Thanks a lot for the reviews. I'll need some time to address it (maybe
a week or 2).

> (I guess bpftool -f isn't really the cleanest way of getting at that
> info.)
>
> > Documentation and bash completion updated as well.
> >
> > Signed-off-by: Alban Crequy <[email protected]>

2019-03-21 20:17:39

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH bpf-next v1 4/7] tools: bpftool: implement map exec command

On Thu, 21 Mar 2019 10:57:32 +0100, Alban Crequy wrote:
> On Wed, Mar 20, 2019 at 10:23 PM Jakub Kicinski
> <[email protected]> wrote:
> >
> > On Wed, 20 Mar 2019 18:33:29 +0100, Alban Crequy wrote:
> > > From: Alban Crequy <[email protected]>
> > >
> > > The map exec commands allows to open an existing map and pass the file
> > > descriptor to a child process. This enables applications to use an
> > > existing BPF map even when they don't support bpffs.
> > >
> > > Example of usage:
> > > # bpftool map exec pinned /sys/fs/bpf/foo fd 99 cmd -- readlink /proc/self/fd/99
> > > anon_inode:bpf-map
> >
> > Would you mind telling us a little more about the use for this feature?
> > It seems fairly limited. If it's about probing objects (finding out if
> > they are a map or a program) perhaps we can add a command just for that?
>
> I needed to know the name of the map too. I was preparing a demo based
> on python bcc tools (opensnoop) but with added feature that requires
> using a pinned map, created and maintained externally. At the moment,
> the python API for bcc does not support pinning or using external
> maps. Ideally, this should be added in the python API (some discussion
> on https://github.com/iovisor/bcc/issues/2223) but meanwhile, I use a
> workaround by executing bpftool from the python code.
>
> Arguably, my use case is a temporary hack until we have better support
> in python bcc. But other tools implements similar commands to pass
> file descriptors between processes: "ip netns exec" and "tc exec bpf".
> So I think it could be useful for other scripting use cases.

The thing is the receiver of the FD has to be bpf-aware, because there
isn't really much one can do with that file descriptor, in which case
it's kind of strange that the receiver doesn't know how to open a
pinned object..

> In my demo, I used the two hacks:
> - if the pinned map fd is not given to the python script, re-execute
> itself with bpftool:
> os.execvp("bpftool", ["bpftool", "map", "exec", "pinned", pin_path,
> "fd", "90", "cmd", "--"] + sys.argv)
> - once we have the fd 90 (number specified above) of the pinned map in
> the python script, overwrite the empty fd created by bcc:
> os.dup2(90, 6)
> I call dup2() between the bpf map creation and the bpf program
> creation. To check which map fd to overwrite, I just call
> os.system("bpftool map show fd 6...").

I see, thanks for the explanation. That does indeed seems like a hack.

> Thanks a lot for the reviews. I'll need some time to address it (maybe
> a week or 2).
>
> > (I guess bpftool -f isn't really the cleanest way of getting at that
> > info.)
> >
> > > Documentation and bash completion updated as well.
> > >
> > > Signed-off-by: Alban Crequy <[email protected]>