Currently bpf uses the memlock rlimit for memory accounting.
This approach has its downsides, and over time it has created a
significant number of problems:
1) The limit is per-user, but because most bpf operations are performed
as root, the limit has little value.
2) It's hard to come up with a specific maximum value, especially because
the counter is shared with non-bpf users (e.g. mlock() users).
Any specific value is either too low, creating false failures,
or too high, making it useless.
3) Charging is not connected to the actual memory allocation. Bpf code
has to manually calculate the estimated cost, precharge the counter,
and then take care of uncharging, including on all fail paths (see
the sketch after this list). This adds to the code complexity and
makes it easy to leak a charge.
4) There is no simple way of getting the current value of the counter.
We've used drgn for it, but it's far from convenient.
5) A cryptic -EPERM is returned when the limit is exceeded. Libbpf even
had a function to "explain" this case to users.
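For illustration, here is a condensed sketch of the rlimit-era pattern
this series removes. The map type and node struct are hypothetical
(an example_map is assumed to embed a struct bpf_map), but the shape
follows the trie_alloc() and cpu_map_alloc() hunks later in the series:

static struct bpf_map *example_map_alloc(union bpf_attr *attr)
{
        struct example_map *map;        /* hypothetical, embeds struct bpf_map */
        u64 cost;
        int ret;

        map = kzalloc(sizeof(*map), GFP_USER | __GFP_NOWARN);
        if (!map)
                return ERR_PTR(-ENOMEM);

        /* manually estimate the memory footprint... */
        cost = sizeof(*map);
        cost += (u64) attr->max_entries * sizeof(struct example_node);

        /* ...precharge it against the memlock rlimit... */
        ret = bpf_map_charge_init(&map->map.memory, cost);
        if (ret) {
                /* ...and remember to uncharge/free on every fail path */
                kfree(map);
                return ERR_PTR(ret);    /* cryptic -EPERM if over the limit */
        }

        return &map->map;
}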
To overcome these problems, let's switch to memcg-based memory
accounting for bpf objects. With the recent addition of percpu memory
accounting, it's now possible to provide comprehensive accounting of
the memory used by bpf programs and maps.
This approach has the following advantages:
1) The limit is per-cgroup and hierarchical. It's far more flexible and
allows better control over the memory usage of different workloads.
2) The actual memory consumption is taken into account. Charging happens
automatically at allocation time if the __GFP_ACCOUNT flag is passed
(as sketched after this list). Uncharging is also performed
automatically when the memory is released. The code on the bpf side
becomes simpler and safer.
3) There is a simple way to get the current value and statistics (e.g.
via memcg's memory.current and memory.stat files).
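By contrast, a minimal sketch of the memcg-based replacement, mirroring
the kzalloc() hunks in this series: __GFP_ACCOUNT charges the allocation
to the current memory cgroup at allocation time, and freeing the memory
uncharges it automatically:

        map = kzalloc(sizeof(*map), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
        if (!map)
                return ERR_PTR(-ENOMEM);
        /* no cost estimation, no explicit uncharging on fail paths */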
The patchset consists of the following parts:
1) memcg-based accounting for various bpf objects: progs and maps
2) removal of the rlimit-based accounting
3) removal of rlimit adjustments in userspace tools and tests
v2:
- fixed a build issue caused by the remaining rlimit-based accounting
for sockhash maps
Roman Gushchin (35):
bpf: memcg-based memory accounting for bpf progs
bpf: memcg-based memory accounting for bpf maps
bpf: refine memcg-based memory accounting for arraymap maps
bpf: refine memcg-based memory accounting for cpumap maps
bpf: memcg-based memory accounting for cgroup storage maps
bpf: refine memcg-based memory accounting for devmap maps
bpf: refine memcg-based memory accounting for hashtab maps
bpf: memcg-based memory accounting for lpm_trie maps
bpf: memcg-based memory accounting for bpf ringbuffer
bpf: memcg-based memory accounting for socket storage maps
bpf: refine memcg-based memory accounting for sockmap and sockhash
maps
bpf: refine memcg-based memory accounting for xskmap maps
bpf: eliminate rlimit-based memory accounting for arraymap maps
bpf: eliminate rlimit-based memory accounting for bpf_struct_ops maps
bpf: eliminate rlimit-based memory accounting for cpumap maps
bpf: eliminate rlimit-based memory accounting for cgroup storage maps
bpf: eliminate rlimit-based memory accounting for devmap maps
bpf: eliminate rlimit-based memory accounting for hashtab maps
bpf: eliminate rlimit-based memory accounting for lpm_trie maps
bpf: eliminate rlimit-based memory accounting for queue_stack_maps
maps
bpf: eliminate rlimit-based memory accounting for reuseport_array maps
bpf: eliminate rlimit-based memory accounting for bpf ringbuffer
bpf: eliminate rlimit-based memory accounting for sockmap and sockhash
maps
bpf: eliminate rlimit-based memory accounting for stackmap maps
bpf: eliminate rlimit-based memory accounting for socket storage maps
bpf: eliminate rlimit-based memory accounting for xskmap maps
bpf: eliminate rlimit-based memory accounting infra for bpf maps
bpf: eliminate rlimit-based memory accounting for bpf progs
bpf: libbpf: cleanup RLIMIT_MEMLOCK usage
bpf: bpftool: do not touch RLIMIT_MEMLOCK
bpf: runqslower: don't touch RLIMIT_MEMLOCK
bpf: selftests: delete bpf_rlimit.h
bpf: selftests: don't touch RLIMIT_MEMLOCK
bpf: samples: do not touch RLIMIT_MEMLOCK
perf: don't touch RLIMIT_MEMLOCK
include/linux/bpf.h | 23 ---
kernel/bpf/arraymap.c | 30 +---
kernel/bpf/bpf_struct_ops.c | 19 +--
kernel/bpf/core.c | 20 +--
kernel/bpf/cpumap.c | 20 +--
kernel/bpf/devmap.c | 23 +--
kernel/bpf/hashtab.c | 33 +---
kernel/bpf/local_storage.c | 38 ++---
kernel/bpf/lpm_trie.c | 17 +-
kernel/bpf/queue_stack_maps.c | 16 +-
kernel/bpf/reuseport_array.c | 12 +-
kernel/bpf/ringbuf.c | 33 ++--
kernel/bpf/stackmap.c | 16 +-
kernel/bpf/syscall.c | 152 ++----------------
net/core/bpf_sk_storage.c | 23 +--
net/core/sock_map.c | 40 ++---
net/xdp/xskmap.c | 13 +-
samples/bpf/hbm.c | 1 -
samples/bpf/map_perf_test_user.c | 11 --
samples/bpf/offwaketime_user.c | 2 -
samples/bpf/sockex2_user.c | 2 -
samples/bpf/sockex3_user.c | 2 -
samples/bpf/spintest_user.c | 2 -
samples/bpf/syscall_tp_user.c | 2 -
samples/bpf/task_fd_query_user.c | 5 -
samples/bpf/test_lru_dist.c | 3 -
samples/bpf/test_map_in_map_user.c | 9 --
samples/bpf/test_overhead_user.c | 2 -
samples/bpf/trace_event_user.c | 2 -
samples/bpf/tracex2_user.c | 6 -
samples/bpf/tracex3_user.c | 6 -
samples/bpf/tracex4_user.c | 6 -
samples/bpf/tracex5_user.c | 3 -
samples/bpf/tracex6_user.c | 3 -
samples/bpf/xdp1_user.c | 6 -
samples/bpf/xdp_adjust_tail_user.c | 6 -
samples/bpf/xdp_monitor_user.c | 6 -
samples/bpf/xdp_redirect_cpu_user.c | 6 -
samples/bpf/xdp_redirect_map_user.c | 6 -
samples/bpf/xdp_redirect_user.c | 6 -
samples/bpf/xdp_router_ipv4_user.c | 6 -
samples/bpf/xdp_rxq_info_user.c | 6 -
samples/bpf/xdp_sample_pkts_user.c | 6 -
samples/bpf/xdp_tx_iptunnel_user.c | 6 -
samples/bpf/xdpsock_user.c | 7 -
tools/bpf/bpftool/common.c | 7 -
tools/bpf/bpftool/feature.c | 2 -
tools/bpf/bpftool/main.h | 2 -
tools/bpf/bpftool/map.c | 2 -
tools/bpf/bpftool/pids.c | 1 -
tools/bpf/bpftool/prog.c | 3 -
tools/bpf/bpftool/struct_ops.c | 2 -
tools/bpf/runqslower/runqslower.c | 16 --
tools/lib/bpf/libbpf.c | 31 +---
tools/lib/bpf/libbpf.h | 5 -
tools/perf/builtin-trace.c | 10 --
tools/perf/tests/builtin-test.c | 6 -
tools/perf/util/Build | 1 -
tools/perf/util/rlimit.c | 29 ----
tools/perf/util/rlimit.h | 6 -
tools/testing/selftests/bpf/bench.c | 16 --
tools/testing/selftests/bpf/bpf_rlimit.h | 28 ----
.../selftests/bpf/flow_dissector_load.c | 1 -
.../selftests/bpf/get_cgroup_id_user.c | 1 -
.../bpf/prog_tests/select_reuseport.c | 1 -
.../selftests/bpf/prog_tests/sk_lookup.c | 1 -
.../selftests/bpf/progs/bpf_iter_bpf_map.c | 5 +-
.../selftests/bpf/progs/map_ptr_kern.c | 5 -
tools/testing/selftests/bpf/test_btf.c | 1 -
.../selftests/bpf/test_cgroup_storage.c | 1 -
tools/testing/selftests/bpf/test_dev_cgroup.c | 1 -
tools/testing/selftests/bpf/test_lpm_map.c | 1 -
tools/testing/selftests/bpf/test_lru_map.c | 1 -
tools/testing/selftests/bpf/test_maps.c | 1 -
tools/testing/selftests/bpf/test_netcnt.c | 1 -
tools/testing/selftests/bpf/test_progs.c | 1 -
.../selftests/bpf/test_skb_cgroup_id_user.c | 1 -
tools/testing/selftests/bpf/test_sock.c | 1 -
tools/testing/selftests/bpf/test_sock_addr.c | 1 -
.../testing/selftests/bpf/test_sock_fields.c | 1 -
.../selftests/bpf/test_socket_cookie.c | 1 -
tools/testing/selftests/bpf/test_sockmap.c | 1 -
tools/testing/selftests/bpf/test_sysctl.c | 1 -
tools/testing/selftests/bpf/test_tag.c | 1 -
.../bpf/test_tcp_check_syncookie_user.c | 1 -
.../testing/selftests/bpf/test_tcpbpf_user.c | 1 -
.../selftests/bpf/test_tcpnotify_user.c | 1 -
tools/testing/selftests/bpf/test_verifier.c | 1 -
.../testing/selftests/bpf/test_verifier_log.c | 2 -
tools/testing/selftests/bpf/xdping.c | 6 -
tools/testing/selftests/net/reuseport_bpf.c | 20 ---
91 files changed, 97 insertions(+), 794 deletions(-)
delete mode 100644 tools/perf/util/rlimit.c
delete mode 100644 tools/perf/util/rlimit.h
delete mode 100644 tools/testing/selftests/bpf/bpf_rlimit.h
--
2.26.2
Include lpm trie and lpm trie node objects into the memcg-based memory
accounting.
Signed-off-by: Roman Gushchin <[email protected]>
---
kernel/bpf/lpm_trie.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 44474bf3ab7a..d85e0fc2cafc 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -282,7 +282,7 @@ static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
if (value)
size += trie->map.value_size;
- node = kmalloc_node(size, GFP_ATOMIC | __GFP_NOWARN,
+ node = kmalloc_node(size, GFP_ATOMIC | __GFP_NOWARN | __GFP_ACCOUNT,
trie->map.numa_node);
if (!node)
return NULL;
@@ -557,7 +557,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
attr->value_size > LPM_VAL_SIZE_MAX)
return ERR_PTR(-EINVAL);
- trie = kzalloc(sizeof(*trie), GFP_USER | __GFP_NOWARN);
+ trie = kzalloc(sizeof(*trie), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
if (!trie)
return ERR_PTR(-ENOMEM);
--
2.26.2
Do not use rlimit-based memory accounting for lpm_trie maps.
It has been replaced with the memcg-based memory accounting.
Signed-off-by: Roman Gushchin <[email protected]>
---
kernel/bpf/lpm_trie.c | 13 -------------
1 file changed, 13 deletions(-)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index d85e0fc2cafc..c747f0835eb1 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -540,8 +540,6 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
static struct bpf_map *trie_alloc(union bpf_attr *attr)
{
struct lpm_trie *trie;
- u64 cost = sizeof(*trie), cost_per_node;
- int ret;
if (!bpf_capable())
return ERR_PTR(-EPERM);
@@ -567,20 +565,9 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
offsetof(struct bpf_lpm_trie_key, data);
trie->max_prefixlen = trie->data_size * 8;
- cost_per_node = sizeof(struct lpm_trie_node) +
- attr->value_size + trie->data_size;
- cost += (u64) attr->max_entries * cost_per_node;
-
- ret = bpf_map_charge_init(&trie->map.memory, cost);
- if (ret)
- goto out_err;
-
spin_lock_init(&trie->lock);
return &trie->map;
-out_err:
- kfree(trie);
- return ERR_PTR(ret);
}
static void trie_free(struct bpf_map *map)
--
2.26.2
Do not use rlimit-based memory accounting for devmap maps.
It has been replaced with the memcg-based memory accounting.
Signed-off-by: Roman Gushchin <[email protected]>
---
kernel/bpf/devmap.c | 18 ++----------------
1 file changed, 2 insertions(+), 16 deletions(-)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 05bf93088063..8148c7260a54 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -109,8 +109,6 @@ static inline struct hlist_head *dev_map_index_hash(struct bpf_dtab *dtab,
static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
{
u32 valsize = attr->value_size;
- u64 cost = 0;
- int err;
/* check sanity of attributes. 2 value sizes supported:
* 4 bytes: ifindex
@@ -135,21 +133,13 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
if (!dtab->n_buckets) /* Overflow check */
return -EINVAL;
- cost += (u64) sizeof(struct hlist_head) * dtab->n_buckets;
- } else {
- cost += (u64) dtab->map.max_entries * sizeof(struct bpf_dtab_netdev *);
}
- /* if map size is larger than memlock limit, reject it */
- err = bpf_map_charge_init(&dtab->map.memory, cost);
- if (err)
- return -EINVAL;
-
if (attr->map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
dtab->dev_index_head = dev_map_create_hash(dtab->n_buckets,
dtab->map.numa_node);
if (!dtab->dev_index_head)
- goto free_charge;
+ return -ENOMEM;
spin_lock_init(&dtab->index_lock);
} else {
@@ -157,14 +147,10 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
sizeof(struct bpf_dtab_netdev *),
dtab->map.numa_node);
if (!dtab->netdev_map)
- goto free_charge;
+ return -ENOMEM;
}
return 0;
-
-free_charge:
- bpf_map_charge_finish(&dtab->map.memory);
- return -ENOMEM;
}
static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
--
2.26.2
Since bpf is not using memlock rlimit for memory accounting,
there are no more reasons to bump the limit.
Signed-off-by: Roman Gushchin <[email protected]>
---
tools/bpf/runqslower/runqslower.c | 16 ----------------
1 file changed, 16 deletions(-)
diff --git a/tools/bpf/runqslower/runqslower.c b/tools/bpf/runqslower/runqslower.c
index d89715844952..a3380b53ce0c 100644
--- a/tools/bpf/runqslower/runqslower.c
+++ b/tools/bpf/runqslower/runqslower.c
@@ -88,16 +88,6 @@ int libbpf_print_fn(enum libbpf_print_level level,
return vfprintf(stderr, format, args);
}
-static int bump_memlock_rlimit(void)
-{
- struct rlimit rlim_new = {
- .rlim_cur = RLIM_INFINITY,
- .rlim_max = RLIM_INFINITY,
- };
-
- return setrlimit(RLIMIT_MEMLOCK, &rlim_new);
-}
-
void handle_event(void *ctx, int cpu, void *data, __u32 data_sz)
{
const struct event *e = data;
@@ -134,12 +124,6 @@ int main(int argc, char **argv)
libbpf_set_print(libbpf_print_fn);
- err = bump_memlock_rlimit();
- if (err) {
- fprintf(stderr, "failed to increase rlimit: %d", err);
- return 1;
- }
-
obj = runqslower_bpf__open();
if (!obj) {
fprintf(stderr, "failed to open and/or load BPF object\n");
--
2.26.2
Since bpf stopped using memlock rlimit to limit the memory usage,
there is no more reason for bpftool to alter its own limits.
Signed-off-by: Roman Gushchin <[email protected]>
---
tools/bpf/bpftool/common.c | 7 -------
tools/bpf/bpftool/feature.c | 2 --
tools/bpf/bpftool/main.h | 2 --
tools/bpf/bpftool/map.c | 2 --
tools/bpf/bpftool/pids.c | 1 -
tools/bpf/bpftool/prog.c | 3 ---
tools/bpf/bpftool/struct_ops.c | 2 --
7 files changed, 19 deletions(-)
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 65303664417e..01b87e8c3040 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -109,13 +109,6 @@ static bool is_bpffs(char *path)
return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
}
-void set_max_rlimit(void)
-{
- struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
-
- setrlimit(RLIMIT_MEMLOCK, &rinf);
-}
-
static int
mnt_fs(const char *target, const char *type, char *buff, size_t bufflen)
{
diff --git a/tools/bpf/bpftool/feature.c b/tools/bpf/bpftool/feature.c
index 1cd75807673e..2d6c6bff934e 100644
--- a/tools/bpf/bpftool/feature.c
+++ b/tools/bpf/bpftool/feature.c
@@ -885,8 +885,6 @@ static int do_probe(int argc, char **argv)
__u32 ifindex = 0;
char *ifname;
- set_max_rlimit();
-
while (argc) {
if (is_prefix(*argv, "kernel")) {
if (target != COMPONENT_UNSPEC) {
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index e3a79b5a9960..0a3bd1ff14da 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -95,8 +95,6 @@ int detect_common_prefix(const char *arg, ...);
void fprint_hex(FILE *f, void *arg, unsigned int n, const char *sep);
void usage(void) __noreturn;
-void set_max_rlimit(void);
-
int mount_tracefs(const char *target);
struct pinned_obj_table {
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 3a27d31a1856..f08b9e707511 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1315,8 +1315,6 @@ static int do_create(int argc, char **argv)
return -1;
}
- set_max_rlimit();
-
fd = bpf_create_map_xattr(&attr);
if (fd < 0) {
p_err("map create failed: %s", strerror(errno));
diff --git a/tools/bpf/bpftool/pids.c b/tools/bpf/bpftool/pids.c
index e3b116325403..4c559a8ae4e8 100644
--- a/tools/bpf/bpftool/pids.c
+++ b/tools/bpf/bpftool/pids.c
@@ -96,7 +96,6 @@ int build_obj_refs_table(struct obj_refs_table *table, enum bpf_obj_type type)
libbpf_print_fn_t default_print;
hash_init(table->table);
- set_max_rlimit();
skel = pid_iter_bpf__open();
if (!skel) {
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 3e6ecc6332e2..40e50db60332 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -1291,8 +1291,6 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
}
}
- set_max_rlimit();
-
obj = bpf_object__open_file(file, &open_opts);
if (IS_ERR_OR_NULL(obj)) {
p_err("failed to open object file");
@@ -1833,7 +1831,6 @@ static int do_profile(int argc, char **argv)
}
}
- set_max_rlimit();
err = profiler_bpf__load(profile_obj);
if (err) {
p_err("failed to load profile_obj");
diff --git a/tools/bpf/bpftool/struct_ops.c b/tools/bpf/bpftool/struct_ops.c
index b58b91f62ffb..0915e1e9b7c0 100644
--- a/tools/bpf/bpftool/struct_ops.c
+++ b/tools/bpf/bpftool/struct_ops.c
@@ -498,8 +498,6 @@ static int do_register(int argc, char **argv)
if (IS_ERR_OR_NULL(obj))
return -1;
- set_max_rlimit();
-
load_attr.obj = obj;
if (verifier_logs)
/* log_level1 + log_level2 + stats, but not stable UAPI */
--
2.26.2
Include internal metadata into the memcg-based memory accounting.
Also include the memory allocated on updating an element.
Signed-off-by: Roman Gushchin <[email protected]>
---
net/core/sock_map.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 119f52a99dc1..bc797adca44c 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -38,7 +38,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
return ERR_PTR(-EINVAL);
- stab = kzalloc(sizeof(*stab), GFP_USER);
+ stab = kzalloc(sizeof(*stab), GFP_USER | __GFP_ACCOUNT);
if (!stab)
return ERR_PTR(-ENOMEM);
@@ -829,7 +829,8 @@ static struct bpf_shtab_elem *sock_hash_alloc_elem(struct bpf_shtab *htab,
}
}
- new = kmalloc_node(htab->elem_size, GFP_ATOMIC | __GFP_NOWARN,
+ new = kmalloc_node(htab->elem_size,
+ GFP_ATOMIC | __GFP_NOWARN | __GFP_ACCOUNT,
htab->map.numa_node);
if (!new) {
atomic_dec(&htab->count);
@@ -1011,7 +1012,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
if (attr->key_size > MAX_BPF_STACK)
return ERR_PTR(-E2BIG);
- htab = kzalloc(sizeof(*htab), GFP_USER);
+ htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_ACCOUNT);
if (!htab)
return ERR_PTR(-ENOMEM);
--
2.26.2
Account memory used by the socket storage.
Signed-off-by: Roman Gushchin <[email protected]>
---
net/core/bpf_sk_storage.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index eafcd15e7dfd..fbcd03cd00d3 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -130,7 +130,8 @@ static struct bpf_sk_storage_elem *selem_alloc(struct bpf_sk_storage_map *smap,
if (charge_omem && omem_charge(sk, smap->elem_size))
return NULL;
- selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN);
+ selem = kzalloc(smap->elem_size,
+ GFP_ATOMIC | __GFP_NOWARN | __GFP_ACCOUNT);
if (selem) {
if (value)
memcpy(SDATA(selem)->data, value, smap->map.value_size);
@@ -337,7 +338,8 @@ static int sk_storage_alloc(struct sock *sk,
if (err)
return err;
- sk_storage = kzalloc(sizeof(*sk_storage), GFP_ATOMIC | __GFP_NOWARN);
+ sk_storage = kzalloc(sizeof(*sk_storage),
+ GFP_ATOMIC | __GFP_NOWARN | __GFP_ACCOUNT);
if (!sk_storage) {
err = -ENOMEM;
goto uncharge;
@@ -677,7 +679,7 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
u64 cost;
int ret;
- smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN);
+ smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
if (!smap)
return ERR_PTR(-ENOMEM);
bpf_map_init_from_attr(&smap->map, attr);
@@ -695,7 +697,7 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
}
smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
- GFP_USER | __GFP_NOWARN);
+ GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
if (!smap->buckets) {
bpf_map_charge_finish(&smap->map.memory);
kfree(smap);
@@ -1024,7 +1026,7 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs)
}
diag = kzalloc(sizeof(*diag) + sizeof(diag->maps[0]) * nr_maps,
- GFP_KERNEL);
+ GFP_KERNEL | __GFP_ACCOUNT);
if (!diag)
return ERR_PTR(-ENOMEM);
--
2.26.2
Include metadata and percpu data into the memcg-based memory accounting.
Signed-off-by: Roman Gushchin <[email protected]>
---
kernel/bpf/cpumap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index f1c46529929b..74ae9fcbe82e 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -99,7 +99,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
attr->map_flags & ~BPF_F_NUMA_NODE)
return ERR_PTR(-EINVAL);
- cmap = kzalloc(sizeof(*cmap), GFP_USER);
+ cmap = kzalloc(sizeof(*cmap), GFP_USER | __GFP_ACCOUNT);
if (!cmap)
return ERR_PTR(-ENOMEM);
@@ -418,7 +418,7 @@ static struct bpf_cpu_map_entry *
__cpu_map_entry_alloc(struct bpf_cpumap_val *value, u32 cpu, int map_id)
{
int numa, err, i, fd = value->bpf_prog.fd;
- gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
+ gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_NOWARN;
struct bpf_cpu_map_entry *rcpu;
struct xdp_bulk_queue *bq;
--
2.26.2
Include map metadata and the node size (struct bpf_dtab_netdev) on
element update into the accounting.
Signed-off-by: Roman Gushchin <[email protected]>
---
kernel/bpf/devmap.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 10abb06065bb..05bf93088063 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -175,7 +175,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
if (!capable(CAP_NET_ADMIN))
return ERR_PTR(-EPERM);
- dtab = kzalloc(sizeof(*dtab), GFP_USER);
+ dtab = kzalloc(sizeof(*dtab), GFP_USER | __GFP_ACCOUNT);
if (!dtab)
return ERR_PTR(-ENOMEM);
@@ -603,7 +603,8 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
struct bpf_prog *prog = NULL;
struct bpf_dtab_netdev *dev;
- dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN,
+ dev = kmalloc_node(sizeof(*dev),
+ GFP_ATOMIC | __GFP_NOWARN | __GFP_ACCOUNT,
dtab->map.numa_node);
if (!dev)
return ERR_PTR(-ENOMEM);
--
2.26.2
Since bpf stopped using memlock rlimit to limit the memory usage,
there is no more reason for perf to alter its own limit.
Signed-off-by: Roman Gushchin <[email protected]>
---
tools/perf/builtin-trace.c | 10 ----------
tools/perf/tests/builtin-test.c | 6 ------
tools/perf/util/Build | 1 -
tools/perf/util/rlimit.c | 29 -----------------------------
tools/perf/util/rlimit.h | 6 ------
5 files changed, 52 deletions(-)
delete mode 100644 tools/perf/util/rlimit.c
delete mode 100644 tools/perf/util/rlimit.h
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 4cbb64edc998..3d6a98a12537 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -19,7 +19,6 @@
#include <api/fs/tracing_path.h>
#include <bpf/bpf.h>
#include "util/bpf_map.h"
-#include "util/rlimit.h"
#include "builtin.h"
#include "util/cgroup.h"
#include "util/color.h"
@@ -4838,15 +4837,6 @@ int cmd_trace(int argc, const char **argv)
goto out;
}
- /*
- * Parsing .perfconfig may entail creating a BPF event, that may need
- * to create BPF maps, so bump RLIM_MEMLOCK as the default 64K setting
- * is too small. This affects just this process, not touching the
- * global setting. If it fails we'll get something in 'perf trace -v'
- * to help diagnose the problem.
- */
- rlimit__bump_memlock();
-
err = perf_config(trace__config, &trace);
if (err)
goto out;
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index da5b6cc23f25..e4efbba8202b 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -22,7 +22,6 @@
#include <subcmd/parse-options.h>
#include "string2.h"
#include "symbol.h"
-#include "util/rlimit.h"
#include <linux/kernel.h>
#include <linux/string.h>
#include <subcmd/exec-cmd.h>
@@ -794,11 +793,6 @@ int cmd_test(int argc, const char **argv)
if (skip != NULL)
skiplist = intlist__new(skip);
- /*
- * Tests that create BPF maps, for instance, need more than the 64K
- * default:
- */
- rlimit__bump_memlock();
return __cmd_test(argc, argv, skiplist);
}
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 8d18380ecd10..4902cd3b3b58 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -26,7 +26,6 @@ perf-y += parse-events.o
perf-y += perf_regs.o
perf-y += path.o
perf-y += print_binary.o
-perf-y += rlimit.o
perf-y += argv_split.o
perf-y += rbtree.o
perf-y += libstring.o
diff --git a/tools/perf/util/rlimit.c b/tools/perf/util/rlimit.c
deleted file mode 100644
index 13521d392a22..000000000000
--- a/tools/perf/util/rlimit.c
+++ /dev/null
@@ -1,29 +0,0 @@
-/* SPDX-License-Identifier: LGPL-2.1 */
-
-#include "util/debug.h"
-#include "util/rlimit.h"
-#include <sys/time.h>
-#include <sys/resource.h>
-
-/*
- * Bump the memlock so that we can get bpf maps of a reasonable size,
- * like the ones used with 'perf trace' and with 'perf test bpf',
- * improve this to some specific request if needed.
- */
-void rlimit__bump_memlock(void)
-{
- struct rlimit rlim;
-
- if (getrlimit(RLIMIT_MEMLOCK, &rlim) == 0) {
- rlim.rlim_cur *= 4;
- rlim.rlim_max *= 4;
-
- if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0) {
- rlim.rlim_cur /= 2;
- rlim.rlim_max /= 2;
-
- if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
- pr_debug("Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc\n");
- }
- }
-}
diff --git a/tools/perf/util/rlimit.h b/tools/perf/util/rlimit.h
deleted file mode 100644
index 9f59d8e710a3..000000000000
--- a/tools/perf/util/rlimit.h
+++ /dev/null
@@ -1,6 +0,0 @@
-#ifndef __PERF_RLIMIT_H_
-#define __PERF_RLIMIT_H_
-/* SPDX-License-Identifier: LGPL-2.1 */
-
-void rlimit__bump_memlock(void);
-#endif // __PERF_RLIMIT_H_
--
2.26.2
Do not use rlimit-based memory accounting for cpumap maps.
It has been replaced with the memcg-based memory accounting.
Signed-off-by: Roman Gushchin <[email protected]>
---
kernel/bpf/cpumap.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 74ae9fcbe82e..50f3444a3301 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -86,8 +86,6 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
u32 value_size = attr->value_size;
struct bpf_cpu_map *cmap;
int err = -ENOMEM;
- u64 cost;
- int ret;
if (!bpf_capable())
return ERR_PTR(-EPERM);
@@ -111,26 +109,14 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
goto free_cmap;
}
- /* make sure page count doesn't overflow */
- cost = (u64) cmap->map.max_entries * sizeof(struct bpf_cpu_map_entry *);
-
- /* Notice returns -EPERM on if map size is larger than memlock limit */
- ret = bpf_map_charge_init(&cmap->map.memory, cost);
- if (ret) {
- err = ret;
- goto free_cmap;
- }
-
/* Alloc array for possible remote "destination" CPUs */
cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
sizeof(struct bpf_cpu_map_entry *),
cmap->map.numa_node);
if (!cmap->cpu_map)
- goto free_charge;
+ goto free_cmap;
return &cmap->map;
-free_charge:
- bpf_map_charge_finish(&cmap->map.memory);
free_cmap:
kfree(cmap);
return ERR_PTR(err);
--
2.26.2
Include percpu objects and the size of map metadata into the
accounting.
Signed-off-by: Roman Gushchin <[email protected]>
---
kernel/bpf/hashtab.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 024276787055..9d0432170812 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -263,10 +263,11 @@ static int prealloc_init(struct bpf_htab *htab)
goto skip_percpu_elems;
for (i = 0; i < num_entries; i++) {
+ const gfp_t gfp = GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT;
u32 size = round_up(htab->map.value_size, 8);
void __percpu *pptr;
- pptr = __alloc_percpu_gfp(size, 8, GFP_USER | __GFP_NOWARN);
+ pptr = __alloc_percpu_gfp(size, 8, gfp);
if (!pptr)
goto free_elems;
htab_elem_set_ptr(get_htab_elem(htab, i), htab->map.key_size,
@@ -321,7 +322,7 @@ static int alloc_extra_elems(struct bpf_htab *htab)
int cpu;
pptr = __alloc_percpu_gfp(sizeof(struct htab_elem *), 8,
- GFP_USER | __GFP_NOWARN);
+ GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
if (!pptr)
return -ENOMEM;
@@ -424,7 +425,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
u64 cost;
int err;
- htab = kzalloc(sizeof(*htab), GFP_USER);
+ htab = kzalloc(sizeof(*htab), GFP_USER | __GFP_ACCOUNT);
if (!htab)
return ERR_PTR(-ENOMEM);
@@ -827,6 +828,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
bool percpu, bool onallcpus,
struct htab_elem *old_elem)
{
+ const gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN | __GFP_ACCOUNT;
u32 size = htab->map.value_size;
bool prealloc = htab_is_prealloc(htab);
struct htab_elem *l_new, **pl_new;
@@ -859,8 +861,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
l_new = ERR_PTR(-E2BIG);
goto dec_count;
}
- l_new = kmalloc_node(htab->elem_size, GFP_ATOMIC | __GFP_NOWARN,
- htab->map.numa_node);
+ l_new = kmalloc_node(htab->elem_size, gfp, htab->map.numa_node);
if (!l_new) {
l_new = ERR_PTR(-ENOMEM);
goto dec_count;
@@ -876,8 +877,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
pptr = htab_elem_get_ptr(l_new, key_size);
} else {
/* alloc_percpu zero-fills */
- pptr = __alloc_percpu_gfp(size, 8,
- GFP_ATOMIC | __GFP_NOWARN);
+ pptr = __alloc_percpu_gfp(size, 8, gfp);
if (!pptr) {
kfree(l_new);
l_new = ERR_PTR(-ENOMEM);
--
2.26.2
On Mon, Jul 27, 2020 at 12:23 PM Roman Gushchin <[email protected]> wrote:
>
> Include metadata and percpu data into the memcg-based memory accounting.
>
> Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Song Liu <[email protected]>
> ---
> kernel/bpf/cpumap.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
[...]
On Mon, Jul 27, 2020 at 12:22 PM Roman Gushchin <[email protected]> wrote:
>
> Include map metadata and the node size (struct bpf_dtab_netdev) on
> element update into the accounting.
>
> Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Song Liu <[email protected]>
On Mon, Jul 27, 2020 at 12:20 PM Roman Gushchin <[email protected]> wrote:
>
> Include percpu objects and the size of map metadata into the
> accounting.
>
> Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Song Liu <[email protected]>
On Mon, Jul 27, 2020 at 12:22 PM Roman Gushchin <[email protected]> wrote:
>
> Include lpm trie and lpm trie node objects into the memcg-based memory
> accounting.
>
> Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Song Liu <[email protected]>
> ---
> kernel/bpf/lpm_trie.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
[...]
On Mon, Jul 27, 2020 at 12:28 PM Roman Gushchin <[email protected]> wrote:
>
> Account memory used by the socket storage.
>
> Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Song Liu <[email protected]>
> ---
> net/core/bpf_sk_storage.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
[...]
On Mon, Jul 27, 2020 at 12:27 PM Roman Gushchin <[email protected]> wrote:
>
> Include internal metadata into the memcg-based memory accounting.
> Also include the memory allocated on updating an element.
>
> Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Song Liu <[email protected]>
> ---
> net/core/sock_map.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
[...]
On Mon, Jul 27, 2020 at 12:22 PM Roman Gushchin <[email protected]> wrote:
>
> Do not use rlimit-based memory accounting for cpumap maps.
> It has been replaced with the memcg-based memory accounting.
>
> Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Song Liu <[email protected]>
> ---
> kernel/bpf/cpumap.c | 16 +---------------
> 1 file changed, 1 insertion(+), 15 deletions(-)
>
[...]
On Mon, Jul 27, 2020 at 12:20 PM Roman Gushchin <[email protected]> wrote:
>
> Do not use rlimit-based memory accounting for devmap maps.
> It has been replaced with the memcg-based memory accounting.
>
> Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Song Liu <[email protected]>
> ---
> kernel/bpf/devmap.c | 18 ++----------------
> 1 file changed, 2 insertions(+), 16 deletions(-)
>
[...]
On Mon, Jul 27, 2020 at 12:25 PM Roman Gushchin <[email protected]> wrote:
>
> Do not use rlimit-based memory accounting for lpm_trie maps.
> It has been replaced with the memcg-based memory accounting.
>
> Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Song Liu <[email protected]>
> ---
> kernel/bpf/lpm_trie.c | 13 -------------
> 1 file changed, 13 deletions(-)
>
[...]
On Mon, Jul 27, 2020 at 12:21 PM Roman Gushchin <[email protected]> wrote:
>
> Since bpf stopped using memlock rlimit to limit the memory usage,
> there is no more reason for bpftool to alter its own limits.
>
> Signed-off-by: Roman Gushchin <[email protected]>
> ---
This can't be removed either, due to old kernel support. We probably
should have a helper function to probe RLIMIT_MEMLOCK use by the BPF
subsystem, though, and not call set_max_rlimit() if it's not necessary.
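Something like the following could work (a hypothetical sketch, not
part of this series): temporarily drop the soft RLIMIT_MEMLOCK to zero
and try to create a trivial map. On kernels with memcg-based accounting
the creation should succeed, telling the caller that bumping the rlimit
is unnecessary:

#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static bool kernel_uses_memcg_accounting(void)
{
        struct rlimit old, low;
        union bpf_attr attr;
        bool ret = false;
        int fd;

        if (getrlimit(RLIMIT_MEMLOCK, &old))
                return false;
        low.rlim_cur = 0;               /* lower only the soft limit... */
        low.rlim_max = old.rlim_max;    /* ...so it can be restored */
        if (setrlimit(RLIMIT_MEMLOCK, &low))
                return false;

        memset(&attr, 0, sizeof(attr));
        attr.map_type = BPF_MAP_TYPE_ARRAY;
        attr.key_size = 4;
        attr.value_size = 4;
        attr.max_entries = 1;
        fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));

        setrlimit(RLIMIT_MEMLOCK, &old);

        if (fd >= 0) {
                close(fd);
                ret = true;     /* map created despite zero memlock limit */
        }
        return ret;
}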
> tools/bpf/bpftool/common.c | 7 -------
> tools/bpf/bpftool/feature.c | 2 --
> tools/bpf/bpftool/main.h | 2 --
> tools/bpf/bpftool/map.c | 2 --
> tools/bpf/bpftool/pids.c | 1 -
> tools/bpf/bpftool/prog.c | 3 ---
> tools/bpf/bpftool/struct_ops.c | 2 --
> 7 files changed, 19 deletions(-)
>
[...]
On Mon, Jul 27, 2020 at 12:21 PM Roman Gushchin <[email protected]> wrote:
>
> Since bpf stopped using memlock rlimit to limit the memory usage,
> there is no more reason for bpftool to alter its own limits.
>
> Signed-off-by: Roman Gushchin <[email protected]>
I think we will need a feature check for memcg-based accounting.
Thanks,
Song
On Mon, Jul 27, 2020 at 12:24 PM Roman Gushchin <[email protected]> wrote:
>
> Since bpf is not using memlock rlimit for memory accounting,
> there are no more reasons to bump the limit.
>
> Signed-off-by: Roman Gushchin <[email protected]>
> ---
> tools/bpf/runqslower/runqslower.c | 16 ----------------
> 1 file changed, 16 deletions(-)
>
This can go, I suppose; we still have a runqslower variant in BCC with
this logic, which shows an example of how to do this for kernels
without this patch set applied.
Acked-by: Andrii Nakryiko <[email protected]>
[...]
On Mon, Jul 27, 2020 at 12:21 PM Roman Gushchin <[email protected]> wrote:
>
> Since bpf stopped using memlock rlimit to limit the memory usage,
> there is no more reason for perf to alter its own limit.
>
> Signed-off-by: Roman Gushchin <[email protected]>
> ---
Cc'd Arnaldo, but I'm guessing it's a similar situation in that the
latest perf might be running on an older kernel and should keep working.
> tools/perf/builtin-trace.c | 10 ----------
> tools/perf/tests/builtin-test.c | 6 ------
> tools/perf/util/Build | 1 -
> tools/perf/util/rlimit.c | 29 -----------------------------
> tools/perf/util/rlimit.h | 6 ------
> 5 files changed, 52 deletions(-)
> delete mode 100644 tools/perf/util/rlimit.c
> delete mode 100644 tools/perf/util/rlimit.h
>
[...]
On Mon, Jul 27, 2020 at 11:09:43PM -0700, Andrii Nakryiko wrote:
> On Mon, Jul 27, 2020 at 12:21 PM Roman Gushchin <[email protected]> wrote:
> >
> > Since bpf stopped using memlock rlimit to limit the memory usage,
> > there is no more reason for perf to alter its own limit.
> >
> > Signed-off-by: Roman Gushchin <[email protected]>
> > ---
>
> Cc'd Arnaldo, but I'm guessing it's a similar situation in that the
> latest perf might be running on an older kernel and should keep working.
Yes, please leave it as is; the latest perf should continue working
with older kernels. So if there is a way to figure out whether the
running kernel is one where BPF doesn't use the memlock rlimit for that
purpose, then in those cases we shouldn't use it.
- Arnaldo
> > tools/perf/builtin-trace.c | 10 ----------
> > tools/perf/tests/builtin-test.c | 6 ------
> > tools/perf/util/Build | 1 -
> > tools/perf/util/rlimit.c | 29 -----------------------------
> > tools/perf/util/rlimit.h | 6 ------
> > 5 files changed, 52 deletions(-)
> > delete mode 100644 tools/perf/util/rlimit.c
> > delete mode 100644 tools/perf/util/rlimit.h
> >
>
> [...]
--
- Arnaldo