From: KP Singh <[email protected]>
bpf_sk_storage can already be used by some BPF program types to annotate
socket objects. These annotations are managed with the life-cycle of the
object (i.e. freed when the object is freed) which makes BPF programs
much simpler and less prone to errors and leaks.
This patch series:
* Generalizes the bpf_sk_storage infrastructure to allow easy
implementation of local storage for other objects
* Implements local storage for inodes
* Makes both bpf_{sk, inode}_storage available to LSM programs.
Local storage is safe to use in LSM programs as the attachment sites are
limited and the owning object won't be freed, however, this is not the
case for tracing. Usage in tracing is expected to follow a white-list
based approach similar to the d_path helper
(https://lore.kernel.org/bpf/[email protected]).
Access to local storage would allow LSM programs to implement stateful
detections like detecting the unlink of a running executable from the
examples shared as a part of the KRSI series
https://lore.kernel.org/bpf/[email protected]/
and
https://github.com/sinkap/linux-krsi/blob/patch/v1/examples/samples/bpf/lsm_detect_exec_unlink.c
*** BLURB HERE ***
KP Singh (4):
bpf: Generalize bpf_sk_storage
bpf: Implement bpf_local_storage for inodes
bpf: Allow local storage to be used from LSM programs
bpf: Add selftests for local_storage
fs/inode.c | 3 +
.../bpf_local_storage.h} | 14 +-
include/linux/bpf_types.h | 1 +
include/linux/fs.h | 5 +
include/net/sock.h | 4 +-
include/uapi/linux/bpf.h | 54 +-
kernel/bpf/Makefile | 4 +
kernel/bpf/bpf_local_storage.c | 1595 +++++++++++++++++
kernel/bpf/bpf_lsm.c | 20 +-
kernel/bpf/cgroup.c | 2 +-
kernel/bpf/syscall.c | 3 +-
kernel/bpf/verifier.c | 10 +
net/bpf/test_run.c | 2 +-
net/core/Makefile | 1 -
net/core/bpf_sk_storage.c | 1183 ------------
net/core/filter.c | 2 +-
net/core/sock.c | 2 +-
net/ipv4/bpf_tcp_ca.c | 2 +-
net/ipv4/inet_diag.c | 2 +-
tools/bpf/bpftool/map.c | 1 +
tools/include/uapi/linux/bpf.h | 54 +-
tools/lib/bpf/libbpf_probes.c | 5 +-
.../bpf/prog_tests/test_local_storage.c | 60 +
.../selftests/bpf/progs/local_storage.c | 139 ++
24 files changed, 1959 insertions(+), 1209 deletions(-)
rename include/{net/bpf_sk_storage.h => linux/bpf_local_storage.h} (72%)
create mode 100644 kernel/bpf/bpf_local_storage.c
delete mode 100644 net/core/bpf_sk_storage.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c
--
2.27.0.rc0.183.gde8f92d652-goog
From: KP Singh <[email protected]>
Refactor the functionality in bpf_sk_storage.c so that concept of
storage linked to kernel objects can be extended to other objects like
inode, task_struct etc.
bpf_sk_storage is updated to be bpf_local_storage with a union that
contains a pointer to the owner object. The type of the
bpf_local_storage can be determined using the newly added
bpf_local_storage_type enum.
Each new local storage will still be a separate map and provide its own
set of helpers. This allows for future object specific extensions and
still share a lot of the underlying implementation.
Signed-off-by: KP Singh <[email protected]>
---
.../bpf_local_storage.h} | 6 +-
include/net/sock.h | 4 +-
include/uapi/linux/bpf.h | 13 +-
kernel/bpf/Makefile | 4 +
.../bpf/bpf_local_storage.c | 593 ++++++++++--------
kernel/bpf/cgroup.c | 2 +-
net/bpf/test_run.c | 2 +-
net/core/Makefile | 1 -
net/core/filter.c | 2 +-
net/core/sock.c | 2 +-
net/ipv4/bpf_tcp_ca.c | 2 +-
net/ipv4/inet_diag.c | 2 +-
tools/include/uapi/linux/bpf.h | 13 +-
13 files changed, 365 insertions(+), 281 deletions(-)
rename include/{net/bpf_sk_storage.h => linux/bpf_local_storage.h} (93%)
rename net/core/bpf_sk_storage.c => kernel/bpf/bpf_local_storage.c (63%)
diff --git a/include/net/bpf_sk_storage.h b/include/linux/bpf_local_storage.h
similarity index 93%
rename from include/net/bpf_sk_storage.h
rename to include/linux/bpf_local_storage.h
index 5036c94c0503..85524f18cd91 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -1,7 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2019 Facebook */
-#ifndef _BPF_SK_STORAGE_H
-#define _BPF_SK_STORAGE_H
+#ifndef _BPF_LOCAL_STORAGE_H
+#define _BPF_LOCAL_STORAGE_H
struct sock;
@@ -47,4 +47,4 @@ static inline int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag,
}
#endif
-#endif /* _BPF_SK_STORAGE_H */
+#endif /* _BPF_LOCAL_STORAGE_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 3e8c6d4b4b59..06b093788eb6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -245,7 +245,7 @@ struct sock_common {
/* public: */
};
-struct bpf_sk_storage;
+struct bpf_local_storage;
/**
* struct sock - network layer representation of sockets
@@ -516,7 +516,7 @@ struct sock {
void (*sk_destruct)(struct sock *sk);
struct sock_reuseport __rcu *sk_reuseport_cb;
#ifdef CONFIG_BPF_SYSCALL
- struct bpf_sk_storage __rcu *sk_bpf_storage;
+ struct bpf_local_storage __rcu *sk_bpf_storage;
#endif
struct rcu_head sk_rcu;
};
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 97e1fd19ff58..4a202eea15c0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2788,10 +2788,10 @@ union bpf_attr {
* "type". The bpf-local-storage "type" (i.e. the *map*) is
* searched against all bpf-local-storages residing at *sk*.
*
- * An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be
+ * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
* used such that a new bpf-local-storage will be
* created if one does not exist. *value* can be used
- * together with **BPF_SK_STORAGE_GET_F_CREATE** to specify
+ * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
* the initial value of a bpf-local-storage. If *value* is
* **NULL**, the new bpf-local-storage will be zero initialized.
* Return
@@ -3388,11 +3388,16 @@ enum {
BPF_F_SYSCTL_BASE_NAME = (1ULL << 0),
};
-/* BPF_FUNC_sk_storage_get flags */
+/* BPF_FUNC_<local>_storage_get flags */
enum {
- BPF_SK_STORAGE_GET_F_CREATE = (1ULL << 0),
+ BPF_LOCAL_STORAGE_GET_F_CREATE = (1ULL << 0),
+ /* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility
+ * and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead.
+ */
+ BPF_SK_STORAGE_GET_F_CREATE = BPF_LOCAL_STORAGE_GET_F_CREATE,
};
+
/* BPF_FUNC_read_branch_records flags. */
enum {
BPF_F_GET_BRANCH_RECORDS_SIZE = (1ULL << 0),
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 375b933010dd..0d2c6703e279 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -12,6 +12,10 @@ obj-$(CONFIG_BPF_JIT) += dispatcher.o
ifeq ($(CONFIG_NET),y)
obj-$(CONFIG_BPF_SYSCALL) += devmap.o
obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o
+ifeq ($(CONFIG_XDP_SOCKETS),y)
+obj-$(CONFIG_BPF_SYSCALL) += xskmap.o
+endif
obj-$(CONFIG_BPF_SYSCALL) += offload.o
endif
ifeq ($(CONFIG_PERF_EVENTS),y)
diff --git a/net/core/bpf_sk_storage.c b/kernel/bpf/bpf_local_storage.c
similarity index 63%
rename from net/core/bpf_sk_storage.c
rename to kernel/bpf/bpf_local_storage.c
index d2c4d16dadba..0a1caac2f5f7 100644
--- a/net/core/bpf_sk_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -6,14 +6,14 @@
#include <linux/types.h>
#include <linux/spinlock.h>
#include <linux/bpf.h>
-#include <net/bpf_sk_storage.h>
+#include <linux/bpf_local_storage.h>
#include <net/sock.h>
#include <uapi/linux/sock_diag.h>
#include <uapi/linux/btf.h>
static atomic_t cache_idx;
-#define SK_STORAGE_CREATE_FLAG_MASK \
+#define LOCAL_STORAGE_CREATE_FLAG_MASK \
(BPF_F_NO_PREALLOC | BPF_F_CLONE)
struct bucket {
@@ -21,11 +21,15 @@ struct bucket {
raw_spinlock_t lock;
};
-/* Thp map is not the primary owner of a bpf_sk_storage_elem.
+enum bpf_local_storage_type {
+ BPF_LOCAL_STORAGE_SK,
+};
+
+/* Thp map is not the primary owner of a bpf_local_storage_elem.
* Instead, the sk->sk_bpf_storage is.
*
- * The map (bpf_sk_storage_map) is for two purposes
- * 1. Define the size of the "sk local storage". It is
+ * The map (bpf_local_storage_map) is for two purposes
+ * 1. Define the size of the "local storage". It is
* the map's value_size.
*
* 2. Maintain a list to keep track of all elems such
@@ -34,12 +38,12 @@ struct bucket {
* When a bpf local storage is being looked up for a
* particular sk, the "bpf_map" pointer is actually used
* as the "key" to search in the list of elem in
- * sk->sk_bpf_storage.
+ * the respective bpf_local_storage owned by the object.
*
- * Hence, consider sk->sk_bpf_storage is the mini-map
- * with the "bpf_map" pointer as the searching key.
+ * e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer
+ * as the searching key.
*/
-struct bpf_sk_storage_map {
+struct bpf_local_storage_map {
struct bpf_map map;
/* Lookup elem does not require accessing the map.
*
@@ -53,46 +57,51 @@ struct bpf_sk_storage_map {
u16 cache_idx;
};
-struct bpf_sk_storage_data {
+struct bpf_local_storage_data {
/* smap is used as the searching key when looking up
* from sk->sk_bpf_storage.
*
* Put it in the same cacheline as the data to minimize
* the number of cachelines access during the cache hit case.
*/
- struct bpf_sk_storage_map __rcu *smap;
+ struct bpf_local_storage_map __rcu *smap;
u8 data[] __aligned(8);
};
-/* Linked to bpf_sk_storage and bpf_sk_storage_map */
-struct bpf_sk_storage_elem {
- struct hlist_node map_node; /* Linked to bpf_sk_storage_map */
- struct hlist_node snode; /* Linked to bpf_sk_storage */
- struct bpf_sk_storage __rcu *sk_storage;
+/* Linked to bpf_local_storage and bpf_local_storage_map */
+struct bpf_local_storage_elem {
+ struct hlist_node map_node; /* Linked to bpf_local_storage_map */
+ struct hlist_node snode; /* Linked to bpf_local_storage */
+ struct bpf_local_storage __rcu *local_storage;
struct rcu_head rcu;
/* 8 bytes hole */
/* The data is stored in aother cacheline to minimize
* the number of cachelines access during a cache hit.
*/
- struct bpf_sk_storage_data sdata ____cacheline_aligned;
+ struct bpf_local_storage_data sdata ____cacheline_aligned;
};
-#define SELEM(_SDATA) container_of((_SDATA), struct bpf_sk_storage_elem, sdata)
+#define SELEM(_SDATA) \
+ container_of((_SDATA), struct bpf_local_storage_elem, sdata)
#define SDATA(_SELEM) (&(_SELEM)->sdata)
-#define BPF_SK_STORAGE_CACHE_SIZE 16
-
-struct bpf_sk_storage {
- struct bpf_sk_storage_data __rcu *cache[BPF_SK_STORAGE_CACHE_SIZE];
- struct hlist_head list; /* List of bpf_sk_storage_elem */
- struct sock *sk; /* The sk that owns the the above "list" of
- * bpf_sk_storage_elem.
- */
+#define BPF_STORAGE_CACHE_SIZE 16
+
+struct bpf_local_storage {
+ struct bpf_local_storage_data __rcu *cache[BPF_STORAGE_CACHE_SIZE];
+ struct hlist_head list; /* List of bpf_local_storage_elem */
+ /* The object that owns the the above "list" of
+ * bpf_local_storage_elem
+ */
+ union {
+ struct sock *sk;
+ };
struct rcu_head rcu;
raw_spinlock_t lock; /* Protect adding/removing from the "list" */
+ enum bpf_local_storage_type stype;
};
-static struct bucket *select_bucket(struct bpf_sk_storage_map *smap,
- struct bpf_sk_storage_elem *selem)
+static struct bucket *select_bucket(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem)
{
return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
}
@@ -109,24 +118,20 @@ static int omem_charge(struct sock *sk, unsigned int size)
return -ENOMEM;
}
-static bool selem_linked_to_sk(const struct bpf_sk_storage_elem *selem)
+static bool selem_linked_to_node(const struct bpf_local_storage_elem *selem)
{
return !hlist_unhashed(&selem->snode);
}
-static bool selem_linked_to_map(const struct bpf_sk_storage_elem *selem)
+static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
{
return !hlist_unhashed(&selem->map_node);
}
-static struct bpf_sk_storage_elem *selem_alloc(struct bpf_sk_storage_map *smap,
- struct sock *sk, void *value,
- bool charge_omem)
+static struct bpf_local_storage_elem *selem_alloc(
+ struct bpf_local_storage_map *smap, void *value)
{
- struct bpf_sk_storage_elem *selem;
-
- if (charge_omem && omem_charge(sk, smap->elem_size))
- return NULL;
+ struct bpf_local_storage_elem *selem;
selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN);
if (selem) {
@@ -135,94 +140,103 @@ static struct bpf_sk_storage_elem *selem_alloc(struct bpf_sk_storage_map *smap,
return selem;
}
+ return NULL;
+}
+
+static struct bpf_local_storage_elem *sk_selem_alloc(
+ struct bpf_local_storage_map *smap, struct sock *sk, void *value,
+ bool charge_omem)
+{
+ struct bpf_local_storage_elem *selem;
+
+ if (charge_omem && omem_charge(sk, smap->elem_size))
+ return NULL;
+
+ selem = selem_alloc(smap, value);
+ if (selem)
+ return selem;
+
if (charge_omem)
atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
return NULL;
}
-/* sk_storage->lock must be held and selem->sk_storage == sk_storage.
+static void __unlink_local_storage(struct bpf_local_storage *local_storage,
+ bool uncharge_omem)
+{
+ struct sock *sk;
+
+ switch (local_storage->stype) {
+ case BPF_LOCAL_STORAGE_SK:
+ sk = local_storage->sk;
+ if (uncharge_omem)
+ atomic_sub(sizeof(struct bpf_local_storage),
+ &sk->sk_omem_alloc);
+
+ /* After this RCU_INIT, sk may be freed and cannot be used */
+ RCU_INIT_POINTER(sk->sk_bpf_storage, NULL);
+ local_storage->sk = NULL;
+ break;
+ }
+}
+
+/* local_storage->lock must be held and selem->local_storage == local_storage.
* The caller must ensure selem->smap is still valid to be
* dereferenced for its smap->elem_size and smap->cache_idx.
+ *
+ * uncharge_omem is only relevant when:
+ *
+ * local_storage->stype == BPF_LOCAL_STORAGE_SK
+ *
*/
-static bool __selem_unlink_sk(struct bpf_sk_storage *sk_storage,
- struct bpf_sk_storage_elem *selem,
- bool uncharge_omem)
+static bool __selem_unlink(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem,
+ bool uncharge_omem)
{
- struct bpf_sk_storage_map *smap;
- bool free_sk_storage;
- struct sock *sk;
+ struct bpf_local_storage_map *smap;
+ bool free_local_storage;
smap = rcu_dereference(SDATA(selem)->smap);
- sk = sk_storage->sk;
-
- /* All uncharging on sk->sk_omem_alloc must be done first.
- * sk may be freed once the last selem is unlinked from sk_storage.
+ free_local_storage = hlist_is_singular_node(&selem->snode,
+ &local_storage->list);
+
+ /* local_storage is not freed now. local_storage->lock is
+ * still held and raw_spin_unlock_bh(&local_storage->lock)
+ * will be done by the caller.
+ * Although the unlock will be done under
+ * rcu_read_lock(), it is more intutivie to
+ * read if kfree_rcu(local_storage, rcu) is done
+ * after the raw_spin_unlock_bh(&local_storage->lock).
+ *
+ * Hence, a "bool free_local_storage" is returned
+ * to the caller which then calls the kfree_rcu()
+ * after unlock.
*/
- if (uncharge_omem)
- atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
+ if (free_local_storage)
+ __unlink_local_storage(local_storage, uncharge_omem);
- free_sk_storage = hlist_is_singular_node(&selem->snode,
- &sk_storage->list);
- if (free_sk_storage) {
- atomic_sub(sizeof(struct bpf_sk_storage), &sk->sk_omem_alloc);
- sk_storage->sk = NULL;
- /* After this RCU_INIT, sk may be freed and cannot be used */
- RCU_INIT_POINTER(sk->sk_bpf_storage, NULL);
- /* sk_storage is not freed now. sk_storage->lock is
- * still held and raw_spin_unlock_bh(&sk_storage->lock)
- * will be done by the caller.
- *
- * Although the unlock will be done under
- * rcu_read_lock(), it is more intutivie to
- * read if kfree_rcu(sk_storage, rcu) is done
- * after the raw_spin_unlock_bh(&sk_storage->lock).
- *
- * Hence, a "bool free_sk_storage" is returned
- * to the caller which then calls the kfree_rcu()
- * after unlock.
- */
- }
hlist_del_init_rcu(&selem->snode);
- if (rcu_access_pointer(sk_storage->cache[smap->cache_idx]) ==
+ if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) ==
SDATA(selem))
- RCU_INIT_POINTER(sk_storage->cache[smap->cache_idx], NULL);
+ RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
kfree_rcu(selem, rcu);
- return free_sk_storage;
+ return free_local_storage;
}
-static void selem_unlink_sk(struct bpf_sk_storage_elem *selem)
+static void __selem_link(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem)
{
- struct bpf_sk_storage *sk_storage;
- bool free_sk_storage = false;
-
- if (unlikely(!selem_linked_to_sk(selem)))
- /* selem has already been unlinked from sk */
- return;
-
- sk_storage = rcu_dereference(selem->sk_storage);
- raw_spin_lock_bh(&sk_storage->lock);
- if (likely(selem_linked_to_sk(selem)))
- free_sk_storage = __selem_unlink_sk(sk_storage, selem, true);
- raw_spin_unlock_bh(&sk_storage->lock);
-
- if (free_sk_storage)
- kfree_rcu(sk_storage, rcu);
-}
-
-static void __selem_link_sk(struct bpf_sk_storage *sk_storage,
- struct bpf_sk_storage_elem *selem)
-{
- RCU_INIT_POINTER(selem->sk_storage, sk_storage);
- hlist_add_head(&selem->snode, &sk_storage->list);
+ RCU_INIT_POINTER(selem->local_storage, local_storage);
+ hlist_add_head(&selem->snode, &local_storage->list);
}
-static void selem_unlink_map(struct bpf_sk_storage_elem *selem)
+static void selem_unlink_map(struct bpf_local_storage_elem *selem)
{
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage_map *smap;
struct bucket *b;
if (unlikely(!selem_linked_to_map(selem)))
@@ -237,8 +251,8 @@ static void selem_unlink_map(struct bpf_sk_storage_elem *selem)
raw_spin_unlock_bh(&b->lock);
}
-static void selem_link_map(struct bpf_sk_storage_map *smap,
- struct bpf_sk_storage_elem *selem)
+static void selem_link_map(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem)
{
struct bucket *b = select_bucket(smap, selem);
@@ -248,31 +262,46 @@ static void selem_link_map(struct bpf_sk_storage_map *smap,
raw_spin_unlock_bh(&b->lock);
}
-static void selem_unlink(struct bpf_sk_storage_elem *selem)
+static void selem_unlink(struct bpf_local_storage_elem *selem)
{
- /* Always unlink from map before unlinking from sk_storage
+ struct bpf_local_storage *local_storage;
+ bool free_local_storage = false;
+
+ /* Always unlink from map before unlinking from local_storage
* because selem will be freed after successfully unlinked from
- * the sk_storage.
+ * the local_storage.
*/
selem_unlink_map(selem);
- selem_unlink_sk(selem);
+
+ if (unlikely(!selem_linked_to_node(selem)))
+ /* selem has already been unlinked from its owner */
+ return;
+
+ local_storage = rcu_dereference(selem->local_storage);
+ raw_spin_lock_bh(&local_storage->lock);
+ if (likely(selem_linked_to_node(selem)))
+ free_local_storage = __selem_unlink(local_storage, selem, true);
+ raw_spin_unlock_bh(&local_storage->lock);
+
+ if (free_local_storage)
+ kfree_rcu(local_storage, rcu);
}
-static struct bpf_sk_storage_data *
-__sk_storage_lookup(struct bpf_sk_storage *sk_storage,
- struct bpf_sk_storage_map *smap,
+static struct bpf_local_storage_data *
+__local_storage_lookup(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_map *smap,
bool cacheit_lockit)
{
- struct bpf_sk_storage_data *sdata;
- struct bpf_sk_storage_elem *selem;
+ struct bpf_local_storage_data *sdata;
+ struct bpf_local_storage_elem *selem;
/* Fast path (cache hit) */
- sdata = rcu_dereference(sk_storage->cache[smap->cache_idx]);
+ sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
if (sdata && rcu_access_pointer(sdata->smap) == smap)
return sdata;
/* Slow path (cache miss) */
- hlist_for_each_entry_rcu(selem, &sk_storage->list, snode)
+ hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
if (rcu_access_pointer(SDATA(selem)->smap) == smap)
break;
@@ -284,33 +313,33 @@ __sk_storage_lookup(struct bpf_sk_storage *sk_storage,
/* spinlock is needed to avoid racing with the
* parallel delete. Otherwise, publishing an already
* deleted sdata to the cache will become a use-after-free
- * problem in the next __sk_storage_lookup().
+ * problem in the next __local_storage_lookup().
*/
- raw_spin_lock_bh(&sk_storage->lock);
- if (selem_linked_to_sk(selem))
- rcu_assign_pointer(sk_storage->cache[smap->cache_idx],
- sdata);
- raw_spin_unlock_bh(&sk_storage->lock);
+ raw_spin_lock_bh(&local_storage->lock);
+ if (selem_linked_to_node(selem))
+ rcu_assign_pointer(
+ local_storage->cache[smap->cache_idx], sdata);
+ raw_spin_unlock_bh(&local_storage->lock);
}
return sdata;
}
-static struct bpf_sk_storage_data *
+static struct bpf_local_storage_data *
sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit)
{
- struct bpf_sk_storage *sk_storage;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage *sk_storage;
+ struct bpf_local_storage_map *smap;
sk_storage = rcu_dereference(sk->sk_bpf_storage);
if (!sk_storage)
return NULL;
- smap = (struct bpf_sk_storage_map *)map;
- return __sk_storage_lookup(sk_storage, smap, cacheit_lockit);
+ smap = (struct bpf_local_storage_map *)map;
+ return __local_storage_lookup(sk_storage, smap, cacheit_lockit);
}
-static int check_flags(const struct bpf_sk_storage_data *old_sdata,
+static int check_flags(const struct bpf_local_storage_data *old_sdata,
u64 map_flags)
{
if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST)
@@ -324,93 +353,129 @@ static int check_flags(const struct bpf_sk_storage_data *old_sdata,
return 0;
}
-static int sk_storage_alloc(struct sock *sk,
- struct bpf_sk_storage_map *smap,
- struct bpf_sk_storage_elem *first_selem)
+static struct bpf_local_storage *
+bpf_local_storage_alloc(struct bpf_local_storage_map *smap)
{
- struct bpf_sk_storage *prev_sk_storage, *sk_storage;
- int err;
+ struct bpf_local_storage *storage;
- err = omem_charge(sk, sizeof(*sk_storage));
- if (err)
- return err;
+ storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN);
+ if (!storage)
+ return NULL;
- sk_storage = kzalloc(sizeof(*sk_storage), GFP_ATOMIC | __GFP_NOWARN);
- if (!sk_storage) {
- err = -ENOMEM;
- goto uncharge;
- }
- INIT_HLIST_HEAD(&sk_storage->list);
- raw_spin_lock_init(&sk_storage->lock);
- sk_storage->sk = sk;
+ INIT_HLIST_HEAD(&storage->list);
+ raw_spin_lock_init(&storage->lock);
+ return storage;
+}
- __selem_link_sk(sk_storage, first_selem);
- selem_link_map(smap, first_selem);
- /* Publish sk_storage to sk. sk->sk_lock cannot be acquired.
- * Hence, atomic ops is used to set sk->sk_bpf_storage
- * from NULL to the newly allocated sk_storage ptr.
- *
- * From now on, the sk->sk_bpf_storage pointer is protected
- * by the sk_storage->lock. Hence, when freeing
- * the sk->sk_bpf_storage, the sk_storage->lock must
- * be held before setting sk->sk_bpf_storage to NULL.
- */
- prev_sk_storage = cmpxchg((struct bpf_sk_storage **)&sk->sk_bpf_storage,
- NULL, sk_storage);
- if (unlikely(prev_sk_storage)) {
- selem_unlink_map(first_selem);
- err = -EAGAIN;
- goto uncharge;
+/* Publish local_storage to the address. This is used because we are already
+ * in a region where we cannot grab a lock on the object owning the storage (
+ * (e.g sk->sk_lock). Hence, atomic ops is used.
+ *
+ * From now on, the addr pointer is protected
+ * by the local_storage->lock. Hence, upon freeing,
+ * the local_storage->lock must be held before unlinking the storage from the
+ * owner.
+ */
+static int publish_local_storage(struct bpf_local_storage_elem *first_selem,
+ struct bpf_local_storage **addr,
+ struct bpf_local_storage *curr)
+{
+ struct bpf_local_storage *prev;
+ prev = cmpxchg(addr, NULL, curr);
+ if (unlikely(prev)) {
/* Note that even first_selem was linked to smap's
* bucket->list, first_selem can be freed immediately
* (instead of kfree_rcu) because
- * bpf_sk_storage_map_free() does a
+ * bpf_local_storage_map_free() does a
* synchronize_rcu() before walking the bucket->list.
* Hence, no one is accessing selem from the
* bucket->list under rcu_read_lock().
*/
+ selem_unlink_map(first_selem);
+ return -EAGAIN;
}
return 0;
+}
+
+static int sk_storage_alloc(struct sock *sk,
+ struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *first_selem)
+{
+ struct bpf_local_storage *curr;
+ int err;
+
+ err = omem_charge(sk, sizeof(*curr));
+ if (err)
+ return err;
+
+ curr = bpf_local_storage_alloc(smap);
+ if (!curr) {
+ err = -ENOMEM;
+ goto uncharge;
+ }
+
+ curr->sk = sk;
+ curr->stype = BPF_LOCAL_STORAGE_SK;
+
+ __selem_link(curr, first_selem);
+ selem_link_map(smap, first_selem);
+
+ err = publish_local_storage(first_selem,
+ (struct bpf_local_storage **)&sk->sk_bpf_storage, curr);
+ if (err)
+ goto uncharge;
+
+ return 0;
uncharge:
- kfree(sk_storage);
- atomic_sub(sizeof(*sk_storage), &sk->sk_omem_alloc);
+ kfree(curr);
+ atomic_sub(sizeof(*curr), &sk->sk_omem_alloc);
return err;
}
+static int check_update_flags(struct bpf_map *map, u64 map_flags)
+{
+ /* BPF_EXIST and BPF_NOEXIST cannot be both set */
+ if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
+ /* BPF_F_LOCK can only be used in a value with spin_lock */
+ unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
+ return -EINVAL;
+
+ return 0;
+}
+
/* sk cannot be going away because it is linking new elem
* to sk->sk_bpf_storage. (i.e. sk->sk_refcnt cannot be 0).
* Otherwise, it will become a leak (and other memory issues
* during map destruction).
*/
-static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
+static struct bpf_local_storage_data *sk_storage_update(struct sock *sk,
struct bpf_map *map,
void *value,
u64 map_flags)
{
- struct bpf_sk_storage_data *old_sdata = NULL;
- struct bpf_sk_storage_elem *selem;
- struct bpf_sk_storage *sk_storage;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage_data *old_sdata = NULL;
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage *local_storage;
+ struct bpf_local_storage_map *smap;
int err;
- /* BPF_EXIST and BPF_NOEXIST cannot be both set */
- if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
- /* BPF_F_LOCK can only be used in a value with spin_lock */
- unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
- return ERR_PTR(-EINVAL);
+ err = check_update_flags(map, map_flags);
+ if (err)
+ return ERR_PTR(err);
- smap = (struct bpf_sk_storage_map *)map;
- sk_storage = rcu_dereference(sk->sk_bpf_storage);
- if (!sk_storage || hlist_empty(&sk_storage->list)) {
+ smap = (struct bpf_local_storage_map *)map;
+
+ local_storage = rcu_dereference(sk->sk_bpf_storage);
+ if (!local_storage || hlist_empty(&local_storage->list)) {
/* Very first elem for this sk */
err = check_flags(NULL, map_flags);
if (err)
return ERR_PTR(err);
- selem = selem_alloc(smap, sk, value, true);
+ selem = sk_selem_alloc(smap, sk, value, true);
if (!selem)
return ERR_PTR(-ENOMEM);
@@ -426,25 +491,26 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
/* Hoping to find an old_sdata to do inline update
- * such that it can avoid taking the sk_storage->lock
+ * such that it can avoid taking the local_storage->lock
* and changing the lists.
*/
- old_sdata = __sk_storage_lookup(sk_storage, smap, false);
+ old_sdata = __local_storage_lookup(local_storage, smap, false);
err = check_flags(old_sdata, map_flags);
if (err)
return ERR_PTR(err);
- if (old_sdata && selem_linked_to_sk(SELEM(old_sdata))) {
+
+ if (old_sdata && selem_linked_to_node(SELEM(old_sdata))) {
copy_map_value_locked(map, old_sdata->data,
value, false);
return old_sdata;
}
}
- raw_spin_lock_bh(&sk_storage->lock);
+ raw_spin_lock_bh(&local_storage->lock);
- /* Recheck sk_storage->list under sk_storage->lock */
- if (unlikely(hlist_empty(&sk_storage->list))) {
- /* A parallel del is happening and sk_storage is going
+ /* Recheck local_storage->list under local_storage->lock */
+ if (unlikely(hlist_empty(&local_storage->list))) {
+ /* A parallel del is happening and local_storage is going
* away. It has just been checked before, so very
* unlikely. Return instead of retry to keep things
* simple.
@@ -453,7 +519,7 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
goto unlock_err;
}
- old_sdata = __sk_storage_lookup(sk_storage, smap, false);
+ old_sdata = __local_storage_lookup(local_storage, smap, false);
err = check_flags(old_sdata, map_flags);
if (err)
goto unlock_err;
@@ -464,15 +530,15 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
goto unlock;
}
- /* sk_storage->lock is held. Hence, we are sure
+ /* local_storage->lock is held. Hence, we are sure
* we can unlink and uncharge the old_sdata successfully
* later. Hence, instead of charging the new selem now
* and then uncharge the old selem later (which may cause
* a potential but unnecessary charge failure), avoid taking
* a charge at all here (the "!old_sdata" check) and the
- * old_sdata will not be uncharged later during __selem_unlink_sk().
+ * old_sdata will not be uncharged later during __selem_unlink().
*/
- selem = selem_alloc(smap, sk, value, !old_sdata);
+ selem = sk_selem_alloc(smap, sk, value, !old_sdata);
if (!selem) {
err = -ENOMEM;
goto unlock_err;
@@ -481,27 +547,27 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
/* First, link the new selem to the map */
selem_link_map(smap, selem);
- /* Second, link (and publish) the new selem to sk_storage */
- __selem_link_sk(sk_storage, selem);
+ /* Second, link (and publish) the new selem to local_storage */
+ __selem_link(local_storage, selem);
/* Third, remove old selem, SELEM(old_sdata) */
if (old_sdata) {
selem_unlink_map(SELEM(old_sdata));
- __selem_unlink_sk(sk_storage, SELEM(old_sdata), false);
+ __selem_unlink(local_storage, SELEM(old_sdata), false);
}
unlock:
- raw_spin_unlock_bh(&sk_storage->lock);
+ raw_spin_unlock_bh(&local_storage->lock);
return SDATA(selem);
unlock_err:
- raw_spin_unlock_bh(&sk_storage->lock);
+ raw_spin_unlock_bh(&local_storage->lock);
return ERR_PTR(err);
}
static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
{
- struct bpf_sk_storage_data *sdata;
+ struct bpf_local_storage_data *sdata;
sdata = sk_storage_lookup(sk, map, false);
if (!sdata)
@@ -515,8 +581,8 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
/* Called by __sk_destruct() & bpf_sk_storage_clone() */
void bpf_sk_storage_free(struct sock *sk)
{
- struct bpf_sk_storage_elem *selem;
- struct bpf_sk_storage *sk_storage;
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage *sk_storage;
bool free_sk_storage = false;
struct hlist_node *n;
@@ -530,9 +596,9 @@ void bpf_sk_storage_free(struct sock *sk)
/* Netiher the bpf_prog nor the bpf-map's syscall
* could be modifying the sk_storage->list now.
* Thus, no elem can be added-to or deleted-from the
- * sk_storage->list by the bpf_prog or by the bpf-map's syscall.
+ * local_storage->list by the bpf_prog or by the bpf-map's syscall.
*
- * It is racing with bpf_sk_storage_map_free() alone
+ * It is racing with bpf_local_storage_map_free() alone
* when unlinking elem from the sk_storage->list and
* the map's bucket->list.
*/
@@ -542,7 +608,7 @@ void bpf_sk_storage_free(struct sock *sk)
* sk_storage.
*/
selem_unlink_map(selem);
- free_sk_storage = __selem_unlink_sk(sk_storage, selem, true);
+ free_sk_storage = __selem_unlink(sk_storage, selem, true);
}
raw_spin_unlock_bh(&sk_storage->lock);
rcu_read_unlock();
@@ -551,14 +617,14 @@ void bpf_sk_storage_free(struct sock *sk)
kfree_rcu(sk_storage, rcu);
}
-static void bpf_sk_storage_map_free(struct bpf_map *map)
+static void bpf_local_storage_map_free(struct bpf_map *map)
{
- struct bpf_sk_storage_elem *selem;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage_map *smap;
struct bucket *b;
unsigned int i;
- smap = (struct bpf_sk_storage_map *)map;
+ smap = (struct bpf_local_storage_map *)map;
/* Note that this map might be concurrently cloned from
* bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
@@ -569,11 +635,11 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
/* bpf prog and the userspace can no longer access this map
* now. No new selem (of this map) can be added
- * to the sk->sk_bpf_storage or to the map bucket's list.
+ * to the bpf_local_storage or to the map bucket's list.
*
* The elem of this map can be cleaned up here
- * or
- * by bpf_sk_storage_free() during __sk_destruct().
+ * or by bpf_local_storage_free() during the destruction of the
+ * owner object. eg. __sk_destruct.
*/
for (i = 0; i < (1U << smap->bucket_log); i++) {
b = &smap->buckets[i];
@@ -581,7 +647,7 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
rcu_read_lock();
/* No one is adding to b->list now */
while ((selem = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(&b->list)),
- struct bpf_sk_storage_elem,
+ struct bpf_local_storage_elem,
map_node))) {
selem_unlink(selem);
cond_resched_rcu();
@@ -589,17 +655,17 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
rcu_read_unlock();
}
- /* bpf_sk_storage_free() may still need to access the map.
- * e.g. bpf_sk_storage_free() has unlinked selem from the map
+ /* bpf_local_storage_free() may still need to access the map.
+ * e.g. bpf_local_storage_free() has unlinked selem from the map
* which then made the above while((selem = ...)) loop
* exited immediately.
*
- * However, the bpf_sk_storage_free() still needs to access
+ * However, the bpf_local_storage_free() still needs to access
* the smap->elem_size to do the uncharging in
- * __selem_unlink_sk().
+ * __selem_unlink().
*
* Hence, wait another rcu grace period for the
- * bpf_sk_storage_free() to finish.
+ * bpf_local_storage_free() to finish.
*/
synchronize_rcu();
@@ -612,12 +678,13 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
*/
#define MAX_VALUE_SIZE \
min_t(u32, \
- (KMALLOC_MAX_SIZE - MAX_BPF_STACK - sizeof(struct bpf_sk_storage_elem)), \
- (U16_MAX - sizeof(struct bpf_sk_storage_elem)))
+ (KMALLOC_MAX_SIZE - MAX_BPF_STACK - \
+ sizeof(struct bpf_local_storage_elem)), \
+ (U16_MAX - sizeof(struct bpf_local_storage_elem)))
-static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
+static int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
{
- if (attr->map_flags & ~SK_STORAGE_CREATE_FLAG_MASK ||
+ if (attr->map_flags & ~LOCAL_STORAGE_CREATE_FLAG_MASK ||
!(attr->map_flags & BPF_F_NO_PREALLOC) ||
attr->max_entries ||
attr->key_size != sizeof(int) || !attr->value_size ||
@@ -634,9 +701,11 @@ static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
return 0;
}
-static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
+
+static struct bpf_map *
+bpf_local_storage_map_alloc(union bpf_attr *attr)
{
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage_map *smap;
unsigned int i;
u32 nbuckets;
u64 cost;
@@ -672,9 +741,10 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
raw_spin_lock_init(&smap->buckets[i].lock);
}
- smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size;
+ smap->elem_size =
+ sizeof(struct bpf_local_storage_elem) + attr->value_size;
smap->cache_idx = (unsigned int)atomic_inc_return(&cache_idx) %
- BPF_SK_STORAGE_CACHE_SIZE;
+ BPF_STORAGE_CACHE_SIZE;
return &smap->map;
}
@@ -685,7 +755,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key,
return -ENOTSUPP;
}
-static int bpf_sk_storage_map_check_btf(const struct bpf_map *map,
+static int bpf_local_storage_map_check_btf(const struct bpf_map *map,
const struct btf *btf,
const struct btf_type *key_type,
const struct btf_type *value_type)
@@ -702,11 +772,11 @@ static int bpf_sk_storage_map_check_btf(const struct bpf_map *map,
return 0;
}
-static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key)
+static void *bpf_sk_storage_lookup_elem(struct bpf_map *map, void *key)
{
- struct bpf_sk_storage_data *sdata;
+ struct bpf_local_storage_data *sdata;
struct socket *sock;
- int fd, err;
+ int fd, err = -EINVAL;
fd = *(int *)key;
sock = sockfd_lookup(fd, &err);
@@ -719,17 +789,18 @@ static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key)
return ERR_PTR(err);
}
-static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key,
+static int bpf_sk_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
{
- struct bpf_sk_storage_data *sdata;
+ struct bpf_local_storage_data *sdata;
struct socket *sock;
- int fd, err;
+ int fd, err = -EINVAL;
fd = *(int *)key;
sock = sockfd_lookup(fd, &err);
if (sock) {
- sdata = sk_storage_update(sock->sk, map, value, map_flags);
+ sdata = sk_storage_update(sock->sk, map, value,
+ map_flags);
sockfd_put(sock);
return PTR_ERR_OR_ZERO(sdata);
}
@@ -737,30 +808,29 @@ static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key,
return err;
}
-static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
+static int bpf_sk_storage_delete_elem(struct bpf_map *map, void *key)
{
struct socket *sock;
- int fd, err;
+ int fd, err = -EINVAL;
fd = *(int *)key;
sock = sockfd_lookup(fd, &err);
if (sock) {
err = sk_storage_delete(sock->sk, map);
sockfd_put(sock);
- return err;
}
return err;
}
-static struct bpf_sk_storage_elem *
+static struct bpf_local_storage_elem *
bpf_sk_storage_clone_elem(struct sock *newsk,
- struct bpf_sk_storage_map *smap,
- struct bpf_sk_storage_elem *selem)
+ struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem)
{
- struct bpf_sk_storage_elem *copy_selem;
+ struct bpf_local_storage_elem *copy_selem;
- copy_selem = selem_alloc(smap, newsk, NULL, true);
+ copy_selem = sk_selem_alloc(smap, newsk, NULL, true);
if (!copy_selem)
return NULL;
@@ -776,9 +846,9 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
{
- struct bpf_sk_storage *new_sk_storage = NULL;
- struct bpf_sk_storage *sk_storage;
- struct bpf_sk_storage_elem *selem;
+ struct bpf_local_storage *new_sk_storage = NULL;
+ struct bpf_local_storage *sk_storage;
+ struct bpf_local_storage_elem *selem;
int ret = 0;
RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
@@ -790,8 +860,8 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
goto out;
hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
- struct bpf_sk_storage_elem *copy_selem;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage_elem *copy_selem;
+ struct bpf_local_storage_map *smap;
struct bpf_map *map;
smap = rcu_dereference(SDATA(selem)->smap);
@@ -799,7 +869,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
continue;
/* Note that for lockless listeners adding new element
- * here can race with cleanup in bpf_sk_storage_map_free.
+ * here can race with cleanup in bpf_local_storage_map_free.
* Try to grab map refcnt to make sure that it's still
* alive and prevent concurrent removal.
*/
@@ -816,7 +886,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
if (new_sk_storage) {
selem_link_map(smap, copy_selem);
- __selem_link_sk(new_sk_storage, copy_selem);
+ __selem_link(new_sk_storage, copy_selem);
} else {
ret = sk_storage_alloc(newsk, smap, copy_selem);
if (ret) {
@@ -827,7 +897,8 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
goto out;
}
- new_sk_storage = rcu_dereference(copy_selem->sk_storage);
+ new_sk_storage =
+ rcu_dereference(copy_selem->local_storage);
}
bpf_map_put(map);
}
@@ -836,7 +907,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
rcu_read_unlock();
/* In case of an error, don't free anything explicitly here, the
- * caller is responsible to call bpf_sk_storage_free.
+ * caller is responsible to call bpf_local_storage_free.
*/
return ret;
@@ -845,7 +916,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
void *, value, u64, flags)
{
- struct bpf_sk_storage_data *sdata;
+ struct bpf_local_storage_data *sdata;
if (flags > BPF_SK_STORAGE_GET_F_CREATE)
return (unsigned long)NULL;
@@ -854,7 +925,7 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
if (sdata)
return (unsigned long)sdata->data;
- if (flags == BPF_SK_STORAGE_GET_F_CREATE &&
+ if (flags == BPF_LOCAL_STORAGE_GET_F_CREATE &&
/* Cannot add new elem to a going away sk.
* Otherwise, the new elem may become a leak
* (and also other memory issues during map
@@ -887,14 +958,14 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
}
const struct bpf_map_ops sk_storage_map_ops = {
- .map_alloc_check = bpf_sk_storage_map_alloc_check,
- .map_alloc = bpf_sk_storage_map_alloc,
- .map_free = bpf_sk_storage_map_free,
+ .map_alloc_check = bpf_local_storage_map_alloc_check,
+ .map_alloc = bpf_local_storage_map_alloc,
+ .map_free = bpf_local_storage_map_free,
.map_get_next_key = notsupp_get_next_key,
- .map_lookup_elem = bpf_fd_sk_storage_lookup_elem,
- .map_update_elem = bpf_fd_sk_storage_update_elem,
- .map_delete_elem = bpf_fd_sk_storage_delete_elem,
- .map_check_btf = bpf_sk_storage_map_check_btf,
+ .map_lookup_elem = bpf_sk_storage_lookup_elem,
+ .map_update_elem = bpf_sk_storage_update_elem,
+ .map_delete_elem = bpf_sk_storage_delete_elem,
+ .map_check_btf = bpf_local_storage_map_check_btf,
};
const struct bpf_func_proto bpf_sk_storage_get_proto = {
@@ -975,7 +1046,7 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs)
u32 nr_maps = 0;
int rem, err;
- /* bpf_sk_storage_map is currently limited to CAP_SYS_ADMIN as
+ /* bpf_local_storage_map is currently limited to CAP_SYS_ADMIN as
* the map_alloc_check() side also does.
*/
if (!bpf_capable())
@@ -1025,10 +1096,10 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs)
}
EXPORT_SYMBOL_GPL(bpf_sk_storage_diag_alloc);
-static int diag_get(struct bpf_sk_storage_data *sdata, struct sk_buff *skb)
+static int diag_get(struct bpf_local_storage_data *sdata, struct sk_buff *skb)
{
struct nlattr *nla_stg, *nla_value;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage_map *smap;
/* It cannot exceed max nlattr's payload */
BUILD_BUG_ON(U16_MAX - NLA_HDRLEN < MAX_VALUE_SIZE);
@@ -1067,9 +1138,9 @@ static int bpf_sk_storage_diag_put_all(struct sock *sk, struct sk_buff *skb,
{
/* stg_array_type (e.g. INET_DIAG_BPF_SK_STORAGES) */
unsigned int diag_size = nla_total_size(0);
- struct bpf_sk_storage *sk_storage;
- struct bpf_sk_storage_elem *selem;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage *sk_storage;
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage_map *smap;
struct nlattr *nla_stgs;
unsigned int saved_len;
int err = 0;
@@ -1122,8 +1193,8 @@ int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag,
{
/* stg_array_type (e.g. INET_DIAG_BPF_SK_STORAGES) */
unsigned int diag_size = nla_total_size(0);
- struct bpf_sk_storage *sk_storage;
- struct bpf_sk_storage_data *sdata;
+ struct bpf_local_storage *sk_storage;
+ struct bpf_local_storage_data *sdata;
struct nlattr *nla_stgs;
unsigned int saved_len;
int err = 0;
@@ -1150,8 +1221,8 @@ int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag,
saved_len = skb->len;
for (i = 0; i < diag->nr_maps; i++) {
- sdata = __sk_storage_lookup(sk_storage,
- (struct bpf_sk_storage_map *)diag->maps[i],
+ sdata = __local_storage_lookup(sk_storage,
+ (struct bpf_local_storage_map *)diag->maps[i],
false);
if (!sdata)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 5c0e964105ac..a6018ece357f 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -14,8 +14,8 @@
#include <linux/string.h>
#include <linux/bpf.h>
#include <linux/bpf-cgroup.h>
+#include <linux/bpf_local_storage.h>
#include <net/sock.h>
-#include <net/bpf_sk_storage.h>
#include "../cgroup/cgroup-internal.h"
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index bfd4ccd80847..b1547efcc842 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -7,7 +7,7 @@
#include <linux/etherdevice.h>
#include <linux/filter.h>
#include <linux/sched/signal.h>
-#include <net/bpf_sk_storage.h>
+#include <linux/bpf_local_storage.h>
#include <net/sock.h>
#include <net/tcp.h>
#include <linux/error-injection.h>
diff --git a/net/core/Makefile b/net/core/Makefile
index 3e2c378e5f31..9ddc2f3ecd97 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -34,4 +34,3 @@ obj-$(CONFIG_HWBM) += hwbm.o
obj-$(CONFIG_NET_DEVLINK) += devlink.o
obj-$(CONFIG_GRO_CELLS) += gro_cells.o
obj-$(CONFIG_FAILOVER) += failover.o
-obj-$(CONFIG_BPF_SYSCALL) += bpf_sk_storage.o
diff --git a/net/core/filter.c b/net/core/filter.c
index bd2853d23b50..b1556baa1235 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -72,7 +72,7 @@
#include <net/seg6_local.h>
#include <net/lwtunnel.h>
#include <net/ipv6_stubs.h>
-#include <net/bpf_sk_storage.h>
+#include <linux/bpf_local_storage.h>
/**
* sk_filter_trim_cap - run a packet through a socket filter
diff --git a/net/core/sock.c b/net/core/sock.c
index fd85e651ce28..26e25b868dda 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -130,8 +130,8 @@
#include <linux/sock_diag.h>
#include <linux/filter.h>
+#include <linux/bpf_local_storage.h>
#include <net/sock_reuseport.h>
-#include <net/bpf_sk_storage.h>
#include <trace/events/sock.h>
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index e3939f76b024..075abf1c6d7e 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -7,7 +7,7 @@
#include <linux/btf.h>
#include <linux/filter.h>
#include <net/tcp.h>
-#include <net/bpf_sk_storage.h>
+#include <linux/bpf_local_storage.h>
static u32 optional_ops[] = {
offsetof(struct tcp_congestion_ops, init),
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 125f4f8a36b4..ce22867cce5d 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -23,7 +23,7 @@
#include <net/inet_hashtables.h>
#include <net/inet_timewait_sock.h>
#include <net/inet6_hashtables.h>
-#include <net/bpf_sk_storage.h>
+#include <linux/bpf_local_storage.h>
#include <net/netlink.h>
#include <linux/inet.h>
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 97e1fd19ff58..4a202eea15c0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2788,10 +2788,10 @@ union bpf_attr {
* "type". The bpf-local-storage "type" (i.e. the *map*) is
* searched against all bpf-local-storages residing at *sk*.
*
- * An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be
+ * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
* used such that a new bpf-local-storage will be
* created if one does not exist. *value* can be used
- * together with **BPF_SK_STORAGE_GET_F_CREATE** to specify
+ * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
* the initial value of a bpf-local-storage. If *value* is
* **NULL**, the new bpf-local-storage will be zero initialized.
* Return
@@ -3388,11 +3388,16 @@ enum {
BPF_F_SYSCTL_BASE_NAME = (1ULL << 0),
};
-/* BPF_FUNC_sk_storage_get flags */
+/* BPF_FUNC_<local>_storage_get flags */
enum {
- BPF_SK_STORAGE_GET_F_CREATE = (1ULL << 0),
+ BPF_LOCAL_STORAGE_GET_F_CREATE = (1ULL << 0),
+ /* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility
+ * and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead.
+ */
+ BPF_SK_STORAGE_GET_F_CREATE = BPF_LOCAL_STORAGE_GET_F_CREATE,
};
+
/* BPF_FUNC_read_branch_records flags. */
enum {
BPF_F_GET_BRANCH_RECORDS_SIZE = (1ULL << 0),
--
2.27.0.rc0.183.gde8f92d652-goog
From: KP Singh <[email protected]>
Similar to bpf_local_storage for sockets, add local storage for inodes.
The life-cycle of storage is managed with the life-cycle of the inode.
i.e. the storage is destroyed along with the owning inode.
Since, the intention is to use this in LSM programs, the destruction is
done after security_inode_free in __destroy_inode.
Signed-off-by: KP Singh <[email protected]>
---
fs/inode.c | 3 +
include/linux/bpf_local_storage.h | 6 +
include/linux/bpf_types.h | 1 +
include/linux/fs.h | 5 +
include/uapi/linux/bpf.h | 41 +++-
kernel/bpf/bpf_local_storage.c | 321 +++++++++++++++++++++++++++++-
kernel/bpf/syscall.c | 3 +-
kernel/bpf/verifier.c | 10 +
tools/bpf/bpftool/map.c | 1 +
tools/include/uapi/linux/bpf.h | 41 +++-
tools/lib/bpf/libbpf_probes.c | 5 +-
11 files changed, 431 insertions(+), 6 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index cc6e701b7e5d..91c81644f33d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -22,6 +22,7 @@
#include <linux/list_lru.h>
#include <linux/iversion.h>
#include <trace/events/writeback.h>
+#include <linux/bpf_local_storage.h>
#include "internal.h"
/*
@@ -257,6 +258,8 @@ void __destroy_inode(struct inode *inode)
security_inode_free(inode);
fsnotify_inode_delete(inode);
locks_free_lock_context(inode);
+ bpf_inode_storage_free(inode);
+
if (!inode->i_nlink) {
WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0);
atomic_long_dec(&inode->i_sb->s_remove_count);
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 85524f18cd91..c6837e7838fc 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -9,6 +9,8 @@ void bpf_sk_storage_free(struct sock *sk);
extern const struct bpf_func_proto bpf_sk_storage_get_proto;
extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
+extern const struct bpf_func_proto bpf_inode_storage_get_proto;
+extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
struct bpf_sk_storage_diag;
struct sk_buff;
@@ -16,6 +18,7 @@ struct nlattr;
struct sock;
#ifdef CONFIG_BPF_SYSCALL
+void bpf_inode_storage_free(struct inode *inode);
int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
struct bpf_sk_storage_diag *
bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs);
@@ -35,6 +38,9 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla)
{
return NULL;
}
+static inline void void bpf_inode_storage_free(struct inode *inode)
+{
+}
static inline void bpf_sk_storage_diag_free(struct bpf_sk_storage_diag *diag)
{
}
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 29d22752fc87..07181fb89bdd 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -101,6 +101,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_HASH_OF_MAPS, htab_of_maps_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP_HASH, dev_map_hash_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_SK_STORAGE, sk_storage_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_INODE_STORAGE, inode_storage_map_ops)
#if defined(CONFIG_BPF_STREAM_PARSER)
BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5ee9e583bde2..23a6b8fbd381 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -624,6 +624,7 @@ is_uncached_acl(struct posix_acl *acl)
#define IOP_DEFAULT_READLINK 0x0010
struct fsnotify_mark_connector;
+struct bpf_local_storage;
/*
* Keep mostly read-only and often accessed (especially for
@@ -740,6 +741,10 @@ struct inode {
struct fsverity_info *i_verity_info;
#endif
+#ifdef CONFIG_BPF_SYSCALL
+ struct bpf_local_storage __rcu *inode_bpf_storage;
+#endif
+
void *i_private; /* fs or device private pointer */
} __randomize_layout;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4a202eea15c0..410fb2e5fdd4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -147,6 +147,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_SK_STORAGE,
BPF_MAP_TYPE_DEVMAP_HASH,
BPF_MAP_TYPE_STRUCT_OPS,
+ BPF_MAP_TYPE_INODE_STORAGE,
};
/* Note that tracing related programs such as
@@ -3157,6 +3158,42 @@ union bpf_attr {
* **bpf_sk_cgroup_id**\ ().
* Return
* The id is returned or 0 in case the id could not be retrieved.
+ *
+ * void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags)
+ * Description
+ * Get a bpf_local_storage from an *inode*.
+ *
+ * Logically, it could be thought of as getting the value from
+ * a *map* with *inode* as the **key**. From this
+ * perspective, the usage is not much different from
+ * **bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this
+ * helper enforces the key must be an inode and the map must also
+ * be a **BPF_MAP_TYPE_INODE_STORAGE**.
+ *
+ * Underneath, the value is stored locally at *inode* instead of
+ * the *map*. The *map* is used as the bpf-local-storage
+ * "type". The bpf-local-storage "type" (i.e. the *map*) is
+ * searched against all bpf_local_storage residing at *inode*.
+ *
+ * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
+ * used such that a new bpf_local_storage will be
+ * created if one does not exist. *value* can be used
+ * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
+ * the initial value of a bpf_local_storage. If *value* is
+ * **NULL**, the new bpf_local_storage will be zero initialized.
+ * Return
+ * A bpf_local_storage pointer is returned on success.
+ *
+ * **NULL** if not found or there was an error in adding
+ * a new bpf_local_storage.
+ *
+ * int bpf_inode_storage_delete(struct bpf_map *map, void *inode)
+ * Description
+ * Delete a bpf_local_storage from an *inode*.
+ * Return
+ * 0 on success.
+ *
+ * **-ENOENT** if the bpf_local_storage cannot be found.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -3288,7 +3325,9 @@ union bpf_attr {
FN(seq_printf), \
FN(seq_write), \
FN(sk_cgroup_id), \
- FN(sk_ancestor_cgroup_id),
+ FN(sk_ancestor_cgroup_id), \
+ FN(inode_storage_get), \
+ FN(inode_storage_delete),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 0a1caac2f5f7..bf807cfe3a73 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -23,6 +23,7 @@ struct bucket {
enum bpf_local_storage_type {
BPF_LOCAL_STORAGE_SK,
+ BPF_LOCAL_STORAGE_INODE,
};
/* Thp map is not the primary owner of a bpf_local_storage_elem.
@@ -94,6 +95,7 @@ struct bpf_local_storage {
*/
union {
struct sock *sk;
+ struct inode *inode;
};
struct rcu_head rcu;
raw_spinlock_t lock; /* Protect adding/removing from the "list" */
@@ -165,6 +167,7 @@ static struct bpf_local_storage_elem *sk_selem_alloc(
static void __unlink_local_storage(struct bpf_local_storage *local_storage,
bool uncharge_omem)
{
+ struct inode *inode;
struct sock *sk;
switch (local_storage->stype) {
@@ -178,6 +181,12 @@ static void __unlink_local_storage(struct bpf_local_storage *local_storage,
RCU_INIT_POINTER(sk->sk_bpf_storage, NULL);
local_storage->sk = NULL;
break;
+ case BPF_LOCAL_STORAGE_INODE:
+ inode = local_storage->inode;
+ /* After this RCU_INIT, sk may be freed and cannot be used */
+ RCU_INIT_POINTER(inode->inode_bpf_storage, NULL);
+ local_storage->inode = NULL;
+ break;
}
}
@@ -339,6 +348,20 @@ sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit)
return __local_storage_lookup(sk_storage, smap, cacheit_lockit);
}
+static struct bpf_local_storage_data *inode_storage_lookup(
+ struct inode *inode, struct bpf_map *map, bool cacheit_lockit)
+{
+ struct bpf_local_storage *inode_storage;
+ struct bpf_local_storage_map *smap;
+
+ inode_storage = rcu_dereference(inode->inode_bpf_storage);
+ if (!inode_storage)
+ return NULL;
+
+ smap = (struct bpf_local_storage_map *)map;
+ return __local_storage_lookup(inode_storage, smap, cacheit_lockit);
+}
+
static int check_flags(const struct bpf_local_storage_data *old_sdata,
u64 map_flags)
{
@@ -435,6 +458,33 @@ static int sk_storage_alloc(struct sock *sk,
return err;
}
+static int inode_storage_alloc(struct inode *inode,
+ struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *first_selem)
+{
+ struct bpf_local_storage *curr;
+ int err;
+
+ curr = bpf_local_storage_alloc(smap);
+ if (!curr)
+ return -ENOMEM;
+
+ curr->inode = inode;
+ curr->stype = BPF_LOCAL_STORAGE_INODE;
+
+ __selem_link(curr, first_selem);
+ selem_link_map(smap, first_selem);
+
+ err = publish_local_storage(first_selem,
+ (struct bpf_local_storage **)&inode->inode_bpf_storage, curr);
+ if (err) {
+ kfree(curr);
+ return err;
+ }
+
+ return 0;
+}
+
static int check_update_flags(struct bpf_map *map, u64 map_flags)
{
/* BPF_EXIST and BPF_NOEXIST cannot be both set */
@@ -565,6 +615,109 @@ static struct bpf_local_storage_data *sk_storage_update(struct sock *sk,
return ERR_PTR(err);
}
+static struct bpf_local_storage_data *inode_storage_update(
+ struct inode *inode, struct bpf_map *map, void *value, u64 map_flags)
+{
+ struct bpf_local_storage_data *old_sdata = NULL;
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage *local_storage;
+ struct bpf_local_storage_map *smap;
+ int err;
+
+ err = check_update_flags(map, map_flags);
+ if (err)
+ return ERR_PTR(err);
+
+ smap = (struct bpf_local_storage_map *)map;
+ local_storage = rcu_dereference(inode->inode_bpf_storage);
+
+ if (!local_storage || hlist_empty(&local_storage->list)) {
+ /* Very first elem for this inode */
+ err = check_flags(NULL, map_flags);
+ if (err)
+ return ERR_PTR(err);
+
+ selem = selem_alloc(smap, value);
+ if (!selem)
+ return ERR_PTR(-ENOMEM);
+
+ err = inode_storage_alloc(inode, smap, selem);
+ if (err) {
+ kfree(selem);
+ return ERR_PTR(err);
+ }
+
+ return SDATA(selem);
+ }
+
+ if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
+ /* Hoping to find an old_sdata to do inline update
+ * such that it can avoid taking the local_storage->lock
+ * and changing the lists.
+ */
+ old_sdata = __local_storage_lookup(local_storage, smap, false);
+ err = check_flags(old_sdata, map_flags);
+ if (err)
+ return ERR_PTR(err);
+
+ if (old_sdata && selem_linked_to_node(SELEM(old_sdata))) {
+ copy_map_value_locked(map, old_sdata->data,
+ value, false);
+ return old_sdata;
+ }
+ }
+
+ raw_spin_lock_bh(&local_storage->lock);
+
+ /* Recheck local_storage->list under local_storage->lock */
+ if (unlikely(hlist_empty(&local_storage->list))) {
+ /* A parallel del is happening and local_storage is going
+ * away. It has just been checked before, so very
+ * unlikely. Return instead of retry to keep things
+ * simple.
+ */
+ err = -EAGAIN;
+ goto unlock_err;
+ }
+
+ old_sdata = __local_storage_lookup(local_storage, smap, false);
+ err = check_flags(old_sdata, map_flags);
+ if (err)
+ goto unlock_err;
+
+ if (old_sdata && (map_flags & BPF_F_LOCK)) {
+ copy_map_value_locked(map, old_sdata->data, value, false);
+ selem = SELEM(old_sdata);
+ goto unlock;
+ }
+
+ selem = selem_alloc(smap, value);
+ if (!selem) {
+ err = -ENOMEM;
+ goto unlock_err;
+ }
+
+ /* First, link the new selem to the map */
+ selem_link_map(smap, selem);
+
+ /* Second, link (and publish) the new selem to sk_storage */
+ __selem_link(local_storage, selem);
+
+ /* Third, remove old selem, SELEM(old_sdata) */
+ if (old_sdata) {
+ selem_unlink_map(SELEM(old_sdata));
+ __selem_unlink(local_storage, SELEM(old_sdata), false);
+ }
+
+unlock:
+ raw_spin_unlock_bh(&local_storage->lock);
+ return SDATA(selem);
+
+unlock_err:
+ raw_spin_unlock_bh(&local_storage->lock);
+ return ERR_PTR(err);
+}
+
static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
{
struct bpf_local_storage_data *sdata;
@@ -578,6 +731,19 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
return 0;
}
+static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
+{
+ struct bpf_local_storage_data *sdata;
+
+ sdata = inode_storage_lookup(inode, map, false);
+ if (!sdata)
+ return -ENOENT;
+
+ selem_unlink(SELEM(sdata));
+
+ return 0;
+}
+
/* Called by __sk_destruct() & bpf_sk_storage_clone() */
void bpf_sk_storage_free(struct sock *sk)
{
@@ -617,6 +783,46 @@ void bpf_sk_storage_free(struct sock *sk)
kfree_rcu(sk_storage, rcu);
}
+/* Called by __destroy_inode() */
+void bpf_inode_storage_free(struct inode *inode)
+{
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage *local_storage;
+ bool free_inode_storage = false;
+ struct hlist_node *n;
+
+ rcu_read_lock();
+ local_storage = rcu_dereference(inode->inode_bpf_storage);
+ if (!local_storage) {
+ rcu_read_unlock();
+ return;
+ }
+
+ /* Netiher the bpf_prog nor the bpf-map's syscall
+ * could be modifying the local_storage->list now.
+ * Thus, no elem can be added-to or deleted-from the
+ * local_storage->list by the bpf_prog or by the bpf-map's syscall.
+ *
+ * It is racing with bpf_local_storage_map_free() alone
+ * when unlinking elem from the local_storage->list and
+ * the map's bucket->list.
+ */
+ raw_spin_lock_bh(&local_storage->lock);
+ hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
+ /* Always unlink from map before unlinking from
+ * local_storage.
+ */
+ selem_unlink_map(selem);
+ free_inode_storage =
+ __selem_unlink(local_storage, selem, false);
+ }
+ raw_spin_unlock_bh(&local_storage->lock);
+ rcu_read_unlock();
+
+ if (free_inode_storage)
+ kfree_rcu(local_storage, rcu);
+}
+
static void bpf_local_storage_map_free(struct bpf_map *map)
{
struct bpf_local_storage_elem *selem;
@@ -639,7 +845,7 @@ static void bpf_local_storage_map_free(struct bpf_map *map)
*
* The elem of this map can be cleaned up here
* or by bpf_local_storage_free() during the destruction of the
- * owner object. eg. __sk_destruct.
+ * owner object. eg. __sk_destruct or __destroy_inode.
*/
for (i = 0; i < (1U << smap->bucket_log); i++) {
b = &smap->buckets[i];
@@ -789,6 +995,21 @@ static void *bpf_sk_storage_lookup_elem(struct bpf_map *map, void *key)
return ERR_PTR(err);
}
+static void *bpf_inode_storage_lookup_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_local_storage_data *sdata;
+ struct inode *inode;
+ int err = -EINVAL;
+
+ inode = *(struct inode **)(key);
+ if (inode) {
+ sdata = inode_storage_lookup(inode, map, true);
+ return sdata ? sdata->data : NULL;
+ }
+
+ return ERR_PTR(err);
+}
+
static int bpf_sk_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
{
@@ -808,6 +1029,22 @@ static int bpf_sk_storage_update_elem(struct bpf_map *map, void *key,
return err;
}
+static int bpf_inode_storage_update_elem(struct bpf_map *map, void *key,
+ void *value, u64 map_flags)
+{
+ struct bpf_local_storage_data *sdata;
+ struct inode *inode;
+ int err = -EINVAL;
+
+ inode = *(struct inode **)(key);
+ if (inode) {
+ sdata = inode_storage_update(inode, map, value,
+ map_flags);
+ return PTR_ERR_OR_ZERO(sdata);
+ }
+ return err;
+}
+
static int bpf_sk_storage_delete_elem(struct bpf_map *map, void *key)
{
struct socket *sock;
@@ -823,6 +1060,18 @@ static int bpf_sk_storage_delete_elem(struct bpf_map *map, void *key)
return err;
}
+static int bpf_inode_storage_delete_elem(struct bpf_map *map, void *key)
+{
+ struct inode *inode;
+ int err = -EINVAL;
+
+ inode = *(struct inode **)(key);
+ if (inode)
+ err = inode_storage_delete(inode, map);
+
+ return err;
+}
+
static struct bpf_local_storage_elem *
bpf_sk_storage_clone_elem(struct sock *newsk,
struct bpf_local_storage_map *smap,
@@ -944,6 +1193,29 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
return (unsigned long)NULL;
}
+BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
+ void *, value, u64, flags)
+{
+ struct bpf_local_storage_data *sdata;
+
+ if (flags > BPF_LOCAL_STORAGE_GET_F_CREATE)
+ return (unsigned long)NULL;
+
+ sdata = inode_storage_lookup(inode, map, true);
+ if (sdata)
+ return (unsigned long)sdata->data;
+
+ if (flags == BPF_LOCAL_STORAGE_GET_F_CREATE &&
+ atomic_inc_not_zero(&inode->i_count)) {
+ sdata = inode_storage_update(inode, map, value, BPF_NOEXIST);
+ iput(inode);
+ return IS_ERR(sdata) ?
+ (unsigned long)NULL : (unsigned long)sdata->data;
+ }
+
+ return (unsigned long)NULL;
+}
+
BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
{
if (refcount_inc_not_zero(&sk->sk_refcnt)) {
@@ -957,6 +1229,20 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
return -ENOENT;
}
+BPF_CALL_2(bpf_inode_storage_delete,
+ struct bpf_map *, map, struct inode *, inode)
+{
+ int err;
+
+ if (atomic_inc_not_zero(&inode->i_count)) {
+ err = inode_storage_delete(inode, map);
+ iput(inode);
+ return err;
+ }
+
+ return inode_storage_delete(inode, map);
+}
+
const struct bpf_map_ops sk_storage_map_ops = {
.map_alloc_check = bpf_local_storage_map_alloc_check,
.map_alloc = bpf_local_storage_map_alloc,
@@ -968,6 +1254,17 @@ const struct bpf_map_ops sk_storage_map_ops = {
.map_check_btf = bpf_local_storage_map_check_btf,
};
+const struct bpf_map_ops inode_storage_map_ops = {
+ .map_alloc_check = bpf_local_storage_map_alloc_check,
+ .map_alloc = bpf_local_storage_map_alloc,
+ .map_free = bpf_local_storage_map_free,
+ .map_get_next_key = notsupp_get_next_key,
+ .map_lookup_elem = bpf_inode_storage_lookup_elem,
+ .map_update_elem = bpf_inode_storage_update_elem,
+ .map_delete_elem = bpf_inode_storage_delete_elem,
+ .map_check_btf = bpf_local_storage_map_check_btf,
+};
+
const struct bpf_func_proto bpf_sk_storage_get_proto = {
.func = bpf_sk_storage_get,
.gpl_only = false,
@@ -986,6 +1283,28 @@ const struct bpf_func_proto bpf_sk_storage_delete_proto = {
.arg2_type = ARG_PTR_TO_SOCKET,
};
+static int bpf_inode_storage_get_btf_ids[4];
+const struct bpf_func_proto bpf_inode_storage_get_proto = {
+ .func = bpf_inode_storage_get,
+ .gpl_only = false,
+ .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID,
+ .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg4_type = ARG_ANYTHING,
+ .btf_id = bpf_inode_storage_get_btf_ids,
+};
+
+static int bpf_inode_storage_delete_btf_ids[2];
+const struct bpf_func_proto bpf_inode_storage_delete_proto = {
+ .func = bpf_sk_storage_delete,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID,
+ .btf_id = bpf_inode_storage_delete_btf_ids,
+};
+
struct bpf_sk_storage_diag {
u32 nr_maps;
struct bpf_map *maps[];
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d13b804ff045..60407c2d8fbd 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -765,7 +765,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY &&
map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
- map->map_type != BPF_MAP_TYPE_SK_STORAGE)
+ map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
+ map->map_type != BPF_MAP_TYPE_INODE_STORAGE)
return -ENOTSUPP;
if (map->spin_lock_off + sizeof(struct bpf_spin_lock) >
map->value_size) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d2e27dba4ac6..ff90e85e243c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4000,6 +4000,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
func_id != BPF_FUNC_sk_storage_delete)
goto error;
break;
+ case BPF_MAP_TYPE_INODE_STORAGE:
+ if (func_id != BPF_FUNC_inode_storage_get &&
+ func_id != BPF_FUNC_inode_storage_delete)
+ goto error;
+ break;
default:
break;
}
@@ -4073,6 +4078,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
if (map->map_type != BPF_MAP_TYPE_SK_STORAGE)
goto error;
break;
+ case BPF_FUNC_inode_storage_get:
+ case BPF_FUNC_inode_storage_delete:
+ if (map->map_type != BPF_MAP_TYPE_INODE_STORAGE)
+ goto error;
+ break;
default:
break;
}
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index c5fac8068ba1..e8fbafb3e87b 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -49,6 +49,7 @@ const char * const map_type_name[] = {
[BPF_MAP_TYPE_STACK] = "stack",
[BPF_MAP_TYPE_SK_STORAGE] = "sk_storage",
[BPF_MAP_TYPE_STRUCT_OPS] = "struct_ops",
+ [BPF_MAP_TYPE_INODE_STORAGE] = "inode_storage",
};
const size_t map_type_name_size = ARRAY_SIZE(map_type_name);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4a202eea15c0..410fb2e5fdd4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -147,6 +147,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_SK_STORAGE,
BPF_MAP_TYPE_DEVMAP_HASH,
BPF_MAP_TYPE_STRUCT_OPS,
+ BPF_MAP_TYPE_INODE_STORAGE,
};
/* Note that tracing related programs such as
@@ -3157,6 +3158,42 @@ union bpf_attr {
* **bpf_sk_cgroup_id**\ ().
* Return
* The id is returned or 0 in case the id could not be retrieved.
+ *
+ * void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags)
+ * Description
+ * Get a bpf_local_storage from an *inode*.
+ *
+ * Logically, it could be thought of as getting the value from
+ * a *map* with *inode* as the **key**. From this
+ * perspective, the usage is not much different from
+ * **bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this
+ * helper enforces the key must be an inode and the map must also
+ * be a **BPF_MAP_TYPE_INODE_STORAGE**.
+ *
+ * Underneath, the value is stored locally at *inode* instead of
+ * the *map*. The *map* is used as the bpf-local-storage
+ * "type". The bpf-local-storage "type" (i.e. the *map*) is
+ * searched against all bpf_local_storage residing at *inode*.
+ *
+ * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
+ * used such that a new bpf_local_storage will be
+ * created if one does not exist. *value* can be used
+ * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
+ * the initial value of a bpf_local_storage. If *value* is
+ * **NULL**, the new bpf_local_storage will be zero initialized.
+ * Return
+ * A bpf_local_storage pointer is returned on success.
+ *
+ * **NULL** if not found or there was an error in adding
+ * a new bpf_local_storage.
+ *
+ * int bpf_inode_storage_delete(struct bpf_map *map, void *inode)
+ * Description
+ * Delete a bpf_local_storage from an *inode*.
+ * Return
+ * 0 on success.
+ *
+ * **-ENOENT** if the bpf_local_storage cannot be found.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -3288,7 +3325,9 @@ union bpf_attr {
FN(seq_printf), \
FN(seq_write), \
FN(sk_cgroup_id), \
- FN(sk_ancestor_cgroup_id),
+ FN(sk_ancestor_cgroup_id), \
+ FN(inode_storage_get), \
+ FN(inode_storage_delete),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 2c92059c0c90..795d7938ab56 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -170,7 +170,7 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
return btf_fd;
}
-static int load_sk_storage_btf(void)
+static int load_local_storage_btf(void)
{
const char strs[] = "\0bpf_spin_lock\0val\0cnt\0l";
/* struct bpf_spin_lock {
@@ -229,12 +229,13 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
key_size = 0;
break;
case BPF_MAP_TYPE_SK_STORAGE:
+ case BPF_MAP_TYPE_INODE_STORAGE:
btf_key_type_id = 1;
btf_value_type_id = 3;
value_size = 8;
max_entries = 0;
map_flags = BPF_F_NO_PREALLOC;
- btf_fd = load_sk_storage_btf();
+ btf_fd = load_local_storage_btf();
if (btf_fd < 0)
return false;
break;
--
2.27.0.rc0.183.gde8f92d652-goog
From: KP Singh <[email protected]>
Adds support for both bpf_{sk, inode}_storage_{get, delete} to be used
in LSM programs. These helpers are not used for tracing programs
(currently) as their usage is tied to the life-cycle of the object and
should only be used where the owning object won't be freed. Thus, they
are safe to use in LSM hooks, but can only be enabled in tracing
programs using a whitelist based approach.
Since the UAPI helper signature for bpf_sk_storage expect a bpf_sock,
it, leads to a compilation warning for LSM programs, it's also updated
to accept a void * pointer instead.
Signed-off-by: KP Singh <[email protected]>
---
include/linux/bpf_local_storage.h | 2 ++
kernel/bpf/bpf_local_storage.c | 22 ++++++++++++++++++++++
kernel/bpf/bpf_lsm.c | 20 +++++++++++++++++++-
3 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index c6837e7838fc..8982c0c69332 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -9,6 +9,8 @@ void bpf_sk_storage_free(struct sock *sk);
extern const struct bpf_func_proto bpf_sk_storage_get_proto;
extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
+extern const struct bpf_func_proto sk_storage_get_btf_proto;
+extern const struct bpf_func_proto sk_storage_delete_btf_proto;
extern const struct bpf_func_proto bpf_inode_storage_get_proto;
extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index bf807cfe3a73..07e02d32feb0 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -1305,6 +1305,28 @@ const struct bpf_func_proto bpf_inode_storage_delete_proto = {
.btf_id = bpf_inode_storage_delete_btf_ids,
};
+static int sk_storage_get_btf_ids[4];
+const struct bpf_func_proto sk_storage_get_btf_proto = {
+ .func = bpf_sk_storage_get,
+ .gpl_only = false,
+ .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID,
+ .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg4_type = ARG_ANYTHING,
+ .btf_id = sk_storage_get_btf_ids,
+};
+
+static int sk_storage_delete_btf_ids[2];
+const struct bpf_func_proto sk_storage_delete_btf_proto = {
+ .func = bpf_sk_storage_delete,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID,
+ .btf_id = sk_storage_delete_btf_ids,
+};
+
struct bpf_sk_storage_diag {
u32 nr_maps;
struct bpf_map *maps[];
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 19636703b24e..fce0a11b63ca 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -11,6 +11,7 @@
#include <linux/bpf_lsm.h>
#include <linux/kallsyms.h>
#include <linux/bpf_verifier.h>
+#include <linux/bpf_local_storage.h>
/* For every LSM hook that allows attachment of BPF programs, declare a nop
* function where a BPF program can be attached.
@@ -45,10 +46,27 @@ int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
return 0;
}
+static const struct bpf_func_proto *
+bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+ switch (func_id) {
+ case BPF_FUNC_inode_storage_get:
+ return &bpf_inode_storage_get_proto;
+ case BPF_FUNC_inode_storage_delete:
+ return &bpf_inode_storage_delete_proto;
+ case BPF_FUNC_sk_storage_get:
+ return &sk_storage_get_btf_proto;
+ case BPF_FUNC_sk_storage_delete:
+ return &sk_storage_delete_btf_proto;
+ default:
+ return bpf_tracing_func_proto(func_id, prog);
+ }
+}
+
const struct bpf_prog_ops lsm_prog_ops = {
};
const struct bpf_verifier_ops lsm_verifier_ops = {
- .get_func_proto = bpf_tracing_func_proto,
+ .get_func_proto = bpf_lsm_func_proto,
.is_valid_access = btf_ctx_access,
};
--
2.27.0.rc0.183.gde8f92d652-goog
From: KP Singh <[email protected]>
inode_local_storage:
* Hook to the file_open and inode_unlink LSM hooks.
* Create and unlink a temporary file.
* Store some information in the inode's bpf_local_storage during
file_open.
* Verify that this information exists when the file is unlinked.
sk_local_storage:
* Hook to the socket_post_create and socket_bind LSM hooks.
* Open and bind a socket and set the sk_storage in the
socket_post_create hook using the start_server helper.
* Verify if the information is set in the socket_bind hook.
Signed-off-by: KP Singh <[email protected]>
---
.../bpf/prog_tests/test_local_storage.c | 60 ++++++++
.../selftests/bpf/progs/local_storage.c | 139 ++++++++++++++++++
2 files changed, 199 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c
diff --git a/tools/testing/selftests/bpf/prog_tests/test_local_storage.c b/tools/testing/selftests/bpf/prog_tests/test_local_storage.c
new file mode 100644
index 000000000000..ee4e27473c1d
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_local_storage.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (C) 2020 Google LLC.
+ */
+
+#include <test_progs.h>
+#include <linux/limits.h>
+
+#include "local_storage.skel.h"
+#include "network_helpers.h"
+
+int create_and_unlink_file(void)
+{
+ char fname[PATH_MAX] = "/tmp/fileXXXXXX";
+ int fd;
+
+ fd = mkstemp(fname);
+ if (fd < 0)
+ return fd;
+
+ close(fd);
+ unlink(fname);
+ return 0;
+}
+
+void test_test_local_storage(void)
+{
+ struct local_storage *skel = NULL;
+ int err, duration = 0, serv_sk = -1;
+
+ skel = local_storage__open_and_load();
+ if (CHECK(!skel, "skel_load", "lsm skeleton failed\n"))
+ goto close_prog;
+
+ err = local_storage__attach(skel);
+ if (CHECK(err, "attach", "lsm attach failed: %d\n", err))
+ goto close_prog;
+
+ skel->bss->monitored_pid = getpid();
+
+ err = create_and_unlink_file();
+ if (CHECK(err < 0, "exec_cmd", "err %d errno %d\n", err, errno))
+ goto close_prog;
+
+ CHECK(!skel->bss->inode_storage_result, "inode_storage_result",
+ "inode_local_storage not set");
+
+ serv_sk = start_server(AF_INET6, SOCK_STREAM);
+ if (CHECK(serv_sk < 0, "start_server", "failed to start server\n"))
+ goto close_prog;
+
+ CHECK(!skel->bss->sk_storage_result, "sk_storage_result",
+ "sk_local_storage not set");
+
+ close(serv_sk);
+
+close_prog:
+ local_storage__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/local_storage.c b/tools/testing/selftests/bpf/progs/local_storage.c
new file mode 100644
index 000000000000..1aa4ccbe0369
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/local_storage.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2020 Google LLC.
+ */
+
+#include <errno.h>
+#include <linux/bpf.h>
+#include <stdbool.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_endian.h>
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
+
+#define DUMMY_STORAGE_VALUE 0xdeadbeef
+
+int monitored_pid = 0;
+bool inode_storage_result = false;
+bool sk_storage_result = false;
+
+struct dummy_storage {
+ __u32 value;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __type(key, int);
+ __type(value, struct dummy_storage);
+} inode_storage_map SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_SK_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
+ __type(key, int);
+ __type(value, struct dummy_storage);
+} sk_storage_map SEC(".maps");
+
+/* Using vmlinux.h causes the generated BTF to be so big that the object
+ * load fails at btf__load.
+ */
+struct sock {} __attribute__((preserve_access_index));
+struct sockaddr {} __attribute__((preserve_access_index));
+struct socket {
+ struct sock *sk;
+} __attribute__((preserve_access_index));
+
+struct inode {} __attribute__((preserve_access_index));
+struct dentry {
+ struct inode *d_inode;
+} __attribute__((preserve_access_index));
+struct file {
+ struct inode *f_inode;
+} __attribute__((preserve_access_index));
+
+
+SEC("lsm/inode_unlink")
+int BPF_PROG(unlink_hook, struct inode *dir, struct dentry *victim)
+{
+ __u32 pid = bpf_get_current_pid_tgid() >> 32;
+ struct dummy_storage *storage;
+
+ if (pid != monitored_pid)
+ return 0;
+
+ storage = bpf_inode_storage_get(&inode_storage_map, victim->d_inode, 0,
+ BPF_SK_STORAGE_GET_F_CREATE);
+ if (!storage)
+ return 0;
+
+ if (storage->value == DUMMY_STORAGE_VALUE)
+ inode_storage_result = true;
+
+ return 0;
+}
+
+SEC("lsm/socket_bind")
+int BPF_PROG(socket_bind, struct socket *sock, struct sockaddr *address,
+ int addrlen)
+{
+ __u32 pid = bpf_get_current_pid_tgid() >> 32;
+ struct dummy_storage *storage;
+
+ if (pid != monitored_pid)
+ return 0;
+
+ storage = bpf_sk_storage_get(&sk_storage_map, sock->sk, 0,
+ BPF_SK_STORAGE_GET_F_CREATE);
+ if (!storage)
+ return 0;
+
+ if (storage->value == DUMMY_STORAGE_VALUE)
+ sk_storage_result = true;
+
+ return 0;
+}
+
+SEC("lsm/socket_post_create")
+int BPF_PROG(socket_post_create, struct socket *sock, int family, int type,
+ int protocol, int kern)
+{
+ __u32 pid = bpf_get_current_pid_tgid() >> 32;
+ struct dummy_storage *storage;
+
+ if (pid != monitored_pid)
+ return 0;
+
+ storage = bpf_sk_storage_get(&sk_storage_map, sock->sk, 0,
+ BPF_SK_STORAGE_GET_F_CREATE);
+ if (!storage)
+ return 0;
+
+ storage->value = DUMMY_STORAGE_VALUE;
+
+ return 0;
+}
+
+SEC("lsm/file_open")
+int BPF_PROG(test_int_hook, struct file *file)
+{
+ __u32 pid = bpf_get_current_pid_tgid() >> 32;
+ struct dummy_storage *storage;
+
+ if (pid != monitored_pid)
+ return 0;
+
+ if (!file->f_inode)
+ return 0;
+
+ storage = bpf_inode_storage_get(&inode_storage_map, file->f_inode, 0,
+ BPF_LOCAL_STORAGE_GET_F_CREATE);
+ if (!storage)
+ return 0;
+
+ storage->value = DUMMY_STORAGE_VALUE;
+ return 0;
+}
--
2.27.0.rc0.183.gde8f92d652-goog
On Tue, May 26, 2020 at 06:33:34PM +0200, KP Singh wrote:
>
> +static struct bpf_local_storage_data *inode_storage_update(
> + struct inode *inode, struct bpf_map *map, void *value, u64 map_flags)
> +{
> + struct bpf_local_storage_data *old_sdata = NULL;
> + struct bpf_local_storage_elem *selem;
> + struct bpf_local_storage *local_storage;
> + struct bpf_local_storage_map *smap;
> + int err;
> +
> + err = check_update_flags(map, map_flags);
> + if (err)
> + return ERR_PTR(err);
> +
> + smap = (struct bpf_local_storage_map *)map;
> + local_storage = rcu_dereference(inode->inode_bpf_storage);
> +
> + if (!local_storage || hlist_empty(&local_storage->list)) {
> + /* Very first elem for this inode */
> + err = check_flags(NULL, map_flags);
> + if (err)
> + return ERR_PTR(err);
> +
> + selem = selem_alloc(smap, value);
> + if (!selem)
> + return ERR_PTR(-ENOMEM);
> +
> + err = inode_storage_alloc(inode, smap, selem);
inode_storage_update looks like big copy-paste except above one line.
pls consolidate.
> +BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
> + void *, value, u64, flags)
> +{
> + struct bpf_local_storage_data *sdata;
> +
> + if (flags > BPF_LOCAL_STORAGE_GET_F_CREATE)
> + return (unsigned long)NULL;
> +
> + sdata = inode_storage_lookup(inode, map, true);
> + if (sdata)
> + return (unsigned long)sdata->data;
> +
> + if (flags == BPF_LOCAL_STORAGE_GET_F_CREATE &&
> + atomic_inc_not_zero(&inode->i_count)) {
> + sdata = inode_storage_update(inode, map, value, BPF_NOEXIST);
> + iput(inode);
> + return IS_ERR(sdata) ?
> + (unsigned long)NULL : (unsigned long)sdata->data;
> + }
This is wrong. You cannot just copy paste the refcounting logic
from bpf_sk_storage_get(). sk->sk_refcnt is very different from inode->i_count.
To start, the inode->i_count cannot be incremented without lock.
If you really need to do it you need igrab().
Secondly, the iput() is not possible to call from bpf prog yet, since
progs are not sleepable and iput() may call iput_final() which may sleep.
But considering that only lsm progs from lsm hooks will call bpf_inode_storage_get()
the inode is not going to disappear while this function is running.
So why touch i_count ?
> +
> + return (unsigned long)NULL;
> +}
> +
> BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
> {
> if (refcount_inc_not_zero(&sk->sk_refcnt)) {
> @@ -957,6 +1229,20 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
> return -ENOENT;
> }
>
> +BPF_CALL_2(bpf_inode_storage_delete,
> + struct bpf_map *, map, struct inode *, inode)
> +{
> + int err;
> +
> + if (atomic_inc_not_zero(&inode->i_count)) {
> + err = inode_storage_delete(inode, map);
> + iput(inode);
> + return err;
> + }
ditto.
> +
> + return inode_storage_delete(inode, map);
bad copy-paste from bpf_sk_storage_delete?
or what is this logic suppose to do?
Thanks for taking a look!
On 26-May 17:49, Alexei Starovoitov wrote:
> On Tue, May 26, 2020 at 06:33:34PM +0200, KP Singh wrote:
> >
> > +static struct bpf_local_storage_data *inode_storage_update(
> > + struct inode *inode, struct bpf_map *map, void *value, u64 map_flags)
> > +{
> > + struct bpf_local_storage_data *old_sdata = NULL;
> > + struct bpf_local_storage_elem *selem;
> > + struct bpf_local_storage *local_storage;
> > + struct bpf_local_storage_map *smap;
> > + int err;
> > +
> > + err = check_update_flags(map, map_flags);
> > + if (err)
> > + return ERR_PTR(err);
> > +
> > + smap = (struct bpf_local_storage_map *)map;
> > + local_storage = rcu_dereference(inode->inode_bpf_storage);
> > +
> > + if (!local_storage || hlist_empty(&local_storage->list)) {
> > + /* Very first elem for this inode */
> > + err = check_flags(NULL, map_flags);
> > + if (err)
> > + return ERR_PTR(err);
> > +
> > + selem = selem_alloc(smap, value);
> > + if (!selem)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + err = inode_storage_alloc(inode, smap, selem);
>
> inode_storage_update looks like big copy-paste except above one line.
> pls consolidate.
Sure.
>
> > +BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
> > + void *, value, u64, flags)
> > +{
> > + struct bpf_local_storage_data *sdata;
> > +
> > + if (flags > BPF_LOCAL_STORAGE_GET_F_CREATE)
> > + return (unsigned long)NULL;
> > +
> > + sdata = inode_storage_lookup(inode, map, true);
> > + if (sdata)
> > + return (unsigned long)sdata->data;
> > +
> > + if (flags == BPF_LOCAL_STORAGE_GET_F_CREATE &&
> > + atomic_inc_not_zero(&inode->i_count)) {
> > + sdata = inode_storage_update(inode, map, value, BPF_NOEXIST);
> > + iput(inode);
> > + return IS_ERR(sdata) ?
> > + (unsigned long)NULL : (unsigned long)sdata->data;
> > + }
>
> This is wrong. You cannot just copy paste the refcounting logic
> from bpf_sk_storage_get(). sk->sk_refcnt is very different from inode->i_count.
> To start, the inode->i_count cannot be incremented without lock.
Good catch! Agreed, Jann pointed out that this can lead to bugs
similar to https://crbug.com/project-zero/2015.
> If you really need to do it you need igrab().
> Secondly, the iput() is not possible to call from bpf prog yet, since
> progs are not sleepable and iput() may call iput_final() which may sleep.
Agreed, I will send a separate patch to add a might_sleep call to
iput() which currently only has a "Consequently, iput() can sleep."
warning in the comments so that this can be caught by
CONFIG_DEBUG_ATOMIC_SLEEP.
> But considering that only lsm progs from lsm hooks will call bpf_inode_storage_get()
> the inode is not going to disappear while this function is running.
If the inode pointer is an argument to the LSM hook, it won't
disappear and yes this does hold generally true for the other
use-cases as well.
> So why touch i_count ?
>
> > +
> > + return (unsigned long)NULL;
> > +}
> > +
> > BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
> > {
> > if (refcount_inc_not_zero(&sk->sk_refcnt)) {
> > @@ -957,6 +1229,20 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
> > return -ENOENT;
> > }
> >
> > +BPF_CALL_2(bpf_inode_storage_delete,
> > + struct bpf_map *, map, struct inode *, inode)
> > +{
> > + int err;
> > +
> > + if (atomic_inc_not_zero(&inode->i_count)) {
> > + err = inode_storage_delete(inode, map);
> > + iput(inode);
> > + return err;
> > + }
>
> ditto.
>
> > +
> > + return inode_storage_delete(inode, map);
>
> bad copy-paste from bpf_sk_storage_delete?
> or what is this logic suppose to do?
The former :) fixed...
- KP
On Tue, May 26, 2020 at 06:33:34PM +0200, KP Singh wrote:
> From: KP Singh <[email protected]>
>
> Similar to bpf_local_storage for sockets, add local storage for inodes.
> The life-cycle of storage is managed with the life-cycle of the inode.
> i.e. the storage is destroyed along with the owning inode.
>
> Since, the intention is to use this in LSM programs, the destruction is
> done after security_inode_free in __destroy_inode.
NAK onbloating the inode structure. Please find an out of line way
to store your information.
On 26-May 22:08, Christoph Hellwig wrote:
> On Tue, May 26, 2020 at 06:33:34PM +0200, KP Singh wrote:
> > From: KP Singh <[email protected]>
> >
> > Similar to bpf_local_storage for sockets, add local storage for inodes.
> > The life-cycle of storage is managed with the life-cycle of the inode.
> > i.e. the storage is destroyed along with the owning inode.
> >
> > Since, the intention is to use this in LSM programs, the destruction is
> > done after security_inode_free in __destroy_inode.
>
> NAK onbloating the inode structure. Please find an out of line way
> to store your information.
The other alternative is to use lbs_inode (security blobs) and we can
do this without adding fields to struct inode.
Here is a rough diff (only illustrative, won't apply cleanly) of the
changes needed to this patch:
https://gist.github.com/sinkap/1d213d17fb82a5e8ffdc3f320ec37d79
Once tracing has gets a whitelist based access to inode storage, I
guess it, too, can use bpf_local_storage for inodes if CONFIG_BPF_LSM
is enabled. Does this sound reasonable to the BPF folks?
- KP
On 5/27/2020 5:38 AM, KP Singh wrote:
> On 26-May 22:08, Christoph Hellwig wrote:
>> On Tue, May 26, 2020 at 06:33:34PM +0200, KP Singh wrote:
>>> From: KP Singh <[email protected]>
>>>
>>> Similar to bpf_local_storage for sockets, add local storage for inodes.
>>> The life-cycle of storage is managed with the life-cycle of the inode.
>>> i.e. the storage is destroyed along with the owning inode.
>>>
>>> Since, the intention is to use this in LSM programs, the destruction is
>>> done after security_inode_free in __destroy_inode.
>> NAK onbloating the inode structure. Please find an out of line way
>> to store your information.
> The other alternative is to use lbs_inode (security blobs) and we can
> do this without adding fields to struct inode.
This is the correct approach, and always has been. This isn't the
first ( or second :( ) case where the correct behavior for an LSM
has been pretty darn obvious, but you've taken a different approach
for no apparent reason.
> Here is a rough diff (only illustrative, won't apply cleanly) of the
> changes needed to this patch:
>
> https://gist.github.com/sinkap/1d213d17fb82a5e8ffdc3f320ec37d79
To do just a little nit-picking, please use bpf_inode() instead of
bpf_inode_storage(). This is in keeping with the convention used by
the other security modules. Sticking with the existing convention
makes it easier for people (and tools) that work with multiple
security modules.
> Once tracing has gets a whitelist based access to inode storage, I
> guess it, too, can use bpf_local_storage for inodes
Only within the BPF module. Your sentence above is slightly garbled,
so I'm not really sure what you're saying, but if you're suggesting
that tracing code outside of the BPF security module can use the
BPF inode data, the answer is a resounding "no".
> if CONFIG_BPF_LSM
> is enabled. Does this sound reasonable to the BPF folks?
>
> - KP
>
>
On Wed, May 27, 2020 at 6:41 PM Casey Schaufler <[email protected]> wrote:
>
> On 5/27/2020 5:38 AM, KP Singh wrote:
> > On 26-May 22:08, Christoph Hellwig wrote:
> >> On Tue, May 26, 2020 at 06:33:34PM +0200, KP Singh wrote:
> >>> From: KP Singh <[email protected]>
> >>>
> >>> Similar to bpf_local_storage for sockets, add local storage for inodes.
> >>> The life-cycle of storage is managed with the life-cycle of the inode.
> >>> i.e. the storage is destroyed along with the owning inode.
> >>>
> >>> Since, the intention is to use this in LSM programs, the destruction is
> >>> done after security_inode_free in __destroy_inode.
> >> NAK onbloating the inode structure. Please find an out of line way
> >> to store your information.
> > The other alternative is to use lbs_inode (security blobs) and we can
> > do this without adding fields to struct inode.
>
> This is the correct approach, and always has been. This isn't the
> first ( or second :( ) case where the correct behavior for an LSM
> has been pretty darn obvious, but you've taken a different approach
> for no apparent reason.
>
> > Here is a rough diff (only illustrative, won't apply cleanly) of the
> > changes needed to this patch:
> >
> > https://gist.github.com/sinkap/1d213d17fb82a5e8ffdc3f320ec37d79
>
> To do just a little nit-picking, please use bpf_inode() instead of
> bpf_inode_storage(). This is in keeping with the convention used by
> the other security modules. Sticking with the existing convention
> makes it easier for people (and tools) that work with multiple
> security modules.
>
> > Once tracing has gets a whitelist based access to inode storage, I
> > guess it, too, can use bpf_local_storage for inodes
>
> Only within the BPF module. Your sentence above is slightly garbled,
> so I'm not really sure what you're saying, but if you're suggesting
> that tracing code outside of the BPF security module can use the
> BPF inode data, the answer is a resounding "no".
This is why I wanted to add a separate pointer in struct inode so that
we could share the implementation with tracing. bpf_local_storage
is managed (per-program+per-type of storage) with separate BPF maps.
So, it can be easily shared between two programs (and
program types) without them clobbering over each other.
I guess we can have separate pointers for tracing,
use the pointer in the security blob for the LSM and discuss this separately
if and when we use this for tracing and keep this series patches scoped to
BPF_PROG_TYPE_LSM.
- KP
>
> > if CONFIG_BPF_LSM
> > is enabled. Does this sound reasonable to the BPF folks?
> >
> > - KP
> >
> >
>
Hi KP,
I love your patch! Yet something to improve:
[auto build test ERROR on bpf-next/master]
[also build test ERROR on next-20200526]
[cannot apply to bpf/master linus/master linux/master v5.7-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/KP-Singh/Generalizing-bpf_local_storage/20200527-011230
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: parisc-randconfig-r025-20200527 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <[email protected]>
All errors (new ones prefixed by >>, old ones prefixed by <<):
>> make[3]: *** No rule to make target 'kernel/bpf/xskmap.o', needed by 'kernel/bpf/built-in.a'.
make[3]: Target '__build' not remade because of errors.
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On Tue, May 26, 2020 at 9:34 AM KP Singh <[email protected]> wrote:
>
> From: KP Singh <[email protected]>
>
> inode_local_storage:
>
> * Hook to the file_open and inode_unlink LSM hooks.
> * Create and unlink a temporary file.
> * Store some information in the inode's bpf_local_storage during
> file_open.
> * Verify that this information exists when the file is unlinked.
>
> sk_local_storage:
>
> * Hook to the socket_post_create and socket_bind LSM hooks.
> * Open and bind a socket and set the sk_storage in the
> socket_post_create hook using the start_server helper.
> * Verify if the information is set in the socket_bind hook.
>
> Signed-off-by: KP Singh <[email protected]>
> ---
> .../bpf/prog_tests/test_local_storage.c | 60 ++++++++
> .../selftests/bpf/progs/local_storage.c | 139 ++++++++++++++++++
> 2 files changed, 199 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
> create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c
>
[...]
> +struct dummy_storage {
> + __u32 value;
> +};
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
> + __uint(map_flags, BPF_F_NO_PREALLOC);
> + __type(key, int);
> + __type(value, struct dummy_storage);
> +} inode_storage_map SEC(".maps");
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_SK_STORAGE);
> + __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
> + __type(key, int);
> + __type(value, struct dummy_storage);
> +} sk_storage_map SEC(".maps");
> +
> +/* Using vmlinux.h causes the generated BTF to be so big that the object
> + * load fails at btf__load.
> + */
That's first time I hear about such issue. Do you have an error log
from verifier?
Clang is smart enough to trim down used types to only those that are
actually necessary, so too big BTF shouldn't be a thing. But let's try
to dig into this and fix whatever issue it is, before giving up :)
> +struct sock {} __attribute__((preserve_access_index));
> +struct sockaddr {} __attribute__((preserve_access_index));
> +struct socket {
> + struct sock *sk;
> +} __attribute__((preserve_access_index));
> +
> +struct inode {} __attribute__((preserve_access_index));
> +struct dentry {
> + struct inode *d_inode;
> +} __attribute__((preserve_access_index));
> +struct file {
> + struct inode *f_inode;
> +} __attribute__((preserve_access_index));
> +
> +
[...]
Hi KP,
I love your patch! Yet something to improve:
[auto build test ERROR on bpf-next/master]
[cannot apply to bpf/master linus/master linux/master v5.7 next-20200602]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/KP-Singh/Generalizing-bpf_local_storage/20200527-011230
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: c6x-randconfig-r016-20200602 (attached as .config)
compiler: c6x-elf-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=c6x
If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <[email protected]>
All errors (new ones prefixed by >>, old ones prefixed by <<):
In file included from net/core/sock.c:133:
>> include/linux/bpf_local_storage.h:41:20: error: two or more data types in declaration specifiers
41 | static inline void void bpf_inode_storage_free(struct inode *inode)
| ^~~~
--
In file included from net/core/filter.c:75:
>> include/linux/bpf_local_storage.h:41:20: error: two or more data types in declaration specifiers
41 | static inline void void bpf_inode_storage_free(struct inode *inode)
| ^~~~
In file included from include/asm-generic/atomic.h:12,
from ./arch/c6x/include/generated/asm/atomic.h:1,
from include/linux/atomic.h:7,
from include/asm-generic/bitops/lock.h:5,
from arch/c6x/include/asm/bitops.h:87,
from include/linux/bitops.h:29,
from include/linux/kernel.h:12,
from include/linux/list.h:9,
from include/linux/module.h:12,
from net/core/filter.c:20:
net/core/filter.c: In function 'bpf_clear_redirect_map':
arch/c6x/include/asm/cmpxchg.h:55:3: warning: value computed is not used [-Wunused-value]
55 | ((__typeof__(*(ptr)))__cmpxchg_local_generic((ptr), | ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | (unsigned long)(o), | ~~~~~~~~~~~~~~~~~~~~~
57 | (unsigned long)(n), | ~~~~~~~~~~~~~~~~~~~~~
58 | sizeof(*(ptr))))
| ~~~~~~~~~~~~~~~~
include/asm-generic/cmpxchg.h:106:28: note: in expansion of macro 'cmpxchg_local'
106 | #define cmpxchg(ptr, o, n) cmpxchg_local((ptr), (o), (n))
| ^~~~~~~~~~~~~
net/core/filter.c:3529:4: note: in expansion of macro 'cmpxchg'
3529 | cmpxchg(&ri->map, map, NULL);
| ^~~~~~~
vim +41 include/linux/bpf_local_storage.h
19
20 #ifdef CONFIG_BPF_SYSCALL
21 void bpf_inode_storage_free(struct inode *inode);
22 int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
23 struct bpf_sk_storage_diag *
24 bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs);
25 void bpf_sk_storage_diag_free(struct bpf_sk_storage_diag *diag);
26 int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag,
27 struct sock *sk, struct sk_buff *skb,
28 int stg_array_type,
29 unsigned int *res_diag_size);
30 #else
31 static inline int bpf_sk_storage_clone(const struct sock *sk,
32 struct sock *newsk)
33 {
34 return 0;
35 }
36 static inline struct bpf_sk_storage_diag *
37 bpf_sk_storage_diag_alloc(const struct nlattr *nla)
38 {
39 return NULL;
40 }
> 41 static inline void void bpf_inode_storage_free(struct inode *inode)
42 {
43 }
44 static inline void bpf_sk_storage_diag_free(struct bpf_sk_storage_diag *diag)
45 {
46 }
47 static inline int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag,
48 struct sock *sk, struct sk_buff *skb,
49 int stg_array_type,
50 unsigned int *res_diag_size)
51 {
52 return 0;
53 }
54 #endif
55
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On 01-Jun 13:29, Andrii Nakryiko wrote:
> On Tue, May 26, 2020 at 9:34 AM KP Singh <[email protected]> wrote:
> >
> > From: KP Singh <[email protected]>
> >
> > inode_local_storage:
> >
> > * Hook to the file_open and inode_unlink LSM hooks.
> > * Create and unlink a temporary file.
> > * Store some information in the inode's bpf_local_storage during
> > file_open.
> > * Verify that this information exists when the file is unlinked.
> >
> > sk_local_storage:
> >
> > * Hook to the socket_post_create and socket_bind LSM hooks.
> > * Open and bind a socket and set the sk_storage in the
> > socket_post_create hook using the start_server helper.
> > * Verify if the information is set in the socket_bind hook.
> >
> > Signed-off-by: KP Singh <[email protected]>
> > ---
> > .../bpf/prog_tests/test_local_storage.c | 60 ++++++++
> > .../selftests/bpf/progs/local_storage.c | 139 ++++++++++++++++++
> > 2 files changed, 199 insertions(+)
> > create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
> > create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c
> >
>
> [...]
>
> > +struct dummy_storage {
> > + __u32 value;
> > +};
> > +
> > +struct {
> > + __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
> > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > + __type(key, int);
> > + __type(value, struct dummy_storage);
> > +} inode_storage_map SEC(".maps");
> > +
> > +struct {
> > + __uint(type, BPF_MAP_TYPE_SK_STORAGE);
> > + __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
> > + __type(key, int);
> > + __type(value, struct dummy_storage);
> > +} sk_storage_map SEC(".maps");
> > +
> > +/* Using vmlinux.h causes the generated BTF to be so big that the object
> > + * load fails at btf__load.
> > + */
>
> That's first time I hear about such issue. Do you have an error log
> from verifier?
Here's what I get when I do the following change.
--- a/tools/testing/selftests/bpf/progs/local_storage.c
+++ b/tools/testing/selftests/bpf/progs/local_storage.c
@@ -4,8 +4,8 @@
* Copyright 2020 Google LLC.
*/
+#include "vmlinux.h"
#include <errno.h>
-#include <linux/bpf.h>
#include <stdbool.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
@@ -37,24 +37,6 @@ struct {
__type(value, struct dummy_storage);
} sk_storage_map SEC(".maps");
-/* Using vmlinux.h causes the generated BTF to be so big that the object
- * load fails at btf__load.
- */
-struct sock {} __attribute__((preserve_access_index));
-struct sockaddr {} __attribute__((preserve_access_index));
-struct socket {
- struct sock *sk;
-} __attribute__((preserve_access_index));
-
-struct inode {} __attribute__((preserve_access_index));
-struct dentry {
- struct inode *d_inode;
-} __attribute__((preserve_access_index));
-struct file {
- struct inode *f_inode;
-} __attribute__((preserve_access_index));
./test_progs -t test_local_storage
libbpf: Error loading BTF: Invalid argument(22)
libbpf: magic: 0xeb9f
version: 1
flags: 0x0
hdr_len: 24
type_off: 0
type_len: 4488
str_off: 4488
str_len: 3012
btf_total_size: 7524
[1] STRUCT (anon) size=32 vlen=4
type type_id=2 bits_offset=0
map_flags type_id=6 bits_offset=64
key type_id=8 bits_offset=128
value type_id=9 bits_offset=192
[2] PTR (anon) type_id=4
[3] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[4] ARRAY (anon) type_id=3 index_type_id=5 nr_elems=28
[5] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)
[6] PTR (anon) type_id=7
[7] ARRAY (anon) type_id=3 index_type_id=5 nr_elems=1
[8] PTR (anon) type_id=3
[9] PTR (anon) type_id=10
[10] STRUCT dummy_storage size=4 vlen=1
value type_id=11 bits_offset=0
[11] TYPEDEF __u32 type_id=12
[... More BTF Dump ...]
[91] TYPEDEF wait_queue_head_t type_id=175
[... More BTF Dump ...]
[173] FWD super_block struct
[174] FWD vfsmount struct
[175] FWD wait_queue_head struct
[106] STRUCT socket_wq size=128 vlen=4
wait type_id=91 bits_offset=0 Invalid member
libbpf: Error loading .BTF into kernel: -22.
libbpf: map 'inode_storage_map': failed to create: Invalid argument(-22)
libbpf: failed to load object 'local_storage'
libbpf: failed to load BPF skeleton 'local_storage': -22
test_test_local_storage:FAIL:skel_load lsm skeleton failed
#81 test_local_storage:FAIL
The failiure is in:
[106] STRUCT socket_wq size=128 vlen=4
wait type_id=91 bits_offset=0 Invalid member
>
> Clang is smart enough to trim down used types to only those that are
> actually necessary, so too big BTF shouldn't be a thing. But let's try
> to dig into this and fix whatever issue it is, before giving up :)
>
I was wrong about the size being an issue. The verifier thinks the BTF
is invalid and more specificially it thinks that the socket_wq's
member with type_id=91, i.e. typedef wait_queue_head_t is invalid. Am
I missing some toolchain patches?
- KP
> > +struct sock {} __attribute__((preserve_access_index));
> > +struct sockaddr {} __attribute__((preserve_access_index));
> > +struct socket {
> > + struct sock *sk;
> > +} __attribute__((preserve_access_index));
> > +
> > +struct inode {} __attribute__((preserve_access_index));
> > +struct dentry {
> > + struct inode *d_inode;
> > +} __attribute__((preserve_access_index));
> > +struct file {
> > + struct inode *f_inode;
> > +} __attribute__((preserve_access_index));
> > +
> > +
>
> [...]
On Tue, Jun 16, 2020 at 8:54 AM KP Singh <[email protected]> wrote:
>
> On 01-Jun 13:29, Andrii Nakryiko wrote:
> > On Tue, May 26, 2020 at 9:34 AM KP Singh <[email protected]> wrote:
> > >
> > > From: KP Singh <[email protected]>
> > >
> > > inode_local_storage:
> > >
> > > * Hook to the file_open and inode_unlink LSM hooks.
> > > * Create and unlink a temporary file.
> > > * Store some information in the inode's bpf_local_storage during
> > > file_open.
> > > * Verify that this information exists when the file is unlinked.
> > >
> > > sk_local_storage:
> > >
> > > * Hook to the socket_post_create and socket_bind LSM hooks.
> > > * Open and bind a socket and set the sk_storage in the
> > > socket_post_create hook using the start_server helper.
> > > * Verify if the information is set in the socket_bind hook.
> > >
> > > Signed-off-by: KP Singh <[email protected]>
> > > ---
> > > .../bpf/prog_tests/test_local_storage.c | 60 ++++++++
> > > .../selftests/bpf/progs/local_storage.c | 139 ++++++++++++++++++
> > > 2 files changed, 199 insertions(+)
> > > create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
> > > create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c
> > >
> >
> > [...]
> >
> > > +struct dummy_storage {
> > > + __u32 value;
> > > +};
> > > +
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
> > > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > > + __type(key, int);
> > > + __type(value, struct dummy_storage);
> > > +} inode_storage_map SEC(".maps");
> > > +
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_SK_STORAGE);
> > > + __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
> > > + __type(key, int);
> > > + __type(value, struct dummy_storage);
> > > +} sk_storage_map SEC(".maps");
> > > +
> > > +/* Using vmlinux.h causes the generated BTF to be so big that the object
> > > + * load fails at btf__load.
> > > + */
> >
> > That's first time I hear about such issue. Do you have an error log
> > from verifier?
>
> Here's what I get when I do the following change.
>
> --- a/tools/testing/selftests/bpf/progs/local_storage.c
> +++ b/tools/testing/selftests/bpf/progs/local_storage.c
> @@ -4,8 +4,8 @@
> * Copyright 2020 Google LLC.
> */
>
> +#include "vmlinux.h"
> #include <errno.h>
> -#include <linux/bpf.h>
> #include <stdbool.h>
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_tracing.h>
> @@ -37,24 +37,6 @@ struct {
> __type(value, struct dummy_storage);
> } sk_storage_map SEC(".maps");
>
> -/* Using vmlinux.h causes the generated BTF to be so big that the object
> - * load fails at btf__load.
> - */
> -struct sock {} __attribute__((preserve_access_index));
> -struct sockaddr {} __attribute__((preserve_access_index));
> -struct socket {
> - struct sock *sk;
> -} __attribute__((preserve_access_index));
> -
> -struct inode {} __attribute__((preserve_access_index));
> -struct dentry {
> - struct inode *d_inode;
> -} __attribute__((preserve_access_index));
> -struct file {
> - struct inode *f_inode;
> -} __attribute__((preserve_access_index));
>
> ./test_progs -t test_local_storage
> libbpf: Error loading BTF: Invalid argument(22)
> libbpf: magic: 0xeb9f
> version: 1
> flags: 0x0
> hdr_len: 24
> type_off: 0
> type_len: 4488
> str_off: 4488
> str_len: 3012
> btf_total_size: 7524
>
> [1] STRUCT (anon) size=32 vlen=4
> type type_id=2 bits_offset=0
> map_flags type_id=6 bits_offset=64
> key type_id=8 bits_offset=128
> value type_id=9 bits_offset=192
> [2] PTR (anon) type_id=4
> [3] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> [4] ARRAY (anon) type_id=3 index_type_id=5 nr_elems=28
> [5] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)
> [6] PTR (anon) type_id=7
> [7] ARRAY (anon) type_id=3 index_type_id=5 nr_elems=1
> [8] PTR (anon) type_id=3
> [9] PTR (anon) type_id=10
> [10] STRUCT dummy_storage size=4 vlen=1
> value type_id=11 bits_offset=0
> [11] TYPEDEF __u32 type_id=12
>
> [... More BTF Dump ...]
>
> [91] TYPEDEF wait_queue_head_t type_id=175
>
> [... More BTF Dump ...]
>
> [173] FWD super_block struct
> [174] FWD vfsmount struct
> [175] FWD wait_queue_head struct
> [106] STRUCT socket_wq size=128 vlen=4
> wait type_id=91 bits_offset=0 Invalid member
>
> libbpf: Error loading .BTF into kernel: -22.
> libbpf: map 'inode_storage_map': failed to create: Invalid argument(-22)
> libbpf: failed to load object 'local_storage'
> libbpf: failed to load BPF skeleton 'local_storage': -22
> test_test_local_storage:FAIL:skel_load lsm skeleton failed
> #81 test_local_storage:FAIL
>
> The failiure is in:
>
> [106] STRUCT socket_wq size=128 vlen=4
> wait type_id=91 bits_offset=0 Invalid member
>
> >
> > Clang is smart enough to trim down used types to only those that are
> > actually necessary, so too big BTF shouldn't be a thing. But let's try
> > to dig into this and fix whatever issue it is, before giving up :)
> >
>
> I was wrong about the size being an issue. The verifier thinks the BTF
> is invalid and more specificially it thinks that the socket_wq's
> member with type_id=91, i.e. typedef wait_queue_head_t is invalid. Am
> I missing some toolchain patches?
>
It is invalid BTF in the sense that we have a struct, embedding a
struct, which is only defined as a forward declaration. There is not
enough information and such situation would have caused compilation
error, because it's impossible to determine the size of the outer
struct.
Yonghong, it seems like Clang is pruning types too aggressively here?
We should keep types that are embedded, even if they are not used
directly by user code. Could you please take a look?
> - KP
>
>
> > > +struct sock {} __attribute__((preserve_access_index));
> > > +struct sockaddr {} __attribute__((preserve_access_index));
> > > +struct socket {
> > > + struct sock *sk;
> > > +} __attribute__((preserve_access_index));
> > > +
> > > +struct inode {} __attribute__((preserve_access_index));
> > > +struct dentry {
> > > + struct inode *d_inode;
> > > +} __attribute__((preserve_access_index));
> > > +struct file {
> > > + struct inode *f_inode;
> > > +} __attribute__((preserve_access_index));
> > > +
> > > +
> >
> > [...]
On 6/16/20 12:25 PM, Andrii Nakryiko wrote:
> On Tue, Jun 16, 2020 at 8:54 AM KP Singh <[email protected]> wrote:
>> On 01-Jun 13:29, Andrii Nakryiko wrote:
>>> On Tue, May 26, 2020 at 9:34 AM KP Singh <[email protected]> wrote:
>>>> From: KP Singh <[email protected]>
>>>>
>>>> inode_local_storage:
>>>>
>>>> * Hook to the file_open and inode_unlink LSM hooks.
>>>> * Create and unlink a temporary file.
>>>> * Store some information in the inode's bpf_local_storage during
>>>> file_open.
>>>> * Verify that this information exists when the file is unlinked.
>>>>
>>>> sk_local_storage:
>>>>
>>>> * Hook to the socket_post_create and socket_bind LSM hooks.
>>>> * Open and bind a socket and set the sk_storage in the
>>>> socket_post_create hook using the start_server helper.
>>>> * Verify if the information is set in the socket_bind hook.
>>>>
>>>> Signed-off-by: KP Singh <[email protected]>
>>>> ---
>>>> .../bpf/prog_tests/test_local_storage.c | 60 ++++++++
>>>> .../selftests/bpf/progs/local_storage.c | 139 ++++++++++++++++++
>>>> 2 files changed, 199 insertions(+)
>>>> create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
>>>> create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c
>>>>
>>> [...]
>>>
>>>> +struct dummy_storage {
>>>> + __u32 value;
>>>> +};
>>>> +
>>>> +struct {
>>>> + __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
>>>> + __uint(map_flags, BPF_F_NO_PREALLOC);
>>>> + __type(key, int);
>>>> + __type(value, struct dummy_storage);
>>>> +} inode_storage_map SEC(".maps");
>>>> +
>>>> +struct {
>>>> + __uint(type, BPF_MAP_TYPE_SK_STORAGE);
>>>> + __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
>>>> + __type(key, int);
>>>> + __type(value, struct dummy_storage);
>>>> +} sk_storage_map SEC(".maps");
>>>> +
>>>> +/* Using vmlinux.h causes the generated BTF to be so big that the object
>>>> + * load fails at btf__load.
>>>> + */
>>> That's first time I hear about such issue. Do you have an error log
>>> from verifier?
>> Here's what I get when I do the following change.
>>
>> --- a/tools/testing/selftests/bpf/progs/local_storage.c
>> +++ b/tools/testing/selftests/bpf/progs/local_storage.c
>> @@ -4,8 +4,8 @@
>> * Copyright 2020 Google LLC.
>> */
>>
>> +#include "vmlinux.h"
>> #include <errno.h>
>> -#include <linux/bpf.h>
>> #include <stdbool.h>
>> #include <bpf/bpf_helpers.h>
>> #include <bpf/bpf_tracing.h>
>> @@ -37,24 +37,6 @@ struct {
>> __type(value, struct dummy_storage);
>> } sk_storage_map SEC(".maps");
>>
>> -/* Using vmlinux.h causes the generated BTF to be so big that the object
>> - * load fails at btf__load.
>> - */
>> -struct sock {} __attribute__((preserve_access_index));
>> -struct sockaddr {} __attribute__((preserve_access_index));
>> -struct socket {
>> - struct sock *sk;
>> -} __attribute__((preserve_access_index));
>> -
>> -struct inode {} __attribute__((preserve_access_index));
>> -struct dentry {
>> - struct inode *d_inode;
>> -} __attribute__((preserve_access_index));
>> -struct file {
>> - struct inode *f_inode;
>> -} __attribute__((preserve_access_index));
>>
>> ./test_progs -t test_local_storage
>> libbpf: Error loading BTF: Invalid argument(22)
>> libbpf: magic: 0xeb9f
>> version: 1
>> flags: 0x0
>> hdr_len: 24
>> type_off: 0
>> type_len: 4488
>> str_off: 4488
>> str_len: 3012
>> btf_total_size: 7524
>>
>> [1] STRUCT (anon) size=32 vlen=4
>> type type_id=2 bits_offset=0
>> map_flags type_id=6 bits_offset=64
>> key type_id=8 bits_offset=128
>> value type_id=9 bits_offset=192
>> [2] PTR (anon) type_id=4
>> [3] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>> [4] ARRAY (anon) type_id=3 index_type_id=5 nr_elems=28
>> [5] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)
>> [6] PTR (anon) type_id=7
>> [7] ARRAY (anon) type_id=3 index_type_id=5 nr_elems=1
>> [8] PTR (anon) type_id=3
>> [9] PTR (anon) type_id=10
>> [10] STRUCT dummy_storage size=4 vlen=1
>> value type_id=11 bits_offset=0
>> [11] TYPEDEF __u32 type_id=12
>>
>> [... More BTF Dump ...]
>>
>> [91] TYPEDEF wait_queue_head_t type_id=175
>>
>> [... More BTF Dump ...]
>>
>> [173] FWD super_block struct
>> [174] FWD vfsmount struct
>> [175] FWD wait_queue_head struct
>> [106] STRUCT socket_wq size=128 vlen=4
>> wait type_id=91 bits_offset=0 Invalid member
>>
>> libbpf: Error loading .BTF into kernel: -22.
>> libbpf: map 'inode_storage_map': failed to create: Invalid argument(-22)
>> libbpf: failed to load object 'local_storage'
>> libbpf: failed to load BPF skeleton 'local_storage': -22
>> test_test_local_storage:FAIL:skel_load lsm skeleton failed
>> #81 test_local_storage:FAIL
>>
>> The failiure is in:
>>
>> [106] STRUCT socket_wq size=128 vlen=4
>> wait type_id=91 bits_offset=0 Invalid member
>>
>>> Clang is smart enough to trim down used types to only those that are
>>> actually necessary, so too big BTF shouldn't be a thing. But let's try
>>> to dig into this and fix whatever issue it is, before giving up :)
>>>
>> I was wrong about the size being an issue. The verifier thinks the BTF
>> is invalid and more specificially it thinks that the socket_wq's
>> member with type_id=91, i.e. typedef wait_queue_head_t is invalid. Am
>> I missing some toolchain patches?
>>
> It is invalid BTF in the sense that we have a struct, embedding a
> struct, which is only defined as a forward declaration. There is not
> enough information and such situation would have caused compilation
> error, because it's impossible to determine the size of the outer
> struct.
>
> Yonghong, it seems like Clang is pruning types too aggressively here?
> We should keep types that are embedded, even if they are not used
> directly by user code. Could you please take a look?
Sure. Will take a look shortly.
>
>
>
>> - KP
>>
>>
>>>> +struct sock {} __attribute__((preserve_access_index));
>>>> +struct sockaddr {} __attribute__((preserve_access_index));
>>>> +struct socket {
>>>> + struct sock *sk;
>>>> +} __attribute__((preserve_access_index));
>>>> +
>>>> +struct inode {} __attribute__((preserve_access_index));
>>>> +struct dentry {
>>>> + struct inode *d_inode;
>>>> +} __attribute__((preserve_access_index));
>>>> +struct file {
>>>> + struct inode *f_inode;
>>>> +} __attribute__((preserve_access_index));
>>>> +
>>>> +
>>> [...]
On 6/16/20 12:25 PM, Andrii Nakryiko wrote:
> On Tue, Jun 16, 2020 at 8:54 AM KP Singh <[email protected]> wrote:
>> On 01-Jun 13:29, Andrii Nakryiko wrote:
>>> On Tue, May 26, 2020 at 9:34 AM KP Singh <[email protected]> wrote:
>>>> From: KP Singh <[email protected]>
>>>>
>>>> inode_local_storage:
>>>>
>>>> * Hook to the file_open and inode_unlink LSM hooks.
>>>> * Create and unlink a temporary file.
>>>> * Store some information in the inode's bpf_local_storage during
>>>> file_open.
>>>> * Verify that this information exists when the file is unlinked.
>>>>
>>>> sk_local_storage:
>>>>
>>>> * Hook to the socket_post_create and socket_bind LSM hooks.
>>>> * Open and bind a socket and set the sk_storage in the
>>>> socket_post_create hook using the start_server helper.
>>>> * Verify if the information is set in the socket_bind hook.
>>>>
>>>> Signed-off-by: KP Singh <[email protected]>
>>>> ---
>>>> .../bpf/prog_tests/test_local_storage.c | 60 ++++++++
>>>> .../selftests/bpf/progs/local_storage.c | 139 ++++++++++++++++++
>>>> 2 files changed, 199 insertions(+)
>>>> create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
>>>> create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c
>>>>
>>> [...]
>>>
>>>> +struct dummy_storage {
>>>> + __u32 value;
>>>> +};
>>>> +
>>>> +struct {
>>>> + __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
>>>> + __uint(map_flags, BPF_F_NO_PREALLOC);
>>>> + __type(key, int);
>>>> + __type(value, struct dummy_storage);
>>>> +} inode_storage_map SEC(".maps");
>>>> +
>>>> +struct {
>>>> + __uint(type, BPF_MAP_TYPE_SK_STORAGE);
>>>> + __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
>>>> + __type(key, int);
>>>> + __type(value, struct dummy_storage);
>>>> +} sk_storage_map SEC(".maps");
>>>> +
>>>> +/* Using vmlinux.h causes the generated BTF to be so big that the object
>>>> + * load fails at btf__load.
>>>> + */
>>> That's first time I hear about such issue. Do you have an error log
>>> from verifier?
>> Here's what I get when I do the following change.
>>
>> --- a/tools/testing/selftests/bpf/progs/local_storage.c
>> +++ b/tools/testing/selftests/bpf/progs/local_storage.c
>> @@ -4,8 +4,8 @@
>> * Copyright 2020 Google LLC.
>> */
>>
>> +#include "vmlinux.h"
>> #include <errno.h>
>> -#include <linux/bpf.h>
>> #include <stdbool.h>
>> #include <bpf/bpf_helpers.h>
>> #include <bpf/bpf_tracing.h>
>> @@ -37,24 +37,6 @@ struct {
>> __type(value, struct dummy_storage);
>> } sk_storage_map SEC(".maps");
>>
>> -/* Using vmlinux.h causes the generated BTF to be so big that the object
>> - * load fails at btf__load.
>> - */
>> -struct sock {} __attribute__((preserve_access_index));
>> -struct sockaddr {} __attribute__((preserve_access_index));
>> -struct socket {
>> - struct sock *sk;
>> -} __attribute__((preserve_access_index));
>> -
>> -struct inode {} __attribute__((preserve_access_index));
>> -struct dentry {
>> - struct inode *d_inode;
>> -} __attribute__((preserve_access_index));
>> -struct file {
>> - struct inode *f_inode;
>> -} __attribute__((preserve_access_index));
>>
>> ./test_progs -t test_local_storage
>> libbpf: Error loading BTF: Invalid argument(22)
>> libbpf: magic: 0xeb9f
>> version: 1
>> flags: 0x0
>> hdr_len: 24
>> type_off: 0
>> type_len: 4488
>> str_off: 4488
>> str_len: 3012
>> btf_total_size: 7524
>>
>> [1] STRUCT (anon) size=32 vlen=4
>> type type_id=2 bits_offset=0
>> map_flags type_id=6 bits_offset=64
>> key type_id=8 bits_offset=128
>> value type_id=9 bits_offset=192
>> [2] PTR (anon) type_id=4
>> [3] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>> [4] ARRAY (anon) type_id=3 index_type_id=5 nr_elems=28
>> [5] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)
>> [6] PTR (anon) type_id=7
>> [7] ARRAY (anon) type_id=3 index_type_id=5 nr_elems=1
>> [8] PTR (anon) type_id=3
>> [9] PTR (anon) type_id=10
>> [10] STRUCT dummy_storage size=4 vlen=1
>> value type_id=11 bits_offset=0
>> [11] TYPEDEF __u32 type_id=12
>>
>> [... More BTF Dump ...]
>>
>> [91] TYPEDEF wait_queue_head_t type_id=175
>>
>> [... More BTF Dump ...]
>>
>> [173] FWD super_block struct
>> [174] FWD vfsmount struct
>> [175] FWD wait_queue_head struct
>> [106] STRUCT socket_wq size=128 vlen=4
>> wait type_id=91 bits_offset=0 Invalid member
>>
>> libbpf: Error loading .BTF into kernel: -22.
>> libbpf: map 'inode_storage_map': failed to create: Invalid argument(-22)
>> libbpf: failed to load object 'local_storage'
>> libbpf: failed to load BPF skeleton 'local_storage': -22
>> test_test_local_storage:FAIL:skel_load lsm skeleton failed
>> #81 test_local_storage:FAIL
>>
>> The failiure is in:
>>
>> [106] STRUCT socket_wq size=128 vlen=4
>> wait type_id=91 bits_offset=0 Invalid member
>>
>>> Clang is smart enough to trim down used types to only those that are
>>> actually necessary, so too big BTF shouldn't be a thing. But let's try
>>> to dig into this and fix whatever issue it is, before giving up :)
>>>
>> I was wrong about the size being an issue. The verifier thinks the BTF
>> is invalid and more specificially it thinks that the socket_wq's
>> member with type_id=91, i.e. typedef wait_queue_head_t is invalid. Am
>> I missing some toolchain patches?
>>
> It is invalid BTF in the sense that we have a struct, embedding a
> struct, which is only defined as a forward declaration. There is not
> enough information and such situation would have caused compilation
> error, because it's impossible to determine the size of the outer
> struct.
>
> Yonghong, it seems like Clang is pruning types too aggressively here?
> We should keep types that are embedded, even if they are not used
> directly by user code. Could you please take a look?
Yes, this is a llvm bug. The proposed patch is here.
https://reviews.llvm.org/D82041
Will merge into llvm 11 trunk after the review. Not sure
whether we can get it into llvm 10.0.1 or not as its release
date is also very close.
>
>
>
>> - KP
>>
>>
>>>> +struct sock {} __attribute__((preserve_access_index));
>>>> +struct sockaddr {} __attribute__((preserve_access_index));
>>>> +struct socket {
>>>> + struct sock *sk;
>>>> +} __attribute__((preserve_access_index));
>>>> +
>>>> +struct inode {} __attribute__((preserve_access_index));
>>>> +struct dentry {
>>>> + struct inode *d_inode;
>>>> +} __attribute__((preserve_access_index));
>>>> +struct file {
>>>> + struct inode *f_inode;
>>>> +} __attribute__((preserve_access_index));
>>>> +
>>>> +
>>> [...]
Thanks for sending a fix. Should I keep the patch as it is with a TODO
to move to vmlinux.h when LLVM is updated?
On Wed, Jun 17, 2020 at 9:19 PM Yonghong Song <[email protected]> wrote:
>
>
> On 6/16/20 12:25 PM, Andrii Nakryiko wrote:
> > On Tue, Jun 16, 2020 at 8:54 AM KP Singh <[email protected]> wrote:
> >> On 01-Jun 13:29, Andrii Nakryiko wrote:
> >>> On Tue, May 26, 2020 at 9:34 AM KP Singh <[email protected]> wrote:
> >>>> From: KP Singh <[email protected]>
> >>>>
> >>>> inode_local_storage:
> >>>>
> >>>> * Hook to the file_open and inode_unlink LSM hooks.
> >>>> * Create and unlink a temporary file.
> >>>> * Store some information in the inode's bpf_local_storage during
> >>>> file_open.
> >>>> * Verify that this information exists when the file is unlinked.
> >>>>
> >>>> sk_local_storage:
> >>>>
> >>>> * Hook to the socket_post_create and socket_bind LSM hooks.
> >>>> * Open and bind a socket and set the sk_storage in the
> >>>> socket_post_create hook using the start_server helper.
> >>>> * Verify if the information is set in the socket_bind hook.
> >>>>
> >>>> Signed-off-by: KP Singh <[email protected]>
> >>>> ---
> >>>> .../bpf/prog_tests/test_local_storage.c | 60 ++++++++
> >>>> .../selftests/bpf/progs/local_storage.c | 139 ++++++++++++++++++
> >>>> 2 files changed, 199 insertions(+)
> >>>> create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
> >>>> create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c
> >>>>
> >>> [...]
> >>>
> >>>> +struct dummy_storage {
> >>>> + __u32 value;
> >>>> +};
> >>>> +
> >>>> +struct {
> >>>> + __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
> >>>> + __uint(map_flags, BPF_F_NO_PREALLOC);
> >>>> + __type(key, int);
> >>>> + __type(value, struct dummy_storage);
> >>>> +} inode_storage_map SEC(".maps");
> >>>> +
> >>>> +struct {
> >>>> + __uint(type, BPF_MAP_TYPE_SK_STORAGE);
> >>>> + __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
> >>>> + __type(key, int);
> >>>> + __type(value, struct dummy_storage);
> >>>> +} sk_storage_map SEC(".maps");
> >>>> +
> >>>> +/* Using vmlinux.h causes the generated BTF to be so big that the object
> >>>> + * load fails at btf__load.
> >>>> + */
> >>> That's first time I hear about such issue. Do you have an error log
> >>> from verifier?
> >> Here's what I get when I do the following change.
> >>
> >> --- a/tools/testing/selftests/bpf/progs/local_storage.c
> >> +++ b/tools/testing/selftests/bpf/progs/local_storage.c
> >> @@ -4,8 +4,8 @@
> >> * Copyright 2020 Google LLC.
> >> */
> >>
> >> +#include "vmlinux.h"
> >> #include <errno.h>
> >> -#include <linux/bpf.h>
> >> #include <stdbool.h>
> >> #include <bpf/bpf_helpers.h>
> >> #include <bpf/bpf_tracing.h>
> >> @@ -37,24 +37,6 @@ struct {
> >> __type(value, struct dummy_storage);
> >> } sk_storage_map SEC(".maps");
> >>
> >> -/* Using vmlinux.h causes the generated BTF to be so big that the object
> >> - * load fails at btf__load.
> >> - */
> >> -struct sock {} __attribute__((preserve_access_index));
> >> -struct sockaddr {} __attribute__((preserve_access_index));
> >> -struct socket {
> >> - struct sock *sk;
> >> -} __attribute__((preserve_access_index));
> >> -
> >> -struct inode {} __attribute__((preserve_access_index));
> >> -struct dentry {
> >> - struct inode *d_inode;
> >> -} __attribute__((preserve_access_index));
> >> -struct file {
> >> - struct inode *f_inode;
> >> -} __attribute__((preserve_access_index));
> >>
> >> ./test_progs -t test_local_storage
> >> libbpf: Error loading BTF: Invalid argument(22)
> >> libbpf: magic: 0xeb9f
> >> version: 1
> >> flags: 0x0
> >> hdr_len: 24
> >> type_off: 0
> >> type_len: 4488
> >> str_off: 4488
> >> str_len: 3012
> >> btf_total_size: 7524
> >>
> >> [1] STRUCT (anon) size=32 vlen=4
> >> type type_id=2 bits_offset=0
> >> map_flags type_id=6 bits_offset=64
> >> key type_id=8 bits_offset=128
> >> value type_id=9 bits_offset=192
> >> [2] PTR (anon) type_id=4
> >> [3] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> >> [4] ARRAY (anon) type_id=3 index_type_id=5 nr_elems=28
> >> [5] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)
> >> [6] PTR (anon) type_id=7
> >> [7] ARRAY (anon) type_id=3 index_type_id=5 nr_elems=1
> >> [8] PTR (anon) type_id=3
> >> [9] PTR (anon) type_id=10
> >> [10] STRUCT dummy_storage size=4 vlen=1
> >> value type_id=11 bits_offset=0
> >> [11] TYPEDEF __u32 type_id=12
> >>
> >> [... More BTF Dump ...]
> >>
> >> [91] TYPEDEF wait_queue_head_t type_id=175
> >>
> >> [... More BTF Dump ...]
> >>
> >> [173] FWD super_block struct
> >> [174] FWD vfsmount struct
> >> [175] FWD wait_queue_head struct
> >> [106] STRUCT socket_wq size=128 vlen=4
> >> wait type_id=91 bits_offset=0 Invalid member
> >>
> >> libbpf: Error loading .BTF into kernel: -22.
> >> libbpf: map 'inode_storage_map': failed to create: Invalid argument(-22)
> >> libbpf: failed to load object 'local_storage'
> >> libbpf: failed to load BPF skeleton 'local_storage': -22
> >> test_test_local_storage:FAIL:skel_load lsm skeleton failed
> >> #81 test_local_storage:FAIL
> >>
> >> The failiure is in:
> >>
> >> [106] STRUCT socket_wq size=128 vlen=4
> >> wait type_id=91 bits_offset=0 Invalid member
> >>
> >>> Clang is smart enough to trim down used types to only those that are
> >>> actually necessary, so too big BTF shouldn't be a thing. But let's try
> >>> to dig into this and fix whatever issue it is, before giving up :)
> >>>
> >> I was wrong about the size being an issue. The verifier thinks the BTF
> >> is invalid and more specificially it thinks that the socket_wq's
> >> member with type_id=91, i.e. typedef wait_queue_head_t is invalid. Am
> >> I missing some toolchain patches?
> >>
> > It is invalid BTF in the sense that we have a struct, embedding a
> > struct, which is only defined as a forward declaration. There is not
> > enough information and such situation would have caused compilation
> > error, because it's impossible to determine the size of the outer
> > struct.
> >
> > Yonghong, it seems like Clang is pruning types too aggressively here?
> > We should keep types that are embedded, even if they are not used
> > directly by user code. Could you please take a look?
>
> Yes, this is a llvm bug. The proposed patch is here.
>
> https://reviews.llvm.org/D82041
>
> Will merge into llvm 11 trunk after the review. Not sure
>
> whether we can get it into llvm 10.0.1 or not as its release
>
> date is also very close.
>
>
> >
> >
> >
> >> - KP
> >>
> >>
> >>>> +struct sock {} __attribute__((preserve_access_index));
> >>>> +struct sockaddr {} __attribute__((preserve_access_index));
> >>>> +struct socket {
> >>>> + struct sock *sk;
> >>>> +} __attribute__((preserve_access_index));
> >>>> +
> >>>> +struct inode {} __attribute__((preserve_access_index));
> >>>> +struct dentry {
> >>>> + struct inode *d_inode;
> >>>> +} __attribute__((preserve_access_index));
> >>>> +struct file {
> >>>> + struct inode *f_inode;
> >>>> +} __attribute__((preserve_access_index));
> >>>> +
> >>>> +
> >>> [...]