Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp433118pxa; Wed, 19 Aug 2020 05:43:12 -0700 (PDT) X-Google-Smtp-Source: ABdhPJySJujys67f/++m48v85OhQh3mg67EZfG/+ucYIqeMMg1CfVt3PnCHb1jBUtNCk4PNhHBv9 X-Received: by 2002:a17:906:2e0c:: with SMTP id n12mr24352679eji.35.1597840992152; Wed, 19 Aug 2020 05:43:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1597840992; cv=none; d=google.com; s=arc-20160816; b=wtePQ9ZZ7yRRngzJm7DzVZmNK5gi50hvIithPwpEVJKFIFl8wACBYPj7UblPym/YT1 egxFOzTPTfMDuQuzcCdxbbLQmhBNLubW9kKXzR+V2iXvmUIQf4Ue4V+NBwUno+KriA7i FmNvLHHHJl/UisElJplTgflvWxiHzyrRH1Bjh2rRokAnRi1ZaK+kqBCG76O22MNrhg7w ClrhmWBDzS72rCFM8Jk0j605kgSdVjAPc+WXQ5T8HomJpAd/t7gqEK1gdkyOpYbTTNo8 Dwk5w3eo8BC+B89A+V+R5HbTmUU19swtNFEICWa19gbkHRVdqRjuYdJb7ECrldU5XHG+ pqkw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=/y3jRYjISN6SJ5xQ83KSW3ddpwc4GdyAtPJH5CRfRJ0=; b=08a/0Mg7nYeRixRGfq7nqEFIB7l5V6+RzS2hwBBv/tci7vfW/NjmmLXi2dWMkQF5RJ i33d4dZ7w+iCEg7JfhZg7dDGqX7Dx9E8mQAlBoV3DAvoLEiRvwDne7kMWdc/2+1sG7qL ucef/W0j16m19DtOg5X2i/bc0+OFFfPCzJgE6ChRmSCPETDn9hNMFBhw7u36/BwEDvRP pyANacMTx55nC4x0eYYp/uEJkTYv1RBDOVZBEonKbATG4OJ9uVU+SE287AemEMvYtl2J dNtZJNzRmGiKZ39l52iL4qYMF6HSHRg3Gl7pAESbmb9VxuCtjXZRFqwODxxah/RgBhDV 8SPA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=MMwcAkNP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y24si15753662eje.87.2020.08.19.05.42.46; Wed, 19 Aug 2020 05:43:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=MMwcAkNP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726961AbgHSMmF (ORCPT + 99 others); Wed, 19 Aug 2020 08:42:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33436 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728189AbgHSMlz (ORCPT ); Wed, 19 Aug 2020 08:41:55 -0400 Received: from mail-ej1-x641.google.com (mail-ej1-x641.google.com [IPv6:2a00:1450:4864:20::641]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 59078C061342 for ; Wed, 19 Aug 2020 05:41:54 -0700 (PDT) Received: by mail-ej1-x641.google.com with SMTP id o18so26094871eje.7 for ; Wed, 19 Aug 2020 05:41:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=/y3jRYjISN6SJ5xQ83KSW3ddpwc4GdyAtPJH5CRfRJ0=; b=MMwcAkNPoxSXmPkz0pR7sTwo66Xg5xE8uIJAgcgY7tIE1K+xayzXU/yLxzn1EmRPBz cqK+MOxm4aZbhj43T3YSsPW3XM1y4yq7glyAHROQ4b1KQRxmH1gLlBRq+rhvyuZrYS+x bdxpA7wELpjNG4SRngDWN43obystfj96snvms= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=/y3jRYjISN6SJ5xQ83KSW3ddpwc4GdyAtPJH5CRfRJ0=; b=kgo7fML64d9Qa1I178wQr32aLdEwOkt5fj/iwFr+vVOhaiyVQj1Av5rdi3tD03pdYK JOynY1ZYkWKr8Er5d9Y87V/Z72hRIVBGbQxAkrd87L+AnSxd1w9gHIoV/xGOM/qze9tN Tp2VOtCqnimrwtWWrwcAA8HkUNTnXBXqcHzBo94VBWROWvioFwm8ONpNg2LAFH9FTQSw ySZI9qLMRn6Fsa1mlHssC3pIgYcNLpQqLMp79cwYeIUz1+m5DHqudzSOhoJAyel8bVU5 HnOxtwNekQR3WHu4bcm3lfzTLxig78Iw11Q1jiRRwLelpKApueQyMUPT4AFs+7hG0iTX O8sQ== X-Gm-Message-State: AOAM5301dYtSUiH7tdMDzPw7eE9LqDWFIsuClHshr3eVuCH9RsAAuMIb EO6bv//qPOMw01eb0+4cM9EGrw== X-Received: by 2002:a17:906:86c9:: with SMTP id j9mr24478953ejy.5.1597840912462; Wed, 19 Aug 2020 05:41:52 -0700 (PDT) Received: from [192.168.2.66] ([81.6.44.51]) by smtp.gmail.com with ESMTPSA id t3sm17820566edq.26.2020.08.19.05.41.51 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 19 Aug 2020 05:41:51 -0700 (PDT) Subject: Re: [PATCH bpf-next v8 3/7] bpf: Generalize bpf_sk_storage To: Martin KaFai Lau Cc: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-security-module@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann , Paul Turner , Jann Horn , Florent Revest References: <20200803164655.1924498-1-kpsingh@chromium.org> <20200803164655.1924498-4-kpsingh@chromium.org> <20200818010545.iix72le4tkhuyqe5@kafai-mbp.dhcp.thefacebook.com> From: KP Singh Message-ID: <6cb51fa0-61a5-2cf6-b44d-84d58d08c775@chromium.org> Date: Wed, 19 Aug 2020 14:41:50 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <20200818010545.iix72le4tkhuyqe5@kafai-mbp.dhcp.thefacebook.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 8/18/20 3:05 AM, Martin KaFai Lau wrote: > On Mon, Aug 03, 2020 at 06:46:51PM +0200, KP Singh wrote: >> From: KP Singh >> >> Refactor the functionality in bpf_sk_storage.c so that concept of >> storage linked to kernel objects can be extended to other objects like >> inode, task_struct etc. >> >> Each new local storage will still be a separate map and provide its own >> set of helpers. This allows for future object specific extensions and >> still share a lot of the underlying implementation. >> >> This includes the changes suggested by Martin in: >> >> https://lore.kernel.org/bpf/20200725013047.4006241-1-kafai@fb.com/ >> >> which adds map_local_storage_charge, map_local_storage_uncharge, >> and map_owner_storage_ptr. > A description will still be useful in the commit message to talk > about the new map_ops, e.g. > they allow kernel object to optionally have different mem-charge strategy. > >> >> Co-developed-by: Martin KaFai Lau >> Signed-off-by: KP Singh >> --- >> include/linux/bpf.h | 9 ++ >> include/net/bpf_sk_storage.h | 51 +++++++ >> include/uapi/linux/bpf.h | 8 +- >> net/core/bpf_sk_storage.c | 246 +++++++++++++++++++++------------ >> tools/include/uapi/linux/bpf.h | 8 +- >> 5 files changed, 233 insertions(+), 89 deletions(-) >> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >> index cef4ef0d2b4e..8e1e23c60dc7 100644 >> --- a/include/linux/bpf.h >> +++ b/include/linux/bpf.h >> @@ -34,6 +34,9 @@ struct btf_type; >> struct exception_table_entry; >> struct seq_operations; >> struct bpf_iter_aux_info; >> +struct bpf_local_storage; >> +struct bpf_local_storage_map; >> +struct bpf_local_storage_elem; > "struct bpf_local_storage_elem" is not needed. True, I moved it to bpf_sk_storage.h because it's needed there. > >> >> extern struct idr btf_idr; >> extern spinlock_t btf_idr_lock; >> @@ -104,6 +107,12 @@ struct bpf_map_ops { >> __poll_t (*map_poll)(struct bpf_map *map, struct file *filp, >> struct poll_table_struct *pts); >> >> + /* Functions called by bpf_local_storage maps */ >> + int (*map_local_storage_charge)(struct bpf_local_storage_map *smap, >> + void *owner, u32 size); >> + void (*map_local_storage_uncharge)(struct bpf_local_storage_map *smap, >> + void *owner, u32 size); >> + struct bpf_local_storage __rcu ** (*map_owner_storage_ptr)(void *owner); [...] >> + struct bpf_local_storage_map *smap, >> + struct bpf_local_storage_elem *first_selem); >> + >> +struct bpf_local_storage_data * >> +bpf_local_storage_update(void *owner, struct bpf_map *map, void *value, > Nit. It may be more consistent to take "struct bpf_local_storage_map *smap" > instead of "struct bpf_map *map" here. > > bpf_local_storage_map_check_btf() will be the only one taking > "struct bpf_map *map". That's because it is used in map operations as map_check_btf which expects a bpf_map *map pointer. We can wrap it in another function but is that worth doing? > >> + u64 map_flags); >> + >> #ifdef CONFIG_BPF_SYSCALL >> int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk); >> struct bpf_sk_storage_diag * >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index b134e679e9db..35629752cec8 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h >> @@ -3647,9 +3647,13 @@ enum { >> BPF_F_SYSCTL_BASE_NAME = (1ULL << 0), >> }; >> >> -/* BPF_FUNC_sk_storage_get flags */ >> +/* BPF_FUNC__storage_get flags */ > BPF_FUNC__storage_get flags? > Done. >> enum { >> - BPF_SK_STORAGE_GET_F_CREATE = (1ULL << 0), >> + BPF_LOCAL_STORAGE_GET_F_CREATE = (1ULL << 0), >> + /* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility >> + * and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead. >> + */ >> + BPF_SK_STORAGE_GET_F_CREATE = BPF_LOCAL_STORAGE_GET_F_CREATE, >> }; >> >> /* BPF_FUNC_read_branch_records flags. */ >> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c >> index 99dde7b74767..bb2375769ca1 100644 >> --- a/net/core/bpf_sk_storage.c >> +++ b/net/core/bpf_sk_storage.c >> @@ -84,7 +84,7 @@ struct bpf_local_storage_elem { >> struct bpf_local_storage { >> struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE]; >> struct hlist_head list; /* List of bpf_local_storage_elem */ >> - struct sock *owner; /* The object that owns the the above "list" of [...] >> } >> >> -/* sk_storage->lock must be held and selem->sk_storage == sk_storage. >> +/* local_storage->lock must be held and selem->sk_storage == sk_storage. > This name change belongs to patch 1. > > Also, > selem->"local_"storage == "local_"storage Done. > >> * The caller must ensure selem->smap is still valid to be >> * dereferenced for its smap->elem_size and smap->cache_idx. >> */ > > [ ... ] > >> @@ -367,7 +401,7 @@ static int sk_storage_alloc(struct sock *sk, >> /* Note that even first_selem was linked to smap's >> * bucket->list, first_selem can be freed immediately >> * (instead of kfree_rcu) because >> - * bpf_sk_storage_map_free() does a >> + * bpf_local_storage_map_free() does a [...] >> kfree(selem); >> - atomic_sub(smap->elem_size, &sk->sk_omem_alloc); >> + mem_uncharge(smap, owner, smap->elem_size); >> return ERR_PTR(err); >> } >> >> @@ -430,8 +464,8 @@ bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value, >> * such that it can avoid taking the local_storage->lock >> * and changing the lists. >> */ >> - old_sdata = >> - bpf_local_storage_lookup(local_storage, smap, false); >> + old_sdata = bpf_local_storage_lookup(local_storage, smap, >> + false); > Pure indentation change. The same line has been changed in patch 1. Please change > the identation in patch 1 if the above way is preferred. I removed this change. > >> err = check_flags(old_sdata, map_flags); >> if (err) >> return ERR_PTR(err); >> @@ -475,7 +509,7 @@ bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value, >> * old_sdata will not be uncharged later during >> * bpf_selem_unlink_storage(). >> */ >> - selem = bpf_selem_alloc(smap, sk, value, !old_sdata); >> + selem = bpf_selem_alloc(smap, owner, value, !old_sdata); >> if (!selem) { >> err = -ENOMEM; >> goto unlock_err; >> @@ -567,7 +601,7 @@ void bpf_sk_storage_free(struct sock *sk) >> * Thus, no elem can be added-to or deleted-from the >> * sk_storage->list by the bpf_prog or by the bpf-map's syscall. >> * >> - * It is racing with bpf_sk_storage_map_free() alone >> + * It is racing with bpf_local_storage_map_free() alone > This name change belongs to patch 1 also. Done. > >> * when unlinking elem from the sk_storage->list and >> * the map's bucket->list. >> */ >> @@ -587,17 +621,12 @@ void bpf_sk_storage_free(struct sock *sk) >> kfree_rcu(sk_storage, rcu); >> } >> >> -static void bpf_local_storage_map_free(struct bpf_map *map) [..] >> >> /* bpf prog and the userspace can no longer access this map >> * now. No new selem (of this map) can be added >> - * to the sk->sk_bpf_storage or to the map bucket's list. >> + * to the bpf_local_storage or to the map bucket's list. > s/bpf_local_storage/owner->storage/ Done. > >> * >> * The elem of this map can be cleaned up here >> * or >> - * by bpf_sk_storage_free() during __sk_destruct(). >> + * by bpf_local_storage_free() during the destruction of the >> + * owner object. eg. __sk_destruct. > This belongs to patch 1 also. In patch, 1, changed it to: * The elem of this map can be cleaned up here * or when the storage is freed e.g. * by bpf_sk_storage_free() during __sk_destruct(). > >> */ >> for (i = 0; i < (1U << smap->bucket_log); i++) { >> b = &smap->buckets[i]; >> @@ -627,22 +657,31 @@ static void bpf_local_storage_map_free(struct bpf_map *map) >> rcu_read_unlock(); >> } >> >> - /* bpf_sk_storage_free() may still need to access the map. >> - * e.g. bpf_sk_storage_free() has unlinked selem from the map >> + /* bpf_local_storage_free() may still need to access the map. > It is confusing. There is no bpf_local_storage_free(). /* While freeing the storage we may still need to access the map. * * e.g. when bpf_sk_storage_free() has unlinked selem from the map * which then made the above while((selem = ...)) loop * exited immediately. > >> + * e.g. bpf_local_storage_free() has unlinked selem from the map >> * which then made the above while((selem = ...)) loop >> * exited immediately. >> * >> - * However, the bpf_sk_storage_free() still needs to access >> + * However, the bpf_local_storage_free() still needs to access > Same here. With the change above, this can stay bpf_sk_storage_free > >> * the smap->elem_size to do the uncharging in >> * bpf_selem_unlink_storage(). >> * >> * Hence, wait another rcu grace period for the >> - * bpf_sk_storage_free() to finish. >> + * bpf_local_storage_free() to finish. > and here. and this too can stay bpf_sk_storage_free > >> */ >> synchronize_rcu(); >> >> kvfree(smap->buckets); >> - kfree(map); >> + kfree(smap); >> +} >> + >> +static void sk_storage_map_free(struct bpf_map *map) >> +{ [...] _attr *attr) >> raw_spin_lock_init(&smap->buckets[i].lock); >> } >> >> - smap->elem_size = sizeof(struct bpf_local_storage_elem) + attr->value_size; >> - smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache); >> + smap->elem_size = >> + sizeof(struct bpf_local_storage_elem) + attr->value_size; > Same line has changed in patch 1. Change the indentation in patch 1 also > if the above way is desired. Done. > Others LGTM. >