From: Francis Laniel
To: bpf@vger.kernel.org
Cc: Francis Laniel, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
    Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
    Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet, Mykola Lysenko,
    Shuah Khan, Joanne Koong, Dave Marchevsky, Lorenzo Bianconi,
    Maxim Mikityanskiy, Geliang Tang, "Naveen N. Rao",
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org
Subject: [RFC PATCH v2 1/5] bpf: Make ring buffer overwritable.
Date: Tue, 6 Sep 2022 21:56:42 +0200
Message-Id: <20220906195656.33021-2-flaniel@linux.microsoft.com>
In-Reply-To: <20220906195656.33021-1-flaniel@linux.microsoft.com>
References: <20220906195656.33021-1-flaniel@linux.microsoft.com>

By default, BPF ring buffers are size bounded: once producers have filled
the buffer, they must wait for the consumer to read the data before they
can add new records. In terms of API, bpf_ringbuf_reserve() returns NULL
if the buffer is full.

This patch makes it possible to create an overwritable BPF ring buffer.
Once producers have written as much data as the buffer can hold, they
begin to overwrite existing data, so the oldest records are replaced
first. As a result, bpf_ringbuf_reserve() never returns NULL for lack of
space.

To avoid extra memory consumption, this patch writes data backward, like
the overwritable perf ring buffer added in commit 9ecda41acb97
("perf/core: Add ::write_backward attribute to perf event").
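
For illustration only, the two reservation modes described above can be
modeled in user space. This is a minimal sketch under stated assumptions,
not the kernel code: struct rb_model, rb_model_init() and
rb_model_reserve() are hypothetical names, locking, record headers and the
data area itself are left out, and the record length is assumed to be
already rounded and no larger than the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the position bookkeeping only. */
struct rb_model {
	uint64_t mask;		/* data_sz - 1, data_sz being a power of two */
	uint64_t consumer_pos;
	uint64_t producer_pos;
	bool overwritable;
};

static void rb_model_init(struct rb_model *rb, uint64_t data_sz,
			  bool overwritable)
{
	/* The masking trick below only works for power-of-two sizes. */
	assert(data_sz != 0 && (data_sz & (data_sz - 1)) == 0);
	rb->mask = data_sz - 1;
	rb->consumer_pos = 0;
	rb->producer_pos = 0;
	rb->overwritable = overwritable;
}

/*
 * Returns true and stores the record offset in *off on success.
 * Returns false only when a bounded buffer has no room left.
 */
static bool rb_model_reserve(struct rb_model *rb, uint64_t len, uint64_t *off)
{
	uint64_t prod_pos = rb->producer_pos;
	uint64_t new_prod_pos;

	if (!rb->overwritable) {
		/* Bounded: move forward, refuse to run past the consumer. */
		new_prod_pos = prod_pos + len;
		if (new_prod_pos - rb->consumer_pos > rb->mask)
			return false;
	} else {
		/*
		 * Overwritable: move backward. The unsigned subtraction
		 * wraps around, and the mask folds it back into the buffer.
		 */
		prod_pos -= len;
		new_prod_pos = prod_pos;
	}

	*off = prod_pos & rb->mask;
	rb->producer_pos = new_prod_pos;
	return true;
}
```

With a 64-byte buffer, the bounded model accepts reservations of 32 and 24
bytes and then rejects a 16-byte one, whereas the overwritable model
accepts all three and places the third at offset 56, on top of the tail of
the oldest record.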
Signed-off-by: Francis Laniel
---
 include/uapi/linux/bpf.h |  3 +++
 kernel/bpf/ringbuf.c     | 43 ++++++++++++++++++++++++++++++----------
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 59a217ca2dfd..c87a667649ab 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1227,6 +1227,9 @@ enum {
 
 /* Create a map that is suitable to be an inner map with dynamic max entries */
 	BPF_F_INNER_MAP		= (1U << 12),
+
+/* Create an overwritable BPF_RINGBUF */
+	BPF_F_RB_OVERWRITABLE	= (1U << 13),
 };
 
 /* Flags for BPF_PROG_QUERY. */
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index ded4faeca192..369c61cfe8aa 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -12,7 +12,7 @@
 #include
 #include
 
-#define RINGBUF_CREATE_FLAG_MASK (BPF_F_NUMA_NODE)
+#define RINGBUF_CREATE_FLAG_MASK (BPF_F_NUMA_NODE | BPF_F_RB_OVERWRITABLE)
 
 /* non-mmap()'able part of bpf_ringbuf (everything up to consumer page) */
 #define RINGBUF_PGOFF \
@@ -37,6 +37,8 @@ struct bpf_ringbuf {
 	u64 mask;
 	struct page **pages;
 	int nr_pages;
+	__u8 overwritable: 1,
+	     __reserved: 7;
 	spinlock_t spinlock ____cacheline_aligned_in_smp;
 	/* Consumer and producer counters are put into separate pages to allow
 	 * mapping consumer page as r/w, but restrict producer page to r/o.
@@ -127,7 +129,12 @@ static void bpf_ringbuf_notify(struct irq_work *work)
 	wake_up_all(&rb->waitq);
 }
 
-static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
+static inline bool is_overwritable(struct bpf_ringbuf *rb)
+{
+	return !!rb->overwritable;
+}
+
+static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node, __u32 flags)
 {
 	struct bpf_ringbuf *rb;
 
@@ -142,6 +149,7 @@ static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
 	rb->mask = data_sz - 1;
 	rb->consumer_pos = 0;
 	rb->producer_pos = 0;
+	rb->overwritable = !!(flags & BPF_F_RB_OVERWRITABLE);
 
 	return rb;
 }
@@ -170,7 +178,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 
 	bpf_map_init_from_attr(&rb_map->map, attr);
 
-	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
+	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node, attr->map_flags);
 	if (!rb_map->rb) {
 		kfree(rb_map);
 		return ERR_PTR(-ENOMEM);
@@ -248,6 +256,7 @@ static unsigned long ringbuf_avail_data_sz(struct bpf_ringbuf *rb)
 
 	cons_pos = smp_load_acquire(&rb->consumer_pos);
 	prod_pos = smp_load_acquire(&rb->producer_pos);
+
 	return prod_pos - cons_pos;
 }
 
@@ -325,14 +334,24 @@ static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
 	}
 
 	prod_pos = rb->producer_pos;
-	new_prod_pos = prod_pos + len;
 
-	/* check for out of ringbuf space by ensuring producer position
-	 * doesn't advance more than (ringbuf_size - 1) ahead
-	 */
-	if (new_prod_pos - cons_pos > rb->mask) {
-		spin_unlock_irqrestore(&rb->spinlock, flags);
-		return NULL;
+	if (!is_overwritable(rb)) {
+		new_prod_pos = prod_pos + len;
+
+		/* check for out of ringbuf space by ensuring producer position
+		 * doesn't advance more than (ringbuf_size - 1) ahead
+		 */
+		if (new_prod_pos - cons_pos > rb->mask) {
+			spin_unlock_irqrestore(&rb->spinlock, flags);
+			return NULL;
+		}
+	} else {
+		/*
+		 * With overwritable ring buffer we go from the end toward the
+		 * beginning.
+		 */
+		prod_pos -= len;
+		new_prod_pos = prod_pos;
 	}
 
 	hdr = (void *)rb->data + (prod_pos & rb->mask);
@@ -457,10 +476,14 @@ BPF_CALL_2(bpf_ringbuf_query, struct bpf_map *, map, u64, flags)
 
 	switch (flags) {
 	case BPF_RB_AVAIL_DATA:
+		if (is_overwritable(rb))
+			return -EINVAL;
 		return ringbuf_avail_data_sz(rb);
 	case BPF_RB_RING_SIZE:
 		return rb->mask + 1;
 	case BPF_RB_CONS_POS:
+		if (is_overwritable(rb))
+			return -EINVAL;
 		return smp_load_acquire(&rb->consumer_pos);
 	case BPF_RB_PROD_POS:
 		return smp_load_acquire(&rb->producer_pos);
-- 
2.25.1
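
As a reading aid for the bpf_ringbuf_query() hunk, the behavior can be
sketched in user space as below. This is only a model under stated
assumptions: struct rb_query_model, enum model_query and rb_model_query()
are hypothetical names, memory barriers are omitted, and the fallback of
returning 0 for an unknown query is an assumption, since that part of the
function is not visible in this diff. Like the helper, the model returns
an unsigned 64-bit value, so -EINVAL shows up as a very large number.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BPF_RB_* query selectors. */
enum model_query {
	MODEL_RB_AVAIL_DATA,
	MODEL_RB_RING_SIZE,
	MODEL_RB_CONS_POS,
	MODEL_RB_PROD_POS,
};

/* Hypothetical model of the fields the query path reads. */
struct rb_query_model {
	uint64_t mask;
	uint64_t consumer_pos;
	uint64_t producer_pos;
	bool overwritable;
};

static uint64_t rb_model_query(const struct rb_query_model *rb,
			       enum model_query what)
{
	switch (what) {
	case MODEL_RB_AVAIL_DATA:
		/* prod - cons is not meaningful when writing backward. */
		if (rb->overwritable)
			return -EINVAL;
		return rb->producer_pos - rb->consumer_pos;
	case MODEL_RB_RING_SIZE:
		return rb->mask + 1;
	case MODEL_RB_CONS_POS:
		/* The consumer position is not exposed in overwritable mode. */
		if (rb->overwritable)
			return -EINVAL;
		return rb->consumer_pos;
	case MODEL_RB_PROD_POS:
		return rb->producer_pos;
	}
	return 0;	/* assumed fallback for unknown queries */
}
```

In this model, the ring size and the producer position stay available in
both modes, while the two queries that depend on the consumer position
fail with -EINVAL once the buffer is overwritable.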