From: Toke Høiland-Jørgensen
To: Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: David S. Miller, Boqun Feng, Daniel Borkmann, Eric Dumazet,
    Frederic Weisbecker, Ingo Molnar, Jakub Kicinski, Paolo Abeni,
    Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon,
    Sebastian Andrzej Siewior, Alexei Starovoitov, Andrii Nakryiko,
    Eduard Zingerman, Hao Luo, Jesper Dangaard Brouer, Jiri Olsa,
    John Fastabend, KP Singh, Martin KaFai Lau, Song Liu,
    Stanislav Fomichev, Yonghong Song, bpf@vger.kernel.org
Subject: Re: [PATCH net-next 14/15] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
In-Reply-To: <20240503182957.1042122-15-bigeasy@linutronix.de>
References: <20240503182957.1042122-1-bigeasy@linutronix.de> <20240503182957.1042122-15-bigeasy@linutronix.de>
Date: Mon, 06 May 2024 21:41:38 +0200
Message-ID: <87y18mohhp.fsf@toke.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

Sebastian Andrzej Siewior writes:

> The XDP redirect process is two staged:
> - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the
>   packet and makes decisions. While doing that, the per-CPU variable
>   bpf_redirect_info is used.
>
> - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
>   and it may also access other per-CPU variables like xskmap_flush_list.
>
> At the very end of the NAPI callback, xdp_do_flush() is invoked which
> does not access bpf_redirect_info but will touch the individual per-CPU
> lists.
>
> The per-CPU variables are only used in the NAPI callback hence disabling
> bottom halves is the only protection mechanism. Users from preemptible
> context (like cpu_map_kthread_run()) explicitly disable bottom halves
> for protections reasons.
> Without locking in local_bh_disable() on PREEMPT_RT this data structure
> requires explicit locking.
>
> PREEMPT_RT has forced-threaded interrupts enabled and every
> NAPI-callback runs in a thread. If each thread has its own data
> structure then locking can be avoided.
>
> Create a struct bpf_net_context which contains struct bpf_redirect_info.
> Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
> it. Use the __free() annotation to automatically reset the pointer once
> function returns.
> The bpf_net_ctx_set() may nest. For instance a function can be used from
> within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and
> NET_TX_SOFTIRQ which does not. Therefore only the first invocations
> updates the pointer.
> Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
> bpf_redirect_info.
>
> On PREEMPT_RT the pointer to bpf_net_context is saved task's
> task_struct. On non-PREEMPT_RT builds the pointer saved in a per-CPU
> variable (which is always NODE-local memory). Using always the
> bpf_net_context approach has the advantage that there is almost zero
> differences between PREEMPT_RT and non-PREEMPT_RT builds.

Did you ever manage to get any performance data to see if this has an
impact?

[...]

> +static inline struct bpf_net_context *bpf_net_ctx_get(void)
> +{
> +	struct bpf_net_context *bpf_net_ctx = this_cpu_read(bpf_net_context);
> +
> +	WARN_ON_ONCE(!bpf_net_ctx);

If we have this WARN...
> +static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void)
> +{
> +	struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get();
> +
> +	if (!bpf_net_ctx)
> +		return NULL;

.. do we really need all the NULL checks? (not just here, but in the code
below as well).

I'm a little concerned that we are introducing a bunch of new branches
in the XDP hot path. Which is also why I'm asking for benchmarks :)

[...]

> +	/* ri->map is assigned in __bpf_xdp_redirect_map() from within a eBPF
> +	 * program/ during NAPI callback. It is used during
> +	 * xdp_do_generic_redirect_map()/ __xdp_do_redirect_frame() from the
> +	 * redirect callback afterwards. ri->map is cleared after usage.
> +	 * The path has no explicit RCU read section but the local_bh_disable()
> +	 * is also a RCU read section which makes the complete softirq callback
> +	 * RCU protected. This in turn makes ri->map RCU protocted and it is

s/protocted/protected/

> +	 * sufficient to wait a grace period to ensure that no "ri->map == map"
> +	 * exist. dev_map_free() removes the map from the list and then

s/exist/exists/

-Toke
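
The nesting rule from the quoted commit message (only the first
bpf_net_ctx_set() updates the pointer, and only that caller resets it)
can be sketched in userspace C as below. This is a rough illustration
under stated assumptions, not the code from the patch: ctx_set(),
ctx_clear(), ctx_get_ri() and cur_ctx are hypothetical names, the structs
are reduced stand-ins, and a thread-local variable is assumed in place of
the per-CPU or task_struct pointer.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct bpf_redirect_info { int tgt_index; };
struct bpf_net_context { struct bpf_redirect_info ri; };

/* Assumption: thread-local storage models the per-CPU/task_struct pointer. */
static _Thread_local struct bpf_net_context *cur_ctx;

/*
 * Install ctx only if no context is active yet. Returns ctx when this
 * call installed it, or NULL when an outer caller already did (nested).
 */
static struct bpf_net_context *ctx_set(struct bpf_net_context *ctx)
{
	if (cur_ctx)
		return NULL;
	cur_ctx = ctx;
	return ctx;
}

/* Reset the pointer only if this call site was the one that installed it. */
static void ctx_clear(struct bpf_net_context *installed)
{
	if (installed)
		cur_ctx = NULL;
}

/* Return the active redirect info, or NULL if no context is installed. */
static struct bpf_redirect_info *ctx_get_ri(void)
{
	return cur_ctx ? &cur_ctx->ri : NULL;
}
```

In this sketch a nested ctx_set() gets NULL back, so its matching
ctx_clear() is a no-op and the outer context stays in place until the
outermost caller clears it. The NULL branch in ctx_get_ri() is the kind
of extra hot-path check the review asks about; whether it is measurable
is exactly what benchmarks would have to show.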