Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp4932230imm; Tue, 31 Jul 2018 02:37:45 -0700 (PDT) X-Google-Smtp-Source: AAOMgpe0YJEyP74IveA7KgrRU5jbz8j4VH1uZR3Yd0a4Ta/i97A5lBNPJInyg/5n+qwnF9nbfEQP X-Received: by 2002:a63:4450:: with SMTP id t16-v6mr19710638pgk.102.1533029865102; Tue, 31 Jul 2018 02:37:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1533029865; cv=none; d=google.com; s=arc-20160816; b=Kf+5G8PkcjS25AHTxWd8Uv1TN5g2vUjumCh+3vZtrw5VtJwKxyJ3QpNGVlLv0lQOTY MIXi5qcS2axCbNETVxGpKnMM6RPy0j+KXmErloSf5tw6oVOHJ5S/1ay3gT2RK/MvA+Rw RgxAljM8BQlgXkq8/bPgbh1SRF8hqFgd3HThhDH0YuAtOESdH08u8kF3hw+Y0xlhVuzL Zfh6Zw2ydiTwwbh6g0ygE8LHwe0E+3/UpY2QfhEqRxbkLOQ8sXtQX7bGtCZPsa0OAuat MYEQPCflCipUdOR1VuANzg/3WHj+EQtxh2P5MRkLTpwrfiFs0SM8bDcfaC4F8WeSH3NP TUbw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=85TeqeXnlQijhRhmSXHYFUyoSms6CWO9ewd8YQdrlzk=; b=AEuULsuh32cjmf5wESfoDmIeftZhKmR1Fts64RetgAF1awPiWTKc41UR9R04lar55W TuJvy4OOvvF+O8eywrperj4WV/t4DdwVg4nxMO4TIiOeWsqImSnoxd384DNr9q/oi4hH jirUrB6oVIrHczITu7lJvotxCcSjn4lmRIusPepLWn82KFB09qPf0m9gXQ/Ok9vEcKui jO+H5gy1KQQMacWRU3Ifu2pQnRrnzIBrpi/N1KLjC6HxTUPKyUnIbxT4qdEwtiAsTPDt 4URuB2JkrIBo9+QKkqtXSgRE5fJ1eSa2JteGzodix3wdRGTkuf1NoYKXry7HaSfv/IK7 5vXg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=S4q93Y6s; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s65-v6si12633774pgb.486.2018.07.31.02.37.30; Tue, 31 Jul 2018 02:37:45 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=S4q93Y6s; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731953AbeGaLQI (ORCPT + 99 others); Tue, 31 Jul 2018 07:16:08 -0400 Received: from mail-io0-f193.google.com ([209.85.223.193]:34899 "EHLO mail-io0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730513AbeGaLQI (ORCPT ); Tue, 31 Jul 2018 07:16:08 -0400 Received: by mail-io0-f193.google.com with SMTP id w11-v6so12428353iob.2 for ; Tue, 31 Jul 2018 02:36:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=85TeqeXnlQijhRhmSXHYFUyoSms6CWO9ewd8YQdrlzk=; b=S4q93Y6s7UFrG8UJGPAoolFC8DWugF6wH1JyNoUi9eeit9cQmDLgtks8O9PAlMQg2p 6/f+qhQwWsPswsdrSs8o2aoGXNiqsvnq25cfLl3S8xqUbiLquzhosMomPbAuEekwexNi AKg83W2XGMPJxoe9p4SMEF+z/gkHMaMIVddSeVbuIsVTtuK6TmtToCqcHBI0fBNi3t+g Bl7nsUFks1d6argU0VTBhQITd5Y/kMP777YJjenCP/aCja+M9uwUkQm/7X2myLowvJ4Q xq6XzmuyoTYKT1kIPAQd4ixBpkJCvdf5UrGPJykJeFpDiPYZcQzkZgbdGMNu/DwTFtcN WZww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=85TeqeXnlQijhRhmSXHYFUyoSms6CWO9ewd8YQdrlzk=; b=KxQbCIBo4GJAE8F0IvN6+tkWKhNq57H3dVnYfDUpFy0X6zDT1o/u8b8v+/mv8I9nkx A/nmnpfCC12EcVOpJcOeAh3V3lX3cMqGJMRxo/e/Y/DMa0BiO5RGNtZZ4VKMGY2QPScO brLXM6f/njiRq9UstiSSIoMl3NmZrhLnutkezmwNDqwXK0YAb+4ejFxFt9xaZMd5AEat biffoeztRzSppqUJkrCKrFHNtrLHfaQSTKT51f3XcDZMmzXRni36sMVAic+ME46PpPgM W91/jzDRH21ZbrZBpcDA8T/FU5FNVSZXG9AiKV7GHhMBlqGqCtyR9KrtJM2e3QCeUAco wJ5A== X-Gm-Message-State: AOUpUlEyTMg/OJWQQiK0j42rSrcHoxTqJXHu78HJZrhwkqqZHOWQKMa6 vnPtCSfrSf5pWM4Llu7sUqlr2xiJmxaYWVxKN6JYqA== X-Received: by 2002:a6b:b010:: with SMTP id z16-v6mr16432926ioe.206.1533029799948; Tue, 31 Jul 2018 02:36:39 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a4f:ba01:0:0:0:0:0 with HTTP; Tue, 31 Jul 2018 02:36:39 -0700 (PDT) In-Reply-To: <67423232-be56-fd47-06e6-394812c2b918@iogearbox.net> References: <20180729205835.34850-1-dancol@google.com> <20180730172641.7d516231@cakuba.netronome.com> <67423232-be56-fd47-06e6-394812c2b918@iogearbox.net> From: Daniel Colascione Date: Tue, 31 Jul 2018 02:36:39 -0700 Message-ID: Subject: Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command To: Daniel Borkmann Cc: Jakub Kicinski , Joel Fernandes , linux-kernel , Tim Murray , netdev , Alexei Starovoitov , Lorenzo Colitti , Chenbo Feng , Mathieu Desnoyers , Alexei Starovoitov Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 31, 2018 at 1:34 AM, Daniel Borkmann wrote: > On 07/31/2018 02:33 AM, Daniel Colascione wrote: >> On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski >> wrote: >>> On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote: >>>> On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann wrote: >>>>> Hmm, I don't think such UAPI as above is future-proof. In case we would want >>>>> a similar mechanism in future for other maps, we would need a whole new bpf >>>>> command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though >>>>> the underlying map may not even be a map-to-map. Additionally, we don't have >>>>> any map object at hand in the above, so we couldn't make any finer grained >>>>> decisions either. Something like below would be more suitable and leaves room >>>>> for extending this further in future. >>>> >>>> YAGNI. Your proposed mechanism doesn't add anything under the current >>>> implementation. >>> >>> FWIW in case of HW offload targeting a particular map may allow users >>> to avoid a potentially slow sync with all the devices on the system. >> >> Sure. But such a thing doesn't exist right now (right?), and we can >> add that more-efficient-in-that-one-case BPF interface when it lands. >> I'd rather keep things simple for now. > > I don't see a reason why that is even more complicated. Both the API and the implementation are much more complicated in the per-map ops version: just look at the patch size. The size argument isn't necessarily a dealbreaker, but I still don't see what the extra code size and complexity is buying. > An API command name > such as BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is simply non-generic, and > exposes specific map details (here: map-in-map) into the UAPI whereas it > should reside within a specific implementation instead similar to other ops > we have for maps. But synchronize isn't conceptually a command that applies to a specific map. It waits on all references. Did you address my point about your proposed map-specific interface requiring redundant synchronize_rcu calls in the case where we swap multiple maps and want to wait for all the references to drain? Under my proposal, you'd just BPF_SYNCHRONIZE_WHATEVER and call schedule_rcu once. Under your proposal, we'd make it a per-map operation, so we'd issue one synchronize_rcu per map. > If in future other maps would be added that would have > similar mechanisms of inner objects they return to the BPF program, we'll > be adding yet another command just for this. And that's why my personal preference is to just calling this thing BPF_SYNCHRONIZE, which I'd define to wait for all such "inner objects". Alexei is the one who asked for the very specific naming, I believe. Anyway, we have a very simple patch that we could apply today. It addresses a real need, and it doesn't preclude adding something more specific later, when we know we need it. Besides, it's not as if adding a BPF command is particularly expensive. > Also, union bpf_attr is extensible, > e.g. additional members could be added in future whenever needed for this > subcommand instead of forcing it to NULL as done here. We fail with EINVAL when attr != NULL now, which means that we can safely accept a non-NULL attr-based subcommand later without breaking anyone. The interface is already extensible. > All I'm saying is to > keep it generic so it can be extended later. Sure, but no more extensible than it has to be. Prematurely-added extension points tend to cause trouble later.