Received: by 2002:a05:6a10:17d3:0:0:0:0 with SMTP id hz19csp2815933pxb; Mon, 19 Apr 2021 14:58:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx6OlG/aSZaHY0xR2UW/G3UK0py506HNuZ3bPYupA1/7nL+iWprwmeNAdIID1UXqmXzFVqg X-Received: by 2002:a63:78cc:: with SMTP id t195mr13442162pgc.196.1618869494684; Mon, 19 Apr 2021 14:58:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1618869494; cv=none; d=google.com; s=arc-20160816; b=lHzKQLxiXMcpzKWipaLyDO70xyTRjPJVeFl4q/+z9JDJqdG7Qp5LDL/fjgOth78Z96 2ZaXyl2WtZw8CDjilUQc3FbqeMNZzMAuS/azYAeUJ/Wx9DXkACE8HekiXhlzkB0HmYrg b9I1Ez9xyGIW1YxrqETAbr2syDThh6yt/UMel4Dd9LHeWyAfX3pgQ7xGTCOzzZzEDIeH ozcqjkEZjS01KjmWne5ok5YWQ6WYzIRNGtwAxzCnHkuklySC9Kj85NRBxP4SQTmXCGaP 3kGF2wFTrTYUZYoGzlI1nmvRiKc6eNhp/beNrA34TgeLeYOcmgVZmS8JALhIXJJuDHMh IJpQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=i+DoXH2KISD9POpiabtw0qS13vF03/PCePH8rx3QvS4=; b=ZtnYSKKBMRiHkvPg3pwEA6yd+j7hEJGsVktDN3E+4CZJp+HanEu5nLecoVHOoF7Igm yJPgS7wf8YWi454HFiuElKyyKgzOnn2MUJo9qwiFN2MGI7r8xee3fCaiXIXqmHpSI5uK sXPp5HnsxGohIfSYetiGtIZfQNZWQo+k7v2zn00emqvS01udZzrLI4kvjL90yvwvzsk9 MIMef3a6gqN0PFcDrfArOUDNpojlNjo+4oGAHE6v0mAWQZBKLfFQ9gy27jH2muSwPC8t AQNPrHNqBuTVX50WfPE8UhrT/vrszvgR4b88Hwy44/ZorFvOR4QrralgWOEBS+r4CU7q 2nTg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id l9si834297pjg.35.2021.04.19.14.58.02; Mon, 19 Apr 2021 14:58:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232239AbhDSV55 (ORCPT + 99 others); Mon, 19 Apr 2021 17:57:57 -0400 Received: from www62.your-server.de ([213.133.104.62]:51578 "EHLO www62.your-server.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229558AbhDSV54 (ORCPT ); Mon, 19 Apr 2021 17:57:56 -0400 Received: from sslproxy02.your-server.de ([78.47.166.47]) by www62.your-server.de with esmtpsa (TLSv1.3:TLS_AES_256_GCM_SHA384:256) (Exim 4.92.3) (envelope-from ) id 1lYbtU-000DGG-BC; Mon, 19 Apr 2021 23:57:24 +0200 Received: from [85.7.101.30] (helo=linux.home) by sslproxy02.your-server.de with esmtpsa (TLSv1.3:TLS_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lYbtU-000LSA-1C; Mon, 19 Apr 2021 23:57:24 +0200 Subject: Re: [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API To: =?UTF-8?Q?Toke_H=c3=b8iland-J=c3=b8rgensen?= , Kumar Kartikeya Dwivedi , bpf@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , "David S. Miller" , Jakub Kicinski , Jesper Dangaard Brouer , Jesper Dangaard Brouer , linux-kernel@vger.kernel.org, netdev@vger.kernel.org References: <20210419121811.117400-1-memxor@gmail.com> <20210419121811.117400-4-memxor@gmail.com> <6e8b744c-e012-c76b-b55f-7ddc8b7483db@iogearbox.net> <874kg2gdcf.fsf@toke.dk> From: Daniel Borkmann Message-ID: Date: Mon, 19 Apr 2021 23:57:23 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2 MIME-Version: 1.0 In-Reply-To: <874kg2gdcf.fsf@toke.dk> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Authenticated-Sender: daniel@iogearbox.net X-Virus-Scanned: Clear (ClamAV 0.102.4/26145/Mon Apr 19 13:07:08 2021) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/19/21 11:43 PM, Toke Høiland-Jørgensen wrote: > Daniel Borkmann writes: >> On 4/19/21 2:18 PM, Kumar Kartikeya Dwivedi wrote: >>> This adds functions that wrap the netlink API used for adding, >>> manipulating, and removing traffic control filters. These functions >>> operate directly on the loaded prog's fd, and return a handle to the >>> filter using an out parameter named id. >>> >>> The basic featureset is covered to allow for attaching, manipulation of >>> properties, and removal of filters. Some additional features like >>> TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can >>> added on top later by extending the bpf_tc_cls_opts struct. >>> >>> Support for binding actions directly to a classifier by passing them in >>> during filter creation has also been omitted for now. These actions have >>> an auto clean up property because their lifetime is bound to the filter >>> they are attached to. This can be added later, but was omitted for now >>> as direct action mode is a better alternative to it, which is enabled by >>> default. >>> >>> An API summary: >>> >>> bpf_tc_act_{attach, change, replace} may be used to attach, change, and >> >> typo on bpf_tc_act_{...} ? >> ^^^ >>> replace SCHED_CLS bpf classifier. The protocol field can be set as 0, in >>> which case it is subsitituted as ETH_P_ALL by default. >> >> Do you have an actual user that needs anything other than ETH_P_ALL? Why is it >> even needed? Why not stick to just ETH_P_ALL? >> >>> The behavior of the three functions is as follows: >>> >>> attach = create filter if it does not exist, fail otherwise >>> change = change properties of the classifier of existing filter >>> replace = create filter, and replace any existing filter >> >> This touches on tc oddities quite a bit. Why do we need to expose them? Can't we >> simplify/abstract this e.g. i) create or update instance, ii) delete instance, >> iii) get instance ? What concrete use case do you have that you need those three >> above? >> >>> bpf_tc_cls_detach may be used to detach existing SCHED_CLS >>> filter. The bpf_tc_cls_attach_id object filled in during attach, >>> change, or replace must be passed in to the detach functions for them to >>> remove the filter and its attached classififer correctly. >>> >>> bpf_tc_cls_get_info is a helper that can be used to obtain attributes >>> for the filter and classififer. The opts structure may be used to >>> choose the granularity of search, such that info for a specific filter >>> corresponding to the same loaded bpf program can be obtained. By >>> default, the first match is returned to the user. >>> >>> Examples: >>> >>> struct bpf_tc_cls_attach_id id = {}; >>> struct bpf_object *obj; >>> struct bpf_program *p; >>> int fd, r; >>> >>> obj = bpf_object_open("foo.o"); >>> if (IS_ERR_OR_NULL(obj)) >>> return PTR_ERR(obj); >>> >>> p = bpf_object__find_program_by_title(obj, "classifier"); >>> if (IS_ERR_OR_NULL(p)) >>> return PTR_ERR(p); >>> >>> if (bpf_object__load(obj) < 0) >>> return -1; >>> >>> fd = bpf_program__fd(p); >>> >>> r = bpf_tc_cls_attach(fd, if_nametoindex("lo"), >>> BPF_TC_CLSACT_INGRESS, >>> NULL, &id); >>> if (r < 0) >>> return r; >>> >>> ... which is roughly equivalent to (after clsact qdisc setup): >>> # tc filter add dev lo ingress bpf obj foo.o sec classifier da >>> >>> ... as direct action mode is always enabled. >>> >>> If a user wishes to modify existing options on an attached classifier, >>> bpf_tc_cls_change API may be used. >>> >>> Only parameters class_id can be modified, the rest are filled in to >>> identify the correct filter. protocol can be left out if it was not >>> chosen explicitly (defaulting to ETH_P_ALL). >>> >>> Example: >>> >>> /* Optional parameters necessary to select the right filter */ >>> DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, >>> .handle = id.handle, >>> .priority = id.priority, >>> .chain_index = id.chain_index) >> >> Why do we need chain_index as part of the basic API? >> >>> opts.class_id = TC_H_MAKE(1UL << 16, 12); >>> r = bpf_tc_cls_change(fd, if_nametoindex("lo"), >>> BPF_TC_CLSACT_INGRESS, >>> &opts, &id); >> >> Also, I'm not sure whether the prefix should even be named bpf_tc_cls_*() tbh, >> yes, despite being "low level" api. I think in the context of bpf we should stop >> regarding this as 'classifier' and 'action' objects since it's really just a >> single entity and not separate ones. It's weird enough to explain this concept >> to new users and if a libbpf based api could cleanly abstract it, I would be all >> for it. I don't think we need to map 1:1 the old tc legacy even in the low level >> api, tbh, as it feels backwards. I think the 'handle' & 'priority' bits are okay, >> but I would remove the others. > > Hmm, I'm OK with dropping the TC oddities (including the cls_ in the > name), but I think we should be documenting it so that users that do > come from TC will not be completely lost :) Yeah, that sounds good to me. :) All I'm trying to say is that /we/ are used to the terminology and quirks that come with it, but I'm hoping that such API will be used by *new* folks who have zero context on the underlying details, and they also really shouldn't have to care. Even if on the lower level we expose handle/priority at least, the API should be designed with a mindset of no prior tc experience required. Simple and easy to use. I think the handle/priority concept can be explained/documented fairly straight forward. There is a chance to hide all the nasty historic/legacy details in something clean, easy to use and scaleable (implied by the da mode), we should use this opportunity. ;) Thanks, Daniel