From: Daniel Borkmann
Date: Wed, 22 Jul 2015 17:12:32 +0200
To: "Michael Kerrisk (man-pages)"
CC: Alexei Starovoitov, Silvan Jegen, linux-man@vger.kernel.org, linux-kernel@vger.kernel.org, Walter Harms
Subject: Re: Edited draft of bpf(2) man page for review/enhancement

On 07/22/2015 04:49 PM, Michael Kerrisk (man-pages) wrote:
> Hi Daniel,
>
> Sorry for the long delay in following up....

No worries, eBPF is quite some material. ;)

> On 05/27/2015 11:26 AM, Daniel Borkmann wrote:
>> On 05/27/2015 10:43 AM, Michael Kerrisk (man-pages) wrote:
>>> Hello Alexei,
>>>
>>> I took the draft 3 of the bpf(2) man page that you sent back in March
>>> and did some substantial editing to clarify the language and add a
>>> few technical details. Could you please check the revised version
>>> below, to ensure I did not inject any errors.
>>>
>>> I also added a number of FIXMEs for pieces of the page that need
>>> further work. Could you take a look at these and let me know your
>>> thoughts, please.
>>
>> That's great, thanks! Minor comments:
>>
>> ...
>>> .TH BPF 2 2015-03-10 "Linux" "Linux Programmer's Manual"
>>> .SH NAME
>>> bpf - perform a command on an extended BPF map or program
>>> .SH SYNOPSIS
>>> .nf
>>> .B #include <linux/bpf.h>
>>> .sp
>>> .BI "int bpf(int cmd, union bpf_attr *attr, unsigned int size);
>>>
>>> .SH DESCRIPTION
>>> The
>>> .BR bpf ()
>>> system call performs a range of operations related to extended
>>> Berkeley Packet Filters.
>>> Extended BPF (or eBPF) is similar to
>>> the original BPF (or classic BPF) used to filter network packets.
>>> For both BPF and eBPF programs,
>>> the kernel statically analyzes the programs before loading them,
>>> in order to ensure that they cannot harm the running system.
>>> .P
>>> eBPF extends classic BPF in multiple ways including the ability to call
>>> in-kernel helper functions (via the
>>> .B BPF_CALL
>>> opcode extension provided by eBPF)
>>> and access shared data structures such as BPF maps.
>>
>> I would perhaps emphasize that maps can be shared among in-kernel
>> eBPF programs, but also between kernel and user space.
>
> This is covered later in the page, under the "BPF maps" subheading.
> Maybe you missed that? (Or did you think it doesn't suffice?)

Okay, I presume you mean:

  Maps are a generic data structure for storage of different types
  and sharing data between the kernel and user-space programs.

Maybe, to emphasize both options a bit (not sure if it's better in
my words, though):

  Maps are a generic data structure for storage of different types
  and allow for sharing data among eBPF kernel programs, but also
  between kernel and user-space applications.

>>> The programs can be written in a restricted C that is compiled into
>>> .\" FIXME In the next line, what is "a restricted C"? Where does
>>> .\" one get further information about it?
>>
>> So far only from the kernel samples directory and for tc classifier
>> and action, from the tc man page and/or examples/bpf/ in the tc git
>> tree.
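
As an aside on the synopsis and the map sharing quoted above, here is a
minimal sketch of how a user-space application might reach bpf() and
work with a map. It assumes kernel headers that provide <linux/bpf.h>
and __NR_bpf, and it goes through syscall(2) directly on the assumption
that the C library offers no wrapper. The names sys_bpf(),
map_create_hash(), map_update(), map_lookup() and struct flow_key are
made up for illustration and are not part of any API. The calls can
fail (for example with EPERM when the caller lacks privileges), so the
return values need checking.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical key type: the map only sees sizeof(struct flow_key) bytes. */
struct flow_key {
    __u32 src;
    __u32 dst;
};

/* Thin wrapper around the raw system call; returns -1 and sets errno on error. */
int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return (int)syscall(__NR_bpf, cmd, attr, size);
}

/* Create a hash map; on success the return value is a file descriptor. */
int map_create_hash(unsigned int key_size, unsigned int value_size,
                    unsigned int max_entries)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));   /* unused fields must be zero */
    attr.map_type = BPF_MAP_TYPE_HASH;
    attr.key_size = key_size;
    attr.value_size = value_size;
    attr.max_entries = max_entries;
    return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}

/* Insert or overwrite the entry for key; pointers travel as 64-bit integers. */
int map_update(int fd, const void *key, const void *value)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = fd;
    attr.key = (__u64)(unsigned long)key;
    attr.value = (__u64)(unsigned long)value;
    attr.flags = BPF_ANY;
    return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

/* Copy the value stored under key into the caller's buffer. */
int map_lookup(int fd, const void *key, void *value)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = fd;
    attr.key = (__u64)(unsigned long)key;
    attr.value = (__u64)(unsigned long)value;
    return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
```

A caller would zero a struct flow_key with memset() before filling it
in, since the key is compared as raw bytes, then pass sizeof() of the
key and value types to map_create_hash() and use the returned fd for
map_update() and map_lookup(). An eBPF program in the kernel that uses
the same map sees the same entries.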
>
> So, given that we are several weeks down the track, and things may have
> changed, I'll re-ask the questions ;-) :
>
> * Is this restricted C documented anywhere?

Not (yet) that I'm aware of. Our plan for the short to mid term is to
polish the kernel documentation, that is,
Documentation/networking/filter.txt, and get it into better shape; I
presume that would also include documentation of the restricted C.

So far, examples are provided in the tc-bpf man page (see link below).

The set of helper functions callable from eBPF is listed in
enum bpf_func_id in:

  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/bpf.h

> * Is the procedure for compiling this restricted C documented anywhere?
>   (Yes, it's LLVM, but are the suitable pipelines/options documented
>   somewhere?)
>
>>> eBPF bytecode and executed on the in-kernel virtual machine or
>>> just-in-time compiled into native code.
>>> .SS Extended BPF Design/Architecture
>>> .P
>>> .\" FIXME In the following line, what does "different data types" mean?
>>> .\" Are the values in a map not just blobs?
>>
>> Sort of, currently, these blobs can have different sizes of keys
>> and values (you can even have structs as keys). For the map itself
>> they are treated as blob internally. However, recently, bpf tail call
>> got added where you can lookup another program from an array map and
>> call into it. Here, that particular type of map can only have entries
>> of type of eBPF program fd. I think, if needed, adding a paragraph to
>> the tail call could be done as follow-up after we have an initial man
>> page in the tree included.
>
> Okay -- I've added a FIXME placeholder for this, so we can revisit.

Okay.

>>> BPF maps are a generic data structure for storage of different data types.
>>> A user process can create multiple maps (with key/value-pairs being
>>> opaque bytes of data) and access them via file descriptors.
>>> BPF programs can access maps from inside the kernel in parallel.
>>> It's up to the user process and BPF program to decide what they store
>>> inside maps.
>>> .P
>>> BPF programs are similar to kernel modules.
>>> They are loaded by the user
>>> process and automatically unloaded when the process exits.
>>
>> Generally that's true. Btw, in 4.1 kernel, tc(8) also got support for
>> eBPF classifier and actions, and here it's slightly different: in tc,
>> we load the programs, maps etc, and push down the eBPF program fd in
>> order to let the kernel hold reference on the program itself.
>>
>> Thus, there, the program fd that the application owns is gone when the
>> application terminates, but the eBPF program itself still lives on
>> inside the kernel. But perhaps it's already too much detail to mention
>> here ...
>
> Well, it should be documented somewhere....

Yep, fwiw some time ago I've hacked together a man page for tc:

  https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=cbdd1e6921d21815e35d2a96526cfbad5ac98e09

>>> Each BPF program is a set of instructions that is safe to run until
>>> its completion.
>>> The in-kernel BPF verifier statically determines that the program
>>> terminates and is safe to execute.
>>> .\" FIXME In the following sentence, what does "takes hold" mean?
>>
>> Takes a reference. Meaning, that maps cannot disappear under us while
>> the eBPF program that is using them in the kernel is still alive.
>
> So, I changed this to:
>
> [[
> During verification, the kernel increments reference counts for each of
> the maps that the eBPF program uses,
> so that the selected maps cannot be removed until the program is unloaded.
> ]]
>
> Okay?

Okay.

[...]

> I'll send out a new draft soon, but in the meantime hopefully you
> or Alexei might have a chance to answer some open questions (see my
> other mail to Alexei, which will be sent soon), so I can further edit
> the page before sending it out.

Later on, we should also add a paragraph on eBPF tail calls, but one
step at a time.

Thanks again,
Daniel
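
P.S. On the "takes a reference" wording discussed above, a toy model
may help readers of the page. The sketch below is plain user-space C,
not kernel code, and every name in it (toy_map, toy_prog, toy_live_maps
and the toy_* functions) is hypothetical. It only illustrates the
counting rule: the creating file descriptor holds one reference, a
loaded program holds another, and the map goes away when the last
holder drops it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: counts holders of a map, nothing more. */
struct toy_map {
    int refcnt;              /* open fds plus loaded programs using the map */
};

struct toy_prog {
    struct toy_map *map;     /* the single map this toy program uses */
};

int toy_live_maps = 0;       /* number of maps that have not been freed */

/* Models map creation: the returned "fd" is the first reference. */
struct toy_map *toy_map_create(void)
{
    struct toy_map *m = malloc(sizeof(*m));

    if (!m)
        return NULL;
    m->refcnt = 1;
    toy_live_maps++;
    return m;
}

/* Take one more reference on the map. */
void toy_map_get(struct toy_map *m)
{
    m->refcnt++;
}

/* Drop one reference; the map is freed when the count reaches zero. */
void toy_map_put(struct toy_map *m)
{
    assert(m->refcnt > 0);
    if (--m->refcnt == 0) {
        toy_live_maps--;
        free(m);
    }
}

/* Models program load: "verification" takes a reference on the map. */
struct toy_prog *toy_prog_load(struct toy_map *m)
{
    struct toy_prog *p = malloc(sizeof(*p));

    if (!p)
        return NULL;
    toy_map_get(m);
    p->map = m;
    return p;
}

/* Models program unload: the program's reference on the map is dropped. */
void toy_prog_unload(struct toy_prog *p)
{
    toy_map_put(p->map);
    free(p);
}
```

In this model, calling toy_map_put() right after toy_prog_load() stands
for the application closing its map fd: the map stays alive because the
program still holds a reference, and it is freed only in
toy_prog_unload(). The tc case mentioned above follows the same idea
one level up, with the kernel holding the reference on the program
itself.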