Subject: Re: [PATCH bpf v2] bpf: preallocate a perf_sample_data per event fd
To: Alexei Starovoitov
Cc: Matt Mullins, "netdev@vger.kernel.org", Andrew Hall, "bpf@vger.kernel.org", "ast@kernel.org", "linux-kernel@vger.kernel.org", Martin Lau, Yonghong Song, "rostedt@goodmis.org", "mingo@redhat.com", Song Liu
References: <20190531223735.4998-1-mmullins@fb.com> <6c6a4d47-796a-20e2-eb12-503a00d1fa0b@iogearbox.net> <68841715-4d5b-6ad1-5241-4e7199dd63da@iogearbox.net> <05626702394f7b95273ab19fef30461677779333.camel@fb.com> <70b9a1b2-c960-b810-96f9-1fb5f4a4061b@iogearbox.net>
From: Daniel Borkmann
Message-ID: <71c96268-7779-6e34-3078-5532d9f8fa55@iogearbox.net>
Date: Tue, 4 Jun 2019 02:43:58 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/04/2019 01:54 AM, Alexei Starovoitov wrote:
> On Mon, Jun 3, 2019 at 4:48 PM Daniel Borkmann wrote:
>> On 06/04/2019 01:27 AM, Alexei Starovoitov wrote:
>>> On Mon, Jun 3, 2019 at 3:59 PM Matt Mullins wrote:
>>>>
>>>> If these are invariably non-nested, I can easily keep bpf_misc_sd when
>>>> I resubmit. There was no technical reason other than keeping the two
>>>> codepaths as similar as possible.
>>>>
>>>> What resource gives you worry about doing this for the networking
>>>> codepath?
>>>
>>> my preference would be to keep tracing and networking the same.
>>> there is already minimal nesting in networking and probably we see
>>> more when reuseport progs will start running from xdp and clsbpf
>>>
>>>>> Aside from that it's also really bad to miss events like this as exporting
>>>>> through rb is critical. Why can't you have a per-CPU counter that selects a
>>>>> sample data context based on nesting level in tracing? (I don't see a discussion
>>>>> of this in your commit message.)
>>>>
>>>> This change would only drop messages if the same perf_event is
>>>> attempted to be used recursively (i.e. the same CPU on the same
>>>> PERF_EVENT_ARRAY map, as I haven't observed anything use index !=
>>>> BPF_F_CURRENT_CPU in testing).
>>>>
>>>> I'll try to accomplish the same with a percpu nesting level and
>>>> allocating 2 or 3 perf_sample_data per cpu. I think that'll solve the
>>>> same problem -- a local patch keeping track of the nesting level is how
>>>> I got the above stack trace, too.
>>>
>>> I don't think counter approach works. The amount of nesting is unknown.
>>> imo the approach taken in this patch is good.
>>> I don't see any issue when event_outputs will be dropped for valid progs.
>>> Only when user called the helper incorrectly without BPF_F_CURRENT_CPU.
>>> But that's an error anyway.
>>
>> My main worry with this xchg() trick is that we'll miss to export crucial
>> data with the EBUSY bailing out especially given nesting could increase in
>> future as you state, so users might have a hard time debugging this kind of
>> issue if they share the same perf event map among these programs, and no
>> option to get to this data otherwise. Supporting nesting up to a certain
>> level would still be better than a lost event which is also not reported
>> through the usual way aka perf rb.
>
> I simply don't see this 'miss to export data' in all but contrived conditions.
> Say two progs share the same perf event array.
> One prog calls event_output and while rb logic is working
> another prog needs to start executing and use the same event array

Correct.

> slot. Today it's only possible for tracing prog combined with networking,
> but having two progs use the same event output array is pretty much
> a user bug. Just like not passing BPF_F_CURRENT_CPU.

I don't see the user bug part. Why should that be a user bug? It's the
same as saying that sharing a BPF hash map between networking programs
attached to different hooks, or between networking and tracing, would be
a user bug, which it is not. One concrete example is cilium monitor,
where we currently expose skb trace and drop events as well as debug data
through the same rb. This should be usable from any program type that has
the perf_event_output helper enabled (e.g. XDP and tc/BPF) without having
to walk yet another per-CPU mmap rb from user space.
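
To make the two options in this thread concrete, here is a minimal
userspace sketch, not the patch itself. Approach A claims a single
preallocated buffer per event with an atomic exchange and bails out with
-EBUSY when it is already in use; approach B keeps a per-CPU nesting
level that selects one of a few buffers. All names (struct sample_data,
event_claim(), cpu_claim(), MAX_NEST) are hypothetical, and the sketch
assumes the caller of approach B cannot migrate between CPUs while
nested, which in the kernel would need preemption disabled and per-CPU
variables rather than a plain struct.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct perf_sample_data. */
struct sample_data {
	int payload;
};

/*
 * Approach A: one preallocated buffer per event, claimed with an atomic
 * exchange. free_sd is NULL while the buffer is in use.
 */
struct event {
	struct sample_data sd;
	_Atomic(struct sample_data *) free_sd;
};

static void event_init(struct event *ev)
{
	atomic_init(&ev->free_sd, &ev->sd);
}

/* Returns 0 and sets *out, or -EBUSY if a nested user holds the buffer. */
static int event_claim(struct event *ev, struct sample_data **out)
{
	struct sample_data *sd =
		atomic_exchange(&ev->free_sd, (struct sample_data *)NULL);

	if (!sd)
		return -EBUSY;	/* the nested event is dropped here */
	*out = sd;
	return 0;
}

static void event_release(struct event *ev, struct sample_data *sd)
{
	atomic_store(&ev->free_sd, sd);
}

/*
 * Approach B: a per-CPU nesting level picks one of MAX_NEST buffers.
 * MAX_NEST = 3 is an assumption taken from the "2 or 3" in the thread.
 */
#define MAX_NEST 3

struct cpu_ctx {
	struct sample_data sds[MAX_NEST];
	int nest_level;	/* plain int: assumed to be touched by one CPU only */
};

/* Returns 0 and sets *out, or -EBUSY once nesting exceeds MAX_NEST. */
static int cpu_claim(struct cpu_ctx *c, struct sample_data **out)
{
	if (c->nest_level >= MAX_NEST)
		return -EBUSY;
	*out = &c->sds[c->nest_level++];
	return 0;
}

/* Must be called in reverse order of cpu_claim(), like unwinding a stack. */
static void cpu_release(struct cpu_ctx *c)
{
	assert(c->nest_level > 0);
	c->nest_level--;
}
```

With approach A, the second claim on the same event fails until the
first user releases it. With approach B, up to MAX_NEST nested users each
get a distinct buffer, and only deeper nesting is rejected.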