Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare
To: Florent Revest, Andrii Nakryiko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, KP Singh, Brendan Jackman,
    Stanislav Fomichev, open list, syzbot
References: <20210505162307.2545061-1-revest@chromium.org>
From: Daniel Borkmann
Message-ID: <875174b0-c0f1-8a41-ef00-3f0fe0396288@iogearbox.net>
Date: Thu, 6 May 2021 23:38:38 +0200

On 5/6/21 10:17 PM, Florent Revest wrote:
> On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko
> wrote:
>> On Wed, May 5, 2021 at 3:29 PM Florent Revest wrote:
>>> On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
>>> wrote:
>>>> On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
>>>> wrote:
>>>>> On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann wrote:
>>>>>> On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
>>>>>>> On Wed, May 5, 2021 at 9:23 AM Florent Revest wrote:
>>>>>>>>
>>>>>>>> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
>>>>>>>> per-cpu buffer that they use to store temporary data (arguments to
>>>>>>>> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
>>>>>>>> by the end of their scope with bpf_bprintf_cleanup.
>>>>>>>>
>>>>>>>> If one of these helpers gets called within the scope of one of these
>>>>>>>> helpers, for example: a first bpf program gets called, uses
>>>>>>>
>>>>>>> Can we afford having few struct bpf_printf_bufs? They are just 512
>>>>>>> bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
>>>>>>> only situation where this can occur, right? If someone is doing
>>>>>>> bpf_snprintf() and interrupt occurs and we run another BPF program, it
>>>>>>> will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
>>>>>>> second BPF program, etc. We can't eliminate the probability, but
>>>>>>> having a small stack of buffers would make the probability so
>>>>>>> miniscule as to not worry about it at all.
>>>>>>>
>>>>>>> Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
>>>>>>> the changes are minimal. Nestedness property is preserved for
>>>>>>> non-sleepable BPF programs, right? If we want this to work for
>>>>>>> sleepable we'd need to either: 1) disable migration or 2) instead of
>>>>>
>>>>> oh wait, we already disable migration for sleepable BPF progs, so it
>>>>> should be good to do nestedness level only
>>>>
>>>> actually, migrate_disable() might not be enough. Unless it is
>>>> impossible for some reason I miss, worst case it could be that two
>>>> sleepable programs (A and B) can be intermixed on the same CPU: A
>>>> starts&sleeps - B starts&sleeps - A continues&returns - B continues
>>>> and nestedness doesn't work anymore. So something like "reserving a
>>>> slot" would work better.
>>>
>>> Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?
>>>
>>>>>>> assuming a stack of buffers, do a loop to find unused one. Should be
>>>>>>> acceptable performance-wise, as it's not the fastest code anyway
>>>>>>> (printf'ing in general).
>>>>>>>
>>>>>>> In any case, re-using the same buffer for sort-of-optional-to-work
>>>>>>> bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
>>>>>>> suboptimal, so seems worth fixing this.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>
>>>>>> Yes, agree, it would otherwise be really hard to debug. I had the same
>>>>>> thought on why not allowing nesting here given users very likely expect
>>>>>> these helpers to just work for all the contexts.
>>>>>>
>>>>>> Thanks,
>>>>>> Daniel
>>>
>>> What would you think of just letting the helpers own these 512 bytes
>>> buffers as local variables on their stacks ? Then bpf_prepare_bprintf
>>> would only need to write there, there would be no acquire semantic
>>> (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
>>> the helper return so there would be no bpf_printf_cleanup either. We
>>> would also not pre-reserve static memory for all CPUs and it becomes
>>> trivial to handle re-entrant helper calls.
>>>
>>> I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
>>> code but I've not been convinced of its necessity.
>>
>> I got the impression that extra 512 bytes on the kernel stack is quite
>> a lot and that's why we have per-cpu buffers. Especially that
>> bpf_trace_printk() can be called from any context, including NMI.
>
> Ok, I understand.
>
> What about having one buffer per helper, synchronized with a spinlock?
> Actually, bpf_trace_printk already has that, not for the bprintf
> arguments but for the bprintf output so this wouldn't change much to
> the performance of the helpers anyway:
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385
>
> These helpers are not performance sensitive so a per-cpu stack of
> buffers feels over-engineered to me (and is also complexity I feel a
> bit uncomfortable with).

But wouldn't this have the same potential of causing a deadlock? A
simple example would be if you have a tracing prog attached to
bstr_printf(), and one of the other helpers using the same lock called
from a non-tracing prog. If it can be avoided fairly easily, I'd also
opt for per-cpu buffers as Andrii mentioned earlier. We've had a few
prior examples with similar issues [0].

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9594dc3c7e71b9f52bee1d7852eb3d4e3aea9e99
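
For concreteness, a minimal sketch of the per-CPU "stack of buffers"
idea Andrii describes above could look like the below: a per-CPU nesting
counter plus a small array of 512-byte buffers, so a nested helper call
simply takes the next free slot and -EBUSY is only returned once all
slots on that CPU are in use. The names and limits here
(MAX_BPRINTF_NEST_LEVEL, bpf_bprintf_bufs, bpf_bprintf_nest_level,
3 slots) are illustrative assumptions rather than the actual patch, and
it assumes try_get_fmt_tmp_buf() keeps roughly its current role:

#include <linux/errno.h>
#include <linux/percpu.h>
#include <linux/preempt.h>

#define MAX_BPRINTF_BUF_LEN	512
/* Illustrative: 3 slots per CPU, in the "3-5 of them" range above. */
#define MAX_BPRINTF_NEST_LEVEL	3

struct bpf_bprintf_buffers {
	char tmp_bufs[MAX_BPRINTF_NEST_LEVEL][MAX_BPRINTF_BUF_LEN];
};

static DEFINE_PER_CPU(struct bpf_bprintf_buffers, bpf_bprintf_bufs);
static DEFINE_PER_CPU(int, bpf_bprintf_nest_level);

static int try_get_fmt_tmp_buf(char **tmp_buf)
{
	struct bpf_bprintf_buffers *bufs;
	int nest_level;

	/* Stay on this CPU so the nest level and buffers stay paired. */
	preempt_disable();
	nest_level = this_cpu_inc_return(bpf_bprintf_nest_level);
	if (nest_level > MAX_BPRINTF_NEST_LEVEL) {
		/* All per-CPU slots already claimed by nested callers. */
		this_cpu_dec(bpf_bprintf_nest_level);
		preempt_enable();
		return -EBUSY;
	}
	bufs = this_cpu_ptr(&bpf_bprintf_bufs);
	*tmp_buf = bufs->tmp_bufs[nest_level - 1];

	return 0;
}

static void bpf_bprintf_cleanup(void)
{
	if (this_cpu_read(bpf_bprintf_nest_level)) {
		this_cpu_dec(bpf_bprintf_nest_level);
		preempt_enable();
	}
}

bpf_bprintf_prepare() could then keep calling try_get_fmt_tmp_buf() /
bpf_bprintf_cleanup() as it does today and propagate the (now much
rarer) -EBUSY to its callers rather than warning, in line with the
subject of this patch.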