From: Florent Revest
Date: Fri, 7 May 2021 12:39:13 +0200
Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare
To: Daniel Borkmann
Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Andrii Nakryiko, KP Singh,
 Brendan Jackman, Stanislav Fomichev, open list, syzbot
In-Reply-To: <875174b0-c0f1-8a41-ef00-3f0fe0396288@iogearbox.net>
References: <20210505162307.2545061-1-revest@chromium.org>
 <875174b0-c0f1-8a41-ef00-3f0fe0396288@iogearbox.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 6, 2021 at 11:38 PM Daniel Borkmann wrote:
>
> On 5/6/21 10:17 PM, Florent Revest wrote:
> > On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko wrote:
> >> On Wed, May 5, 2021 at 3:29 PM Florent Revest wrote:
> >>> On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko wrote:
> >>>> On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko wrote:
> >>>>> On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann wrote:
> >>>>>> On 5/5/21 8:55 PM, Andrii Nakryiko
> >>>>>> wrote:
> >>>>>>> On Wed, May 5, 2021 at 9:23 AM Florent Revest wrote:
> >>>>>>>>
> >>>>>>>> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> >>>>>>>> per-cpu buffer that they use to store temporary data (arguments to
> >>>>>>>> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> >>>>>>>> by the end of their scope with bpf_bprintf_cleanup.
> >>>>>>>>
> >>>>>>>> If one of these helpers gets called within the scope of one of these
> >>>>>>>> helpers, for example: a first bpf program gets called, uses
> >>>>>>>
> >>>>>>> Can we afford having few struct bpf_printf_bufs? They are just 512
> >>>>>>> bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> >>>>>>> only situation where this can occur, right? If someone is doing
> >>>>>>> bpf_snprintf() and interrupt occurs and we run another BPF program, it
> >>>>>>> will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> >>>>>>> second BPF program, etc. We can't eliminate the probability, but
> >>>>>>> having a small stack of buffers would make the probability so
> >>>>>>> miniscule as to not worry about it at all.
> >>>>>>>
> >>>>>>> Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> >>>>>>> the changes are minimal. Nestedness property is preserved for
> >>>>>>> non-sleepable BPF programs, right? If we want this to work for
> >>>>>>> sleepable we'd need to either: 1) disable migration or 2) instead of
> >>>>>
> >>>>> oh wait, we already disable migration for sleepable BPF progs, so it
> >>>>> should be good to do nestedness level only
> >>>>
> >>>> actually, migrate_disable() might not be enough. Unless it is
> >>>> impossible for some reason I miss, worst case it could be that two
> >>>> sleepable programs (A and B) can be intermixed on the same CPU: A
> >>>> starts&sleeps - B starts&sleeps - A continues&returns - B continues
> >>>> and nestedness doesn't work anymore. So something like "reserving a
> >>>> slot" would work better.
> >>>
> >>> Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?
> >>>
> >>>>>>> assuming a stack of buffers, do a loop to find unused one. Should be
> >>>>>>> acceptable performance-wise, as it's not the fastest code anyway
> >>>>>>> (printf'ing in general).
> >>>>>>>
> >>>>>>> In any case, re-using the same buffer for sort-of-optional-to-work
> >>>>>>> bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> >>>>>>> suboptimal, so seems worth fixing this.
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>
> >>>>>> Yes, agree, it would otherwise be really hard to debug. I had the same
> >>>>>> thought on why not allowing nesting here given users very likely expect
> >>>>>> these helpers to just work for all the contexts.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Daniel
> >>>
> >>> What would you think of just letting the helpers own these 512 bytes
> >>> buffers as local variables on their stacks ? Then bpf_prepare_bprintf
> >>> would only need to write there, there would be no acquire semantic
> >>> (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
> >>> the helper return so there would be no bpf_printf_cleanup either. We
> >>> would also not pre-reserve static memory for all CPUs and it becomes
> >>> trivial to handle re-entrant helper calls.
> >>>
> >>> I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
> >>> code but I've not been convinced of its necessity.
> >> I got the impression that extra 512 bytes on the kernel stack is quite
> >> a lot and that's why we have per-cpu buffers. Especially that
> >> bpf_trace_printk() can be called from any context, including NMI.
> >
> > Ok, I understand.
> >
> > What about having one buffer per helper, synchronized with a spinlock?
> > Actually, bpf_trace_printk already has that, not for the bprintf
> > arguments but for the bprintf output so this wouldn't change much to
> > the performance of the helpers anyway:
> > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385
> >
> > These helpers are not performance sensitive so a per-cpu stack of
> > buffers feels over-engineered to me (and is also complexity I feel a
> > bit uncomfortable with).
>
> But wouldn't this have same potential of causing a deadlock? Simple example
> would be if you have a tracing prog attached to bstr_printf(), and one of
> the other helpers using the same lock called from a non-tracing prog. If

Ah, right, I see :/

> it can be avoided fairly easily, I'd also opt for per-cpu buffers as Andrii
> mentioned earlier. We've had few prior examples with similar issues [0].
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9594dc3c7e71b9f52bee1d7852eb3d4e3aea9e99

Ok it's not as bad as I imagined, thank you Daniel :)
I'll look into it beginning of next week.
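Just so the idea doesn't get lost until then, here is roughly the shape I
have in mind for the per-cpu stack of buffers: a small per-cpu array of 512
byte buffers indexed by a per-cpu nesting level, so a nested call simply
takes the next slot instead of hitting the WARN. Completely untested sketch,
the constants and the struct/variable names are made up, not an actual patch:

#include <linux/errno.h>
#include <linux/percpu.h>
#include <linux/preempt.h>

/* Hypothetical names and sizes, only to illustrate the idea. */
#define MAX_PRINTF_NEST_LEVEL	3
#define MAX_PRINTF_BUF_LEN	512

struct bpf_printf_bufs {
	char buf[MAX_PRINTF_NEST_LEVEL][MAX_PRINTF_BUF_LEN];
};
static DEFINE_PER_CPU(struct bpf_printf_bufs, bpf_printf_bufs);
static DEFINE_PER_CPU(int, bpf_printf_nest_level);

static int try_get_fmt_tmp_buf(char **tmp_buf)
{
	struct bpf_printf_bufs *bufs;
	int nest_level;

	preempt_disable();
	nest_level = this_cpu_inc_return(bpf_printf_nest_level);
	if (nest_level > MAX_PRINTF_NEST_LEVEL) {
		/* Too many nested helper calls on this CPU, give up. */
		this_cpu_dec(bpf_printf_nest_level);
		preempt_enable();
		return -EBUSY;
	}
	/* Hand out the slot that matches the current nesting depth. */
	bufs = this_cpu_ptr(&bpf_printf_bufs);
	*tmp_buf = bufs->buf[nest_level - 1];

	return 0;
}

static void bpf_printf_cleanup(void)
{
	/* Only called after a successful try_get_fmt_tmp_buf(). */
	if (this_cpu_read(bpf_printf_nest_level)) {
		this_cpu_dec(bpf_printf_nest_level);
		preempt_enable();
	}
}

Since preemption stays disabled between the get and the put (as
try_get_fmt_tmp_buf already does today) and sleepable programs run with
migration disabled, I'd expect a nesting depth of three to cover the
realistic nesting cases without reserving much more per-cpu memory.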