References: <20230413133355.350571-1-aleksandr.mikhalitsyn@canonical.com> <20230413133355.350571-3-aleksandr.mikhalitsyn@canonical.com>
From: Stanislav Fomichev
Date: Tue, 25 Apr 2023 11:42:58 -0700
Subject: Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
To: Kui-Feng Lee
Cc: Martin KaFai Lau, Eric Dumazet, davem@davemloft.net, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, daniel@iogearbox.net, Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner, Kuniyuki Iwashima, Lennart Poettering, linux-arch@vger.kernel.org, Aleksandr Mikhalitsyn, bpf
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee wrote:
>
> On 4/18/23 09:47, Stanislav Fomichev wrote:
> > On 04/17, Martin KaFai Lau wrote:
> >> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> >>> On 04/13, Stanislav Fomichev wrote:
> >>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> >>>> wrote:
> >>>>>
> >>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet wrote:
> >>>>>>
> >>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> >>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
> >>>>>>> install FDs into the process fdtable.
> >>>>>>>
> >>>>>>> After some offlist discussion it was proposed to add a blacklist of
> >>>>>>
> >>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
> >>>>>
> >>>>> Hi Eric,
> >>>>>
> >>>>> Oh, I'm sorry about that. :( Sure.
> >>>>>
> >>>>>>
> >>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
> >>>>>>>
> >>>>>>
> >>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
> >>>>>
> >>>>> Sure, I will add next time.
> >>>>>
> >>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> >>>>>
> >>>>> I think it's better to add Stanislav Fomichev to CC.
> >>>>
> >>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> >>>> use it for tcp zerocopy, I'm assuming it should work in this case as
> >>>> well?
> >>>
> >>> Jakub reminded me of the other things I wanted to ask here bug forgot:
> >>>
> >>> - setsockopt is probably not needed, right? setsockopt hook triggers
> >>>   before the kernel and shouldn't leak anything
> >>> - for getsockopt, instead of bypassing bpf completely, should we instead
> >>>   ignore the error from the bpf program? that would still preserve
> >>>   the observability aspect
> >>
> >> stealing this thread to discuss the optlen issue which may make sense to
> >> bypass also.
> >>
> >> There has been issue with optlen. Other than this older post related to
> >> optlen > PAGE_SIZE:
> >> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
> >> the recent one related to optlen that we have seen is
> >> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
> >> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
> >> intention is to learn the expected optlen. This makes 'ctx.optlen >
> >> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
> >> -EFAULT to the userspace even the bpf prog has not changed anything.
> >
> > (ignoring -EFAULT issue) this seems like it needs to be
> >
> >       if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
> >               /* error */
> >       }
> >
> > ?
> >
> >> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
> >> max_optlen' for now (and this can use a separate patch which as usual
> >> requires a bpf selftests)?
> >
> > Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
> > seems like the way to go. It caused too much trouble already :-(
> >
> > Should I prepare a patch or do you want to take a stab at it?
> >
> >> In the future, does it make sense to have a specific cgroup-bpf-prog (a
> >> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
> >> such that it can enforce read-only for some optname and potentially also
> >> track if bpf-prog has written a new optval? The bpf-prog can only return 1
> >> (OK) and only allows using bpf_set_retval() instead. Likely there is still
> >> holes but could be a seed of thought to continue polishing the idea.
> >
> > Ack, let's think about it.
> >
> > Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
> > as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
> > and we have a mostly working bpf_getsockopt (after your refactoring),
> > I don't see why we need to continue the current scheme of triggering
> > after the kernel?
>
> Since a sleepable hook would cause some restrictions, perhaps, we could
> introduce something like the promise pattern. In our case here, BPF
> program call an async version of copy_from_user()/copy_to_user() to
> return a promise.

Having a promise might work. This is essentially what we already do
with sockets/etc. with the acquire/release pattern.

What are the sleepable restrictions you're hinting at? I feel like
with sleepable bpf we can also remove all the temporary buffer
management / extra copies, which sounds like a win to me. (We have
these ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE.) The program can
allocate temporary buffers if needed.

> >>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
> >>>   gets called whenever bpf returns an error to make sure protocols have
> >>>   a chance to handle that condition (and free the fd)
> >>>
> >>
> >>
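
To make the optlen check discussed in this thread concrete, here is a
minimal userspace sketch in C. It is a model under stated assumptions,
not the kernel code: struct sockopt_model and both function names are
hypothetical, the unguarded variant is inferred from the description of
the -EFAULT path quoted in this thread, and the length probe is assumed
to pass a NULL optval.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of the values involved; not kernel code. */
struct sockopt_model {
	const void *optval; /* user buffer; assumed NULL for a length probe */
	int ctx_optlen;     /* length reported after the kernel handler ran */
	int max_optlen;     /* length the caller originally passed in */
};

/* Unguarded check (assumed shape): any out-of-range length is an error. */
int check_unguarded(const struct sockopt_model *m)
{
	if (m->ctx_optlen > m->max_optlen || m->ctx_optlen < 0)
		return -EFAULT;
	return 0;
}

/* Guarded check from the reply: enforce the range only when optval is set. */
int check_guarded(const struct sockopt_model *m)
{
	if (m->optval && (m->ctx_optlen > m->max_optlen || m->ctx_optlen < 0))
		return -EFAULT;
	return 0;
}

/* Small self-check covering the probe case and an undersized buffer. */
void sockopt_model_selfcheck(void)
{
	char buf[4];
	struct sockopt_model probe = { NULL, 8, 0 }; /* caller passed optlen == 0 */
	struct sockopt_model small = { buf, 8, 4 };  /* buffer too small */

	assert(check_unguarded(&probe) == -EFAULT);
	assert(check_guarded(&probe) == 0);
	assert(check_unguarded(&small) == -EFAULT);
	assert(check_guarded(&small) == -EFAULT);
}
```

In this model a probe with a NULL optval (ctx_optlen 8, max_optlen 0)
is rejected by the unguarded check but accepted by the guarded one,
while a non-NULL buffer that is too small is rejected by both. A probe
that passes a non-NULL optval together with optlen == 0 would still hit
-EFAULT under the guarded check, so this sketch says nothing about
whether the guard alone covers the NETLINK_LIST_MEMBERSHIPS case.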