Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp7526111rwr; Tue, 25 Apr 2023 14:38:59 -0700 (PDT) X-Google-Smtp-Source: AKy350bc+BDosH22Z8HyGPvMp9si6H70N7TzTVv47yLn8siOSWdyef6G9vD3yEoAQf5pJ5J3q1LM X-Received: by 2002:a17:902:bc47:b0:1a8:153a:99a4 with SMTP id t7-20020a170902bc4700b001a8153a99a4mr18216463plz.64.1682458739336; Tue, 25 Apr 2023 14:38:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682458739; cv=none; d=google.com; s=arc-20160816; b=M23i5mjdSK2w6ZAibtj8Ft9BFXzdbB1dLwedhq6d3g4uwUX9as6WAy4UG63L/yVp7V O+Db6lS/O9WmQHx1MxZkmYY6y1fzU2BPcwF1eAo9GQznAEK06DeTtaOxTVHidkJ3GRr4 PDYeX3mcV4rNtDd3mKpLPwHnKrOvyHJPVDdHpH2f7RNyM3GShHy2IxfThJeXPGBvj3iX Z7gJnBfiTGdvfcjJxyNtDRSFa2IpqWcUVnyzeaASuDUPj7k88xNQkyb2BZrCg9DklLB/ eJHGQSUgbqC2Yx3RvIYrJJIpwAK6JUOmLYkHkRT/h/ENXmq/VPv15ENJxoi2RE3iGH8U 9X/w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=KN4OwG7Q8S0Ej2HKI/K8Bzq4Vn4sob6zcbuKi/+SBdk=; b=we0juil3hpF23zkrsAdM4FTK/AvPMyVy/P64syVCvrWuTSZmjeCZLeFjfw7K8nhOQL cMw6h+V1jDuK/NStj8uzOvmMLSFbw7OhyAhO6MW+Hj3UPH1Be1m8rOsxyfNXJ4dTwX1L ce920p6Ek6XSz/BxU+DXmY4VvMWSVV3wBWnCs+GEBn5pdcxdQ7DCacX04sQOQpjIqMjL Dn+6ZG8zI765vK0ArC0OdZG/JEm7ex26FS0vgMAqUZ3EImE5DmTarM9wob7DHiYUlrBG JaOwhkdWIw4AiV92hXdLCjuyO2sMb4CSk4q6iE2Ti60s6O3XmV1/HhWIkCMZkDsJfff/ 631Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=Oz4UbWpn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id kf4-20020a17090305c400b001a1ce0509c7si13579321plb.16.2023.04.25.14.38.45; Tue, 25 Apr 2023 14:38:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=Oz4UbWpn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236396AbjDYV2g (ORCPT + 99 others); Tue, 25 Apr 2023 17:28:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35572 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236287AbjDYV2f (ORCPT ); Tue, 25 Apr 2023 17:28:35 -0400 Received: from mail-yb1-xb2a.google.com (mail-yb1-xb2a.google.com [IPv6:2607:f8b0:4864:20::b2a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5ADFD729A; Tue, 25 Apr 2023 14:28:27 -0700 (PDT) Received: by mail-yb1-xb2a.google.com with SMTP id 3f1490d57ef6-b99f374179bso1896036276.3; Tue, 25 Apr 2023 14:28:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1682458106; x=1685050106; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=KN4OwG7Q8S0Ej2HKI/K8Bzq4Vn4sob6zcbuKi/+SBdk=; b=Oz4UbWpnj6Eh8d8V1VZ88IAXj/GUdzOFDQ7w7bab738unz4j94g2KIoScKa+JBOCee 6AnhR9ro+G/dg6bIlF2wDf9EwKsttEkhVZE4qfSu1PAFYXABo2gReoNJ9t29CBk03uNK WIzhAM11FzK7JQ6v7Dg6EJm6hIrI0Ya4Wmn/Rji2zbxmmJXSTWwOKuszPUH6UmJy61do I16qnR2ojZu0deWpR74s3SB0w9bl8DfCbf/z0dnpeQclzEMH4Srg6RnMUQK9E8ud6W4b TEylmunEnL3QI+CN6gagBCfpGDHnuxJpIC9iHTZv69VdTqdBgfItR3wsUuyqYHmjR+9Z axBQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682458106; x=1685050106; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=KN4OwG7Q8S0Ej2HKI/K8Bzq4Vn4sob6zcbuKi/+SBdk=; b=RX+z1QSUhFzsvtf9CiKILQAyLWJRsJgJmKHlO7qlDhqCLoJT7Ny1H3Gcim0WcPODxE kvFgXTWPFK2ovF46NlCwTvHWUzppRfZymaE/eo4dOsnObxEayAuQ2XrRBws4rMnFQwLs k9sCCeM5WRVMTwCdfcsBir6HFd1ofjsvgMmEjuiUd28fF9F+1oGkl0UlcPgEholimjlt E12M0NV2qsRIxx5x63w8dnrkJ/h/LTC/wxGtEXPmtbEEcTajcq9g/g7EoOwcY9kmPoF7 GvPCX1IzE0Z4tLJgvJFzYir3+/L7r/N89JzsDPVvbmbqklxVSMzgKUUSci3l9c7U72le wZvw== X-Gm-Message-State: AAQBX9evgEDI+8da7VbesvOM8wgPn/jDzQGSdQ/N9FmW6zsHHSsAa5eL PImpV5utL6mnpqIQsvadKmk= X-Received: by 2002:a25:4243:0:b0:b75:9a44:5342 with SMTP id p64-20020a254243000000b00b759a445342mr13550844yba.4.1682458106526; Tue, 25 Apr 2023 14:28:26 -0700 (PDT) Received: from ?IPV6:2600:1700:6cf8:1240:ba8:150e:68:30f0? ([2600:1700:6cf8:1240:ba8:150e:68:30f0]) by smtp.gmail.com with ESMTPSA id d134-20020a25e68c000000b00b9949799ce3sm2639289ybh.32.2023.04.25.14.28.24 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 25 Apr 2023 14:28:26 -0700 (PDT) Message-ID: <927ddd10-ae5b-886c-6725-3daf04456e52@gmail.com> Date: Tue, 25 Apr 2023 14:28:19 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 Subject: Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook) Content-Language: en-US To: Stanislav Fomichev Cc: Martin KaFai Lau , Eric Dumazet , davem@davemloft.net, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, daniel@iogearbox.net, Jakub Kicinski , Paolo Abeni , Leon Romanovsky , David Ahern , Arnd Bergmann , Kees Cook , Christian Brauner , Kuniyuki Iwashima , Lennart Poettering , linux-arch@vger.kernel.org, Aleksandr Mikhalitsyn , bpf References: <20230413133355.350571-1-aleksandr.mikhalitsyn@canonical.com> <20230413133355.350571-3-aleksandr.mikhalitsyn@canonical.com> From: Kui-Feng Lee In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,NICE_REPLY_A, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/25/23 11:42, Stanislav Fomichev wrote: > On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee wrote: >> >> >> >> On 4/18/23 09:47, Stanislav Fomichev wrote: >>> On 04/17, Martin KaFai Lau wrote: >>>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote: >>>>> On 04/13, Stanislav Fomichev wrote: >>>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn >>>>>> wrote: >>>>>>> >>>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet wrote: >>>>>>>> >>>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian), >>>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which >>>>>>>>> install FDs into the process fdtable. >>>>>>>>> >>>>>>>>> After some offlist discussion it was proposed to add a blacklist of >>>>>>>> >>>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs. >>>>>>> >>>>>>> Hi Eric, >>>>>>> >>>>>>> Oh, I'm sorry about that. :( Sure. >>>>>>> >>>>>>>> >>>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled. >>>>>>>>> >>>>>>>> >>>>>>>> Can we find the appropriate Fixes: tag to help stable teams ? >>>>>>> >>>>>>> Sure, I will add next time. >>>>>>> >>>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks") >>>>>>> >>>>>>> I think it's better to add Stanislav Fomichev to CC. >>>>>> >>>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already >>>>>> use it for tcp zerocopy, I'm assuming it should work in this case as >>>>>> well? >>>>> >>>>> Jakub reminded me of the other things I wanted to ask here bug forgot: >>>>> >>>>> - setsockopt is probably not needed, right? setsockopt hook triggers >>>>> before the kernel and shouldn't leak anything >>>>> - for getsockopt, instead of bypassing bpf completely, should we instead >>>>> ignore the error from the bpf program? that would still preserve >>>>> the observability aspect >>>> >>>> stealing this thread to discuss the optlen issue which may make sense to >>>> bypass also. >>>> >>>> There has been issue with optlen. Other than this older post related to >>>> optlen > PAGE_SIZE: >>>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/, >>>> the recent one related to optlen that we have seen is >>>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel >>>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace >>>> intention is to learn the expected optlen. This makes 'ctx.optlen > >>>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning >>>> -EFAULT to the userspace even the bpf prog has not changed anything. >>> >>> (ignoring -EFAULT issue) this seems like it needs to be >>> >>> if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) { >>> /* error */ >>> } >>> >>> ? >>> >>>> Does it make sense to also bypass the bpf prog when 'ctx.optlen > >>>> max_optlen' for now (and this can use a separate patch which as usual >>>> requires a bpf selftests)? >>> >>> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something >>> seems like the way to go. It caused too much trouble already :-( >>> >>> Should I prepare a patch or do you want to take a stab at it? >>> >>>> In the future, does it make sense to have a specific cgroup-bpf-prog (a >>>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval >>>> such that it can enforce read-only for some optname and potentially also >>>> track if bpf-prog has written a new optval? The bpf-prog can only return 1 >>>> (OK) and only allows using bpf_set_retval() instead. Likely there is still >>>> holes but could be a seed of thought to continue polishing the idea. >>> >>> Ack, let's think about it. >>> >>> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea >>> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user, >>> and we have a mostly working bpf_getsockopt (after your refactoring), >>> I don't see why we need to continue the current scheme of triggering >>> after the kernel? >> >> Since a sleepable hook would cause some restrictions, perhaps, we could >> introduce something like the promise pattern. In our case here, BPF >> program call an async version of copy_from_user()/copy_to_user() to >> return a promise. > > Having a promise might work. This is essentially what we already do > with sockets/etc with acquire/release pattern. Would you mind to give me some context of the socket things? > > What are the sleepable restrictions you're hinting about? I feel like > with the sleepable bpf, we can also remove all the temporary buffer > management / extra copies which sounds like a win to me. (we have this > ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can > allocate temporary buffers if needed.. > >>>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that >>>>> gets called whenever bpf returns an error to make sure protocols have >>>>> a chance to handle that condition (and free the fd) >>>>> >>>> >>>>