Date: Tue, 29 Mar 2022 16:29:56 -0700
From: Alexei Starovoitov
To: Hao Luo
Cc: Kumar Kartikeya Dwivedi, Yonghong Song, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, KP Singh, Martin KaFai Lau, Song Liu, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC bpf-next 0/2] Mmapable task local storage.
Message-ID: <20220329232956.gbsr65jdbe4lw2m6@ast-mbp>
References: <20220324234123.1608337-1-haoluo@google.com> <9cdf860d-8370-95b5-1688-af03265cc874@fb.com> <20220329093753.26wc3noelqrwlrcj@apollo.legion>

On Tue, Mar 29, 2022 at 10:43:42AM -0700, Hao Luo wrote:
> On Tue, Mar 29, 2022 at 2:37 AM Kumar Kartikeya Dwivedi wrote:
> >
> > On Mon, Mar 28, 2022 at 11:16:15PM IST, Hao Luo wrote:
> > > On Mon, Mar 28, 2022 at 10:39 AM Hao Luo wrote:
> > > >
> > > > Hi Yonghong,
> > > >
> > > > On Fri, Mar 25, 2022 at 12:16 PM Yonghong Song wrote:
> > > > >
> > > > > On 3/24/22 4:41 PM, Hao Luo wrote:
> > > > > > Some map types support mmap operation, which allows userspace to
> > > > > > communicate with BPF programs directly. Currently only arraymap
> > > > > > and ringbuf have mmap implemented.
> > > > > >
> > > > > > However, in some use cases, when multiple program instances can
> > > > > > run concurrently, global mmapable memory can cause race. In that
> > > > > > case, userspace needs to provide necessary synchronizations to
> > > > > > coordinate the usage of mapped global data. This can be a source
> > > > > > of bottleneck.
> > > > >
> > > > > I can see your use case here. Each calling process can get the
> > > > > corresponding bpf program task local storage data through
> > > > > mmap interface. As you mentioned, there is a tradeoff
> > > > > between more memory vs. non-global synchronization.
> > > > >
> > > > > I am thinking that another bpf_iter approach can retrieve
> > > > > the similar result. We could implement a bpf_iter
> > > > > for task local storage map, optionally it can provide
> > > > > a tid to retrieve the data for that particular tid.
> > > > > This way, user space needs an explicit syscall, but
> > > > > does not need to allocate more memory than necessary.
> > > > >
> > > > > WDYT?
> > > > >
> > > >
> > > > Thanks for the suggestion. I have two thoughts about bpf_iter + tid and mmap:
> > > >
> > > > - mmap prevents the calling task from reading other task's value.
> > > > Using bpf_iter, one can pass other task's tid to get their values. I
> > > > assume there are two potential ways of passing tid to bpf_iter: one is
> > > > to use global data in bpf prog, the other is adding tid parameterized
> > > > iter_link. For the first, it's not easy for unpriv tasks to use. For
> > > > the second, we need to create one iter_link object for each interested
> > > > tid. It may not be easy to use either.
> > > >
> > > > - Regarding adding an explicit syscall. I thought about adding
> > > > write/read syscalls for task local storage maps, just like reading
> > > > values from iter_link. Writing or reading task local storage map
> > > > updates/reads the current task's value. I think this could achieve the
> > > > same effect as mmap.
> > > >
> > >
> > > Actually, my use case of using mmap on task local storage is to allow
> > > userspace to pass FDs into bpf prog. Some of the helpers I want to add
> > > need to take an FD as parameter and the bpf progs can run
> > > concurrently, thus using global data is racy. Mmapable task local
> > > storage is the best solution I can find for this purpose.
> > >
> > > Song also mentioned to me offline, that mmapable task local storage
> > > may be useful for his use case.
> > >
> > > I am actually open to other proposals.
> > >
> > You could also use a syscall prog, and use bpf_prog_test_run to update local
> > storage for current. Data can be passed for that specific prog invocation using
> > ctx. You might have to enable bpf_task_storage helpers in it though, since they
> > are not allowed to be called right now.
> >
>
> The loading process needs CAP_BPF to load bpf_prog_test_run. I'm
> thinking of allowing any thread including unpriv ones to be able to
> pass data to the prog and update their own storage.

If I understand the use case correctly all of this mmap-ing
is only to allow unpriv userspace to access a priv map
via unpriv mmap() syscall.
But the map can be accessed as unpriv already.
Pin it with the world read creds and do map_lookup sys_bpf cmd on it.
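
For illustration, the userspace side of the pin-and-lookup route could look
roughly like the sketch below. It assumes the privileged loader has already
pinned the map in bpffs and made the pin file world-readable. The helper
names (sys_bpf, pinned_map_open, pinned_map_lookup) are made up for this
example; only the raw bpf(2) commands BPF_OBJ_GET and BPF_MAP_LOOKUP_ELEM
are real. Whether an unprivileged caller gets through also depends on the
kernel version and the kernel.unprivileged_bpf_disabled sysctl, so treat
this as a sketch and not a tested recipe.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: glibc ships no bpf() wrapper, so use syscall(2). */
static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/*
 * Hypothetical helper: open a map pinned in bpffs for reading.
 * Returns a map fd, or -1 with errno set (e.g. ENOENT, EACCES, EPERM).
 */
static int pinned_map_open(const char *path)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.pathname = (uint64_t)(unsigned long)path;
	attr.file_flags = BPF_F_RDONLY;	/* only read permission is needed */
	return (int)sys_bpf(BPF_OBJ_GET, &attr, sizeof(attr));
}

/*
 * Hypothetical helper: BPF_MAP_LOOKUP_ELEM on an already opened map.
 * The caller sizes *key and *value to match the map's key_size and
 * value_size. Returns 0 on success, -1 with errno set on failure.
 */
static int pinned_map_lookup(int map_fd, const void *key, void *value)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = (uint64_t)(unsigned long)key;
	attr.value = (uint64_t)(unsigned long)value;
	return (int)sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
```

A caller would do fd = pinned_map_open(path) with whatever pin path the
loader chose, then pinned_map_lookup(fd, &key, &value). For a task local
storage map the key passed from userspace is, as far as I know, a pidfd
(for example from pidfd_open(2)), and the value buffer has to be at least
value_size bytes.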