From: Song Liu
Date: Tue, 12 Apr 2022 14:00:35 -0700
Subject: Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
To: "Edgecombe, Rick P"
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
    daniel@iogearbox.net, andrii@kernel.org, hch@infradead.org,
    imbrenda@linux.ibm.com, akpm@linux-foundation.org, ast@kernel.org,
    mcgrof@kernel.org
In-Reply-To: <0e8047d07def02db8ef33836ee37de616660045c.camel@intel.com>
References: <20220411233549.740157-1-song@kernel.org>
 <20220411233549.740157-5-song@kernel.org>
 <0e8047d07def02db8ef33836ee37de616660045c.camel@intel.com>
On Tue, Apr 12, 2022 at 10:21 AM Edgecombe, Rick P wrote:
>
> On Mon, 2022-04-11 at 16:35 -0700, Song Liu wrote:
> > @@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
> >  	bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
> >  	list_add_tail(&pack->list, &pack_list);
> >
> > -	set_vm_flush_reset_perms(pack->ptr);
> >  	set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> >  	set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> >  	return pack;
>
> Dropping set_vm_flush_reset_perms() is not mentioned in the commit log.
> It is kind of a fix for a different issue.
>
> Now that x86 supports vmalloc huge pages, but VM_FLUSH_RESET_PERMS does
> not work with them, we should have some comments or warnings to that
> effect somewhere. Someone may try to pass the two flags in together.

Good catch! I will add it in the next version.

> > @@ -970,7 +969,9 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
> >  	if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
> >  				       bpf_prog_chunk_count(), 0) == 0) {
> >  		list_del(&pack->list);
> > -		module_memfree(pack->ptr);
> >
> > +		set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> > +		set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
> > +		vfree(pack->ptr);
> >  		kfree(pack);
>
> Now that it calls module_alloc_huge() instead of vmalloc_node_range(),
> should it call module_memfree() instead of vfree()?

Right. Let me sort that out (including whether we introduce
module_alloc_huge() at all).

> Since there are bugs, simple, immediate fixes seem like the right thing
> to do, but I had a couple of longer-term comments on this new feature:
>
> It would be nice if bpf and the other module_alloc() callers could
> share the same large pages. Meaning that, ultimately, this whole thing
> should probably live outside of bpf. BPF tracing usages might benefit,
> for example, and kprobes and ftrace are not too different from bpf
> progs from a text-allocation perspective.

Agreed.

> I agree that the modules part is non-trivial. A while back I tried to
> do something like bpf_prog_pack() that worked for all the
> module_alloc() callers. It had some module changes to allow different
> permissions to go to different allocations so they could share large
> pages:
>
> https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/
>
> I thought the existing kernel special-permission allocation methods
> were just too brittle and intertwined to improve without a new
> interface. The hope was that the new interface could wrap all the arch
> intricacies instead of leaving them exposed in the cross-arch callers.
>
> I wonder what you think of that general direction, or whether you have
> any follow-up plans for this?

Since I am still learning the vmalloc/module_alloc code, I don't think I am
in a position to comment on the direction yet. In our use cases, we do see a
performance hit from large numbers of BPF programs fragmenting the page
table. Kernel modules, OTOH, are not too big an issue for us, as we usually
build the hot modules into the kernel.

That being said, we are interested in making the huge-page interface general
for both BPF programs and kernel modules, and we can commit resources to this
effort.

Thanks,
Song
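
On the warning Rick asks for above, one possible shape is sketched below.
This is only a sketch, not code from the series: it assumes the
VM_ALLOW_HUGE_VMAP flag proposed here ends up visible in vm_struct->flags,
and the helper name set_vm_flush_reset_perms_checked() is made up for
illustration.

#include <linux/vmalloc.h>
#include <linux/bug.h>

/*
 * VM_FLUSH_RESET_PERMS does not currently cope with huge-mapped vmalloc
 * areas (see the discussion above), so warn and bail out instead of
 * letting the two be combined.
 */
static inline void set_vm_flush_reset_perms_checked(void *addr)
{
	struct vm_struct *vm = find_vm_area(addr);

	if (!vm)
		return;

	/* assumption: VM_ALLOW_HUGE_VMAP is recorded in vm->flags */
	if (WARN_ON_ONCE(vm->flags & VM_ALLOW_HUGE_VMAP))
		return;

	vm->flags |= VM_FLUSH_RESET_PERMS;
}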
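
For the free path, the direction discussed above would look roughly like the
sketch below, again only as an illustration: bpf_prog_pack_size, struct
bpf_prog_pack, and module_alloc_huge() are taken from the series, while the
helper name bpf_prog_pack_release() is made up. It resets permissions by hand
(since VM_FLUSH_RESET_PERMS is no longer used) and frees through
module_memfree(), the counterpart of module_alloc_huge().

#include <linux/moduleloader.h>	/* module_memfree() */
#include <linux/set_memory.h>	/* set_memory_nx(), set_memory_rw() */
#include <linux/slab.h>		/* kfree() */

static void bpf_prog_pack_release(struct bpf_prog_pack *pack)
{
	unsigned long nr_pages = bpf_prog_pack_size / PAGE_SIZE;

	/* undo the set_memory_ro()/set_memory_x() done in alloc_new_pack() */
	set_memory_nx((unsigned long)pack->ptr, nr_pages);
	set_memory_rw((unsigned long)pack->ptr, nr_pages);

	/* free with the counterpart of module_alloc_huge(), not vfree() */
	module_memfree(pack->ptr);
	kfree(pack);
}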