Date: Fri, 26 May 2023 23:19:48 -0400
From: Kent Overstreet
To: Song Liu
Cc: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, mcgrof@kernel.org, peterz@infradead.org, tglx@linutronix.de, x86@kernel.org, rppt@kernel.org
Subject: Re: [PATCH 1/3] module: Introduce module_alloc_type
References: <20230526051529.3387103-1-song@kernel.org> <20230526051529.3387103-2-song@kernel.org>

On Fri, May 26, 2023 at 05:03:18PM -0700, Song Liu wrote:
> On Fri, May 26, 2023 at 4:39 PM Kent Overstreet wrote:
> >
> [...]
>
> >
> > But it should be an internal implementation detail, I don't think we
> > want the external interface tied to vmalloc -
> >
> > > These two APIs allow the core code work with all archs. They won't break
> > > sub-page allocations. (not all archs will have sub-page allocations)
> >
> > So yes, text_poke() doesn't work on all architectures, and we need a
> > fallback.
> >
> > But if the fallback needs to go the unprotect/protect route, I think we
> > need to put that in the helper, and not expose it. Part of what people
> > are wanting is to limit or eliminate pages that are RWX, so we
> > definitely shouldn't be creating new interfaces to flipping page
> > permissions: that should be pushed down to as low a level as possible.
> >
> > E.g., with my jitalloc_update(), a more complete version would look like
> >
> > void jitalloc_update(void *dst, void *src, size_t len)
> > {
> >         if (text_poke_available) {
> >                 text_poke(...);
> >         } else {
> >                 unprotect();
> >                 memcpy();
> >                 protect();
> >         }
> > }
>
> I think this doesn't work for sub page allocation?

Perhaps I elided too much - it does if you add a single lock. You can't
do that if it's not a common helper.

> At the end of all this, we will have modules running from huge pages, data
> and text. It will give significant performance benefit when some key driver
> cannot be compiled into the kernel.

Yeah, I've seen the numbers for the perf impact of running as a module
due to the extra TLB overhead - but Mike's recent data was showing that
this doesn't matter nearly as much for data as it does for text.

> > Also - module_memory_fill_type(), module_memory_invalidate_type() - I
> > know these are for BPF, but could you explain why we need them in the
> > external interface here? Could they perhaps be small helpers in the bpf
> > code that use something like jitalloc_update()?
>
> These are used by all users, not just BPF. 1/3 uses them in
> kernel/module/main.c. I didn't use them in 3/3 as it is arch code, but I can
> use them instead of text_poke_* (and that is probably better code style).
> As I am writing the email, I think I should use it in ftrace.c (2/3) to avoid
> the __weak function.

Ok. Could we make it clearer why they're needed and what they're for?

I know bpf fills in invalid instructions initially; I didn't read
through enough code to find the "why", so let's document that to save
other people the same effort.

And do they really need to be callbacks in mod_alloc_params...?

Do we need the other things in mod_alloc_params/vmalloc_params?
 - your granularity field says it's for specifying PAGE/PMD size: we
   definitely do not want that. We've had way too much code along the
   lines of "implement hugepages for x", we need to stop doing that.
   It's an internal mm/ detail.

 - vmalloc_params: we need gfp_t and pgprot_t, but those should be normal
   arguments. start/end/vm_flags? They seem like historical module
   baggage, can we get rid of them?

> > > OTOH, jit_alloc (as-is) solves a subset of the problem (only the text).
> > > We can probably use it (with some updates) instead of the vmap_area
> > > based allocator. But I guess we need to align the direction first.
> >
> > Let's see if we can get tglx to chime in again, since he was pretty
> > vocal in the last discussion.
> >
> > It's also good practice to try to summarize all the relevant "whys" from
> > the discussion that went into a feature and put that in at least the
> > commit message - or even better, header file comments.
> >
> > Also, organization: the module code is a huge mess, let's _please_ do
> > split this out and think about organization a bit more, not add to the
> > mess :)
>
> I don't really think module code is very messy at the moment. If
> necessary, we can put module_alloc_type related code in a
> separate file.

Hey, it's been organized since I last looked at it :) But I think this
belongs in mm/, not module code.
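
For illustration only, the "add a single lock" point about jitalloc_update() can be sketched in userspace C. This is a rough sketch under stated assumptions, not code from the patch series: mprotect() stands in for the unprotect()/protect() steps, PROT_READ stands in for read-only text, and jit_lock, jit_region_alloc(), text_poke_stub() and the int return value are hypothetical names introduced here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical: one lock that serializes every permission flip. */
static pthread_mutex_t jit_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for "this arch has text_poke()"; always false here. */
static bool text_poke_available = false;

/* Hypothetical stub: there is no text_poke() in userspace, so just fail. */
static int text_poke_stub(void *dst, const void *src, size_t len)
{
	(void)dst; (void)src; (void)len;
	return -1;
}

/* Hypothetical allocator: returns a read-only anonymous mapping, or NULL. */
static void *jit_region_alloc(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Copy len bytes into a protected region. Returns 0 on success, -1 on error.
 * The lock covers the whole unprotect/copy/protect sequence, so two callers
 * that share a page cannot re-protect it underneath each other.
 */
static int jitalloc_update(void *dst, const void *src, size_t len)
{
	uintptr_t psz = (uintptr_t)sysconf(_SC_PAGESIZE);
	uintptr_t start = (uintptr_t)dst & ~(psz - 1);
	uintptr_t end = ((uintptr_t)dst + len + psz - 1) & ~(psz - 1);
	int ret = 0;

	assert(dst != NULL && src != NULL);

	pthread_mutex_lock(&jit_lock);
	if (text_poke_available) {
		ret = text_poke_stub(dst, src, len);
	} else if (mprotect((void *)start, end - start, PROT_READ | PROT_WRITE)) {
		ret = -1;			/* unprotect failed */
	} else {
		memcpy(dst, src, len);
		if (mprotect((void *)start, end - start, PROT_READ))
			ret = -1;		/* re-protect failed */
	}
	pthread_mutex_unlock(&jit_lock);
	return ret;
}
```

Because callers never flip permissions themselves, two sub-page allocations living in the same page can be updated independently: each update holds jit_lock for the whole sequence, so the page cannot be re-protected in the middle of another caller's memcpy().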