Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp201632rwd; Fri, 26 May 2023 17:55:05 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7es7wu7TRYvOTNmK1MWDykka3vZY8dgff0Q8Tzz6u0/PAKPtJWhwVk/yPGARBfzyMfX4DV X-Received: by 2002:a05:6a20:7da6:b0:10b:b2fb:68fc with SMTP id v38-20020a056a207da600b0010bb2fb68fcmr1039158pzj.53.1685148905051; Fri, 26 May 2023 17:55:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685148905; cv=none; d=google.com; s=arc-20160816; b=gs9Ozf8l9jn5pTe21wUMAD7Wie0dwYrBY/FwY3UPlueNMqs5F24d7HzsKrMBC8UOXd pUdzhWjmTWC2RmFqpMLkpUee92Jd4WHbQH/0fk4k1xrAT0tvE3nlAIAPMs81nYCjCdpf gP90X7W1nZByOkhtU/KmWQlIvT3nAUokHTV3lgwt+ZM0rLq8Zy+T5Y7F7gQA/kavlFhv boKMl/H8Mbb7WAv9sjqUdJR56ViM+LOa8wbkkpLOysGPNVEydyhbIVI+QLNRPM5Pmiur 0cLvEaWBdMKkDZZdCDXhyjzZvRAgy/RlTPV/hJ/rjMrX4rN2/BEGSG1NfIIVfmx5fL3h fyzQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=Ut37JFjcgcN8bNZY69hJAUVmbKPY/9i7D/Udo19OnJg=; b=Ha4ogL66Nb6Wujl81LuRvaQvqUa1nEnrKnwyOew3Z1Thp1oc515/+urKXyVJ+KKkfp yFr8HSR0IT2EJmpi5PRoZgfn4DNEEjIr+R9hqwKaUl+ZB0U/XaFNviGB7NxO+d8QZzi1 j72BfO4dO0ixxpJopl6MZZjEG+uxKq7iKZj+CffGRGFCCIccuoqKRWa19VsOU2RfwwII rH7Kdlrea0LD0wSPPwKhPWl2desxnQ6LUsbtAp0VuU830SSQRAFNOGNINlVK+XnxLg3x qeXgeK2vn1ygKf9E9Ss9ogRsY9kq8hKz6ImLlKKBxznA0dUvsKQtPaaERnthMyxM5LEq gerg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=kVAGhggn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b5-20020a63eb45000000b00519ffeb6d43si4996636pgk.188.2023.05.26.17.54.53; Fri, 26 May 2023 17:55:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=kVAGhggn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231137AbjE0ADf (ORCPT + 99 others); Fri, 26 May 2023 20:03:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40574 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229528AbjE0ADe (ORCPT ); Fri, 26 May 2023 20:03:34 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3BD45A4 for ; Fri, 26 May 2023 17:03:33 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id C981A65488 for ; Sat, 27 May 2023 00:03:32 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2AF89C433D2; Sat, 27 May 2023 00:03:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1685145812; bh=iu9+NFhbI0BAPXcc+Uw6qHUPhwcZ6dpVVB3MfirmE+E=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=kVAGhggnXBaor8aBGW62jkO9bEkKP1Im7yJffXqEHovO01b/EDeTltwQQSB+dy9gG Oz9P0nYqRQS6ALZTEifhu6TahCzmY+DkxfZQ2HBcAaiKAiUBl8eJHhFkVb43YIRQ/B 9ZvWu99tezMUlBcbyAOxzq2r4GNnZtwOomisb3S4qUKRsoFsmGBuY5JSA/JZ41FeeL 0+3+75aZBYNBgO1VDo0TiA7FiU2I422vT1TG+JuXVJ7CgqPr4iKmUhd0uHzzuVw7SD O22pC4OuDjHwVQu3XbfF+yu69bCi+z011p5NM0tAq4ltWUO+9Jepq4vKE++kQscXJt xzZa7tH57QlnQ== Received: by mail-lf1-f53.google.com with SMTP id 2adb3069b0e04-4f3b9755961so1418986e87.0; Fri, 26 May 2023 17:03:32 -0700 (PDT) X-Gm-Message-State: AC+VfDyfLwVWdAlQWXI2mZ20gFmwt36NJdI7J1cqaXoHNm21Xi36uH+W 96Zzoj8SSJoufQI1EuYLIWs5nxl1fyw8fBScnxg= X-Received: by 2002:ac2:5e87:0:b0:4ef:d3f3:c421 with SMTP id b7-20020ac25e87000000b004efd3f3c421mr1067658lfq.4.1685145810185; Fri, 26 May 2023 17:03:30 -0700 (PDT) MIME-Version: 1.0 References: <20230526051529.3387103-1-song@kernel.org> <20230526051529.3387103-2-song@kernel.org> In-Reply-To: From: Song Liu Date: Fri, 26 May 2023 17:03:18 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 1/3] module: Introduce module_alloc_type To: Kent Overstreet Cc: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, mcgrof@kernel.org, peterz@infradead.org, tglx@linutronix.de, x86@kernel.org, rppt@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 26, 2023 at 4:39=E2=80=AFPM Kent Overstreet wrote: > [...] > > But it should be an internal implementation detail, I don't think we > want the external interface tied to vmalloc - > > > These two APIs allow the core code work with all archs. They won't brea= k > > sub-page allocations. (not all archs will have sub-page allocations) > > So yes, text_poke() doesn't work on all architectures, and we need a > fallback. > > But if the fallback needs to go the unprotect/protect route, I think we > need to put that in the helper, and not expose it. Part of what people > are wanting is to limit or eliminate pages that are RWX, so we > definitely shouldn't be creating new interfaces to flipping page > permissions: that should be pushed down to as low a level as possible. > > E.g., with my jitalloc_update(), a more complete version would look like > > void jitalloc_update(void *dst, void *src, size_t len) > { > if (text_poke_available) { > text_poke(...); > } else { > unprotect(); > memcpy(); > protect(); > } > } I think this doesn't work for sub page allocation? > > See? We provide a simpler, safer interface, and this also lets us handle > multiple threads racing to update/flip permissions on the same page in a > single place (e.g. with sticking a lock somewhere in the jitalloc > structures). [...] > > This must have been part of the previous discussion since you started > with execmem_alloc(), but I did scan through that and I'm still not > seeing the justification for why this needs to handle non-text > allocations. > > If I had to hazard a guess it would be because of tglx wanting to solve > tlb fragmentation for modules, but to me that doesn't sound like a good > enough reason and it looks like we're ending up with a messy "try to be > all things to all people" interface as a result :/ > > Do we _really_ need that? > > Mike was just saying at LSF that direct map fragmentation turned out to > be a non issue for performance for non-text, so maybe we don't. At the end of all this, we will have modules running from huge pages, data and text. It will give significant performance benefit when some key driver cannot be compiled into the kernel. > > Also - module_memory_fill_type(), module_memory_invalidate_type() - I > know these are for BPF, but could you explain why we need them in the > external interface here? Could they perhaps be small helpers in the bpf > code that use something like jitalloc_update()? These are used by all users, not just BPF. 1/3 uses them in kernel/module/main.c. I didn't use them in 3/3 as it is arch code, but I ca= n use them instead of text_poke_* (and that is probably better code style). As I am writing the email, I think I should use it in ftrace.c (2/3) to avo= id the __weak function. > > > OTOH, jit_alloc (as-is) solves a subset of the problem (only the text). > > We can probably use it (with some updates) instead of the vmap_area > > based allocator. But I guess we need to align the direction first. > > Let's see if we can get tglx to chime in again, since he was pretty > vocal in the last discussion. > > It's also good practice to try to summarize all the relevant "whys" from > the discussion that went into a feature and put that in at least the > commit message - or even better, header file comments. > > Also, organization: the module code is a huge mess, let's _please_ do > split this out and think about organization a bit more, not add to the > mess :) I don't really think module code is very messy at the moment. If necessary, we can put module_alloc_type related code in a separate file. Thanks, Song