Date: Fri, 22 Apr 2022 09:31:18 +0200
From: Peter Zijlstra
To: Song Liu
Cc: Linus Torvalds, Song Liu, Alexei Starovoitov, bpf, Linux-MM,
	Linux Kernel Mailing List, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Andrew Morton, "Edgecombe, Rick P", Christoph Hellwig,
	Andrii Nakryiko
Subject: Re: [PATCH bpf] bpf: invalidate unused part of bpf_prog_pack
Message-ID: <20220422073118.GR2731@worktop.programming.kicks-ass.net>
References: <20220421072212.608884-1-song@kernel.org>
	<1A4FF473-0988-48BE-9993-0F5E9F0AAC95@fb.com>
	<8F788446-899C-4BA3-8236-612A94D98582@fb.com>
In-Reply-To: <8F788446-899C-4BA3-8236-612A94D98582@fb.com>

> > On Apr 21, 2022, at 3:30 PM, Linus Torvalds wrote:

> > I actually think bpf_arch_text_copy() is another horribly badly done thing.
> >
> > It seems only implemented on x86 (I'm not sure how anything else is
> > supposed to work, I didn't go look), and there it is horribly badly
> > done, using __text_poke() that does all these magical things just to
> > make it atomic wrt concurrent code execution.
> >
> > None of which is *AT*ALL* relevant for this case, since concurrent
> > code execution simply isn't a thing (and if it were, you would already
> > have lost).
> >
> > And if that wasn't pointless enough, it does all that magic "map the
> > page writably at a different virtual address using poking_addr in
> > poking_mm" and a different address space entirely.
> >
> > All of that is required for REAL KERNEL CODE.
> >
> > But the thing is, for bpf_prog_pack, all of that is just completely
> > pointless and stupid complexity.

I think the point is that this hole will likely share a page with active
code, and as such there should not be a writable mapping to it,
necessitating the whole __text_poke() mess.

That said, it does seem somewhat silly to have a whole page worth of int3
around just for this.

Perhaps we can do something like the completely untested below?

---
 arch/x86/kernel/alternative.c | 48 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d374cb3cf024..60afa9105307 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -994,7 +994,20 @@ static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
 __ro_after_init struct mm_struct *poking_mm;
 __ro_after_init unsigned long poking_addr;
 
-static void *__text_poke(void *addr, const void *opcode, size_t len)
+static void text_poke_memcpy(void *dst, const void *src, size_t len)
+{
+	memcpy(dst, src, len);
+}
+
+static void text_poke_memset(void *dst, const void *src, size_t len)
+{
+	int c = *(int *)src;
+	memset(dst, c, len);
+}
+
+typedef void text_poke_f(void *dst, const void *src, size_t len);
+
+static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t len)
 {
 	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
 	struct page *pages[2] = {NULL};
@@ -1059,7 +1072,7 @@ static void *__text_poke(void *addr, const void *opcode, size_t len)
 	prev = use_temporary_mm(poking_mm);
 
 	kasan_disable_current();
-	memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);
+	func((void *)poking_addr + offset_in_page(addr), src, len);
 	kasan_enable_current();
 
 	/*
@@ -1091,7 +1104,8 @@ static void *__text_poke(void *addr, const void *opcode, size_t len)
 	 * If the text does not match what we just wrote then something is
 	 * fundamentally screwy; there's nothing we can really do about that.
 	 */
-	BUG_ON(memcmp(addr, opcode, len));
+	if (func == text_poke_memcpy)
+		BUG_ON(memcmp(addr, src, len));
 
 	local_irq_restore(flags);
 	pte_unmap_unlock(ptep, ptl);
@@ -1118,7 +1132,7 @@ void *text_poke(void *addr, const void *opcode, size_t len)
 {
 	lockdep_assert_held(&text_mutex);
 
-	return __text_poke(addr, opcode, len);
+	return __text_poke(text_poke_memcpy, addr, opcode, len);
 }
 
 /**
@@ -1137,7 +1151,7 @@ void *text_poke(void *addr, const void *opcode, size_t len)
  */
 void *text_poke_kgdb(void *addr, const void *opcode, size_t len)
 {
-	return __text_poke(addr, opcode, len);
+	return __text_poke(text_poke_memcpy, addr, opcode, len);
 }
 
 /**
@@ -1167,7 +1181,29 @@ void *text_poke_copy(void *addr, const void *opcode, size_t len)
 
 		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
 
-		__text_poke((void *)ptr, opcode + patched, s);
+		__text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, s);
+		patched += s;
+	}
+	mutex_unlock(&text_mutex);
+	return addr;
+}
+
+void *text_poke_set(void *addr, int c, size_t len)
+{
+	unsigned long start = (unsigned long)addr;
+	size_t patched = 0;
+
+	if (WARN_ON_ONCE(core_kernel_text(start)))
+		return NULL;
+
+	mutex_lock(&text_mutex);
+	while (patched < len) {
+		unsigned long ptr = start + patched;
+		size_t s;
+
+		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
+
+		__text_poke(text_poke_memset, (void *)ptr, (void *)&c, s);
 		patched += s;
 	}
 	mutex_unlock(&text_mutex);
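
As an illustration only (this is not part of the patch), the two ideas in the diff can be modelled in plain user-space C: a single writer signature that lets the poke routine run either a copy or a fill, and a loop that splits a long range so that no single call touches more than two pages. Everything prefixed `model_` or `poke_`, and `MODEL_PAGE_SIZE`, is a hypothetical stand-in; the sketch writes directly into an ordinary buffer and does none of the temporary-mm mapping, locking, or IRQ handling that the real __text_poke() performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical page size for this model; the kernel uses PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096u

/* Same shape as text_poke_f in the patch: one signature for both writers. */
typedef void poke_f(void *dst, const void *src, size_t len);

static void poke_memcpy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* src points at an int holding the fill byte, as in text_poke_memset(). */
static void poke_memset(void *dst, const void *src, size_t len)
{
	int c = *(const int *)src;

	memset(dst, c, len);
}

/* Offset of an address within its (model) page. */
static size_t model_offset_in_page(uintptr_t addr)
{
	return (size_t)(addr % MODEL_PAGE_SIZE);
}

/* Largest chunk starting at ptr that stays within two pages. */
static size_t model_chunk(uintptr_t ptr, size_t remaining)
{
	size_t room = 2 * MODEL_PAGE_SIZE - model_offset_in_page(ptr);

	return room < remaining ? room : remaining;
}

/* Stand-in for __text_poke(): only dispatches to the chosen writer. */
static void model_poke(poke_f *func, void *addr, const void *src, size_t len)
{
	/* The chunking loops below guarantee this two-page bound. */
	assert(model_offset_in_page((uintptr_t)addr) + len <= 2 * MODEL_PAGE_SIZE);
	func(addr, src, len);
}

/* Mirrors the loop in text_poke_copy(); returns the number of chunks. */
static size_t model_poke_copy(void *addr, const void *src, size_t len)
{
	uintptr_t start = (uintptr_t)addr;
	size_t patched = 0, chunks = 0;

	while (patched < len) {
		uintptr_t ptr = start + patched;
		size_t s = model_chunk(ptr, len - patched);

		/* The source advances together with the destination. */
		model_poke(poke_memcpy, (void *)ptr,
			   (const unsigned char *)src + patched, s);
		patched += s;
		chunks++;
	}
	return chunks;
}

/* Mirrors the loop in text_poke_set(); returns the number of chunks. */
static size_t model_poke_set(void *addr, int c, size_t len)
{
	uintptr_t start = (uintptr_t)addr;
	size_t patched = 0, chunks = 0;

	while (patched < len) {
		uintptr_t ptr = start + patched;
		size_t s = model_chunk(ptr, len - patched);

		/* The same fill value is passed for every chunk. */
		model_poke(poke_memset, (void *)ptr, &c, s);
		patched += s;
		chunks++;
	}
	return chunks;
}
```

Calling `model_poke_set(buf + 100, 0xcc, len)` on a sufficiently large buffer fills exactly `len` bytes with 0xcc (the int3 opcode) and leaves the neighbouring bytes alone. Note that the fill path has no source buffer of `len` bytes to compare against, which is why the patch restricts the memcmp() check to the memcpy case.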