Received: by 2002:a05:7412:cfc7:b0:fc:a2b0:25d7 with SMTP id by7csp2509570rdb; Wed, 21 Feb 2024 09:41:01 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCVG8fAnxSCjZJmXZj+vQFKdwdP8ANnLcayHATsJhMcmHpXd4jVkHUQTWxtfAlNgMCLF4ak5r3mNdB8v5sx5vRKNHGZZmcF9l9UXBLSeqA== X-Google-Smtp-Source: AGHT+IHBh6NoSjYxMfyIDcLC+sOmv1ASnnC4MPl0dnOyf+LEHVkGCu67XHqXQlQasZhEpnCQuXDU X-Received: by 2002:aa7:c3cb:0:b0:565:252d:a1a with SMTP id l11-20020aa7c3cb000000b00565252d0a1amr180858edr.5.1708537261134; Wed, 21 Feb 2024 09:41:01 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1708537261; cv=pass; d=google.com; s=arc-20160816; b=lW9tiAksuvmTJZxHP1izzbLtF8ghWbY8B6XaOf8B6KxGrBEnFNOQMaFrQmZT+QJeLF WCWebzczPlqLBnI4H+2iyjtc3yn96gYasKjhFRmg4wMYPoD8C9o4J4MfXiHdLVJ74ZdT xls3rVl0sOaJ+e43fikMa38C7+c+BkbezTq+Kt6xNwCXcU9+NM9g7Cb9yIh14dAfYky2 Uzb2cA2syoPsljQIPhGgHaOD02MDc11wkIuZevYGeDXlrC9ZEEL6B/86CEHoQTKctGVF Jv1ddLInAEnb7va4vj2w7ytI4gj9+7Tyo0GOP8PwoA/85k9cIfva5ZYC3oet1Sw0EpMi IW4A== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:in-reply-to:content-disposition:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:references :message-id:subject:cc:to:from:date:dkim-signature; bh=rIzuZ6FsR3sWeVB3z/k8YzwbvGS8Ck67EO49fqrTZqs=; fh=VaZYt3Yw8r9yRTMAsu4gvQRXbNKbOgsB6oek7dicjtA=; b=xN16z/Jt1tK5K+wn19bw7p8hh8Xp5idTuBd8Rl5EchxwEsi4D3HCk1EGWCtBSelPYN 5fTlxPjzSoI2RLiXtD0Xth4UHDu87wOP2bxaIgq+EMDS5ailotFWQsa7Mki5GhNEXHob S+6PAnSRmTPNahxRyonBrbc647w1i05vPSWfStWKGtXsLY8kiuZ3byEnKnjQGLspvcMD UnFXCnK6UwvbflkZMBbYE2Qkt1ALaJs7cC/pVvWM1vLmI5NTIybqiAoZcx6e7winGigr Bgs5qPaAJhyOcGCJ1e6nZV18y8d/O0/3PH7cgZMsHGy9FA0lNWYgw9kv5578brxhgNPG I7XQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@infradead.org header.s=bombadil.20210309 header.b=cTIffFjH; arc=pass (i=1 dkim=pass dkdomain=infradead.org); spf=pass (google.com: domain of linux-kernel+bounces-75209-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-75209-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id s19-20020a50d493000000b005642ea0a7besi4156298edi.164.2024.02.21.09.41.01 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 21 Feb 2024 09:41:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-75209-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=bombadil.20210309 header.b=cTIffFjH; arc=pass (i=1 dkim=pass dkdomain=infradead.org); spf=pass (google.com: domain of linux-kernel+bounces-75209-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-75209-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id AE3951F220D5 for ; Wed, 21 Feb 2024 17:41:00 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B565083CDD; Wed, 21 Feb 2024 17:40:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="cTIffFjH" Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B2E0B69313; Wed, 21 Feb 2024 17:40:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708537252; cv=none; b=IG4wcI/hP/kU40goemQgb2lAgquBSPL9RE/J6OSWN3KV7JfF4eK/ew/OieqTDLIOL59Axjgtn9hp3dhYmEl4Je4KokpNw0KRsGW4Ns7n8pKQfblcUnZrHTSF5/lXOTuvKHu7NGC0XZo380Wo6p/qcrpUw5MxuT6KqEpo6HlxW/g= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708537252; c=relaxed/simple; bh=yLnMOorwj6QR1Q8oj/Iy54vokcHJ0eqwtfTaOHATb1k=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=GDGxDZ6kWWJGpVVdD4qBtzt1UFXav8FXlaTr8PcKS2K+7siB2ppGMfaI8RfjUd4t365nFLzEsFjKOXQ9HrMsL9S4eHGWj5Da3HW8mjW2BOh7T7BEeyRBGGiviF5uLuwx62R5NCF5e8yuPCjdYZrOe34PqZATJPlm36l0f17tcwc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=cTIffFjH; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=rIzuZ6FsR3sWeVB3z/k8YzwbvGS8Ck67EO49fqrTZqs=; b=cTIffFjHI+K2z/E9s7UbxErFsG 7PzQ9ZKYkXud32O3aA0Frwag4BWV+ym6GvRCPPadMXG/qv7zUeSrtclIK6a6wVudMKztfoDbdPfEZ lwMi4hwd2y7F8LEJyv1fM8nvMKYzyv+CoH62NKQhFjubZGvPi156d7j04MjNPVOR5F34cDv4ejZ27 V60Nkh8uZcMq73hqsqyAgfZjZOu4aIXWerzro65GdLuWhv1iNNCCnKdZ972UItt+5yrV+ooJPtvcJ OBLXaopJpgxjluOEtpEGkDqkD7Zf3Ui0RYCEARrt5opvnSzZEBEZEAzWLUm8KneKurfbxZAltmIcv 1t6bYHAg==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.97.1 #2 (Red Hat Linux)) id 1rcqaO-00000001xe0-2o8H; Wed, 21 Feb 2024 17:40:48 +0000 Date: Wed, 21 Feb 2024 09:40:48 -0800 From: Luis Chamberlain To: Changbin Du , live-patching@vger.kernel.org Cc: Andrew Morton , linux-modules@vger.kernel.org, linux-kernel@vger.kernel.org, Xiaoyi Su , Eric Chanudet Subject: Re: [PATCH v3] modules: wait do_free_init correctly Message-ID: References: <20240217081810.4155871-1-changbin.du@huawei.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240217081810.4155871-1-changbin.du@huawei.com> Sender: Luis Chamberlain + live-patching folks, Finally, things are starting to be much clearer. Thanks for the time for working on this, some more comments below and a question which I think deserves some attention. On Sat, Feb 17, 2024 at 04:18:10PM +0800, Changbin Du wrote: > The synchronization here is just to ensure the module init's been freed > before doing W+X checking. Some nits, this should read instead: Fix the ordering of freeing of a module init so that it happens before W+X checking. > But the commit 1a7b7d922081 ("modules: Use > vmalloc special flag") moves do_free_init() into a global workqueue > instead of call_rcu(). So now rcu_barrier() can not ensure that do_free_init > has completed. We should wait it via flush_work(). Remove "But" and adjust as: Commit 1a7b7d922081 ("modules: Use vmalloc special flag") moved calling do_free_init() into a global workqueue instead of relying on it being called through call_rcu(..., do_free_init), which used to allowed us call do_free_init() asynchronously after the end of a subsequent grace period. The move to a global workqueue broke the gaurantees for code which needed to be sure the do_free_init() would complete with rcu_barrier(). To fix this callers which used to rely on rcu_barrier() must now instead use flush_work(&init_free_wq). > Without this fix, we still could encounter false positive reports in > W+X checking, This is good thanks for the clarification. I think it would be useful for the commit log then to describe also that it is not that the freeing was not happening, it is just that our sanity checkers raced against the permission checkers which assume init memory is already gone. > and the rcu synchronization is unnecessary which can > introduce significant delay. While this can be true, I am not sure if we can remove it. See below. > Eric Chanudet reports that the rcu_barrier introduces ~0.1s delay on a > PREEMPT_RT kernel. That's a separate issue. > [ 0.291444] Freeing unused kernel memory: 5568K > [ 0.402442] Run /sbin/init as init process > > With this fix, the above delay can be eliminated. > > Fixes: 1a7b7d922081 ("modules: Use vmalloc special flag") > Signed-off-by: Changbin Du > Cc: Xiaoyi Su > Cc: Eric Chanudet > > --- > v3: > - amend comment in do_init_module() and update commit msg. > v2: > - fix compilation issue for no CONFIG_MODULES found by 0-DAY. > --- > include/linux/moduleloader.h | 8 ++++++++ > init/main.c | 5 +++-- > kernel/module/main.c | 9 +++++++-- > 3 files changed, 18 insertions(+), 4 deletions(-) > > diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h > index 001b2ce83832..89b1e0ed9811 100644 > --- a/include/linux/moduleloader.h > +++ b/include/linux/moduleloader.h > @@ -115,6 +115,14 @@ int module_finalize(const Elf_Ehdr *hdr, > const Elf_Shdr *sechdrs, > struct module *mod); > > +#ifdef CONFIG_MODULES > +void flush_module_init_free_work(void); > +#else > +static inline void flush_module_init_free_work(void) > +{ > +} > +#endif > + > /* Any cleanup needed when module leaves. */ > void module_arch_cleanup(struct module *mod); > > diff --git a/init/main.c b/init/main.c > index e24b0780fdff..f0b7e21ac67f 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -99,6 +99,7 @@ > #include > #include > #include > +#include > #include > > #include > @@ -1402,11 +1403,11 @@ static void mark_readonly(void) > if (rodata_enabled) { > /* > * load_module() results in W+X mappings, which are cleaned > - * up with call_rcu(). Let's make sure that queued work is > + * up with init_free_wq. Let's make sure that queued work is > * flushed so that we don't hit false positives looking for > * insecure pages which are W+X. > */ > - rcu_barrier(); Was this the only source of waiters that used rcu_barrier() to sync ? What about kallsyms, live-patching ? This original source to the addition of this rcu_barrier() (in a slight older modified form with with rcu_barrier_sched()) was commit ae646f0b9ca13 ("init: fix false positives in W+X checking") since v4.17 in 2018, 6 years ago. So I'm hoping we don't have any other side-by new users which have grown dependent on this rcu_barrier() for other call_rcu()'s they may have used, but it is hard to tell. So while I agree that flush work is the right solution, removing the rcu_barrier() is technically another change which could potentially regress for other reasons now. It is perhaps safe, but I'm used to surprises for minor changes like these. So I think it makes sense to lift it now, and test it in the wild to see what could possibly break, I'd much prefer to split this as two separate commits. One which does the fix, and another that lifts the rcu_barrier() with the stated rationale and savings on time of ~0.1s on PREEMPT_RT kernels. Luis