Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754759AbcLTDGF (ORCPT ); Mon, 19 Dec 2016 22:06:05 -0500 Received: from ozlabs.org ([103.22.144.67]:47373 "EHLO ozlabs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752189AbcLTDGC (ORCPT ); Mon, 19 Dec 2016 22:06:02 -0500 From: Rusty Russell To: "Luis R. Rodriguez" Cc: Filipe Manana , "Paul E. McKenney" , linux-doc@vger.kernel.org, rgoldwyn@suse.com, hare , Jonathan Corbet , Linus Torvalds , linux-kselftest@vger.kernel.org, Andrew Morton , Dan Williams , Aaron Tomlin , rwright@hpe.com, Heinrich Schuchardt , Michal Marek , martin.wilck@suse.com, Jeff Mahoney , Ingo Molnar , Petr Mladek , Dmitry Torokhov , Guenter Roeck , "Eric W. Biederman" , shuah@kernel.org, DSterba@suse.com, Kees Cook , Josh Poimboeuf , Arnaldo Carvalho de Melo , Miroslav Benes , NeilBrown , linux-kernel@vger.kernel.org, David Miller , Jessica Yu , Subash Abhinov Kasiviswanathan , Julia Lawall Subject: Re: [RFC 10/10] kmod: add a sanity check on module loading In-Reply-To: References: <20161208184801.1689-1-mcgrof@kernel.org> <20161208194930.2816-1-mcgrof@kernel.org> <87k2b2kpdt.fsf@rustcorp.com.au> <20161216083123.GF13946@wotan.suse.de> <87zijvjjm1.fsf@rustcorp.com.au> User-Agent: Notmuch/0.22.1 (http://notmuchmail.org) Emacs/24.5.1 (x86_64-pc-linux-gnu) Date: Tue, 20 Dec 2016 11:23:08 +1030 Message-ID: <87fuljjua3.fsf@rustcorp.com.au> MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5430 Lines: 147 "Luis R. Rodriguez" writes: > On Dec 16, 2016 9:54 PM, "Rusty Russell" wrote: > > AFAICT the mistake here is that kmod is returning "done, OK" when the > > module it is trying to load is already loading (but not finished > > loading). That's the root problem; it's an attempt at optimization by > > kmod which goes awry. > > This is true! To be precise though the truth of the matter is that kmod'd > respective usermode helper: modprobe can be buggy and may lie to us. It may > allow request_module() to return 0 but since we don't validate it, any > assumption we make can be deadly. In the case of get_fs_type() its a null > dereference. Wait, what?? I can't see that in get_fs_type, which hasn't changed since 2013. If a caller is assuming get_fs_type() doesn't return NULL, they're broken and need fixing of course: struct file_system_type *get_fs_type(const char *name) { struct file_system_type *fs; const char *dot = strchr(name, '.'); int len = dot ? dot - name : strlen(name); fs = __get_fs_type(name, len); if (!fs && (request_module("fs-%.*s", len, name) == 0)) fs = __get_fs_type(name, len); if (dot && fs && !(fs->fs_flags & FS_HAS_SUBTYPE)) { put_filesystem(fs); fs = NULL; } return fs; } Where does this NULL-deref is the module isn't correctly loaded? > *Iff* we want a sanity check to verify kmod's umh is not lying to us we > need to verify after 0 was returned that it was not lying to us. Since kmod > accepts aliases but find_modules_all() only works on the real module name a > validation check cannot happen when all you have are aliases. request_module() should block until resolution, but that's fundamentally a userspace problem. Let's not paper over it in kernelspace. > *Iff* we are sure we don't want a validation (or another earlier > optimization to avoid calling out to modrobe if the alias requested is > already present, which does the time shaving I mentioned on the tests) then > naturally no request_module() calls returning 0 can assert information > about the requested module. I think we might need to change more code if we > accept we cannot trust request_module() calls, or we accept userspace > telling the kernel something may mean we sometimes crash. This later > predicament seems rather odd so hence the patch. > > Perhaps in some cases validation of work from a umh is not critical in > kernel but for request_module() I can tell you that today get_fs_type code > currently asserts the module found can never be NULL. OK, what am I missing in the code above? > > Looking at the code in the kernel, we *already* get this right: block if > > a module is still loading anyway. Once it succeeds we return -EBUSY if > > > > it fails we'll proceed to try to load it again. > > > > I don't understand what you're trying to fix with adding aliases > > in-kernel? > > Two fold now: > > a) validation on request_module() work when an alias is used But why? > b) since kmod accepts aliaes, if we get aliases support, it means we could > *also* preemptively avoid calling out to userspace for modules already > present. No, because once we have a module we don't request it: requesting is the fallback case. > >> FWIW a few things did occur to me: > >> > >> a) list_add_rcu() is used so new modules get added first > > > > Only after we're sure that there are no duplicates. > > > > > OK! This is a very critical assertion. I should be able to add a debug > WARN_ON() should two modules be on the modules list for the same module > then ? Yes, names must be unique. >> b) find_module_all() returns the last module which was added as it > traverses >> the module list > >> BTW should find_module_all() use rcu to traverse? > > Yes; the kallsyms code does this on Oops. Not really a big issue in > practice, but a nice fix. > > Ok, will bundle into my queue. Please submit to Jessica for her module queue, as it's orthogonal AFAICT. > I will note though that I still think there's a bug in this code -- > upon a failure other "spinning" requests can fail, I believe this may > be due to not having another state or informing pending modules too > early of a failure but I haven't been able to prove this conjecture > yet. That's possible, but I can't see it from quickly re-checking the code. The module should be fully usable at this point; the module's init has been called successfully, so in the case of __get_fs_type() it should now succeed. The module cleans up its init section, but that should be independent. If there is a race, it's likely to be when some other caller wakes the queue. Moving the wakeup as soon as possible should make it easier to trigger: diff --git a/kernel/module.c b/kernel/module.c index f57dd63186e6..78bd89d41a22 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -3397,6 +3397,7 @@ static noinline int do_init_module(struct module *mod) /* Now it's a first class citizen! */ mod->state = MODULE_STATE_LIVE; + wake_up_all(&module_wq); blocking_notifier_call_chain(&module_notify_list, MODULE_STATE_LIVE, mod); @@ -3445,7 +3446,6 @@ static noinline int do_init_module(struct module *mod) */ call_rcu_sched(&freeinit->rcu, do_free_init); mutex_unlock(&module_mutex); - wake_up_all(&module_wq); return 0; Thanks, Rusty.