Received: by 2002:a05:6a10:1287:0:0:0:0 with SMTP id d7csp4602722pxv; Tue, 27 Jul 2021 11:19:39 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwDdNY/WqDYkAVL1Y34T/fpFgv8i5ImYfKuPi/48+rxpAaIE/62UeyzWniy227SHNOyVX93 X-Received: by 2002:a05:6602:2424:: with SMTP id g4mr20347896iob.189.1627409979352; Tue, 27 Jul 2021 11:19:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1627409979; cv=none; d=google.com; s=arc-20160816; b=gWMD68AqD5Ba1CMcmQzVsDGoLpt/34bmNJ6N8RDUYcw6MeiF9wolwFwKgI68d35e8P b7D/iqj1aN9+NQobAMN1LORbwuKzgcpR6iHgCytE2LGUjHcB+ckz847ixbgY4n0ASfSq jxYwgFQwrJfnwhQBQcysetLYbDniB26DN2z+iPpl29T66ckYqFm452IsmOvyen+VRDTZ c8/x9XSXd/dqidq2YKkDpqETzDSwdCzRpZIBGBMVX26xQtLQORqTTs+mk9ibxgWIhIyg HDUNWhPANTaPN3MdahmMsigfCSwM5fZFGJtWcfoa7NyZalkMsVzNnlNn+lrvNPDCCWke UXiQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=/wH+bt150laCfivNWd1GtrgcQyf1VwVgzN2kPwmqJJo=; b=jStdsOmUDfPLP+T08BMWcd8HZTbPN2bVhpmy5EKYlBHMf3y6ZW4WfrQwovdr+ruvq0 69GPB4lcpDvw6SRb/XD99Qt5CziEt9DIpSnULtnnAvnKv2T+SxKYI2o7E6hN0x4sKm/Z Zw0LIyWnH2qq66bDDysnEraZFKE9SZPN1jiwSnAKJjX35IL0x9OFKI2jBmLpJjQYO81/ w7qB8Fpj0iQ0DGM4UQaG7SGBIYZCLD3thnHjHreWB4b9NJ0GrlkS8uMe7C/wsl+HvfmW 4DHwCQ5NwOBKVqzaIQ2Jn4JxodIPfugYXMUyhkwq8QRmAK9cTq8Qa/bbzX17lKFJzHHQ ZAcg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=bombadil.20210309 header.b=ygtKXlWc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id q15si4025351jaj.14.2021.07.27.11.19.27; Tue, 27 Jul 2021 11:19:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=bombadil.20210309 header.b=ygtKXlWc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229903AbhG0SSY (ORCPT + 99 others); Tue, 27 Jul 2021 14:18:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45688 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229537AbhG0SSX (ORCPT ); Tue, 27 Jul 2021 14:18:23 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:e::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 79CFCC061760; Tue, 27 Jul 2021 11:18:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=/wH+bt150laCfivNWd1GtrgcQyf1VwVgzN2kPwmqJJo=; b=ygtKXlWcjHlUArlv1um/OmOXYP ZBHc9FJo2URjB5RY2d5YCaJVQcVt/r5ahSiyAYk1yfF60HXKutJrRaQ5kETGpuJks00upJ2c+1GxY Ft0aVq6Ky73sFv8zCSMaqYVrjioU1dQ50B2qjpDbth2cXoZi+D28IrxI7lBbzvMmI9b2i+F/TVVnM ca22l0qbtwpUXOrAR1tVRryUoEkdLdJkOn/+OIZTs+PyGc2DzIIlG/8b7wPp+in4mVN6M5JSlQ0Rf WJbzRobfmiXqZBnC5n4y9DCeRqOls44TUVvgahtJB0/ryljBg2QrpWQjV/kmgd0aqfyb0OBS5/fZS fZOkE4mg==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1m8ReV-00FndS-RP; Tue, 27 Jul 2021 18:18:03 +0000 Date: Tue, 27 Jul 2021 11:18:03 -0700 From: Luis Chamberlain To: "gregkh@linuxfoundation.org" Cc: David Laight , "tj@kernel.org" , "shuah@kernel.org" , "akpm@linux-foundation.org" , "rafael@kernel.org" , "davem@davemloft.net" , "kuba@kernel.org" , "ast@kernel.org" , "andriin@fb.com" , "daniel@iogearbox.net" , "atenart@kernel.org" , "alobakin@pm.me" , "weiwan@google.com" , "ap420073@gmail.com" , "jeyu@kernel.org" , "ngupta@vflare.org" , "sergey.senozhatsky.work@gmail.com" , "minchan@kernel.org" , "axboe@kernel.dk" , "mbenes@suse.com" , "jpoimboe@redhat.com" , "tglx@linutronix.de" , "keescook@chromium.org" , "jikos@kernel.org" , "rostedt@goodmis.org" , "peterz@infradead.org" , "linux-block@vger.kernel.org" , "netdev@vger.kernel.org" , Douglas Gilbert , Hannes Reinecke , "linux-kselftest@vger.kernel.org" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH] kernel/module: add documentation for try_module_get() Message-ID: References: <20210722221905.1718213-1-mcgrof@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: Luis Chamberlain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 27, 2021 at 07:46:34PM +0200, gregkh@linuxfoundation.org wrote: > On Tue, Jul 27, 2021 at 10:30:36AM -0700, Luis Chamberlain wrote: > > On Sat, Jul 24, 2021 at 12:15:10PM +0000, David Laight wrote: > > > From: Luis Chamberlain > > > > Sent: 22 July 2021 23:19 > > > > > > > > There is quite a bit of tribal knowledge around proper use of > > > > try_module_get() and that it must be used only in a context which > > > > can ensure the module won't be gone during the operation. Document > > > > this little bit of tribal knowledge. > > > > > > > ... > > > > > > Some typos. > > > > > > > +/** > > > > + * try_module_get - yields to module removal and bumps reference count otherwise > > > > + * @module: the module we should check for > > > > + * > > > > + * This can be used to check if userspace has requested to remove a module, > > > a module be removed > > > > + * and if so let the caller give up. Otherwise it takes a reference count to > > > > + * ensure a request from userspace to remove the module cannot happen. > > > > + * > > > > + * Care must be taken to ensure the module cannot be removed during > > > > + * try_module_get(). This can be done by having another entity other than the > > > > + * module itself increment the module reference count, or through some other > > > > + * means which gaurantees the module could not be removed during an operation. > > > guarantees > > > > + * An example of this later case is using this call in a sysfs file which the > > > > + * module created. The sysfs store / read file operation is ensured to exist > > > ^^^^^^^^^^^^^^^^^^^ > > > Not sure what that is supposed to mean. > > > > I'll clarify further. How about: > > > > The sysfs store / read file operations are gauranteed to exist using > > kernfs's active reference (see kernfs_active()). > > But that has nothing to do with module reference counts. kernfs knows > nothing about modules. Yes but we are talking about sysfs files which the module creates. So but inference again, an active reference protects a module. > > > So there is a potentially horrid race: > > > The module unload is going to do: > > > driver_data->module_ref = 0; > > > and elsewhere there'll be: > > > ref = driver_data->module_ref; > > > if (!ref || !try_module_get(ref)) > > > return -error; > > > > > > You have to have try_module_get() to allow the module unload > > > function to sleep. > > > But the above code still needs a driver lock to ensure the > > > unload code doesn't race with the try_module_get() and the > > > 'ref' be invalidated before try_module_get() looks at it. > > > (eg if an interrupt defers processing.) > > > > > > So there can be no 'yielding'. > > > > Oh but there is. Consider access to a random sysfs file 'add_new_device' > > which takes as input a name, for driver foo, and so foo's > > add_new_foobar_device(name="bar") is called. Unless sysfs file > > "yields" by using try_module_get() before trying to add a new > > foo device called "bar", it will essentially be racing with the > > exit routine of module foo, and depending on how locking is implemented > > (most drivers get it wrong), this easily leads to crashes. > > > > In fact, this documentation patch was motivated by my own solution to a > > possible deadlock when sysfs is used. Using the same example above, if > > the same sysfs file uses *any* lock, which is *also* used on the exit > > routine, you can easily trigger a deadlock. This can happen for example > > by the lock being obtained by the removal routine, then the sysfs file > > gets called, waits for the lock to complete, then the module's exit > > routine starts cleaning up and removing sysfs files, but we won't be > > able to remove the sysfs file (due to kernefs active reference) until > > the sysfs file complets, but it cannot complete because the lock is > > already held. > > > > Yes, this is a generic problem. Yes I have proof [0]. Yes, a generic > > solution has been proposed [1], and because Greg is not convinced and I > > need to move on with life, I am suggesting a temporary driver specific > > solution (to which Greg is still NACK'ing, without even proposing any > > alternatives) [2]. > > > > [0] https://lkml.kernel.org/r/20210703004632.621662-5-mcgrof@kernel.org > > [1] https://lkml.kernel.org/r/20210401235925.GR4332@42.do-not-panic.com > > [2] https://lkml.kernel.org/r/20210723174919.ka3tzyre432uilf7@garbanzo > > My problem with your proposed solution is that it is still racy, you can > not increment your own module reference count from 0 -> 1 and expect it > to work properly. You need external code to do that somewhere. You are not providing *any* proof for this. And even so, I believe I have clarified as best as possible how a kernfs active reference implicitly protects the module when we are talking about sysfs files. > Now trying to tie sysfs files to the modules that own them would be > nice, but as we have seen, that way lies way too many kernel changes, > right? It's not a one-liner fix. Yes. > Hm, maybe. Did we think about this from the kobj_attribute level? If > we use the "wrapper" logic there and the use of the macros we already > have for attributes, we might be able to get the module pointer directly > "for free". > > Did we try that? That was my hope. I tried that first. Last year in November I determined kernfs is kobject stupid. But more importantly *neither* are struct device specific, so neither of them have semantics for modules or even devices. > this thread has been going on for so long I can't > remember anymore... Please... Luis