From: Dan Williams
Date: Mon, 20 Sep 2021 13:52:21 -0700
Subject: Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal
To: Luis Chamberlain
Cc: Tejun Heo, Greg KH, Andrew Morton, Minchan Kim, jeyu@kernel.org,
    shuah, Randy Dunlap, "Rafael J. Wysocki", Masahiro Yamada,
    Nick Desaulniers, yzaikin@google.com, Nathan Chancellor,
    ojeda@kernel.org, Tetsuo Handa, vitor@massaru.org, elver@google.com,
    Jarkko Sakkinen, Alexander Potapenko, rf@opensource.cirrus.com,
    Stephen Hemminger, David Laight, bvanassche@acm.org, jolsa@kernel.org,
    Andy Shevchenko, trishalfonso@google.com, andreyknvl@gmail.com,
    Jiri Kosina, mbenes@suse.com, Nitin Gupta, Sergey Senozhatsky,
    Reinette Chatre, Fenghua Yu, Borislav Petkov, X86 ML,
    "H. Peter Anvin", lizefan.x@bytedance.com, Johannes Weiner,
    Daniel Vetter, Bjorn Helgaas, Krzysztof Wilczyński,
    senozhatsky@chromium.org, Christoph Hellwig, Joe Perches,
    hkallweit1@gmail.com, Jens Axboe, Josh Poimboeuf, Thomas Gleixner,
    Kees Cook, Steven Rostedt, Peter Zijlstra, linux-spdx@vger.kernel.org,
    Linux Doc Mailing List, linux-block@vger.kernel.org, linux-fsdevel,
    linux-kselftest@vger.kernel.org, cgroups@vger.kernel.org,
    Linux Kernel Mailing List, copyleft-next@lists.fedorahosted.org
References: <20210918050430.3671227-1-mcgrof@kernel.org>
    <20210918050430.3671227-10-mcgrof@kernel.org>
In-Reply-To: <20210918050430.3671227-10-mcgrof@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 17, 2021 at 10:05 PM Luis Chamberlain wrote:
>
> When sysfs attributes use a lock also used on module removal we can
> race to deadlock. This happens when for instance a sysfs file on
> a driver is used, then at the same time we have module removal call
> trigger. The module removal call code holds a lock, and then the sysfs
> file entry waits for the same lock. While holding the lock the module
> removal tries to remove the sysfs entries, but these cannot be removed
> yet as one is waiting for a lock. This won't complete as the lock is
> already held. Likewise module removal cannot complete, and so we deadlock.
>
> This can now be easily reproducible with our sysfs selftest as follows:
>
> ./tools/testing/selftests/sysfs/sysfs.sh -t 0027
>
> To fix this we extend the struct kernfs_node with a module reference and
> use the try_module_get() after kernfs_get_active() is called which
> protects integrity and the existence of the kernfs node during the
> operation.
>
> So long as the kernfs node is protected with kernfs_get_active() we know
> we can rely on its contents. And, as now just documented in the previous
> patch, we also now know that once kernfs_get_active() is called the module
> is also guarded to exist and cannot be removed.
>
> If try_module_get() fails we fail the operation on the kernfs node.
>
> We use a try method as a full lock means we'd then make our sysfs
> attributes busy us out from possible module removal, and so userspace
> could force denying module removal, a silly form of "DOS" against module
> removal. A try lock on the module removal ensures we give priority to
> module removal and interacting with sysfs attributes only comes second.
> Using a full lock could mean for instance that if you don't stop poking
> at sysfs files you cannot remove a module.
>
> Races between removal of sysfs files and the module are not possible
> given sysfs files are created by the same module, and when a sysfs file
> is being used kernfs prevents removal of the sysfs file. So if module
> removal is actually happening the removal would have to wait until
> the sysfs file operation is complete.
>
> This deadlock was first reported with the zram driver, however the live
> patching folks have acknowledged they have observed this as well with
> live patching, when a live patch is removed. I was then able to
> reproduce easily by creating a dedicated selftests.
>
> A sketch of how this can happen follows:
>
> CPU A                              CPU B
>                                    whatever_store()
> module_unload
>   mutex_lock(foo)
>                                    mutex_lock(foo)
>    del_gendisk(zram->disk);
>      device_del()
>        device_remove_groups()

This flow seems possible to trigger with:

    echo $dev > /sys/bus/$bus/drivers/$driver/unbind

I am missing why module pinning is part of the solution when it's the
device_del() path that is racing? Module removal is just a more coarse
grained way to trigger unbind => device_del(). Isn't the above a bug
in the driver, not missing synchronization in kernfs? Forgive me if
the unbind question was asked and answered elsewhere, this is my first
time taking a look at this series.
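
For anyone following along, the "try" behaviour described in the quoted
commit message can be modelled in userspace roughly as below. This is only
a sketch under stated assumptions, not the code from the patch:
fake_module, fake_try_get(), fake_put(), fake_begin_removal() and
fake_store() are hypothetical names, the base-reference scheme is a
simplification, and returning -ENODEV on failure is an assumption. The
idea it shows is that an attribute operation takes a reference only if
removal has not started, and fails instead of waiting otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a module; not a kernel structure. */
struct fake_module {
    atomic_int refcnt; /* 1 = live and idle, 0 = removal has started */
};

static void fake_module_init(struct fake_module *m)
{
    atomic_init(&m->refcnt, 1); /* base reference held while live */
}

/* Take a reference unless removal already started; never blocks. */
static bool fake_try_get(struct fake_module *m)
{
    int old = atomic_load(&m->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded and the loop re-checks for 0. */
        if (atomic_compare_exchange_weak(&m->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

static void fake_put(struct fake_module *m)
{
    int prev = atomic_fetch_sub(&m->refcnt, 1);

    assert(prev > 1); /* a put must never drop the base reference */
    (void)prev;
}

/* Removal only starts when nobody else holds a reference. */
static bool fake_begin_removal(struct fake_module *m)
{
    int expected = 1;

    return atomic_compare_exchange_strong(&m->refcnt, &expected, 0);
}

/* Attribute store: fail the operation rather than wait on removal. */
static int fake_store(struct fake_module *m, int *state, int value)
{
    if (!fake_try_get(m))
        return -ENODEV; /* assumed error code for this sketch */
    *state = value;
    fake_put(m);
    return 0;
}
```

Note that this only models the module-removal side. It says nothing about
the unbind path asked about above, which does not go through the module
reference at all.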