Date: Fri, 5 Apr 2024 15:48:20 +0200
From: Christian Brauner
To: Christoph Hellwig
Cc: Amir Goldstein, Al Viro, syzbot, gregkh@linuxfoundation.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 syzkaller-bugs@googlegroups.com, tj@kernel.org, valesini@yandex-team.ru,
 Jan Kara, Miklos Szeredi
Subject: Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek
Message-ID: <20240405-liebschaft-effekt-ca71fb6e7699@brauner>
References: <00000000000098f75506153551a1@google.com>
 <0000000000002f2066061539e54b@google.com>
 <20240404081122.GQ538574@ZenIV>
 <20240404082110.GR538574@ZenIV>
 <20240405065135.GA3959@lst.de>
 <20240405-ozonwerte-hungrig-326d97c62e65@brauner>
In-Reply-To: <20240405-ozonwerte-hungrig-326d97c62e65@brauner>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 05, 2024 at 01:19:35PM +0200, Christian Brauner wrote:
> On Fri, Apr 05, 2024 at 08:51:35AM +0200, Christoph Hellwig wrote:
> > On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
> > > I don't follow what you are saying.
> > > Which code is in non-starter violation?
> > > kernfs for calling lookup_bdev() with internal of->mutex held?
> >
> > That is a huge problem, and has been causing endless annoying lockdep
> > chains in the block layer for us. If we have some way to kill this
> > the whole block layer would benefit.
>
> Why not just try and add a better resume api that forces resume to not
> use a path argument neither for resume_file nor for writes to
> /sys/power/resume. IOW, extend the protocol what can get written to
> /sys/power/resume and then phase out the path argument. It'll take a
> while but it's a possibly clean solution.

In fact, just looking at this code with a naive set of eyes for a second:

* There's early_lookup_bdev() which deals with PARTUUID, PARTLABEL, raw
  device number, and lookup based on /dev. No actual path lookup is
  involved in that.

* So the only interesting case is lookup_bdev() for /sys/power/resume.
  That one takes arbitrary paths.

But being realistic for a moment... How many people will specify a
device path that's _not_ some variant of /dev/...?
IOW, how many people will specify a device path that's not on devtmpfs
or a symlink on devtmpfs? Probably almost no one.

Containers come to mind ofc. But they can't mount devtmpfs, so what they
usually do is create a tmpfs mount at /dev and then bind-mount the
device nodes they need into it. But unprivileged containers cannot use
suspend because that requires init_user_ns capabilities. And privileged
containers that are allowed to hibernate and use custom paths seem
extremely unlikely as well.

So really, _naively_ it seems to me that one could factor out the /dev/*
part of the device number parsing logic in early_lookup_bdev() and port
resume_store() to use that first, and only if that fails fall back to
full lookup_bdev(). (Possibly combined with some sort of logging telling
the user to use /dev/... paths, or at least a way to recognize that this
arbitrary path stuff is actually used.)

And citing from a chat with the hibernation maintainer in systemd:

    So /sys/power/resume does systemd ever write anything other than a
    /dev/* path in to there?

    Hmm? You never do that? It only accepts devno.

So that takes away one of the main users of this api. So I really
suspect that arbitrary device paths are unused in practice. Maybe I'm
all wrong though.
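
To make the "cheap parsing first, full lookup only as a fallback" idea a bit more concrete, here is a rough userspace sketch in C. It is not kernel code and resolves nothing: it only classifies the string that would be written to /sys/power/resume and reports whether a full path walk would still be needed. The names classify_resume_spec(), resume_spec_needs_path_walk() and the enum are hypothetical, and the accepted formats are assumptions made for illustration (for instance, only a decimal MAJOR:MINOR device number is recognized here).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of a string written to /sys/power/resume. */
enum resume_spec_kind {
	SPEC_DEVNO,     /* "MAJOR:MINOR", decimal only in this sketch */
	SPEC_PARTUUID,  /* "PARTUUID=..." */
	SPEC_PARTLABEL, /* "PARTLABEL=..." */
	SPEC_DEV_PATH,  /* "/dev/..." */
	SPEC_ARBITRARY, /* anything else: would need a full path lookup */
};

/* Classify by string shape only; nothing is opened or resolved. */
static enum resume_spec_kind classify_resume_spec(const char *spec)
{
	unsigned int maj, min;
	char tail;

	assert(spec != NULL);

	if (strncmp(spec, "PARTUUID=", 9) == 0)
		return SPEC_PARTUUID;
	if (strncmp(spec, "PARTLABEL=", 10) == 0)
		return SPEC_PARTLABEL;
	if (strncmp(spec, "/dev/", 5) == 0 && spec[5] != '\0')
		return SPEC_DEV_PATH;
	/* Exactly two conversions: "%c" matching would mean trailing garbage. */
	if (sscanf(spec, "%u:%u%c", &maj, &min, &tail) == 2)
		return SPEC_DEVNO;
	return SPEC_ARBITRARY;
}

/*
 * Returns 1 if the caller would have to fall back to a full path lookup,
 * 0 if the cheap parser is enough. The fallback case is logged so that
 * real-world use of arbitrary paths becomes visible.
 */
static int resume_spec_needs_path_walk(const char *spec)
{
	if (classify_resume_spec(spec) != SPEC_ARBITRARY)
		return 0;
	fprintf(stderr, "resume: arbitrary path '%s', please use /dev/...\n",
		spec);
	return 1;
}
```

With this shape, something like "/dev/mapper/swap" or "8:3" would stay on the cheap path, while "/mnt/nodes/swap" would hit the logged fallback. Whether a plain prefix check is an acceptable stand-in for what early_lookup_bdev() does with /dev names is exactly the part that would need checking against the real code.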