Date: Thu, 11 Feb 2021 16:46:54 -0500
From: "Theodore Ts'o"
To: Dmitry Vyukov
Cc: Jan Kara, syzbot, LKML, syzkaller-bugs
Subject: Re: possible deadlock in dquot_commit
References: <000000000000a05b3b05baf9a856@google.com> <20210211113718.GM19070@quack2.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 11, 2021 at 12:47:18PM +0100, Dmitry Vyukov wrote:
> > This actually looks problematic: We acquired &ei->i_data_sem/2 (i.e.,
> > I_DATA_SEM_QUOTA subclass) in ext4_map_blocks() called from
> > ext4_block_write_begin().
> > This suggests that the write has been happening directly to the quota
> > file (or that the lockdep annotation of the inode went wrong
> > somewhere). Now we normally protect quota files with the IMMUTABLE
> > flag, so writing to them should not be possible. We also don't allow
> > clearing this flag on a quota file that is in use. Finally, I checked
> > the lockdep annotations and everything looks correct. So at this point
> > the best theory I have is that the filesystem has been suitably
> > corrupted, and a quota file that is supposed to be inaccessible from
> > userspace got exposed, but I'd expect other problems to hit first in
> > that case. Anyway, without a reproducer I have no more ideas...
>
> There is a reproducer for 4.19 available on the dashboard. Maybe it
> will help. I don't know why it has not popped up on upstream yet; there
> are lots of potential reasons for this.

The 4.19 version of the syzbot report has a very different stack trace.
Instead of being related to an apparent write to the quota file, it is
apparently caused by a call to rmdir:

   dump_stack+0x22c/0x33e lib/dump_stack.c:118
   print_circular_bug.constprop.0.cold+0x2d7/0x41e kernel/locking/lockdep.c:1221
   ...
   __mutex_lock+0xd7/0x13f0 kernel/locking/mutex.c:1072
   dquot_commit+0x4d/0x400 fs/quota/dquot.c:469
   ext4_write_dquot+0x1f2/0x2a0 fs/ext4/super.c:5644
   ...
   ext4_evict_inode+0x933/0x1830 fs/ext4/inode.c:298
   evict+0x2ed/0x780 fs/inode.c:559
   iput_final fs/inode.c:1555 [inline]
   ...
   vfs_rmdir fs/namei.c:3865 [inline]
   do_rmdir+0x3af/0x420 fs/namei.c:3943
   __do_sys_unlinkat fs/namei.c:4105 [inline]
   __se_sys_unlinkat fs/namei.c:4099 [inline]
   __x64_sys_unlinkat+0xdf/0x120 fs/namei.c:4099
   do_syscall_64+0xf9/0x670 arch/x86/entry/common.c:293
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Which leads me to another apparent contradiction. Looking at the C
reproducer source code, and running the C reproducer under "strace -ff",
there is never any attempt to run rmdir() on the corrupted file system
that is mounted.
This holds both when running the C reproducer and when reading its
source code.

Looking at the code, I did see a number of things which seem to be bugs:
procid never gets incremented, so all of the threads operate only on
/dev/loop0, and each call to the execute() function tries to set up two
file systems on /dev/loop0. So each thread creates a temp file, binds it
to /dev/loop0, then creates another temp file, tries to bind that to
/dev/loop0 (which will fail), and tries to mount /dev/loop0 (again) on
the same mount point (which will succeed).

I'm not sure whether this is just some insanity that was consed up by
the fuzzer, or whether this was an unfaithful translation of the syzbot
repro to C. Am I correct in understanding that when syzbot is running,
it uses the syzbot repro, and not the C repro?

- Ted