Date: Sun, 31 Mar 2019 00:00:24 -0400
From: Jan Harkes
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, Alexander Viro, Pedro Cuadra Chamorro, linux-kernel@vger.kernel.org
Subject: Re: fs/coda oops bisected to (925b9cd1b8) "locking/rwsem: Make owner store task pointer of last owni
Message-ID: <20190331040023.qbx52lwzufkxg3kw@cs.cmu.edu>

On Fri, Mar 29, 2019 at 05:53:22PM +0000, Waiman Long wrote:
> On 03/29/2019 12:10 PM, Jan Harkes wrote:
> > I knew I definitely had never seen this problem with the stable kernel
> > on Ubuntu xenial (4.4) so I bisected between v4.4 and v5.1-rc2 and ended
> > up at
> >
> > # first bad commit: [925b9cd1b89a94b7124d128c80dfc48f78a63098]
> > # locking/rwsem: Make owner store task pointer of last owning reader
> >
> > When I revert this particular commit on 5.1-rc2, I am not able to
> > reproduce the problem anymore.
>
> Without CONFIG_DEBUG_RWSEMS, the only behavioral change of this patch is
> to do an unconditional write of task_structure pointer into sem->owner
> after acquiring the read lock in down_read(). Before this patch, it does

I tried with just that change, but that is not at fault. It is also hard
to believe we have a use-after-free issue, because we are using a
spinlock on the inode that is held in place by the file we are
releasing.

After trying various variations the minimal change that fixes the soft
lockup is as follows. Without this patch I get a reliable lockup, with
the patch everything works as expected.

diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index bad2bca..0cc437d 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -61,8 +61,7 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem,
 					    struct task_struct *owner)
 {
-	unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED
-						 | RWSEM_ANONYMOUSLY_OWNED;
+	unsigned long val = RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED;
 
 	WRITE_ONCE(sem->owner, (struct task_struct *)val);
 }

I'll continue digging if I can find a reason why. So far I've only found
one place where rwsem->owner is modified while not holding a lock, but
changing that doesn't make a difference for my particular case.

diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 03cb4b6f842e..fe696a8b57f3 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -114,11 +114,11 @@ extern void percpu_free_rwsem(struct percpu_rw_semaphore *);
 static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
 					bool read, unsigned long ip)
 {
-	lock_release(&sem->rw_sem.dep_map, 1, ip);
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	if (!read)
 		sem->rw_sem.owner = RWSEM_OWNER_UNKNOWN;
 #endif
+	lock_release(&sem->rw_sem.dep_map, 1, ip);
 }
 
 static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem,

Jan
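
For anyone following along, here is a minimal userspace sketch of the
two encodings the rwsem.h hunk above switches between: the owner word
holding a task pointer with flag bits OR'd into its low bits, versus the
flag bits alone. This is not kernel code. The flag values, struct task,
and all helper names are assumptions made up for illustration, and it
only shows the bit packing, not the cause of the lockup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag values for illustration, not copied from the kernel tree. */
#define READER_OWNED      (1UL << 0)
#define ANONYMOUSLY_OWNED (1UL << 1)
#define FLAG_MASK         (READER_OWNED | ANONYMOUSLY_OWNED)

/* Hypothetical stand-in for task_struct; a long member keeps the object
 * at least 4-byte aligned, so the two low pointer bits are free. */
struct task {
	long pid;
};

/* Encoding removed by the hunk: task pointer plus both flag bits. */
static uintptr_t owner_with_task(const struct task *t)
{
	/* The packing only works if the pointer's low bits are zero. */
	assert(((uintptr_t)t & FLAG_MASK) == 0);
	return (uintptr_t)t | READER_OWNED | ANONYMOUSLY_OWNED;
}

/* Encoding added by the hunk: flag bits only, no task pointer. */
static uintptr_t owner_flags_only(void)
{
	return READER_OWNED | ANONYMOUSLY_OWNED;
}

/* Mask off the flag bits to recover the pointer part (NULL if none). */
static const struct task *owner_task(uintptr_t val)
{
	return (const struct task *)(val & ~(uintptr_t)FLAG_MASK);
}
```

Both encodings carry the same flag bits. The only difference is whether
masking the flags off yields a task pointer or NULL, which is the one
thing the minimal change takes away from a reader of the owner field.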