Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp3872071imu; Fri, 30 Nov 2018 07:19:07 -0800 (PST) X-Google-Smtp-Source: AFSGD/W4bUzZ070oTag+5WVkaxnODiSBYoJ3qlnN/CIseYk22FHwsNyd97aIrLYNUosILgHHe7GB X-Received: by 2002:a62:19d5:: with SMTP id 204mr5997745pfz.33.1543591147181; Fri, 30 Nov 2018 07:19:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1543591147; cv=none; d=google.com; s=arc-20160816; b=IZHuKuPZ1aIm+1KBUoh9F2uTruMvK6GmyFAN4291JXBOshWaZ/fQpWEgMVJoH7/Vi8 +CsTWb08gidlGBSWCSeoAXgkMFdzr1xvPiQKRBMoDVjTgmJ9mJMLNC5v316SqUGE7JB+ SNH1olI0V+83iuiez18nA2Ok9rXV3/IfC6A1bs3UEYF1rW0NoVo23MaKurHVRyV01I1z 7EBHn8o0FNm4RxSzSa4P/TGJO3WAnFSoe44phyl4pj/xzrM3MlTVWj2WhLzgJoGNll9C rPtZYVaOlVKQYt9qmV0+ed5HN88JFP9gbLdfXGb3suMZwzjuXYT7SFjyxJaf1b+o8yhy R8Rw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:subject:mime-version:user-agent :message-id:in-reply-to:date:references:cc:to:from; bh=s0LFgX0Hcxd8HvR/zd2okmQR78sI/ta2CXgvoI9RYys=; b=LnrnO/nc2Pahi/PDBgS0aw+sRwpCdJB8Q9WaxV0QL/yh8KqRwj3gNGYF/oMh2NQVv3 9kzxpYGgfPvGFaMIiOFpEXOhne+NawQ30Pj5iJojYYcAb+QdUuVHETn5jCn0bt60xwfe U4480bORg29fE1xybDay5y7QaQHXbGdkrNy+nGB5HgZmZl8zV/4b3x11sgNJgSD0YqqP 2u6NFDOfiZ51vgBb3gRsCJD/oHylAPdKZ3wCD73p0J1y9a6OEiGHWfdxdNI7Jho7+R6v Tat8Ko4rmtTmGqQFX9g4c1g8hWYPhjfihAiCs0LLIB1fsLWIvlzZ9Y2651FuLbvyGQN1 OrMQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=xmission.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g184si5515638pfb.288.2018.11.30.07.18.19; Fri, 30 Nov 2018 07:19:07 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=xmission.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727386AbeLAC0h (ORCPT + 99 others); Fri, 30 Nov 2018 21:26:37 -0500 Received: from out03.mta.xmission.com ([166.70.13.233]:33355 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726616AbeLAC0g (ORCPT ); Fri, 30 Nov 2018 21:26:36 -0500 Received: from in02.mta.xmission.com ([166.70.13.52]) by out03.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1gSkXG-0002vq-QN; Fri, 30 Nov 2018 08:16:55 -0700 Received: from ip68-227-174-240.om.om.cox.net ([68.227.174.240] helo=x220.xmission.com) by in02.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1gSkXF-0007dP-Hk; Fri, 30 Nov 2018 08:16:54 -0700 From: ebiederm@xmission.com (Eric W. Biederman) To: "gregkh\@linuxfoundation.org" Cc: Jan Glauber , Will Deacon , Alexander Viro , "linux-fsdevel\@vger.kernel.org" , "linux-kernel\@vger.kernel.org" , "jslaby\@suse.com" References: <20181109143744.GA12128@hc> <20181109155856.GC2091@brain-police> <20181110111656.GA16667@hc> <20181120182854.GC28838@arm.com> <20181120190317.GA29161@arm.com> <20181121131900.GA18931@hc> <20181123180525.GA21017@arm.com> <20181128200806.GC32668@arm.com> <20181129184950.GA7290@hc> <20181130104154.GA11991@kroah.com> Date: Fri, 30 Nov 2018 09:16:49 -0600 In-Reply-To: <20181130104154.GA11991@kroah.com> (gregkh@linuxfoundation.org's message of "Fri, 30 Nov 2018 11:41:54 +0100") Message-ID: <875zwe389q.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1gSkXF-0007dP-Hk;;;mid=<875zwe389q.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=68.227.174.240;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX19WqtPmMJpl665zmz/+ZBVC+i+kw2vNmIg= X-SA-Exim-Connect-IP: 68.227.174.240 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on sa05.xmission.com X-Spam-Level: X-Spam-Status: No, score=0.1 required=8.0 tests=ALL_TRUSTED,BAYES_50, DCC_CHECK_NEGATIVE,T_TM2_M_HEADER_IN_MSG,TooManyTo_001 autolearn=disabled version=3.4.2 X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.4472] * 0.3 TooManyTo_001 Multiple "To" Header Recipients 2x (uncommon) * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa05 1397; Body=1 Fuz1=1 Fuz2=1] X-Spam-DCC: XMission; sa05 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: ;"gregkh\@linuxfoundation.org" X-Spam-Relay-Country: X-Spam-Timing: total 661 ms - load_scoreonly_sql: 0.08 (0.0%), signal_user_changed: 3.7 (0.6%), b_tie_ro: 2.5 (0.4%), parse: 1.68 (0.3%), extract_message_metadata: 21 (3.1%), get_uri_detail_list: 4.2 (0.6%), tests_pri_-1000: 19 (2.9%), tests_pri_-950: 1.82 (0.3%), tests_pri_-900: 1.45 (0.2%), tests_pri_-90: 36 (5.4%), check_bayes: 34 (5.1%), b_tokenize: 13 (1.9%), b_tok_get_all: 10 (1.5%), b_comp_prob: 4.1 (0.6%), b_tok_touch_all: 4.5 (0.7%), b_finish: 0.76 (0.1%), tests_pri_0: 564 (85.4%), check_dkim_signature: 1.11 (0.2%), check_dkim_adsp: 5 (0.8%), poll_dns_idle: 0.39 (0.1%), tests_pri_10: 2.3 (0.3%), tests_pri_500: 4.9 (0.7%), rewrite_mail: 0.00 (0.0%) Subject: Re: dcache_readdir NULL inode oops X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org "gregkh@linuxfoundation.org" writes: > Adding Eric as he touched this code last :) > > On Thu, Nov 29, 2018 at 07:25:48PM +0000, Jan Glauber wrote: >> On Wed, Nov 28, 2018 at 08:08:06PM +0000, Will Deacon wrote: >> > I spent some more time looking at this today... >> > >> > On Fri, Nov 23, 2018 at 06:05:25PM +0000, Will Deacon wrote: >> > > Doing some more debugging, it looks like the usual failure case is where >> > > one CPU clears the inode field in the dentry via: >> > > >> > > devpts_pty_kill() >> > > -> d_delete() // dentry->d_lockref.count == 1 >> > > -> dentry_unlink_inode() >> > > >> > > whilst another CPU gets a pointer to the dentry via: >> > > >> > > sys_getdents64() >> > > -> iterate_dir() >> > > -> dcache_readdir() >> > > -> next_positive() >> > > >> > > and explodes on the subsequent inode dereference when trying to pass the >> > > inode number to dir_emit(): >> > > >> > > if (!dir_emit(..., d_inode(next)->i_ino, ...)) >> > > >> > > Indeed, the hack below triggers a warning, indicating that the inode >> > > is being cleared concurrently. >> > > >> > > I can't work out whether the getdents64() path should hold a refcount >> > > to stop d_delete() in its tracks, or whether devpts_pty_kill() shouldn't >> > > be calling d_delete() like this at all. >> > >> > So the issue is that opening /dev/pts/ptmx creates a new pty in /dev/pts, >> > which disappears when you close /dev/pts/ptmx. Consequently, when we tear >> > down the dentry for the magic new file, we have to take the i_node rwsem of >> > the *parent* so that concurrent path walkers don't trip over it whilst its >> > being freed. I wrote a simple concurrent program to getdents(/dev/pts/) in >> > one thread, whilst another opens and closes /dev/pts/ptmx: it crashes the >> > kernel in seconds. >> >> I also made a testcase and verified that your fix is fine. I also tried >> replacing open-close on /dev/ptmx with mkdir-rmdir but that does not >> trigger the error. >> >> > Patch below, but I'd still like somebody else to look at this, please. >> >> I wonder why no inode_lock on parent is needed for devpts_pty_new(), but >> I'm obviously not a VFS expert... So your patch looks good to me and >> clearly solves the issue. >> >> thanks, >> Jan >> >> > Will >> > >> > --->8 >> > >> > diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c >> > index c53814539070..50ddb95ff84c 100644 >> > --- a/fs/devpts/inode.c >> > +++ b/fs/devpts/inode.c >> > @@ -619,11 +619,17 @@ void *devpts_get_priv(struct dentry *dentry) >> > */ >> > void devpts_pty_kill(struct dentry *dentry) >> > { >> > - WARN_ON_ONCE(dentry->d_sb->s_magic != DEVPTS_SUPER_MAGIC); >> > + struct super_block *sb = dentry->d_sb; >> > + struct dentry *parent = sb->s_root; >> > >> > + WARN_ON_ONCE(sb->s_magic != DEVPTS_SUPER_MAGIC); > > Side note, I wonder if this is even needed anymore... > >> > + >> > + inode_lock(parent->d_inode); >> > dentry->d_fsdata = NULL; >> > drop_nlink(dentry->d_inode); >> > d_delete(dentry); >> > + inode_unlock(parent->d_inode); >> > + >> > dput(dentry); /* d_alloc_name() in devpts_pty_new() */ >> > } > > This feels right but getting some feedback from others would be good. This is going to be special at least because we are not coming through the normal unlink path and we are manipulating the dcache. This looks plausible. If this is whats going on then we have had this bug for a very long time. I will see if I can make some time. It looks like in the general case everything is serialized by the devpts_mutex. I wonder if just changing the order of operations here would be enough. AKA: drop_nlink d_delete then dentry->d_fsdata. Ugh d_fsdata is not implicated so that won't help here. I will look more shortly. I am in the middle in the move so I don't have time to complete the analysis today. Eric