Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751454AbdFHKVa (ORCPT ); Thu, 8 Jun 2017 06:21:30 -0400
Received: from www262.sakura.ne.jp ([202.181.97.72]:57427 "EHLO
        www262.sakura.ne.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750786AbdFHKV3 (ORCPT );
        Thu, 8 Jun 2017 06:21:29 -0400
Subject: Re: 4.9.30 NULL pointer dereference in __remove_shared_vm_struct
To: Tommi Rantala, Andrew Morton, Andrea Arcangeli,
        "Kirill A. Shutemov", Linux-MM, LKML
References: <7244cb6d-ed7a-451a-1af9-885090173311@nokia.com>
From: Tetsuo Handa
Message-ID:
Date: Thu, 8 Jun 2017 19:21:02 +0900
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:52.0) Gecko/20100101
        Thunderbird/52.1.1
MIME-Version: 1.0
In-Reply-To: <7244cb6d-ed7a-451a-1af9-885090173311@nokia.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1787
Lines: 45

Tommi Rantala wrote:
> I have hit this kernel bug twice with 4.9.30 while running trinity, any
> ideas? It's not easily reproducible.

No idea. But if you can reproduce this problem, I think you can retry with
the OOM reaper disabled (as shown below), since the latter report came
10 seconds after the OOM reaper reclaimed memory.

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ec9f11d..7e17242 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -560,8 +560,8 @@ static void oom_reap_task(struct task_struct *tsk)
 	struct mm_struct *mm = tsk->signal->oom_mm;
 
 	/* Retry the down_read_trylock(mmap_sem) a few times */
-	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
-		schedule_timeout_idle(HZ/10);
+	//while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+	//	schedule_timeout_idle(HZ/10);
 
 	if (attempts <= MAX_OOM_REAP_RETRIES)
 		goto done;

Since line 137 is atomic_inc(), file->f_inode was for some reason NULL,
wasn't it?

	if (vma->vm_flags & VM_DENYWRITE)
		atomic_inc(&file_inode(file)->i_writecount);

And mmput() from exit_mm() from do_exit() is called before exit_files() is
called from do_exit(). Thus, something erroneously made
file->f_inode == NULL, even though very few locations set f_inode to NULL.

# grep -nFr -- '->f_inode ' *
fs/file_table.c:168:	file->f_inode = path->dentry->d_inode;
fs/file_table.c:224:	file->f_inode = NULL;
fs/open.c:711:	f->f_inode = inode;
fs/open.c:782:	f->f_inode = NULL;
fs/overlayfs/copy_up.c:36:	if (f->f_inode == d_inode(dentry))

Maybe the OOM reaper reclaimed a page by mistake, and somebody then zeroed
the reclaimed page containing file->f_inode.

JFYI, 4.9.30 does not have commit 235190738aba7c5c ("oom-reaper: use
madvise_dontneed() logic to decide if unmap the VMA") backported.
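
To illustrate why a NULL file->f_inode would fault in that atomic_inc(),
here is a rough userspace sketch. The mock_inode and mock_file structs and
the writecount_addr() helper are hypothetical stand-ins made up for this
mail; the real struct inode and struct file have many more fields and a
different layout. The only part taken from the kernel is that file_inode()
simply returns f->f_inode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for struct inode / struct file. */
struct mock_inode {
	unsigned long i_ino;
	int i_writecount;	/* atomic_t in the kernel */
};

struct mock_file {
	struct mock_inode *f_inode;
};

/* Same idea as the kernel's file_inode(): return the cached inode pointer. */
static struct mock_inode *mock_file_inode(const struct mock_file *f)
{
	return f->f_inode;
}

/*
 * Address that an increment of i_writecount would touch. It is only
 * computed here, never dereferenced, so the sketch is safe to run even
 * when f_inode is NULL.
 */
static uintptr_t writecount_addr(const struct mock_file *f)
{
	assert(f != NULL);
	return (uintptr_t)mock_file_inode(f) +
	       offsetof(struct mock_inode, i_writecount);
}
```

With f_inode == NULL, writecount_addr() comes out as just the offset of
i_writecount inside the struct, i.e. a small address near zero, which is
the kind of address a NULL pointer dereference oops would be expected to
report. With a valid inode it is the address of that inode's i_writecount.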