From: Jan Stancek
To: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org,
    peterz@infradead.org, riel@surriel.com, mhocko@suse.com,
    ying.huang@intel.com, jrdr.linux@gmail.com, jglisse@redhat.com,
    aneesh.kumar@linux.ibm.com, david@redhat.com, aarcange@redhat.com,
    raquini@redhat.com, rientjes@google.com, kirill@shutemov.name,
    mgorman@techsingularity.net, jstancek@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
Date: Sat, 2 Mar 2019 16:11:26 +0100
Message-Id: <0b7a4604529e16ace8d65a42dac7c78582e7fb28.1551538524.git.jstancek@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
This is a stress test in which one thread mmaps/writes/munmaps a memory
area while another thread tries to read from it:

  CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
  Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
  Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
  Call Trace:
  ([<0000000000000000>]           (null))
   [<00000000001adae4>] lock_acquire+0xec/0x258
   [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
   [<000000000012a780>] page_table_free+0x48/0x1a8
   [<00000000002f6e54>] do_fault+0xdc/0x670
   [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
   [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
   [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
   [<000000000080e5ee>] pgm_check_handler+0x19e/0x200

page_table_free() is called with a NULL mm parameter, but because "0" is
a valid address on s390 (see S390_lowcore), it keeps going until it
eventually crashes in lockdep's lock_acquire. This crash has been
reproducible since at least 4.14.

The problem is that "vmf->vma" used in do_fault() can become stale.
Because mmap_sem may be released, other threads can come in, call
munmap() and cause "vma" to be returned to the kmem cache, where it can
be zeroed/re-initialized and re-used:

  handle_mm_fault                         |
    __handle_mm_fault                     |
      do_fault                            |
        vma = vmf->vma                    |
        do_read_fault                     |
          __do_fault                      |
            vma->vm_ops->fault(vmf);      |
              mmap_sem is released        |
                                          |
                                          | do_munmap()
                                          |   remove_vma_list()
                                          |     remove_vma()
                                          |       vm_area_free()
                                          |         # vma is released
                                          | ...
                                          | # same vma is allocated
                                          | # from kmem cache
                                          | do_mmap()
                                          |   vm_area_alloc()
                                          |     memset(vma, 0, ...)
                                          |
      pte_free(vma->vm_mm, ...);          |
        page_table_free                   |
          spin_lock_bh(&mm->context.lock);|
                                          |

This patch pins the mm_struct and stores its value, to avoid using a
potentially stale "vma" when calling pte_free().

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c

Signed-off-by: Jan Stancek
---
 mm/memory.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index e11ca9dd823f..1287ee9acbdc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3517,12 +3517,17 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
  * but allow concurrent faults).
  * The mmap_sem may have been released depending on flags and our
  * return value. See filemap_fault() and __lock_page_or_retry().
+ * If mmap_sem is released, vma may become invalid (for example
+ * by another thread calling munmap()).
  */
 static vm_fault_t do_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
+	struct mm_struct *vm_mm = READ_ONCE(vma->vm_mm);
 	vm_fault_t ret;
 
+	mmgrab(vm_mm);
+
 	/*
 	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
 	 */
@@ -3561,9 +3566,12 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 
 	/* preallocated pagetable is unused: free it */
 	if (vmf->prealloc_pte) {
-		pte_free(vma->vm_mm, vmf->prealloc_pte);
+		pte_free(vm_mm, vmf->prealloc_pte);
 		vmf->prealloc_pte = NULL;
 	}
+
+	mmdrop(vm_mm);
+
 	return ret;
 }
 
-- 
1.8.3.1
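
Illustration only, not part of the patch: the "snapshot and pin before the lock can be dropped" pattern used in do_fault() can be modelled in plain userspace C. This is a minimal sketch under stated assumptions. Every name here (toy_mm, toy_vma, toy_mmgrab, toy_fault_pinned, ...) is hypothetical, the integer refcount merely stands in for mmgrab()/mmdrop(), and the memset() stands in for the vma being freed and re-allocated while mmap_sem is released. It is single-threaded so that the interleaving is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for mm_struct and vm_area_struct. */
struct toy_mm {
	int refcount;	/* plays the role of the mm reference count */
	int frees;	/* how many times toy_pte_free() reached this mm */
};

struct toy_vma {
	struct toy_mm *mm;
};

void toy_mmgrab(struct toy_mm *mm) { mm->refcount++; }
void toy_mmdrop(struct toy_mm *mm) { mm->refcount--; }

/* Models munmap() + mmap() recycling the vma while the lock is dropped. */
void toy_recycle_vma(struct toy_vma *vma)
{
	memset(vma, 0, sizeof(*vma));
}

/* Returns 0 on success, -1 if it was handed a NULL mm. */
int toy_pte_free(struct toy_mm *mm)
{
	if (mm == NULL)
		return -1;
	mm->frees++;
	return 0;
}

/* Unpatched shape: reads vma->mm after the vma may have been recycled. */
int toy_fault_stale(struct toy_vma *vma)
{
	toy_recycle_vma(vma);		/* the "mmap_sem is released" window */
	return toy_pte_free(vma->mm);	/* vma->mm has been zeroed */
}

/* Patched shape: snapshot and pin mm first, use only the snapshot later. */
int toy_fault_pinned(struct toy_vma *vma)
{
	struct toy_mm *mm = vma->mm;
	int ret;

	toy_mmgrab(mm);
	toy_recycle_vma(vma);
	ret = toy_pte_free(mm);
	toy_mmdrop(mm);
	return ret;
}
```

In this model toy_fault_stale() ends up passing NULL to toy_pte_free(), mirroring the NULL mm seen in page_table_free(), while toy_fault_pinned() still reaches the original toy_mm and leaves its refcount balanced.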