Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp69834ybl; Tue, 7 Jan 2020 02:20:54 -0800 (PST) X-Google-Smtp-Source: APXvYqzxSwwqYdt80t5sW9HIurEOt3ZfEoqiAAjc4wReHbRifg9NkEmAem0pnk7WxtwlS/OLmNku X-Received: by 2002:a05:6830:4b9:: with SMTP id l25mr123362559otd.266.1578392454225; Tue, 07 Jan 2020 02:20:54 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1578392454; cv=none; d=google.com; s=arc-20160816; b=J7kAb05+lVQ3KoeyQqfbZstjcZs5ktTqLLRhRbF4eP0bTF9lXTkLdjV1WDwFDE1Xo5 zrIEEAN6S0gApaXHcBzFA8oTNabGXZ7JrcrJRXBmv2+CgYoWth/jG9kgH4OUsFtTHef8 oSwwTkrN9XiawG18pVnaJQ2cCYsp5Ot0+7GSupNilbmOBWzJoK2QoG3fU+NHR7dgf8VG rsTsqzt3QVcW6ky4vJKsvd4P2XU09JQK8hikZ12eiFUlyh9+7ZDehaVuEWZsGd/upXA/ I8cyzZXXRuWbau5l2f90mu2ywwX2Nq5rPWtWpTUe5PXU7MwVfdWlW+L2D6aM35lejKUx 5u1w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:message-id:date:cc:to:from:subject:dkim-signature; bh=SfZstHrbtjR/LipvtxYFtLO8RuvHkj9qkEOgd+AGIyc=; b=oFjD4eYh1V/6jT0sr0ygLeuEbkXenqpP+550h94RMT5wnSQK/F3h/DqTe3kd6OI4rP KrcApBEWBdv8SkMPswHLuoDCGHviJQ0jHFrphVj+vss+chDpQ6+YIPIBYgaYiX/eDwml gWt4VMQvndxRCxlU1HUfP0iOTAGTdBCo0nxgajB/m2bKGao+pbXlILxJvE3CQ8Fxnx+j P8LMU/O0OZESH0QuhQ1LqrRo3BSOIXr1T9L8G2QCfpSsfdrZjKkjRn2lW1AgvaLvvYlX khj+oLvdr8tOS5UOLO6J9YtLvhSTc1MdjaehXbnsdNxDInwByW5ORPL9L65D4IDC23uj UgDA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@yandex-team.ru header.s=default header.b=V3XQ1DaY; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=yandex-team.ru Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e22si36485010oiy.124.2020.01.07.02.20.41; Tue, 07 Jan 2020 02:20:54 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@yandex-team.ru header.s=default header.b=V3XQ1DaY; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=yandex-team.ru Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727811AbgAGKUC (ORCPT + 99 others); Tue, 7 Jan 2020 05:20:02 -0500 Received: from forwardcorp1j.mail.yandex.net ([5.45.199.163]:36852 "EHLO forwardcorp1j.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726558AbgAGKUB (ORCPT ); Tue, 7 Jan 2020 05:20:01 -0500 Received: from mxbackcorp1g.mail.yandex.net (mxbackcorp1g.mail.yandex.net [IPv6:2a02:6b8:0:1402::301]) by forwardcorp1j.mail.yandex.net (Yandex) with ESMTP id 4D48A2E0AF1; Tue, 7 Jan 2020 13:19:57 +0300 (MSK) Received: from myt5-70c90f7d6d7d.qloud-c.yandex.net (myt5-70c90f7d6d7d.qloud-c.yandex.net [2a02:6b8:c12:3e2c:0:640:70c9:f7d]) by mxbackcorp1g.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id 5RNHgShfHB-JudOLcWq; Tue, 07 Jan 2020 13:19:57 +0300 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1578392397; bh=SfZstHrbtjR/LipvtxYFtLO8RuvHkj9qkEOgd+AGIyc=; h=Message-ID:Date:To:From:Subject:Cc; b=V3XQ1DaYjyEpCGsFkd+8AOuKeCRE59MbfTuQp+mIBHn2V/C6KrIDZzC6ghtsnhEnF SVxgcTQfR2EJ/Y6oDOj4Y+515TcHac5uoUvBG2eoBe32HLujA506bsQ97VXIPXX2/1 SIWKrhfhJi/wAZvytOi/0Kd0VDX7hlXNv38Q0g4Y= Authentication-Results: mxbackcorp1g.mail.yandex.net; dkim=pass header.i=@yandex-team.ru Received: from unknown (unknown [2a02:6b8:b080:6407::1:5]) by myt5-70c90f7d6d7d.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id CadqU3Oesa-JuVSf9Zg; Tue, 07 Jan 2020 13:19:56 +0300 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client certificate not present) Subject: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork From: Konstantin Khlebnikov To: linux-mm@kvack.org, Andrew Morton , linux-kernel@vger.kernel.org, Wei Yang Cc: Rik van Riel , Li Xinhai , "Kirill A. Shutemov" Date: Tue, 07 Jan 2020 13:19:56 +0300 Message-ID: <157839239609.694.10268055713935919822.stgit@buzz> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected way but fortunately still produces valid anon-vma tree, so nothing crashes. If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork before all patches in child process related VMAs: DST1 DST2 .. DSTn will fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0). Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal). With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it. Root problem caused by initialization order in dup_mmap(): vma->vm_prev is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to previous VMA in parent mm. Second problem is hidden behind first one: assumption "Parent has vm_prev, which implies we have vm_prev" is wrong if first VMA in parent mm has set flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer because in current code 'prev' actually is same as 'pprev'. Third hidden problem is linking between VMA and anon-vmas whose pages it could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas, shared anon-vma isn't attached. But every mapped page stays reachable in rmap because we erroneously share anon-vma from parent's previous VMA. This patch moves sharing logic out of anon_vma_clone() into more specific anon_vma_fork() because this supposed to work only at fork() and simply reuses anon_vma from previous VMA if it is forked from the same anon-vma. Signed-off-by: Konstantin Khlebnikov Reported-by: Li Xinhai Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork") Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t --- include/linux/rmap.h | 3 ++- kernel/fork.c | 2 +- mm/rmap.c | 23 +++++++++-------------- 3 files changed, 12 insertions(+), 16 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 988d176472df..560e4480dcd0 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -143,7 +143,8 @@ void anon_vma_init(void); /* create anon_vma_cachep */ int __anon_vma_prepare(struct vm_area_struct *); void unlink_anon_vmas(struct vm_area_struct *); int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *); -int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *); +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma, + struct vm_area_struct *prev); static inline int anon_vma_prepare(struct vm_area_struct *vma) { diff --git a/kernel/fork.c b/kernel/fork.c index 2508a4f238a3..c33626993831 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, tmp->anon_vma = NULL; if (anon_vma_prepare(tmp)) goto fail_nomem_anon_vma_fork; - } else if (anon_vma_fork(tmp, mpnt)) + } else if (anon_vma_fork(tmp, mpnt, prev)) goto fail_nomem_anon_vma_fork; tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT); tmp->vm_next = tmp->vm_prev = NULL; diff --git a/mm/rmap.c b/mm/rmap.c index b3e381919835..3c1e04389291 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src) { struct anon_vma_chain *avc, *pavc; struct anon_vma *root = NULL; - struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev; - - /* - * If parent share anon_vma with its vm_prev, keep this sharing in in - * child. - * - * 1. Parent has vm_prev, which implies we have vm_prev. - * 2. Parent and its vm_prev have the same anon_vma. - */ - if (!dst->anon_vma && src->anon_vma && - pprev && pprev->anon_vma == src->anon_vma) - dst->anon_vma = prev->anon_vma; - list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) { struct anon_vma *anon_vma; @@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src) * the corresponding VMA in the parent process is attached to. * Returns 0 on success, non-zero on failure. */ -int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma) +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma, + struct vm_area_struct *prev) { struct anon_vma_chain *avc; struct anon_vma *anon_vma; @@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma) if (!pvma->anon_vma) return 0; + /* Share anon_vma with previous VMA if it has the same parent. */ + if (prev && prev->anon_vma && + prev->anon_vma->parent == pvma->anon_vma) { + vma->anon_vma = prev->anon_vma; + return anon_vma_clone(vma, prev); + } + /* Drop inherited anon_vma, we'll reuse existing or allocate new. */ vma->anon_vma = NULL;