From: Claudio Imbrenda
To: linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, aarcange@redhat.com, minchan@kernel.org,
    kirill.shutemov@linux.intel.com, linux-mm@kvack.org, hughd@google.com,
    borntraeger@de.ibm.com, gerald.schaefer@de.ibm.com
Subject: [PATCH v3 1/1] mm/ksm: fix interaction with THP
Date: Tue, 20 Mar 2018 13:14:29 +0100
Message-Id: <1521548069-24758-1-git-send-email-imbrenda@linux.vnet.ibm.com>

This patch fixes a corner case for KSM.
When two pages belong or belonged to the same transparent hugepage, and
they should be merged, KSM fails to split the page, and therefore no
merging happens.

This bug can be reproduced by:
 * making sure ksm is running (disabling ksmtuned if necessary)
 * enabling transparent hugepages
 * allocating a THP-aligned 1-THP-sized buffer
   e.g. on amd64: posix_memalign(&p, 1<<21, 1<<21)
 * filling it with the same values
   e.g. memset(p, 42, 1<<21)
 * performing madvise to make it mergeable
   e.g. madvise(p, 1<<21, MADV_MERGEABLE)
 * waiting for KSM to perform a few scans

The expected outcome is that all the pages get merged (1 shared and the
rest sharing); the actual outcome is that no pages get merged (1
unshared and the rest volatile).

The reason for this behaviour is that we increase the reference count
once for each of the two pages we want to merge, but if they belong to
the same hugepage (or compound page), the reference counter used in both
cases is the one of the head of the compound page. split_huge_page
therefore finds the reference count higher than it expects and fails.

This patch solves this problem by testing if the two pages to merge
belong to the same hugepage when attempting to merge them. If so, the
hugepage is split safely. The hugepage is not split when that is not
necessary.
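The reproduction steps above can be put together roughly as follows.
This is only a sketch, not part of the patch: the function names
(read_ksm_counter, alloc_filled_thp_buffer, run_ksm_thp_repro) are
made up for illustration, the 2 MiB THP size is the amd64 value, and it
assumes that KSM is already running and that THP is enabled system-wide.
The counters under /sys/kernel/mm/ksm/ are global, so other mergeable
memory on the machine will also show up in them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define THP_SIZE (1UL << 21)	/* 2 MiB: the THP size on amd64 */

/* Read one counter from /sys/kernel/mm/ksm/; returns -1 if unavailable. */
long read_ksm_counter(const char *name)
{
	char path[128];
	long value = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/kernel/mm/ksm/%s", name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/* Allocate a THP-aligned, 1-THP-sized buffer and fill it with one value. */
void *alloc_filled_thp_buffer(void)
{
	void *p = NULL;

	if (posix_memalign(&p, THP_SIZE, THP_SIZE) != 0)
		return NULL;
	assert((unsigned long)p % THP_SIZE == 0);
	memset(p, 42, THP_SIZE);
	return p;
}

/*
 * Hypothetical driver: mark the buffer mergeable, give KSM some time to
 * scan it, then print the global KSM counters. Returns 0 on success.
 */
int run_ksm_thp_repro(unsigned int wait_seconds)
{
	void *p = alloc_filled_thp_buffer();

	if (!p) {
		fprintf(stderr, "allocation failed\n");
		return 1;
	}
	if (madvise(p, THP_SIZE, MADV_MERGEABLE) != 0) {
		perror("madvise(MADV_MERGEABLE)");
		free(p);
		return 1;
	}

	sleep(wait_seconds);	/* wait for KSM to perform a few scans */

	printf("pages_shared:   %ld\n", read_ksm_counter("pages_shared"));
	printf("pages_sharing:  %ld\n", read_ksm_counter("pages_sharing"));
	printf("pages_unshared: %ld\n", read_ksm_counter("pages_unshared"));
	printf("pages_volatile: %ld\n", read_ksm_counter("pages_volatile"));

	free(p);
	return 0;
}
```

A caller such as "int main(void) { return run_ksm_thp_repro(30); }"
would run it; the 30 seconds are an arbitrary choice and depend on the
KSM scan settings. Going by the description above, a kernel without the
fix should report the buffer as 1 unshared page and the rest volatile,
while a fixed kernel should report 1 shared page and the rest sharing.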

Co-authored-by: Gerald Schaefer
Signed-off-by: Claudio Imbrenda
---
 mm/ksm.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/mm/ksm.c b/mm/ksm.c
index 293721f..da777a9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2082,8 +2082,22 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
 	tree_rmap_item =
 		unstable_tree_search_insert(rmap_item, page, &tree_page);
 	if (tree_rmap_item) {
+		bool split;
+
 		kpage = try_to_merge_two_pages(rmap_item, page,
 						tree_rmap_item, tree_page);
+		/*
+		 * If both pages we tried to merge belong to the same compound
+		 * page, then we actually ended up increasing the reference
+		 * count of the same compound page twice, and split_huge_page
+		 * failed.
+		 * Here we set a flag if that happened, and we use it later to
+		 * try split_huge_page again. Since we call put_page right
+		 * afterwards, the reference count will be correct and
+		 * split_huge_page should succeed.
+		 */
+		split = PageTransCompound(page)
+			&& compound_head(page) == compound_head(tree_page);
 		put_page(tree_page);
 		if (kpage) {
 			/*
@@ -2110,6 +2124,20 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
 				break_cow(tree_rmap_item);
 				break_cow(rmap_item);
 			}
+		} else if (split) {
+			/*
+			 * We are here if we tried to merge two pages and
+			 * failed because they both belonged to the same
+			 * compound page. We will split the page now, but no
+			 * merging will take place.
+			 * We do not want to add the cost of a full lock; if
+			 * the page is locked, it is better to skip it and
+			 * perhaps try again later.
+			 */
+			if (!trylock_page(page))
+				return;
+			split_huge_page(page);
+			unlock_page(page);
 		}
 	}
 }
-- 
2.7.4