From: Mike Rapoport
To: Jonathan Corbet
Cc: linux-doc, linux-mm, lkml, Mike Rapoport
Subject: [PATCH 2/3] docs/vm: transhuge: minor updates
Date: Mon, 14 May 2018 11:13:39 +0300
Message-Id: <1526285620-453-3-git-send-email-rppt@linux.vnet.ibm.com>
In-Reply-To: <1526285620-453-1-git-send-email-rppt@linux.vnet.ibm.com>
References: <1526285620-453-1-git-send-email-rppt@linux.vnet.ibm.com>
X-Mailer: git-send-email 2.7.4
X-Mailing-List: linux-kernel@vger.kernel.org

Some formatting changes and addition of a sentence
introducing khugepaged

Signed-off-by: Mike Rapoport
---
 Documentation/vm/transhuge.rst | 47 ++++++++++++++++++++++++++++++++----------
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/Documentation/vm/transhuge.rst b/Documentation/vm/transhuge.rst
index 56d04cbb..47c7e47 100644
--- a/Documentation/vm/transhuge.rst
+++ b/Documentation/vm/transhuge.rst
@@ -9,14 +9,19 @@ Objective
 
 Performance critical computing applications dealing with large memory
 working sets are already running on top of libhugetlbfs and in turn
-hugetlbfs. Transparent Hugepage Support is an alternative means of
+hugetlbfs. Transparent HugePage Support (THP) is an alternative means of
 using huge pages for the backing of virtual memory with huge pages
 that supports the automatic promotion and demotion of page sizes and
 without the shortcomings of hugetlbfs.
 
-Currently it only works for anonymous memory mappings and tmpfs/shmem.
+Currently THP only works for anonymous memory mappings and tmpfs/shmem.
 But in the future it can expand to other filesystems.
 
+.. note::
+   in the examples below we presume that the basic page size is 4K and
+   the huge page size is 2M, although the actual numbers may vary
+   depending on the CPU architecture.
+
 The reason applications are running faster is because of two
 factors. The first factor is almost completely irrelevant and it's not
 of significant interest because it'll also have the downside of
@@ -28,15 +33,27 @@ only matters the first time the memory is accessed for the lifetime of
 a memory mapping. The second long lasting and much more important
 factor will affect all subsequent accesses to the memory for the whole
 runtime of the application. The second factor consist of two
-components: 1) the TLB miss will run faster (especially with
-virtualization using nested pagetables but almost always also on bare
-metal without virtualization) and 2) a single TLB entry will be
-mapping a much larger amount of virtual memory in turn reducing the
-number of TLB misses. With virtualization and nested pagetables the
-TLB can be mapped of larger size only if both KVM and the Linux guest
-are using hugepages but a significant speedup already happens if only
-one of the two is using hugepages just because of the fact the TLB
-miss is going to run faster.
+components:
+
+1) the TLB miss will run faster (especially with virtualization using
+   nested pagetables but almost always also on bare metal without
+   virtualization)
+
+2) a single TLB entry will be mapping a much larger amount of virtual
+   memory in turn reducing the number of TLB misses. With
+   virtualization and nested pagetables the TLB can be mapped of
+   larger size only if both KVM and the Linux guest are using
+   hugepages but a significant speedup already happens if only one of
+   the two is using hugepages just because of the fact the TLB miss is
+   going to run faster.
+
+THP can be enabled system wide or restricted to certain tasks or even
+memory ranges inside a task's address space. Unless THP is completely
+disabled, there is a ``khugepaged`` daemon that scans memory and
+collapses sequences of basic pages into huge pages.
+
+The THP behaviour is controlled via the :ref:`sysfs <thp_sysfs>`
+interface and using the madvise(2) and prctl(2) system calls.
 
 Transparent Hugepage Support maximizes the usefulness of free memory
 if compared to the reservation approach of hugetlbfs by allowing all
@@ -69,9 +86,14 @@ Applications that gets a lot of benefit from hugepages and that don't
 risk to lose memory by using hugepages, should use
 madvise(MADV_HUGEPAGE) on their critical mmapped regions.
 
+.. _thp_sysfs:
+
 sysfs
 =====
 
+Global THP controls
+-------------------
+
 Transparent Hugepage Support for anonymous memory can be entirely disabled
 (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
 regions (to avoid the risk of consuming more memory resources) or enabled
@@ -142,6 +164,9 @@ khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
 be automatically shutdown if it's set to "never".
 
+Khugepaged controls
+-------------------
+
 khugepaged runs usually at low frequency so while one may not want to
 invoke defrag algorithms synchronously during the page faults, it
 should be worth invoking defrag at least in khugepaged. However it's
-- 
2.7.4
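
For illustration only, and not part of this patch: the document being
changed refers to transparent_hugepage/enabled being set to "always",
"madvise" or "never". The sketch below shows one way user space might
read the active mode. It assumes the usual sysfs layout, where the file
lists all modes and puts the active one in square brackets (for example
"always [madvise] never"). The helper names parse_thp_mode and
read_thp_mode, and the default path constant, are hypothetical.

```python
import re
from pathlib import Path

# Assumed default location of the global THP knob on Linux.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the bracketed (active) mode from a line such as
    'always [madvise] never', or None if no mode is bracketed."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(path=THP_ENABLED):
    """Read the active THP mode from sysfs.

    Returns None if the file is missing or unreadable, for instance
    on a kernel built without THP support.
    """
    try:
        return parse_thp_mode(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("THP mode:", read_thp_mode())
```

Keeping the parsing separate from the file read means the bracket
handling can be checked on any machine, including one without the sysfs
file.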