Date: Tue, 20 Nov 2018 12:04:07 -0500
From: Andrea Arcangeli
To: "Kirill A. Shutemov"
Cc: Mel Gorman, Anthony Yznaga, linux-mm@kvack.org, linux-kernel@vger.kernel.org, aneesh.kumar@linux.ibm.com, akpm@linux-foundation.org, jglisse@redhat.com, khandual@linux.vnet.ibm.com, kirill.shutemov@linux.intel.com, mhocko@kernel.org, minchan@kernel.org, peterz@infradead.org, rientjes@google.com, vbabka@suse.cz, willy@infradead.org, ying.huang@intel.com, nitingupta910@gmail.com
Subject: Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory
Message-ID: <20181120170407.GM29258@redhat.com>
References: <1541746138-6706-1-git-send-email-anthony.yznaga@oracle.com> <20181109121318.3f3ou56ceegrqhcp@kshutemo-mobl1> <20181109195150.GA24747@redhat.com> <20181110132249.GH23260@techsingularity.net> <20181110164412.GB22642@redhat.com> <20181120091122.3dxlgff3vivwilrg@kshutemo-mobl1>
In-Reply-To: <20181120091122.3dxlgff3vivwilrg@kshutemo-mobl1>

On Tue, Nov 20, 2018 at 12:11:22PM +0300, Kirill A. Shutemov wrote:
> On Sat, Nov 10, 2018 at 11:44:12AM -0500, Andrea Arcangeli wrote:
> > I would prefer to add intelligence to detect when COWs after fork
> > should be done at 2m or 4k granularity (in the latter case by
> > splitting the pmd before the actual COW while leaving the transhuge
> > pmd intact in the other mm), because that would save CPU (and it'd
> > automatically optimize redis). The snapshot process especially would
> > run faster as it will read with THP performance.
>
> I would argue we should switch to 4k COW everywhere. But it requires some

We could do that if MADV_HUGEPAGE is not set for example.
So there would still be a way to force 2M COWs if something benefits from
them. For example, with binaries executed from tmpfs, one could want 2M
COWs on MAP_PRIVATE mappings to keep the whole executable in 2MB TLB
entries despite the memory loss. (Then there are also those libraries,
apparently not released, that load binaries into anonymous THP for the
same reason, with an even higher risk of wasting memory: unlike with
tmpfs, nothing can be shared if you run multiple copies of a large Go
binary or something similar.)

Certainly it would help whenever fork() is used for snapshotting, but
fork() doesn't look like the best possible mechanism for atomic
snapshots. It would be interesting to know which other common workloads
would benefit, i.e. workloads that, unlike fork()-for-snapshot, are
already as optimal as they can get.

> work on khugepaged side to be able to recover THP back after multiple 4k
> COW in the range. Currently khugepaged is not able to collapse PTE entries
> backed by compound page back to PMD.

Yes, this also answers Anthony's question about what should happen
afterwards to the 4k COWs on the doublemap. The thing is, by the time
khugepaged comes around, the child will hopefully already have quit, so
ideally khugepaged would understand that the anon page isn't even shared
anymore: once khugepaged holds the mmap_sem for writing, the page is
fully private to the process. If it's no longer shared and its mapcount
is 1, khugepaged doesn't need to do the 2M COW of the doublemap THP at
all; it just needs to flush the 4k fragments back into the THP, drop the
doublemap, and convert the read-only pte entries to a writable
pmd_trans_huge (if VM_WRITE is still set).

> I have this on my todo list for long time, but...

We're also slowly making progress on uffd-wp, which will hopefully offer
a far more efficient way to take the snapshot than fork(); then the
whole fork issue goes away, because there will be no fork.
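A toy model of the two policies discussed in this thread may help make them concrete: COW at 4k granularity after fork() unless the VMA was marked MADV_HUGEPAGE, and a khugepaged pass that restores the huge pmd in place when the double-mapped THP is no longer shared. This is a minimal sketch, not kernel code: every type and function below (`choose_cow_granularity`, `cow_copy_bytes`, `struct thp_view`, `khugepaged_recover`) is hypothetical, mapcount is simplified to a single per-THP count, the 4 KiB/2 MiB sizes assume x86-64, and the read-only pmd outcome when VM_WRITE is clear is an assumption.

```c
#include <stdbool.h>

/* Hypothetical toy model; none of these names are kernel APIs. */

enum cow_granularity { COW_4K, COW_2M };

/* Proposal 1 (hypothetical): after fork(), COW a THP at 4k granularity
 * (splitting the pmd only in the faulting mm) unless the VMA was marked
 * with MADV_HUGEPAGE, which keeps the 2M COW behaviour. */
static enum cow_granularity choose_cow_granularity(bool vma_madv_hugepage)
{
    return vma_madv_hugepage ? COW_2M : COW_4K;
}

/* Bytes copied by the first write to a shared page at each granularity,
 * assuming x86-64 base (4 KiB) and PMD (2 MiB) page sizes. */
static unsigned long cow_copy_bytes(enum cow_granularity g)
{
    return g == COW_2M ? 2UL * 1024 * 1024 : 4096UL;
}

enum recover_action {
    RECOVER_SKIP,        /* not double-mapped: nothing to recover */
    RECOVER_COPY_2M,     /* still shared: would need a full 2M COW */
    RECOVER_IN_PLACE_RO, /* private: merge 4k fragments, read-only pmd */
    RECOVER_IN_PLACE_RW, /* private: merge 4k fragments, writable pmd */
};

/* Simplified view of a THP as seen by the hypothetical khugepaged pass. */
struct thp_view {
    bool doublemapped; /* mapped by a pmd in one mm and by ptes in another */
    int mapcount;      /* number of mms still mapping the THP */
    bool vm_write;     /* VMA still has VM_WRITE set */
};

/* Proposal 2 (hypothetical): decision taken with mmap_sem held for
 * writing. A sole owner can copy its 4k COWed fragments back into the
 * THP and install a pmd_trans_huge mapping without any 2M copy. */
static enum recover_action khugepaged_recover(const struct thp_view *v)
{
    if (!v->doublemapped)
        return RECOVER_SKIP;
    if (v->mapcount > 1)
        return RECOVER_COPY_2M;
    return v->vm_write ? RECOVER_IN_PLACE_RW : RECOVER_IN_PLACE_RO;
}
```

With a single-byte write to a shared THP, the 2M policy copies 512 times more memory than the 4k one (2 MiB versus 4 KiB), which is the memory cost traded for keeping 2MB TLB coverage.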