Date: Mon, 18 Feb 2019 21:46:11 -0500
From: Andrea Arcangeli
To: Alexander Duyck
Cc: David Hildenbrand, "Michael S. Tsirkin", Nitesh Narayan Lal, kvm list, LKML, Paolo Bonzini, lcapitulino@redhat.com, pagupta@redhat.com, wei.w.wang@intel.com, Yang Zhang, Rik van Riel, dodgen@google.com, Konrad Rzeszutek Wilk, dhildenb@redhat.com
Subject: Re: [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting
Message-ID: <20190219024611.GC24502@redhat.com>
References: <20190204201854.2328-1-nitesh@redhat.com> <20190218114601-mutt-send-email-mst@kernel.org> <44740a29-bb14-e6e6-2992-98d0ae58e994@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

On Mon, Feb 18, 2019 at 03:47:22PM -0800, Alexander Duyck wrote:
> essentially fragmented them. I guess hugepaged went through and
> started trying to reassemble the huge pages and as a result there have
> been apps that ended up consuming more memory than they would have
> otherwise since they were using fragments of THP pages after doing an
> MADV_DONTNEED on sections of the page.

With relatively recent kernels MADV_DONTNEED doesn't necessarily free
anything when it's applied to a THP subpage: it only splits the
pagetables and queues the THP for deferred splitting. If there's memory
pressure a shrinker is invoked, the queue is scanned and the THPs are
physically split, but to be reassembled/collapsed after a physical
split the range needs at least one young pte.

If this is particularly problematic for page hinting, this behavior,
where the MADV_DONTNEED can be undone by khugepaged (if some subpage is
being frequently accessed), can be turned off by setting
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none to 0.
Then the THP will only be collapsed if all 512 subpages are mapped
(i.e. they've all been re-allocated by the guest).

Regardless of the max_ptes_none default, keeping the smaller guest buddy
orders as the last target for page hinting should be good for
performance.

> Yeah, no problem. The only thing I don't like about MADV_FREE is that
> you have to have memory pressure before the pages really start getting
> scrubbed with is both a benefit and a drawback. Basically it defers
> the freeing until you are under actual memory pressure so when you hit
> that case things start feeling much slower, that and it limits your
> allocations since the kernel doesn't recognize the pages as free until
> it would have to start trying to push memory to swap.

The guest allocation behavior should not be influenced by MADV_FREE vs
MADV_DONTNEED: the guest can't see the difference anyway, so why should
it limit the allocations?

The benefit of MADV_FREE should be that when the same guest frees and
reallocates a huge amount of RAM (i.e. a guest app allocating and
freeing lots of RAM in a loop, not so uncommon), there will be no KVM
page fault during guest re-allocations. So in the absence of memory
pressure in the host it should be a major win. Overall it sounds like a
good tradeoff compared to MADV_DONTNEED, which forcefully invokes MMU
notifiers and forces host allocations and KVM page faults in order to
reallocate the same RAM in the same guest.

When there's memory pressure it's up to the host Linux VM to notice
there's plenty of MADV_FREE material to free at zero I/O cost before
starting swapping.

Thanks,
Andrea
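
PS: to make the max_ptes_none point above concrete, here is a rough toy
model of the collapse rule as described in this mail. It is only a
sketch and not the kernel code: the function name may_collapse() and
its parameters are made up for illustration, and 512 subpages per THP
assumes 4k pages with 2M THPs (x86-64).

```python
# Toy model of the khugepaged collapse decision described above.
# Assumption: 512 subpages per THP (4 KiB pages, 2 MiB THP, x86-64).
HPAGE_PMD_NR = 512


def may_collapse(mapped_ptes: int, young_ptes: int, max_ptes_none: int) -> bool:
    """Return True if this simplified model would allow a collapse.

    mapped_ptes   -- subpages currently mapped in the 2 MiB range
    young_ptes    -- mapped subpages that were recently accessed
    max_ptes_none -- value of khugepaged/max_ptes_none
    (hypothetical helper, the real kernel check has more conditions)
    """
    if not 0 <= mapped_ptes <= HPAGE_PMD_NR:
        raise ValueError("mapped_ptes must be between 0 and 512")
    if not 0 <= young_ptes <= mapped_ptes:
        raise ValueError("young_ptes must be between 0 and mapped_ptes")

    # Unmapped subpages, e.g. the ones zapped by MADV_DONTNEED.
    none_ptes = HPAGE_PMD_NR - mapped_ptes

    # Too many holes for the configured tolerance: leave the range split.
    if none_ptes > max_ptes_none:
        return False

    # At least one subpage must have been accessed recently.
    return young_ptes >= 1


if __name__ == "__main__":
    # 12 subpages were hinted away, 3 of the remaining ones are young.
    for limit in (511, 0):
        print(f"max_ptes_none={limit}: collapse={may_collapse(500, 3, limit)}")
```

With a permissive max_ptes_none the range with 12 holes is still a
collapse candidate, so the MADV_DONTNEED can be undone; with
max_ptes_none set to 0 it stays split until the guest has touched all
512 subpages again.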