From: David Hildenbrand
Organization: Red Hat GmbH
To: Oscar Salvador
Cc: Muchun Song, corbet@lwn.net, mike.kravetz@oracle.com, tglx@linutronix.de,
    mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
    dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
    viro@zeniv.linux.org.uk, akpm@linux-foundation.org, paulmck@kernel.org,
    mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
    rdunlap@infradead.org, oneukum@suse.com, anshuman.khandual@arm.com,
    jroedel@suse.de, almasrymina@google.com, rientjes@google.com,
    willy@infradead.org, mhocko@suse.com, song.bao.hua@hisilicon.com,
    naoya.horiguchi@nec.com, duanxiongchun@bytedance.com,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v13 05/12] mm: hugetlb: allocate the vmemmap pages associated with each HugeTLB page
Date: Mon, 1 Feb 2021 16:50:55 +0100
Message-ID: <0f34d46b-cb42-0bbf-1d7e-0b4731bdb5e9@redhat.com>
In-Reply-To: <20210128222906.GA3826@localhost.localdomain>
References: <20210117151053.24600-1-songmuchun@bytedance.com>
 <20210117151053.24600-6-songmuchun@bytedance.com>
 <20210126092942.GA10602@linux>
 <6fe52a7e-ebd8-f5ce-1fcd-5ed6896d3797@redhat.com>
 <20210126145819.GB16870@linux>
 <259b9669-0515-01a2-d714-617011f87194@redhat.com>
 <20210126153448.GA17455@linux>
 <9475b139-1b33-76c7-ef5c-d43d2ea1dba5@redhat.com>
 <20210128222906.GA3826@localhost.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org
On 28.01.21 23:29, Oscar Salvador wrote:
> On Wed, Jan 27, 2021 at 11:36:15AM +0100, David Hildenbrand wrote:
>> Extending on that, I just discovered that only x86-64, ppc64, and arm64
>> really support hugepage migration.
>>
>> Maybe one approach with the "magic switch" really would be to disable
>> hugepage migration completely in hugepage_migration_supported(), and
>> consequently making hugepage_movable_supported() always return false.
> 
> Ok, so migration would not work for these pages, and since they would
> lie in !ZONE_MOVABLE there is no guarantee we can unplug that memory.
> Well, we really cannot unplug it unless the hugepage is not used
> (it can be dissolved at least).
> 
> Now to the allocation-when-freeing.
> The current implementation uses (or wants to use) GFP_ATOMIC plus a
> forever loop. One of the problems I see with GFP_ATOMIC is that it
> gives you access to memory reserves, but there are more users drawing
> on those reserves. In the worst case we need to allocate 16MB of
> order-0 pages to free up one 1GB hugepage, so the question is whether
> the reserves really scale to 16MB plus whatever the other users of the
> reserves need.
> 
> As I said, if anything I would go for an optimistic allocation-try;
> if we fail, just refuse to shrink the pool.
> The user can always try to shrink it again later via the /sys interface.
> 
> Since hugepages would no longer be in ZONE_MOVABLE/CMA and are not
> expected to be migratable, is that ok?
> 
> Using the hugepage itself for the vmemmap array was brought up several
> times, but that would imply fragmenting memory over time.
> 
> All in all it seems to be overly complicated (I might be wrong).
> 
>> Huge pages would never get placed onto ZONE_MOVABLE/CMA and cannot be
>> migrated. The problem I describe would apply (careful with using
>> ZONE_MOVABLE), but well, it can at least be documented.
> 
> I am not a page allocator expert, but can the allocation not fall back
> to ZONE_MOVABLE under memory shortage in the other zones?
No, for now that is not done: only movable allocations target ZONE_MOVABLE. Doing so would be controversial: when would be the right point in time to start spilling unmovable allocations into CMA/ZONE_MOVABLE? You certainly want to try other things first (swapping, reclaim, compaction) before breaking the guarantees regarding hotunplug+migration/compaction that you have with CMA/ZONE_MOVABLE. And even if you were to allow it, your workload would already be suffering badly by that point, so it smells more like a setup issue. But then, who knows when allocating huge pages (especially at runtime) that there are such side effects, before actually running into them?

We can make sure that all relevant archs support migration of ordinary (!gigantic) huge pages (for now, only x86-64, ppc64/spapr, arm64), so we can place them onto ZONE_MOVABLE. It gets harder with more special cases.

Gigantic pages (without CMA) are more of a general issue, but at least that is simple to document ("careful when pairing ZONE_MOVABLE with gigantic pages on !CMA"). An unexpectedly high amount of unmovable memory is just extremely difficult to handle with ZONE_MOVABLE; it is hard for the user/admin to figure out that such restrictions actually apply.

-- 
Thanks,

David / dhildenb