From: David Hildenbrand
Organization: Red Hat GmbH
To: Oscar Salvador
Cc: Muchun Song, corbet@lwn.net, mike.kravetz@oracle.com, tglx@linutronix.de,
    mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
    dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
    viro@zeniv.linux.org.uk, akpm@linux-foundation.org, paulmck@kernel.org,
    mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
    rdunlap@infradead.org, oneukum@suse.com, anshuman.khandual@arm.com,
    jroedel@suse.de, almasrymina@google.com, rientjes@google.com,
    willy@infradead.org, mhocko@suse.com, song.bao.hua@hisilicon.com,
    naoya.horiguchi@nec.com, duanxiongchun@bytedance.com,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v13 05/12] mm: hugetlb: allocate the vmemmap pages associated with each HugeTLB page
Date: Tue, 26 Jan 2021 16:56:05 +0100
Message-ID: <9475b139-1b33-76c7-ef5c-d43d2ea1dba5@redhat.com>
In-Reply-To: <20210126153448.GA17455@linux>
References: <20210117151053.24600-1-songmuchun@bytedance.com>
 <20210117151053.24600-6-songmuchun@bytedance.com>
 <20210126092942.GA10602@linux>
 <6fe52a7e-ebd8-f5ce-1fcd-5ed6896d3797@redhat.com>
 <20210126145819.GB16870@linux>
 <259b9669-0515-01a2-d714-617011f87194@redhat.com>
 <20210126153448.GA17455@linux>

On 26.01.21 16:34, Oscar Salvador wrote:
> On Tue, Jan 26, 2021 at 04:10:53PM +0100, David
Hildenbrand wrote:
>> The real issue seems to be discarding the vmemmap on any memory that has
>> movability constraints - CMA and ZONE_MOVABLE; otherwise, as discussed, we
>> can reuse parts of the thingy we're freeing for the vmemmap. Not that it
>> would be ideal: that once-a-huge-page thing will never ever be a huge page
>> again - but if it helps with OOM in corner cases, sure.
>
> Yes, that is one way, but I am not sure how hard it would be to implement.
> Plus the fact that, as you pointed out, once that memory is used for the
> vmemmap array, we cannot use it again.
> Actually, we would fragment the memory eventually?
>
>> Possible simplification: don't perform the optimization for now with free
>> huge pages residing on ZONE_MOVABLE or CMA. Certainly not perfect: what
>> happens when migrating a huge page from ZONE_NORMAL to (ZONE_MOVABLE|CMA)?
>
> But if we do not allow those pages to be in ZONE_MOVABLE or CMA, there is
> no point in migrating them, right?

Well, memory unplug "could" still work and migrate them, and
alloc_contig_range() "could in the future" still want to migrate them
(virtio-mem, gigantic pages, powernv memtrace). Especially, the latter two
don't work with ZONE_MOVABLE/CMA.

But, I mean, it would be fair enough to say "there are no guarantees for
alloc_contig_range()/offline_pages() with ZONE_NORMAL, so we can break these
use cases when a magic switch is flipped and make these pages
non-migratable".

I assume compaction doesn't care about huge pages either way; not sure about
NUMA balancing etc.

However, note that there is a fundamental issue with any approach that
allocates a significant amount of unmovable memory for user-space purposes
(excluding CMA allocations for unmovable stuff; CMA is special): pairing it
with ZONE_MOVABLE becomes very tricky, as your user space might just end up
eating all kernel memory, even though the system still looks like there is
plenty of free memory residing in ZONE_MOVABLE.
I mentioned that in the context of secretmem in a reduced form as well.

We theoretically have that issue with dynamic allocation of gigantic pages,
but it's something a user explicitly/rarely triggers, and it can be
documented to cause problems well enough.

We'll have the same issue with GUP+ZONE_MOVABLE that Pavel is fixing right
now - but GUP is already known to be broken in various ways, and known to
need special treatment. I'd like to limit the nasty corner cases.

Of course, we could have smart rules like "don't online memory to
ZONE_MOVABLE automatically when the magic switch is active". That's just
ugly, but it could work.

--
Thanks,

David / dhildenb