Subject: Re: [PATCH 1/2] mm: Make alloc_contig_range handle free hugetlb pages
To: Michal Hocko
Cc: Oscar Salvador, Andrew Morton, Mike Kravetz, Muchun Song,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20210217100816.28860-1-osalvador@suse.de>
 <20210217100816.28860-2-osalvador@suse.de>
 <182f6a4a-6f95-9911-7730-8718ab72ece2@redhat.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Message-ID: <5f50c810-3f49-a162-6d1d-cf621c515f45@redhat.com>
Date: Wed, 17 Feb 2021 14:53:37 +0100

On 17.02.21 14:50, Michal Hocko wrote:
> On Wed 17-02-21 14:36:47, David Hildenbrand wrote:
>> On 17.02.21 14:30, Michal Hocko wrote:
>>> On Wed 17-02-21 11:08:15, Oscar Salvador wrote:
>>>> Free hugetlb pages are tricky to handle so that no userspace application
>>>> notices disruption: we need to replace the current free hugepage with
>>>> a new one.
>>>>
>>>> In order to do that, a new function called alloc_and_dissolve_huge_page
>>>> is introduced.
>>>> This function will first try to get a new fresh hugetlb page, and if it
>>>> succeeds, it will dissolve the old one.
>>>>
>>>> With regard to the allocation, since we do not know whether the old page
>>>> was allocated on a specific node on request, the node the old page belongs
>>>> to will be tried first, and then we will fall back to all nodes containing
>>>> memory (N_MEMORY).
>>>
>>> I do not think fallback to a different zone is ok. If yes then this
>>> really requires a very good reasoning. alloc_contig_range is an
>>> optimistic allocation interface at best and it shouldn't break carefully
>>> node-aware preallocation done by the administrator.
>>
>> What does memory offlining do when migrating in-use hugetlbfs pages? Does
>> it always keep the node?
>
> No, it will break the node pool. The reasoning behind that is that
> offlining is an explicit request from the userspace and it is expected

Userspace? In 99.9996% of cases it's the hardware that triggers the unplug
of a DIMM.

>
>> I think keeping the node is the easiest/simplest approach for now.
>>
>>>
>>>> Note that gigantic hugetlb pages are fenced off since there is a cyclic
>>>> dependency between them and alloc_contig_range.
>>>
>>> Why do we need/want to do all this in the first place?
>>
>> cma and virtio-mem (especially on ZONE_MOVABLE) really want to handle
>> hugetlbfs pages.
>
> Do we have any real-life examples? Or does this fall more into the "let's
> optimize an existing implementation" category?

It's a big TODO item I have on my list and I am happy that Oscar is looking
into it. So yes, I noticed it while working on virtio-mem. It's real.

-- 
Thanks,

David / dhildenb
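
For readers following the thread without the patch in front of them, the
alloc-and-dissolve flow described in the quoted changelog could be sketched
roughly as below. This is a minimal illustration, not Oscar's actual patch:
it assumes the function lives in mm/hugetlb.c (so it can reuse existing
static helpers such as alloc_fresh_huge_page(), alongside exported ones like
dissolve_free_huge_page()), it omits the fallback to other N_MEMORY nodes
that is being debated above, and it glosses over the hugetlb_lock and race
handling a real implementation needs.

	/*
	 * Illustrative sketch only, not the patch under discussion.
	 * Assumed to live in mm/hugetlb.c so it can reuse existing helpers.
	 */
	static int alloc_and_dissolve_huge_page(struct page *old_page)
	{
		struct hstate *h = page_hstate(old_page);
		gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_NOWARN;
		int nid = page_to_nid(old_page);
		struct page *new_page;

		/*
		 * Gigantic pages are fenced off: allocating one would itself
		 * go through alloc_contig_range(), a cyclic dependency.
		 */
		if (hstate_is_gigantic(h))
			return -ENOMEM;

		/*
		 * Prefer the node of the old page; whether to also fall back
		 * to other nodes is exactly what is debated in this thread.
		 */
		new_page = alloc_fresh_huge_page(h, gfp_mask, nid, NULL, NULL);
		if (!new_page)
			return -ENOMEM;

		/* Drop our reference so the fresh page enters the free pool. */
		put_page(new_page);

		/* Now dissolve the old free hugetlb page. */
		return dissolve_free_huge_page(old_page);
	}

With this shape, alloc_contig_range() (used by CMA and virtio-mem) could call
such a helper when it stumbles over a free hugetlb page in the range it wants
to isolate, replacing the page elsewhere instead of failing the request.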