Subject: Re: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO
From: David Hildenbrand
Organization: Red Hat GmbH
To: Liang Li
Cc: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
    Dan Williams, "Michael S. Tsirkin", Jason Wang, Dave Hansen,
    Michal Hocko, Liang Li, linux-mm, LKML,
    virtualization@lists.linux-foundation.org
Date: Tue, 5 Jan 2021 11:27:10 +0100
Message-ID: <85f16139-b499-dd02-f2bc-c3c42d57ccd8@redhat.com>
References: <96BB0656-F234-4634-853E-E2A747B6ECDB@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05.01.21 11:22, Liang Li wrote:
>>>> That's mostly already existing scheduling logic, no? (How many VMs
>>>> can I put onto a specific machine eventually?)
>>>
>>> It depends on how the scheduling component is designed. Yes, you can
>>> put 10 VMs with 4C8G (4 CPUs, 8G RAM) on one host and 20 VMs with
>>> 2C4G on another one. But if one type of them, e.g. 4C8G, is sold
>>> out, customers can't buy more 4C8G VMs even while some 2C4G VMs are
>>> still free; the resources reserved for those could otherwise have
>>> been provided as 4C8G VMs.
>>
>> 1. You can, just the startup time will be a little slower? E.g., grow
>> a pre-allocated 4G file to 8G.
>>
>> 2. Or let's be creative: teach QEMU to construct a single
>> RAMBlock/MemoryRegion out of multiple tmpfs files. Works as long as
>> you don't go crazy on different VM sizes / size differences.
>>
>> 3. In your example above, you can dynamically rebalance as VMs are
>> getting sold, to make sure you always have "big ones" lying around
>> that you can shrink on demand.
>>
> Yes, we can always come up with some ways to make things work, but it
> will make the developers of the upper-layer component crazy :)

I'd say that's life in upper layers: optimizing for special (!) use
cases. :)

>>> You must know there are a lot of functions in the kernel which could
>>> be done in userspace, e.g. some of the device emulations like the
>>> APIC, or the vhost-net backend, which has a userspace
>>> implementation. :)
>>> Bad or not depends on the benefits the solution brings.
>>> From the viewpoint of a userspace application, the kernel should
>>> provide a high-performance memory management service. That's why I
>>> think it should be done in the kernel.
>>
>> As I expressed a couple of times already, I don't see why using
>> hugetlbfs and implementing some sort of pre-zeroing there isn't
>> sufficient.
>
> Did I miss something before? I thought you doubted the need for
> hugetlbfs free-page pre-zeroing. Hugetlbfs is a good choice and is
> sufficient.

I remember even suggesting to focus on hugetlbfs when chatting during
your KVM talk. Maybe I was not clear before.

>> We really don't *want* complicated things deep down in the mm core if
>> there are reasonable alternatives.
>>
> I understand your concern: we should have a sufficient reason to add a
> new feature to the kernel. For this one, its main value is making the
> application's life easier, and implementing it in hugetlbfs can avoid
> adding more complexity to the core MM.

Exactly, that's my point. Some people might still disagree with the
hugetlbfs approach, but there it's easier to add tunables without
affecting the overall system.

-- 
Thanks,

David / dhildenb