To: Pavel Tatashin
Cc: linux-mm, LKML, Sasha Levin, Tyler Hicks, Andrew Morton, Dan Williams, Michal Hocko, Oscar Salvador, Vlastimil Babka, Joonsoo Kim, Jason Gunthorpe, Marc Zyngier, Linux ARM, Will Deacon, James Morse, James Morris
References: <8c2b75fe-a3e5-8eff-7f37-5d23c7ad9742@redhat.com> <94797c92-cd90-8a65-b879-0bb5f12b9fc5@redhat.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Subject: Re: dax alignment problem on arm64 (and other achitectures)
Date: Thu, 28 Jan 2021 16:03:07 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

>> One common issue is that firmware can often allocate from available
>> system RAM and/or modify/initialize it. I assume you're running some
>> custom firmware :)
>
> We have a special firmware that does not touch the last 2G of physical
> memory for its allocations :)
>

Fancy :)

[...]

>> Personally, I think the future is 4k, especially for smaller machines.
>> (also, imagine right now how many 512MB THP you can actually use in your
>> 8GB VM ..., simply not suitable for small machines).
>
> Um, this is not really about 512MB THP. Yes, this is a smaller machine,
> but performance is very important to us. Boot budget for the kernel is
> under half a second. With 64K we save 0.2s: 0.35s vs 0.55s. This is
> because fewer struct pages need to be initialized. Also, fewer TLB
> misses, and 3-level page tables add up as performance benefits.
>
> For larger servers 64K pages make total sense: less memory is wasted as
> metadata.

Yes, indeed, for very large servers it might make sense in that regard.
However, once we can eventually free the vmemmap of hugetlbfs pages,
things could change; assuming user space will be consuming huge pages
(which large machines better be doing ... databases, hypervisors ... ).

Also, some hypervisors try allocating the memmap completely ... but I
consider that rather a special case.

Personally, I consider being able to use THP/huge pages more important
than having 64k base pages and saving some TLB space there. Also, with
64k you have other drawbacks: for example, each stack, each TLS for
threads in applications suddenly consumes 16 times more memory as
"minimum".

Optimizing boot time/memmap initialization further is certainly an
interesting topic.

Anyhow, you know your use case best, just sharing my thoughts :)

[...]

>>>
>>> Right, but I do not think it is possible to do for dax devices (as of
>>> right now). I assume it contains information about what kind of
>>> device it is: devdax, fsdax, sector, uuid etc.
>>> See the namespaces table in [1]. It contains a summary of pmem device
>>> types, and which of them have a label (all except for raw).
>>
>> Interesting, I wonder if the label is really required to get this
>> special use case running. I mean, all you want is to have dax/kmem
>> expose the whole thing as system RAM. You don't want to lose even 2MB if
>> it's just for the sake of unnecessary metadata - this is not a real
>> device, it's "fake" already.
>
> Hm, wouldn't it essentially mean allowing memory hot-plug for raw
> pmem devices? Something like create mmap, and hot-add raw pmem?

Theoretically yes, but I have no idea if that would make sense for real
"raw pmem" as well. Hope some of the pmem/nvdimm experts can clarify
what's possible and what's not :)

-- 
Thanks,

David / dhildenb
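
For reference, the page-size trade-offs discussed above can be put into
rough numbers. The sketch below compares 4K and 64K base pages for an 8GB
machine: number of struct pages to initialize, memmap size, PMD-sized THP
size, and how many such THPs fit. The 64-byte struct page and the 8-byte
page-table entry are assumptions (typical for 64-bit kernels, but
config-dependent), not figures from this thread, and page_stats is a
hypothetical helper name used only for illustration.

```python
# Back-of-the-envelope page-size arithmetic; a sketch, not kernel code.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

STRUCT_PAGE_BYTES = 64  # ASSUMPTION: typical 64-bit struct page size
PTE_BYTES = 8           # ASSUMPTION: 64-bit page-table entries


def page_stats(ram_bytes, base_page_bytes):
    """Return rough memmap/THP figures for a given base page size."""
    num_pages = ram_bytes // base_page_bytes
    # One page-table page holds base_page/PTE_BYTES entries, so a single
    # PMD entry (one PMD-sized THP) covers that many base pages.
    entries_per_table = base_page_bytes // PTE_BYTES
    pmd_bytes = entries_per_table * base_page_bytes
    return {
        "struct_pages": num_pages,
        "memmap_bytes": num_pages * STRUCT_PAGE_BYTES,
        "pmd_thp_bytes": pmd_bytes,
        "max_thp": ram_bytes // pmd_bytes,
    }


if __name__ == "__main__":
    ram = 8 * GIB
    for base in (4 * KIB, 64 * KIB):
        s = page_stats(ram, base)
        print(
            f"{base // KIB:>2}K base pages: "
            f"{s['struct_pages']} struct pages, "
            f"memmap {s['memmap_bytes'] // MIB} MiB, "
            f"THP {s['pmd_thp_bytes'] // MIB} MiB, "
            f"at most {s['max_thp']} THPs"
        )
    # Minimum granularity of a stack/TLS mapping grows with the base page.
    print("minimum mapping size ratio:", (64 * KIB) // (4 * KIB))
```

Under these assumptions, 64K base pages cut the number of struct pages
(and the memmap) by a factor of 16, which is consistent with the boot-time
argument, while the PMD-sized THP grows from 2MB to 512MB, so an 8GB
machine can hold at most 16 of them. The same factor of 16 is the minimum
per-stack/per-TLS overhead mentioned above.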