From: Pavel Tatashin
Date: Wed, 27 Jan 2021 15:43:18 -0500
Subject: dax alignment problem on arm64 (and other architectures)
To: linux-mm, LKML, Sasha Levin, Tyler Hicks, Andrew Morton, Dan Williams, David Hildenbrand, Michal Hocko, Oscar Salvador, Vlastimil Babka, Joonsoo Kim, Jason Gunthorpe, Marc Zyngier, Linux ARM, Will Deacon, James Morse, James Morris
X-Mailing-List: linux-kernel@vger.kernel.org

This is something that Dan Williams and I discussed off the mailing list some time ago, but I want to have a broader discussion about this problem so that I can send out a fix that would be acceptable.

We have a 2G pmem device that is carved out of regular memory and that we use to pass data across reboots. After the machine is rebooted, we hotplug that memory back so that we do not lose 2G of system memory (the machine is small, only 8G of RAM total). In order to hotplug pmem memory, it must first be converted to devdax. Devdax has a label, 2M in size, that is placed at the beginning of the pmem device memory, and this is what causes the problem.

The section size is the hotplugging unit on Linux: whatever gets hot-plugged or hot-removed must be section-size aligned. On x86 the section size is 128M; on arm64 it is 1G (because arm64 supports 64K pages, and 128M does not work with 64K pages).
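To make the alignment rule concrete, here is a minimal Python sketch of the arithmetic (not kernel code). It assumes a simplified model in which only whole sections lying entirely inside the data region, after the 2M label, can be hot-plugged. The function name `hotpluggable_bytes` and the 4G base address are hypothetical and chosen only for illustration.

```python
MiB = 1 << 20
GiB = 1 << 30


def hotpluggable_bytes(start, size, label, section):
    """Bytes of [start + label, start + size) that fall in whole sections.

    Simplified model (an assumption, not kernel code): the data region is
    shrunk inward to section boundaries on both ends.
    """
    data_start = start + label
    data_end = start + size
    first = -(-data_start // section) * section  # round start up
    last = (data_end // section) * section       # round end down
    return max(0, last - first)


# Hypothetical 2G pmem region carved out at a 1G-aligned address (4G here).
base, size, label = 4 * GiB, 2 * GiB, 2 * MiB
for name, section in (("128M sections", 128 * MiB), ("1G sections", 1 * GiB)):
    usable = hotpluggable_bytes(base, size, label, section)
    lost = (size - label) - usable
    print(f"{name}: usable {usable // MiB}M, lost {lost // MiB}M")
```

Under this model, the 2M label pushes the start of the data just past a section boundary, so the whole first section is dropped: 126M is lost with 128M sections and 1022M with 1G sections.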
Because the first 2M are subtracted from the pmem device to create devdax, the actual hot-pluggable memory is not 1G/128M aligned. The whole first section is skipped when the memory gets hot-plugged because of the 2M label, so we lose 126M on x86 or 1022M on arm64.

As a workaround, so that we do not lose 1022M out of 8G of memory on arm64, we have reduced the section size to 128M using this patch [1]. This way we are losing 126M (which I still hate!).

I would like to get rid of this workaround: first, because I would like us to switch to 64K pages to gain performance, and second, so that we do not depend on an unofficial patch, which has already given us some headaches with kdump support.

Here are some solutions that I think we can do:

1. Instead of carving the memory at a 1G-aligned address, do it at a 1G - 2M address, so that when devdax is created it is perfectly 1G aligned. On arm64 this causes a panic because there is a 2M hole in memory. Even if the panic is fixed, I do not think this is a proper fix; it is simply a workaround for the underlying problem.

2. Dan Williams introduced subsections [2]. They, however, do not work with devdax, or with hot-plugging in general. Those patches take care of the __add_pages() side of things, and not add_memory(). Also, it is unclear what kind of user interface changes need to be made in order to enable subsection features to online/offline pages.

3. Allow hot-plugging daxdev together with the label, but teach the kernel not to touch the label (i.e. not to allocate its memory). IMO, this is a kind of ugly solution, because when devdax is hot-plugged it is not even aware of the label size. But perhaps that can be changed.

4. Other ideas? (Move the dax label to the end? A special case without a label? A label outside of the data?)

Thank you,
Pasha

[1] https://lore.kernel.org/lkml/20190423203843.2898-1-pasha.tatashin@soleen.com
[2] https://lore.kernel.org/lkml/156092349300.979959.17603710711957735135.stgit@dwillia2-desk3.amr.corp.intel.com
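
As a rough check of option 1, the Python sketch below shifts the carve-out start down by the label size so that the data begins on a section boundary. It is not kernel code and assumes a simplified model in which only whole sections inside the data region are hot-pluggable. The helper `hotpluggable_bytes`, the 4G base address, and the variant that grows the carve-out by 2M are hypothetical illustrations, not something proposed in the thread.

```python
MiB = 1 << 20
GiB = 1 << 30


def hotpluggable_bytes(start, size, label, section):
    """Whole-section bytes inside [start + label, start + size).

    Simplified model (an assumption, not kernel code).
    """
    data_start = start + label
    data_end = start + size
    first = -(-data_start // section) * section  # round start up
    last = (data_end // section) * section       # round end down
    return max(0, last - first)


section, label = 1 * GiB, 2 * MiB
aligned = 4 * GiB  # hypothetical 1G-aligned address

# Option 1: same 2G carve-out, start moved down by the label size.
shifted = hotpluggable_bytes(aligned - label, 2 * GiB, label, section)

# Hypothetical variant: also grow the carve-out by the label size,
# so that the end of the data stays on a section boundary.
grown = hotpluggable_bytes(aligned - label, 2 * GiB + label, label, section)

print(f"shifted: {shifted // MiB}M, shifted and grown: {grown // MiB}M")
```

In this model, shifting alone aligns the start of the data but leaves its end 2M short of a section boundary, so the carve-out size matters as well as its start address.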