From: Dan Williams
Date: Wed, 10 Oct 2018 11:18:49 -0700
Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
To: Michal Hocko
Cc: alexander.h.duyck@linux.intel.com, Linux MM, Andrew Morton, Linux Kernel Mailing List, linux-nvdimm, Pasha Tatashin, Dave Hansen, Jérôme Glisse, rppt@linux.vnet.ibm.com, Ingo Molnar, "Kirill A. Shutemov", Zhang Yi

On Wed, Oct 10, 2018 at 10:30 AM Michal Hocko wrote:
>
> On Wed 10-10-18 09:39:08, Alexander Duyck wrote:
> > On 10/10/2018 2:58 AM, Michal Hocko wrote:
> > > On Tue 09-10-18 13:26:41, Alexander Duyck wrote:
> > > [...]
> > > > I would think with that being the case we still probably need the call to
> > > > __SetPageReserved to set the bit with the expectation that it will not be
> > > > cleared for device pages since the pages are not onlined. Removing the call
> > > > to __SetPageReserved would probably introduce a number of regressions as
> > > > there are multiple spots that use the reserved bit to determine if a page
> > > > can be swapped out to disk, mapped as system memory, or migrated.
> > >
> > > PageReserved is meant to tell any potential pfn walkers that might get
> > > to this struct page to back off and not touch it. Even though
> > > ZONE_DEVICE doesn't online pages in the traditional sense, it makes those
> > > pages available for further use, so the page reserved bit should be
> > > cleared.
> >
> > So from what I can tell that isn't necessarily the case. Specifically, if the
> > pagemap type is MEMORY_DEVICE_PRIVATE or MEMORY_DEVICE_PUBLIC, both are
> > special cases where the memory may not be accessible to the CPU or cannot be
> > pinned in order to allow for eviction.
>
> Could you give me an example please?
>
> > The specific case that Dan and Yi are referring to is for the type
> > MEMORY_DEVICE_FS_DAX. For that type I could probably look at not setting the
> > reserved bit. Part of me wants to say that we should wait and clear the bit
> > later, but that would end up just adding time back to initialization. At
> > this point I would consider the change more of a follow-up optimization
> > rather than a fix, though, since this is tailoring things specifically for DAX
> > versus the other ZONE_DEVICE types.
>
> I thought I had already made it clear that these zone device hacks are
> not acceptable to the generic hotplug code. If the current reserve bit
> handling is not correct, then give us a specific reason for that and we
> can start thinking about the proper fix.
>
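The disagreement quoted above hinges on what the Reserved bit means to code that walks pfns. As a hedged illustration only, here is a simplified userspace Python model (not kernel code; `Page`, `PageFlags`, and `walkable_pfns` are hypothetical names) of a walker that treats Reserved as a "back off" signal, which is the semantics Michal describes and which Alexander says other paths rely on when deciding whether a page can be swapped, mapped, or migrated.

```python
from dataclasses import dataclass
from enum import IntFlag


class PageFlags(IntFlag):
    """Hypothetical subset of page flags; only Reserved is modeled."""
    NONE = 0
    RESERVED = 1


@dataclass
class Page:
    """Hypothetical stand-in for struct page (flags only)."""
    flags: PageFlags = PageFlags.NONE

    def set_reserved(self) -> None:
        # Models the intent of __SetPageReserved(): mark the page off-limits.
        self.flags |= PageFlags.RESERVED

    def is_reserved(self) -> bool:
        # Models the intent of PageReserved().
        return bool(self.flags & PageFlags.RESERVED)


def walkable_pfns(memmap: list[Page]) -> list[int]:
    """Model of a generic pfn walker (e.g. a migration-style scanner).

    It backs off from Reserved pages and returns the pfns it would consider.
    """
    return [pfn for pfn, page in enumerate(memmap) if not page.is_reserved()]


if __name__ == "__main__":
    memmap = [Page() for _ in range(8)]
    # Pretend pfns 4..7 back a device range initialized with the Reserved bit.
    for page in memmap[4:]:
        page.set_reserved()
    print(walkable_pfns(memmap))  # [0, 1, 2, 3]
```

In this model, clearing the bit for a device range (Michal's position) simply makes those pfns visible to every such walker, which is why Alexander worries about regressions in consumers that assume device pages stay Reserved.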
Right, so we're in a situation where a hack is needed for KVM's current
interpretation of the Reserved flag relative to DAX-mapped pages. I'm
arguing to push that knowledge / handling as deep as possible into the
core rather than hack the leaf implementations like KVM, i.e., disable the
Reserved flag for all non-MEMORY_DEVICE_* ZONE_DEVICE types.

Here is the KVM thread about why they need a change:

https://lkml.org/lkml/2018/9/7/552

...and where I pushed back on a KVM-local hack:

https://lkml.org/lkml/2018/9/19/154
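The argument above is about where the type-specific decision lives: in core memmap initialization rather than in leaf consumers such as KVM. The Python model below is a sketch of that layering under stated assumptions, not the patch under discussion. `DevPagemap`, `init_zone_device_memmap`, and `leaf_treats_as_reserved` are hypothetical names, and keeping Reserved only for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_PUBLIC is an assumption drawn from Alexander's remarks, not a settled policy.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class MemoryType(Enum):
    """ZONE_DEVICE pgmap types named in this thread (values are arbitrary here)."""
    MEMORY_DEVICE_PRIVATE = auto()
    MEMORY_DEVICE_PUBLIC = auto()
    MEMORY_DEVICE_FS_DAX = auto()


# Assumption: only the CPU-inaccessible / evictable types keep Reserved.
TYPES_KEEPING_RESERVED = frozenset({
    MemoryType.MEMORY_DEVICE_PRIVATE,
    MemoryType.MEMORY_DEVICE_PUBLIC,
})


@dataclass
class Page:
    """Hypothetical stand-in for struct page."""
    reserved: bool = False


@dataclass
class DevPagemap:
    """Hypothetical stand-in for struct dev_pagemap."""
    type: MemoryType
    nr_pages: int
    pages: list[Page] = field(default_factory=list)


def init_zone_device_memmap(pgmap: DevPagemap) -> None:
    """Core-side init: the per-type Reserved decision is made once, here."""
    keep_reserved = pgmap.type in TYPES_KEEPING_RESERVED
    pgmap.pages = [Page(reserved=keep_reserved) for _ in range(pgmap.nr_pages)]


def leaf_treats_as_reserved(page: Page) -> bool:
    """A leaf consumer (KVM-like) only consults the flag; no type-specific hack."""
    return page.reserved


if __name__ == "__main__":
    for mtype in MemoryType:
        pgmap = DevPagemap(type=mtype, nr_pages=2)
        init_zone_device_memmap(pgmap)
        print(mtype.name, leaf_treats_as_reserved(pgmap.pages[0]))
```

The design point is that the leaf check stays generic: changing which types keep the bit only touches `TYPES_KEEPING_RESERVED` in the core, and no consumer needs its own DAX special case.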