Date: Thu, 2 Sep 2021 10:18:26 +0200
From: Christoph Hellwig
To: Felix Kuehling
Cc: Christoph Hellwig, "Sierra Guiza, Alejandro (Alex)",
	akpm@linux-foundation.org, linux-mm@kvack.org, rcampbell@nvidia.com,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	jgg@nvidia.com, jglisse@redhat.com, Dan Williams
Subject: Re: [PATCH v1 03/14] mm: add iomem vma selection for memory migration
Message-ID: <20210902081826.GA16283@lst.de>
References: <20210825034828.12927-1-alex.sierra@amd.com>
	<20210825034828.12927-4-alex.sierra@amd.com>
	<20210825074602.GA29620@lst.de>
	<20210830082800.GA6836@lst.de>
	<20210901082925.GA21961@lst.de>
	<11d64457-9d61-f82d-6c98-d68762dce85d@amd.com>
In-Reply-To: <11d64457-9d61-f82d-6c98-d68762dce85d@amd.com>

On Wed, Sep 01, 2021 at 11:40:43AM -0400, Felix Kuehling wrote:
> >>> It looks like I'm totally misunderstanding what you are adding here
> >>> then.  Why do we need any special treatment at all for memory that
> >>> has normal struct pages and is part of the direct kernel map?
> >> The pages are like normal memory for purposes of mapping them in CPU
> >> page tables and for coherent access from the CPU.
> > That's the user page tables.  What about the kernel direct map?
> > If there is a normal kernel struct page backing there really should
> > be no need for the pgmap.
> I'm not sure. The physical address ranges are in the UEFI system address
> map as special-purpose memory. Does Linux create the struct pages and
> kernel direct map for that without a pgmap call? I didn't see that last
> time I went digging through that code.

So some googling turns up a patch from Dan that claims to hand EFI
special-purpose memory to the device dax driver.  But when I try to
follow the version that got merged, it looks like it is treated simply
as an MMIO region to be claimed by drivers, which would not get a
struct page.

Dan, did I misunderstand how E820_TYPE_SOFT_RESERVED works?

> >> From an application
> >> perspective, we want file-backed and anonymous mappings to be able to
> >> use DEVICE_PUBLIC pages with coherent CPU access. The goal is to
> >> optimize performance for GPU-heavy workloads while minimizing the need
> >> to migrate data back and forth between system memory and device memory.
> > I don't really understand that part.  File-backed pages are always
> > allocated by the file system using the pagecache helpers, that is
> > using the page allocator.  Anonymous memory also always comes from
> > the page allocator.
> I'm coming at this from my experience with DEVICE_PRIVATE. Both
> anonymous and file-backed pages should be migratable to DEVICE_PRIVATE
> memory by the migrate_vma_* helpers for more efficient access by our
> GPU. (*) It's part of the basic premise of HMM as I understand it. I
> would expect the same thing to work for DEVICE_PUBLIC memory.

Ok, so you want to migrate to and from them, not use DEVICE_PUBLIC for
the actual page cache pages.  That makes a lot more sense.

> I see DEVICE_PUBLIC as an improved version of DEVICE_PRIVATE that allows
> the CPU to map the device memory coherently to minimize the need for
> migrations when CPU and GPU access the same memory concurrently or
> alternatingly. But we're not going as far as putting that memory
> entirely under the management of the Linux memory manager and VM
> subsystem. Our (and HPE's) system architects decided that this memory is
> not suitable to be used like regular NUMA system memory by the Linux
> memory manager.

So yes, it is a memory-mapped I/O region, which unlike the PCIe BARs
that people typically deal with is fully cache coherent.  I think that
does make more sense as a description.

But to go back to what started this discussion: if these are memory
mapped I/O, pfn_valid should generally not return true for them.  And
as you already pointed out in reply to Alex, we need to tighten the
selection criteria one way or another.
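
To make "tighten the selection criteria" a bit more concrete: the
decision should key off the page's pgmap type, not off pfn_valid.
Just a sketch; a coherent DEVICE_PUBLIC pgmap type and a matching
MIGRATE_VMA_SELECT_ flag are what this series would have to add,
nothing upstream has them today:

static bool my_page_selectable(struct page *page, unsigned long flags)
{
	if (is_zone_device_page(page)) {
		switch (page->pgmap->type) {
		case MEMORY_DEVICE_PRIVATE:
			return flags & MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
		/*
		 * A coherent iomem / DEVICE_PUBLIC pgmap type would get
		 * its own case and its own explicit MIGRATE_VMA_SELECT_
		 * flag here, instead of piggybacking on pfn checks.
		 */
		default:
			return false;
		}
	}
	/* plain system RAM */
	return flags & MIGRATE_VMA_SELECT_SYSTEM;
}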
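
And for completeness, the migrate_vma_* flow Felix refers to looks
roughly like this from the driver side.  Totally untested sketch;
my_alloc_device_page() and my_copy_to_device() are made-up driver hooks
standing in for the pgmap page allocator and the DMA copy of the
payload, everything else is the stock API:

#include <linux/migrate.h>
#include <linux/mm.h>

static int my_migrate_range_to_device(struct vm_area_struct *vma,
				      unsigned long start, unsigned long end)
{
	unsigned long npages = (end - start) >> PAGE_SHIFT;
	struct migrate_vma args = {
		.vma	= vma,
		.start	= start,
		.end	= end,
		.flags	= MIGRATE_VMA_SELECT_SYSTEM,	/* sources: normal RAM */
	};
	unsigned long i;
	int ret = -ENOMEM;

	args.src = kvcalloc(npages, sizeof(*args.src), GFP_KERNEL);
	args.dst = kvcalloc(npages, sizeof(*args.dst), GFP_KERNEL);
	if (!args.src || !args.dst)
		goto out;

	ret = migrate_vma_setup(&args);		/* isolate + unmap sources */
	if (ret)
		goto out;

	for (i = 0; i < args.npages; i++) {
		struct page *dpage;

		if (!(args.src[i] & MIGRATE_PFN_MIGRATE))
			continue;		/* couldn't be isolated */
		dpage = my_alloc_device_page();	/* ZONE_DEVICE page from our pgmap */
		if (!dpage)
			continue;		/* leave the source page in place */
		lock_page(dpage);
		my_copy_to_device(dpage, args.src[i]);	/* DMA the data over */
		args.dst[i] = migrate_pfn(page_to_pfn(dpage)) |
			      MIGRATE_PFN_LOCKED;
	}

	migrate_vma_pages(&args);	/* install the new pages */
	migrate_vma_finalize(&args);	/* unlock and put the old ones */
out:
	kvfree(args.src);
	kvfree(args.dst);
	return ret;
}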