Subject: Re: [PATCH 0/7] ACPI HMAT memory sysfs representation
To: Dan Williams
Cc: Dave Hansen, Keith Busch, Linux Kernel Mailing List, Linux ACPI, Linux MM, Greg KH, "Rafael J. Wysocki"
References: <20181114224902.12082-1-keith.busch@intel.com> <1ed406b2-b85f-8e02-1df0-7c39aa21eca9@arm.com> <4ea6e80f-80ba-6992-8aa0-5c2d88996af7@intel.com>
From: Anshuman Khandual
Message-ID: <0194f47c-d1d8-108e-a57f-0316adb9112b@arm.com>
Date: Fri, 23 Nov 2018 12:40:53 +0530
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/22/2018 11:38 PM, Dan Williams wrote:
> On Thu, Nov 22, 2018 at 3:52 AM Anshuman Khandual
> wrote:
>>
>> On 11/19/2018 11:07 PM, Dave Hansen wrote:
>>> On 11/18/18 9:44 PM, Anshuman Khandual wrote:
>>>> IIUC NUMA re-work in principle involves these functional changes
>>>>
>>>> 1. Enumerating compute and memory nodes in heterogeneous environment (short/medium term)
>>>
>>> This patch set _does_ that, though.
>>>
>>>> 2. Enumerating memory node attributes as seen from the compute nodes (short/medium term)
>>>
>>> It does that as well (a subset at least).
>>>
>>> It sounds like the subset that's being exposed is insufficient for you.
>>> We did that because we think doing anything but a subset in sysfs will
>>> just blow up sysfs: MAX_NUMNODES is as high as 1024, so if we have 4
>>> attributes, that's at _least_ 1024*1024*4 files if we expose *all*
>>> combinations.
>> Each permutation need not be a separate file inside all possible NODE X
>> (/sys/devices/system/node/nodeX) directories. It can be a top level file
>> enumerating various attribute values for a given (X, Y) node pair based
>> on an offset, something like /proc/pid/pagemap.
>>
>>>
>>> Do we agree that sysfs is unsuitable for exposing attributes in this manner?
>>>
>>
>> Yes, for individual files.
>> But this can be worked around with an offset
>> based access from a top level global attributes file as mentioned above.
>> Is there any particular advantage of using individual files for each
>> given attribute? I was wondering whether a single unsigned long (u64)
>> could pack 8 different attributes, where each individual attribute
>> value can be abstracted out in 8 bits.
>
> sysfs has a 4K limit, and in general I don't think there is much
> incremental value to go describe the entirety of the system from sysfs
> or anywhere else in the kernel for that matter. It's simply too much
> information to reasonably consume. Instead the kernel can describe the

I agree that it may be a fair amount of information to parse, but it is
crucial for any task on a heterogeneous system to evaluate (and probably
re-evaluate, if the task moves around) its memory and CPU binding at
runtime to make sure it has the right one.

> coarse boundaries and some semblance of "best" access initiator for a
> given target. That should cover the "80%" case of what applications

The current proposal simply assumes that the best initiator is the
nearest one. That may be true for bandwidth and latency, but not
necessarily for other properties. This assumption should not be built
in while defining a new ABI.

> want to discover, for the other "20%" we likely need some userspace
> library that can go parse these platform specific information sources
> and supplement the kernel view. I also think a simpler kernel starting
> point gives us room to go pull in more commonly used attributes if it
> turns out they are useful, and avoid going down the path of exporting
> attributes that have questionable value in practice.
>

Applications can already query platform information today and use it
for mbind() without requiring this new interface; we are not even
changing any core MM yet. So if this is only about identifying a
node's memory properties, they can be read from the platform itself.

I agree that we would like the kernel to start adding interfaces for
multi-attribute memory; all I am saying is that the interface has to
be comprehensive. Some attributes are more useful now and some less,
but the new ABI has to accommodate exporting all of them.