Subject: Re: [PATCH 0/7] ACPI HMAT memory sysfs representation
From: Anshuman Khandual
To: Dave Hansen, Keith Busch, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org
Cc: Greg Kroah-Hartman, Rafael Wysocki, Dan Williams
Date: Fri, 23 Nov 2018 12:12:56 +0530
References: <20181114224902.12082-1-keith.busch@intel.com> <1ed406b2-b85f-8e02-1df0-7c39aa21eca9@arm.com> <4ea6e80f-80ba-6992-8aa0-5c2d88996af7@intel.com>
X-Mailing-List:
linux-kernel@vger.kernel.org

On 11/22/2018 11:31 PM, Dave Hansen wrote:
> On 11/22/18 3:52 AM, Anshuman Khandual wrote:
>>>
>>> It sounds like the subset that's being exposed is insufficient for you.
>>> We did that because we think doing anything but a subset in sysfs will
>>> just blow up sysfs: MAX_NUMNODES is as high as 1024, so if we have 4
>>> attributes, that's at _least_ 1024*1024*4 files if we expose *all*
>>> combinations.
>> Each permutation need not be a separate file inside all possible NODE X
>> (/sys/devices/system/node/nodeX) directories. It can be a top level file
>> enumerating various attribute values for a given (X, Y) node pair based
>> on an offset something like /proc/pid/pagemap.
>
> My assumption has been that this kind of thing is too fancy for sysfs:

Applications need to know the matrix of multi-attribute properties as
seen from the various memory accessors/initiators to be able to bind
themselves to the desired CPUs and memory. That gives applications a true
view of a heterogeneous system. I understand your concern here about
sysfs (which could probably be worked around with multiple global files
if size is a problem), but an insufficient interface is definitely
problematic in the longer term. This is going to be an ABI which is
locked in for good. Hence, even if it might appear to be over-engineering
at the moment, IMHO it is the right thing to do.

>
> Documentation/filesystems/sysfs.txt:
>> Attributes should be ASCII text files, preferably with only one value
>> per file. It is noted that it may not be efficient to contain only one
>> value per file, so it is socially acceptable to express an array of
>> values of the same type.
>>
>> Mixing types, expressing multiple lines of data, and doing fancy
>> formatting of data is heavily frowned upon. Doing these things may get
>> you publicly humiliated and your code rewritten without notice.
>
> /proc/pid/pagemap is binary, not one-value-per-file and relatively
> complicated to parse.

I agree, but it does provide user space with really valuable information
about the faulted pages in its VA space. Was there a better way of
getting it? Maybe, but at this point in time it is essential.

>
> Do you really think following something like pagemap is the right model
> for sysfs.
>
> BTW, I'm not saying we don't need *some* interface like you propose. We
> almost certainly will at some point. I just don't think it will be in
> sysfs.

I am not saying that doing this in sysfs is very elegant. I would rather
have a syscall read back a (MAX_NODES * MAX_NODES * u64) attribute matrix
from the kernel. A subset of that information could probably appear in
sysfs to speed up queries for various optimizations, as Keith mentioned
before. But we will first have to evaluate and agree on what constitutes
a comprehensive set of multi-attribute properties. Are we willing to go
in the direction of including a new system call, with a subset of the
information also appearing in sysfs, etc.? My primary concern is not how
the attribute information appears in sysfs but its lack of completeness.
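A rough sketch of the offset-based layout discussed above: one fixed-size
record of u64 attribute values per (initiator X, target Y) node pair,
stored row-major so a reader can seek straight to the pair it needs, much
as /proc/pid/pagemap is indexed by virtual page number. The layout, the
little-endian encoding, the number of attributes per pair and the helpers
matrix_bytes(), record_offset() and read_pair() are all hypothetical; no
such file or system call exists in the kernel. It also puts numbers on
the size argument: 1024 nodes with 4 attributes means 1024*1024*4 files
if every value gets its own sysfs file, versus one 32 MiB binary blob.

```python
import struct
from typing import Tuple

# Hypothetical layout (not a kernel ABI): a row-major matrix indexed by
# (initiator node x, target node y), holding nr_attrs unsigned 64-bit
# little-endian values for each node pair.
U64 = struct.Struct("<Q")


def matrix_bytes(nr_nodes: int, nr_attrs: int) -> int:
    """Total size in bytes of the hypothetical flat attribute matrix."""
    return nr_nodes * nr_nodes * nr_attrs * U64.size


def record_offset(x: int, y: int, nr_nodes: int, nr_attrs: int) -> int:
    """Byte offset of the record for initiator node x and target node y."""
    if not (0 <= x < nr_nodes and 0 <= y < nr_nodes):
        raise ValueError("node id out of range")
    return (x * nr_nodes + y) * nr_attrs * U64.size


def read_pair(buf: bytes, x: int, y: int, nr_nodes: int,
              nr_attrs: int) -> Tuple[int, ...]:
    """Decode the nr_attrs values stored for the (x, y) node pair."""
    off = record_offset(x, y, nr_nodes, nr_attrs)
    return struct.unpack_from(f"<{nr_attrs}Q", buf, off)


if __name__ == "__main__":
    nodes, attrs = 2, 4
    # Synthetic values for illustration only: 100*x + 10*y + attribute index.
    values = [100 * x + 10 * y + a
              for x in range(nodes) for y in range(nodes) for a in range(attrs)]
    buf = struct.pack(f"<{len(values)}Q", *values)
    assert len(buf) == matrix_bytes(nodes, attrs)
    print(read_pair(buf, 1, 0, nodes, attrs))  # (100, 101, 102, 103)
    # One-file-per-value count vs. flat-matrix size for 1024 nodes, 4 attributes.
    print(1024 * 1024 * 4, matrix_bytes(1024, 4))  # 4194304 33554432
```

With nr_attrs set to 1 this is the (MAX_NODES * MAX_NODES * u64) matrix
mentioned above, 8 MiB for 1024 nodes. Where such a blob should live
(sysfs, a new syscall, or neither) is the open question in this thread.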