Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751547AbdHaObk (ORCPT ); Thu, 31 Aug 2017 10:31:40 -0400
Received: from mx2.suse.de ([195.135.220.15]:50143 "EHLO mx1.suse.de"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1751086AbdHaObj (ORCPT ); Thu, 31 Aug 2017 10:31:39 -0400
Subject: Re: [PATCH 1/5] tracing, mm: Record pfn instead of pointer to struct page
To: Steven Rostedt
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel@vger.kernel.org,
        Namhyung Kim, David Ahern, Jiri Olsa, Minchan Kim, Peter Zijlstra,
        linux-mm@kvack.org
References: <1428963302-31538-1-git-send-email-acme@kernel.org>
        <1428963302-31538-2-git-send-email-acme@kernel.org>
        <897eb045-d63c-b9e3-c6e7-0f6b94536c0f@suse.cz>
        <20170831094306.0fb655a5@gandalf.local.home>
From: Vlastimil Babka
Message-ID:
Date: Thu, 31 Aug 2017 16:31:36 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0
MIME-Version: 1.0
In-Reply-To: <20170831094306.0fb655a5@gandalf.local.home>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3077
Lines: 84

On 08/31/2017 03:43 PM, Steven Rostedt wrote:
> On Mon, 31 Jul 2017 09:43:41 +0200
> Vlastimil Babka wrote:
>
>> On 04/14/2015 12:14 AM, Arnaldo Carvalho de Melo wrote:
>>> From: Namhyung Kim
>>>
>>> The struct page is opaque for userspace tools, so it'd be better to save
>>> pfn in order to identify page frames.
>>>
>>> The textual output of $debugfs/tracing/trace file remains unchanged and
>>> only raw (binary) data format is changed - but thanks to libtraceevent,
>>> userspace tools which deal with the raw data (like perf and trace-cmd)
>>> can parse the format easily.
>>
>> Hmm it seems trace-cmd doesn't work that well, at least on current
>> x86_64 kernel where I noticed it:
>>
>>   trace-cmd-22020 [003] 105219.542610: mm_page_alloc: [FAILED TO PARSE] pfn=0x165cb4 order=0 gfp_flags=29491274 migratetype=1
>
> Which version of trace-cmd failed? It parses for me. Hmm, the
> vmemmap_base isn't in the event format file. It's the actual address.
> That's probably what failed to parse.

Mine says 2.6. With 4.13-rc6 I get FAILED TO PARSE.

>
>>
>> I'm quite sure it's due to the "page=%p" part, which uses pfn_to_page().
>> The events/kmem/mm_page_alloc/format file contains this for page:
>>
>>   REC->pfn != -1UL ? (((struct page *)vmemmap_base) + (REC->pfn)) : ((void *)0)
>
> But yeah, I think the output is wrong. I just ran this:
>
>   page=0xffffea00000a62f4 pfn=680692 order=0 migratetype=0 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK
>
> But running it with trace-cmd report -R (raw format):
>
>   mm_page_alloc: pfn=0xa62f4 order=0 gfp_flags=24150208 migratetype=0
>
> The parser currently ignores types, so it doesn't do pointer
> arithmetic correctly, and it would be hard to do here as it doesn't know
> the size of the struct page. What could work is if we changed the printf
> fmt to be:
>
>   (unsigned long)(0xffffea0000000000UL) + (REC->pfn * sizeof(struct page))
>
>
>>
>> I think userspace can't know vmemmap_base nor the implied sizeof(struct
>> page) for pointer arithmetic?
>>
>> On older 4.4-based kernel:
>>
>>   REC->pfn != -1UL ? (((struct page *)(0xffffea0000000000UL)) + (REC->pfn)) : ((void *)0)
>
> This is what I have on 4.13-rc7
>
>>
>> This also fails to parse, so it must be the struct page part?
>
> Again, what version of trace-cmd do you have?

On the older distro it was 2.0.4

>
>>
>> I think the problem is, even if we solve this with some more
>> preprocessor trickery to make the format file contain only constant
>> numbers, pfn_to_page() on e.g.
>> sparse memory model without vmemmap is
>> more complicated than simple arithmetic, and can't be exported in the
>> format file.
>>
>> I'm afraid that to support userspace parsing of the trace data, we will
>> have to store both struct page and pfn... or perhaps give up on reporting
>> the struct page pointer completely. Thoughts?
>
> Had some thoughts up above.

Yeah, it could be made to work for some configurations, but see the part
about "sparse memory model without vmemmap" above.

> -- Steve
>
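
For illustration only, the arithmetic problem discussed above can be
sketched in a few lines of Python. This is a minimal sketch under stated
assumptions, not what libtraceevent or the kernel actually runs: the
function names are hypothetical, and sizeof(struct page) is assumed to be
64 bytes (a common x86_64 value that is not stated anywhere in this
thread). The base address and pfn are the ones quoted above.

```python
# Illustrative sketch only. Names are hypothetical, not a trace-cmd API.
VMEMMAP_BASE = 0xffffea0000000000  # constant quoted in the format string above
SIZEOF_STRUCT_PAGE = 64            # ASSUMPTION: typical x86_64 size, not from the thread


def page_addr_untyped(pfn, base=VMEMMAP_BASE):
    """What a parser that ignores types computes: base + pfn, unscaled."""
    return base + pfn


def page_addr_scaled(pfn, base=VMEMMAP_BASE, size=SIZEOF_STRUCT_PAGE):
    """What C pointer arithmetic on 'struct page *' computes: base + pfn * size."""
    return base + pfn * size


pfn = 0xa62f4  # 680692 decimal, the pfn from the report above
print(hex(page_addr_untyped(pfn)))  # 0xffffea00000a62f4
print(hex(page_addr_scaled(pfn)))   # 0xffffea000298bd00
```

The unscaled result matches the page= value printed in the report above,
which is consistent with the parser adding the pfn to the base without
multiplying by the structure size. The scaled result only holds for the
vmemmap case and only if the assumed size is right; it says nothing about
sparse memory model without vmemmap.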