Date: Wed, 11 Nov 2020 18:22:53 +0100
From: Peter Zijlstra
To: Matthew Wilcox
Cc: "Liang, Kan", Will Deacon, Michael Ellerman, mingo@redhat.com,
	acme@kernel.org, linux-kernel@vger.kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@redhat.com, eranian@google.com,
	ak@linux.intel.com, dave.hansen@intel.com, kirill.shutemov@linux.intel.com,
	benh@kernel.crashing.org, paulus@samba.org, David Miller, vbabka@suse.cz
Subject: Re: [PATCH V9 1/4] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
Message-ID: <20201111172253.GG2628@hirez.programming.kicks-ass.net>
References: <20201013154615.GE2594@hirez.programming.kicks-ass.net>
	<20201013163449.GR2651@hirez.programming.kicks-ass.net>
	<8e88ba79-7c40-ea32-a7ed-bdc4fc04b2af@linux.intel.com>
	<20201111095750.GS2594@hirez.programming.kicks-ass.net>
	<20201111112246.GR2651@hirez.programming.kicks-ass.net>
	<20201111124357.GS2651@hirez.programming.kicks-ass.net>
	<20201111153022.GT17076@casper.infradead.org>
	<20201111155724.GE2628@hirez.programming.kicks-ass.net>
	<20201111163848.GU17076@casper.infradead.org>
In-Reply-To: <20201111163848.GU17076@casper.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 11, 2020 at 04:38:48PM +0000, Matthew Wilcox wrote:
> On Wed, Nov 11, 2020 at 04:57:24PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 11, 2020 at 03:30:22PM +0000, Matthew Wilcox wrote:
> > > This confuses me. Why only special-case hugetlbfs pages here?
> > > Should they really be treated differently from THP? If you want to consider
> > > that we might be mapping a page that's twice as big as a PUD entry and
> > > this is only half of it, then the simple way is:
> > >
> > > 	if (pud_leaf(pud)) {
> > > #ifdef pud_page
> > > 		page = compound_head(pud_page(*pud));
> > > 		return page_size(page);
> >
> > Also; this is 'wrong'. The purpose of this function is to return the
> > hardware TLB size of a given address. The above will return the compound
> > size, for any random compound page, which would be myriads of reasons.
>
> Oh, then the whole thing is overly-complicated. This should just be
>
> 	if (pud_leaf(pud))
> 		return PUD_SIZE;

But that doesn't handle non-pagetable aligned hugetlb sizes. Granted,
that's unlikely at the PUD level, but why be inconsistent..

So we really want:

	if (p*d_leaf(p*d)) {
		if (!'special') {
			page = p*d_page(p*d);
			if (PageHuge(page))
				return page_size(compound_head(page));
		}
		return P*D_SIZE;
	}

That gets us:

  - regular page-table aligned large-pages
  - 'funny' hugetlb sizes

The only thing it doesn't get us is kernel usage of 'funny' sizes, which
is why that function is weak (arm64, powerpc32, sparc64 have funny sizes
and at the very least arm64 uses them for kernel mappings too).

Now, when you add !PMD THP sizes (presumably for architectures that have
'funny' sizes, otherwise what's the point), then you get to add
'|| PageTransHuge()' to the above PageHuge() (and fix PageTransHuge() to
actually do what it claims it does).

Arguably we could fix arm64 with something like the below, but then, I'd
have to audit powerpc32 and sparc64 again to see if I can make that work
for them too -- not today.

---

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7003,6 +7003,10 @@ static u64 perf_virt_to_phys(u64 virt)
 
 #ifdef CONFIG_MMU
 
+#ifndef pte_cont
+#define pte_cont(pte)	(false)
+#endif
+
 /*
  * Return the MMU page size of a given virtual address.
  *
@@ -7077,7 +7081,7 @@ __weak u64 arch_perf_get_page_size(struc
 	if (!pte_devmap(pte) && !pte_special(pte)) {
 		page = pte_page(pte);
-		if (PageHuge(page)) {
+		if (PageHuge(page) || pte_cont(pte)) {
 			u64 size = page_size(compound_head(page));
 			pte_unmap(ptep);
 			return size;
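
For illustration only, here is a small userspace model of the selection
rule from the 'So we really want' pseudo-code, extended with the
pte_cont() check from the patch. This is a sketch, not kernel code:
struct model_leaf, model_leaf_size() and the MODEL_* sizes are
hypothetical names invented for this example, and the sizes assume a 4K
base page with 2M PMD and 1G PUD entries, which is not true of every
architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes for a 4K-granule configuration; real values are per-arch. */
#define MODEL_PTE_SIZE (UINT64_C(1) << 12)
#define MODEL_PMD_SIZE (UINT64_C(1) << 21)
#define MODEL_PUD_SIZE (UINT64_C(1) << 30)

/* Hypothetical stand-in for a leaf entry plus the page it maps. */
struct model_leaf {
	uint64_t level_size;    /* P*D_SIZE of the level holding the entry */
	bool special;           /* models the 'special' case: no usable struct page */
	bool hugetlb;           /* models PageHuge(page) */
	bool cont;              /* models pte_cont(pte) from the patch */
	uint64_t compound_size; /* models page_size(compound_head(page)) */
};

/*
 * Return the modelled translation size for a leaf entry: the compound
 * size only for hugetlb (or contiguous-hint) mappings that are not
 * special, otherwise the size of the page-table level itself.
 */
static uint64_t model_leaf_size(const struct model_leaf *e)
{
	assert(e->level_size != 0);

	if (!e->special && (e->hugetlb || e->cont))
		return e->compound_size;

	return e->level_size;
}
```

The point of the model is the asymmetry: a PTE that happens to map part
of some random compound page still reports the PTE size, while a hugetlb
mapping with a 'funny' size reports the compound size.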