Date: Tue, 19 Dec 2017 11:21:24 -0600 (CST)
From: Christopher Lameter
To: Michal Hocko
Cc: Kemi Wang, Greg Kroah-Hartman, Andrew Morton, Vlastimil Babka, Mel Gorman, Johannes Weiner, YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov, Pavel Tatashin, David Rientjes, Sebastian Andrzej Siewior, Dave, Andi Kleen, Tim Chen, Jesper Dangaard Brouer, Ying Huang, Aaron Lu, Aubrey Li, Linux MM, Linux Kernel
Subject: Re: [PATCH v2 2/5] mm: Extends local cpu counter vm_diff_nodestat from s8 to s16
In-Reply-To: <20171219162029.GD2787@dhcp22.suse.cz>
References: <1513665566-4465-1-git-send-email-kemi.wang@intel.com> <1513665566-4465-3-git-send-email-kemi.wang@intel.com> <20171219162029.GD2787@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 19 Dec 2017, Michal Hocko wrote:

> > Well the reason for s8 was to keep the data structures small so that they
> > fit in the higher level cpu caches. The larger these structures become the
> > more cachelines are used by the counters and the larger the performance
> > influence on the code that should not be impacted by the overhead.
>
> I am not sure I understand. We usually do not access more counters in
> a single code path (well, PGALLOC and NUMA counters are more of an
> exception). So it is rarely an advantage that the whole array is in the
> same cache line. Besides that, this is allocated by the percpu allocator,
> which aligns to the type size rather than to cache lines AFAICS.

I thought we were talking about NUMA counters here?

Regardless: a typical fault, system call or other OS action will access
multiple zone and node counters when allocating or freeing memory.
Enlarging the fields will increase the number of cachelines touched.

> Maybe it was all different back then when the code was added, but
> arguing about cache lines seems to be a bit problematic here. Maybe
> you have some specific workloads which can prove me wrong?

Run a workload that does some page faults? Heavy allocation and freeing
of memory?

Maybe that is no longer relevant because the number of counters is now
so large, and the accesses so sparse, that each action pulls in a whole
cacheline anyway. That would be something we tried to avoid when
implementing the differentials.