Subject: Re: [PATCH] mm: sort freelist by rank number
To: Cho KyongHo
Cc: David Hildenbrand, akpm@linux-foundation.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, hyesoo.yu@samsung.com,
    janghyuck.kim@samsung.com
References: <1596435031-41837-1-git-send-email-pullip.cho@samsung.com>
    <5f41af0f-4593-3441-12f4-5b0f7e6999ac@redhat.com>
    <20200804023548.GA186735@KEI>
From: Vlastimil Babka
Message-ID: <947a09ba-968b-4c4d-68bb-d13de9c885a1@suse.cz>
Date: Tue, 4 Aug 2020 11:12:55 +0200
In-Reply-To: <20200804023548.GA186735@KEI>

On 8/4/20 4:35 AM, Cho KyongHo wrote:
> On Mon, Aug 03, 2020 at 05:45:55PM +0200, Vlastimil Babka wrote:
>> On 8/3/20 9:57 AM, David Hildenbrand wrote:
>> > On 03.08.20 08:10, pullip.cho@samsung.com wrote:
>> >> From: Cho KyongHo
>> >>
>> >> LPDDR5 introduces rank switch delay. If three successive DRAM accesses
>> >> happen and the first two access one rank while the last one hits the
>> >> other rank, the latency of the last access will be longer than that of
>> >> the second one.
>> >> To address this penalty, we can sort the freelist so that a specific
>> >> rank is allocated prior to another rank. We expect that the page
>> >> allocator can then allocate pages from the same rank successively.
>> >> It will hopefully improve the proportion of consecutive memory
>> >> accesses to the same rank.
>> >
>> > This certainly needs performance numbers to justify it ... and I am
>> > sorry, "hopefully improve" is not a valid justification :)
>> >
>> > I can imagine that this works well initially, when there hasn't been a
>> > lot of memory fragmentation going on. But quickly after your system is
>> > under stress, I doubt this will be very useful. Prove me wrong. ;)
>>
>> Agreed. The implementation of __preferred_rank() seems to be very
>> simple and optimistic.
>
> DRAM rank is selected by CS bits from the DRAM controller. In most
> systems the CS bits are allocated to specific bit fields of the bus
> address. For example, if the CS bit is allocated to bit[16] of the bus
> (physical) address in a two-rank system, every 64KiB block with
> bit[16] = 1 is in rank 1 and the others are in rank 0.
> This patch is not beneficial to systems other than mobile devices with
> LPDDR5. That is why the default behavior of this patch is a no-op.

Hmm, the patch requires at least pageblock_nr_pages, which is 2MB on x86
(dunno about ARM), so 64KiB would be way too small. What are the actual
granularities then?

>> I think these systems could perhaps better behave as NUMA with
>> (interleaved) nodes for each rank; then you immediately have all the
>> mempolicies support etc. to achieve what you need? Of course there's
>> some cost as well, but not the cost of adding hacks to the page
>> allocator core.
>
> Thank you for the proposal. NUMA will be helpful to allocate pages from
> a specific rank programmatically. I will consider NUMA if rank affinity
> is also required.
> However, the page allocation overhead of this policy (page migration,
> reclamation, etc.) will give the users worse responsiveness. The intent
> of this patch is to reduce rank switch delay optimistically without
> hurting page allocation speed.

The problem is that without some control of page migration and reclaim,
the simple preference approach will not work after some uptime, as David
suggested. It will just mean that the preferred rank is allocated first,
then the non-preferred rank (Linux will fill all unused memory with page
cache if possible), then reclaim will free memory from both ranks without
any special care, and new allocations will thus come from both ranks.
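
For illustration, the rank decode Cho describes above amounts to
something like the following. This is a minimal sketch, not code from
the patch; the bit position, the helper name, and the two-rank
assumption are taken from his example:

#include <stdint.h>
#include <stdio.h>

#define CS_BIT	16	/* example: chip-select wired to PA bit 16 */

/* Rank alternates every 2^16 bytes = 64KiB of physical address space. */
static unsigned int phys_to_rank(uint64_t phys)
{
	return (phys >> CS_BIT) & 1;
}

int main(void)
{
	/* 64KiB stripes: prints "0 1 0" */
	printf("%u %u %u\n", phys_to_rank(0x00000),
	       phys_to_rank(0x10000), phys_to_rank(0x20000));

	/*
	 * Vlastimil's granularity point in numbers: an x86 pageblock is
	 * 2^9 pages * 4KiB = 2MiB, spanning 32 such 64KiB rank stripes,
	 * so a per-pageblock rank preference cannot express this layout.
	 */
	return 0;
}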
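
Similarly, a sketch of the NUMA alternative suggested above: if firmware
exposed each rank as its own node, userspace could already express the
same preference through the existing mempolicy API, with no page
allocator changes. The node-to-rank mapping below is an assumption for
illustration:

#include <numaif.h>	/* set_mempolicy(), MPOL_*; link with -lnuma */
#include <stdio.h>

int main(void)
{
	/* Assumption: rank 0 is exposed as NUMA node 0. */
	unsigned long nodemask = 1UL << 0;

	/* Prefer node (rank) 0; fall back to other nodes when full. */
	if (set_mempolicy(MPOL_PREFERRED, &nodemask,
			  sizeof(nodemask) * 8))
		perror("set_mempolicy");

	/* All further allocations by this task now prefer rank 0. */
	return 0;
}

The same effect is available without code, e.g. "numactl --preferred=0
<cmd>", or interleaving across both ranks with "numactl --interleave=all".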