Subject: Re: [PATCH 1/3] mm, numa: rework do_pages_move
From: Anshuman Khandual
Date: Fri, 5 Jan 2018 09:22:22 +0530
To: Michal Hocko, Andrew Morton
Cc: Zi Yan, Naoya Horiguchi, "Kirill A. Shutemov", Vlastimil Babka, Andrea Reale, Anshuman Khandual, linux-mm@kvack.org, LKML, Michal Hocko

On 01/03/2018 01:55 PM, Michal Hocko wrote:
> From: Michal Hocko
>
> do_pages_move is supposed to move user defined memory (an array of
> addresses) to the user defined numa nodes (an array of nodes one for
> each address).
> The user provided status array then contains the resulting
> numa node for each address or an error. The semantic of this function is
> a little bit confusing because only some errors are reported back. Notably
> migrate_pages error is only reported via the return value. This patch
> doesn't try to address these semantic nuances but rather changes the
> underlying implementation.
>
> Currently we are processing user input (which can be really large)
> in batches which are stored in a temporarily allocated page. Each
> address is resolved to its struct page and stored in a page_to_node
> structure along with the requested target numa node. The array of these
> structures is then conveyed down the page migration path via the private
> argument. new_page_node then finds the corresponding structure and
> allocates the proper target page.
>
> What is the problem with the current implementation and why change
> it? Apart from being quite ugly it also doesn't cope with unexpected
> pages showing up on the migration list inside migrate_pages path.
> That doesn't happen currently but the follow up patch would like to
> make the thp migration code clearer and that would need to split a
> THP into the list for some cases.
>
> How does the new implementation work? Well, instead of batching into a
> fixed size array we simply batch all pages that should be migrated to
> the same node and isolate all of them into a linked list which doesn't
> require any additional storage. This should work reasonably well because
> page migration usually migrates larger ranges of memory to a specific
> node. So the common case should work equally well as the current
> implementation. Even if somebody constructs an input where the target
> numa nodes would be interleaved we shouldn't see a large performance
> impact because page migration alone doesn't really benefit from
> batching.
> mmap_sem batching for the lookup is quite questionable and
> isolate_lru_page which would benefit from batching is not using it even
> in the current implementation.

Hi Michal,

After slightly modifying your test case (fixing the page size for
powerpc and doing a simple migration from node 0 to node 8 instead of
interleaving), I measured the migration time with and without the
patches on mainline. It's interesting:

                                        10000 pages   100000 pages
                                        -----------   ------------
Mainline                                     165 ms        1674 ms
Mainline + first patch (move_pages)          191 ms        1952 ms
Mainline + all three patches                 146 ms        1469 ms

Though the series as a whole improves performance, migration somehow
slows down after the first patch alone. Will look into this further.
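The per-node batching described in the quoted patch can be modelled in a few lines of userspace C. This is a minimal sketch, not the patch's code: it assumes a batch is a run of consecutive entries sharing a target node (which is what the remark about interleaved inputs implies), and struct move_req, struct batch and batch_by_node() are hypothetical names. Where the kernel would isolate each batch onto a list and hand it to migrate_pages(), the sketch only records the batch boundaries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one move_pages() request entry: the user
 * address to move and the NUMA node it should end up on. */
struct move_req {
	unsigned long addr;
	int node;
};

/* Hypothetical record of one batch: entries [start, end) go to node. */
struct batch {
	size_t start;
	size_t end;
	int node;
};

/*
 * Group consecutive requests that share a target node into batches,
 * closing the current batch whenever the node changes or the input
 * ends. Writes at most max_out batches to out and returns how many
 * batches were needed. No per-entry side array is allocated; the
 * kernel patch instead isolates each batch onto a linked list.
 */
static size_t batch_by_node(const struct move_req *req, size_t nr,
			    struct batch *out, size_t max_out)
{
	size_t nbatches = 0;
	size_t start = 0;

	assert(req != NULL || nr == 0);
	for (size_t i = 1; i <= nr; i++) {
		/* req[i] is only read when i < nr (short-circuit). */
		if (i == nr || req[i].node != req[start].node) {
			if (nbatches < max_out) {
				out[nbatches].start = start;
				out[nbatches].end = i;
				out[nbatches].node = req[start].node;
			}
			nbatches++;
			start = i;
		}
	}
	return nbatches;
}
```

With three requests all targeting node 8 the sketch produces a single batch; with targets 0, 8, 0, 0 it produces three. The latter is the interleaved case where, per the quoted reasoning, batching gains little anyway.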
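To put the timings in relative terms, the sketch below computes each patched result as a percentage change from mainline. The helper pct_change() and the array names are illustrative only; the numbers are the ones reported in the table.

```c
#include <assert.h>

/* Timings (ms) from the table, for the 10000 and 100000 page runs. */
static const double mainline_ms[]  = { 165.0, 1674.0 };
static const double first_ms[]     = { 191.0, 1952.0 };
static const double all_three_ms[] = { 146.0, 1469.0 };

/*
 * Relative change of a patched run versus the mainline baseline, in
 * percent: positive means the patched kernel took longer (slower),
 * negative means it was faster.
 */
static double pct_change(double baseline_ms, double patched_ms)
{
	assert(baseline_ms > 0.0);
	return (patched_ms - baseline_ms) * 100.0 / baseline_ms;
}
```

For the 10000 and 100000 page runs this gives roughly +15.8% and +16.6% with only the first patch, and roughly -11.5% and -12.2% with all three patches applied, so the slowdown from the first patch is similar at both sizes.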