From: Zi Yan
Date: Thu, 9 Mar 2017 17:46:16 -0600
To: Mel Gorman
Cc: David Nellans, Anshuman Khandual, Naoya Horiguchi
Subject: Re: [PATCH 0/6] Enable parallel page migration
Message-ID: <58C1E948.9020306@cs.rutgers.edu>
In-Reply-To: <20170309221522.hwk4wyaqx2jonru6@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Mel,

Thanks for pointing out the problems in this patchset.

This was my internship project at NVIDIA last summer. I only used
micro-benchmarks to demonstrate the large gap in memory bandwidth
utilization between base page migration and THP migration, and between
serialized and parallel page migration.

Here are cross-socket, serialized page migration results from calling
the move_pages() syscall:

In x86_64, an Intel two-socket E5-2640v3 box,
single 4KB base page migration takes 62.47 us, using 0.06 GB/s BW,
single 2MB THP migration takes 658.54 us, using 2.97 GB/s BW,
migrating 512 4KB base pages takes 1987.38 us, using 0.98 GB/s BW.

In ppc64, a two-socket Power8 box,
single 64KB base page migration takes 49.3 us, using 1.24 GB/s BW,
single 16MB THP migration takes 2202.17 us, using 7.10 GB/s BW,
migrating 256 64KB base pages takes 2543.65 us, using 6.14 GB/s BW.

THP migration is not slow at all when compared to a group of equivalent
base page migrations.

For 1-thread vs 8-thread THP migration:
In x86_64,
1-thread 2MB THP migration takes 658.54 us, using 2.97 GB/s BW,
8-thread 2MB THP migration takes 227.76 us, using 8.58 GB/s BW.
In ppc64,
1-thread 16MB THP migration takes 2202.17 us, using 7.10 GB/s BW,
8-thread 16MB THP migration takes 1223.87 us, using 12.77 GB/s BW.

This large increase in BW utilization is the motivation for this patchset.

>
> So the key potential issue here in my mind is that THP migration is too slow
> in some cases. What I object to is improving that using a high priority
> workqueue that potentially starves other CPUs and pollutes their cache
> which is generally very expensive.

I don't completely agree with this. Using a high priority workqueue makes
sure the page migration work is done as soon as possible. If the data copy
threads have to wait, we lose the speedup that parallel page migration
brings.

I understand your concern about the impact on CPU utilization. Checking
CPU utilization and only using idle CPUs could potentially avoid this
problem.

>
> Lets look at the core of what copy_huge_page does in mm/migrate.c which
> is the function that gets parallelised by the series in question. For
> a !HIGHMEM system, it's woefully inefficient. Historically, it was an
> implementation that would work generically which was fine but maybe not
> for future systems. It was also fine back when hugetlbfs was the only huge
> page implementation and COW operations were incredibly rare on the grounds
> due to the risk that they could terminate the process with prejudice.
>
> The function takes a huge page, splits it into PAGE_SIZE chunks, kmap_atomics
> the source and destination for each PAGE_SIZE chunk and copies it. The
> parallelised version does one kmap and copies it in chunks assuming the
> THP is fully mapped and accessible. Fundamentally, this is broken in the
> generic sense as the kmap is not guaranteed to make the whole page necessary
> but it happens to work on !highmem systems.
> What is more important to
> note is that it's multiple preempt and pagefault enables and disables
> on a per-page basis that happens 512 times (for THP on x86-64 at least),
> all of which are expensive operations depending on the kernel config and
> I suspect that the parallisation is actually masking that stupid overhead.

You are right about kmap. Making this patchset depend on !HIGHMEM should
avoid the problem. On a system with highmem, it might not make sense to
kmap potentially 512 base pages to migrate one THP.

>
> At the very least, I would have expected an initial attempt of one patch that
> optimised for !highmem systems to ignore kmap, simply disable preempt (if
> that is even necessary, I didn't check) and copy a pinned physical->physical
> page as a single copy without looping on a PAGE_SIZE basis and see how
> much that gained. Do it initially for THP only and worry about gigantic
> pages when or if that is a problem.

I can try this out and see how much it improves on the existing serialized
THP migration numbers shown above.

>
> That would be patch 1 of a series. Maybe that'll be enough, maybe not but
> I feel it's important to optimise the serialised case as much as possible
> before considering parallelisation to highlight and justify why it's
> necessary[1]. If nothing else, what if two CPUs both parallelise a migration
> at the same time and end up preempting each other? Between that and the
> workqueue setup, it's potentially much slower than an optimised serial copy.
>
> It would be tempting to experiment but the test case was not even included
> with the series (maybe it's somewhere else)[2]. While it's obvious how
> such a test case could be constructed, it feels unnecessary to construct
> it when it should be in the changelog.

Do you mean performing multiple parallel page migrations at the same time
and reporting the time each migration takes?
-- 
Best Regards,
Yan Zi