From: Chris Metcalf
To: Andy Lutomirski, Paul McKenney
CC: Gilad Ben Yossef, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
    Andrew Morton, Rik van Riel, Tejun Heo, Frederic Weisbecker,
    Thomas Gleixner, Christoph Lameter, Viresh Kumar,
    "linux-doc@vger.kernel.org", Linux API,
    "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v4 1/5] nohz_full: add support for "cpu_isolated" mode
Date: Fri, 24 Jul 2015 16:22:07 -0400
Message-ID: <55B29E6F.7020600@ezchip.com>
References: <1436817481-8732-1-git-send-email-cmetcalf@ezchip.com>
    <1436817481-8732-2-git-send-email-cmetcalf@ezchip.com>
    <55A4271B.9040506@ezchip.com> <55AE993E.6040501@ezchip.com>

On 07/21/2015 03:26 PM, Andy Lutomirski wrote:
> On Tue, Jul 21, 2015 at 12:10 PM, Chris Metcalf wrote:
>> So just for the sake of precision, the thing I'm talking about
>> is the lru_add_drain() call on kernel exit.  Are you proposing
>> that we call that for every nohz_full core on kernel exit?
>> I'm not opposed to this, but I don't know if other nohz
>> developers feel like this is the right tradeoff.
>
> I'm proposing either that we do that or that we arrange for other cpus
> to be able to steal our LRU list while we're in RCU user/idle.

That seems challenging; there is a lot that has to be done in
lru_add_drain() and we may not want to do it for the "soft
isolation" mode Frederic alludes to in a later email.  And, we
would have to add a bunch of locking to allow another process
to steal the list from under us, so that's not obviously going
to be a performance win in terms of the per-cpu page cache for
normal operations.

Perhaps there could be a lock taken that nohz_full processes have
to take just to exit from userspace, and that other tasks could
take to do things on behalf of the nohz_full process that it
thinks it can do locklessly.
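
As a rough illustration of that lock idea, here is a small userspace
model in C.  It is only a sketch under stated assumptions: struct
isolated_cpu, isolated_cpu_init(), enter_kernel(), exit_kernel(),
local_lru_add() and remote_drain() are hypothetical names invented for
this example, not kernel interfaces, and a C11 atomic_flag stands in
for whatever lock the kernel would actually use.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical userspace model, not kernel code.  "pending_pages" stands
 * in for the per-cpu LRU state that lru_add_drain() would flush.
 */
struct isolated_cpu {
	atomic_flag lock;	/* held by the isolated task while in the
				 * kernel, or briefly by a remote drainer */
	int pending_pages;	/* only touched with the lock held */
};

void isolated_cpu_init(struct isolated_cpu *c)
{
	atomic_flag_clear(&c->lock);
	c->pending_pages = 0;
}

/* Isolated task leaves userspace: take the lock before using per-cpu state. */
void enter_kernel(struct isolated_cpu *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* a remote drain is in progress; wait for it */
}

/* Isolated task returns to userspace: others may now act on its behalf. */
void exit_kernel(struct isolated_cpu *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Called by the isolated task between enter_kernel() and exit_kernel(). */
void local_lru_add(struct isolated_cpu *c)
{
	assert(c->pending_pages >= 0);
	c->pending_pages++;
}

/*
 * Called by another task.  Returns the number of pages drained, or -1 if
 * the lock is held (typically because the isolated task is in the kernel).
 */
int remote_drain(struct isolated_cpu *c)
{
	int n;

	if (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		return -1;
	n = c->pending_pages;
	c->pending_pages = 0;
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
	return n;
}
```

In this model the isolated task pays for an atomic operation on every
kernel entry and every return to userspace, and a remote_drain() that
returns -1 simply leaves the work to the task itself.
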
It gets complicated, since you'd want to tie that to whether the
nohz_full process was currently in the kernel or not, so some kind
of atomic update on the context_tracking state or some such, perhaps.
Still not really clear if that overhead is worth it (both from a
maintenance point of view and the possible performance hit).

Limiting it just to the hard isolation mode seems like a good
answer since there we really know that userspace does not care
about the performance implications of kernel/userspace transitions,
and it doesn't cause slowdowns to anyone else.

For now I will bundle it in with my respin as part of the
"hard isolation" mode Frederic proposed.

>> Well, in principle if we accepted my proposed patch series
>> and then over time came to decide that it was reasonable
>> for nohz_full to have these complete cpu isolation
>> semantics, the one proposed ABI simply becomes a no-op.
>> So it's not as problematic an ABI as some.
>
> What if we made it a debugfs thing instead of a prctl?  Have a mode
> where the system tries really hard to quiesce itself even at the cost
> of performance.

No, since it's really a mode within an individual task that you'd
like to switch on and off depending on what the task is trying to
do - strict mode while it's running its main fast-path userspace
code, but certainly not strict mode during its setup, and possibly
leaving strict mode to run some kinds of slow-path, diagnostic,
or error-handling code.

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com