Subject: Re: [PATCH v4 2/5] nohz: support PR_CPU_ISOLATED_STRICT mode
To: Andy Lutomirski
References: <1436817481-8732-1-git-send-email-cmetcalf@ezchip.com>
 <1436817481-8732-3-git-send-email-cmetcalf@ezchip.com>
 <55AE9EAC.4010202@ezchip.com>
CC: Gilad Ben Yossef, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
 Andrew Morton, Rik van Riel, Tejun Heo, Frederic Weisbecker,
 Thomas Gleixner, "Paul E. McKenney", Christoph Lameter, Viresh Kumar,
 Catalin Marinas, Will Deacon, "linux-doc@vger.kernel.org", Linux API,
 "linux-kernel@vger.kernel.org"
From: Chris Metcalf
Message-ID: <55B2A03F.4070009@ezchip.com>
Date: Fri, 24 Jul 2015 16:29:51 -0400
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/21/2015 03:42 PM, Andy Lutomirski wrote:
> On Tue, Jul 21, 2015 at 12:34 PM, Chris Metcalf wrote:
>> Second, you suggest a tracepoint. I'm OK with creating a tracepoint
>> dedicated to cpu_isolated strict failures and making that the only
>> way this mechanism works. But, earlier community feedback seemed to
>> suggest that the signal mechanism was OK; one piece of feedback
>> just requested being able to set which signal was delivered. Do you
>> think the signal idea is a bad one? Are you proposing potentially
>> having a signal and/or a tracepoint?
> I prefer the tracepoint. It's friendlier to debuggers, and it's
> really about diagnosing a kernel problem, not a userspace problem.
> Also, I really doubt that people should deploy a signal thing in
> production.
> What if an NMI fires and kills their realtime program?

No, this piece of the patch series is about diagnosing bugs in the
userspace program (likely in third-party code, in our customers'
experience). When you violate strict mode, you get a signal and
you have a nice pointer to what instruction it was that caused
you to enter the kernel.

You are right that running this in production is likely not a great
idea, as is true for other debugging mechanisms. But you might really
want to have it as a signal with a signal handler that fires to generate
a trace of some kind into the application's existing tracing mechanisms,
so the app doesn't just report "wow, I lost a bunch of time in here
somewhere, sorry about those packets I dropped on the floor", but
"here's where I took a strict signal". You probably drop a few
additional packets due to the signal handling and logging, but given
you've already fallen away from 100% in this case, the extra
diagnostics are almost certainly worth it.

In this case it's probably not as helpful to have a tracepoint-based
solution, just because you really do want to be able to easily
integrate into the app's existing logging framework.

My sense, I think, is that we can easily add tracepoints to the strict
failure code in the future, so it may not be worth trying to widen
the scope of the patch series just now.

>> Last, you mention systemwide configuration for monitoring. Can you
>> expand on what you mean by that? We already support the monitoring
>> only on the nohz_full cores, so to that extent it's already systemwide.
>> And the per-task flag has to be set by the running process when it's
>> ready for this state, so that can't really be systemwide configuration.
>> I don't understand your suggestion on this point.
> I'm really thinking about systemwide configuration for isolation. I
> think we'll always (at least in the nearish term) need the admin's
> help to set up isolated CPUs.
> If the admin makes a whole CPU be
> isolated, then monitoring just that CPU and monitoring it all the time
> seems sensible. If we really do think that isolating a CPU should
> require a syscall of some sort because it's too expensive otherwise,
> then we can do it that way, too. And if full isolation requires some
> user help (e.g. don't do certain things that break isolation), then
> having a per-task monitoring flag seems reasonable.
>
> We may always need the user's help to avoid IPIs. For example, if one
> thread calls munmap, the other thread is going to get an IPI. There's
> nothing we can do about that.

I think we're mostly agreed on this stuff, though your use of
"monitored" doesn't really match the "strict" mode in this patch.
It's certainly true that, for example, we advise customers not to
run the slow-path code on a housekeeping cpu as a thread in the
same process space as the fast-path code on the nohz_full cores,
just because things like fclose() on a file descriptor will lead to
free() which can lead to munmap() and an IPI to the fast path.

>> I'm certainly OK with rebasing on top of 4.3 after the context
>> tracking stuff is better. That said, I think it makes sense to continue
>> to debate the intent of the patch series even if we pull this one
>> patch out and defer it until after 4.3, or having it end up pulled
>> into some other repo that includes the improvements and
>> is being pulled for 4.3.
> Sure, no problem.

I will add a comment to the patch and a note to the series about this,
but for now I'll keep it in the series. If we can arrange to pull it
into Frederic's tree after the context_tracking changes, we can respin
it at that point to layer it on top.
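
To make the signal-handler idea from earlier in this message a bit more
concrete, here is a minimal sketch of a handler that records each
delivery into an application-side log. Everything in it is an assumption
for illustration: SIGUSR1 merely stands in for whatever signal strict
mode would deliver, the names strict_log, strict_handler and
install_strict_logger are hypothetical, and the prctl() call that would
enable strict mode is left out because it exists only in this patch
series.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the application's existing trace buffer. */
#define STRICT_LOG_SLOTS 16

struct strict_event {
    int signo;  /* signal that was delivered */
    int code;   /* si_code reported with the signal */
};

static struct strict_event strict_log[STRICT_LOG_SLOTS];
static volatile sig_atomic_t strict_events;

/*
 * Only async-signal-safe work happens here: plain stores into a static
 * buffer and a write(2) to stderr. A real handler could also pull the
 * interrupted program counter out of the ucontext argument; that is
 * architecture-specific and omitted from this sketch.
 */
static void strict_handler(int signo, siginfo_t *info, void *uctx)
{
    static const char msg[] = "strict-mode signal taken\n";
    int slot = strict_events % STRICT_LOG_SLOTS;
    ssize_t ignored;

    (void)uctx;
    strict_log[slot].signo = signo;
    strict_log[slot].code = info->si_code;
    strict_events = strict_events + 1;

    ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
}

/* Install the logging handler for the given signal; returns 0 on success. */
int install_strict_logger(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = strict_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

An application would call install_strict_logger() once during startup,
before entering its fast path, and drain strict_log from its normal
logging code outside the handler.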

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com