From: Alok Kataria
To: "jacob.jun.pan@linux.intel.com", "rui.zhang@intel.com"
CC: "linux-kernel@vger.kernel.org"
Subject: Regression in intel_powerclamp, due to cpu whitelist removal
Date: Tue, 18 Oct 2016 14:20:49 +0000
Message-ID: <2FF1D5AB-46C6-4BEC-A5A7-EC9C16A99919@vmware.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Jacob, Zhang,

One of your recent commits, "thermal/powerclamp: remove cpu whitelist" [1], has caused a regression in the kernel.

That commit changed powerclamp_probe from requiring all of the following features:

    X86_FEATURE_NONSTOP_TSC
    X86_FEATURE_CONSTANT_TSC
    X86_FEATURE_MWAIT
    X86_FEATURE_ARAT

to requiring *any* of them. The problem is that clamp_thread still calls mwait_idle_with_hints even if the CPU doesn't support MWAIT.

This was reported by our users when running Ubuntu 16.10 (4.8.0-22-generic) inside a VMware VM, though as mentioned above I don't think it is specific to our platform.
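To make the all-versus-any difference concrete, here is a minimal userspace sketch. It is not the driver code: the FEAT_* bits, FEAT_REQUIRED, and the probe_requires_* helpers are hypothetical names standing in for the X86_FEATURE_* checks, and the guest feature set in probe_example is an assumed example rather than a measured one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the X86_FEATURE_* bits listed above. */
enum cpu_feat {
    FEAT_NONSTOP_TSC  = 1u << 0,
    FEAT_CONSTANT_TSC = 1u << 1,
    FEAT_MWAIT        = 1u << 2,
    FEAT_ARAT         = 1u << 3,
};

#define FEAT_REQUIRED \
    (FEAT_NONSTOP_TSC | FEAT_CONSTANT_TSC | FEAT_MWAIT | FEAT_ARAT)

/* Old behaviour as described: every listed feature must be present. */
bool probe_requires_all(unsigned int feats)
{
    return (feats & FEAT_REQUIRED) == FEAT_REQUIRED;
}

/* New behaviour as described: a single matching feature is enough. */
bool probe_requires_any(unsigned int feats)
{
    return (feats & FEAT_REQUIRED) != 0;
}

/* Assumed example: a CPU with both TSC features and ARAT, but no MWAIT. */
void probe_example(void)
{
    unsigned int guest = FEAT_NONSTOP_TSC | FEAT_CONSTANT_TSC | FEAT_ARAT;

    /* Old check: probe fails, the driver never starts clamp_thread. */
    assert(!probe_requires_all(guest));
    /* New check: probe succeeds although MWAIT is missing. */
    assert(probe_requires_any(guest));
}
```

With the "any" semantics the probe passes on such a CPU, so the idle injection thread starts and eventually reaches the MONITOR/MWAIT path.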
We have seen kernel panics due to an invalid opcode because of this. Below is the stack trace for your reference.

[    5.736416] invalid opcode: 0000 [#1] SMP
[    5.736455] Modules linked in: vmw_vsock_vmci_transport vsock vmw_balloon intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryptd intel_rapl_perf input_leds joydev serio_raw snd_ens1371 snd_ac97_codec gameport ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore i2c_piix4 shpchp vmw_vmci nfit floppy(+) mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid ahci libahci e1000 mptspi mptscsih psmouse mptbase vmwgfx scsi_transport_spi ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm pata_acpi fjes
[    5.744370] CPU: 1 PID: 912 Comm: kidle_inject/1 Not tainted 4.8.0-22-generic #24-Ubuntu
[    5.744373] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[    5.744375] task: ffff9658f7a663c0 task.stack: ffff9658fa908000
[    5.744378] RIP: 0010:[] [] clamp_thread+0x2b8/0x5d0 [intel_powerclamp]
[    5.744380] RSP: 0018:ffff9658fa90be00 EFLAGS: 00010246
[    5.744383] RAX: ffff9658fa908008 RBX: 00000000fffee0a6 RCX: 0000000000000000
[    5.744386] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
[    5.744388] RBP: ffff9658fa90bec0 R08: ffff9658fa908000 R09: 0000000000000000
[    5.744391] R10: 000000000001cbf7 R11: 0000000000000000 R12: ffffffff8db581a0
[    5.744393] R13: ffff9658fa908000 R14: 0000000000000000 R15: ffff9658fa908000
[    5.744396] FS: 0000000000000000(0000) GS:ffff9658fc640000(0000) knlGS:0000000000000000
[    5.744398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.744401] CR2: 00007ffa6cc262e8 CR3: 000000003ab3b000 CR4: 00000000001406e0
[    5.744403] Stack:
[    5.744406] 0000000000000001 ffff9658f7a66dc0 ffff9658fc659200 00000000e878d638
[    5.744409] 0000000000000001 00000002fc659200 0000000000000001 ffff9658fa908008
[    5.744411] 0000000000000000 ffff9658fc64fea8 00000000fffee0a6 ffffffffc05720a0
[    5.744414] Call Trace:
[    5.744416] [] ? pkg_state_counter+0xa0/0xa0 [intel_powerclamp]
[    5.744419] [] ? powerclamp_set_cur_state+0x170/0x170 [intel_powerclamp]
[    5.744421] [] ? powerclamp_set_cur_state+0x170/0x170 [intel_powerclamp]
[    5.744424] [] kthread+0xd8/0xf0
[    5.744427] [] ret_from_fork+0x1f/0x40
[    5.744429] [] ? kthread_create_on_node+0x1e0/0x1e0
[    5.744432] Code: cc e9 ba 00 00 00 eb 19 0f 1f 00 0f ae f0 65 48 8b 04 25 04 69 01 00 0f ae b8 08 c0 ff ff 0f ae f0 31 d2 48 8b 44 24 38 48 89 d1 <0f> 01 c8 49 8b 45 08 a8 08 75 0b b9 01 00 00 00 4c 89 f0 0f 01
[    5.744434] RIP [] clamp_thread+0x2b8/0x5d0 [intel_powerclamp]
[    5.744437] RSP
[    5.744440] invalid opcode: 0000 [#2] SMP
[    5.744452] ---[ end trace cf659c4076bf2804 ]---

Disassembling the instruction at the RIP shows that the kernel attempted to execute the "monitor" instruction:

    8b8: 0f 01 c8       monitor %rax,%rcx,%rdx
    8bb: 49 8b 45 08    mov 0x8(%r13),%rax

To fix this, I think you should restore the explicit feature check "if" block that was removed in the above-mentioned commit.

Can you please look at this?

Thanks,
Alok

[1] b721ca0d192754deccb89fb01c77e41e6fd91ad9
    https://github.com/torvalds/linux/commit/b721ca0d192754deccb89fb01c77e41e6fd91ad9