From: Shaohua Li
CC: Suresh Siddha, Thomas Gleixner, "H. Peter Anvin", Ingo Molnar
Subject: [PATCH] x86: serialize LVTT and TSC_DEADLINE write
Date: Fri, 31 Jul 2015 15:11:55 -0700
Message-ID: <75ed9226b028a31e37861fbba51cdffbcfe04eda.1438300532.git.shli@fb.com>
X-Mailer: git-send-email 1.8.5.6
X-Mailing-List: linux-kernel@vger.kernel.org

We saw a strange issue with the local APIC timer. Some random CPU
doesn't receive any local APIC timer interrupts, which causes various
problems. The CPU uses TSC-Deadline mode for the local APIC timer and
the APIC is in xAPIC mode. When this happens, manually writing the
TSC_DEADLINE MSR triggers the interrupt again and the system returns to
normal.

Currently we only see this issue on E5-2660 v2 and E5-2680 v2 CPUs. The
compiler version seems to matter too; the issue is quite easy to
reproduce with gcc 4.7.

Since the local APIC timer interrupt count is 0, either we lose the
first interrupt or the TSC_DEADLINE MSR isn't set correctly. After some
debugging, we believe it's the serialization issue described in the
Intel SDM: in xAPIC mode, the write to the APIC LVTT and the write to
TSC_DEADLINE aren't serialized. Debugging shows that on an affected CPU,
reading the TSC_DEADLINE MSR right after the very first MSR write
returns 0. The patch uses the algorithm described in the Intel SDM.
The issue only happens in xAPIC mode, but I guess there is no need to
check the APIC mode, so the patch applies the workaround
unconditionally.

Without this patch, we see the issue after ~5 reboots. With it, we don't
see it in a 24-hour reboot test.

Cc: Suresh Siddha
Cc: Thomas Gleixner
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: stable@vger.kernel.org v3.7+
Signed-off-by: Shaohua Li
---
 arch/x86/kernel/apic/apic.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index dcb5285..b7890b3 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -336,6 +336,22 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen)
 	apic_write(APIC_LVTT, lvtt_value);
 
 	if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) {
+		u64 msr;
+
+		/*
+		 * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode,
+		 * writing APIC LVTT and TSC_DEADLINE MSR isn't serialized.
+		 * This uses the algorithm described in Intel SDM to serialize
+		 * the two writes
+		 * */
+		while (1) {
+			wrmsrl(MSR_IA32_TSC_DEADLINE, -1L);
+			rdmsrl(MSR_IA32_TSC_DEADLINE, msr);
+			if (msr)
+				break;
+		}
+		wrmsrl(MSR_IA32_TSC_DEADLINE, 0);
+
 		printk_once(KERN_DEBUG "TSC deadline timer enabled\n");
 		return;
 	}
-- 
1.8.5.6