Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues
From: Dave Hansen
To: Andy Lutomirski, Yu-cheng Yu
Cc: X86 ML, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, "linux-kernel@vger.kernel.org", Andy Lutomirski, Borislav Petkov, Sai Praneeth Prakhya, "Ravi V. Shankar", Fenghua Yu
Date: Fri, 29 Apr 2016 13:40:07 -0700
Message-ID: <5723C6A7.4020704@linux.intel.com>
References: <5723A353.7060209@linux.intel.com> <20160429195741.GA15402@test-lenovo> <5723BE1F.7040300@linux.intel.com> <20160429200709.GA15412@test-lenovo>

On 04/29/2016 01:25 PM, Andy Lutomirski wrote:
> On Fri, Apr 29, 2016 at 1:07 PM, Yu-cheng Yu wrote:
>> On Fri, Apr 29, 2016 at 01:03:43PM -0700, Dave Hansen wrote:
>>> That's not feasible. Think of dynamic libraries or just-in-time
>>> compilers. What instruction set does /usr/bin/java use, for instance? :)
>>
>> The java argument is true. In that case or when the bitmask is
>> missing, we can allocate for all supported features.
>
> I actually want to see us moving in the direction of unconditionally
> allocating everything on process startup. If we can stop using CR0.TS
> entirely, I think everything will be better.

We can absolutely allocate the worst-case XSAVE buffer at task startup
for folks that never want to see a latency spike in the life of the app
no matter what.

But I also think it would be pretty nice if 'ls' didn't pay the 2k cost
to have AVX-512 state if it's not using AVX-512.

We also don't have to do this with CR0.TS. We'd actually use a
combination of out-of-line (not appended to task_struct) XSAVE buffers
and XGETBV1 to check the size of our XSAVE buffer before we call XSAVE*
and resize it when needed.

Maybe nobody will ever care enough about 2kbytes/thread, though.
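
To make the out-of-line idea a bit more concrete, here is a rough
userspace sketch in C. It is not kernel code and not a proposed patch.
It assumes the in-use bitmap has already been read (in the kernel that
would come from XGETBV with ECX=1) and is simply passed in as a
parameter. The names xsave_buf, xsave_size_needed() and
xsave_buf_ensure() are hypothetical, the component sizes are
illustrative stand-ins for what CPUID leaf 0xD reports, and the 64-byte
alignment rules for the buffer and for compacted components are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Legacy FXSAVE region (512 bytes) plus the XSAVE header (64 bytes). */
#define XSAVE_BASE_SIZE 576

/* Illustrative per-component sizes; real code would query CPUID leaf 0xD. */
struct xcomp {
    unsigned bit;   /* bit position in the state-component bitmap */
    size_t size;    /* bytes this component adds to the buffer */
};

static const struct xcomp xcomps[] = {
    { 2, 256 },     /* AVX: upper halves of the YMM registers */
    { 5, 64 },      /* AVX-512 opmask registers */
    { 6, 512 },     /* AVX-512 ZMM_Hi256 */
    { 7, 1024 },    /* AVX-512 Hi16_ZMM */
};

/* Hypothetical out-of-line buffer, kept separate from the task struct. */
struct xsave_buf {
    void *data;
    size_t size;
};

/* Bytes needed to save every component flagged in in_use. */
size_t xsave_size_needed(uint64_t in_use)
{
    size_t need = XSAVE_BASE_SIZE;

    for (size_t i = 0; i < sizeof(xcomps) / sizeof(xcomps[0]); i++) {
        if (in_use & (UINT64_C(1) << xcomps[i].bit))
            need += xcomps[i].size;
    }
    return need;
}

/*
 * Call before saving state: grow the buffer if the in-use components
 * no longer fit. Returns 1 if it grew, 0 if it was already big enough,
 * -1 on allocation failure. The buffer is never shrunk here.
 */
int xsave_buf_ensure(struct xsave_buf *buf, uint64_t in_use)
{
    size_t need;
    void *p;

    assert(buf != NULL);

    need = xsave_size_needed(in_use);
    if (need <= buf->size)
        return 0;

    p = realloc(buf->data, need);
    if (p == NULL)
        return -1;

    buf->data = p;
    buf->size = need;
    return 1;
}
```

With these made-up sizes, a task that only ever touches x87/SSE state
stays at 576 bytes, and the buffer grows only once the bitmap shows the
AVX-512 components in use.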