Date: Fri, 24 Feb 2023 23:56:04 +0000
From: Mingwei Zhang
To: "Chang S. Bae"
Cc: Thomas Gleixner, Sean Christopherson, Paolo Bonzini,
    "H. Peter Anvin", linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    linux-kselftest@vger.kernel.org, Jim Mattson, Venkatesh Srinivas,
    Aaron Lewis, Chao Gao
Subject: Re: [PATCH v3 01/13] x86/fpu/xstate: Avoid getting xstate address
    of init_fpstate if fpstate contains the component
References: <20230221163655.920289-1-mizhang@google.com>
    <20230221163655.920289-2-mizhang@google.com> <87ilfum6xh.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 22, 2023, Chang S.
Bae wrote:
> On 2/22/2023 10:40 AM, Mingwei Zhang wrote:
> > > > We have this [1]:
> > > >
> > > > 	if (fpu_state_size_dynamic())
> > > > 		mask &= (header.xfeatures | xinit->header.xcomp_bv);
> > > >
> > > > If header.xfeatures[18] = 0 then mask[18] = 0 because
> > > > xinit->header.xcomp_bv[18] = 0. Then, it won't hit that code. So, I'm
> > > > confused about the problem that you described here.
> > >
> > > Read the suggested changelog I wrote in my reply to Mingwei.
> > >
> > > TLDR:
> > >
> > > 	xsave.header.xfeatures[18] = 1
> > > 	xinit.header.xfeatures[18] = 0
> > > 	-> mask[18] = 1
> > > 	-> __raw_xsave_addr(xsave, 18)	<- Success
> > > 	-> __raw_xsave_addr(xinit, 18)	<- WARN
>
> Oh, sigh.. This should be caught last time.
>
> Hmm, then since we store init state for legacy ones [1], unless it is too
> aggressive, perhaps the loop can be simplified like this:
>
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 714166cc25f2..2dac6f5f3ade 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -1118,21 +1118,13 @@ void __copy_xstate_to_uabi_buf(struct membuf to,
> struct fpstate *fpstate,
>  	zerofrom = offsetof(struct xregs_state, extended_state_area);
>
>  	/*
> -	 * The ptrace buffer is in non-compacted XSAVE format. In
> -	 * non-compacted format disabled features still occupy state space,
> -	 * but there is no state to copy from in the compacted
> -	 * init_fpstate. The gap tracking will zero these states.
> +	 * Indicate which states to copy from fpstate. When not present in
> +	 * fpstate, those extended states are either initialized or
> +	 * disabled. They are also known to have an all zeros init state.
> +	 * Thus, remove them from 'mask' to zero those features in the user
> +	 * buffer instead of retrieving them from init_fpstate.
>  	 */
> -	mask = fpstate->user_xfeatures;

Do we need to change this line and the comments? I don't see how any of
this is relevant to this issue. The original semantics are to traverse
all user_xfeatures: if a component is available in fpstate, copy it from
there; otherwise, copy it from init_fpstate. We do not assume that the
components in init_fpstate (but not in fpstate) are all zeros, do we? If
it is safe to assume that, then it might be ok. But at least in this
patch, I want to keep the original semantics as is, without that
assumption.

> -
> -	/*
> -	 * Dynamic features are not present in init_fpstate. When they are
> -	 * in an all zeros init state, remove those from 'mask' to zero
> -	 * those features in the user buffer instead of retrieving them
> -	 * from init_fpstate.
> -	 */
> -	if (fpu_state_size_dynamic())
> -		mask &= (header.xfeatures | xinit->header.xcomp_bv);
> +	mask = header.xfeatures;

Same here. Let's not add this optimization in this patch.

>
>  	for_each_extended_xfeature(i, mask) {
>  		/*
> @@ -1151,9 +1143,8 @@ void __copy_xstate_to_uabi_buf(struct membuf to,
> struct fpstate *fpstate,
>  			pkru.pkru = pkru_val;
>  			membuf_write(&to, &pkru, sizeof(pkru));
>  		} else {
> -			copy_feature(header.xfeatures & BIT_ULL(i), &to,
> +			membuf_write(&to,
>  				     __raw_xsave_addr(xsave, i),
> -				     __raw_xsave_addr(xinit, i),
>  				     xstate_sizes[i]);
>  		}
>  		/*
>
> > Chang: to reproduce this issue, you can simply run the amx_test in the
> > kvm selftest directory.
>
> Yeah, I was able to reproduce it with this ptrace test:
>
> diff --git a/tools/testing/selftests/x86/amx.c
> b/tools/testing/selftests/x86/amx.c
> index 625e42901237..ae02bc81846d 100644
> --- a/tools/testing/selftests/x86/amx.c
> +++ b/tools/testing/selftests/x86/amx.c
> @@ -14,8 +14,10 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
> +#include
>
>  #include "../kselftest.h" /* For __cpuid_count() */
>
> @@ -826,6 +828,76 @@ static void test_context_switch(void)
>  	free(finfo);
>  }
>
> +/* Ptrace test */
> +
> +static bool inject_tiledata(pid_t target)
> +{
> +	struct xsave_buffer *xbuf;
> +	struct iovec iov;
> +
> +	xbuf = alloc_xbuf();
> +	if (!xbuf)
> +		fatal_error("unable to allocate XSAVE buffer");
> +
> +	load_rand_tiledata(xbuf);
> +
> +	memcpy(&stashed_xsave->bytes[xtiledata.xbuf_offset],
> +	       &xbuf->bytes[xtiledata.xbuf_offset],
> +	       xtiledata.size);
> +
> +	iov.iov_base = xbuf;
> +	iov.iov_len = xbuf_size;
> +
> +	if (ptrace(PTRACE_SETREGSET, target, (uint32_t)NT_X86_XSTATE, &iov))
> +		fatal_error("PTRACE_SETREGSET");
> +
> +	if (ptrace(PTRACE_GETREGSET, target, (uint32_t)NT_X86_XSTATE, &iov))
> +		err(1, "PTRACE_GETREGSET");
> +
> +	if (!memcmp(&stashed_xsave->bytes[xtiledata.xbuf_offset],
> +		    &xbuf->bytes[xtiledata.xbuf_offset],
> +		    xtiledata.size))
> +		return true;
> +	else
> +		return false;
> +}
> +
> +static void test_ptrace(void)
> +{
> +	pid_t child;
> +	int status;
> +
> +	child = fork();
> +	if (child < 0) {
> +		err(1, "fork");
> +	} else if (!child) {
> +		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL))
> +			err(1, "PTRACE_TRACEME");
> +
> +		/* Use the state to expand the kernel buffer */
> +		load_rand_tiledata(stashed_xsave);
> +
> +		raise(SIGTRAP);
> +		_exit(0);
> +	}
> +
> +	do {
> +		wait(&status);
> +	} while (WSTOPSIG(status) != SIGTRAP);
> +
> +	printf("\tInject tile data via ptrace()\n");
> +
> +	if (inject_tiledata(child))
> +		printf("[OK]\tTile data was written on ptracee.\n");
> +	else
> +		printf("[FAIL]\tTile data was not written on ptracee.\n");
> +
> +	ptrace(PTRACE_DETACH, child, NULL, NULL);
> +	wait(&status);
> +	if (!WIFEXITED(status) || WEXITSTATUS(status))
> +		err(1, "ptrace test");
> +}
> +
>  int main(void)
>  {
>  	/* Check hardware availability at first */
> @@ -846,6 +918,8 @@ int main(void)
>  	ctxtswtest_config.num_threads = 5;
>  	test_context_switch();
>
> +	test_ptrace();
> +
>  	clearhandler(SIGILL);
>  	free_stashed_xsave();
>
> Thanks,
> Chang
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/fpu/xstate.c#n386
>

Nice one. Yeah, both ptrace and KVM call this function, so the above
test would also be enough to trigger the bug.

Thanks.
-Mingwei
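
For anyone following along, the mask arithmetic from the TLDR above can be
checked with a small userspace model. This is only a sketch under stated
assumptions: the bitmaps are plain integers rather than kernel structures,
copy_mask(), missing_init_lookups() and feature_in() are hypothetical names
that do not exist in the kernel, and the "lookup is a problem when the bit is
absent from the init state" rule is a simplification of the WARN described in
the thread.

```c
#include <assert.h>
#include <stdint.h>

#define BIT_ULL(n)		(1ULL << (n))
#define XFEATURE_XTILE_DATA	18	/* bit 18, as discussed in the thread */

/*
 * Model of the mask computation quoted in the thread:
 *	mask = fpstate->user_xfeatures;
 *	if (fpu_state_size_dynamic())
 *		mask &= (header.xfeatures | xinit->header.xcomp_bv);
 * All parameters are plain bitmaps (an assumption of this sketch).
 */
static uint64_t copy_mask(uint64_t user_xfeatures, uint64_t fp_xfeatures,
			  uint64_t init_xcomp_bv, int dynamic)
{
	uint64_t mask = user_xfeatures;

	if (dynamic)
		mask &= (fp_xfeatures | init_xcomp_bv);
	return mask;
}

/*
 * Hypothetical helper: features selected by 'mask' that the init state
 * does not contain. In the original loop the init-state address is
 * requested for every bit in 'mask', so each bit returned here stands
 * for one lookup that would hit the warning.
 */
static uint64_t missing_init_lookups(uint64_t mask, uint64_t init_xcomp_bv)
{
	return mask & ~init_xcomp_bv;
}

/* Hypothetical helper: test one feature bit in a bitmap. */
static int feature_in(uint64_t bitmap, unsigned int nr)
{
	assert(nr < 64);
	return !!(bitmap & BIT_ULL(nr));
}
```

With the init state holding only the legacy bits (0x7) and fpstate holding
bit 18, copy_mask() keeps bit 18 and missing_init_lookups() returns exactly
that bit, which matches the TLDR. With bit 18 clear in fpstate, the bit drops
out of the mask and nothing is looked up in the init state, which is the case
Chang described first.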