Received: by 2002:a25:d7c1:0:0:0:0:0 with SMTP id o184csp4036425ybg; Fri, 25 Oct 2019 12:24:08 -0700 (PDT) X-Google-Smtp-Source: APXvYqyr9re1vm9Mv5rO+1B9Z8j6JDWrrMX2iPKq2GfjJmV9uK/VzqK/JwxIZ+tjTll+Pjp2/R+3 X-Received: by 2002:a50:b5e3:: with SMTP id a90mr5607602ede.201.1572031448094; Fri, 25 Oct 2019 12:24:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1572031448; cv=none; d=google.com; s=arc-20160816; b=Sg9N9BEpdVxSOicaffgEtOKDOI83cL/ol/509AYgL/sIAmEcEBdWrImAxjej+/pgGq bQV+L8gRmix/NRey+DBTxz1Sf3e9zI6GnWMdab8eBmOKNcS9H2uVlZH1EUFya+CekxDU 6aq3Hsc4PN23ozNLihPgkzfvmUfKilmF4pPAkk1+tSZjKzNYC4zQNlNBVJREr1yOpsxs SC3esVktqn676EFvUSLnWSA4RrK7/drgnsjmglqCv95f++bEWnrCIMzehu8pcICKufOc RBtcr4AcmAHV0s5G+hihger0iUqXQDoUQ127XELs9itlpVIiu7OknP2jB8ci+FBcd0BO TBgQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=pcBZ89IuM1/59N4EK0452yE/YqtYsgZrhnmOdEp0p2w=; b=FKFZIo1q5Ny7KDKmtprtpQY3eKF7Prs99wUylHalgafiuCTI2C7AO2yj5xsZ+X47CC 7ojWPKsBwf3Aynf6GqMFFrETjUqxCUJAk0vBuKxWwwPVPDK3InuAP+GuSKAgWnwql6pO nBtXKvTZHhs4FZi7gBCtNCymQZG3gy02DMgeZixomWjerI7W7e9OY3IGi9SsiPvzEqsf SRkQy9eW6u+TvTCNs/ahuUULB9pw8Oqt/uw1a7uE76nlwUvZPoMf1wN9kwakoUeGCJ+V 6s6HZpQ9YWhyhDSuRrnmUGM/ZFfDa0wXWTkZOOUxC5ELiHOqRtJEnKwIamFJD8ae7jzm 55Mw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f4si1857593ejc.218.2019.10.25.12.23.44; Fri, 25 Oct 2019 12:24:08 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2394239AbfJYGJF (ORCPT + 99 others); Fri, 25 Oct 2019 02:09:05 -0400 Received: from mx2.suse.de ([195.135.220.15]:54592 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2394092AbfJYGJF (ORCPT ); Fri, 25 Oct 2019 02:09:05 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id B5C05B544; Fri, 25 Oct 2019 06:09:02 +0000 (UTC) Subject: Re: [PATCH] x86/stackframe/32: repair 32-bit Xen PV To: Jan Beulich , Andy Lutomirski Cc: X86 ML , Peter Zijlstra , lkml , "xen-devel@lists.xenproject.org" References: <4c4b0cdf-55e5-7be5-bf49-08fe8fd18dca@suse.com> From: =?UTF-8?B?SsO8cmdlbiBHcm/Dnw==?= Message-ID: <76fd94f0-cfc6-5ad5-673b-725f4d72a69b@suse.com> Date: Fri, 25 Oct 2019 08:09:00 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.1 MIME-Version: 1.0 In-Reply-To: <4c4b0cdf-55e5-7be5-bf49-08fe8fd18dca@suse.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 25.10.19 08:06, Jan Beulich wrote: > On 24.10.2019 18:11, Andy Lutomirski wrote: >> On Mon, Oct 7, 2019 at 3:41 AM Jan Beulich wrote: >>> >>> Once again RPL checks have been introduced which don't account for a >>> 32-bit kernel living in ring 1 when running in a PV Xen domain. The >>> case in FIXUP_FRAME has been preventing boot; adjust BUG_IF_WRONG_CR3 >>> as well just in case. >> >> I'm okay with the generated code, but IMO the macro is too indirect >> for something that's trivial. >> >>> >>> Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs") >>> Signed-off-by: Jan Beulich >>> >>> --- a/arch/x86/entry/entry_32.S >>> +++ b/arch/x86/entry/entry_32.S >>> @@ -48,6 +48,17 @@ >>> >>> #include "calling.h" >>> >>> +#ifndef CONFIG_XEN_PV >>> +# define USER_SEGMENT_RPL_MASK SEGMENT_RPL_MASK >>> +#else >>> +/* >>> + * When running paravirtualized on Xen the kernel runs in ring 1, and hence >>> + * simple mask based tests (i.e. ones not comparing against USER_RPL) have to >>> + * ignore bit 0. See also the C-level get_kernel_rpl(). >>> + */ >> >> How about: >> >> /* >> * When running on Xen PV, the actual %cs register in the kernel is 1, not 0. >> * If we need to distinguish between a %cs from kernel mode and a %cs from >> * user mode, we can do test $2 instead of test $3. >> */ >> #define USER_SEGMENT_RPL_MASK 2 > > I.e. you're fine using just the single bit in all configurations? > >>> +# define USER_SEGMENT_RPL_MASK (SEGMENT_RPL_MASK & ~1) >>> +#endif >>> + >>> .section .entry.text, "ax" >>> >>> /* >>> @@ -172,7 +183,7 @@ >>> ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI >>> .if \no_user_check == 0 >>> /* coming from usermode? */ >>> - testl $SEGMENT_RPL_MASK, PT_CS(%esp) >>> + testl $USER_SEGMENT_RPL_MASK, PT_CS(%esp) >> >> Shouldn't PT_CS(%esp) be 0 if we came from the kernel? I'm guessing >> the actual bug is in whatever code put 1 in here in the first place. >> >> In other words, I'm having trouble understanding why there is any >> context in which some value would be 3 for user mode and 1 for kernel >> mode. Obviously if we're manually IRETing to kernel mode, we need to >> set CS to 1, but if we're filling in our own PT_CS, we should just >> write 0. >> >> The supposedly offending commit (""x86/stackframe/32: Provide >> consistent pt_regs") looks correct to me, so I suspect that the >> problem is elsewhere. Or is it intentional that Xen PV's asm >> (arch/x86/xen/whatever) sticks 1 into the CS field on the stack? > > Manually created / updated frames _could_ in principle modify the > RPL, but ones coming from hardware (old 32-bit hypervisors) or Xen > (64-bit hypervisors) will have an RPL of 1, as already said by > Andrew. We could in principle also add a VM assist for the > hypervisor to store an RPL of 0, but I'd expect this to require > further kernel changes, and together with the old behavior still > being required to support I'm unconvinced this would be worth it. > >> Also, why are we supporting 32-bit Linux PV guests at all? Can we >> just delete this code instead? > > This was already suggested by Jürgen (now also CC-ed), but in reply > it was pointed out that the process would be to first deprecate the > code, and remove it only a couple of releases later if no-one comes > up with a reason to retain it. Thanks for the reminder. I'll send a patch with the deprecation warning for 32-bit PV. Juergen