Date: Fri, 2 Jun 2017 09:02:10 +0200
From: Heiko Carstens
To: Martin Schwidefsky
Cc: David Hildenbrand, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    Thomas Huth, Christian Borntraeger
Subject: Re: [PATCH RFC 0/2] KVM: s390: avoid having to enable vm.alloc_pgste
References: <20170529163202.13077-1-david@redhat.com> <20170601124651.3e7969ab@mschwideX1>
In-Reply-To: <20170601124651.3e7969ab@mschwideX1>
Message-Id: <20170602070210.GA4221@osiris>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 01, 2017 at 12:46:51PM +0200, Martin Schwidefsky wrote:
> > Unfortunately, converting all page tables to 4k pgste page tables is
> > not possible without provoking various race conditions.
>
> That is one approach we tried and was found to be buggy. The point is that
> you are not allowed to reallocate a page table while a VMA exists that is
> in the address range of that page table.
>
> Another approach we tried is to use an ELF flag on the qemu executable.
> That does not work either because fs/exec.c allocates and populates the
> new mm struct for the argument pages before fs/binfmt_elf.c comes into
> play.

How about failing the system call within arch_check_elf() if you detect
that the binary requires pgstes (as indicated by ELF flags) and then
restarting the system call?

That is: arch_check_elf() would e.g. set a thread flag indicating that
future mm's should be allocated with pgstes. Then do_execve() would clean
up everything and return to entry.S. Upon return to userspace we detect
this condition and simply restart the system call, similar to signals vs
-ERESTARTSYS.

That would make do_execve() clean up everything, and upon reentering it
would allocate an mm with the pgste flag set.

Maybe this is a bit over-simplified, but it might work.

At least I also don't like the next "hack" that is specifically designed
to only work with how QEMU is currently implemented. It might break with
future QEMU changes or with the next user space implementation that drives
the kvm interface but does everything differently.
Let's look for a "clean" solution that will always work. We had too many
hacks for this problem and *all* of them were broken.
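
For illustration only, the restart idea described above (fail in
arch_check_elf(), remember the requirement in a thread flag, clean up, and
re-enter the system call) can be modeled in plain user-space C. This is a
rough sketch of the control flow, not kernel code: every model_* name, the
MODEL_ELF_NEEDS_PGSTE flag and the mm_allocs counter are hypothetical, and
the restart value merely mirrors the kernel-internal ERESTARTSYS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not kernel definitions. */
#define MODEL_ERESTART        512   /* mirrors kernel-internal ERESTARTSYS */
#define MODEL_ELF_NEEDS_PGSTE 0x1u  /* assumed ELF flag: binary needs pgstes */

struct model_mm     { bool has_pgste; };
struct model_thread { bool want_pgste; };  /* the proposed thread flag */

static int mm_allocs;  /* counts how many mm structs were allocated */

/* The mm is created before the ELF header is looked at (as fs/exec.c does
 * for the argument pages), so only the thread flag can influence it. */
static struct model_mm model_mm_alloc(const struct model_thread *t)
{
    struct model_mm mm = { .has_pgste = t->want_pgste };
    mm_allocs++;
    return mm;
}

/* Stand-in for arch_check_elf(): if the binary needs pgstes but the mm has
 * none, record that in the thread flag and ask for a restart. */
static int model_arch_check_elf(struct model_thread *t,
                                const struct model_mm *mm,
                                unsigned int elf_flags)
{
    if ((elf_flags & MODEL_ELF_NEEDS_PGSTE) && !mm->has_pgste) {
        t->want_pgste = true;
        return -MODEL_ERESTART;
    }
    return 0;
}

/* One pass through the exec path. On failure the freshly allocated mm is
 * simply dropped, which stands in for "clean up everything". */
static int model_do_execve(struct model_thread *t, unsigned int elf_flags,
                           struct model_mm *out)
{
    assert(out != NULL);
    struct model_mm mm = model_mm_alloc(t);
    int ret = model_arch_check_elf(t, &mm, elf_flags);
    if (ret)
        return ret;
    *out = mm;
    return 0;
}

/* Stand-in for the return-to-userspace path: re-enter the system call for
 * as long as the exec path asks for a restart. */
static int model_execve_syscall(struct model_thread *t, unsigned int elf_flags,
                                struct model_mm *out)
{
    int ret;
    do {
        ret = model_do_execve(t, elf_flags, out);
    } while (ret == -MODEL_ERESTART);
    return ret;
}
```

In this model, calling model_execve_syscall() with MODEL_ELF_NEEDS_PGSTE
set goes through the exec path twice: the first mm is allocated without
pgstes and thrown away, the second one is allocated with them. A binary
without the flag takes a single pass. Whether and when the thread flag
should be cleared again is left open here, as it is in the proposal.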