Date: Wed, 9 Sep 2020 19:25:34 +0200
From: Gerald Schaefer
To: Dave Hansen
Cc: Jason Gunthorpe, John Hubbard, LKML, linux-mm, linux-arch,
 Andrew Morton, Linus Torvalds, Russell King, Mike Rapoport,
 Catalin Marinas, Will Deacon, Michael Ellerman, Benjamin Herrenschmidt,
 Paul Mackerras, Jeff Dike, Richard Weinberger, Dave Hansen,
 Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Arnd Bergmann, Andrey Ryabinin, linux-x86, linux-arm,
 linux-power, linux-sparc, linux-um, linux-s390, Alexander Gordeev,
 Vasily Gorbik, Heiko Carstens, Christian Borntraeger, Claudio Imbrenda
Subject: Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
Message-ID: <20200909192534.442f8984@thinkpad>
References: <20200907180058.64880-1-gerald.schaefer@linux.ibm.com>
 <20200907180058.64880-2-gerald.schaefer@linux.ibm.com>
 <0dbc6ec8-45ea-0853-4856-2bc1e661a5a5@intel.com>
 <20200909142904.00b72921@thinkpad>

On Wed, 9 Sep 2020 09:18:46 -0700 Dave Hansen wrote:

> On 9/9/20 5:29 AM, Gerald Schaefer wrote:
> > This only works well as long there are real pagetable pointers involved,
> > that can also be used for iteration. For gup_fast, or any other future
> > pagetable walkers using the READ_ONCE logic w/o lock, that is not true.
> > There are pointers involved to local pXd values on the stack, because of
> > the READ_ONCE logic, and our middle-level iteration will suddenly iterate
> > over such stack pointers instead of pagetable pointers.
>
> By "There are pointers involved to local pXd values on the stack", did
> you mean "locate" instead of "local"? That sentence confused me.
>
> Which code is it, exactly that allocates these troublesome on-stack pXd
> values, btw?

It is the gup_pXd_range() call sequence in mm/gup.c. It starts in
gup_pgd_range() with "pgdp = pgd_offset(current->mm, addr)" and then
"pgd_t pgd = READ_ONCE(*pgdp)", which creates the first local stack
variable "pgd". The next-level call to gup_p4d_range() gets this "pgd"
value as input, but not the original pgdp pointer it was read from.
This is already the essential difference from other pagetable walkers,
e.g. walk_pXd_range() in mm/pagewalk.c, where the original pointer is
passed through. With READ_ONCE, that pointer must not be dereferenced
again, so the value is passed on instead.

In gup_p4d_range() we then have "p4dp = p4d_offset(&pgd, addr)", with
&pgd being a pointer to the passed-in pgd value. So that is the first
pXd pointer that does not point directly to the pXd in the page table,
but to a local stack variable. With folded p4d, p4d_offset(&pgd, addr)
simply returns the passed-in &pgd pointer, so p4dp now also points to
it. That continues with "p4d_t p4d = READ_ONCE(*p4dp)", and that second
stack variable is passed to gup_pud_range(), and so on. Due to inlining,
none of those variables are really passed anywhere; they simply sit on
the stack.

So far, IIUC, that would also happen on x86 (or actually everywhere
else) for folded levels, i.e. some pXd_offset() calls would simply
return the passed-in (stack) value pointer. This works as designed, and
it will not lead to the "iteration over stack pointer" for anybody but
s390, because the pXd_addr_end() boundaries usually ensure that you
always return to the pgd level for iteration, and that is the only level
with a real pagetable pointer. For s390, we stay at the first non-folded
level and do the iteration there, which is fine for other pagetable
walkers using the original pointers, but not for the READ_ONCE-style
gup_fast.
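To make the value-passing pattern a bit more concrete, here is a small user-space C model. It is not the kernel code: the types, the tiny table, SKETCH_PGDIR_SHIFT and all sketch_* helpers are hypothetical stand-ins. Only two things are meant to mirror the real thing, as I understand it: p4d_t wraps a pgd_t the way asm-generic/pgtable-nop4d.h does for a folded p4d, and the folded offset helper just returns the passed-in pointer, so p4dp ends up pointing at the on-stack copy of pgd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space model, not kernel code. p4d_t wraps pgd_t
 * like asm-generic/pgtable-nop4d.h does for a folded p4d level.
 */
typedef struct { uint64_t val; } pgd_t;
typedef struct { pgd_t pgd; } p4d_t;

#define SKETCH_PTRS_PER_PGD 4   /* hypothetical tiny top-level table */
#define SKETCH_PGDIR_SHIFT 39   /* assumed x86-64 4-level value */

/* Stand-in for READ_ONCE(): one volatile load, copied into a local. */
static pgd_t sketch_read_once(const pgd_t *p)
{
	return *(const volatile pgd_t *)p;
}

/* Folded level: like the generic p4d_offset(), reinterpret the pointer. */
static p4d_t *sketch_p4d_offset(pgd_t *pgd, uint64_t addr)
{
	(void)addr;
	return (p4d_t *)pgd;
}

struct sketch_result {
	int p4dp_is_local_copy;  /* p4dp == &pgd, the on-stack copy */
	int p4dp_is_table_slot;  /* p4dp == the real page table entry */
	uint64_t p4d_val;        /* value read through p4dp */
};

/* Like gup_p4d_range(): only the pgd value arrives, not pgdp. */
static void sketch_p4d_range(pgd_t pgd, uint64_t addr,
			     const pgd_t *table_slot, struct sketch_result *res)
{
	p4d_t *p4dp = sketch_p4d_offset(&pgd, addr);
	p4d_t p4d = *p4dp;  /* second stack copy, like READ_ONCE(*p4dp) */

	res->p4dp_is_local_copy = (const void *)p4dp == (const void *)&pgd;
	res->p4dp_is_table_slot = (const void *)p4dp == (const void *)table_slot;
	res->p4d_val = p4d.pgd.val;
}

/* Like gup_pgd_range(): pgdp points into the table, pgd is a copy. */
static void sketch_pgd_range(pgd_t *table, uint64_t addr,
			     struct sketch_result *res)
{
	assert(table != NULL && res != NULL);

	size_t idx = (size_t)((addr >> SKETCH_PGDIR_SHIFT) % SKETCH_PTRS_PER_PGD);
	pgd_t *pgdp = &table[idx];
	pgd_t pgd = sketch_read_once(pgdp);

	sketch_p4d_range(pgd, addr, pgdp, res);
}
```

Calling sketch_pgd_range() for any address reports that p4dp is the address of the local pgd copy and not the table slot, although the value read through it is still correct. Reading is harmless; only advancing such a pointer with p4dp++ would be a problem.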
I actually had to draw myself a picture to get some hold of this, or
rather a walk-through with a certain pud-crossing range in a folded
3-level scenario. I'm not sure I would have understood my explanation
above without that, but I hope you can make some sense of it. Or draw
yourself a picture :-)
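For what it's worth, such a walk-through can be approximated with a rough, hypothetical user-space C model that only counts loop iterations and never dereferences anything. struct level and stack_steps() are my own illustration, not kernel code. The per-level shifts are assumptions: x86-64 with 4-level paging (39/39/30/21, p4d folded so that P4D_SHIFT equals PGDIR_SHIFT), and s390 with a 3-level table (53/42/31/20), where p4d_offset() and pud_offset() both just return the passed-in pointer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the nested gup_pXd_range() loops, not kernel
 * code. Each level has the shift used by its pXd_addr_end() and a flag
 * saying whether its pXd_offset() just returns the passed-in pointer to
 * the caller's on-stack copy ("folded").
 */
struct level {
	unsigned int shift;
	int folded;
};

/* Same boundary arithmetic as the generic pXd_addr_end() macros. */
static uint64_t addr_end(uint64_t addr, uint64_t end, unsigned int shift)
{
	uint64_t size = UINT64_C(1) << shift;
	uint64_t boundary = (addr + size) & ~(size - 1);

	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Walks [addr, end) from level lvl downwards and counts how often a loop
 * would do pXdp++ on a pointer to an on-stack copy and keep iterating.
 */
static unsigned int stack_steps(const struct level *lv, int nlevels,
				int lvl, uint64_t addr, uint64_t end)
{
	/* Level 0 (pgd_offset(mm, addr)) always yields a real table pointer. */
	int on_stack = lvl > 0 && lv[lvl].folded;
	unsigned int steps = 0;

	assert(addr < end);
	do {
		uint64_t next = addr_end(addr, end, lv[lvl].shift);

		if (lvl + 1 < nlevels)
			steps += stack_steps(lv, nlevels, lvl + 1, addr, next);
		addr = next;
		if (addr != end && on_stack)
			steps++;  /* pXdp++ would step off the local variable */
	} while (addr != end);

	return steps;
}
```

With the range [0x7fff0000, 0x80010000), which crosses the 2 GB pud boundary, the x86-like configuration keeps every folded loop at a single iteration, so the iteration happens with real pointers. The s390-like configuration iterates at the folded pud level and advances the on-stack pudp once; a range that does not cross a pud boundary does not trigger it.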