Date: Tue, 8 Sep 2020 19:59:44 +0200
From: Gerald Schaefer
To: Dave Hansen
Cc: Jason Gunthorpe, John Hubbard, LKML, linux-mm, linux-arch,
    Andrew Morton, Linus Torvalds, Russell King, Mike Rapoport,
    Catalin Marinas, Will Deacon, Michael Ellerman,
    Benjamin Herrenschmidt, Paul Mackerras, Jeff Dike,
    Richard Weinberger, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Arnd Bergmann,
    Andrey Ryabinin, linux-x86, linux-arm, linux-power, linux-sparc,
    linux-um, linux-s390, Alexander Gordeev, Vasily Gorbik,
    Heiko Carstens, Christian Borntraeger, Claudio Imbrenda
Subject: Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
Message-ID: <20200908195944.1a25d1bb@thinkpad>
In-Reply-To: <0dbc6ec8-45ea-0853-4856-2bc1e661a5a5@intel.com>
References: <20200907180058.64880-1-gerald.schaefer@linux.ibm.com>
    <20200907180058.64880-2-gerald.schaefer@linux.ibm.com>
    <0dbc6ec8-45ea-0853-4856-2bc1e661a5a5@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 8 Sep 2020 07:30:50 -0700
Dave Hansen wrote:

> On 9/7/20 11:00 AM, Gerald Schaefer wrote:
> > Commit 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast
> > code") introduced a subtle but severe bug on s390 with gup_fast, due to
> > dynamic page table folding.
>
> Would it be fair to say that the "fake" page table entries s390
> allocates on the stack are what's causing the trouble here?  That might
> be a nice thing to open up with here.  "Dynamic page table folding"
> really means nothing to me.

We do not really allocate anything on the stack. It is the generic logic
in gup_fast that passes on pXd values (read once before) and then uses
pointers to such (stack) variables instead of real pXd pointers. That is
combined with the fact that we just return the passed-in pointer in
pXd_offset() for folded levels.

That works similarly on x86 IIUC, but with static folding, and thus also
with proper pXd_addr_end() results because of the statically (and
correspondingly) defined PxD_INDEX/SHIFT. We always have static 5-level
PxD_INDEX/SHIFT, and that cannot really be made dynamic, so we just make
pXd_addr_end() dynamic instead, and that requires the pXd value to
determine the correct pagetable level.

Still makes my head spin when trying to explain, sorry. It is a very
special s390 oddity, or let's call it "feature", because I don't think any
other architecture has a "dynamic pagetable folding" capability, depending
on process requirements, for whatever it is worth...

>
> > @@ -2521,7 +2521,7 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> >  	do {
> >  		pmd_t pmd = READ_ONCE(*pmdp);
> >
> > -		next = pmd_addr_end(addr, end);
> > +		next = pmd_addr_end_folded(pmd, addr, end);
> >  		if (!pmd_present(pmd))
> >  			return 0;
>
> It looks like you fix this up later, but this would be a problem if left
> this way.  There's no documentation for whether I use
> pmd_addr_end_folded() or pmd_addr_end() when writing a page table walker.

Yes, that is very unfortunate. We did have a lengthy comment in
include/linux/pgtable.h where the pXd_addr_end(_folded) helpers were
defined. But that was moved to arch/s390/include/asm/pgtable.h in this
version, probably because we already had the generalization in mind,
where we would not need such an explanation in the common header any more.

So, looking at that comment might help to better understand the issue
that we have with dynamic page table folding and READ_ONCE-style
pagetable walkers.

Thanks for pointing that out; that comment should definitely go into
include/linux/pgtable.h again, at least if we still go for the
"s390 fix first, generalization second" approach. But it seems we have
other / better options now.
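
To make the pXd_addr_end() point above a bit more concrete, here is a
minimal user-space sketch, not kernel code. It assumes s390-like shifts
(31, 42 and 53 bits) and a hypothetical "enum table_type" standing in for
the table type that a real entry carries; the function names are made up
for illustration. The boundary arithmetic follows the generic
pXd_addr_end() macros. It only shows why a statically sized top-level step
differs from one that is chosen from the entry, when the process really
uses a 3-level table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the table type found in a top-level entry. */
enum table_type { TYPE_REGION1, TYPE_REGION2, TYPE_REGION3 };

/* Assumed s390-like shifts; the exact values are illustrative only. */
#define REGION3_SHIFT 31 /* one entry maps 2 GiB */
#define REGION2_SHIFT 42
#define REGION1_SHIFT 53 /* static top-level shift with 5 levels */

/* Same boundary arithmetic as the generic pXd_addr_end() macros. */
static uint64_t addr_end(uint64_t addr, uint64_t end, unsigned int shift)
{
	uint64_t size, boundary;

	assert(shift < 64);
	size = UINT64_C(1) << shift;
	boundary = (addr + size) & ~(size - 1);

	/* The "- 1" keeps the comparison correct if end wraps to 0. */
	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Static variant: always steps by the 5-level top-level size. */
static uint64_t pgd_addr_end_static(uint64_t addr, uint64_t end)
{
	return addr_end(addr, end, REGION1_SHIFT);
}

/* Dynamic variant: the step depends on what the entry really is. */
static uint64_t pgd_addr_end_dynamic(enum table_type type,
				     uint64_t addr, uint64_t end)
{
	static const unsigned int shift[] = {
		[TYPE_REGION1] = REGION1_SHIFT,
		[TYPE_REGION2] = REGION2_SHIFT,
		[TYPE_REGION3] = REGION3_SHIFT,
	};

	return addr_end(addr, end, shift[type]);
}
```

For a range that crosses the 2 GiB boundary, e.g. 0x7ffff000 to
0x80001000, the static variant returns end, so the top-level loop never
advances its (real) table pointer and the stepping is left to a lower
level that only holds a pointer to a stack copy. The dynamic variant
called with TYPE_REGION3 returns 0x80000000, so the step happens at the
level that actually owns the table.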