Received: by 2002:a25:23cc:0:0:0:0:0 with SMTP id j195csp754996ybj; Tue, 5 May 2020 07:06:57 -0700 (PDT) X-Google-Smtp-Source: APiQypLDDZ9Q+JRuPYLq8GQRM4Raakeub0lLLEVELb0Me8Pk+rbyL2l8+PIS87Wk352PuaBA8w7t X-Received: by 2002:a50:950a:: with SMTP id u10mr2866511eda.45.1588687616890; Tue, 05 May 2020 07:06:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588687616; cv=none; d=google.com; s=arc-20160816; b=s1rT8VdvHcRNbC/BTmYdtmbHvQXn3EeXPkxMbGT8eVSboGgxVG7QQQ+WLTMHUlztw7 MDfLFULhfec4rE88nQV5wuV0E7ef4SZH0loWpXiuXtDatYi2Me/boifSvSEGtdmd7/yj 66TQNwNQyeqaxbc7Nw4s/yKAcfp9aWTtz68veT2zizHbCFHpA5xwH8LPe/XPEOZ1s5Kt gbvXufC5u/r8rwM/0nOYxO9CH7zdgqWNTkm8X3h975xOBZw6c/PLdKLghNXTTTQgHcy0 hR+2h2Un3NN/RYNIrr0OmeoSyDvZT9m2JWt3xagJvQy/BK0T3g2MBbnPgvghUJphE5jo K59A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:autocrypt:from:references:cc:to:subject; bh=4C1s2jdU+QYqFnoHD5iEK0bop62PnAyfq1lnV2vs288=; b=y50PaTE2fRXM+/cDO4i2EyOv3/FGGfTq7qxASh0nc3W5TRu+7MHtLe/1ubzVKWkB7L /Tim0+O0Fx542DPgsHGW9kycxOaKAvWCK2grm3QLvwLR86OkQM4iWNPJDvF7cLZKEKt6 lasK+779XrpI67xRNR6sos8evS/IfmZCq3riHEDAqtdVROIXE9a1dK/jAmkQawHWcFcj z76ZJFgs7+tczvyxbirM2sXQzbP43NoUyll+BjOhYJp4ZOf+fgPFlCPv364PlJV/GdXn Bc3O4kKcLNpargKLGtSlQfAIBfFDyoLFuVYjkgizl3yduQU4PaTrONeP0UHgSR3IJy7k Fi4Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bs1si1290179edb.200.2020.05.05.07.06.26; Tue, 05 May 2020 07:06:56 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729257AbgEEOBp (ORCPT + 99 others); Tue, 5 May 2020 10:01:45 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:48100 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728076AbgEEOBo (ORCPT ); Tue, 5 May 2020 10:01:44 -0400 Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 045DXIsX076454; Tue, 5 May 2020 10:01:29 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 30s2g2xvqu-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 05 May 2020 10:01:28 -0400 Received: from m0098416.ppops.net (m0098416.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 045DXVXO077770; Tue, 5 May 2020 10:01:27 -0400 Received: from ppma01fra.de.ibm.com (46.49.7a9f.ip4.static.sl-reverse.com [159.122.73.70]) by mx0b-001b2d01.pphosted.com with ESMTP id 30s2g2xvb9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 05 May 2020 10:01:27 -0400 Received: from pps.filterd (ppma01fra.de.ibm.com [127.0.0.1]) by ppma01fra.de.ibm.com (8.16.0.27/8.16.0.27) with SMTP id 045E0NKp021577; Tue, 5 May 2020 14:01:14 GMT Received: from b06cxnps4076.portsmouth.uk.ibm.com (d06relay13.portsmouth.uk.ibm.com [9.149.109.198]) by ppma01fra.de.ibm.com with ESMTP id 30s0g5aw3t-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 05 May 2020 14:01:14 +0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 045E19rj31326312 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 5 May 2020 14:01:09 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0100311C054; Tue, 5 May 2020 14:01:09 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 12C4611C052; Tue, 5 May 2020 14:01:08 +0000 (GMT) Received: from oc7455500831.ibm.com (unknown [9.145.49.139]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 5 May 2020 14:01:07 +0000 (GMT) Subject: Re: [PATCH v2 1/1] fs/splice: add missing callback for inaccessible pages To: Ulrich Weigand , Dave Hansen Cc: Claudio Imbrenda , viro@zeniv.linux.org.uk, david@redhat.com, akpm@linux-foundation.org, aarcange@redhat.com, linux-mm@kvack.org, frankja@linux.ibm.com, sfr@canb.auug.org.au, jhubbard@nvidia.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, jack@suse.cz, kirill@shutemov.name, peterz@infradead.org, sean.j.christopherson@intel.com, Ulrich.Weigand@de.ibm.com References: <20200430143825.3534128-1-imbrenda@linux.ibm.com> <1a3f5107-9847-73d4-5059-c6ef9d293551@de.ibm.com> <3d379d9e-241c-ef3b-dcef-20fdd3b8740d@de.ibm.com> <20200504134154.GA21001@oc3748833570.ibm.com> <231da2f1-a6ef-0cf9-7f57-95e8b925997b@intel.com> <20200505135556.GA9920@oc3748833570.ibm.com> From: Christian Borntraeger Autocrypt: addr=borntraeger@de.ibm.com; prefer-encrypt=mutual; keydata= xsFNBE6cPPgBEAC2VpALY0UJjGmgAmavkL/iAdqul2/F9ONz42K6NrwmT+SI9CylKHIX+fdf J34pLNJDmDVEdeb+brtpwC9JEZOLVE0nb+SR83CsAINJYKG3V1b3Kfs0hydseYKsBYqJTN2j CmUXDYq9J7uOyQQ7TNVoQejmpp5ifR4EzwIFfmYDekxRVZDJygD0wL/EzUr8Je3/j548NLyL 4Uhv6CIPf3TY3/aLVKXdxz/ntbLgMcfZsDoHgDk3lY3r1iwbWwEM2+eYRdSZaR4VD+JRD7p8 0FBadNwWnBce1fmQp3EklodGi5y7TNZ/CKdJ+jRPAAnw7SINhSd7PhJMruDAJaUlbYaIm23A +82g+IGe4z9tRGQ9TAflezVMhT5J3ccu6cpIjjvwDlbxucSmtVi5VtPAMTLmfjYp7VY2Tgr+ T92v7+V96jAfE3Zy2nq52e8RDdUo/F6faxcumdl+aLhhKLXgrozpoe2nL0Nyc2uqFjkjwXXI OBQiaqGeWtxeKJP+O8MIpjyGuHUGzvjNx5S/592TQO3phpT5IFWfMgbu4OreZ9yekDhf7Cvn /fkYsiLDz9W6Clihd/xlpm79+jlhm4E3xBPiQOPCZowmHjx57mXVAypOP2Eu+i2nyQrkapaY IdisDQfWPdNeHNOiPnPS3+GhVlPcqSJAIWnuO7Ofw1ZVOyg/jwARAQABzUNDaHJpc3RpYW4g Qm9ybnRyYWVnZXIgKDJuZCBJQk0gYWRkcmVzcykgPGJvcm50cmFlZ2VyQGxpbnV4LmlibS5j b20+wsF5BBMBAgAjBQJdP/hMAhsDBwsJCAcDAgEGFQgCCQoLBBYCAwECHgECF4AACgkQEXu8 gLWmHHy/pA/+JHjpEnd01A0CCyfVnb5fmcOlQ0LdmoKWLWPvU840q65HycCBFTt6V62cDljB kXFFxMNA4y/2wqU0H5/CiL963y3gWIiJsZa4ent+KrHl5GK1nIgbbesfJyA7JqlB0w/E/SuY NRQwIWOo/uEvOgXnk/7+rtvBzNaPGoGiiV1LZzeaxBVWrqLtmdi1iulW/0X/AlQPuF9dD1Px hx+0mPjZ8ClLpdSp5d0yfpwgHtM1B7KMuQPQZGFKMXXTUd3ceBUGGczsgIMipZWJukqMJiJj QIMH0IN7XYErEnhf0GCxJ3xAn/J7iFpPFv8sFZTvukntJXSUssONnwiKuld6ttUaFhSuSoQg OFYR5v7pOfinM0FcScPKTkrRsB5iUvpdthLq5qgwdQjmyINt3cb+5aSvBX2nNN135oGOtlb5 tf4dh00kUR8XFHRrFxXx4Dbaw4PKgV3QLIHKEENlqnthH5t0tahDygQPnSucuXbVQEcDZaL9 WgJqlRAAj0pG8M6JNU5+2ftTFXoTcoIUbb0KTOibaO9zHVeGegwAvPLLNlKHiHXcgLX1tkjC DrvE2Z0e2/4q7wgZgn1kbvz7ZHQZB76OM2mjkFu7QNHlRJ2VXJA8tMXyTgBX6kq1cYMmd/Hl OhFrAU3QO1SjCsXA2CDk9MM1471mYB3CTXQuKzXckJnxHkHOwU0ETpw8+AEQAJjyNXvMQdJN t07BIPDtbAQk15FfB0hKuyZVs+0lsjPKBZCamAAexNRk11eVGXK/YrqwjChkk60rt3q5i42u PpNMO9aS8cLPOfVft89Y654Qd3Rs1WRFIQq9xLjdLfHh0i0jMq5Ty+aiddSXpZ7oU6E+ud+X Czs3k5RAnOdW6eV3+v10sUjEGiFNZwzN9Udd6PfKET0J70qjnpY3NuWn5Sp1ZEn6lkq2Zm+G 9G3FlBRVClT30OWeiRHCYB6e6j1x1u/rSU4JiNYjPwSJA8EPKnt1s/Eeq37qXXvk+9DYiHdT PcOa3aNCSbIygD3jyjkg6EV9ZLHibE2R/PMMid9FrqhKh/cwcYn9FrT0FE48/2IBW5mfDpAd YvpawQlRz3XJr2rYZJwMUm1y+49+1ZmDclaF3s9dcz2JvuywNq78z/VsUfGz4Sbxy4ShpNpG REojRcz/xOK+FqNuBk+HoWKw6OxgRzfNleDvScVmbY6cQQZfGx/T7xlgZjl5Mu/2z+ofeoxb vWWM1YCJAT91GFvj29Wvm8OAPN/+SJj8LQazd9uGzVMTz6lFjVtH7YkeW/NZrP6znAwv5P1a DdQfiB5F63AX++NlTiyA+GD/ggfRl68LheSskOcxDwgI5TqmaKtX1/8RkrLpnzO3evzkfJb1 D5qh3wM1t7PZ+JWTluSX8W25ABEBAAHCwV8EGAECAAkFAk6cPPgCGwwACgkQEXu8gLWmHHz8 2w//VjRlX+tKF3szc0lQi4X0t+pf88uIsvR/a1GRZpppQbn1jgE44hgF559K6/yYemcvTR7r 6Xt7cjWGS4wfaR0+pkWV+2dbw8Xi4DI07/fN00NoVEpYUUnOnupBgychtVpxkGqsplJZQpng v6fauZtyEcUK3dLJH3TdVQDLbUcL4qZpzHbsuUnTWsmNmG4Vi0NsEt1xyd/Wuw+0kM/oFEH1 4BN6X9xZcG8GYUbVUd8+bmio8ao8m0tzo4pseDZFo4ncDmlFWU6hHnAVfkAs4tqA6/fl7RLN JuWBiOL/mP5B6HDQT9JsnaRdzqF73FnU2+WrZPjinHPLeE74istVgjbowvsgUqtzjPIG5pOj cAsKoR0M1womzJVRfYauWhYiW/KeECklci4TPBDNx7YhahSUlexfoftltJA8swRshNA/M90/ i9zDo9ySSZHwsGxG06ZOH5/MzG6HpLja7g8NTgA0TD5YaFm/oOnsQVsf2DeAGPS2xNirmknD jaqYefx7yQ7FJXXETd2uVURiDeNEFhVZWb5CiBJM5c6qQMhmkS4VyT7/+raaEGgkEKEgHOWf ZDP8BHfXtszHqI3Fo1F4IKFo/AP8GOFFxMRgbvlAs8z/+rEEaQYjxYJqj08raw6P4LFBqozr nS4h0HDFPrrp1C2EMVYIQrMokWvlFZbCpsdYbBI= Message-ID: Date: Tue, 5 May 2020 16:01:07 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 MIME-Version: 1.0 In-Reply-To: <20200505135556.GA9920@oc3748833570.ibm.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138,18.0.676 definitions=2020-05-05_08:2020-05-04,2020-05-05 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 malwarescore=0 suspectscore=0 bulkscore=0 priorityscore=1501 spamscore=0 lowpriorityscore=0 adultscore=0 phishscore=0 mlxlogscore=959 clxscore=1015 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005050105 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05.05.20 15:55, Ulrich Weigand wrote: > On Tue, May 05, 2020 at 05:34:45AM -0700, Dave Hansen wrote: >> On 5/4/20 6:41 AM, Ulrich Weigand wrote: >>> You're right that there is no mechanism to prevent new references, >>> but that's really never been the goal either. We're simply trying >>> to ensure that no I/O is ever done on a page that is in the "secure" >>> (or inaccessible) state. To do so, we rely on the assumption that >>> all code that starts I/O on a page cache page will *first*: >>> - mark the page as pending I/O by either taking an extra page >>> count, or by setting the Writeback flag; then: >>> - call arch_make_page_accessible(); then: >>> - start I/O; and only after I/O has finished: >>> - remove the "pending I/O" marker (Writeback and/or extra ref) >> >> Let's ignore writeback for a moment because get_page() is the more >> general case. The locking sequence is: >> >> 1. get_page() (or equivalent) "locks out" a page from converting to >> inaccessbile, >> 2. followed by a make_page_accessible() guarantees that the page >> *stays* accessible until >> 3. I/O is safe in this region >> 4. put_page(), removes the "lock out", I/O now unsafe > > Yes, exactly. > >> They key is, though, the get_page() must happen before >> make_page_accessible() and *every* place that acquires a new reference >> needs a make_page_accessible(). > > Well, sort of: every place that acquires a new reference *and then > performs I/O* needs a make_page_accessible(). There seem to be a > lot of plain get_page() calls that aren't related to I/O. > >> try_get_page() is obviously one of those "new reference sites" and it >> only has one call site outisde of the gup code: generic_pipe_buf_get(), >> which is effectively patched by the patch that started this thread. The >> fact that this one oddball site _and_ gup are patched now is a good sign. >> >> *But*, I still don't know how that could work nicely: >> >>> static inline __must_check bool try_get_page(struct page *page) >>> { >>> page = compound_head(page); >>> if (WARN_ON_ONCE(page_ref_count(page) <= 0)) >>> return false; >>> page_ref_inc(page); >>> return true; >>> } >> >> If try_get_page() collides with a freeze_page_refs(), it'll hit the >> WARN_ON_ONCE(), which is surely there for a good reason. I'm not sure >> that warning is _actually_ valid since freeze_page_refs() isn't truly a >> 0 refcount. But, the fact that this hasn't been encountered means that >> the testing here is potentially lacking. > > This is indeed interesting. In particular if you compare try_get_page > with try_get_compound_head in gup.c, which does instead: > > if (WARN_ON_ONCE(page_ref_count(head) < 0)) > return NULL; > > which seems more reasonable to me, given the presence of the > page_ref_freeze method. So I'm not sure why try_get_page has <= 0. Just looked at commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 mm: add 'try_get_page()' helper function which says: Also like 'get_page()', you can't use this function unless you already had a reference to the page. The intent is that you can use this exactly like get_page(), but in situations where you want to limit the maximum reference count. The code currently does an unconditional WARN_ON_ONCE() if we ever hit the reference count issues (either zero or negative), as a notification that the conditional non-increment actually happened. If try_get_page must not be called with an existing reference, that means that when we call it the page reference is already higher and our freeze will never succeed. That would imply that we cannot trigger this. No?