Received: by 2002:a25:23cc:0:0:0:0:0 with SMTP id j195csp756003ybj; Tue, 5 May 2020 07:07:49 -0700 (PDT) X-Google-Smtp-Source: APiQypKMj//CTrQ3XqLVaBB5WQEuDlOOmzpWVecp1sM5pPy1x1OKFAUxCfFzpfcksxt9xiOxDpJz X-Received: by 2002:a50:d0d5:: with SMTP id g21mr2914431edf.92.1588687669277; Tue, 05 May 2020 07:07:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588687669; cv=none; d=google.com; s=arc-20160816; b=HryUR2/W++IZ3U9n8MW3dBsDo+co+h0rwzCmLxaT9wWRJbHYnHS5KdUd90KP3/L1BY q9zhpj/7tgcWqllDWOi/Q+5P81n1UR/nd3pevMg+OgtLGBxxo03VnK2YZS8GDqN7OcVH BqYiAe9vwQlcHcgo+mITQMwJgelyP6Xr79dCdPze7tjoiIXa5AHhfjAZhOyMwF8LNX42 IzobEFz43vjROi/a302ePY7W0LkXRGTJhxCxaZeHW+3RREQFDlSm8O1g6dqLBM8loYw6 DG35EY9zV+66mzWEvcfLRRLI54h0A607FEa67BAv6iVIgKzSTDcSdeVS26PV1DJo0ePY PH7w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:autocrypt:references:cc:to:from:subject; bh=Oo1Q1Zs+hr2Ku3atI2VVROpCn0BYPwVSDHvRVqXhLOk=; b=SiV2OPtWWdAUIrZDwaQlUhgRHBgo8mZjB9DMY+J/U+gHUGc7DkZubXId3GoTrhvpTc ug3AMCy7w/XfNLFRKffgVDK3X76XDjuhfKgMC2JQSQl4xevEURWad8KjxXlUyBlFEQ1z PqjlUgga+AIqBxszuLwhW5D2leswd3tGjenKISL2yBD3lOirr3PaExIqn1MR/4fvNfri MvoOEpnf/Gd50HVa9CtlhG5jP9h68wjGGgToXGKLwCO1zDP8JDXqBSAITkrrXP1asQiu Gb0sixP0+r7f9LPllzkXzlMJPs03dw2mOxue1YP9cthRmpmIifwIASTLUK3iT/oNqbRH 1ezw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id r3si1316119edc.405.2020.05.05.07.07.20; Tue, 05 May 2020 07:07:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729479AbgEEOEL (ORCPT + 99 others); Tue, 5 May 2020 10:04:11 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:60850 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728932AbgEEOEJ (ORCPT ); Tue, 5 May 2020 10:04:09 -0400 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 045E3dOg071480; Tue, 5 May 2020 10:03:57 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 30u8sgsyya-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 05 May 2020 10:03:57 -0400 Received: from m0098410.ppops.net (m0098410.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 045E3uaf073628; Tue, 5 May 2020 10:03:57 -0400 Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com with ESMTP id 30u8sgsybh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 05 May 2020 10:03:56 -0400 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.16.0.27/8.16.0.27) with SMTP id 045E0B3j017794; Tue, 5 May 2020 14:03:06 GMT Received: from b06cxnps4076.portsmouth.uk.ibm.com (d06relay13.portsmouth.uk.ibm.com [9.149.109.198]) by ppma02fra.de.ibm.com with ESMTP id 30s0g5aw53-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 05 May 2020 14:03:06 +0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 045E31PZ57868530 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 5 May 2020 14:03:01 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B427011C05C; Tue, 5 May 2020 14:03:01 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A9A7311C050; Tue, 5 May 2020 14:03:00 +0000 (GMT) Received: from oc7455500831.ibm.com (unknown [9.145.49.139]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 5 May 2020 14:03:00 +0000 (GMT) Subject: Re: [PATCH v2 1/1] fs/splice: add missing callback for inaccessible pages From: Christian Borntraeger To: Ulrich Weigand , Dave Hansen Cc: Claudio Imbrenda , viro@zeniv.linux.org.uk, david@redhat.com, akpm@linux-foundation.org, aarcange@redhat.com, linux-mm@kvack.org, frankja@linux.ibm.com, sfr@canb.auug.org.au, jhubbard@nvidia.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, jack@suse.cz, kirill@shutemov.name, peterz@infradead.org, sean.j.christopherson@intel.com, Ulrich.Weigand@de.ibm.com References: <20200430143825.3534128-1-imbrenda@linux.ibm.com> <1a3f5107-9847-73d4-5059-c6ef9d293551@de.ibm.com> <3d379d9e-241c-ef3b-dcef-20fdd3b8740d@de.ibm.com> <20200504134154.GA21001@oc3748833570.ibm.com> <231da2f1-a6ef-0cf9-7f57-95e8b925997b@intel.com> <20200505135556.GA9920@oc3748833570.ibm.com> Autocrypt: addr=borntraeger@de.ibm.com; prefer-encrypt=mutual; keydata= xsFNBE6cPPgBEAC2VpALY0UJjGmgAmavkL/iAdqul2/F9ONz42K6NrwmT+SI9CylKHIX+fdf J34pLNJDmDVEdeb+brtpwC9JEZOLVE0nb+SR83CsAINJYKG3V1b3Kfs0hydseYKsBYqJTN2j CmUXDYq9J7uOyQQ7TNVoQejmpp5ifR4EzwIFfmYDekxRVZDJygD0wL/EzUr8Je3/j548NLyL 4Uhv6CIPf3TY3/aLVKXdxz/ntbLgMcfZsDoHgDk3lY3r1iwbWwEM2+eYRdSZaR4VD+JRD7p8 0FBadNwWnBce1fmQp3EklodGi5y7TNZ/CKdJ+jRPAAnw7SINhSd7PhJMruDAJaUlbYaIm23A +82g+IGe4z9tRGQ9TAflezVMhT5J3ccu6cpIjjvwDlbxucSmtVi5VtPAMTLmfjYp7VY2Tgr+ T92v7+V96jAfE3Zy2nq52e8RDdUo/F6faxcumdl+aLhhKLXgrozpoe2nL0Nyc2uqFjkjwXXI OBQiaqGeWtxeKJP+O8MIpjyGuHUGzvjNx5S/592TQO3phpT5IFWfMgbu4OreZ9yekDhf7Cvn /fkYsiLDz9W6Clihd/xlpm79+jlhm4E3xBPiQOPCZowmHjx57mXVAypOP2Eu+i2nyQrkapaY IdisDQfWPdNeHNOiPnPS3+GhVlPcqSJAIWnuO7Ofw1ZVOyg/jwARAQABzUNDaHJpc3RpYW4g Qm9ybnRyYWVnZXIgKDJuZCBJQk0gYWRkcmVzcykgPGJvcm50cmFlZ2VyQGxpbnV4LmlibS5j b20+wsF5BBMBAgAjBQJdP/hMAhsDBwsJCAcDAgEGFQgCCQoLBBYCAwECHgECF4AACgkQEXu8 gLWmHHy/pA/+JHjpEnd01A0CCyfVnb5fmcOlQ0LdmoKWLWPvU840q65HycCBFTt6V62cDljB kXFFxMNA4y/2wqU0H5/CiL963y3gWIiJsZa4ent+KrHl5GK1nIgbbesfJyA7JqlB0w/E/SuY NRQwIWOo/uEvOgXnk/7+rtvBzNaPGoGiiV1LZzeaxBVWrqLtmdi1iulW/0X/AlQPuF9dD1Px hx+0mPjZ8ClLpdSp5d0yfpwgHtM1B7KMuQPQZGFKMXXTUd3ceBUGGczsgIMipZWJukqMJiJj QIMH0IN7XYErEnhf0GCxJ3xAn/J7iFpPFv8sFZTvukntJXSUssONnwiKuld6ttUaFhSuSoQg OFYR5v7pOfinM0FcScPKTkrRsB5iUvpdthLq5qgwdQjmyINt3cb+5aSvBX2nNN135oGOtlb5 tf4dh00kUR8XFHRrFxXx4Dbaw4PKgV3QLIHKEENlqnthH5t0tahDygQPnSucuXbVQEcDZaL9 WgJqlRAAj0pG8M6JNU5+2ftTFXoTcoIUbb0KTOibaO9zHVeGegwAvPLLNlKHiHXcgLX1tkjC DrvE2Z0e2/4q7wgZgn1kbvz7ZHQZB76OM2mjkFu7QNHlRJ2VXJA8tMXyTgBX6kq1cYMmd/Hl OhFrAU3QO1SjCsXA2CDk9MM1471mYB3CTXQuKzXckJnxHkHOwU0ETpw8+AEQAJjyNXvMQdJN t07BIPDtbAQk15FfB0hKuyZVs+0lsjPKBZCamAAexNRk11eVGXK/YrqwjChkk60rt3q5i42u PpNMO9aS8cLPOfVft89Y654Qd3Rs1WRFIQq9xLjdLfHh0i0jMq5Ty+aiddSXpZ7oU6E+ud+X Czs3k5RAnOdW6eV3+v10sUjEGiFNZwzN9Udd6PfKET0J70qjnpY3NuWn5Sp1ZEn6lkq2Zm+G 9G3FlBRVClT30OWeiRHCYB6e6j1x1u/rSU4JiNYjPwSJA8EPKnt1s/Eeq37qXXvk+9DYiHdT PcOa3aNCSbIygD3jyjkg6EV9ZLHibE2R/PMMid9FrqhKh/cwcYn9FrT0FE48/2IBW5mfDpAd YvpawQlRz3XJr2rYZJwMUm1y+49+1ZmDclaF3s9dcz2JvuywNq78z/VsUfGz4Sbxy4ShpNpG REojRcz/xOK+FqNuBk+HoWKw6OxgRzfNleDvScVmbY6cQQZfGx/T7xlgZjl5Mu/2z+ofeoxb vWWM1YCJAT91GFvj29Wvm8OAPN/+SJj8LQazd9uGzVMTz6lFjVtH7YkeW/NZrP6znAwv5P1a DdQfiB5F63AX++NlTiyA+GD/ggfRl68LheSskOcxDwgI5TqmaKtX1/8RkrLpnzO3evzkfJb1 D5qh3wM1t7PZ+JWTluSX8W25ABEBAAHCwV8EGAECAAkFAk6cPPgCGwwACgkQEXu8gLWmHHz8 2w//VjRlX+tKF3szc0lQi4X0t+pf88uIsvR/a1GRZpppQbn1jgE44hgF559K6/yYemcvTR7r 6Xt7cjWGS4wfaR0+pkWV+2dbw8Xi4DI07/fN00NoVEpYUUnOnupBgychtVpxkGqsplJZQpng v6fauZtyEcUK3dLJH3TdVQDLbUcL4qZpzHbsuUnTWsmNmG4Vi0NsEt1xyd/Wuw+0kM/oFEH1 4BN6X9xZcG8GYUbVUd8+bmio8ao8m0tzo4pseDZFo4ncDmlFWU6hHnAVfkAs4tqA6/fl7RLN JuWBiOL/mP5B6HDQT9JsnaRdzqF73FnU2+WrZPjinHPLeE74istVgjbowvsgUqtzjPIG5pOj cAsKoR0M1womzJVRfYauWhYiW/KeECklci4TPBDNx7YhahSUlexfoftltJA8swRshNA/M90/ i9zDo9ySSZHwsGxG06ZOH5/MzG6HpLja7g8NTgA0TD5YaFm/oOnsQVsf2DeAGPS2xNirmknD jaqYefx7yQ7FJXXETd2uVURiDeNEFhVZWb5CiBJM5c6qQMhmkS4VyT7/+raaEGgkEKEgHOWf ZDP8BHfXtszHqI3Fo1F4IKFo/AP8GOFFxMRgbvlAs8z/+rEEaQYjxYJqj08raw6P4LFBqozr nS4h0HDFPrrp1C2EMVYIQrMokWvlFZbCpsdYbBI= Message-ID: Date: Tue, 5 May 2020 16:03:00 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138,18.0.676 definitions=2020-05-05_08:2020-05-04,2020-05-05 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 clxscore=1015 priorityscore=1501 adultscore=0 malwarescore=0 bulkscore=0 lowpriorityscore=0 mlxscore=0 impostorscore=0 phishscore=0 suspectscore=0 mlxlogscore=927 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005050114 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05.05.20 16:01, Christian Borntraeger wrote: > > > On 05.05.20 15:55, Ulrich Weigand wrote: >> On Tue, May 05, 2020 at 05:34:45AM -0700, Dave Hansen wrote: >>> On 5/4/20 6:41 AM, Ulrich Weigand wrote: >>>> You're right that there is no mechanism to prevent new references, >>>> but that's really never been the goal either. We're simply trying >>>> to ensure that no I/O is ever done on a page that is in the "secure" >>>> (or inaccessible) state. To do so, we rely on the assumption that >>>> all code that starts I/O on a page cache page will *first*: >>>> - mark the page as pending I/O by either taking an extra page >>>> count, or by setting the Writeback flag; then: >>>> - call arch_make_page_accessible(); then: >>>> - start I/O; and only after I/O has finished: >>>> - remove the "pending I/O" marker (Writeback and/or extra ref) >>> >>> Let's ignore writeback for a moment because get_page() is the more >>> general case. The locking sequence is: >>> >>> 1. get_page() (or equivalent) "locks out" a page from converting to >>> inaccessbile, >>> 2. followed by a make_page_accessible() guarantees that the page >>> *stays* accessible until >>> 3. I/O is safe in this region >>> 4. put_page(), removes the "lock out", I/O now unsafe >> >> Yes, exactly. >> >>> They key is, though, the get_page() must happen before >>> make_page_accessible() and *every* place that acquires a new reference >>> needs a make_page_accessible(). >> >> Well, sort of: every place that acquires a new reference *and then >> performs I/O* needs a make_page_accessible(). There seem to be a >> lot of plain get_page() calls that aren't related to I/O. >> >>> try_get_page() is obviously one of those "new reference sites" and it >>> only has one call site outisde of the gup code: generic_pipe_buf_get(), >>> which is effectively patched by the patch that started this thread. The >>> fact that this one oddball site _and_ gup are patched now is a good sign. >>> >>> *But*, I still don't know how that could work nicely: >>> >>>> static inline __must_check bool try_get_page(struct page *page) >>>> { >>>> page = compound_head(page); >>>> if (WARN_ON_ONCE(page_ref_count(page) <= 0)) >>>> return false; >>>> page_ref_inc(page); >>>> return true; >>>> } >>> >>> If try_get_page() collides with a freeze_page_refs(), it'll hit the >>> WARN_ON_ONCE(), which is surely there for a good reason. I'm not sure >>> that warning is _actually_ valid since freeze_page_refs() isn't truly a >>> 0 refcount. But, the fact that this hasn't been encountered means that >>> the testing here is potentially lacking. >> >> This is indeed interesting. In particular if you compare try_get_page >> with try_get_compound_head in gup.c, which does instead: >> >> if (WARN_ON_ONCE(page_ref_count(head) < 0)) >> return NULL; >> >> which seems more reasonable to me, given the presence of the >> page_ref_freeze method. So I'm not sure why try_get_page has <= 0. > > > Just looked at > commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 mm: add 'try_get_page()' helper function > > which says: > Also like 'get_page()', you can't use this function unless you already > had a reference to the page. The intent is that you can use this > exactly like get_page(), but in situations where you want to limit the > maximum reference count. > > The code currently does an unconditional WARN_ON_ONCE() if we ever hit > the reference count issues (either zero or negative), as a notification > that the conditional non-increment actually happened. > > If try_get_page must not be called with an existing reference, that means s/not// > that when we call it the page reference is already higher and our freeze > will never succeed. That would imply that we cannot trigger this. No? >