Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp3019312ybi; Sun, 26 May 2019 12:38:24 -0700 (PDT) X-Google-Smtp-Source: APXvYqyfo2dI9YOUNT19XaTau2Jzhnd2q2gQjSjSz/xeK6qWqOthpzRAa6R1EBoALkj3iyvh8Yp7 X-Received: by 2002:a17:902:f082:: with SMTP id go2mr113667303plb.279.1558899504409; Sun, 26 May 2019 12:38:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1558899504; cv=none; d=google.com; s=arc-20160816; b=IY2xz8yuTILuPHi8j00T1Nt7xUHixwU6W55ZBR4ttMF/4hiNbFFaAv57x131GntJpX xlCXBq/DZ2kcai9VTWb/qoN5NGy7oD05wXtl9XgrMBfm2sgSRF2EmlvhFgL5ZFmfUJrP Wn+at+jCH+1Wf0sIWfhy2rp7Sox5eYi7kEWFjrNBwntH9fDDhg9iRJY4IBwuyFmGVD8I LeM/def2dG8ZnZt5cA1Nph6uPOptEnNXSJAwEUWjbpP+3fhKkGFNsZ90ErgkxY/JDAAh yRr0+MviRLCptRJSUnZA2pCgS2rUdM+m8MUs23N4Wj0qR1fErywwOHJaTtBb59CRyQCd MTXQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=4bAi0oeaArOPLn6scQaVFxy1qRC8dBasz9hYyGZ31EY=; b=oIbuwOufH/cu7UPVK8rGOeHFsYB6ocpH3sRI5wfExoi0SDS+WqsACSJd7whblGWr3n sVv08Hx2sDrXiB27OvekCzKemdn7H7IwbcKYlWEwJRVXM9IfGNd+DJVqGKCnWwueTS1i tRCId94DfWAWAdDuy5qN23ALP0mFKJQjNQjYq58w+qAAv7RPW394o31lzRSlSk5n/ZXz dQonDTVeHG1CFmC45QYDWmxkk8XQciK/eXkkR2zaWWuNXazFuBg9eWQ/JhY8zenxuj+g ByrDNpi40P05WAhH//jCySQUqNB78fWFHsOX50y2fZVnKZ8nS/xv+cAWolxSj1T2gVHZ esfQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d65si14351028pgc.330.2019.05.26.12.38.08; Sun, 26 May 2019 12:38:24 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727184AbfEZThA convert rfc822-to-8bit (ORCPT + 99 others); Sun, 26 May 2019 15:37:00 -0400 Received: from Galois.linutronix.de ([146.0.238.70]:46462 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726994AbfEZThA (ORCPT ); Sun, 26 May 2019 15:37:00 -0400 Received: from bigeasy by Galois.linutronix.de with local (Exim 4.80) (envelope-from ) id 1hUywt-00040n-BE; Sun, 26 May 2019 21:36:51 +0200 Date: Sun, 26 May 2019 21:36:51 +0200 From: Sebastian Andrzej Siewior To: Hugh Dickins Cc: Andrew Morton , Mike Rapoport , Andrea Arcangeli , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Borislav Petkov , Pavel Machek , Dave Hansen Subject: Re: [PATCH] mm/gup: continue VM_FAULT_RETRY processing event for pre-faults Message-ID: <20190526193651.spvm2vtrwxlhsjrv@linutronix.de> References: <1557844195-18882-1-git-send-email-rppt@linux.ibm.com> <20190522122113.a2edc8aba32f0fad189bae21@linux-foundation.org> <20190522194322.5k52docwgp5zkdcj@linutronix.de> <20190525084546.fap2wkefepeia22f@linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8BIT In-Reply-To: User-Agent: NeoMutt/20180716 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2019-05-25 11:09:15 [-0700], Hugh Dickins wrote: > On Sat, 25 May 2019, Sebastian Andrzej Siewior wrote: > > On 2019-05-24 15:22:51 [-0700], Hugh Dickins wrote: > > > I've now run a couple of hours of load successfully with Mike's patch > > > to GUP, no problem; but whatever the merits of that patch in general, > > > I agree with Andrew that fault_in_pages_writeable() seems altogether > > > more appropriate for copy_fpstate_to_sigframe(), and have now run a > > > couple of hours of load successfully with this instead (rewrite to taste): > > > > so this patch instead of Mike's GUP patch fixes the issue you observed? > > Yes. > > > Is this just a taste question or limitation of the function in general? > > I'd say it's just a taste question. Though the the fact that your > usage showed up a bug in the get_user_pages_unlocked() implementation, > demanding a fix, does indicate that it's a more fragile and complex > route, better avoided if there's a good simple alternative. If it were > not already on your slowpath, I'd also argue fault_in_pages_writeable() > is a more efficient way to do it. Okay. The GUP functions are not properly documented for my taste. There is no indication whether or not the mm_sem has to be acquired prior invoking it. Following the call chain of get_user_pages() I ended up in __get_user_pages_locked() `locked = NULL' indicated that mm_sem is no acquired and then I saw this: | if (!locked) | /* VM_FAULT_RETRY couldn't trigger, bypass */ | return ret; kind of suggesting that it is okay to invoke it without holding the mm_sem prefault. It passed a few tests and then https://lkml.kernel.org/r/1556657902.6132.13.camel@lca.pw happened. After that, I switched to the locked variant and the problem disappeared (also I noticed that MPX code is invoked within ->mmap()). > > I'm asking because it has been suggested and is used in MPX code (in the > > signal path but .mmap) and I'm not aware of any limitation. But as I > > wrote earlier to akpm, if the MM folks suggest to use this instead I am > > happy to switch. > > I know nothing of MPX, beyond that Dave Hansen has posted patches to > remove that support entirely, so I'm surprised arch/x86/mm/mpx.c is > still in the tree. I need to poke at that. I has been removed but then KVM folks complained that they kind of depend on that if it has been exposed to the guest. We need to fade it out slowly… > But peering at it now, it looks as if it's using > get_user_pages() while holding mmap_sem, whereas you (sensibly enough) > used get_user_pages_unlocked() to handle the mmap_sem for you - > the trouble with that is that since it knows it's in control of > mmap_sem, it feels free to drop it internally, and that takes it > down the path of the premature return when pages NULL that Mike is > fixing. MPX's get_user_pages() is not free to go that way. oki. > > > --- 5.2-rc1/arch/x86/kernel/fpu/signal.c > > > +++ linux/arch/x86/kernel/fpu/signal.c > > > @@ -3,6 +3,7 @@ > > > * FPU signal frame handling routines. > > > */ > > > > > > +#include > > > #include > > > #include > > > > > > @@ -189,15 +190,7 @@ retry: > > > fpregs_unlock(); > > > > > > if (ret) { > > > - int aligned_size; > > > - int nr_pages; > > > - > > > - aligned_size = offset_in_page(buf_fx) + fpu_user_xstate_size; > > > - nr_pages = DIV_ROUND_UP(aligned_size, PAGE_SIZE); > > > - > > > - ret = get_user_pages_unlocked((unsigned long)buf_fx, nr_pages, > > > - NULL, FOLL_WRITE); > > > - if (ret == nr_pages) > > > + if (!fault_in_pages_writeable(buf_fx, fpu_user_xstate_size)) > > > goto retry; > > > return -EFAULT; > > > } > > > > > > (I did wonder whether there needs to be an access_ok() check on buf_fx; > > > but if so, then I think it would already have been needed before the > > > earlier copy_fpregs_to_sigframe(); but I didn't get deep enough into > > > that to be sure, nor into whether access_ok() check on buf covers buf_fx.) > > > > There is an access_ok() at the begin of copy_fpregs_to_sigframe(). The > > memory is allocated from user's stack and there is (later) an > > access_ok() for the whole region (which can be more than the memory used > > by the FPU code). > > Yes, but remember I know nothing of this FPU signal code, so I cannot > tell whether an access_ok(buf, size) is good enough to cover the range > of an access_ok(buf_fx, fpu_user_xstate_size). yes, because size >= fpu_user_xstate_size > Your "(later)" worries me a little - I hope you're not writing first > and checking the limits later; but what you're doing may be perfectly > correct, I'm just too far from understanding the details to say; but > raised the matter because (I think) get_user_pages_unlocked() would > entail an access_ok() check where fault_in_pages_writable() would not. no, we first check the range and then write. It is later checked again after the size has been extended. > Hugh Sebastian