Date: Tue, 15 Sep 2020 14:14:20 -0300
From: Jason Gunthorpe
To: Vasily Gorbik
Cc: John Hubbard, Linus Torvalds, Gerald Schaefer, Alexander Gordeev,
    Peter Zijlstra, Dave Hansen, LKML, linux-mm, linux-arch,
    Andrew Morton, Russell King, Mike Rapoport, Catalin Marinas,
    Will Deacon, Michael Ellerman, Benjamin Herrenschmidt,
    Paul Mackerras, Jeff Dike, Richard Weinberger, Dave Hansen,
    Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Arnd Bergmann, Andrey Ryabinin, linux-x86, linux-arm, linux-power,
    linux-sparc, linux-um, linux-s390, Heiko Carstens,
    Christian Borntraeger, Claudio Imbrenda
Subject: Re: [PATCH v2] mm/gup: fix gup_fast with dynamic page table folding
Message-ID: <20200915171420.GK1221970@ziepe.ca>
References: <20200911200511.GC1221970@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 11,
2020 at 10:36:43PM +0200, Vasily Gorbik wrote:
> Currently to make sure that every page table entry is read just once
> gup_fast walks perform READ_ONCE and pass pXd value down to the next
> gup_pXd_range function by value e.g.:
>
> static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
>                          unsigned int flags, struct page **pages, int *nr)
> ...
>         pudp = pud_offset(&p4d, addr);
>
> This function passes a reference on that local value copy to pXd_offset,
> and might get the very same pointer in return. This happens when the
> level is folded (on most arches), and that pointer should not be iterated.
>
> On s390 due to the fact that each task might have different 5-, 4- or
> 3-level address translation and hence different levels folded the logic
> is more complex and a non-iterable pointer to a local copy leads to
> severe problems.
>
> Here is an example of what happens with gup_fast on s390, for a task
> with 3-level paging, crossing a 2 GB pud boundary:
>
> // addr = 0x1007ffff000, end = 0x10080001000
> static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
>                          unsigned int flags, struct page **pages, int *nr)
> {
>         unsigned long next;
>         pud_t *pudp;
>
>         // pud_offset returns &p4d itself (a pointer to a value on stack)
>         pudp = pud_offset(&p4d, addr);
>         do {
>                 // on second iteration reading "random" stack value
>                 pud_t pud = READ_ONCE(*pudp);
>
>                 // next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390
>                 next = pud_addr_end(addr, end);
>                 ...
>         } while (pudp++, addr = next, addr != end); // pudp++ iterating over stack
>
>         return 1;
> }
>
> This happens since s390 moved to common gup code with
> commit d1874a0c2805 ("s390/mm: make the pxd_offset functions more robust")
> and commit 1a42010cdc26 ("s390/mm: convert to the generic
> get_user_pages_fast code").
> s390 tried to mimic static level folding by
> changing pXd_offset primitives to always calculate top level page table
> offset in pgd_offset and just return the value passed when pXd_offset
> has to act as folded.
>
> What is crucial for gup_fast and what has been overlooked is
> that PxD_SIZE/MASK and thus pXd_addr_end should also change
> correspondingly. And the latter is not possible with dynamic folding.
>
> To fix the issue in addition to pXd values pass original
> pXdp pointers down to gup_pXd_range functions. And introduce
> pXd_offset_lockless helpers, which take an additional pXd
> entry value parameter. This has already been discussed in
> https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
>
> Cc: # 5.2+
> Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code")
> Reviewed-by: Gerald Schaefer
> Reviewed-by: Alexander Gordeev
> Signed-off-by: Vasily Gorbik
> ---
> v2: added brackets &pgd -> &(pgd)

Reviewed-by: Jason Gunthorpe

Regards,
Jason
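
For illustration only, the idea in the quoted description (pass the original
pXdp pointer down together with the value read once, so that a folded level
yields a pointer that is safe to increment) can be modelled in user space
roughly as below. This is a toy sketch and not the kernel code: toy_entry_t,
toy_table, toy_offset_lockless, toy_addr_end, toy_walk and toy_demo are all
hypothetical names, and the model assumes the lower level is always folded.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model, NOT kernel code. Every name here is hypothetical. */
#define TOY_SHIFT 12UL
#define TOY_SIZE  (1UL << TOY_SHIFT)
#define TOY_MASK  (~(TOY_SIZE - 1))
#define TOY_SLOTS 4

typedef struct { unsigned long val; } toy_entry_t;

/* One real table; with the lower level folded (assumed here), its slots
 * double as the lower-level entries. */
static toy_entry_t toy_table[TOY_SLOTS] = { {10}, {20}, {30}, {40} };

/*
 * Stand-in for a pXd_offset_lockless-style helper on a folded level:
 * it receives both the pointer into the table and the by-value copy,
 * and returns the table pointer, never the address of the copy.
 */
static toy_entry_t *toy_offset_lockless(toy_entry_t *upperp, toy_entry_t upper,
                                        unsigned long addr)
{
    (void)upper; /* a non-folded level would dereference this value */
    (void)addr;  /* ... and index the lower table with this address */
    return upperp;
}

/* Next boundary of the (smaller) lower-level span, clamped to end. */
static unsigned long toy_addr_end(unsigned long addr, unsigned long end)
{
    unsigned long boundary = (addr + TOY_SIZE) & TOY_MASK;

    return boundary < end ? boundary : end;
}

/* Same loop shape as the quoted gup_pud_range(); sums the entries visited. */
static unsigned long toy_walk(toy_entry_t *upperp, toy_entry_t upper,
                              unsigned long addr, unsigned long end)
{
    unsigned long next, sum = 0;
    toy_entry_t *lowerp;

    assert(addr < end);
    lowerp = toy_offset_lockless(upperp, upper, addr);
    do {
        /* Read each entry exactly once, in the spirit of READ_ONCE(). */
        unsigned long v = *(volatile unsigned long *)&lowerp->val;

        sum += v;
        next = toy_addr_end(addr, end);
    } while (lowerp++, addr = next, addr != end); /* steps through toy_table */

    return sum;
}

/* Walk a range that crosses one lower-level boundary (slots 1 and 2). */
unsigned long toy_demo(void)
{
    unsigned long addr = 1 * TOY_SIZE, end = 3 * TOY_SIZE;
    toy_entry_t *upperp = &toy_table[addr >> TOY_SHIFT];
    toy_entry_t upper = *upperp; /* the by-value copy passed down */

    return toy_walk(upperp, upper, addr, end);
}
```

In this model the second loop iteration reads toy_table[2] because lowerp
points into the table. Had the helper returned &upper instead, the increment
would step past a local copy, which is the failure the quoted example shows.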