Date: Wed, 13 Jan 2021 15:55:28 -0400
From: Jason Gunthorpe
To: Pavel Tatashin
Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
    David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
    Tyler Hicks, Joonsoo Kim, mike.kravetz@oracle.com, Steven Rostedt,
    Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
    David Rientjes, John Hubbard, Linux Doc Mailing List, Ira Weiny,
    linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v4 08/10] mm/gup: limit number of gup migration failures, honor failures
Message-ID: <20210113195528.GD4605@ziepe.ca>
References: <20201217185243.3288048-1-pasha.tatashin@soleen.com>
    <20201217185243.3288048-9-pasha.tatashin@soleen.com>
    <20201217205048.GL5487@ziepe.ca>
    <20201218141927.GM5487@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 13, 2021 at 02:43:50PM -0500, Pavel Tatashin wrote:
> On Fri, Dec 18, 2020 at 9:19 AM
> Jason Gunthorpe wrote:
> >
> > On Thu, Dec 17, 2020 at 05:02:03PM -0500, Pavel Tatashin wrote:
> > > Hi Jason,
> > >
> > > Thank you for your comments. My replies below.
> > >
> > > On Thu, Dec 17, 2020 at 3:50 PM Jason Gunthorpe wrote:
> > > >
> > > > On Thu, Dec 17, 2020 at 01:52:41PM -0500, Pavel Tatashin wrote:
> > > > > +/*
> > > > > + * Verify that there are no unpinnable (movable) pages, if so return true.
> > > > > + * Otherwise an unpinnable pages is found return false, and unpin all pages.
> > > > > + */
> > > > > +static bool check_and_unpin_pages(unsigned long nr_pages, struct page **pages,
> > > > > +                                  unsigned int gup_flags)
> > > > > +{
> > > > > +        unsigned long i, step;
> > > > > +
> > > > > +        for (i = 0; i < nr_pages; i += step) {
> > > > > +                struct page *head = compound_head(pages[i]);
> > > > > +
> > > > > +                step = compound_nr(head) - (pages[i] - head);
> > > >
> > > > You can't assume that all of a compound head is in the pages array,
> > > > this assumption would only work inside the page walkers if the page
> > > > was found in a PMD or something.
> > >
> > > I am not sure I understand your comment. The compound head is not
> > > taken from the pages array, and not assumed to be in it. It is exactly
> > > the same logic as that we currently have:
> > > https://soleen.com/source/xref/linux/mm/gup.c?r=a00cda3f#1565
> >
> > Oh, that existing logic is wrong too :( Another bug.
>
> I do not think there is a bug.
>
> > You can't skip pages in the pages[] array under the assumption they
> > are contiguous. ie the i+=step is wrong.
>
> If pages[i] is part of a compound page, the other parts of this page
> must be sequential in this array for this compound page

That is true only if the PMD points to the page. If the PTE points to
a tail page then there is no requirement that other PTEs are
contiguous with the compound page.
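
To make the failure mode concrete, here is a small user-space C sketch.
It is a toy model and not kernel code: struct toy_page,
toy_compound_head(), toy_compound_nr(), the two count_* helpers and
run_scenario() are all hypothetical names invented for illustration.
It assumes a 4-page compound page whose tail page 1 is mapped by a PTE,
followed in pages[] by two unrelated order-0 movable pages. The second
walk is only one possible alternative, not necessarily what the kernel
does or should do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page (hypothetical, not the kernel layout). */
struct toy_page {
	struct toy_page *head;	/* compound head; itself for a head or order-0 page */
	unsigned long nr;	/* pages in the compound page, valid on the head only */
	bool movable;		/* true if the page sits in a movable zone */
};

struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->head;
}

unsigned long toy_compound_nr(struct toy_page *head)
{
	assert(head->head == head);	/* only meaningful on a head page */
	return head->nr;
}

/* Same stepping as the quoted loop: jump over the rest of the compound page. */
unsigned long count_movable_skipping(struct toy_page **pages,
				     unsigned long nr_pages)
{
	unsigned long i, step, found = 0;

	for (i = 0; i < nr_pages; i += step) {
		struct toy_page *head = toy_compound_head(pages[i]);

		step = toy_compound_nr(head) - (unsigned long)(pages[i] - head);
		if (head->movable)
			found++;
	}
	return found;
}

/* Alternative: look at every entry, skipping only repeats of the previous head. */
unsigned long count_movable_every_entry(struct toy_page **pages,
					unsigned long nr_pages)
{
	struct toy_page *prev_head = NULL;
	unsigned long i, found = 0;

	for (i = 0; i < nr_pages; i++) {
		struct toy_page *head = toy_compound_head(pages[i]);

		if (head == prev_head)
			continue;
		prev_head = head;
		if (head->movable)
			found++;
	}
	return found;
}

/*
 * Build pages[] as it could look when a tail page is mapped by a PTE:
 * one tail of a 4-page compound page, then two unrelated movable pages.
 */
void run_scenario(unsigned long *skipping, unsigned long *every_entry)
{
	struct toy_page huge[4];
	struct toy_page small_a, small_b;
	struct toy_page *pages[3];
	size_t k;

	for (k = 0; k < 4; k++) {
		huge[k].head = &huge[0];
		huge[k].nr = (k == 0) ? 4 : 0;
		huge[k].movable = false;
	}
	small_a.head = &small_a; small_a.nr = 1; small_a.movable = true;
	small_b.head = &small_b; small_b.nr = 1; small_b.movable = true;

	pages[0] = &huge[1];
	pages[1] = &small_a;
	pages[2] = &small_b;

	*skipping = count_movable_skipping(pages, 3);
	*every_entry = count_movable_every_entry(pages, 3);
}
```

Calling run_scenario() gives 0 from the skipping walk and 2 from the
per-entry walk. The skipping loop computes step = 4 - 1 = 3 from the
tail page at pages[0] and so never looks at the two movable pages that
follow it.
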
At this point we have no idea if the GUP logic got this compound page
as a head page in a PMD or as a tail page from a PTE, so we can't
assume a contiguous run of addresses.

Look at split_huge_pmd() - it doesn't break up the compound page, it
just converts the PMD to a PTE array and scatters the tail pages to
the PTE.

I understand Matt is pushing on this idea more by having compound
pages in the page cache, but still mapping tail pages when required.

> This is actually standard migration procedure, elsewhere in the kernel
> we migrate pages in exactly the same fashion: isolate and later
> migrate. The isolation works for LRU only pages.

But do other places cause a userspace visible random failure when LRU
isolation fails?

I don't like it at all, what is the user supposed to do?

Jason