From: Pavel Tatashin
Date: Thu, 17 Dec 2020 17:02:03 -0500
Subject: Re: [PATCH v4 08/10] mm/gup: limit number of gup migration failures, honor failures
To: Jason Gunthorpe
Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
    David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
    Tyler Hicks, Joonsoo Kim, mike.kravetz@oracle.com, Steven Rostedt,
    Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
    David Rientjes, John Hubbard, Linux Doc Mailing List, Ira Weiny,
    linux-kselftest@vger.kernel.org
In-Reply-To: <20201217205048.GL5487@ziepe.ca>
References: <20201217185243.3288048-1-pasha.tatashin@soleen.com>
    <20201217185243.3288048-9-pasha.tatashin@soleen.com>
    <20201217205048.GL5487@ziepe.ca>

Hi Jason,

Thank you for your comments. My replies below.

On Thu, Dec 17, 2020 at 3:50 PM Jason Gunthorpe wrote:
>
> On Thu, Dec 17, 2020 at 01:52:41PM -0500, Pavel Tatashin wrote:
> > +/*
> > + * Verify that there are no unpinnable (movable) pages, if so return true.
> > + * Otherwise an unpinnable pages is found return false, and unpin all pages.
> > + */
> > +static bool check_and_unpin_pages(unsigned long nr_pages, struct page **pages,
> > +                                  unsigned int gup_flags)
> > +{
> > +        unsigned long i, step;
> > +
> > +        for (i = 0; i < nr_pages; i += step) {
> > +                struct page *head = compound_head(pages[i]);
> > +
> > +                step = compound_nr(head) - (pages[i] - head);
>
> You can't assume that all of a compound head is in the pages array,
> this assumption would only work inside the page walkers if the page
> was found in a PMD or something.

I am not sure I understand your comment. The compound head is not taken
from the pages array, and not assumed to be in it. It is exactly the
same logic that we currently have:
https://soleen.com/source/xref/linux/mm/gup.c?r=a00cda3f#1565

> > +        if (gup_flags & FOLL_PIN) {
> > +                unpin_user_pages(pages, nr_pages);
>
> So we throw everything away? Why? That isn't how the old algorithm worked

It is exactly how the old algorithm worked: if there are pages to be
migrated (i.e. pages that are not pinnable), we unpinned everything.
See here:
https://soleen.com/source/xref/linux/mm/gup.c?r=a00cda3f#1603

If cma_pages_list is not empty, unpin everything. The list is not empty
if we isolated some pages, and we isolated some pages if there are
pages which are not pinnable. Now we do exactly the same thing, but
cleaner, and we handle errors. We must unpin everything because if we
fail, no pages should stay pinned; also, if we migrated some pages, the
pages array must be updated, so we need to call
__get_user_pages_locked() again to re-pin and repopulate the pages
array.
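In case it helps, here is a quick userspace sketch of what that step
expression computes: the distance from pages[i] to the end of its
compound page. This is not the kernel code; struct page,
compound_head() and compound_nr() are simplified stand-ins, and the
example layout is made up.

#include <stdio.h>

/* simplified stand-ins for struct page, compound_head(), compound_nr() */
struct page {
        struct page *head;      /* compound head this page belongs to */
        unsigned long nr;       /* on the head: number of subpages */
};

static struct page *compound_head(struct page *p)
{
        return p->head;
}

static unsigned long compound_nr(struct page *head)
{
        return head->nr;
}

int main(void)
{
        /* mem[0..3] form one order-2 compound page; mem[4], mem[5] are base pages */
        struct page mem[6];
        struct page *pages[4];
        unsigned long i, step, nr_pages = 4;

        for (i = 0; i < 4; i++)
                mem[i].head = &mem[0];
        mem[0].nr = 4;
        mem[4].head = &mem[4];
        mem[4].nr = 1;
        mem[5].head = &mem[5];
        mem[5].nr = 1;

        /* gup happened to start in the middle of the compound page */
        pages[0] = &mem[2];
        pages[1] = &mem[3];
        pages[2] = &mem[4];
        pages[3] = &mem[5];

        for (i = 0; i < nr_pages; i += step) {
                struct page *head = compound_head(pages[i]);

                /* same expression as in the patch */
                step = compound_nr(head) - (pages[i] - head);
                printf("i=%lu step=%lu\n", i, step);
        }
        return 0;
}

With the layout above it prints i=0 step=2, i=2 step=1, i=3 step=1,
i.e. the step only covers the remainder of the current compound page.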
> > > @@ -1654,22 +1664,55 @@ static long __gup_longterm_locked(struct mm_struct *mm, > > struct vm_area_struct **vmas, > > unsigned int gup_flags) > > { > > - unsigned long flags = 0; > > + int migrate_retry = 0; > > + int isolate_retry = 0; > > + unsigned int flags; > > long rc; > > > > - if (gup_flags & FOLL_LONGTERM) > > - flags = memalloc_pin_save(); > > + if (!(gup_flags & FOLL_LONGTERM)) > > + return __get_user_pages_locked(mm, start, nr_pages, pages, vmas, > > + NULL, gup_flags); > > > > - rc = __get_user_pages_locked(mm, start, nr_pages, pages, vmas, NULL, > > - gup_flags); > > + /* > > + * Without FOLL_WRITE fault handler may return zero page, which can > > + * be in a movable zone, and also will fail to isolate during migration, > > + * thus the longterm pin will fail. > > + */ > > + gup_flags &= FOLL_WRITE; > > Is &= what you mean here? |= right? Right, I meant |=. > > Seems like we've ended up in a weird place if FOLL_LONGTERM always > includes FOLL_WRITE. Putting the zero page in ZONE_MOVABLE seems like > a bad idea, no? I am not sure, I just found this problem during testing, and this is the solution I am proposing. I am worried about limiting the zero page to a non movable zone, but let's see what others think about this. > > > + /* > > + * Migration may fail, we retry before giving up. Also, because after > > + * migration pages[] becomes outdated, we unpin and repin all pages > > + * in the range, so pages array is repopulated with new values. > > + * Also, because of this we cannot retry migration failures in a loop > > + * without pinning/unpinnig pages. > > + */ > > The old algorithm made continuous forward progress and only went back > to the first migration point. That is not right, the old code went back to the beginning. Making continuous progress is possible, but we won't see any performance betnefit from it, because migration failures is already exception scenarios where machine is under memory stress. The truth is if we fail to migrate it is unlikely will succeed if we retry right away, so giving some time between retries may be even beneficial. Also with continious progress we need to take care of some corner cases where we need to unpin already succeeded pages in case if forward progress is not possible. Also, adjust pages array, start address etc. > > > + for (; ; ) { > > while (true)? Hm, the same thing? :) > > > + rc = __get_user_pages_locked(mm, start, nr_pages, pages, vmas, > > + NULL, gup_flags); > > > + /* Return if error or if all pages are pinnable */ > > + if (rc <= 0 || check_and_unpin_pages(rc, pages, gup_flags)) > > + break; > > So we sweep the pages list twice now? Yes, but O(N) is the same. No new operation is added. Before we had something like this: while (npages) check if pinnable isolate while (npages) unpin migrate while (npages) pin Now: while(npages) check if pinnable while(npages) unpin while(npages) isolate migrate pin > > > + /* Some pages are not pinnable, migrate them */ > > + rc = migrate_movable_pages(rc, pages); > > + > > + /* > > + * If there is an error, and we tried maximum number of times > > + * bail out. Notice: we return an error code, and all pages are > > + * unpinned > > + */ > > + if (rc < 0 && migrate_retry++ >= PINNABLE_MIGRATE_MAX) { > > + break; > > + } else if (rc > 0 && isolate_retry++ >= PINNABLE_ISOLATE_MAX) { > > + rc = -EBUSY; > > I don't like this at all. It shouldn't be so flakey > > Can you do migration without the LRU? I do not think it is possible, we must isolate pages before migration. Pasha