References: <20201217185243.3288048-1-pasha.tatashin@soleen.com>
 <20201217185243.3288048-9-pasha.tatashin@soleen.com>
 <20201218104655.GW32193@dhcp22.suse.cz>
In-Reply-To: <20201218104655.GW32193@dhcp22.suse.cz>
From: Pavel Tatashin
Date: Fri, 18 Dec 2020 07:43:15 -0500
Subject: Re: [PATCH v4 08/10] mm/gup: limit number of gup migration failures, honor failures
To: Michal Hocko
Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, David Hildenbrand,
 Oscar Salvador, Dan Williams, Sasha Levin, Tyler Hicks, Joonsoo Kim,
 mike.kravetz@oracle.com, Steven Rostedt, Ingo Molnar, Jason Gunthorpe,
 Peter Zijlstra, Mel Gorman, Matthew Wilcox, David Rientjes, John Hubbard,
 Linux Doc Mailing List, Ira Weiny, linux-kselftest@vger.kernel.org

On Fri, Dec 18, 2020 at 5:46 AM Michal Hocko wrote:
>
> On Thu 17-12-20 13:52:41, Pavel Tatashin wrote:
> [...]
> > +#define PINNABLE_MIGRATE_MAX 10
> > +#define PINNABLE_ISOLATE_MAX 100
>
> Why would we need to limit the isolation retries. Those should always be
> temporary failure unless I am missing something.

Actually, during development I was retrying isolation errors
indefinitely, but during testing I found a hang: when FOLL_TOUCH is
passed without FOLL_WRITE (a fault in the kernel without the write
flag), the zero page is faulted in. Isolation of the zero page failed
every time, so the process hung. I have since fixed this by adding
FOLL_WRITE unconditionally to FOLL_LONGTERM, but I was worried about
other possible bugs that would cause hangs, so I decided to limit
isolation errors. If you think it is not necessary, I can remove the
limit on isolation retries.

> I am not sure about the
> PINNABLE_MIGRATE_MAX either. Why do we want to limit that? migrate_pages
> already implements its retry logic why do you want to count retries on
> top of that? I do agree that the existing logic is suboptimal because

True, but again, just recently I worked on a race bug where pages can
end up on a per-cpu list after lru_add_drain_all() but before
isolation, so I think the retry is necessary.

> the migration failure might be ephemeral or permanent but that should be
> IMHO addressed at migrate_pages (resp. unmap_and_move) and simply report
> failures that are permanent - e.g. any potential pre-existing long term
> pin - if that is possible at all. If not what would cause permanent
> migration failure? OOM?

Yes, OOM is the main cause of migration failures. There are also a few
cases described in the movable zone comment: during boot, memblock may
allocate some pages in the movable zone due to a lack of memory
resources (even if those resources are added later). Hardware page
poisoning is another rare example.

> --
> Michal Hocko
> SUSE Labs
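
For readers following the thread, the bounded-retry idea under
discussion can be illustrated with a small userspace sketch. This is
not the code from the patch: only the two macro names and their values
come from the quoted hunk. The enum, the callback type, the function
names, and the choice of -EBUSY and -ENOMEM as return values are all
assumptions made up for illustration.

```c
#include <errno.h>

/* Limits named after the macros quoted in the thread. */
#define PINNABLE_MIGRATE_MAX 10
#define PINNABLE_ISOLATE_MAX 100

/* Hypothetical outcome of one pin attempt; not a kernel type. */
enum attempt_result {
	ATTEMPT_OK,
	ATTEMPT_ISOLATE_FAILED,
	ATTEMPT_MIGRATE_FAILED,
};

/* Hypothetical callback standing in for one isolate-and-migrate pass. */
typedef enum attempt_result (*attempt_fn)(void *ctx);

/*
 * Retry until an attempt succeeds or one of the two failure budgets is
 * used up. Returns 0 on success. The error codes are assumptions:
 * -EBUSY after too many isolation failures, -ENOMEM after too many
 * migration failures.
 */
int pin_with_limits(attempt_fn attempt, void *ctx)
{
	int isolate_failures = 0;
	int migrate_failures = 0;

	for (;;) {
		switch (attempt(ctx)) {
		case ATTEMPT_OK:
			return 0;
		case ATTEMPT_ISOLATE_FAILED:
			if (++isolate_failures >= PINNABLE_ISOLATE_MAX)
				return -EBUSY;
			break;
		case ATTEMPT_MIGRATE_FAILED:
			if (++migrate_failures >= PINNABLE_MIGRATE_MAX)
				return -ENOMEM;
			break;
		}
	}
}

/* Models a page whose isolation never succeeds; ctx counts the calls. */
enum attempt_result always_fails_isolation(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return ATTEMPT_ISOLATE_FAILED;
}

/* Models a transient migration failure; ctx holds failures remaining. */
enum attempt_result migrate_fails_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return ATTEMPT_MIGRATE_FAILED;
	}
	return ATTEMPT_OK;
}
```

With always_fails_isolation() the loop gives up after
PINNABLE_ISOLATE_MAX attempts instead of spinning forever, which is the
behaviour the isolation limit was meant to guarantee. With
migrate_fails_then_ok() a few transient failures are absorbed and the
call still succeeds.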