From: David Hildenbrand
Subject: Re: [PATCH v4 08/10] mm/gup: limit number of gup migration failures, honor failures
Date: Fri, 18 Dec 2020 14:04:48 +0100
Message-Id: <1671AFC0-3D06-4C4E-934D-CB6DC0AFE4A1@redhat.com>
To: Pavel Tatashin
Cc: Michal Hocko, LKML, linux-mm, Andrew Morton, Vlastimil Babka,
    David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
    Tyler Hicks, Joonsoo Kim, mike.kravetz@oracle.com, Steven Rostedt,
    Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman,
    Matthew Wilcox, David Rientjes, John Hubbard,
    Linux Doc Mailing List, Ira Weiny, linux-kselftest@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

> On 18.12.2020 at 13:43, Pavel Tatashin wrote:
>
> On Fri, Dec 18, 2020 at 5:46 AM Michal Hocko wrote:
>>
>> On Thu 17-12-20 13:52:41, Pavel Tatashin wrote:
>> [...]
>>> +#define PINNABLE_MIGRATE_MAX 10
>>> +#define PINNABLE_ISOLATE_MAX 100
>>
>> Why would we need to limit the isolation retries? Those should always be
>> a temporary failure unless I am missing something.
>
> Actually, during development, I was retrying isolate errors
> infinitely, but during testing I found a hang: when FOLL_TOUCH
> without FOLL_WRITE is passed (fault in kernel without write flag), the
> zero page is faulted. The isolation of the zero page was failing every
> time, therefore the process was hanging.
>
> Since then, I fixed this problem by adding FOLL_WRITE unconditionally
> to FOLL_LONGTERM, but I was worried about other possible bugs that
> would cause hangs, so I decided to limit isolation errors. If you think
> it is not necessary, I can unlimit isolate retries.
>
>> I am not sure about the
>> PINNABLE_MIGRATE_MAX either. Why do we want to limit that? migrate_pages
>> already implements its retry logic, so why do you want to count retries on
>> top of that? I do agree that the existing logic is suboptimal because
>
> True, but again, just recently, I worked on a race bug where pages can
> end up in a per-cpu list after lru_add_drain_all() but before isolation,
> so I think retry is necessary.
>
>> the migration failure might be ephemeral or permanent but that should be
>> IMHO addressed at migrate_pages (resp. unmap_and_move) and simply report
>> failures that are permanent - e.g. any potential pre-existing long term
>> pin - if that is possible at all. If not, what would cause permanent
>> migration failure? OOM?
>
> Yes, OOM is the main cause for migration failures. There are also a few
> cases described in the movable zone comment, where it is possible during
> boot that some pages are allocated by memblock in the movable zone due to
> lack of memory resources (even if those resources were added later).
> Hardware page poisoning is another rare example.
>

How is concurrent migration handled? Like memory offlining, compaction,
alloc_contig_range() while trying to pin?

>> --
>> Michal Hocko
>> SUSE Labs
>
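
For readers following the thread, the retry limits under discussion can
be pictured with a small userspace sketch. This is not the patch's code
and not a kernel API: only PINNABLE_MIGRATE_MAX and PINNABLE_ISOLATE_MAX
come from the quoted hunk. The enum, the callback type, and every
function name below are hypothetical stand-ins, and the sketch assumes
that each failure kind is counted against its own budget.

```c
#include <stdbool.h>
#include <stddef.h>

/* Limits taken from the quoted patch hunk. */
#define PINNABLE_MIGRATE_MAX 10
#define PINNABLE_ISOLATE_MAX 100

/* Hypothetical outcome of one pin attempt; not a kernel type. */
enum pin_step { PIN_OK, PIN_ISOLATE_FAILED, PIN_MIGRATE_FAILED };

/* Hypothetical callback standing in for one isolate + migrate pass. */
typedef enum pin_step (*pin_attempt_fn)(void *ctx);

/*
 * Retry until the attempt succeeds or one of the two failure budgets
 * is used up. Returns 0 on success, -1 when a budget runs out.
 */
static int pin_with_retry_limits(pin_attempt_fn attempt, void *ctx)
{
	int isolate_failures = 0;
	int migrate_failures = 0;

	for (;;) {
		switch (attempt(ctx)) {
		case PIN_OK:
			return 0;
		case PIN_ISOLATE_FAILED:
			if (++isolate_failures >= PINNABLE_ISOLATE_MAX)
				return -1;
			break;
		case PIN_MIGRATE_FAILED:
			if (++migrate_failures >= PINNABLE_MIGRATE_MAX)
				return -1;
			break;
		}
	}
}

/* Demo state: isolation fails a fixed number of times, then succeeds. */
struct demo_ctx {
	int isolate_failures_left;
	int calls;
};

static enum pin_step demo_attempt(void *ctx)
{
	struct demo_ctx *d = ctx;

	d->calls++;
	if (d->isolate_failures_left > 0) {
		d->isolate_failures_left--;
		return PIN_ISOLATE_FAILED;
	}
	return PIN_OK;
}

/* Models the zero-page case from the thread: isolation never succeeds. */
static enum pin_step always_isolate_fail(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return PIN_ISOLATE_FAILED;
}
```

With an unbounded loop, the always_isolate_fail() case would spin
forever, which is the hang Pavel describes. With the budget, the caller
gets an error after PINNABLE_ISOLATE_MAX attempts. Whether such a cap
belongs in the caller or inside migrate_pages() is the open question in
the thread, and the sketch takes no position on it.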