Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp1603785pxu; Sat, 12 Dec 2020 19:53:04 -0800 (PST) X-Google-Smtp-Source: ABdhPJxu+IXLcsgzoqmxf2dA4MadcRxg1xj/+Y2TX+DkPNuUNXfJ49z33fv3Od/6L26e8ndMcK5U X-Received: by 2002:a05:6402:1d15:: with SMTP id dg21mr18748773edb.280.1607831583875; Sat, 12 Dec 2020 19:53:03 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1607831583; cv=none; d=google.com; s=arc-20160816; b=TuInPAXng+DDdy00kbWx9MkOmeQ28NreFVbaTuCwYImDbeFMtXXt1wtT7lq7O0MNEy N/NgId3hHKxhCWtELVjeK1EyceoCoXcHZMRpWlzg3Va3MeLPFGKYBYTNuqqCyPponiix 76WxXAgIIdVmfBgAe0CDnWRk450TJUItsBX8sfjPZ6eEGZ96hZnLkOCmCY07qK2GO4x7 vK1GeA5xcvKjVoUcatgRehmOpO5XiMx7MpITGbNHY9ozMTx4+FobcScZ/lHhwqFXGGPA XWmFiuXjXkYIXXG4GW7nL5iZv+l5ZD85BEBEPmze58LMezJzUCUN5jZRa3r14Y812p3J M73w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:to:in-reply-to:cc:references:message-id:date :subject:mime-version:from:content-transfer-encoding:dkim-signature; bh=2ofz0LLzLgTKfndVa7Y4jwNU81NgrxPa3g7jb7sTuuQ=; b=ZTtTrU9AUbU5ZSGA7/UwvTKjC/n4S7XV4co9alBlKJUauxeEI5DyYOubbSLV0IJTB1 uYmjGAhGkJsQ1tZC0OzyBsXWzBce2/KIA1HGkBDdB7wPIu8sGm+mBVGxw7pG9XxN4QcF nB88PY3UTImBFeoYyQ0/G1/pFM1Oc3GUN5sWBqKJRxl2sP0g0O1LR5xYnD5MqWafQYl3 7v9kK6JocQj+4oMJqzNhmAHM9KteBF/nM41npmFgx1y8loF8xvl7x2G8eZki+NC7uxAY I1RVaHoP1ABOokAEm1wVV/sNTMM/ufXCXRD294YxBvnx2cYR27c4CDCzbsFFU4NfOKRm BlrA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Dpio/tDj"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id u9si7587799edd.26.2020.12.12.19.52.41; Sat, 12 Dec 2020 19:53:03 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Dpio/tDj"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2393068AbgLKVy4 (ORCPT + 99 others); Fri, 11 Dec 2020 16:54:56 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:26922 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391893AbgLKVyc (ORCPT ); Fri, 11 Dec 2020 16:54:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1607723585; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2ofz0LLzLgTKfndVa7Y4jwNU81NgrxPa3g7jb7sTuuQ=; b=Dpio/tDjuoOL1vX7rME93SogWBth3GWcxaTKYXmkMN8noQSaNUOK6Im8Q51mIaNlud2/Z5 KSQxkYbwjgNw5poB60UroYC9x0F+gI2OrjrB9PpergdqjT/L4Z6KcsSG3/kD8DzafWzqUs b3OIGAbvfu5INYvG2C3d/lt062xf33s= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-119-it7P4bgoO2yXMV8AfUrFMw-1; Fri, 11 Dec 2020 16:53:03 -0500 X-MC-Unique: it7P4bgoO2yXMV8AfUrFMw-1 Received: by mail-wm1-f69.google.com with SMTP id s130so2928726wme.0 for ; Fri, 11 Dec 2020 13:53:03 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:content-transfer-encoding:from:mime-version :subject:date:message-id:references:cc:in-reply-to:to; bh=2ofz0LLzLgTKfndVa7Y4jwNU81NgrxPa3g7jb7sTuuQ=; b=foJTHphzhymJaF8FVn5/54v20JIrpDiJHKAVd6h1Heb5ET4BKygYX7J2M78jvJJ7mi FYLj4brRpVCiJaLYCWL3rwZUWvy28CSP/1YVuls9B9GTccUC0KJ3IUHooT2JUwtMLuqe NunCZQha3YFsXhtv2sntrcSqtrYDc2N8ZYRAqDGQvWrwL8XV0GBaSUT5qGsYAuTCW+2S 4bajDwEe/9O3m2FzscPYZLO+ZbK0xQGi+Dw6UaS8ENMvllXbstg4nFDlxG4cP2Smd7EP yk4Cwkihy9KkFVHoPKrbKybzodRYzgMB/ugLclLxxE00mXw7wZye/OYcYldhA+R17Cw8 Y2Aw== X-Gm-Message-State: AOAM532obK/xTHQVxAiRSZ+L4VVgezsg8bgwbT6mwV5wK5Liz7bLcRfz XhASVO69dGFkII9VzXROt0t1oGDU1D3Uou332zw1MYxOrH2o3guJ0P+jxuWezK9DeXhFGb52IRd uMwskDWvse7Suu8kN3aDcFbjH X-Received: by 2002:a1c:4156:: with SMTP id o83mr15520315wma.178.1607723582221; Fri, 11 Dec 2020 13:53:02 -0800 (PST) X-Received: by 2002:a1c:4156:: with SMTP id o83mr15520295wma.178.1607723582002; Fri, 11 Dec 2020 13:53:02 -0800 (PST) Received: from [192.168.3.114] (p4ff23c7c.dip0.t-ipconnect.de. [79.242.60.124]) by smtp.gmail.com with ESMTPSA id z3sm17572565wrn.59.2020.12.11.13.53.01 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 11 Dec 2020 13:53:01 -0800 (PST) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable From: David Hildenbrand Mime-Version: 1.0 (1.0) Subject: Re: [PATCH v3 5/6] mm/gup: migrate pinned pages out of movable zone Date: Fri, 11 Dec 2020 22:53:00 +0100 Message-Id: <447A41F3-EB94-4DA4-8B98-038B127774A5@redhat.com> References: Cc: David Hildenbrand , Jason Gunthorpe , LKML , linux-mm , Andrew Morton , Vlastimil Babka , Michal Hocko , Oscar Salvador , Dan Williams , Sasha Levin , Tyler Hicks , Joonsoo Kim , mike.kravetz@oracle.com, Steven Rostedt , Ingo Molnar , Peter Zijlstra , Mel Gorman , Matthew Wilcox , David Rientjes , John Hubbard , Linux Doc Mailing List In-Reply-To: To: Pavel Tatashin X-Mailer: iPhone Mail (18B92) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > Am 11.12.2020 um 22:36 schrieb Pavel Tatashin := >=20 > =EF=BB=BFOn Fri, Dec 11, 2020 at 4:29 PM David Hildenbrand wrote: >>=20 >>=20 >>>> Am 11.12.2020 um 22:09 schrieb Pavel Tatashin : >>>=20 >>> =EF=BB=BFOn Fri, Dec 11, 2020 at 3:46 PM Jason Gunthorpe w= rote: >>>>=20 >>>>> On Fri, Dec 11, 2020 at 03:40:57PM -0500, Pavel Tatashin wrote: >>>>> On Fri, Dec 11, 2020 at 3:23 PM Jason Gunthorpe wrote: >>>>>>=20 >>>>>> On Fri, Dec 11, 2020 at 03:21:39PM -0500, Pavel Tatashin wrote: >>>>>>> @@ -1593,7 +1592,7 @@ static long check_and_migrate_cma_pages(struct= mm_struct *mm, >>>>>>> } >>>>>>>=20 >>>>>>> if (!isolate_lru_page(head)) { >>>>>>> - list_add_tail(&head->lru, &cma= _page_list); >>>>>>> + list_add_tail(&head->lru, &mov= able_page_list); >>>>>>> mod_node_page_state(page_pgdat(h= ead), >>>>>>> NR_ISOLATED_= ANON + >>>>>>> page_is_file= _lru(head), >>>>>>> @@ -1605,7 +1604,7 @@ static long check_and_migrate_cma_pages(struct= mm_struct *mm, >>>>>>> i +=3D step; >>>>>>> } >>>>>>>=20 >>>>>>> - if (!list_empty(&cma_page_list)) { >>>>>>> + if (!list_empty(&movable_page_list)) { >>>>>>=20 >>>>>> You didn't answer my earlier question, is it OK that ZONE_MOVABLE >>>>>> pages leak out here if ioslate_lru_page() fails but the >>>>>> moval_page_list is empty? >>>>>>=20 >>>>>> I think the answer is no, right? >>>>> In my opinion it is OK. We are doing our best to not pin movable >>>>> pages, but if isolate_lru_page() fails because pages are currently >>>>> locked by someone else, we will end up long-term pinning them. >>>>> See comment in this patch: >>>>> + * 1. Pinned pages: (long-term) pinning of movable pages is av= oided >>>>> + * when pages are pinned and faulted, but it is still possi= ble that >>>>> + * address space already has pages in ZONE_MOVABLE at the t= ime when >>>>> + * pages are pinned (i.e. user has touches that memory befo= re >>>>> + * pinning). In such case we try to migrate them to a diffe= rent zone, >>>>> + * but if migration fails the pages can still end-up pinned= in >>>>> + * ZONE_MOVABLE. In such case, memory offlining might retry= a long >>>>> + * time and will only succeed once user application unpins p= ages. >>>>=20 >>>> It is not "retry a long time" it is "might never complete" because >>>> userspace will hold the DMA pin indefinitely. >>>>=20 >>>> Confused what the point of all this is then ?? >>>>=20 >>>> I thought to goal here is to make memory unplug reliable, if you leave >>>> a hole like this then any hostile userspace can block it forever. >>>=20 >>> You are right, I used a wording from the previous comment, and it >>> should be made clear that pin may be forever. Without these patches it >>> is guaranteed that hot-remove will fail if there are pinned pages as >>> ZONE_MOVABLE is actually the first to be searched. Now, it will fail >>> only due to exceptions listed in ZONE_MOVABLE comment: >>>=20 >>> 1. pin + migration/isolation failure >>=20 >> Not sure what that really means. We have short-term pinnings (although we= might have a better term for =E2=80=9Epinning=E2=80=9C here) for example, w= hen a process dies (IIRC). There is a period where pages cannot get migrated= and offlining code has to retry (which might take a while). This still appl= ies after your change - are you referring to that? >>=20 >>> 2. memblock allocation due to limited amount of space for kernelcore >>> 3. memory holes >>> 4. hwpoison >>> 5. Unmovable PG_offline pages (? need to study why this is a scenario). >>=20 >> Virtio-mem is the primary user in this context. >>=20 >>> Do you think we should unconditionally unpin pages, and return error >>> when isolation/migration fails? >>=20 >> I=E2=80=98m not sure what you mean here. Who=E2=80=99s supposed to unpin w= hich pages? >=20 > Hi David, >=20 > When check_and_migrate_movable_pages() is called, the pages are > already pinned. If some of those pages are in movable zone, and we > fail to migrate or isolate them what should we do: proceed, and keep > it as exception of when movable zone can actually have pinned pages or > unpin all pages in the array, and return an error, or unpin only pages > in movable zone, and return an error? >=20 I guess revert what we did (unpin) and return an error. The interesting ques= tion is what can make migration/isolation fail a) out of memory: smells like a zone setup issue. Failures are acceptable I g= uess. b) short term pinnings: process dying - not relevant I guess. Other cases? (= Fork?) c) ? Once we clarified that, we actually know how likely it will be to return an e= rror (and making vfio pinnings fail etc). > Pasha