From: Joonsoo Kim
Date: Tue, 29 Sep 2020 10:28:05 +0900
Subject: Re: [PATCH v2 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
To: Andrew Morton
Cc: Linux Memory Management List, LKML, Michal Hocko, Vlastimil Babka,
    Aneesh Kumar K.V, Mel Gorman, kernel-team@lge.com, Joonsoo Kim

On Tue, Sep 29, 2020 at 8:52 AM Andrew Morton wrote:
>
> On Mon, 28 Sep 2020 17:50:46 +0900 js1304@gmail.com wrote:
>
> > From: Joonsoo Kim
> >
> > memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> > on CMA area, but, there is a missing case and the page on CMA area could
> > be allocated even if APIs are used. This patch handles this case to fix
> > the potential issue.
> >
> > Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> > could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> > specified.
>
> Changelog doesn't describe the end-user visible effects of the bug.
> Please send this description?

How about this one?

The memalloc_nocma_{save/restore} APIs can be used to skip page
allocation from the CMA area, but there is a missing case: a page from
the CMA area can be allocated even when the APIs are used. This patch
handles that case to fix the potential issue.

Currently, these APIs are used to prevent long-term pinning of CMA
pages. When long-term pinning is requested for a CMA page, the page is
migrated to a non-CMA page before pinning, and that non-CMA page is
allocated under memalloc_nocma_{save/restore}. If the APIs don't work
as intended, a CMA page is allocated and pinned for a long time. Such
a long-term pin on a CMA page causes cma_alloc() failures, which can
result in wrong behaviour in the device drivers that use cma_alloc().

The missing case is an allocation from the pcplist. The MIGRATE_MOVABLE
pcplist can hold pages from the CMA area, so we need to skip it when
ALLOC_CMA isn't specified.

> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3367,9 +3367,16 @@ struct page *rmqueue(struct zone *preferred_zone,
> >       struct page *page;
> >
> >       if (likely(order == 0)) {
> > -             page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> > +             /*
> > +              * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
> > +              * we need to skip it when CMA area isn't allowed.
> > +              */
> > +             if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> > +                             migratetype != MIGRATE_MOVABLE) {
> > +                     page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> >                                       migratetype, alloc_flags);
> > -             goto out;
> > +                     goto out;
> > +             }
> >       }
>
> We still really really don't want to be adding overhead to the page
> allocation hotpath for a really really obscure feature from a single
> callsite.

In fact, if we clear the __GFP_MOVABLE flag when initializing the
allocation context, we could avoid CMA page allocation without adding
any overhead to the page allocation hotpath. But I don't think that is
good practice, since it cheats on the allocation type, and it has a
side effect: we could no longer use memory in ZONE_MOVABLE in that
case. A sketch of this rejected alternative follows below.
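Roughly, the rejected alternative would look like this (a hypothetical
sketch, not a tested patch; it assumes the check is added where the
allocation context is set up, e.g. in prepare_alloc_pages()):

	/*
	 * Hypothetical sketch: strip __GFP_MOVABLE when the task has
	 * entered a memalloc_nocma_save() section.  No check is needed
	 * in the order-0 fast path, but the allocation is no longer
	 * treated as movable and so also loses access to ZONE_MOVABLE.
	 */
	if (unlikely(current->flags & PF_MEMALLOC_NOCMA))
		gfp_mask &= ~__GFP_MOVABLE;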
> Do we have an understanding of how many people's kernels are enabling
> CONFIG_CMA?

AFAIK, almost all embedded systems enable CONFIG_CMA, for example, the
handset, the TV, and the AVN (audio-video-navigation) unit in a car.
Recently, Roman made CMA usable for huge page allocation, so the number
of users is continuously increasing.

> I previously suggested retrying the allocation in
> __gup_longterm_locked() but you said "it cannot ensure that we
> eventually get the non-CMA page". Please explain why?

To avoid allocating a CMA page, the pcplist would have to be empty.
Retrying doesn't guarantee that we hit the case where the pcplist is
empty; it increases the probability of that case, but I think that
relying on probability is not good practice.

> What about manually emptying the pcplists beforehand?

That also only increases the probability. A schedule() or an interrupt
after the emptying but before the allocation could invalidate its
effect.

> Or bypassing the pcplists for this caller and calling __rmqueue()
> directly?

That is what this patch does.

> > @@ -3381,7 +3388,7 @@ struct page *rmqueue(struct zone *preferred_zone,
> >
> >       do {
> >               page = NULL;
> > -             if (alloc_flags & ALLOC_HARDER) {
> > +             if (order > 0 && alloc_flags & ALLOC_HARDER) {
> >                       page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> >                       if (page)
> >                               trace_mm_page_alloc_zone_locked(page, order, migratetype);
>
> What does this hunk do?

MIGRATE_HIGHATOMIC is an area reserved for high-order atomic
allocations. With the first hunk applied, order-0 allocations can now
reach this slow path, and this hunk makes them skip that reserve so
they don't consume it.

Thanks.
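P.S. For anyone not familiar with the API mentioned in the changelog,
the intended usage pattern looks roughly like this (a condensed,
hypothetical helper for illustration only; the real caller is the
long-term pinning path in mm/gup.c):

	/*
	 * Allocate a page that is guaranteed not to come from the CMA
	 * area, e.g. as the migration target for a page that is about
	 * to be pinned long-term.
	 */
	static struct page *alloc_non_cma_page(gfp_t gfp)
	{
		/* sets PF_MEMALLOC_NOCMA, returns the previous state */
		unsigned int flags = memalloc_nocma_save();
		struct page *page = alloc_page(gfp);

		memalloc_nocma_restore(flags);	/* restore previous state */
		return page;
	}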