Date: Mon, 14 Sep 2020 14:52:37 -0700
From: Chris Goldsworthy
To: David Hildenbrand
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org, pratikp@codeaurora.org, pdaly@codeaurora.org, sudaraja@codeaurora.org, iamjoonsoo.kim@lge.com, linux-arm-msm-owner@vger.kernel.org, Vinayak Menon, linux-kernel-owner@vger.kernel.org
Subject: Re: [PATCH v2] mm: cma: indefinitely retry allocations in cma_alloc
In-Reply-To: <72ae0f361df527cf70946992e4ab1eb3@codeaurora.org>
Message-ID: <57119844135c2b3ac5d075d077cd8c8e@codeaurora.org>

On 2020-09-14 11:33, Chris Goldsworthy wrote:
> On 2020-09-14 02:31, David Hildenbrand wrote:
>> What about long-term pinnings? IIRC, that can happen easily e.g., with
>> vfio (and I remember there is a way via vmsplice).
>>
>> Not convinced trying forever is a sane approach in the general case
>> ...
>
> Hi David,
>
> I've botched the threading, so there are discussions with respect to
> the previous patch-set that are missing on this thread, which I will
> summarize below:
>
> V1:
> [1] https://lkml.org/lkml/2020/8/5/1097
> [2] https://lkml.org/lkml/2020/8/6/1040
> [3] https://lkml.org/lkml/2020/8/11/893
> [4] https://lkml.org/lkml/2020/8/21/1490
> [5] https://lkml.org/lkml/2020/9/11/1072
>
> The version of the patch in [1] featured a finite number of retries,
> which has been stable for our kernels. In [2], Andrew questioned
> whether we could actually find a way of solving the problem on the
> grounds that doing a finite number of retries doesn't actually fix the
> problem (more importantly, in [4] Andrew indicated that he would
> prefer not to merge the patch as it doesn't solve the issue). In [3],
> I suggest one actual fix for this, which is to use
> preempt_disable/enable() to prevent context switches from occurring
> during the periods in copy_one_pte() and exit_mmap() (I forgot to
> mention this case in the commit text) in which _refcount > _mapcount
> for a page - you would also need to prevent interrupts from occurring
> too if we were to fully prevent the issue from occurring. I think this
> would be acceptable for the copy_one_pte() case, since there
> _refcount > _mapcount only for a short time. For the exit_mmap() case,
> however, _refcount is greater than _mapcount whilst the page-tables
> are being torn down for a process - that could be too long for
> disabling preemption / interrupts.
>
> So, in [4], Andrew asks about two alternatives to see if they're
> viable: (1) acquiring locks on the exit_mmap path and migration paths,
> (2) retrying indefinitely. In [5], I discuss how using locks could
> increase the time it takes to perform a CMA allocation, an increase
> that a retry approach would avoid.
> I'm also uncertain about how the locking scheme could be implemented
> effectively without introducing a new per-page lock that will be used
> specifically to solve this issue, and I'm not sure this would be
> accepted.
>
> We're fine with doing indefinite retries, on the grounds that if there
> is some long-term pinning that occurs when alloc_contig_range returns
> -EBUSY, it should be debugged and fixed. Would it be possible to
> make this infinite-retrying something that could be enabled or
> disabled by a defconfig option?
>
> Thanks,
>
> Chris.

Actually, if we were willing to have a defconfig option for enabling /
disabling indefinite retries on the return of -EBUSY, would it be
possible to re-structure the patch to allow either (1) indefinite
retrying, or (2) doing a fixed number of retries (as some people might
want to tolerate CMA allocation failures in favor of making progress)?

-- 
The Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
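
To make the two modes concrete, here is a rough userspace sketch of the
retry policy being discussed. It is not the patch and not kernel code:
fake_alloc_contig_range(), struct fake_range, alloc_with_retry() and
RETRY_FOREVER are all hypothetical names made up for illustration, with
the stub standing in for alloc_contig_range() returning -EBUSY while a
page is transiently pinned. A single limit parameter selects between
(1) indefinite retrying and (2) a fixed number of retries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical knob; stands in for the proposed defconfig option. */
#define RETRY_FOREVER (-1)

/* Stub for alloc_contig_range(): fails with -EBUSY until the
 * simulated transient pin has been dropped. */
struct fake_range {
	int busy_calls_left; /* how many more calls return -EBUSY */
	int calls;           /* total calls made, for inspection */
};

static int fake_alloc_contig_range(struct fake_range *r)
{
	r->calls++;
	if (r->busy_calls_left > 0) {
		r->busy_calls_left--;
		return -EBUSY;
	}
	return 0;
}

/*
 * Retry on -EBUSY only. max_retries == RETRY_FOREVER retries
 * indefinitely; max_retries >= 0 allows that many retries after the
 * first attempt. Any other result is returned immediately.
 */
static int alloc_with_retry(struct fake_range *r, int max_retries)
{
	int retries = 0;
	int ret;

	assert(r != NULL);
	assert(max_retries >= RETRY_FOREVER);

	for (;;) {
		ret = fake_alloc_contig_range(r);
		if (ret != -EBUSY)
			return ret;
		if (max_retries != RETRY_FOREVER && retries >= max_retries)
			return -EBUSY; /* caller tolerates the failure */
		retries++;
		/* a real implementation would sleep or reschedule here */
	}
}
```

With RETRY_FOREVER the loop only terminates once the pin goes away,
which is exactly the long-term pinning concern David raised; with a
finite limit the caller gets -EBUSY back and can make progress some
other way.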