Message-ID: <93480fb1-6992-b992-4c93-0046f3b92d7a@redhat.com>
Date: Thu, 17 Mar 2022 11:55:39 +0100
To: Dong Aisheng, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 dongas86@gmail.com, shawnguo@kernel.org, linux-imx@nxp.com,
 akpm@linux-foundation.org, m.szyprowski@samsung.com,
 lecopzer.chen@mediatek.com, vbabka@suse.cz, stable@vger.kernel.org,
 shijie.qin@nxp.com
References: <20220315144521.3810298-1-aisheng.dong@nxp.com>
 <20220315144521.3810298-2-aisheng.dong@nxp.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
In-Reply-To: <20220315144521.3810298-2-aisheng.dong@nxp.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 15.03.22 15:45, Dong Aisheng wrote:
> When there're multiple process allocing dma memory in parallel

s/allocing/allocating/

> by calling dma_alloc_coherent(), it may fail sometimes as follows:
>
> Error log:
> cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
> cma: number of available pages:
> 3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
> 108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
> 7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages
>
> When issue happened, we saw there were still 33161 pages (129M) free CMA
> memory and a lot available free slots for 148 pages in CMA bitmap that we
> want to allocate.
>
> If dumping memory info, we found that there was also ~342M normal memory,
> but only 1352K CMA memory left in buddy system while a lot of pageblocks
> were isolated.

s/If/When/

>
> Memory info log:
> Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
> active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
> unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
> bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
> Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
> 36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
> 8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
>
> The root cause of this issue is that since commit a4efc174b382
> ("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
> memory allocation. It's possible that the memory range process A trying
> to alloc has already been isolated by the allocation of process B during
> memory migration.
>
> The problem here is that the memory range isolated during one allocation
> by start_isolate_page_range() could be much bigger than the real size we
> want to alloc due to the range is aligned to MAX_ORDER_NR_PAGES.
>
> Taking an ARMv7 platform with 1G memory as an example, when MAX_ORDER_NR_PAGES
> is big (e.g. 32M with max_order 14) and CMA memory is relatively small
> (e.g. 128M), there're only 4 MAX_ORDER slot, then it's very easy that
> all CMA memory may have already been isolated by other processes when
> one trying to allocate memory using dma_alloc_coherent().
> Since current CMA code will only scan one time of whole available CMA
> memory, then dma_alloc_coherent() may easy fail due to contention with
> other processes.
>
> This patch introduces a retry mechanism to rescan CMA bitmap for -EBUSY
> error in case the target memory range may has been temporarily isolated
> by others and released later.

But your patch doesn't check for -EBUSY and instead might retry forever,
on any allocation error, no?

I'd really suggest letting alloc_contig_range() return -EAGAIN in case
the isolation failed, and handling only -EAGAIN in a special way
instead.

In addition, we might want to stop once we have looped too often, I
assume.

-- 
Thanks,

David / dhildenb
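
To make the suggestion above a bit more concrete, here is a rough
userspace sketch of the control flow I have in mind: retry only on
-EAGAIN, return any other error right away, and give up after a bounded
number of rescans. This is not kernel code. alloc_range_fn,
alloc_with_bounded_retry(), CMA_MAX_RETRIES, struct fake_range and
fake_try_alloc() are all hypothetical names made up for illustration,
the callback merely stands in for alloc_contig_range(), and the retry
cap of 5 is an arbitrary assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for alloc_contig_range(): returns 0 on success,
 * -EAGAIN if the range is temporarily isolated by a concurrent
 * allocation, or another negative errno on a hard failure.
 */
typedef int (*alloc_range_fn)(void *ctx);

/* Assumed cap on rescans; a real value would need tuning. */
#define CMA_MAX_RETRIES 5

/*
 * Retry only on -EAGAIN and stop after max_retries rescans.
 * Success and every other error are returned immediately.
 */
static int alloc_with_bounded_retry(alloc_range_fn try_alloc, void *ctx,
				    int max_retries)
{
	int attempt;
	int ret;

	assert(try_alloc != NULL && max_retries >= 0);

	for (attempt = 0; attempt <= max_retries; attempt++) {
		ret = try_alloc(ctx);
		if (ret != -EAGAIN)
			return ret;	/* success or hard failure: no retry */
	}

	return -EBUSY;	/* still contended after all retries */
}

/* Demo state: the range stays isolated for busy_rounds attempts. */
struct fake_range {
	int busy_rounds;
	int calls;
};

/* Demo callback: reports -EAGAIN while isolated, then succeeds. */
static int fake_try_alloc(void *ctx)
{
	struct fake_range *r = ctx;

	r->calls++;
	if (r->busy_rounds > 0) {
		r->busy_rounds--;
		return -EAGAIN;
	}
	return 0;
}
```

With a range that is isolated for two rounds, the helper succeeds on the
third attempt. With one that stays isolated, it stops after
CMA_MAX_RETRIES + 1 attempts and reports -EBUSY instead of looping
forever.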