Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp1434457pxb; Thu, 4 Mar 2021 11:09:14 -0800 (PST) X-Google-Smtp-Source: ABdhPJzV+LjfS9C/3lhXrm01IBgQxRt42LsEkBqVm5yaOwFRfX12UN2+fbFuXrMJ3wEjRGZwLEzd X-Received: by 2002:a17:906:b747:: with SMTP id fx7mr6120063ejb.474.1614884953936; Thu, 04 Mar 2021 11:09:13 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614884953; cv=none; d=google.com; s=arc-20160816; b=GHMRutmeEK+mu/K+wWZRi86i6RAzZ2MmQkyfuqXVU0p7/rrJVY3vvGjc0MJVfh49Qt vPTEYDjg9gAH1pA9xanWySvzX/W5tKlnHffSg+tX8/oYNc/a8jmRqnLcgzuuLnMeBVZx msm0T9KDE96a+NrMFbT/zdFWvBxu4mrrQInMu8rst60iBmt8uNG7oHltrLppHDOeY0w9 lZipNH56ihaFceZ9i31wT9c8ceKib/ZTodDkj8KcS+siVR8qwRIHEb3jaOBlqZPxljk5 f7M05X6dFD+ywjQpVi3NLzFDAblpcwOHE1WXHhRTVTOQePb0NubykIAuzuueyhpfRFxJ 8+QQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:sender:dkim-signature; bh=SvoG8PH6dFacNFIYLiSSp4aCzcuSGoW8XP7i139LeKE=; b=LFPNQbLiObyhlejojuahhPwK+h6eTEwNe5JwR2EJX/blkhlUx13lfFb9lH6tiTJxJs eCp65pE/nptRonaPwa6znVRZPmRX2nL+6g3geNDUIESUzolsEpzv045sJApvRU63fr8F jDGOseeLu/g6utsQ4WvBYN9Nv5oMyPZwEEEW2UZHl6i7DnwdLo0gudKxXvXjkl8tRwnX FyzeOBAK+i+Us0pDP4xJQ/oyJ0xMe6JqulaKPbsJvYvREx0W5PDz1yCJn8+YgZQEaHDH 5OWCE8dYBvSI5PeWbU0G7L3pft41v+6EHdQjVEqsB/UDyXo/WjmUTAYI3QUqyoDW6OES L9pA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=RhMBTJO7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id o11si29718eju.241.2021.03.04.11.08.50; Thu, 04 Mar 2021 11:09:13 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=RhMBTJO7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233710AbhCDSM2 (ORCPT + 99 others); Thu, 4 Mar 2021 13:12:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54258 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232910AbhCDSMT (ORCPT ); Thu, 4 Mar 2021 13:12:19 -0500 Received: from mail-pf1-x430.google.com (mail-pf1-x430.google.com [IPv6:2607:f8b0:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 44E81C061574 for ; Thu, 4 Mar 2021 10:11:39 -0800 (PST) Received: by mail-pf1-x430.google.com with SMTP id t29so19504635pfg.11 for ; Thu, 04 Mar 2021 10:11:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=SvoG8PH6dFacNFIYLiSSp4aCzcuSGoW8XP7i139LeKE=; b=RhMBTJO7wwKdRPvYiCjrVMyO9I4FmWrIsLlUGIu9mc/tHqdN9miLpof0Gu6ZHfySPF +RaVuaa8vvwjEp5Be80jlEiGIkkdnrc0WxpG4qNHM3i1R8tqMZse2lMtQ78bZGJxAWHX 62868iG/7zBLZNvw1/7izUu7tssTkR7DIIUhCpFLOR49XD0J4WLmlOxAoUHRw9H09WKH PryaSYvAb+Qkj46mMbHj3fGhhBRWV68sUNrbauUM9QN7FlnH/yGq9528/pzWffY4w1sx FFexn3oqShC2mxFpQS6bYkDUALT57scwUxA4bfFvnO0MwA3zdkXxl7NZLEhRXGjZPyme nZRQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to; bh=SvoG8PH6dFacNFIYLiSSp4aCzcuSGoW8XP7i139LeKE=; b=OznLLTaUxyXJCLjWawRzd1UnqAV+cYotgk3GR2do2LqPk/PGKey4+n8LidWSMvu/dK 9Hhy0wYlzhMvHdwyHVzVpGocksMcedRUkkhcU4odsnAm3u5wLgSmF/33oBPzkEykE4mq k1LbdPn95XOAFvHFuD3CFJCCUVpiWob6IybiuHm7mKiBYF4actS6yo5W9d0ysAZPM54q NbY8b4NOEw8YNYhEIs6LOQ9afzX/hKiW5hLNV7uzmzTgdeWy/pV+0nLFy+rh+BjCXNkn 6+k7IC07JlbqK6aAA8+c2t/itA56/SQXFbdGNbgc9uUWBtMZ/jGcdeEYogeo9lsEaebI J5Kg== X-Gm-Message-State: AOAM5336EEfxCOhtKU6ZhI+vBfS7zLA8FL14hM3/2X+k9W4V3xsnIvNk T3qd7EPIPCbPdtnL/XrK0szTf/umEHA= X-Received: by 2002:a63:c343:: with SMTP id e3mr4746948pgd.8.1614881498786; Thu, 04 Mar 2021 10:11:38 -0800 (PST) Received: from google.com ([2620:15c:211:201:edb1:8010:5c27:a8cc]) by smtp.gmail.com with ESMTPSA id j9sm10398865pjn.32.2021.03.04.10.11.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Mar 2021 10:11:37 -0800 (PST) Sender: Minchan Kim Date: Thu, 4 Mar 2021 10:11:35 -0800 From: Minchan Kim To: David Hildenbrand Cc: Michal Hocko , Andrew Morton , linux-mm , LKML , joaodias@google.com Subject: Re: [PATCH] mm: be more verbose for alloc_contig_range faliures Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 04, 2021 at 06:23:09PM +0100, David Hildenbrand wrote: > > > You want to debug something, so you try triggering it and capturing debug > > > data. There are not that many alloc_contig_range() users such that this > > > would really be an issue to isolate ... > > > > cma_alloc uses alloc_contig_range and cma_alloc has lots of users. > > Even, it is expoerted by dmabuf so any userspace would trigger the > > allocation by their own. Some of them could be tolerant for the failure, > > rest of them could be critical. We should't expect it by limited kernel > > usecase. > > Assume you are debugging allocation failures. You either collect the data > yourself or ask someone to send you that output. You care about any > alloc_contig_range() allocation failures that shouldn't happen, don't you? > > > > > > > > > Strictly speaking: any allocation failure on ZONE_MOVABLE or CMA is > > > problematic (putting aside NORETRY logic and similar aside). So any such > > > page you hit is worth investigating and, therefore, worth getting logged for > > > debugging purposes. > > > > If you believe the every alloc_contig_range failure is problematic > > Every one where we should have guarantees I guess: ZONE_MOVABLE or > MIGRAT_CMA. On ZONE_NORMAL, there are no guarantees. Indeed. > > > and there is no such realy example I menionted above in the world, > > I am happy to put this chunk to support dynamic debugging. > > Okay? > > > > +#if defined(CONFIG_DYNAMIC_DEBUG) || \ > > + (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE)) > > +static DEFINE_RATELIMIT_STATE(alloc_contig_ratelimit_state, > > + DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST); > > +int alloc_contig_ratelimit(void) > > +{ > > + return __ratelimit(&alloc_contig_ratelimit_state); > > +} > > + > > ^ do we need ratelimiting with dynamic debugging enabled? Main argument was debug message flooding. Even though we play with dynamic debugging, the issue never disappear. > > > +void dump_migrate_failure_pages(struct list_head *page_list) > > +{ > > + DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, > > + "migrate failure"); > > + if (DYNAMIC_DEBUG_BRANCH(descriptor) && > > + alloc_contig_ratelimit()) { > > + struct page *page; > > + > > + WARN(1, "failed callstack"); > > + list_for_each_entry(page, page_list, lru) > > + dump_page(page, "migration failure"); > > Are all pages on the list guaranteed to be problematic, or only the first > entry? I assume all. All. > > > + } > > +} > > +#else > > +static inline void dump_migrate_failure_pages(struct list_head *page_list) > > +{ > > +} > > +#endif > > + > > /* [start, end) must belong to a single zone. */ > > static int __alloc_contig_migrate_range(struct compact_control *cc, > > unsigned long start, unsigned long end) > > @@ -8496,6 +8522,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, > > NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE); > > } > > if (ret < 0) { > > + dump_migrate_failure_pages(&cc->migratepages); > > putback_movable_pages(&cc->migratepages); > > return ret; > > } > > > > > > If that's the way dynamic debugging is configured/enabled (still have to > look into it) - yes, that goes into the right direction. As I said above, > you should dump only where we have some kind of guarantees I assume. Sure, let me wait for your review before sending next revision. Thanks for the review!