Date: Fri, 15 Jun 2018 07:35:07 -0400 (EDT)
From: Mikulas Patocka <mpatocka@redhat.com>
To: Michal Hocko
cc: jing xia, Mike Snitzer, agk@redhat.com, dm-devel@redhat.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: dm bufio: Reduce dm_bufio_lock contention
In-Reply-To: <20180615073201.GB24039@dhcp22.suse.cz>
References: <1528790608-19557-1-git-send-email-jing.xia@unisoc.com>
 <20180612212007.GA22717@redhat.com> <20180614073153.GB9371@dhcp22.suse.cz>
 <20180615073201.GB24039@dhcp22.suse.cz>
User-Agent: Alpine 2.02 (LRH 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Fri, 15 Jun 2018, Michal Hocko wrote:

> On Thu 14-06-18 14:34:06, Mikulas Patocka wrote:
> >
> >
> > On Thu, 14 Jun 2018, Michal Hocko wrote:
> >
> > > On Thu 14-06-18 15:18:58, jing xia wrote:
> > > [...]
> > > > PID: 22920  TASK: ffffffc0120f1a00  CPU: 1  COMMAND: "kworker/u8:2"
> > > >  #0 [ffffffc0282af3d0] __switch_to at ffffff8008085e48
> > > >  #1 [ffffffc0282af3f0] __schedule at ffffff8008850cc8
> > > >  #2 [ffffffc0282af450] schedule at ffffff8008850f4c
> > > >  #3 [ffffffc0282af470] schedule_timeout at ffffff8008853a0c
> > > >  #4 [ffffffc0282af520] schedule_timeout_uninterruptible at ffffff8008853aa8
> > > >  #5 [ffffffc0282af530] wait_iff_congested at ffffff8008181b40
> > >
> > > This trace doesn't provide the full picture unfortunately. Waiting in
> > > the direct reclaim means that the underlying bdi is congested. The real
> > > question is why it doesn't flush IO in time.
> >
> > I pointed this out two years ago and you just refused to fix it:
> > http://lkml.iu.edu/hypermail/linux/kernel/1608.1/04507.html
>
> Let me be evil again and let me quote the old discussion:
> : > I agree that mempool_alloc should _primarily_ sleep on their own
> : > throttling mechanism. I am not questioning that. I am just saying that
> : > the page allocator has its own throttling which it relies on and that
> : > cannot be just ignored because that might have other undesirable side
> : > effects. So if the right approach is really to never throttle certain
> : > requests then we have to bail out from a congested nodes/zones as soon
> : > as the congestion is detected.
> : >
> : > Now, I would like to see that something like that is _really_ necessary.
> :
> : Currently, it is not a problem - device mapper reports the device as
> : congested only if the underlying physical disks are congested.
> :
> : But once we change it so that device mapper reports congested state on its
> : own (when it has too many bios in progress), this starts being a problem.
>
> So has this changed since then? If yes then we can think of a proper
> solution but that would require to actually describe why we see the
> congestion, why it does help to wait on the caller rather than the
> allocator etc...
Device mapper doesn't report congested state - but something else does
(perhaps the user inserted a cheap slow USB stick or SD card?). And device
mapper is just a victim of that. Why should device mapper sleep because
some other random block device is congested?

> Throwing statements like ...
>
> > I'm sure you'll come up with another creative excuse why GFP_NORETRY
> > allocations need incur deliberate 100ms delays in block device drivers.
>
> ... is not really productive. I've tried to explain why I am not _sure_ what
> possible side effects such a change might have and your hand waving
> didn't really convince me. MD is not the only user of the page
> allocator...
>
> E.g. why has 41c73a49df31 ("dm bufio: drop the lock when doing GFP_NOIO
> allocation") even added GFP_NOIO request in the first place when you
> keep retrying and sleep yourself?

Because mempool uses it. Mempool does its allocations with "GFP_NOIO |
__GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN", and so dm-bufio uses
these flags too. dm-bufio is just a big mempool.

If you argue that these flags are incorrect - then fix mempool_alloc.

> The changelog only describes what but
> doesn't explain why. Or did I misread the code and this is not the
> allocation which is stalling due to congestion?

Mikulas
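
PS: to make the pattern concrete, here is a minimal sketch of the
"try opportunistically, then retry and sleep yourself" allocation that
GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN is meant for.
This is a hypothetical helper, not the actual mempool or dm-bufio code:

/*
 * Hypothetical helper illustrating the pattern under discussion, not the
 * real mempool_alloc or dm-bufio code: make one opportunistic attempt with
 * mempool-style flags, and on failure fall back to the caller's own
 * sleep-and-retry throttling instead of looping inside the page allocator.
 */
#include <linux/slab.h>
#include <linux/delay.h>
#include <linux/gfp.h>

static void *opportunistic_alloc(size_t size)
{
	void *p;

	/*
	 * First attempt: no allocator retry loops, no emergency reserves,
	 * no warning on failure.
	 */
	p = kmalloc(size, GFP_NOIO | __GFP_NORETRY |
			  __GFP_NOMEMALLOC | __GFP_NOWARN);
	if (p)
		return p;

	/*
	 * Fallback: throttle on the caller's side (the way a mempool waits
	 * for elements to be returned), sleeping briefly between attempts.
	 */
	for (;;) {
		p = kmalloc(size, GFP_NOIO | __GFP_NORETRY |
				  __GFP_NOMEMALLOC | __GFP_NOWARN);
		if (p)
			return p;
		msleep(1);
	}
}

The whole point of the first attempt is to fail fast under memory pressure,
so any long congestion wait inside the allocator defeats the purpose of
passing __GFP_NORETRY in the first place.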