Date: Fri, 15 Jun 2018 09:32:01 +0200
From: Michal Hocko
To: Mikulas Patocka
Cc: jing xia, Mike Snitzer, agk@redhat.com, dm-devel@redhat.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: dm bufio: Reduce dm_bufio_lock contention
Message-ID: <20180615073201.GB24039@dhcp22.suse.cz>
References: <1528790608-19557-1-git-send-email-jing.xia@unisoc.com>
	<20180612212007.GA22717@redhat.com>
	<20180614073153.GB9371@dhcp22.suse.cz>

On Thu 14-06-18 14:34:06, Mikulas Patocka wrote:
> 
> 
> On Thu, 14 Jun 2018, Michal Hocko wrote:
> 
> > On Thu 14-06-18 15:18:58, jing xia wrote:
> > [...]
> > > PID: 22920  TASK: ffffffc0120f1a00  CPU: 1  COMMAND: "kworker/u8:2"
> > >  #0 [ffffffc0282af3d0] __switch_to at ffffff8008085e48
> > >  #1 [ffffffc0282af3f0] __schedule at ffffff8008850cc8
> > >  #2 [ffffffc0282af450] schedule at ffffff8008850f4c
> > >  #3 [ffffffc0282af470] schedule_timeout at ffffff8008853a0c
> > >  #4 [ffffffc0282af520] schedule_timeout_uninterruptible at ffffff8008853aa8
> > >  #5 [ffffffc0282af530] wait_iff_congested at ffffff8008181b40
> > 
> > This trace doesn't provide the full picture unfortunately. Waiting in
> > the direct reclaim means that the underlying bdi is congested. The real
> > question is why it doesn't flush IO in time.
> 
> I pointed this out two years ago and you just refused to fix it:
> http://lkml.iu.edu/hypermail/linux/kernel/1608.1/04507.html

Let me be evil again and quote the old discussion:

: > I agree that mempool_alloc should _primarily_ sleep on their own
: > throttling mechanism. I am not questioning that. I am just saying that
: > the page allocator has its own throttling which it relies on and that
: > cannot be just ignored because that might have other undesirable side
: > effects. So if the right approach is really to never throttle certain
: > requests then we have to bail out from a congested nodes/zones as soon
: > as the congestion is detected.
: > 
: > Now, I would like to see that something like that is _really_ necessary.
: 
: Currently, it is not a problem - device mapper reports the device as
: congested only if the underlying physical disks are congested.
: 
: But once we change it so that device mapper reports congested state on its
: own (when it has too many bios in progress), this starts being a problem.

So has this changed since then? If so, we can think of a proper solution,
but that would require actually describing why we see the congestion, why
it helps to wait in the caller rather than the allocator, etc. Throwing
statements like ...
> I'm sure you'll come up with another creative excuse why GFP_NORETRY
> allocations need incur deliberate 100ms delays in block device drivers.

... is not really productive. I've tried to explain why I am not _sure_
what side effects such a change might have, and your hand waving didn't
really convince me. MD is not the only user of the page allocator...

E.g. why did 41c73a49df31 ("dm bufio: drop the lock when doing GFP_NOIO
allocation") add a GFP_NOIO request in the first place when you keep
retrying and sleeping yourself? The changelog only describes what it does
but doesn't explain why. Or did I misread the code, and this is not the
allocation which is stalling due to congestion?
-- 
Michal Hocko
SUSE Labs
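The caller-side throttling that mempool_alloc is described as using in the
quoted discussion — attempt a non-sleeping allocation, and on failure wait on
the caller's own mechanism rather than inside page-allocator reclaim — can be
sketched in userspace C. This is a minimal illustration of the pattern only,
not kernel code: try_alloc_nowait, the failure counter standing in for a
congested node, and the retry limit are all invented for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-in for a reclaim-constrained allocator: fails the
 * first few calls, the way a GFP_NOWAIT attempt fails under memory
 * pressure, then succeeds once "pressure" has eased. */
static int fail_count = 3;

static void *try_alloc_nowait(size_t size)
{
	if (fail_count > 0) {
		fail_count--;
		return NULL;	/* non-sleeping attempt failed */
	}
	return malloc(size);
}

/* Caller-side throttling in the style mempool_alloc is described as
 * using: the allocator itself never sleeps; on failure the caller
 * retries on its own schedule (in a real mempool it would wait on its
 * own wait queue, e.g. until an in-flight element is freed, instead of
 * stalling in direct reclaim or wait_iff_congested). */
static void *alloc_throttled(size_t size, int max_retries, int *retries_out)
{
	int retries = 0;
	void *p;

	while ((p = try_alloc_nowait(size)) == NULL) {
		if (++retries > max_retries)
			return NULL;	/* give up; caller handles it */
		/* caller's own throttling/wait would go here */
	}
	if (retries_out)
		*retries_out = retries;
	return p;
}
```

The point of contention in the thread is exactly where that waiting happens:
in this sketch the delay is entirely under the caller's control, whereas a
sleeping GFP_NOIO allocation hands the throttling decision to the page
allocator.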