Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp1157846imm; Mon, 9 Jul 2018 18:47:58 -0700 (PDT) X-Google-Smtp-Source: AAOMgpdTaRVRXeJfznlmX+Uvp7DwUttWlDouW0R1j33pCBGymhqkGcSxZI5+u3NgskJEixYGZAZO X-Received: by 2002:a63:1d5e:: with SMTP id d30-v6mr19915011pgm.12.1531187278324; Mon, 09 Jul 2018 18:47:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1531187278; cv=none; d=google.com; s=arc-20160816; b=Z1BkO9SOTmvArTv1HgLLJPI/c8ZOVJcpsXVKU+TmY8XwrSEnugYcVVcVBAN1yVV8N2 butVC7VyyK1QOAzSOoIThg7vnabLM6rOBbBGk5YQFFMqDjOeII2by7fL+4/+YEESVtl9 zfFFfr9Eo0K1EHxJ7A19iauFF2cGwSCGn5b8WAiWgqi4V6AHGkvpC9MA8hFZIUW+RcrE EiYloUurq3uR+WKi1icMt8c/6btvvP66lhzbPtm/hRuOHaqyHjP1jVS5XBNIGaRDQ2LV 5kJqqE+K4PaAptOQBdMyojb40DuRCZEGUcB0k0u/0W1l+6Id5uIHgyKe2pTe9xRTIbko 33WQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date :arc-authentication-results; bh=1YZgcA3XDcKjrlD21aC/JGlj4nJhJi7S7GYWXVRtIvo=; b=FTDpX0CAGkVRARhHxqtD7zHCb0QBBaPIen2zbbZ5a2IvpaiB6qU/ajBR84eug39BZ4 pz70HfZm2A7KT+O/Q8r2ltRKHM5hSxSbW9Vqd5lBtT5hXP4Zl4kN8zjBc5Q9rEEfBkcO 5umpVFIrR7/t0SkwlXBClRrMGT2bPlaifH+cQ1waFqG9BJD+mUrF3ISw5kSwipdlKbjx FnoEYlwp9jqEU69X7iOqUw1ZTQCMPBBkPNCh9lMgnZIIPn7Ajce8L9awP/0VfTZyr71H 3NhwnkzEARCvBMmuj342p2yQJFh4ofZEXd60ovt75Bdpzq5DHQh12qwdW3XZsVxpx5Xi 04VQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s64-v6si16170121pfg.175.2018.07.09.18.47.43; Mon, 09 Jul 2018 18:47:58 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754554AbeGJBrB (ORCPT + 99 others); Mon, 9 Jul 2018 21:47:01 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:43788 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752112AbeGJBq4 (ORCPT ); Mon, 9 Jul 2018 21:46:56 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C2A574001388; Tue, 10 Jul 2018 01:46:55 +0000 (UTC) Received: from file01.intranet.prod.int.rdu2.redhat.com (file01.intranet.prod.int.rdu2.redhat.com [10.11.5.7]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 981B42156889; Tue, 10 Jul 2018 01:46:55 +0000 (UTC) Received: from file01.intranet.prod.int.rdu2.redhat.com (localhost [127.0.0.1]) by file01.intranet.prod.int.rdu2.redhat.com (8.14.4/8.14.4) with ESMTP id w6A1kta3011752; Mon, 9 Jul 2018 21:46:55 -0400 Received: from localhost (mpatocka@localhost) by file01.intranet.prod.int.rdu2.redhat.com (8.14.4/8.14.4/Submit) with ESMTP id w6A1ksf0011700; Mon, 9 Jul 2018 21:46:55 -0400 X-Authentication-Warning: file01.intranet.prod.int.rdu2.redhat.com: mpatocka owned process doing -bs Date: Mon, 9 Jul 2018 21:46:54 -0400 (EDT) From: Mikulas Patocka X-X-Sender: mpatocka@file01.intranet.prod.int.rdu2.redhat.com To: John Stoffel cc: Jens Axboe , linux-mm@kvack.org, linux-block@vger.kernel.org, Mike Snitzer , linux-kernel@vger.kernel.org, mhocko@kernel.org, dm-devel@redhat.com, xia , "Alasdair G. Kergon" Subject: Re: [dm-devel] [PATCH] mm: set PF_LESS_THROTTLE when allocating memory for i/o In-Reply-To: <23358.29981.262730.50194@quad.stoffel.home> Message-ID: References: <23358.29981.262730.50194@quad.stoffel.home> User-Agent: Alpine 2.02 (LRH 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Tue, 10 Jul 2018 01:46:55 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Tue, 10 Jul 2018 01:46:55 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'mpatocka@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 5 Jul 2018, John Stoffel wrote: > >>>>> "Mikulas" == Mikulas Patocka writes: > > Mikulas> It has been noticed that congestion throttling can slow down > Mikulas> allocations path that participate in the IO and thus help the > Mikulas> memory reclaim. Stalling those allocation is therefore not > Mikulas> productive. Moreover mempool allocator and md variants of the > Mikulas> same already implement their own throttling which has a > Mikulas> better way to be feedback driven. Stalling at the page > Mikulas> allocator is therefore even counterproductive. > > Can you show numbers for this claim? This is bug report where this problem is reported: https://www.redhat.com/archives/dm-devel/2018-June/msg00161.html A process takes dm-bufio lock, allocates memory and memory allocation sleeps in wait_iff_congested. kswapd is attempting to take the dm-bufio lock as well, so sleeping in wait_iff_congested delays kswapd. Delaying kswapd prevents the system from reclaiming memory. > I would think that throttling needs to be done as close to the disk as > possible, and propogate back up the layers to have this all work well, > so that faster devices (and the layers stacked on them) will work better > without stalling on a slow device. The drivers that participate in memory reclaim shouldn't be throttled at all. If you throttle them, you slow down reclaim - and if you slow down reclaim, the congestion will not be resolved. > Mikulas> PF_LESS_THROTTLE is a task flag denoting allocation context that is > Mikulas> participating in the memory reclaim which fits into these IO paths > Mikulas> model, so use the flag and make the page allocator aware they are > Mikulas> special and they do not really want any dirty data throttling. > > Mikulas> The throttling causes stalls on Android - it uses the dm-verity driver > Mikulas> that uses dm-bufio. Allocations in dm-bufio were observed to sleep in > Mikulas> wait_iff_congested repeatedly. > > Mikulas> Signed-off-by: Mikulas Patocka > Mikulas> Acked-by: Michal Hocko # mempool_alloc and bvec_alloc > Mikulas> Cc: stable@vger.kernel.org > > Mikulas> --- > Mikulas> block/bio.c | 4 ++++ > Mikulas> drivers/md/dm-bufio.c | 14 +++++++++++--- > Mikulas> drivers/md/dm-crypt.c | 8 ++++++++ > Mikulas> drivers/md/dm-integrity.c | 4 ++++ > Mikulas> drivers/md/dm-kcopyd.c | 3 +++ > Mikulas> drivers/md/dm-verity-target.c | 4 ++++ > Mikulas> drivers/md/dm-writecache.c | 4 ++++ > Mikulas> mm/mempool.c | 4 ++++ > Mikulas> 8 files changed, 42 insertions(+), 3 deletions(-) > > Mikulas> Index: linux-2.6/mm/mempool.c > Mikulas> =================================================================== > Mikulas> --- linux-2.6.orig/mm/mempool.c 2018-06-29 03:47:16.290000000 +0200 > Mikulas> +++ linux-2.6/mm/mempool.c 2018-06-29 03:47:16.270000000 +0200 > Mikulas> @@ -369,6 +369,7 @@ void *mempool_alloc(mempool_t *pool, gfp > Mikulas> unsigned long flags; > Mikulas> wait_queue_entry_t wait; > Mikulas> gfp_t gfp_temp; > Mikulas> + unsigned old_flags; > > Mikulas> VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO); > Mikulas> might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM); > Mikulas> @@ -381,7 +382,10 @@ void *mempool_alloc(mempool_t *pool, gfp > > Mikulas> repeat_alloc: > > Mikulas> + old_flags = current->flags & PF_LESS_THROTTLE; > Mikulas> + current->flags |= PF_LESS_THROTTLE; > Mikulas> element = pool->alloc(gfp_temp, pool->pool_data); > Mikulas> + current_restore_flags(old_flags, PF_LESS_THROTTLE); > Mikulas> if (likely(element != NULL)) > Mikulas> return element; > > Mikulas> Index: linux-2.6/block/bio.c > Mikulas> =================================================================== > Mikulas> --- linux-2.6.orig/block/bio.c 2018-06-29 03:47:16.290000000 +0200 > Mikulas> +++ linux-2.6/block/bio.c 2018-06-29 03:47:16.270000000 +0200 > Mikulas> @@ -217,6 +217,7 @@ fallback: > Mikulas> } else { > Mikulas> struct biovec_slab *bvs = bvec_slabs + *idx; > Mikulas> gfp_t __gfp_mask = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_IO); > Mikulas> + unsigned old_flags; > > Mikulas> /* > Mikulas> * Make this allocation restricted and don't dump info on > Mikulas> @@ -229,7 +230,10 @@ fallback: > Mikulas> * Try a slab allocation. If this fails and __GFP_DIRECT_RECLAIM > Mikulas> * is set, retry with the 1-entry mempool > Mikulas> */ > Mikulas> + old_flags = current->flags & PF_LESS_THROTTLE; > Mikulas> + current->flags |= PF_LESS_THROTTLE; > Mikulas> bvl = kmem_cache_alloc(bvs->slab, __gfp_mask); > Mikulas> + current_restore_flags(old_flags, PF_LESS_THROTTLE); > Mikulas> if (unlikely(!bvl && (gfp_mask & __GFP_DIRECT_RECLAIM))) { > Mikulas> *idx = BVEC_POOL_MAX; > Mikulas> goto fallback; > Mikulas> Index: linux-2.6/drivers/md/dm-bufio.c > Mikulas> =================================================================== > Mikulas> --- linux-2.6.orig/drivers/md/dm-bufio.c 2018-06-29 03:47:16.290000000 +0200 > Mikulas> +++ linux-2.6/drivers/md/dm-bufio.c 2018-06-29 03:47:16.270000000 +0200 > Mikulas> @@ -356,6 +356,7 @@ static void __cache_size_refresh(void) > Mikulas> static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask, > Mikulas> unsigned char *data_mode) > Mikulas> { > Mikulas> + void *ptr; > Mikulas> if (unlikely(c->slab_cache != NULL)) { > Mikulas> *data_mode = DATA_MODE_SLAB; > Mikulas> return kmem_cache_alloc(c->slab_cache, gfp_mask); > Mikulas> @@ -363,9 +364,14 @@ static void *alloc_buffer_data(struct dm > > Mikulas> if (c->block_size <= KMALLOC_MAX_SIZE && > Mikulas> gfp_mask & __GFP_NORETRY) { > Mikulas> + unsigned old_flags; > Mikulas> *data_mode = DATA_MODE_GET_FREE_PAGES; > Mikulas> - return (void *)__get_free_pages(gfp_mask, > Mikulas> + old_flags = current->flags & PF_LESS_THROTTLE; > Mikulas> + current->flags |= PF_LESS_THROTTLE; > Mikulas> + ptr = (void *)__get_free_pages(gfp_mask, > c-> sectors_per_block_bits - (PAGE_SHIFT - SECTOR_SHIFT)); > Mikulas> + current_restore_flags(old_flags, PF_LESS_THROTTLE); > Mikulas> + return ptr; > Mikulas> } > > Mikulas> *data_mode = DATA_MODE_VMALLOC; > Mikulas> @@ -381,8 +387,10 @@ static void *alloc_buffer_data(struct dm > Mikulas> */ > Mikulas> if (gfp_mask & __GFP_NORETRY) { > Mikulas> unsigned noio_flag = memalloc_noio_save(); > Mikulas> - void *ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL); > Mikulas> - > Mikulas> + unsigned old_flags = current->flags & PF_LESS_THROTTLE; > Mikulas> + current->flags |= PF_LESS_THROTTLE; > Mikulas> + ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL); > Mikulas> + current_restore_flags(old_flags, PF_LESS_THROTTLE); > Mikulas> memalloc_noio_restore(noio_flag); > Mikulas> return ptr; > Mikulas> } > Mikulas> Index: linux-2.6/drivers/md/dm-integrity.c > Mikulas> =================================================================== > Mikulas> --- linux-2.6.orig/drivers/md/dm-integrity.c 2018-06-29 03:47:16.290000000 +0200 > Mikulas> +++ linux-2.6/drivers/md/dm-integrity.c 2018-06-29 03:47:16.270000000 +0200 > Mikulas> @@ -1318,6 +1318,7 @@ static void integrity_metadata(struct wo > Mikulas> int r; > > Mikulas> if (ic->internal_hash) { > Mikulas> + unsigned old_flags; > Mikulas> struct bvec_iter iter; > Mikulas> struct bio_vec bv; > Mikulas> unsigned digest_size = crypto_shash_digestsize(ic->internal_hash); > Mikulas> @@ -1331,8 +1332,11 @@ static void integrity_metadata(struct wo > Mikulas> if (unlikely(ic->mode == 'R')) > Mikulas> goto skip_io; > > Mikulas> + old_flags = current->flags & PF_LESS_THROTTLE; > Mikulas> + current->flags |= PF_LESS_THROTTLE; > Mikulas> checksums = kmalloc((PAGE_SIZE >> SECTOR_SHIFT >> ic->sb->log2_sectors_per_block) * ic->tag_size + extra_space, > Mikulas> GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN); > Mikulas> + current_restore_flags(old_flags, PF_LESS_THROTTLE); > Mikulas> if (!checksums) > Mikulas> checksums = checksums_onstack; > > Mikulas> Index: linux-2.6/drivers/md/dm-kcopyd.c > Mikulas> =================================================================== > Mikulas> --- linux-2.6.orig/drivers/md/dm-kcopyd.c 2018-06-29 03:47:16.290000000 +0200 > Mikulas> +++ linux-2.6/drivers/md/dm-kcopyd.c 2018-06-29 03:47:16.270000000 +0200 > Mikulas> @@ -245,7 +245,10 @@ static int kcopyd_get_pages(struct dm_kc > Mikulas> *pages = NULL; > > Mikulas> do { > Mikulas> + unsigned old_flags = current->flags & PF_LESS_THROTTLE; > Mikulas> + current->flags |= PF_LESS_THROTTLE; > Mikulas> pl = alloc_pl(__GFP_NOWARN | __GFP_NORETRY | __GFP_KSWAPD_RECLAIM); > Mikulas> + current_restore_flags(old_flags, PF_LESS_THROTTLE); > Mikulas> if (unlikely(!pl)) { > Mikulas> /* Use reserved pages */ > Mikulas> pl = kc->pages; > Mikulas> Index: linux-2.6/drivers/md/dm-verity-target.c > Mikulas> =================================================================== > Mikulas> --- linux-2.6.orig/drivers/md/dm-verity-target.c 2018-06-29 03:47:16.290000000 +0200 > Mikulas> +++ linux-2.6/drivers/md/dm-verity-target.c 2018-06-29 03:47:16.280000000 +0200 > Mikulas> @@ -596,9 +596,13 @@ no_prefetch_cluster: > Mikulas> static void verity_submit_prefetch(struct dm_verity *v, struct dm_verity_io *io) > Mikulas> { > Mikulas> struct dm_verity_prefetch_work *pw; > Mikulas> + unsigned old_flags; > > Mikulas> + old_flags = current->flags & PF_LESS_THROTTLE; > Mikulas> + current->flags |= PF_LESS_THROTTLE; > Mikulas> pw = kmalloc(sizeof(struct dm_verity_prefetch_work), > Mikulas> GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); > Mikulas> + current_restore_flags(old_flags, PF_LESS_THROTTLE); > > Mikulas> if (!pw) > Mikulas> return; > Mikulas> Index: linux-2.6/drivers/md/dm-writecache.c > Mikulas> =================================================================== > Mikulas> --- linux-2.6.orig/drivers/md/dm-writecache.c 2018-06-29 03:47:16.290000000 +0200 > Mikulas> +++ linux-2.6/drivers/md/dm-writecache.c 2018-06-29 03:47:16.280000000 +0200 > Mikulas> @@ -1473,6 +1473,7 @@ static void __writecache_writeback_pmem( > Mikulas> unsigned max_pages; > > Mikulas> while (wbl->size) { > Mikulas> + unsigned old_flags; > wbl-> size--; > Mikulas> e = container_of(wbl->list.prev, struct wc_entry, lru); > Mikulas> list_del(&e->lru); > Mikulas> @@ -1486,6 +1487,8 @@ static void __writecache_writeback_pmem( > Mikulas> bio_set_dev(&wb->bio, wc->dev->bdev); > wb-> bio.bi_iter.bi_sector = read_original_sector(wc, e); > wb-> page_offset = PAGE_SIZE; > Mikulas> + old_flags = current->flags & PF_LESS_THROTTLE; > Mikulas> + current->flags |= PF_LESS_THROTTLE; > Mikulas> if (max_pages <= WB_LIST_INLINE || > Mikulas> unlikely(!(wb->wc_list = kmalloc(max_pages * sizeof(struct wc_entry *), > Mikulas> GFP_NOIO | __GFP_NORETRY | > Mikulas> @@ -1493,6 +1496,7 @@ static void __writecache_writeback_pmem( > wb-> wc_list = wb->wc_list_inline; > Mikulas> max_pages = WB_LIST_INLINE; > Mikulas> } > Mikulas> + current_restore_flags(old_flags, PF_LESS_THROTTLE); > > Mikulas> BUG_ON(!wc_add_block(wb, e, GFP_NOIO)); > > Mikulas> Index: linux-2.6/drivers/md/dm-crypt.c > Mikulas> =================================================================== > Mikulas> --- linux-2.6.orig/drivers/md/dm-crypt.c 2018-06-29 03:47:16.290000000 +0200 > Mikulas> +++ linux-2.6/drivers/md/dm-crypt.c 2018-06-29 03:47:16.280000000 +0200 > Mikulas> @@ -2181,12 +2181,16 @@ static void *crypt_page_alloc(gfp_t gfp_ > Mikulas> { > Mikulas> struct crypt_config *cc = pool_data; > Mikulas> struct page *page; > Mikulas> + unsigned old_flags; > > Mikulas> if (unlikely(percpu_counter_compare(&cc->n_allocated_pages, dm_crypt_pages_per_client) >= 0) && > Mikulas> likely(gfp_mask & __GFP_NORETRY)) > Mikulas> return NULL; > > Mikulas> + old_flags = current->flags & PF_LESS_THROTTLE; > Mikulas> + current->flags |= PF_LESS_THROTTLE; > Mikulas> page = alloc_page(gfp_mask); > Mikulas> + current_restore_flags(old_flags, PF_LESS_THROTTLE); > Mikulas> if (likely(page != NULL)) > Mikulas> percpu_counter_add(&cc->n_allocated_pages, 1); > > Mikulas> @@ -2893,7 +2897,10 @@ static int crypt_map(struct dm_target *t > > Mikulas> if (cc->on_disk_tag_size) { > Mikulas> unsigned tag_len = cc->on_disk_tag_size * (bio_sectors(bio) >> cc->sector_shift); > Mikulas> + unsigned old_flags; > > Mikulas> + old_flags = current->flags & PF_LESS_THROTTLE; > Mikulas> + current->flags |= PF_LESS_THROTTLE; > Mikulas> if (unlikely(tag_len > KMALLOC_MAX_SIZE) || > Mikulas> unlikely(!(io->integrity_metadata = kmalloc(tag_len, > Mikulas> GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN)))) { > Mikulas> @@ -2902,6 +2909,7 @@ static int crypt_map(struct dm_target *t > io-> integrity_metadata = mempool_alloc(&cc->tag_pool, GFP_NOIO); > io-> integrity_metadata_from_pool = true; > Mikulas> } > Mikulas> + current_restore_flags(old_flags, PF_LESS_THROTTLE); > Mikulas> } > > Mikulas> if (crypt_integrity_aead(cc)) > > Mikulas> -- > Mikulas> dm-devel mailing list > Mikulas> dm-devel@redhat.com > Mikulas> https://www.redhat.com/mailman/listinfo/dm-devel >