Date: Wed, 2 Dec 2015 11:00:09 +0000
From: Mel Gorman
To: "Huang, Ying"
Cc: lkp@01.org, LKML, Andrew Morton, Rik van Riel, Vitaly Wool,
	David Rientjes, Christoph Lameter, Johannes Weiner, Michal Hocko,
	Vlastimil Babka, Will Deacon, Linus Torvalds
Subject: Re: [lkp] [mm, page_alloc] d0164adc89: -100.0% fsmark.app_overhead
Message-ID: <20151202110009.GA2015@techsingularity.net>
References: <87ziy1a89f.fsf@yhuang-dev.intel.com>
	<20151126132511.GG14880@techsingularity.net>
	<87oaegmeer.fsf@yhuang-dev.intel.com>
	<20151127100647.GH14880@techsingularity.net>
	<87h9k4kzcv.fsf@yhuang-dev.intel.com>
In-Reply-To: <87h9k4kzcv.fsf@yhuang-dev.intel.com>

On Mon, Nov 30, 2015 at 10:14:24AM +0800, Huang, Ying wrote:
> > There is no reference to OOM possibility in the email that I can see. Can
> > you give examples of the OOM messages that shows the problem sites? It was
> > suspected that there may be some callers that were accidentally depending
> > on access to emergency reserves. If so, either they need to be fixed (if
> > the case is extremely rare) or a small reserve will have to be created
> > for callers that are not high priority but still cannot reclaim.
> >
> > Note that I'm travelling a lot over the next two weeks so I'll be slow to
> > respond but I will get to it.
>
> Here is the kernel log, the full dmesg is attached too.
> The OOM occurs during fsmark testing.
>
> Best Regards,
> Huang, Ying
>
> [   31.453514] kworker/u4:0: page allocation failure: order:0, mode:0x2200000
> [   31.463570] CPU: 0 PID: 6 Comm: kworker/u4:0 Not tainted 4.3.0-08056-gd0164ad #1
> [   31.466115] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
> [   31.477146] Workqueue: writeback wb_workfn (flush-253:0)
> [   31.481450]  0000000000000000 ffff880035ac75e8 ffffffff8140a142 0000000002200000
> [   31.492582]  ffff880035ac7670 ffffffff8117117b ffff880037586b28 ffff880000000040
> [   31.507631]  ffff88003523b270 0000000000000040 ffff880035abc800 ffffffff00000000

This is an allocation failure rather than a triggering of the OOM killer,
so the severity is reduced, but it still looks like a bug in the driver.
Looking at the history and the discussion, it appears to me that __GFP_HIGH
was cleared from the allocation site by accident. I strongly suspect that
Will Deacon thought __GFP_HIGH was related to highmem instead of being
related to high priority.

Will, can you review the following patch please? Ying, can you test please?

---8<---
virtio: allow vring descriptor allocations to use high-priority reserves

Commit b92b1b89a33c ("virtio: force vring descriptors to be allocated
from lowmem") prevented the inappropriate use of highmem pages but it
also masked out __GFP_HIGH. __GFP_HIGH is used for GFP_ATOMIC allocation
requests to grant access to a small emergency reserve. It's intended for
use by callers that have no alternative.

Ying Huang reported the following page allocation failure warning after
commit d0164adc89f6 ("mm, page_alloc: distinguish between being unable
to sleep, unwilling to sleep and avoiding waking kswapd")

  kworker/u4:0: page allocation failure: order:0, mode:0x2200000
  CPU: 0 PID: 6 Comm: kworker/u4:0 Not tainted 4.3.0-08056-gd0164ad #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
  Workqueue: writeback wb_workfn (flush-253:0)
   0000000000000000 ffff880035ac75e8 ffffffff8140a142 0000000002200000
   ffff880035ac7670 ffffffff8117117b ffff880037586b28 ffff880000000040
   ffff88003523b270 0000000000000040 ffff880035abc800 ffffffff00000000
  Call Trace:
   dump_stack+0x4b/0x69
   warn_alloc_failed+0xdb/0x140
   __alloc_pages_nodemask+0x874/0xa60
   alloc_pages_current+0x92/0x120
   new_slab+0x3d4/0x480
   __slab_alloc+0x376/0x470
   ? alloc_indirect+0x1d/0x50
   ? xfs_submit_ioend_bio+0x31/0x40
   ? alloc_indirect+0x1d/0x50
   __kmalloc+0x20d/0x260
   alloc_indirect+0x1d/0x50
   virtqueue_add_sgs+0x2cc/0x3a0
   __virtblk_add_req+0xb0/0x1f0
   ? pagevec_lookup_tag+0x21/0x30
   ? blk_rq_map_sg+0x1e2/0x4f0
   virtio_queue_rq+0x112/0x280
   __blk_mq_run_hw_queue+0x1d7/0x370
   blk_mq_run_hw_queue+0x9f/0xc0
   blk_mq_insert_requests+0xfa/0x1a0
   blk_mq_flush_plug_list+0x123/0x140
   blk_flush_plug_list+0xa7/0x200
   blk_finish_plug+0x29/0x40
   wb_writeback+0x185/0x2c0
   wb_workfn+0xf5/0x390
   process_one_work+0x157/0x420
   worker_thread+0x69/0x4a0
   ? rescuer_thread+0x380/0x380
   kthread+0xef/0x110
   ? kthread_park+0x60/0x60
   ret_from_fork+0x3f/0x70
   ? kthread_park+0x60/0x60

Commit d0164adc89f6 ("mm, page_alloc: distinguish between being unable
to sleep, unwilling to sleep and avoiding waking kswapd") is stricter
about reserves. It distinguishes between callers that are high-priority
with access to emergency reserves and callers that simply do not want to
sleep and have recovery options.
The reported allocation failure comes from a call site that is truly
atomic with no recovery options and that appears to have cleared
__GFP_HIGH by mistake, as the flag is unrelated to highmem.

This patch restores the flag.

Signed-off-by: Mel Gorman
---
 drivers/virtio/virtio_ring.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 096b857e7b75..f9e119e6df18 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -107,9 +107,10 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
 	/*
 	 * We require lowmem mappings for the descriptors because
 	 * otherwise virt_to_phys will give us bogus addresses in the
-	 * virtqueue.
+	 * virtqueue. Access to high-priority reserves is preserved
+	 * if originally requested by GFP_ATOMIC.
 	 */
-	gfp &= ~(__GFP_HIGHMEM | __GFP_HIGH);
+	gfp &= ~__GFP_HIGHMEM;
 
 	desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
 	if (!desc)
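
For anyone following the flag arithmetic, the effect of the one-line
change can be sketched in user-space C. This is only an illustration
under stated assumptions: every SKETCH_* name, the sketch_gfp_t type and
the bit values are hypothetical placeholders, not the kernel's
definitions (those live in include/linux/gfp.h and vary by version), and
SKETCH_GFP_NOSLEEP merely stands in for whatever "cannot sleep" bits a
real GFP_ATOMIC carries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for gfp_t; not the kernel type. */
typedef unsigned int sketch_gfp_t;

/* Illustrative bit values only; NOT the kernel's definitions. */
#define SKETCH_GFP_HIGHMEM 0x02u  /* page may come from highmem */
#define SKETCH_GFP_HIGH    0x20u  /* may dip into emergency reserves */
#define SKETCH_GFP_NOSLEEP 0x100u /* placeholder for "cannot sleep" bits */

/* Placeholder for an atomic, high-priority request. */
#define SKETCH_GFP_ATOMIC (SKETCH_GFP_HIGH | SKETCH_GFP_NOSLEEP)

/* Masking as done before the patch: the high-priority bit is lost too. */
static sketch_gfp_t mask_before_patch(sketch_gfp_t gfp)
{
	return gfp & ~(SKETCH_GFP_HIGHMEM | SKETCH_GFP_HIGH);
}

/* Masking as done after the patch: only the highmem bit is cleared. */
static sketch_gfp_t mask_after_patch(sketch_gfp_t gfp)
{
	return gfp & ~SKETCH_GFP_HIGHMEM;
}

/* True if the request may still use the emergency reserve. */
static bool may_use_reserves(sketch_gfp_t gfp)
{
	return (gfp & SKETCH_GFP_HIGH) != 0;
}

/* Small self-check tying the two maskings together. */
static void sketch_check(void)
{
	sketch_gfp_t req = SKETCH_GFP_ATOMIC | SKETCH_GFP_HIGHMEM;

	/* Both variants strip the highmem bit. */
	assert(!(mask_before_patch(req) & SKETCH_GFP_HIGHMEM));
	assert(!(mask_after_patch(req) & SKETCH_GFP_HIGHMEM));

	/* Only the patched variant keeps access to the reserve. */
	assert(!may_use_reserves(mask_before_patch(req)));
	assert(may_use_reserves(mask_after_patch(req)));
}
```

Calling sketch_check() from a test harness exercises both variants: the
old masking silently turns an atomic, high-priority request into one
with no access to the reserve, which is the situation the failure above
appears to reflect.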