Received: by 2002:a25:683:0:0:0:0:0 with SMTP id 125csp938280ybg; Wed, 3 Jun 2020 18:35:38 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwOANXxXNUDH07hPY3SkXXCsmqnrxjqbTgHnMSQSwwuQplPhrsoPvbXCdN5NqkVozzDAgCD X-Received: by 2002:a50:ee01:: with SMTP id g1mr2110254eds.44.1591234538744; Wed, 03 Jun 2020 18:35:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1591234538; cv=none; d=google.com; s=arc-20160816; b=icbXoO12N7P/qFS62Q8W/BUu01oGPNqSTFakaOQn+iuCJH+K876RxUldXHnW6dw83J E42e4GXmhvGXHUhrWEqMDG8IOStvLYxFG5vBeQPMWvaLVqvGq1wHX9ehwnHoDIt5ZDOO PeYUTXLIvVSgTBCwYCCeSvKyl7dS/IpBdleqxntU/et7Ba5lyJBPq869/ssOcDNcwnW3 tTBultVRoNe/mSZBgn12qXjr+6f1CRtNqLgNBKY30vRTNzOClteln02xcYCKMud9ahk1 YNAnGozbu+taHTF9pQjCC/zsRDZoSdkjS7vY20B+UlUpvc8k/zLdnvQblsGHOSZgGAEA xTOA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature:dkim-signature; bh=4vipdXWneTcdQDAu6JbWhyQeuEqwMDjkliQKHSJwEH8=; b=W2aNAIFts82b9zvo6rSb/jyGlN8HgvBMOyZWrpq655N8d/V/Tg7g9a9yZQZN3Y9bH2 oSJBNUsF5YS0LuVFrvA9mz4RnGJJ6jCwGJfIHHDjkobNH+xKoqMXEw+6FCcgfQ+dP3zM CHMqNXIU1N5/4ZZj8AbRCCzY31JlkpLAgQt69htHCcynhI0Jn7dmlCkWVo9kIXMy55Sg nZ6hLgWIzkEFCAlFeANjCRCyDEURj9sOWfOfgLQBz997N+fSYlFiFC+IioNdct8f+JEm fl2ne6/Gs98/KIq2+V2pj/kC3t4+02/6Hl2laV7czMZyGmMis3H5HZxXdmEeTgzcaf45 op4g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@anarazel.de header.s=fm1 header.b=cIwUViir; dkim=pass header.i=@messagingengine.com header.s=fm2 header.b=S8YKghCe; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id b101si766841edf.579.2020.06.03.18.34.43; Wed, 03 Jun 2020 18:35:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@anarazel.de header.s=fm1 header.b=cIwUViir; dkim=pass header.i=@messagingengine.com header.s=fm2 header.b=S8YKghCe; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726409AbgFDBas (ORCPT + 99 others); Wed, 3 Jun 2020 21:30:48 -0400 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:45699 "EHLO out1-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725983AbgFDBas (ORCPT ); Wed, 3 Jun 2020 21:30:48 -0400 Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.nyi.internal (Postfix) with ESMTP id 11BAD5C0106; Wed, 3 Jun 2020 21:30:47 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Wed, 03 Jun 2020 21:30:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= date:from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm1; bh=4vipdXWneTcdQDAu6JbWhyQeuEq wMDjkliQKHSJwEH8=; b=cIwUViirQ9aI81E1h0bKidmTFtjoZ3LThdQfLivnDXP 2btgd5UtvVfYduXrqevolSErnEYP2HZnQbdnc61HbLzpZdUJc/RvgQpiMGH+bOlb 6Ivkzm0h2a4JS6qL+aRABncNgBWPKrHvbb+lkJblhT43dkmDPk1Ddf5MBjZdf+BH ww74xt8wIMuMw8/aBBQUF0FM3jsARRUK96SqIA+AoX929SoDHZb910EXEchb/2nf JaLfrJPWdHt9vuWjtQGARoNXfeE/4QaoMaUi5MxH0tGt7plBif2Yi7tUHTIZ52rC EdsfjCMdnHkVk8KWGWBk4XwevOM1KbHD+PVZEbs+xDA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm2; bh=4vipdX WneTcdQDAu6JbWhyQeuEqwMDjkliQKHSJwEH8=; b=S8YKghCehYdlM9NUVAcy4B VSjw0fBXZl7sAnCD5BS8WoUavWx0V3IfVv39aT+ViY8jH3ezW1RJPupIUm5m6NSZ 4Obj2dJVtVZBSWrh89ONFFxa2U9F833qfyVEnAagdSAF2oryyD4YAMnPYRdF68F/ nCHzGOnlAx1uXOAgTsfoJPiLGxHC0kMTdBPQlUkoIc050N4blHTk6tlPutvHLlhA dGAbC8V54HSB+ElKNShotd9H9NiI0g8ghKxf/PaKkRJ+LmXkm2PYenjhrAPwZM73 SYx3r+yNZEd0unJs0GEfXMpoorV+t+QJmIp9WwtIp4Uig3F2VGGPx821uhu1Z48A == X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduhedrudegtddggeehucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvffukfhfgggtuggjsehttdertddttddvnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtffrrg htthgvrhhnpedukefhkeelueegveetheelffffjeegleeuudelfeefuedtleffueejfffh ueffudenucfkphepieejrdduiedtrddvudejrddvhedtnecuvehluhhsthgvrhfuihiivg eptdenucfrrghrrghmpehmrghilhhfrhhomheprghnughrvghssegrnhgrrhgriigvlhdr uggv X-ME-Proxy: Received: from intern.anarazel.de (c-67-160-217-250.hsd1.ca.comcast.net [67.160.217.250]) by mail.messagingengine.com (Postfix) with ESMTPA id 504183280065; Wed, 3 Jun 2020 21:30:46 -0400 (EDT) Date: Wed, 3 Jun 2020 18:30:45 -0700 From: Andres Freund To: Jens Axboe Cc: io-uring@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org Subject: Re: [PATCHSET v5 0/12] Add support for async buffered reads Message-ID: <20200604013045.7gu7xopreusbdea2@alap3.anarazel.de> References: <20200526195123.29053-1-axboe@kernel.dk> <20200604005916.niy2mejjcsx4sv6t@alap3.anarazel.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On 2020-06-03 19:04:17 -0600, Jens Axboe wrote: > > The workload that triggers the bug within a few seconds is postgres > > doing a parallel sequential scan of a large table (and aggregating the > > data, but that shouldn't matter). In the triggering case that boils down > > to 9 processes sequentially reading a number of 1GB files (we chunk > > tables internally into smaller files). Each process will read a 512kB > > chunk of the file on its own, and then claim the next 512kB from a > > shared memory location. Most of the IO will be READV requests, reading > > 16 * 8kB into postgres' buffer pool (which may or may not be neighboring > > 8kB pages). > > I'll try and reproduce this, any chance you have a test case that can > be run so I don't have to write one from scratch? The more detailed > instructions the better. It shouldn't be too hard to write you a detailed script for reproducing the issue. But it'd not be an all that minimal reproducer, unless it also triggers on smaller scale (it's a 130GB database that triggers the problem reliably, and small tables don't seem to do so reliably). I'll try to write that up after I set up kvm / repro there. One thing I forgot in the earlier email: I ran the benchmark using 'perf stat -a -e ...'. I'm fairly, but not absolutely, certain that it also triggered without that. I don't think it's related, but I thought I better mention it. > I have a known issue with request starvation, wonder if that could be it. > I'm going to rebase the branch on top of the aops->readahead() changes > shortly, and fix that issue. Hopefully that's what's plaguing your run > here, but if not, I'll hunt that one down. FWIW, I had iostat -xm /dev/nvme1n1 1 running during this. Shortly before the crash I see: Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme1n1 6221.00 956.09 3428.00 35.53 0.24 157.38 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.48 99.00 Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme1n1 6456.00 978.83 3439.00 34.75 0.21 155.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.38 98.70 It's maybe also worth noting that in this workload the results are *worse* than when using 5.7-rc7 io_uring. So perhaps request starvation isn't the worst guess... Greetings, Andres Freund