Received: by 10.223.185.116 with SMTP id b49csp1705031wrg; Wed, 14 Feb 2018 23:38:18 -0800 (PST) X-Google-Smtp-Source: AH8x226G40cVcz1i5jRkxeki6ypkFeI4EHNr07KsCxr59nhueh5wriPoF9dIT8ZKz30NOW+27zjM X-Received: by 10.99.124.2 with SMTP id x2mr1508313pgc.184.1518680298843; Wed, 14 Feb 2018 23:38:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1518680298; cv=none; d=google.com; s=arc-20160816; b=zp6qklgQIVFrUIDnPgbWmmTi8ryODEhIiRJodnw7a0pXtsIqrI2cOUC0j/ForL726Q +/HNmBDb5XjuoZWhlvxVzJfT/2V6r6YZaApuyD0hxI4caS++WsULhO2pDmVQ4HKcDgJs xHctfws0pfRibnym2mRI8M/9x/JdwhHjDLMDPLflWbNUf5TN3U9S07HMdKfjNepH0Vz9 fzi5EXtzH4G+/5+I7TNi64Lw/PDkDN5hxb9nPtm88mjLirUaOudAwWr4vD8GroKyphCW ikzK6MOw1lYHxfGLaEJCD1ktGhs17p+B+oG/lKeR2INBqK5qXY474kyS6dGPkIz9y+YQ iBgg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature :arc-authentication-results; bh=z5c6jts+lu2tBsiEf/jzBJqsYXPmmchQeh5bPsTf+UM=; b=BkR8SomlBbYIt+CnU0aKJp96LQJom/+nBwWQjsnFlILbf/ulHkp/A49WZJaiiUyWoI fMJ3pciJh3Grm9Svzg50h5YFowfoMDd6iNAdPBZMpcO8uR/vXxO/HtK/p7T0OxKGGjha bPOYGOufxXzld4Bvegkvc4Irs+IOdg72dZk85SKlM5UY7tduPeyRGWNu9kwsB9hzAMef P8GJXOoQUVNSaL39oaI2HmAUVjexhprZahOum02hh3YDCi0dqWftwLyGwKshc1Lmllap G5/ngs5qVUbDO9W24b+v2Y5GP/CdUe/OSABavFZz2IDs4EXMfCPD8soKHtpG5LaDwNHX UHzQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=MsDTHFrm; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f5si4018740pgt.833.2018.02.14.23.38.04; Wed, 14 Feb 2018 23:38:18 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=MsDTHFrm; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754909AbeBOHhR (ORCPT + 99 others); Thu, 15 Feb 2018 02:37:17 -0500 Received: from mail-wm0-f49.google.com ([74.125.82.49]:55107 "EHLO mail-wm0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754805AbeBOHhQ (ORCPT ); Thu, 15 Feb 2018 02:37:16 -0500 Received: by mail-wm0-f49.google.com with SMTP id z81so863980wmb.4 for ; Wed, 14 Feb 2018 23:37:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=z5c6jts+lu2tBsiEf/jzBJqsYXPmmchQeh5bPsTf+UM=; b=MsDTHFrm1PRjZNPb8+yNvoEFrlyAagkif00uFB+pm7t4eW6Qk+RLdrH431bWx5jtq4 imy8aDRYvXRFB6AFU1R5YsQ4KFi4uzVnZ+BqUU+wJIF/k/8wn1WrK13UA94mM3vc9ja1 Cxoi+p1SthlrzBpeh8/x2k+jXg8RBbBJvcuh2FMPjjIaPsMl26XfmE+wHtCUXd0KFY5m MAj1teotNO3Abk/1/XBlxpN+1kfeEzVjbIzrp05Vaphl0QHlBtW0KhH4MILR1xEu6Mxd S+Whfbz+XGYiMyZFnRA2iFfSmusny9baLermGeE4jMjKqW8xJ4pMbgIbpWOQVMyMdtSU kf/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=z5c6jts+lu2tBsiEf/jzBJqsYXPmmchQeh5bPsTf+UM=; b=RDX+2bs84BL2q+FnfvEk+n8gtw1OAcmE/B12ocDnFhZQM3SsdvUkHDzD83KN4UbGfH 3FLGPGANn+5WT6bFyspWlbY7Fm5IMNQJINND2HFzwBgvbfEO4cyfX/A6JRFV7jOpakaO MINoo6GVDSUjI9zCNBbfH63UDR4Iv729aY9rEGiRjCi8iqUBpHlmS4YdRLROidraFgW6 QQP+dSL5rQeVO0V0XDwofsigKnSghKNh4H9zOkBiSpYKhLPVFPYMw4haqgL1NGrJa6qm ha9pNQJyO4IUEyr2d9V0cwYswfpajsadXVniACAFtZZ11EtFiV09K8h5261JbePBdcQR e9hg== X-Gm-Message-State: APf1xPB0kHk2w5w2yQARSM7wV0S9h/isuUJR/wdTCXP8Mmz6Xu1sMjS0 GSY/FYhSFMFS5BATvM61drx4VA== X-Received: by 10.28.193.2 with SMTP id r2mr1119207wmf.153.1518680234755; Wed, 14 Feb 2018 23:37:14 -0800 (PST) Received: from [10.43.17.17] (nat-pool-brq-t.redhat.com. [213.175.37.10]) by smtp.gmail.com with ESMTPSA id c56sm19765627wrc.82.2018.02.14.23.37.13 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 14 Feb 2018 23:37:14 -0800 (PST) Subject: Re: [dm-devel] DM Regression in 4.16-rc1 - read() returns data when it shouldn't To: NeilBrown , Mike Snitzer , Mikulas Patocka Cc: device-mapper development , Linux Kernel Mailing List References: <70cda2a3-f246-d45b-f600-1f9d15ba22ff@gmail.com> <871shnqp9l.fsf@notabene.neil.brown.name> <20180214230526.GA25114@redhat.com> <87y3jvp13k.fsf@notabene.neil.brown.name> From: Milan Broz Message-ID: <1df6de2e-bfd3-84c4-cffb-83b6299c6d23@gmail.com> Date: Thu, 15 Feb 2018 08:37:12 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 In-Reply-To: <87y3jvp13k.fsf@notabene.neil.brown.name> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/15/2018 01:07 AM, NeilBrown wrote: > > And looking again at the code, it doesn't seem so bad. I was worried > about reassigning ci.io->orig_bio after calling > __split_and_process_non_flush(), but the important thing is to set it > before the last dec_pending() call, and the code gets that right. > > I think I've found the problem though: > > /* done with normal IO or empty flush */ > bio->bi_status = io_error; > > When we have chained bios, there are multiple sources for error which > need to be merged into bi_status: all bios that were chained to this > one, and this bio itself. > > __bio_chain_endio uses: > > if (!parent->bi_status) > parent->bi_status = bio->bi_status; > > so the new status won't over-ride the old status (unless there are > races....), but dm doesn't do that - I guess it didn't have to worry > about chains before. > > So this: > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > index d6de00f367ef..68136806d365 100644 > --- a/drivers/md/dm.c > +++ b/drivers/md/dm.c > @@ -903,7 +903,8 @@ static void dec_pending(struct dm_io *io, blk_status_t error) > queue_io(md, bio); > } else { > /* done with normal IO or empty flush */ > - bio->bi_status = io_error; > + if (io_error) > + bio->bi_status = io_error; > bio_endio(bio); > } > } > Hi, well, it indeed fixes the reported problem, thanks. But I just tested the first (dm.c) change above. The important part is also that if the error is hits bio later, it will produce short read (and not fail of the whole operation), this can be easily tested if we switch the error and the zero target in that reproducer //system("echo -e \'0 10000000 error\'\\\\n\'10000000 1000000 zero\' | dmsetup create test"); system("echo -e \'0 10000000 zero\'\\\\n\'10000000 1000000 error\' | dmsetup create test"); So it seems your patch works. I am not sure about this part though: > should fix it. And __bio_chain_endio should probably be > > diff --git a/block/bio.c b/block/bio.c > index e1708db48258..ad77140edc6f 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio) > { > struct bio *parent = bio->bi_private; > > - if (!parent->bi_status) > + if (bio->bi_status) > parent->bi_status = bio->bi_status; Shouldn't it be if (!parent->bi_status && bio->bi_status) parent->bi_status = bio->bi_status; ? Maybe one rephrased question: what about priorities for errors (bio statuses)? For example, part of a bio fails with EILSEQ (data integrity check error, BLK_STS_PROTECTION), part with EIO (media error, BLK_STS_IOERR). What should be reported for parent bio? (I would expect I always get EIO here because this is hard, uncorrectable error.) Anyway, thanks for the fix! Milan > bio_put(bio); > return parent; > > to remove the chance that a race will hide a real error. > > Milan: could you please check that the patch fixes the dm-crypt bug? > I've confirmed that it fixes your test case. > When you confirm, I'll send a proper patch. > > I guess I should audit all code that assigns to ->bi_status and make > sure nothing ever assigns 0 to it. > > Thanks, > NeilBrown >