Subject: Re: [BUG 4.4.26] bio->bi_bdev == NULL in raid6 return_io()
From: Konstantin Khlebnikov
To: NeilBrown, Konstantin Khlebnikov, Shaohua Li
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, linux-block@vger.kernel.org, Jens Axboe, Christoph Hellwig
Date: Mon, 21 Nov 2016 18:32:48 +0300

On 21.11.2016 04:23, NeilBrown wrote:
> On Sun, Nov 20 2016, Konstantin Khlebnikov wrote:
>
>> On 07.11.2016 23:34, Konstantin Khlebnikov wrote:
>>> On Mon, Nov 7, 2016 at 10:46 PM, Shaohua Li wrote:
>>>> On Sat, Nov 05, 2016 at 01:48:45PM +0300, Konstantin Khlebnikov wrote:
>>>>> return_io() resolves the request_queue even if the trace point isn't active:
>>>>>
>>>>> static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
>>>>> {
>>>>>         return bdev->bd_disk->queue;    /* this is never NULL */
>>>>> }
>>>>>
>>>>> static void return_io(struct bio_list *return_bi)
>>>>> {
>>>>>         struct bio *bi;
>>>>>
>>>>>         while ((bi = bio_list_pop(return_bi)) != NULL) {
>>>>>                 bi->bi_iter.bi_size = 0;
>>>>>                 trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
>>>>>                                          bi, 0);
>>>>>                 bio_endio(bi);
>>>>>         }
>>>>> }
>>>>
>>>> I can't see how this could happen. What kind of tests/environment are you running?
>>>
>>> That was a random piece of production somewhere.
>>> According to the timing, all crashes happened soon after reboot.
>>> There are several raids; probably some of them were still under resync.
>>>
>>> For now we have only a few machines with this kernel, but I'm sure
>>> I'll get many more soon =)
>>
>> I've added this debug patch for catching overflow of the active-stripes
>> counter in the bio:
>>
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -164,6 +164,7 @@ static inline void raid5_inc_bi_active_stripes(struct bio *bio)
>>  {
>>  	atomic_t *segments = (atomic_t *)&bio->bi_phys_segments;
>>  	atomic_inc(segments);
>> +	BUG_ON(!(atomic_read(segments) & 0xffff));
>>  }
>>
>> And got this. The counter in %edx = 0x00010000.
>>
>> So it looks like one bio (a discard?) can cover more than 65535 stripes.
>
> 65535 stripes - 256M.  I guess that is possible.  Christoph has
> suggested that now would be a good time to stop using bi_phys_segments
> like this.

Is it possible to fix this by limiting max_hw_sectors and
max_hw_discard_sectors for the raid queue? That would be much easier to
backport into stable kernels; a rough sketch of what I mean is below.

I've found that the setup also has dm/lvm on top of the md raid, so this
might be a more complicated problem, because I cannot see how a bio could
be big enough to overflow that counter. That was raid6 with 10 disks and
a 256k chunk.
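Something like this untested sketch is what I have in mind, called while
setting up the array's queue limits (raid5_cap_bio_size is a made-up name
for illustration):

static void raid5_cap_bio_size(struct mddev *mddev)
{
	/*
	 * Cap the queue limits so that a single bio can never span more
	 * than 65535 4k stripes, keeping the 16-bit active-stripe count
	 * packed into bi_phys_segments from wrapping to zero.
	 * 65535 * STRIPE_SECTORS (8) = 524280 sectors, i.e. the ~256M
	 * figure mentioned above.
	 */
	unsigned int max_sectors = 65535 * STRIPE_SECTORS;

	blk_queue_max_hw_sectors(mddev->queue, max_sectors);
	blk_queue_max_discard_sectors(mddev->queue, max_sectors);
}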
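Working the numbers for that setup, as a back-of-envelope userspace
sketch (it assumes the effective ceiling per bio is bi_iter.bi_size, a
32-bit byte count, so ~4G):

#include <stdio.h>

int main(void)
{
	unsigned long long max_bio_bytes = 1ULL << 32;	/* bi_size is u32 */
	unsigned long long chunk_bytes = 256 << 10;	/* 256k chunk */
	unsigned int data_disks = 8;			/* 10-disk raid6 */
	unsigned int total_disks = 10;

	unsigned long long data_chunks = max_bio_bytes / chunk_bytes;
	unsigned long long total_chunks = data_chunks * total_disks / data_disks;

	printf("data chunks:  %llu\n", data_chunks);	/* 16384 */
	printf("total chunks: %llu\n", total_chunks);	/* 20480 */
	return 0;
}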
max_hw_discard_sectors and max_hw_sectors cannot be bigger than UINT_MAX,
so in this case a bio cannot cover more than 16384 data chunks, or 20480
chunks including checksums. Please correct me if I'm wrong.

> I have some patches which should fix this.  I'll post them shortly.
> I'd appreciate it if you would test and confirm that they work (and
> don't break anything else).

Ok, I'll try to check that patchset.

--
Konstantin