Received: by 10.192.165.148 with SMTP id m20csp4151239imm; Mon, 30 Apr 2018 12:44:06 -0700 (PDT) X-Google-Smtp-Source: AB8JxZpYisOlKw95oQ+74n5uk7gvxGZweLkO4cwg7ob6Q+NbjXIbSi8o8ATp0cY7oL0VHb9J9EHZ X-Received: by 10.98.65.132 with SMTP id g4mr13180169pfd.51.1525117446165; Mon, 30 Apr 2018 12:44:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525117446; cv=none; d=google.com; s=arc-20160816; b=PBk8PTbkBua/Sc7GOyyNu9FQsP9+RfrJeKQNQc5YeClvlWr46/34r9jmPStoFgAJc3 uIbG1/o/OFQW3VgiJ3kkBk1u3T8jQzAibHsC4UJZZP71gE/wlEfe0d+506ts9s35gWhx Z7XHvLB3gn5yR3SJNCsfGUcyJ+z3DdAVxoO9SvmFdjQZZS5ZnWOWFm7pbhTn1ySYa6kJ uFfQZYcErF34sMsHe/3jTzRMqu2XyrpgaVbJn5Bs388n+kZsCbvtA+pY99WvBpPuWQms +0PntEhYqzJmScHdbg8Jrq0SQRh6I6PGaB0t6NpLpLdO2YcU3EcBMzPRjRloaHEs+4+u ffMA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :in-reply-to:message-id:date:subject:cc:to:from:dmarc-filter :arc-authentication-results; bh=c/w+p2Fyv468WCot5nBVjOHCqksggJaEHBZNY5KAnYM=; b=EnVTZWQBE/zkVn8PMYmUeduK5wMLedlRxGDa8+nvNb2Bn2L61yd4PBLDnlb9HLWY5t 283GnUSgH257NyWLWFrMd7BPnYq/9/LLdO45D20imb4MWGeXPiGLjt6pCd8uI+Ek+87q ZiHH/PtwGjsBiWf4oaWidiH167EffFBdr3Wi9eF1ZFdKeai6rM4QUo/Z32TtpWaNaDuP vXBsYyhQVtTHP4Ul5FRn4w+rZP5RFVKMKOsiP+8fdHEAYAnYVXhBYqIAI1eyBfoAoOKT 0rN2YwOZx1eRZ8CXB/OXJ59ohEllm9Bai2as/LyjKFQpfX1McaRqIeNmKV7kAIBuv+EF AUyA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 24si2723022pfp.161.2018.04.30.12.43.51; Mon, 30 Apr 2018 12:44:06 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932372AbeD3TnY (ORCPT + 99 others); Mon, 30 Apr 2018 15:43:24 -0400 Received: from mail.kernel.org ([198.145.29.99]:36104 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932296AbeD3T2Z (ORCPT ); Mon, 30 Apr 2018 15:28:25 -0400 Received: from localhost (unknown [104.132.1.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 107BE22E02; Mon, 30 Apr 2018 19:28:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 107BE22E02 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linuxfoundation.org Authentication-Results: mail.kernel.org; spf=fail smtp.mailfrom=gregkh@linuxfoundation.org From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Bart Van Assche , Jens Axboe , Damien Le Moal , Christoph Hellwig , Hannes Reinecke , "Martin K. Petersen" Subject: [PATCH 4.16 061/113] scsi: sd_zbc: Avoid that resetting a zone fails sporadically Date: Mon, 30 Apr 2018 12:24:32 -0700 Message-Id: <20180430184017.634438324@linuxfoundation.org> X-Mailer: git-send-email 2.17.0 In-Reply-To: <20180430184015.043892819@linuxfoundation.org> References: <20180430184015.043892819@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.16-stable review patch. If anyone has any objections, please let me know. ------------------ From: Bart Van Assche commit ccce20fc7968d546fb1e8e147bf5cdc8afc4278a upstream. Since SCSI scanning occurs asynchronously, since sd_revalidate_disk() is called from sd_probe_async() and since sd_revalidate_disk() calls sd_zbc_read_zones() it can happen that sd_zbc_read_zones() is called concurrently with blkdev_report_zones() and/or blkdev_reset_zones(). That can cause these functions to fail with -EIO because sd_zbc_read_zones() e.g. sets q->nr_zones to zero before restoring it to the actual value, even if no drive characteristics have changed. Avoid that this can happen by making the following changes: - Protect the code that updates zone information with blk_queue_enter() and blk_queue_exit(). - Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that these functions do not modify struct scsi_disk before all zone information has been obtained. Note: since commit 055f6e18e08f ("block: Make q_usage_counter also track legacy requests"; kernel v4.15) the request queue freezing mechanism also affects legacy request queues. Fixes: 89d947561077 ("sd: Implement support for ZBC devices") Signed-off-by: Bart Van Assche Cc: Jens Axboe Cc: Damien Le Moal Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: stable@vger.kernel.org # v4.16 Reviewed-by: Damien Le Moal Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/sd_zbc.c | 140 ++++++++++++++++++++++++++++--------------------- include/linux/blkdev.h | 5 + 2 files changed, 87 insertions(+), 58 deletions(-) --- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -400,8 +400,10 @@ static int sd_zbc_check_capacity(struct * * Check that all zones of the device are equal. The last zone can however * be smaller. The zone size must also be a power of two number of LBAs. + * + * Returns the zone size in bytes upon success or an error code upon failure. */ -static int sd_zbc_check_zone_size(struct scsi_disk *sdkp) +static s64 sd_zbc_check_zone_size(struct scsi_disk *sdkp) { u64 zone_blocks = 0; sector_t block = 0; @@ -412,8 +414,6 @@ static int sd_zbc_check_zone_size(struct int ret; u8 same; - sdkp->zone_blocks = 0; - /* Get a buffer */ buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL); if (!buf) @@ -445,16 +445,17 @@ static int sd_zbc_check_zone_size(struct /* Parse zone descriptors */ while (rec < buf + buf_len) { - zone_blocks = get_unaligned_be64(&rec[8]); - if (sdkp->zone_blocks == 0) { - sdkp->zone_blocks = zone_blocks; - } else if (zone_blocks != sdkp->zone_blocks && - (block + zone_blocks < sdkp->capacity - || zone_blocks > sdkp->zone_blocks)) { - zone_blocks = 0; + u64 this_zone_blocks = get_unaligned_be64(&rec[8]); + + if (zone_blocks == 0) { + zone_blocks = this_zone_blocks; + } else if (this_zone_blocks != zone_blocks && + (block + this_zone_blocks < sdkp->capacity + || this_zone_blocks > zone_blocks)) { + this_zone_blocks = 0; goto out; } - block += zone_blocks; + block += this_zone_blocks; rec += 64; } @@ -467,8 +468,6 @@ static int sd_zbc_check_zone_size(struct } while (block < sdkp->capacity); - zone_blocks = sdkp->zone_blocks; - out: if (!zone_blocks) { if (sdkp->first_scan) @@ -488,8 +487,7 @@ out: "Zone size too large\n"); ret = -ENODEV; } else { - sdkp->zone_blocks = zone_blocks; - sdkp->zone_shift = ilog2(zone_blocks); + ret = zone_blocks; } out_free: @@ -500,21 +498,21 @@ out_free: /** * sd_zbc_alloc_zone_bitmap - Allocate a zone bitmap (one bit per zone). - * @sdkp: The disk of the bitmap + * @nr_zones: Number of zones to allocate space for. + * @numa_node: NUMA node to allocate the memory from. */ -static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp) +static inline unsigned long * +sd_zbc_alloc_zone_bitmap(u32 nr_zones, int numa_node) { - struct request_queue *q = sdkp->disk->queue; - - return kzalloc_node(BITS_TO_LONGS(sdkp->nr_zones) - * sizeof(unsigned long), - GFP_KERNEL, q->node); + return kzalloc_node(BITS_TO_LONGS(nr_zones) * sizeof(unsigned long), + GFP_KERNEL, numa_node); } /** * sd_zbc_get_seq_zones - Parse report zones reply to identify sequential zones * @sdkp: disk used * @buf: report reply buffer + * @zone_shift: logarithm base 2 of the number of blocks in a zone * @seq_zone_bitamp: bitmap of sequential zones to set * * Parse reported zone descriptors in @buf to identify sequential zones and @@ -524,7 +522,7 @@ static inline unsigned long *sd_zbc_allo * Return the LBA after the last zone reported. */ static sector_t sd_zbc_get_seq_zones(struct scsi_disk *sdkp, unsigned char *buf, - unsigned int buflen, + unsigned int buflen, u32 zone_shift, unsigned long *seq_zones_bitmap) { sector_t lba, next_lba = sdkp->capacity; @@ -543,7 +541,7 @@ static sector_t sd_zbc_get_seq_zones(str if (type != ZBC_ZONE_TYPE_CONV && cond != ZBC_ZONE_COND_READONLY && cond != ZBC_ZONE_COND_OFFLINE) - set_bit(lba >> sdkp->zone_shift, seq_zones_bitmap); + set_bit(lba >> zone_shift, seq_zones_bitmap); next_lba = lba + get_unaligned_be64(&rec[8]); rec += 64; } @@ -552,12 +550,16 @@ static sector_t sd_zbc_get_seq_zones(str } /** - * sd_zbc_setup_seq_zones_bitmap - Initialize the disk seq zone bitmap. + * sd_zbc_setup_seq_zones_bitmap - Initialize a seq zone bitmap. * @sdkp: target disk + * @zone_shift: logarithm base 2 of the number of blocks in a zone + * @nr_zones: number of zones to set up a seq zone bitmap for * * Allocate a zone bitmap and initialize it by identifying sequential zones. */ -static int sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp) +static unsigned long * +sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp, u32 zone_shift, + u32 nr_zones) { struct request_queue *q = sdkp->disk->queue; unsigned long *seq_zones_bitmap; @@ -565,9 +567,9 @@ static int sd_zbc_setup_seq_zones_bitmap unsigned char *buf; int ret = -ENOMEM; - seq_zones_bitmap = sd_zbc_alloc_zone_bitmap(sdkp); + seq_zones_bitmap = sd_zbc_alloc_zone_bitmap(nr_zones, q->node); if (!seq_zones_bitmap) - return -ENOMEM; + return ERR_PTR(-ENOMEM); buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL); if (!buf) @@ -578,7 +580,7 @@ static int sd_zbc_setup_seq_zones_bitmap if (ret) goto out; lba = sd_zbc_get_seq_zones(sdkp, buf, SD_ZBC_BUF_SIZE, - seq_zones_bitmap); + zone_shift, seq_zones_bitmap); } if (lba != sdkp->capacity) { @@ -590,12 +592,9 @@ out: kfree(buf); if (ret) { kfree(seq_zones_bitmap); - return ret; + return ERR_PTR(ret); } - - q->seq_zones_bitmap = seq_zones_bitmap; - - return 0; + return seq_zones_bitmap; } static void sd_zbc_cleanup(struct scsi_disk *sdkp) @@ -611,44 +610,64 @@ static void sd_zbc_cleanup(struct scsi_d q->nr_zones = 0; } -static int sd_zbc_setup(struct scsi_disk *sdkp) +static int sd_zbc_setup(struct scsi_disk *sdkp, u32 zone_blocks) { struct request_queue *q = sdkp->disk->queue; + u32 zone_shift = ilog2(zone_blocks); + u32 nr_zones; int ret; - /* READ16/WRITE16 is mandatory for ZBC disks */ - sdkp->device->use_16_for_rw = 1; - sdkp->device->use_10_for_rw = 0; - /* chunk_sectors indicates the zone size */ - blk_queue_chunk_sectors(sdkp->disk->queue, - logical_to_sectors(sdkp->device, sdkp->zone_blocks)); - sdkp->nr_zones = - round_up(sdkp->capacity, sdkp->zone_blocks) >> sdkp->zone_shift; + blk_queue_chunk_sectors(q, + logical_to_sectors(sdkp->device, zone_blocks)); + nr_zones = round_up(sdkp->capacity, zone_blocks) >> zone_shift; /* * Initialize the device request queue information if the number * of zones changed. */ - if (sdkp->nr_zones != q->nr_zones) { - - sd_zbc_cleanup(sdkp); - - q->nr_zones = sdkp->nr_zones; - if (sdkp->nr_zones) { - q->seq_zones_wlock = sd_zbc_alloc_zone_bitmap(sdkp); - if (!q->seq_zones_wlock) { + if (nr_zones != sdkp->nr_zones || nr_zones != q->nr_zones) { + unsigned long *seq_zones_wlock = NULL, *seq_zones_bitmap = NULL; + size_t zone_bitmap_size; + + if (nr_zones) { + seq_zones_wlock = sd_zbc_alloc_zone_bitmap(nr_zones, + q->node); + if (!seq_zones_wlock) { ret = -ENOMEM; goto err; } - ret = sd_zbc_setup_seq_zones_bitmap(sdkp); - if (ret) { - sd_zbc_cleanup(sdkp); + seq_zones_bitmap = sd_zbc_setup_seq_zones_bitmap(sdkp, + zone_shift, nr_zones); + if (IS_ERR(seq_zones_bitmap)) { + ret = PTR_ERR(seq_zones_bitmap); + kfree(seq_zones_wlock); goto err; } } - + zone_bitmap_size = BITS_TO_LONGS(nr_zones) * + sizeof(unsigned long); + blk_mq_freeze_queue(q); + if (q->nr_zones != nr_zones) { + /* READ16/WRITE16 is mandatory for ZBC disks */ + sdkp->device->use_16_for_rw = 1; + sdkp->device->use_10_for_rw = 0; + + sdkp->zone_blocks = zone_blocks; + sdkp->zone_shift = zone_shift; + sdkp->nr_zones = nr_zones; + q->nr_zones = nr_zones; + swap(q->seq_zones_wlock, seq_zones_wlock); + swap(q->seq_zones_bitmap, seq_zones_bitmap); + } else if (memcmp(q->seq_zones_bitmap, seq_zones_bitmap, + zone_bitmap_size) != 0) { + memcpy(q->seq_zones_bitmap, seq_zones_bitmap, + zone_bitmap_size); + } + blk_mq_unfreeze_queue(q); + kfree(seq_zones_wlock); + kfree(seq_zones_bitmap); } return 0; @@ -660,6 +679,7 @@ err: int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf) { + int64_t zone_blocks; int ret; if (!sd_is_zoned(sdkp)) @@ -696,12 +716,16 @@ int sd_zbc_read_zones(struct scsi_disk * * Check zone size: only devices with a constant zone size (except * an eventual last runt zone) that is a power of 2 are supported. */ - ret = sd_zbc_check_zone_size(sdkp); - if (ret) + zone_blocks = sd_zbc_check_zone_size(sdkp); + ret = -EFBIG; + if (zone_blocks != (u32)zone_blocks) + goto err; + ret = zone_blocks; + if (ret < 0) goto err; /* The drive satisfies the kernel restrictions: set it up */ - ret = sd_zbc_setup(sdkp); + ret = sd_zbc_setup(sdkp, zone_blocks); if (ret) goto err; --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -605,6 +605,11 @@ struct request_queue { * initialized by the low level device driver (e.g. scsi/sd.c). * Stacking drivers (device mappers) may or may not initialize * these fields. + * + * Reads of this information must be protected with blk_queue_enter() / + * blk_queue_exit(). Modifying this information is only allowed while + * no requests are being processed. See also blk_mq_freeze_queue() and + * blk_mq_unfreeze_queue(). */ unsigned int nr_zones; unsigned long *seq_zones_bitmap;