Received: by 10.192.165.148 with SMTP id m20csp4749780imm; Tue, 24 Apr 2018 07:51:55 -0700 (PDT) X-Google-Smtp-Source: AIpwx49r4dwWDftkFbWZ1i9rRiqWO4TaWmUUCBx1bY7JBC6RPIfUmX2AcX6YEPzBACRzq0fdqb2v X-Received: by 10.98.245.139 with SMTP id b11mr24005875pfm.113.1524581515128; Tue, 24 Apr 2018 07:51:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1524581515; cv=none; d=google.com; s=arc-20160816; b=rYRBghYqPJ+jpENyQ4Vxp055b1c12vY65RnflBmHIn2W90r6FI0eKGUXeijU6Hy/6Z 5AyeeMJg++++448rCGHOi8CWaatyXr5XbpznigryqEU3d2CvJirZKtXZD/kovn/wMAg4 NYuTRSjt6o8dIfWa/3BrMHGT+AEbm9uiv3QOaKzu/DD77bvHhXgfVxwgdsvdCgtZQFtZ 2yfy4mKhDSumeFq1Oc1PkqRUc33GS8Kb3jB2sXjl+Wva+f6WmNdm6K9zCZsruX6qbvUn r/p1XNJAq/Pss9ycewdGKMwBAcP8c3e3wl47Ik2ZULe2pJsUsm78deSfXVAnjRv9qL8p rlzw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :references:cc:to:subject:from:arc-authentication-results; bh=diVU0PUm1lvgsC9vsvHkr9yZr++8bcZz87QXYitbuE8=; b=s7WAcH4IEgAcGiAhglIOGavniYuJkBMlftQyi4HiKJepqgVdBbjLBlDOsAJQq8nOI7 a/qhGJZRrVTbeWfW8TBmXm3uCZmkWBFpWj9Zt4acUR3jAQC5XNsqcy+8w3si3cnQVvjk Z4haQLZM2HXJY59IzDg8jn3aWVSG3sSid89Lb289WT7jQwVRzTlyClAhsFpRSXNyR6YV UY7TZ8jvYg8oHrpXawceSfiOyj03mQLeCWS8w7f18WB+HQjx+mDs2OndfzBjaduohu5T u/y+iLQsPh2HN7hYTUOPdZizP3GAxTz6WviUv5/qw3Ahb+rQyqZbRiHkWFzR7nuBFVlO sZgQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l187si11670198pge.46.2018.04.24.07.51.40; Tue, 24 Apr 2018 07:51:55 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751301AbeDXOub convert rfc822-to-8bit (ORCPT + 99 others); Tue, 24 Apr 2018 10:50:31 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:45966 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750749AbeDXOu2 (ORCPT ); Tue, 24 Apr 2018 10:50:28 -0400 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w3OEoAZf064964 for ; Tue, 24 Apr 2018 10:50:28 -0400 Received: from e06smtp11.uk.ibm.com (e06smtp11.uk.ibm.com [195.75.94.107]) by mx0a-001b2d01.pphosted.com with ESMTP id 2hj523eesx-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 24 Apr 2018 10:50:26 -0400 Received: from localhost by e06smtp11.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 24 Apr 2018 15:50:04 +0100 Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195) by e06smtp11.uk.ibm.com (192.168.101.141) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 24 Apr 2018 15:50:00 +0100 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w3OEo0cA11796932; Tue, 24 Apr 2018 14:50:00 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BF3C6A405D; Tue, 24 Apr 2018 15:41:59 +0100 (BST) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5173BA4051; Tue, 24 Apr 2018 15:41:59 +0100 (BST) Received: from oc2481388365.ibm.com (unknown [9.152.212.160]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 24 Apr 2018 15:41:59 +0100 (BST) From: Steffen Maier Subject: Re: [PATCH 2/2] tracing/events: block: dev_t via driver core for plug and unplug events To: Bart Van Assche , "osandov@osandov.com" Cc: "lizefan@huawei.com" , "linux-kernel@vger.kernel.org" , "linux-block@vger.kernel.org" , "rostedt@goodmis.org" , "axboe@kernel.dk" , "mingo@redhat.com" , "gregkh@linuxfoundation.org" References: <20180413130719.22921-1-maier@linux.ibm.com> <20180413130719.22921-3-maier@linux.ibm.com> <20180415083154.GA12254@kroah.com> <59186bf6-abf1-87b0-914d-eed1b40ef4a8@linux.ibm.com> <20180419192341.GC20941@vader> <8624e2bec853f0bb195966ab54bc42bf15eda0ec.camel@wdc.com> Date: Tue, 24 Apr 2018 16:49:59 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <8624e2bec853f0bb195966ab54bc42bf15eda0ec.camel@wdc.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8BIT X-TM-AS-GCONF: 00 x-cbid: 18042414-0040-0000-0000-00000450F0B1 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18042414-0041-0000-0000-000020F5621C Message-Id: <056d8ee9-5e1d-8219-c24c-9130d6c1dcbf@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2018-04-24_03:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1804240142 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/19/2018 10:56 PM, Bart Van Assche wrote: > On Thu, 2018-04-19 at 12:24 -0700, Omar Sandoval wrote: >> We quiesce and freeze the queue before tearing it down, so it won't be >> NULL while we're still doing I/O. Cc'ing Bart in case I'm lying to you, >> though ;) > > blk_cleanup_queue() waits until all I/O requests have finished. Since the > block layer tracing code is triggered from inside the code that processes a > request it is safe to access the request pointer from inside the tracing code. > But I think the question was about the parent of the request queue kobj > instead of about the request queue pointer, the device structure that is > embedded in struct gendisk? I think that parent can disappear at any time. > Most block drivers call del_gendisk() before they call blk_cleanup_queue(). > Unless I'm overlooking something I think the request queue will need to > obtain a reference to the gendisk device from inside blk_register_queue() > and drop that reference from inside blk_cleanup_queue() to make it safefor > the tracing code to access struct gendisk. Got it, thanks, makes sense. I did some research, tried the refcount approach, and failed. Experiment Preparation: Stopped any udevd and lvm services to avoid additional I/O. Used the following debug/tracing setup: modprobe sd_mod echo -n 'module core +pt; module kobject +pt' > /sys/kernel/debug/dynamic_debug/control trace-cmd start -e block -f "(dev >= 0x800000 && dev <= 0x80000f) || dev == 0" -e scsi -p function -l "*_gendisk" -l blk_alloc_queue -l blk_register_queue -l blk_unregister_queue -l blk_cleanup_queue echo "blk_alloc_queue_node:stacktrace" >> /sys/kernel/debug/tracing/set_ftrace_filter echo "scsi_execute:stacktrace" >> /sys/kernel/debug/tracing/set_ftrace_filter echo "blk_register_queue:stacktrace" >> /sys/kernel/debug/tracing/set_ftrace_filter echo "blk_unregister_queue:stacktrace" >> /sys/kernel/debug/tracing/set_ftrace_filter echo "blk_cleanup_queue:stacktrace" >> /sys/kernel/debug/tracing/set_ftrace_filter Experiment: modprobe scsi_debug num_parts=1 sleep 3 echo "HOTUNPLUG" > /sys/kernel/debug/tracing/trace_marker echo 1 > /sys/class/scsi_device/0:0:0:0/device/delete echo 0 > /sys/kernel/debug/tracing/tracing_on Also tried a similar sequence but hotplugged & hotunplugged a DASD disk (non-SCSI) incl. mq for comparison. The object life cycle seems to be: (1) blk_alloc_queue() also creates gendisk kobj (1a) SCSI does LUN probing I/O most of that is blktrace RWBS field value 'N' i.e. non-R/W with dev_t==0 since q->kobj.parent is still NULL we cannot get gendisk and thus dev_t near the end is the first regular read block I/O with dev_t!=0 as bio!=NULL and bi_disk or rq_disk are non-NULL (2) blk_register_queue() also creates queue kobj with gendisk parent now we can follow q->kobj.parent to gendisk to get dev_t, e.g. for RWBS 'N' such as implicit SCSI test unit ready on blk dev close (3) optionally "regular" I/O (4) blk_unregister_queue() called by del_gendisk() does kobject_del(&q->kobj) which NULLifies q->kobj.parent again the last put of gendisk reference releases gendisk kobj (5) blk_cleanup_queue() the last put of queue (/mq) reference releases queue (/mq) kobj The patch set would have made dev_t meaningful between and incl. (2) and (3) for (un)plug and even for other block events with RWBS 'N'. I noticed that I missed one more spot in block_rq_complete [PATCH 3/3], which explains why I had missed some rq_completions while filtering block trace events for specific (non-zero) dev_t. It still wouldn't have been perfect as (1a) events still are dev_t==0 even though the gendisk exists since (1). I tried the same but increased scsi_debug/delay to a huge value and made one I/O request pending with "dd" before the hotunplug to see how flushing would work. For SCSI, at least, all flushing including a full (and unsuccessful due to eh TUR tmo) scsi_eh escalation for several minutes happened between 'block' kobj release during (4) and gendisk release during (5), i.e. before the gendisk was gone but q->kobj.parent was already NULL. The 'queue' kobj was released last finally. Since I cannot prove it's always like this, I followed Bart's suggestion and added another refcount get at (2) blk_register_queue(). However, when I want to put that additional refcount at (5) blk_cleanup_queue(), we only get a queue pointer argument as context and q->kobj.parent==NULL since (4), so I don't know how to get to the gendisk object for the refcount put. Probably the additional refcount would not help either as (4) has the unconditional kobject_del(&q->kobj) NULLifying q->kobj.parent anyway... I guess I came full circle. Side notes: The existing code holds a reference on gendisk, but between (2) and (4). Apparently block trace events can easily observe parts such as (1a) which are difficult for traditional blktrace to be started via blk dev ioctl (or for ftrace 'blk' tracer which can be enabled between (2) and (4)). Independently, I started to wonder if there is a refcount leak in the early error out if kobject_add() failed: > int blk_register_queue(struct gendisk *disk) > struct device *dev = disk_to_dev(disk); > struct request_queue *q = disk->queue; > ret = kobject_add(&q->kobj, kobject_get(&dev->kobj), "%s", "queue"); > ^^^ref+1 > if (ret < 0) { > blk_trace_remove_sysfs(dev); Above kobject_get already did its side-effect to provide the second argument when calling kobject_add. Even if kobject_add failed, we still have the refcount on the gendisk but don't put it? > goto unlock; > } > if (q->request_fn || (q->mq_ops && q->elevator)) { > ret = elv_register_queue(q); > if (ret) { > mutex_unlock(&q->sysfs_lock); > kobject_uevent(&q->kobj, KOBJ_REMOVE); > kobject_del(&q->kobj); > blk_trace_remove_sysfs(dev); > kobject_put(&dev->kobj); In contrast, this early return seems to do the pairwise put. > return ret; > } > } > unlock: > mutex_unlock(&q->sysfs_lock); > return ret; > } -- Mit freundlichen Grüßen / Kind regards Steffen Maier Linux on z Systems Development IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschaeftsfuehrung: Dirk Wittkopp Sitz der Gesellschaft: Boeblingen Registergericht: Amtsgericht Stuttgart, HRB 243294