Received: by 10.192.165.156 with SMTP id m28csp844210imm; Mon, 16 Apr 2018 09:35:02 -0700 (PDT) X-Google-Smtp-Source: AIpwx4/KoiRhOD8UySnUKAhEQHYs08AQq6lsH7STRZ9s9CJHrjbZPOoDtBoDb5ZRDtuCtX1+sXVZ X-Received: by 2002:a17:902:b2c8:: with SMTP id x8-v6mr16032665plw.83.1523896502341; Mon, 16 Apr 2018 09:35:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1523896502; cv=none; d=google.com; s=arc-20160816; b=ZexID9jgC8CkI+TQg3rbtsyPQY8BtJKxqqInhbF7gUkdm4ZuhNZ3HPE17vPDKBmxDw oYbM6XzMHsHdbrsrFO3Akiti6nKMehu1tk977vtqGFMHcEWj9liMBLle5npI4K1BcY8V Yfz85u+HgO1s8NTQaF8NMgsRHlBr5khZw2OrlzNgg1yIQZIUu6WNOhuSfjav6qkOT6nO FBrGije5H+xvmhUShTpAA1j9aPzSMTGX4aULbchwkbKdu197FKeNI5Sg76I8WQ/K0g5l kL/xhwKnsem6hSearDgkJFHeSSHVrBKRlUyI7f8RUKAP/7R4PWp0bmOYMfVWGIV2/G04 u7YQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date:from :references:cc:to:subject:arc-authentication-results; bh=bahCWJrTVVSUoeCcvICpeTAsYvzfnt13HQMUV9Uf1A8=; b=y/VZtMYAOwXlQ87o0Mos6mLzI8lYATvnx1Rk0cqyMT6NvJtXNSg0tiGwlKxnrzgEPX HcZM2zlCIiuptak813IgRw1UZX+jiHwYGytkuNG+UFCSoGGtcQP/7hyVLugkKu95AjCA vSF+AUXtKm0a3SKkvJUtMx55Tec1QKTzLRlxMCpz3VZQjRZkE8ySONgczkZ4bU6nLYGA 6ncUgf28B92TGKDSEKmhH3sz9x2lfQALnFXdMPmPc4utab5vyFs8lb/yomcbjILARc5C ECDqnTsRJI39mGsbHUDUvdDabXmiOwcymQLtFKpQl+ezYmRwmc3Coptbuiw5rQ/+hXp5 G0fA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y14si1997411pgo.85.2018.04.16.09.34.47; Mon, 16 Apr 2018 09:35:02 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752604AbeDPQdh convert rfc822-to-8bit (ORCPT + 99 others); Mon, 16 Apr 2018 12:33:37 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:33166 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751070AbeDPQdg (ORCPT ); Mon, 16 Apr 2018 12:33:36 -0400 Received: from pps.filterd (m0098393.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w3GGSqqf110756 for ; Mon, 16 Apr 2018 12:33:36 -0400 Received: from e06smtp13.uk.ibm.com (e06smtp13.uk.ibm.com [195.75.94.109]) by mx0a-001b2d01.pphosted.com with ESMTP id 2hcwd4epc7-1 (version=TLSv1.2 cipher=AES256-SHA256 bits=256 verify=NOT) for ; Mon, 16 Apr 2018 12:33:35 -0400 Received: from localhost by e06smtp13.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 16 Apr 2018 17:33:32 +0100 Received: from b06cxnps4076.portsmouth.uk.ibm.com (9.149.109.198) by e06smtp13.uk.ibm.com (192.168.101.143) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Mon, 16 Apr 2018 17:33:28 +0100 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w3GGXS9116515128; Mon, 16 Apr 2018 16:33:28 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5F595A4040; Mon, 16 Apr 2018 17:25:38 +0100 (BST) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 064C7A4051; Mon, 16 Apr 2018 17:25:38 +0100 (BST) Received: from oc2481388365.ibm.com (unknown [9.152.212.160]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 16 Apr 2018 17:25:37 +0100 (BST) Subject: Re: [PATCH 2/2] tracing/events: block: dev_t via driver core for plug and unplug events To: Greg Kroah-Hartman Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, Steven Rostedt , Ingo Molnar , Jens Axboe , Li Zefan , Li Zefan References: <20180413130719.22921-1-maier@linux.ibm.com> <20180413130719.22921-3-maier@linux.ibm.com> <20180415083154.GA12254@kroah.com> From: Steffen Maier Date: Mon, 16 Apr 2018 18:33:27 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <20180415083154.GA12254@kroah.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8BIT X-TM-AS-GCONF: 00 x-cbid: 18041616-0012-0000-0000-000005CB799C X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18041616-0013-0000-0000-00001947BEDD Message-Id: <59186bf6-abf1-87b0-914d-eed1b40ef4a8@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2018-04-16_09:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1804160149 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Greg, On 04/15/2018 10:31 AM, Greg Kroah-Hartman wrote: > On Fri, Apr 13, 2018 at 03:07:18PM +0200, Steffen Maier wrote: >> Complements v2.6.31 commit 55782138e47d ("tracing/events: convert block >> trace points to TRACE_EVENT()") to be equivalent to traditional blktrace >> output. Also this allows event filtering to not always get all (un)plug >> events. >> >> NB: The NULL pointer check for q->kobj.parent is certainly racy and >> I don't have enough experience if it's good enough for a trace event. >> The change did work for my cases (block device read/write I/O on >> zfcp-attached SCSI disks and dm-mpath on top). >> >> While I haven't seen any prior art using driver core (parent) relations >> for trace events, there are other cases using this when no direct pointer >> exists between objects, such as: >> #define to_scsi_target(d) container_of(d, struct scsi_target, dev) >> static inline struct scsi_target *scsi_target(struct scsi_device *sdev) >> { >> return to_scsi_target(sdev->sdev_gendev.parent); >> } > > That is because you "know" the parent of a target device is a > scsi_target. true >> This is the object model we make use of here: >> >> struct gendisk { >> struct hd_struct { >> struct device { /*container_of*/ >> struct kobject kobj; <--+ >> dev_t devt; /*deref*/ | >> } __dev; | >> } part0; | >> struct request_queue *queue; ..+ | >> } : | >> : | >> struct request_queue { <..............+ | >> /* queue kobject */ | >> struct kobject { | >> struct kobject *parent; --------+ > > Are you sure about this? I double checked it with crash on a running system chasing pointers and looking at structure debug symbols. But of course I cannot guarantee it's always been like this and will be. >> } kobj; >> } >> >> The difference to blktrace parsed output is that block events don't use the >> partition's minor number but the containing block device's minor number: > > Why do you want the block device's minor number here? What is wrong > with the partition's minor number? I would think you want that instead. No change introduced with my patch. I just describe state of the art since the mentioned https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=55782138e47d. It (or even its predecessor) used request_queue as trace function argument (plus mostly either request or bio). So that's the currently available context for these events. My change is consistent with that. But then again, it's not much of a problem as we do have the remap event which shows the mapping from partition to blockdev. blktrace, hooking with callbacks on the block trace events, has its own context information [struct blk_trace] and can get to e.g. the dev_t with its own real pointers without using driver core relations. But I had the impression that's only wired if one uses the blktrace IOCTL or the blk tracer [do_blk_trace_setup()], not for "pure" block events. > static void blk_add_trace_plug(void *ignore, struct request_queue *q) > { > struct blk_trace *bt = q->blk_trace; ^^^^^^^^^^^^ > > if (bt) > __blk_add_trace(bt, 0, 0, 0, 0, BLK_TA_PLUG, 0, 0, NULL, NULL); > } > > static void blk_add_trace_unplug(void *ignore, struct request_queue *q, > unsigned int depth, bool explicit) > { > struct blk_trace *bt = q->blk_trace; ^^^^^^^^^^^^ > > if (bt) { > __be64 rpdu = cpu_to_be64(depth); > u32 what; > > if (explicit) > what = BLK_TA_UNPLUG_IO; > else > what = BLK_TA_UNPLUG_TIMER; > > __blk_add_trace(bt, 0, 0, 0, 0, what, 0, sizeof(rpdu), &rpdu, NULL); > } > } > struct blk_trace { > int trace_state; > struct rchan *rchan; > unsigned long __percpu *sequence; > unsigned char __percpu *msg_data; > u16 act_mask; > u64 start_lba; > u64 end_lba; > u32 pid; > u32 dev; ^^^ > struct dentry *dir; > struct dentry *dropped_file; > struct dentry *msg_file; > struct list_head running_list; > atomic_t dropped; > }; >> $ dd if=/dev/sdf1 count=1 >> >> $ cat /sys/kernel/debug/tracing/trace >> block_bio_remap: 8,80 R 2048 + 32 <- (8,81) 0 >> block_bio_queue: 8,80 R 2048 + 32 [dd] >> block_getrq: 8,80 R 2048 + 32 [dd] >> block_plug: 8,80 [dd] >> ^^^^ >> block_rq_insert: 8,80 R 16384 () 2048 + 32 [dd] >> block_unplug: 8,80 [dd] 1 explicit >> ^^^^ >> block_rq_issue: 8,80 R 16384 () 2048 + 32 [dd] >> block_rq_complete: 8,80 R () 2048 + 32 [0] >> diff --git a/include/trace/events/block.h b/include/trace/events/block.h >> index a13613d27cee..cffedc26e8a3 100644 >> --- a/include/trace/events/block.h >> +++ b/include/trace/events/block.h >> @@ -460,14 +460,18 @@ TRACE_EVENT(block_plug, >> TP_ARGS(q), >> >> TP_STRUCT__entry( >> + __field( dev_t, dev ) >> __array( char, comm, TASK_COMM_LEN ) >> ), >> >> TP_fast_assign( >> + __entry->dev = q->kobj.parent ? >> + container_of(q->kobj.parent, struct device, kobj)->devt : 0; > > That really really really scares me. It feels very fragile and messing > with parent pointers is ripe for things breaking in the future in odd > and unexplainable ways. > > And how can the parent be NULL? I'm hoping for help by block layer experts. I suppose block device unplug/removal could be a case. But I don't know the details how this works and if object release is protected while I/O is pending and new I/O is rejected beforehand. That might make my approach safe as it would not call the trace functions while the parent pointer changes. > I don't know the block layer but this feels very wrong to me. Are you > sure there isn't some other way to get this info? No, I'm not sure at all. But I'm no block layer expert either. This is just an idea I had which did work for my cases and I'm looking for confirmation or denial by the experts. So if it's not safe from a block layer point of view either, then I have to ditch it. -- Mit freundlichen Grüßen / Kind regards Steffen Maier Linux on z Systems Development IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschaeftsfuehrung: Dirk Wittkopp Sitz der Gesellschaft: Boeblingen Registergericht: Amtsgericht Stuttgart, HRB 243294