Subject: Re: [PATCH] cfq: clear queue pointers from cfqg after unpinning them in cfq_pd_offline
From: "Maciej S. Szmigiero"
To: Paolo Valente
Cc: Jens Axboe, linux-block@vger.kernel.org, linux-kernel, Joseph Qi, Tejun Heo, jiufei.xue@linux.alibaba.com, Caspar Zhang
Date: Fri, 17 Aug 2018 19:38:05 +0200
Message-ID: <11f9f800-f7d8-288d-4b17-f026170d5eba@maciej.szmigiero.name>
In-Reply-To: <4FF39F18-108B-43BD-85A2-A09DB7755865@linaro.org>
References: <4FF39F18-108B-43BD-85A2-A09DB7755865@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 17.08.2018 19:30, Paolo Valente wrote:
>
>
>> On 17 Aug 2018, at 19:28, Maciej S. Szmigiero wrote:
>>
>> The current linux-block, 4.18 and 4.17 can reliably be crashed within a few
>> minutes by running the following bash snippet:
>>
>> mkfs.ext4 -v /dev/sda3 && mount /dev/sda3 /mnt/test/ -t ext4;
>> while true; do
>>     mkdir /sys/fs/cgroup/unified/test/;
>>     echo $$ >/sys/fs/cgroup/unified/test/cgroup.procs;
>>     dd if=/dev/zero of=/mnt/test/test-$(( RANDOM * 10 / 32768 )) bs=1M count=1024 &
>>     echo $$ >/sys/fs/cgroup/unified/cgroup.procs;
>>     sleep 1;
>>     kill -KILL $!; wait $!;
>>     rmdir /sys/fs/cgroup/unified/test;
>> done
>>
>> # cat /sys/block/sda/queue/scheduler
>> noop [cfq]
>> # cat /sys/block/sda/queue/rotational
>> 1
>> # cat /sys/fs/cgroup/unified/cgroup.subtree_control
>> cpu io memory pids
>>
>> The backtraces vary, but often they are NULL pointer dereferences due to
>> various cfqq fields being NULL.
>> Or BUG_ON(cfqq->ref <= 0) in cfq_put_queue() triggers due to the cfqq
>> reference count being zero.
>>
>> Bisection points at
>> commit 4c6994806f70 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()").
>> The prime suspect looked like the .pd_offline_fn() method being called
>> multiple times, but from analyzing the mentioned commit this didn't seem
>> possible, and runtime trials have confirmed that.
>>
>> However, CFQ's cfq_pd_offline() implementation of the above method was
>> leaving queue pointers intact in cfqg after unpinning them.
>> After making sure that they are cleared to NULL in this function I can no
>> longer reproduce the crash.
>>
>
> By chance, did you check whether BFQ is ok in this respect?

I wasn't able to crash BFQ with the above test, and in fact I had run my
machines on BFQ until I was able to find a fix for this in CFQ.

Also, BFQ has somewhat similar code in bfq_put_async_queues(), which is
called from bfq_pd_offline() and is already NULL-ing the passed pointer.

> Thanks,
> Paolo

Regards,
Maciej
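
For illustration only, here is a minimal userspace sketch of the "unpin, then clear the pointer" pattern discussed above. It is not the kernel code: struct queue, struct group, queue_new(), queue_get(), queue_put(), group_lookup() and group_offline() are hypothetical stand-ins invented for this example, and the assert() only loosely mirrors the spirit of BUG_ON(cfqq->ref <= 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted queue (not struct cfq_queue). */
struct queue {
    int ref;
};

/* Hypothetical stand-in for a group that pins one queue (not struct cfq_group). */
struct group {
    struct queue *async_q;
};

/* Allocate a queue holding one reference, owned by the caller. */
struct queue *queue_new(void)
{
    struct queue *q = malloc(sizeof(*q));

    if (q)
        q->ref = 1;
    return q;
}

/* Take an additional reference. */
void queue_get(struct queue *q)
{
    q->ref++;
}

/* Drop one reference and free the queue when the last one goes away. */
void queue_put(struct queue *q)
{
    assert(q->ref > 0);
    if (--q->ref == 0)
        free(q);
}

/* Return whatever queue the group currently points at, possibly NULL. */
struct queue *group_lookup(const struct group *g)
{
    return g->async_q;
}

/*
 * Take the group offline: drop the group's pin and clear the pointer in the
 * same step, so that later lookups see NULL instead of a stale pointer.
 */
void group_offline(struct group *g)
{
    if (g->async_q) {
        queue_put(g->async_q);
        g->async_q = NULL;
    }
}
```

With the pointer cleared, a group_lookup() after group_offline() returns NULL, so a caller can notice that the group no longer owns a queue. If the assignment to NULL were left out, the same lookup would hand back a pointer whose reference the group has already dropped.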