Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755923AbXKFAUE (ORCPT ); Mon, 5 Nov 2007 19:20:04 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754453AbXKFATv (ORCPT ); Mon, 5 Nov 2007 19:19:51 -0500
Received: from wa-out-1112.google.com ([209.85.146.180]:47358 "EHLO wa-out-1112.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754149AbXKFATu (ORCPT ); Mon, 5 Nov 2007 19:19:50 -0500
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=WWAywa2BvzPkeZa4SeY3jiiBbJVi/UdtGI9oYYnsnCaWXdyNHnbKJRFCVlBtEqkCYxPO4d/+s0fcNUffRB/pam70B9qKN5ccx7jdbEz5YJDJ9Ecton4SSXLgS82WgdlxUVMXwJ4KfFUVGQVSk6M6jZgaP85oxv1gImwIJJAvoLM=
Message-ID:
Date: Mon, 5 Nov 2007 17:19:49 -0700
From: "Dan Williams"
To: "Justin Piszcz" , "=?ISO-8859-1?Q?BERTRAND_Jo=EBl?="
Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state
Cc: "Neil Brown" , linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_29572_25767903.1194308389630"
References: <18222.16003.92062.970530@notabene.brown>
X-Google-Sender-Auth: 5eb36f2fb820b55c
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4090
Lines: 77

------=_Part_29572_25767903.1194308389630
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

On 11/5/07, Justin Piszcz wrote:
[..]
> > Are you seeing the same "md thread takes 100% of the CPU" that Joël is
> > reporting?
> >
>
> Yes, in another e-mail I posted the top output with md3_raid5 at 100%.
>

This seems too similar to Joël's situation for them not to be
correlated, and it shows that iscsi is not a necessary component of
the failure.

The attached patch allows the debug statements in MD to be enabled via
sysfs. Joël, since it is easier for you to reproduce can you capture
the kernel log output after the raid thread goes into the spin? It
will help if you have CONFIG_PRINTK_TIME=y set in your kernel
configuration.

After the failure run:

echo 1 > /sys/block/md_d0/md/debug_print_enable; sleep 5; echo 0 > /sys/block/md_d0/md/debug_print_enable

...to enable the print messages for a few seconds. Please send the
output in a private message if it proves too big for the mailing list.

------=_Part_29572_25767903.1194308389630
Content-Type: application/octet-stream; name=raid5-debug-print-enable.patch
Content-Transfer-Encoding: base64
X-Attachment-Id: f_f8no5bvd
Content-Disposition: attachment; filename=raid5-debug-print-enable.patch

cmFpZDU6IGRlYnVnIHByaW50IGVuYWJsZQoKRnJvbTogRGFuIFdpbGxpYW1zIDxkYW4uai53aWxs
aWFtc0BpbnRlbC5jb20+CgoKLS0tCgogZHJpdmVycy9tZC9yYWlkNS5jIHwgICAzNiArKysrKysr
KysrKysrKysrKysrKysrKysrKysrKysrKysrKysKIDEgZmlsZXMgY2hhbmdlZCwgMzYgaW5zZXJ0
aW9ucygrKSwgMCBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9kcml2ZXJzL21kL3JhaWQ1LmMg
Yi9kcml2ZXJzL21kL3JhaWQ1LmMKaW5kZXggMzgwOGY1Mi4uNDk2YjlhMyAxMDA2NDQKLS0tIGEv
ZHJpdmVycy9tZC9yYWlkNS5jCisrKyBiL2RyaXZlcnMvbWQvcmFpZDUuYwpAQCAtNTQsNiArNTQs
MTAgQEAKICNpbmNsdWRlIDxsaW51eC9yYWlkL2JpdG1hcC5oPgogI2luY2x1ZGUgPGxpbnV4L2Fz
eW5jX3R4Lmg+CiAKK3N0YXRpYyBpbnQgZGVidWdfcHJpbnRfZW5hYmxlOworI3VuZGVmIHByX2Rl
YnVnCisjZGVmaW5lIHByX2RlYnVnKHguLi4pICgodm9pZCkoZGVidWdfcHJpbnRfZW5hYmxlICYm
IHByaW50ayh4KSkpIAorCiAvKgogICogU3RyaXBlIGNhY2hlCiAgKi8KQEAgLTQwMjMsNiArNDAy
NywzNyBAQCByYWlkNV9zdHJpcGVjYWNoZV9zaXplID0gX19BVFRSKHN0cmlwZV9jYWNoZV9zaXpl
LCBTX0lSVUdPIHwgU19JV1VTUiwKIAkJCQlyYWlkNV9zdG9yZV9zdHJpcGVfY2FjaGVfc2l6ZSk7
CiAKIHN0YXRpYyBzc2l6ZV90CityYWlkNV9zaG93X2RlYnVnX3ByaW50X2VuYWJsZShtZGRldl90
ICptZGRldiwgY2hhciAqcGFnZSkKK3sKKwlyZXR1cm4gc3ByaW50ZihwYWdlLCAiJWRcbiIsIGRl
YnVnX3ByaW50X2VuYWJsZSk7Cit9CisKK3N0YXRpYyBzc2l6ZV90CityYWlkNV9zdG9yZV9kZWJ1
Z19wcmludF9lbmFibGUobWRkZXZfdCAqbWRkZXYsIGNvbnN0IGNoYXIgKnBhZ2UsIHNpemVfdCBs
ZW4pCit7CisJcmFpZDVfY29uZl90ICpjb25mID0gbWRkZXZfdG9fY29uZihtZGRldik7CisJY2hh
ciAqZW5kOworCWludCBuZXc7CisJaWYgKGxlbiA+PSBQQUdFX1NJWkUpCisJCXJldHVybiAtRUlO
VkFMOworCisJbmV3ID0gc2ltcGxlX3N0cnRvdWwocGFnZSwgJmVuZCwgMTApOworCWlmICghKnBh
Z2UgfHwgKCplbmQgJiYgKmVuZCAhPSAnXG4nKSApCisJCXJldHVybiAtRUlOVkFMOworCWlmIChu
ZXcgPCAwIHx8IG5ldyA+IDEpCisJCXJldHVybiAtRUlOVkFMOworCisJZGVidWdfcHJpbnRfZW5h
YmxlID0gbmV3OworCQorCXJldHVybiBsZW47Cit9CisKK3N0YXRpYyBzdHJ1Y3QgbWRfc3lzZnNf
ZW50cnkKK3JhaWQ1X2RlYnVnX3ByaW50ID0gX19BVFRSKGRlYnVnX3ByaW50X2VuYWJsZSwgU19J
UlVHTyB8IFNfSVdVU1IsCisJCQkJcmFpZDVfc2hvd19kZWJ1Z19wcmludF9lbmFibGUsCisJCQkJ
cmFpZDVfc3RvcmVfZGVidWdfcHJpbnRfZW5hYmxlKTsKKworc3RhdGljIHNzaXplX3QKIHN0cmlw
ZV9jYWNoZV9hY3RpdmVfc2hvdyhtZGRldl90ICptZGRldiwgY2hhciAqcGFnZSkKIHsKIAlyYWlk
NV9jb25mX3QgKmNvbmYgPSBtZGRldl90b19jb25mKG1kZGV2KTsKQEAgLTQwMzgsNiArNDA3Myw3
IEBAIHJhaWQ1X3N0cmlwZWNhY2hlX2FjdGl2ZSA9IF9fQVRUUl9STyhzdHJpcGVfY2FjaGVfYWN0
aXZlKTsKIHN0YXRpYyBzdHJ1Y3QgYXR0cmlidXRlICpyYWlkNV9hdHRyc1tdID0gIHsKIAkmcmFp
ZDVfc3RyaXBlY2FjaGVfc2l6ZS5hdHRyLAogCSZyYWlkNV9zdHJpcGVjYWNoZV9hY3RpdmUuYXR0
ciwKKwkmcmFpZDVfZGVidWdfcHJpbnQuYXR0ciwKIAlOVUxMLAogfTsKIHN0YXRpYyBzdHJ1Y3Qg
YXR0cmlidXRlX2dyb3VwIHJhaWQ1X2F0dHJzX2dyb3VwID0gewo=
------=_Part_29572_25767903.1194308389630--
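
For readers who want the idea without applying the attachment: the message describes debug output that stays compiled in but is switched on and off at runtime through a sysfs file. Below is a rough userspace sketch of that pattern, not the kernel code itself. It assumes printf in place of printk, the store function name and its -1 error return are hypothetical stand-ins, and the "0 or 1, optional trailing newline" rule is an assumption about what a write to debug_print_enable should accept.

```c
#include <stdio.h>
#include <stdlib.h>

/* Runtime switch: 0 = quiet, 1 = print debug messages. */
static int debug_print_enable;

/*
 * Userspace stand-in for a runtime-gated debug macro.
 * printf is used here where kernel code would call printk.
 * The message is only formatted and printed when the flag is set.
 */
#define pr_debug(...) ((void)(debug_print_enable && printf(__VA_ARGS__)))

/*
 * Hypothetical handler for a sysfs-style write such as "1\n".
 * Accepts only 0 or 1, optionally followed by a newline.
 * Returns 0 on success and -1 on bad input (a kernel handler
 * would return -EINVAL instead).
 */
static int store_debug_print_enable(const char *page)
{
	char *end;
	unsigned long val;

	val = strtoul(page, &end, 10);
	if (end == page || (*end && *end != '\n'))
		return -1;	/* no digits, or trailing garbage */
	if (val > 1)
		return -1;	/* only 0 and 1 are meaningful */

	debug_print_enable = (int)val;
	return 0;
}
```

With this in place, a call like pr_debug("stripe %d\n", 7) stays silent until store_debug_print_enable("1\n") succeeds, and goes silent again after store_debug_print_enable("0\n"). Rejected input leaves the flag unchanged, so a malformed write cannot leave logging stuck on.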