Message-ID: <4D540969.9090507@panasas.com>
Date: Thu, 10 Feb 2011 17:51:05 +0200
From: Boaz Harrosh
To: Dan Williams, linux-kernel, linux-fsdevel, NeilBrown, uml-devel
Subject: Re: Regression with calibrate_xor_blocks, probably UML related
References: <4D52E4E1.7070705@panasas.com>
In-Reply-To: <4D52E4E1.7070705@panasas.com>

On 02/09/2011 09:02 PM, Boaz Harrosh wrote:
> I have a new module that uses the async_tx.h lib.
>
> On an exact same module code based on 2.6.37 I see the:
> xor: measuring software checksum speed
>    8regs     : 11312.000 MB/sec
>    8regs_prefetch:  9792.800 MB/sec
>    32regs    : 11220.400 MB/sec
>    32regs_prefetch:  9750.800 MB/sec
> xor: using function: 8regs (11312.000 MB/sec)
>
> And all is well. But on code based on 2.6.38-rc4 I get hard stuck
> right after:
> xor: measuring software checksum speed
>

OK, this does not depend on the kernel version; it is the same for both
.38-rc4 and .37. I was just luckier with .37.

And the same thing happens with the raid456 module. I do:

[]$ modprobe raid456; modprobe --remove raid456

A few times it loads, printing the above checks, then at some point it
freezes. Sometimes on the first attempt, sometimes after 4-7 attempts.
I never got through 10 times straight.
When it freezes (hard) I can see on my host that the UML is at 100% CPU.

BTW: when I manage to pass the tests I get the above numbers.
But when I load directly on the host I get:

xor: automatically using best checksumming function: generic_sse
   generic_sse:  7596.000 MB/sec
xor: using function: generic_sse (7596.000 MB/sec)
raid6: int64x1   1660 MB/s
raid6: int64x2   1832 MB/s
raid6: int64x4   1566 MB/s
raid6: int64x8   1175 MB/s
raid6: sse2x1    3699 MB/s
raid6: sse2x2    4398 MB/s
raid6: sse2x4    5863 MB/s
raid6: using algorithm sse2x4 (5863 MB/s)

and on the UML:

raid6: int64x1   2019 MB/s
raid6: int64x2   2208 MB/s
raid6: int64x4   1892 MB/s
raid6: int64x8   1528 MB/s
raid6: using algorithm int64x2 (2208 MB/s)
xor: measuring software checksum speed
   8regs     : 11308.000 MB/sec
   8regs_prefetch:  9795.600 MB/sec
   32regs    : 11236.000 MB/sec
   32regs_prefetch:  9752.400 MB/sec
xor: using function: 8regs (11308.000 MB/sec)

So the raid6 sse on the host is better, but comparing int64xX the UML
is faster than the host. But raid5? That's 33% better results. Does that
say that UML's clock has a bug?

Anyway, I'm trying to debug that xor.ko loading problem to see what
comes up. Any help is welcome.

Thanks
Boaz

> the UML is completely frozen. When I kill the uml from the host
> I can sometimes get this trace.
>
> 750c7498:  [<6005f936>] bad_page+0xd8/0xf3
> 750c74c8:  [<60060c93>] get_page_from_freelist+0x333/0x47b
> 750c7508:  [<60131243>] put_dec+0x20/0x3c
> 750c75a0:  [<6001a0ac>] change_pre_exec+0x0/0x24
> 750c75b8:  [<60060ef1>] __alloc_pages_nodemask+0x116/0x65b
> 750c7668:  [<60132e25>] sprintf+0xa1/0xa3
> 750c76a0:  [<6001a0ac>] change_pre_exec+0x0/0x24
> 750c76b8:  [<60061446>] __get_free_pages+0x10/0x43
> 750c76c8:  [<60012875>] alloc_stack+0x1b/0x1d
> 750c76d8:  [<6001fe27>] run_helper+0x26/0x1b5
> 750c76e8:  [<60021553>] set_signals+0x1c/0x2e
> 750c7708:  [<6007efac>] __kmalloc+0x9e/0xc4
> 750c7748:  [<6001a544>] change+0x124/0x189
> 750c77e8:  [<601b77db>] _raw_spin_unlock+0x9/0xb
> 750c7818:  [<6001a5a9>] close_addr+0x0/0x1c
> 750c7828:  [<6001a5c3>] close_addr+0x1a/0x1c
> 750c7838:  [<6001926a>] iter_addresses+0x5f/0x76
> 750c7858:  [<6007e8e8>] kfree+0x92/0x9b
> 750c7898:  [<60022d01>] tuntap_close+0x24/0x38
> 750c78b8:  [<600194e4>] close_devices+0x4a/0x7f
> 750c78d8:  [<600121bf>] do_uml_exitcalls+0x12/0x23
> 750c78f8:  [<60012cd2>] uml_cleanup+0x1a/0x87
> 750c7928:  [<6002039b>] last_ditch_exit+0x9/0x16
> 750c79e8:  [<78817031>] xor_8regs_2+0x31/0x58 [xor]
> 750c7a18:  [<7881b000>] calibrate_xor_blocks+0x0/0xdf [xor]
> 750c7aa8:  [<601b77ce>] _raw_spin_unlock_irqrestore+0x18/0x1c
> 750c7ac8:  [<60029d8d>] try_to_wake_up+0x86/0x98
> 750c7d78:  [<601b548d>] printk+0xa0/0xa3
> 750c7e08:  [<78817633>] do_xor_speed+0x54/0xaf [xor]
> 750c7e20:  [<7881b000>] calibrate_xor_blocks+0x0/0xdf [xor]
> 750c7e58:  [<7881b057>] calibrate_xor_blocks+0x57/0xdf [xor]
> 750c7e68:  [<7881b000>] calibrate_xor_blocks+0x0/0xdf [xor]
> 750c7e78:  [<6001105a>] do_one_initcall+0x76/0x121
> 750c7eb8:  [<600563fd>] sys_init_module+0x78/0x1a6
> 750c7ee8:  [<60014d60>] handle_syscall+0x58/0x70
> 750c7f08:  [<60024163>] userspace+0x2dd/0x38a
> 750c7fc8:  [<600126af>] fork_handler+0x62/0x69
>
> (gdb) list *(xor_8regs_2+0x31)
> 0x55 is in xor_8regs_2
> (/usr0/export/dev/bharrosh/git/pub/scsi-misc/include/asm-generic/xor.h:29).
> 24              p1[0] ^= p2[0];
> 25              p1[1] ^= p2[1];
> 26              p1[2] ^= p2[2];
> 27              p1[3] ^= p2[3];
> 28              p1[4] ^= p2[4];
> 29              p1[5] ^= p2[5];
> 30              p1[6] ^= p2[6];
> 31              p1[7] ^= p2[7];
> 32              p1 += 8;
> 33              p2 += 8;
> (gdb) list *(calibrate_xor_blocks+0x0)
> 0xd52 is in calibrate_xor_blocks (/usr0/export/dev/bharrosh/git/pub/scsi-misc/crypto/xor.c:101).
> 96                     speed / 1000, speed % 1000);
> 97      }
> 98
> 99      static int __init
> 100     calibrate_xor_blocks(void)
> 101     {
> 102             void *b1, *b2;
> 103             struct xor_block_template *f, *fastest;
> 104
> 105             /*
> (gdb) list *(do_xor_speed+0x54)
> 0x657 is in do_xor_speed (/usr0/export/dev/bharrosh/git/pub/scsi-misc/crypto/xor.c:84).
> 79                      now = jiffies;
> 80                      count = 0;
> 81                      while (jiffies == now) {
> 82                              mb(); /* prevent loop optimzation */
> 83                              tmpl->do_2(BENCH_SIZE, b1, b2);
> 84                              mb();
> 85                              count++;
> 86                              mb();
> 87                      }
> 88                      if (count > max)
> (gdb) list *(calibrate_xor_blocks+0x57)
> 0xda9 is in calibrate_xor_blocks (/usr0/export/dev/bharrosh/git/pub/scsi-misc/crypto/xor.c:137).
> 132                             "checksumming function: %s\n",
> 133                             fastest->name);
> 134                     xor_speed(fastest);
> 135             } else {
> 136                     printk(KERN_INFO "xor: measuring software checksum speed\n");
> 137                     XOR_TRY_TEMPLATES;
> 138                     fastest = template_list;
> 139                     for (f = fastest; f; f = f->next)
> 140                             if (f->speed > fastest->speed)
> 141                                     fastest = f;
> (gdb) q
>
> So it looks like the code in UML links the include/asm-generic/xor.h and that it gets
> stuck. Any thing changed in this area in last merge window?
>
> Before I start the very difficult bisect?
>
> Thanks for any tips
> Boaz
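
For anyone reasoning about the two symptoms together (the hard spin inside do_xor_speed and the surprisingly high MB/sec on UML), here is a minimal userspace sketch of the calibration loop from the crypto/xor.c listing. It is only an illustration under stated assumptions, not the kernel code: `struct sim_clock`, `sim_read_jiffies`, `sim_count_one_tick`, `sim_speed`, the `max_calls` guard, and the `SIM_HZ` / `SIM_BENCH_SIZE` values are all hypothetical names and numbers, and the scaling in `sim_speed` is my recollection of how the count is turned into the printed figure, which the listing above does not show. What it demonstrates is that the loop exits only when the tick counter changes, so a tick that never fires spins forever, and a tick that lasts longer than it should inflates the count.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_HZ         100   /* assumed tick rate; not taken from the report */
#define SIM_BENCH_SIZE 4096  /* assumed bytes XORed per call (one page) */

/* Hypothetical stand-in for jiffies: it advances once every
 * reads_per_tick reads; 0 models a tick that never fires. */
struct sim_clock {
	unsigned long jiffies;
	unsigned long reads;
	unsigned long reads_per_tick;
};

unsigned long sim_read_jiffies(struct sim_clock *c)
{
	c->reads++;
	if (c->reads_per_tick && c->reads % c->reads_per_tick == 0)
		c->jiffies++;
	return c->jiffies;
}

/* Same shape as xor_8regs_2: XOR p2 into p1, eight longs per pass. */
void sim_xor_2(size_t bytes, unsigned long *p1, const unsigned long *p2)
{
	size_t lines = bytes / sizeof(long) / 8;

	assert(bytes % (8 * sizeof(long)) == 0);
	while (lines--) {
		for (int i = 0; i < 8; i++)
			p1[i] ^= p2[i];
		p1 += 8;
		p2 += 8;
	}
}

/* Count XOR calls completed before the tick counter changes.
 * The max_calls guard is an addition for this sketch: the loop in the
 * listing has no such exit. Returns -1 if the tick never advanced. */
long sim_count_one_tick(struct sim_clock *c, unsigned long *b1,
			const unsigned long *b2, long max_calls)
{
	unsigned long now = sim_read_jiffies(c);
	long count = 0;

	while (sim_read_jiffies(c) == now) {
		if (count >= max_calls)
			return -1;
		sim_xor_2(SIM_BENCH_SIZE, b1, b2);
		count++;
	}
	return count;
}

/* Assumed scaling from calls-per-tick to the value that is then printed
 * as speed / 1000 "." speed % 1000. */
long sim_speed(long count)
{
	return count * (SIM_HZ * SIM_BENCH_SIZE / 1024);
}
```

With two page-sized buffers, a simulated tick that lasts twice as many reads roughly doubles the count, and `reads_per_tick = 0` only returns because of the guard. Under the assumed constants a count of 28280 scales to 11312000, which would print as 11312.000.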