Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S935863Ab0GPHuM (ORCPT ); Fri, 16 Jul 2010 03:50:12 -0400
Received: from 80-69-81-65.colo.transip.net ([80.69.81.65]:51102 "EHLO
        ns1.emsolutions.nl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S935756Ab0GPHuJ (ORCPT );
        Fri, 16 Jul 2010 03:50:09 -0400
X-Greylist: delayed 750 seconds by postgrey-1.27 at vger.kernel.org; Fri, 16 Jul 2010 03:50:08 EDT
Message-ID: <1275.77.248.79.78.1279265850.squirrel@ketsers.dhs.org>
Date: Fri, 16 Jul 2010 09:37:30 +0200 (CEST)
Subject: Re: mvsas still has problems with 2.6.34
From: "Caspar Smit"
To: linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
        ayan@marvell.com, "andy yan", "linux-raid"
User-Agent: SquirrelMail/1.4.9a
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7BIT
X-Priority: 3 (Normal)
Importance: Normal
References: <201007160010.58000.tfjellstrom@strangesoft.net>
        <201007160053.01673.tfjellstrom@strangesoft.net>
        <201007160123.27540.tfjellstrom@strangesoft.net>
In-Reply-To: <201007160123.27540.tfjellstrom@strangesoft.net>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 18733
Lines: 387

Thomas,

The patches you are using are the ones from November '09, I presume?
Those patches still had a lot of SATA issues, so I think they didn't make
it into the kernel. The patches seemed to handle SAS disks just fine,
though. SATA disks were a whole different story.

Srinivas Naga Venkatasatya Pasagadugula created a patch instead of Andy
Yan's patches. It seemed to handle SATA disks a lot better, but after some
tests it still had a lot of problems.

Srinivas Naga Venkatasatya Pasagadugula is now in the process of creating
a new patch to fix the remaining issues. He told me it would take a long
time to create, and that was a few months ago. I and others submitted
extensive logging for him to check.
As for production, I can only advise this:

Using SAS disks: use the stock 2.6.34 kernel + Andy Yan's patches.
Using SATA disks: DO NOT GO INTO PRODUCTION.

Kind regards,
Caspar Smit

> On July 16, 2010, Thomas Fjellstrom wrote:
>> On July 16, 2010, Thomas Fjellstrom wrote:
>> > I've recently updated my server, and the mvsas driver included in
>> > 2.6.34.1 still causes my AOC-SASLP-MV8 card to completely lock up after
>> > mdraid starts up on the devices. The machine is essentially in
>> > "production" so I can't do a heck of a lot of testing on it anymore.
>> > The mvsas driver I got from Andy Yan seems to be a little outdated, it
>> > fails to compile due to a missing argument to sas_change_queue_depth,
>> > which I managed to fix, and I will try testing. I hope it works.
>>
>> It seems to work with the change I made.
>
> Sorry for the noise, I forgot to post the following in my last couple
> messages:
>
> It works, but I do get a kernel warning:
>
> Jul 16 00:38:05 boris kernel: [ 20.104295] ------------[ cut here ]------------
> Jul 16 00:38:05 boris kernel: [ 20.104315] WARNING: at drivers/ata/libata-core.c:5216 ata_qc_issue+0x31b/0x330 [libata]()
> Jul 16 00:38:05 boris kernel: [ 20.104323] Hardware name: GA-MA790FXT-UD5P
> Jul 16 00:38:05 boris kernel: [ 20.104327] Modules linked in: snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss nouveau ttm snd_pcm drm_kms_helper snd_seq_midi k10temp drm agpgart i2c_algo_bit snd_rawmidi snd_seq_midi_event i2c_piix4 i2c_core evdev edac_core edac_mce_amd tpm_tis snd_seq pcspkr tpm button tpm_bios wmi snd_timer snd_seq_device processor snd soundcore snd_page_alloc ext3 jbd mbcache dm_mod raid1 md_mod sg sr_mod sd_mod crc_t10dif cdrom ata_generic ohci_hcd ide_pci_generic ahci mvsas libsas libata atiixp scsi_transport_sas firewire_ohci firewire_core crc_itu_t thermal skge thermal_sys ide_core ehci_hcd r8169 mii usbcore scsi_mod nls_base [last unloaded: scsi_wait_scan]
> Jul 16 00:38:05 boris kernel: [ 20.104448]
Pid: 6091, comm: ata_id Not tainted 2.6.34.1 #2
> Jul 16 00:38:05 boris kernel: [ 20.104453] Call Trace:
> Jul 16 00:38:05 boris kernel: [ 20.104462] [] ? warn_slowpath_common+0x73/0xb0
> Jul 16 00:38:05 boris kernel: [ 20.104472] [] ? ata_qc_issue+0x31b/0x330 [libata]
> Jul 16 00:38:05 boris kernel: [ 20.104482] [] ? scsi_init_io+0x2f/0x190 [scsi_mod]
> Jul 16 00:38:05 boris kernel: [ 20.104492] [] ? ata_scsi_pass_thru+0x0/0x2e0 [libata]
> Jul 16 00:38:05 boris kernel: [ 20.104500] [] ? scsi_done+0x0/0x20 [scsi_mod]
> Jul 16 00:38:05 boris kernel: [ 20.104509] [] ? ata_scsi_translate+0x9e/0x180 [libata]
> Jul 16 00:38:05 boris kernel: [ 20.104517] [] ? scsi_done+0x0/0x20 [scsi_mod]
> Jul 16 00:38:05 boris kernel: [ 20.104525] [] ? sas_queuecommand+0x9b/0x330 [libsas]
> Jul 16 00:38:05 boris kernel: [ 20.104533] [] ? scsi_dispatch_cmd+0x17e/0x2b0 [scsi_mod]
> Jul 16 00:38:05 boris kernel: [ 20.104542] [] ? scsi_request_fn+0x3e0/0x570 [scsi_mod]
> Jul 16 00:38:05 boris kernel: [ 20.104549] [] ? del_timer+0x71/0xd0
> Jul 16 00:38:05 boris kernel: [ 20.104556] [] ? __blk_run_queue+0x63/0x130
> Jul 16 00:38:05 boris kernel: [ 20.104563] [] ? elv_insert+0x132/0x1f0
> Jul 16 00:38:05 boris kernel: [ 20.104570] [] ? blk_execute_rq_nowait+0x59/0xb0
> Jul 16 00:38:05 boris kernel: [ 20.104576] [] ? blk_execute_rq+0x72/0xe0
> Jul 16 00:38:05 boris kernel: [ 20.104582] [] ? blk_rq_map_user+0x1ab/0x290
> Jul 16 00:38:05 boris kernel: [ 20.104588] [] ? sg_io+0x241/0x3f0
> Jul 16 00:38:05 boris kernel: [ 20.104594] [] ? scsi_cmd_ioctl+0x45c/0x4b0
> Jul 16 00:38:05 boris kernel: [ 20.104601] [] ? __dentry_open+0x22f/0x340
> Jul 16 00:38:05 boris kernel: [ 20.104607] [] ? inode_permission+0x93/0xd0
> Jul 16 00:38:05 boris kernel: [ 20.104614] [] ? sd_ioctl+0xa4/0x120 [sd_mod]
> Jul 16 00:38:05 boris kernel: [ 20.105009] [] ? __blkdev_driver_ioctl+0x98/0xe0
> Jul 16 00:38:05 boris kernel: [ 20.105410] [] ? blkdev_ioctl+0x1f5/0x7b0
> Jul 16 00:38:05 boris kernel: [ 20.105815] [] ?
cp_new_stat+0xe0/0x100
> Jul 16 00:38:05 boris kernel: [ 20.106230] [] ? block_ioctl+0x37/0x40
> Jul 16 00:38:05 boris kernel: [ 20.106647] [] ? vfs_ioctl+0x35/0xd0
> Jul 16 00:38:05 boris kernel: [ 20.107064] [] ? do_vfs_ioctl+0x88/0x560
> Jul 16 00:38:05 boris kernel: [ 20.107490] [] ? sys_newfstat+0x2e/0x50
> Jul 16 00:38:05 boris kernel: [ 20.107919] [] ? sys_ioctl+0x80/0xa0
> Jul 16 00:38:05 boris kernel: [ 20.108003] [] ? system_call_fastpath+0x16/0x1b
> Jul 16 00:38:05 boris kernel: [ 20.108003] ---[ end trace e8ea9c22d6b28439 ]---
>
> Other than this stack trace, it seems to work fine.
>
>> > At some point though I really hope this gets fixed. I'm still willing
>> > to help test any new versions, just that I can't keep my box down for
>> > an extended period.
>> >
>> > Thanks.
>
> I forgot to post, but here are the kernel messages I get when trying to
> use the kernel's included mvsas driver:
>
> Jul 15 22:42:41 boris kernel: [ 208.816129] sd 0:0:3:0: [sdf] Unhandled error code
> Jul 15 22:42:41 boris kernel: [ 208.816809] sd 0:0:3:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
> Jul 15 22:42:41 boris kernel: [ 208.817470] sd 0:0:3:0: [sdf] CDB: Read(10): 28 00 3a 45 c1 08 00 04 00 00
> Jul 15 22:42:41 boris kernel: [ 208.818853] sd 0:0:1:0: [sdd] Unhandled error code
> Jul 15 22:42:41 boris kernel: [ 208.819508] sd 0:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
> Jul 15 22:42:41 boris kernel: [ 208.820179] sd 0:0:1:0: [sdd] CDB: Read(10): 28 00 3a 45 be 58 00 02 b0 00
> Jul 15 22:42:41 boris kernel: [ 208.821558] sd 0:0:2:0: [sde] Unhandled error code
> Jul 15 22:42:41 boris kernel: [ 208.822201] sd 0:0:2:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
> Jul 15 22:42:41 boris kernel: [ 208.822836] sd 0:0:2:0: [sde] CDB: Read(10): 28 00 3a 45 c1 08 00 04 00 00
> Jul 15 22:42:41 boris kernel: [ 208.824157] sd 0:0:4:0: [sdg] Unhandled error code
> Jul 15 22:42:41 boris kernel: [ 208.824784] sd 0:0:4:0: [sdg] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
> Jul 15 22:42:41 boris kernel: [ 208.825407] sd 0:0:4:0: [sdg] CDB: Read(10): 28 00 3a 45 c1 08 00 04 00 00
> Jul 15 22:43:13 boris kernel: [ 240.737334] md1_raid5 D 0000000000000001 0 6120 2 0x00000000
> Jul 15 22:43:13 boris kernel: [ 240.737948] ffff88012c94c420 0000000000000046 ffff880100000000 ffff88012f65b680
> Jul 15 22:43:13 boris kernel: [ 240.738570] 00000000000134c0 ffff88012e6effd8 00000000000134c0 ffff88012c94c420
> Jul 15 22:43:13 boris kernel: [ 240.739196] ffff88012e6effd8 ffff88012e6effd8 00000000000134c0 00000000000134c0
> Jul 15 22:43:13 boris kernel: [ 240.739821] Call Trace:
> Jul 15 22:43:13 boris kernel: [ 240.740458] [] ? md_super_wait+0xae/0xd0 [md_mod]
> Jul 15 22:43:13 boris kernel: [ 240.741100] [] ? autoremove_wake_function+0x0/0x30
> Jul 15 22:43:13 boris kernel: [ 240.741729] [] ? md_update_sb+0x268/0x3d0 [md_mod]
> Jul 15 22:43:13 boris kernel: [ 240.742361] [] ? md_check_recovery+0x232/0x520 [md_mod]
> Jul 15 22:43:13 boris kernel: [ 240.742982] [] ? raid5d+0x23/0x4f0 [raid456]
> Jul 15 22:43:13 boris kernel: [ 240.743602] [] ? schedule_timeout+0x23d/0x310
> Jul 15 22:43:13 boris kernel: [ 240.744221] [] ? finish_task_switch+0x34/0xb0
> Jul 15 22:43:13 boris kernel: [ 240.744861] [] ? md_thread+0x53/0x120 [md_mod]
> Jul 15 22:43:13 boris kernel: [ 240.745489] [] ? autoremove_wake_function+0x0/0x30
> Jul 15 22:43:13 boris kernel: [ 240.746121] [] ? md_thread+0x0/0x120 [md_mod]
> Jul 15 22:43:13 boris kernel: [ 240.746743] [] ? kthread+0x8e/0xa0
> Jul 15 22:43:13 boris kernel: [ 240.747367] [] ? kernel_thread_helper+0x4/0x10
> Jul 15 22:43:13 boris kernel: [ 240.748000] [] ? kthread+0x0/0xa0
> Jul 15 22:43:13 boris kernel: [ 240.748639] [] ?
kernel_thread_helper+0x0/0x10
> Jul 15 22:43:13 boris kernel: [ 240.750521] mount D 0000000000000001 0 6405 6403 0x00000000
> Jul 15 22:43:13 boris kernel: [ 240.751158] ffff88012eb8f3d0 0000000000000082 ffff88012e50c600 ffff88012f65d1c0
> Jul 15 22:43:13 boris kernel: [ 240.751805] 00000000000134c0 ffff88012dc0bfd8 00000000000134c0 ffff88012eb8f3d0
> Jul 15 22:43:13 boris kernel: [ 240.752452] ffff88012dc0bfd8 ffff88012dc0bfd8 00000000000134c0 00000000000134c0
> Jul 15 22:43:13 boris kernel: [ 240.753108] Call Trace:
> Jul 15 22:43:13 boris kernel: [ 240.753761] [] ? scsi_done+0x0/0x20 [scsi_mod]
> Jul 15 22:43:13 boris kernel: [ 240.754409] [] ? schedule_timeout+0x23d/0x310
> Jul 15 22:43:13 boris kernel: [ 240.755053] [] ? blk_peek_request+0x127/0x1e0
> Jul 15 22:43:13 boris kernel: [ 240.755708] [] ? scsi_dispatch_cmd+0x18d/0x2b0 [scsi_mod]
> Jul 15 22:43:13 boris kernel: [ 240.756358] [] ? wait_for_common+0xd2/0x180
> Jul 15 22:43:13 boris kernel: [ 240.757023] [] ? default_wake_function+0x0/0x20
> Jul 15 22:43:13 boris kernel: [ 240.757672] [] ? unplug_slaves+0x86/0xc0 [raid456]
> Jul 15 22:43:13 boris kernel: [ 240.758363] [] ? xlog_bread_noalign+0xbd/0xf0 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.759046] [] ? xfs_buf_iowait+0x40/0xf0 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.759730] [] ? xlog_bread_noalign+0xbd/0xf0 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.760423] [] ? xlog_bread+0x35/0x80 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.761124] [] ? xlog_find_verify_cycle+0xbf/0x170 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.761813] [] ? xlog_find_head+0x168/0x3a0 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.762495] [] ? xlog_find_tail+0x27/0x3d0 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.763178] [] ? xlog_recover+0x15/0x90 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.763858] [] ? xfs_log_mount+0x134/0x170 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.764528] [] ? xfs_mountfs+0x38f/0x720 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.765214] [] ?
kmem_alloc+0x7b/0xc0 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.765888] [] ? kmem_zalloc+0x2b/0x40 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.766559] [] ? xfs_fs_fill_super+0x225/0x3b0 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.767203] [] ? get_sb_bdev+0x1a3/0x1e0
> Jul 15 22:43:13 boris kernel: [ 240.767877] [] ? xfs_fs_fill_super+0x0/0x3b0 [xfs]
> Jul 15 22:43:13 boris kernel: [ 240.768533] [] ? vfs_kern_mount+0x83/0x1f0
> Jul 15 22:43:13 boris kernel: [ 240.769174] [] ? do_kern_mount+0x53/0x120
> Jul 15 22:43:13 boris kernel: [ 240.769806] [] ? do_mount+0x28a/0x8a0
> Jul 15 22:43:13 boris kernel: [ 240.770441] [] ? copy_mount_options+0xe0/0x180
> Jul 15 22:43:13 boris kernel: [ 240.771073] [] ? sys_mount+0x9a/0xf0
> Jul 15 22:43:13 boris kernel: [ 240.771695] [] ? system_call_fastpath+0x16/0x1b
> Jul 15 22:45:13 boris kernel: [ 360.769363] md1_raid5 D 0000000000000001 0 6120 2 0x00000000
> Jul 15 22:45:13 boris kernel: [ 360.770006] ffff88012c94c420 0000000000000046 ffff880100000000 ffff88012f65b680
> Jul 15 22:45:13 boris kernel: [ 360.770648] 00000000000134c0 ffff88012e6effd8 00000000000134c0 ffff88012c94c420
> Jul 15 22:45:13 boris kernel: [ 360.771298] ffff88012e6effd8 ffff88012e6effd8 00000000000134c0 00000000000134c0
> Jul 15 22:45:13 boris kernel: [ 360.771946] Call Trace:
> Jul 15 22:45:13 boris kernel: [ 360.772620] [] ? md_super_wait+0xae/0xd0 [md_mod]
> Jul 15 22:45:13 boris kernel: [ 360.773265] [] ? autoremove_wake_function+0x0/0x30
> Jul 15 22:45:13 boris kernel: [ 360.773911] [] ? md_update_sb+0x268/0x3d0 [md_mod]
> Jul 15 22:45:13 boris kernel: [ 360.774550] [] ? md_check_recovery+0x232/0x520 [md_mod]
> Jul 15 22:45:13 boris kernel: [ 360.775180] [] ? raid5d+0x23/0x4f0 [raid456]
> Jul 15 22:45:13 boris kernel: [ 360.775804] [] ? schedule_timeout+0x23d/0x310
> Jul 15 22:45:13 boris kernel: [ 360.776424] [] ? finish_task_switch+0x34/0xb0
> Jul 15 22:45:13 boris kernel: [ 360.777064] [] ?
md_thread+0x53/0x120 [md_mod]
> Jul 15 22:45:13 boris kernel: [ 360.777679] [] ? autoremove_wake_function+0x0/0x30
> Jul 15 22:45:13 boris kernel: [ 360.778302] [] ? md_thread+0x0/0x120 [md_mod]
> Jul 15 22:45:13 boris kernel: [ 360.778919] [] ? kthread+0x8e/0xa0
> Jul 15 22:45:13 boris kernel: [ 360.779534] [] ? kernel_thread_helper+0x4/0x10
> Jul 15 22:45:13 boris kernel: [ 360.780148] [] ? kthread+0x0/0xa0
> Jul 15 22:45:13 boris kernel: [ 360.780776] [] ? kernel_thread_helper+0x0/0x10
> Jul 15 22:45:13 boris kernel: [ 360.782623] mount D 0000000000000001 0 6405 6403 0x00000000
> Jul 15 22:45:13 boris kernel: [ 360.783248] ffff88012eb8f3d0 0000000000000082 ffff88012e50c600 ffff88012f65d1c0
> Jul 15 22:45:13 boris kernel: [ 360.783883] 00000000000134c0 ffff88012dc0bfd8 00000000000134c0 ffff88012eb8f3d0
> Jul 15 22:45:13 boris kernel: [ 360.784536] ffff88012dc0bfd8 ffff88012dc0bfd8 00000000000134c0 00000000000134c0
> Jul 15 22:45:13 boris kernel: [ 360.785184] Call Trace:
> Jul 15 22:45:13 boris kernel: [ 360.785829] [] ? scsi_done+0x0/0x20 [scsi_mod]
> Jul 15 22:45:13 boris kernel: [ 360.786465] [] ? schedule_timeout+0x23d/0x310
> Jul 15 22:45:13 boris kernel: [ 360.787098] [] ? blk_peek_request+0x127/0x1e0
> Jul 15 22:45:13 boris kernel: [ 360.787740] [] ? scsi_dispatch_cmd+0x18d/0x2b0 [scsi_mod]
> Jul 15 22:45:13 boris kernel: [ 360.788361] [] ? wait_for_common+0xd2/0x180
> Jul 15 22:45:13 boris kernel: [ 360.788988] [] ? default_wake_function+0x0/0x20
> Jul 15 22:45:13 boris kernel: [ 360.789612] [] ? unplug_slaves+0x86/0xc0 [raid456]
> Jul 15 22:45:13 boris kernel: [ 360.790277] [] ? xlog_bread_noalign+0xbd/0xf0 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.790933] [] ? xfs_buf_iowait+0x40/0xf0 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.791597] [] ? xlog_bread_noalign+0xbd/0xf0 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.792258] [] ? xlog_bread+0x35/0x80 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.792935] [] ?
xlog_find_verify_cycle+0xbf/0x170 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.793598] [] ? xlog_find_head+0x168/0x3a0 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.794258] [] ? xlog_find_tail+0x27/0x3d0 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.794910] [] ? xlog_recover+0x15/0x90 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.795565] [] ? xfs_log_mount+0x134/0x170 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.796216] [] ? xfs_mountfs+0x38f/0x720 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.796879] [] ? kmem_alloc+0x7b/0xc0 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.797527] [] ? kmem_zalloc+0x2b/0x40 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.798171] [] ? xfs_fs_fill_super+0x225/0x3b0 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.798785] [] ? get_sb_bdev+0x1a3/0x1e0
> Jul 15 22:45:13 boris kernel: [ 360.799429] [] ? xfs_fs_fill_super+0x0/0x3b0 [xfs]
> Jul 15 22:45:13 boris kernel: [ 360.800046] [] ? vfs_kern_mount+0x83/0x1f0
> Jul 15 22:45:13 boris kernel: [ 360.800678] [] ? do_kern_mount+0x53/0x120
> Jul 15 22:45:13 boris kernel: [ 360.801292] [] ? do_mount+0x28a/0x8a0
> Jul 15 22:45:13 boris kernel: [ 360.801910] [] ? copy_mount_options+0xe0/0x180
> Jul 15 22:45:13 boris kernel: [ 360.802531] [] ? sys_mount+0x9a/0xf0
> Jul 15 22:45:13 boris kernel: [ 360.803152] [] ? system_call_fastpath+0x16/0x1b
>
> I'm pretty sure most of that is due to the driver not responding for 4 of
> the drives (the first few messages)
>
> Thanks again.
>
> --
> Thomas Fjellstrom
> tfjellstrom@strangesoft.net
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
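
For anyone who wants to check Thomas's remark that four of the drives stopped responding, here is a rough Python sketch that pulls the DRIVER_TIMEOUT entries out of the quoted log and decodes the failed Read(10) commands. It is only an illustration, not something anyone in the thread ran: the function names (timed_out_devices, decode_read10, failed_reads) are made up for this example, and the excerpt is the quoted log with the syslog timestamp and host prefix trimmed off. The CDB decoding assumes the standard Read(10) layout: opcode 0x28 in byte 0, a big-endian LBA in bytes 2-5 and a big-endian transfer length in bytes 7-8.

```python
import re

# Excerpt from the quoted kernel log (timestamp/host prefix trimmed).
LOG = """\
sd 0:0:3:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
sd 0:0:3:0: [sdf] CDB: Read(10): 28 00 3a 45 c1 08 00 04 00 00
sd 0:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
sd 0:0:1:0: [sdd] CDB: Read(10): 28 00 3a 45 be 58 00 02 b0 00
sd 0:0:2:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
sd 0:0:2:0: [sde] CDB: Read(10): 28 00 3a 45 c1 08 00 04 00 00
sd 0:0:4:0: [sdg] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
sd 0:0:4:0: [sdg] CDB: Read(10): 28 00 3a 45 c1 08 00 04 00 00
"""

# Hypothetical helper patterns; they match only the two line shapes above.
TIMEOUT_RE = re.compile(r"\[(sd[a-z]+)\] Result: .*driverbyte=DRIVER_TIMEOUT")
CDB_RE = re.compile(r"\[(sd[a-z]+)\] CDB: Read\(10\): ((?:[0-9a-f]{2} ?){10})")


def timed_out_devices(log):
    """Return the sorted device names that reported DRIVER_TIMEOUT."""
    return sorted({m.group(1) for m in TIMEOUT_RE.finditer(log)})


def decode_read10(cdb_hex):
    """Decode a Read(10) CDB given as hex bytes; return (lba, blocks)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a Read(10) CDB: %r" % cdb_hex)
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks


def failed_reads(log):
    """Map each device to the (lba, blocks) of its logged Read(10)."""
    return {m.group(1): decode_read10(m.group(2)) for m in CDB_RE.finditer(log)}


if __name__ == "__main__":
    print("timed out:", ", ".join(timed_out_devices(LOG)))
    for dev, (lba, blocks) in sorted(failed_reads(LOG).items()):
        print("%s: Read(10) lba=%d blocks=%d" % (dev, lba, blocks))
```

On this excerpt the sketch lists sdd, sde, sdf and sdg, which matches the "4 of the drives" in the quoted message. The failed reads on sde, sdf and sdg carry the same LBA and length, and the sdd read ends exactly where they begin.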