Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932228AbWBSTU5 (ORCPT ); Sun, 19 Feb 2006 14:20:57 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S932229AbWBSTU5 (ORCPT ); Sun, 19 Feb 2006 14:20:57 -0500
Received: from mailgw.aecom.yu.edu ([129.98.1.16]:29925 "EHLO mailgw.aecom.yu.edu")
	by vger.kernel.org with ESMTP id S932228AbWBSTU5 (ORCPT );
	Sun, 19 Feb 2006 14:20:57 -0500
Mime-Version: 1.0
Message-Id:
Date: Sun, 19 Feb 2006 14:20:54 -0500
To: drbd-user@linbit.com
From: Maurice Volaski
Subject: drbd-0.7.15 crashed doing a full sync under 2.6.13
Cc: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2764
Lines: 65

First posted to the drbd mailing list on 2/6 with no reply. It hasn't
happened since, but once is enough to mean something is wrong somewhere.

I resized all the underlying LVM logical volumes, and that went very
nicely: drbd automatically recognized the new sizes, and resize2fs
figured out the new size, too.

Anyway, bringing up the secondary automatically caused the primary to
start a full sync. The performance this time was very nice, averaging
50-55 MB per second across a total of 8 drbd resources, all syncing
simultaneously. Each resource is an underlying logical volume and is
assigned its own TCP port in drbd.conf. The primary has dual bonded
gigabit in active load balancing mode and the secondary is single
gigabit. A total of 5600 GB had to be moved.

I estimate the resources were between 1/3 and 1/2 finished when the
kernel on the primary panicked. The secondary, which is running the
same kernel and drbd, did not panic. The drives in both systems are
attached via LSI Logic SCSI cards; on the primary the driver is
mptscsih, and on the secondary it is sym53c8xx.

Unfortunately, I have never found an easy way to capture the text of a
panic, and apparently attachments are excluded if the message exceeds
40K, so I typed out the whole thing:

ne_endio+243}
{scsi_io_completion+651}
{sd_rw_intr+721}
{scsi_device_unbusy+95}
{scsi_softirq+258}
{__do_softirq+113}
{call_softirq+31}
{do_softirq+53}
{do_IRQ+79}
{ret_from_intr+0}
{:drbd:drbd_bm_clear_bit+497}
{:drbd:drbd_bm_clear_bit+497}
{:drbd:__drbd_set_in_sync+545}
{:drbd:got_BlockAck+96}
{:drbd:drbd_asender+1081}
{:drbd:drbd_thread_setup+190}
{child_rip+8}
{:drbd:drbd_thread_setup+0>
{child_rip+0}

Code: 80 3b 00 7e f9 e9 89 fb ff ff e8 40 27 ef ff e9 01 fc ff ff
console shuts up...
<0>Kernel panic --not syncing:Aiee, killing interrupt handler!

keywords: crash, freeze, freezing, hang, hung, panic, locked up, scsi,
thread, threads, interrupt handler, bug, race condition.
--
Maurice Volaski, mvolaski@aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/