Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758392AbXKGVxQ (ORCPT ); Wed, 7 Nov 2007 16:53:16 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755154AbXKGVxC (ORCPT ); Wed, 7 Nov 2007 16:53:02 -0500
Received: from ic0245.upco.es ([130.206.70.245]:41632 "EHLO antispam.upcomillas.es" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755150AbXKGVxA (ORCPT ); Wed, 7 Nov 2007 16:53:00 -0500
Subject: Re: 2.6.24-rc1 eat my photo SD card :-(
From: Romano Giannetti
To: Willy Tarreau
Cc: Pierre Ossman, Roland Dreier, linux-kernel@vger.kernel.org, jens.axboe@oracle.com
In-Reply-To: <1194387459.5205.2.camel@rukbat>
References: <1193918202.8439.7.camel@rukbat> <20071102182833.2c055446@poseidon.drzeus.cx> <1194168583.10245.15.camel@rukbat> <1194259886.6927.9.camel@localhost> <20071105132218.495f244e@poseidon.drzeus.cx> <1194270393.27789.2.camel@localhost> <20071105162633.09e54290@poseidon.drzeus.cx> <1194343121.6953.6.camel@rukbat> <20071106195148.GD1045@1wt.eu> <1194385693.12938.5.camel@rukbat> <1194387459.5205.2.camel@rukbat>
Content-Type: text/plain; charset=ISO-8859-1
Date: Wed, 07 Nov 2007 22:52:54 +0100
Message-Id: <1194472374.7176.4.camel@rukbat>
Mime-Version: 1.0
X-Mailer: Evolution 2.10.1
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 7507
Lines: 116

On Tue, 2007-11-06 at 23:17 +0100, Romano Giannetti wrote:
> Well, I started bisecting it. It will be a long shot, I suspect...

Well, I spent the last 36 hours (more or less) trying to bisect the SD
problem. The method I used was to insert the card, umount it, and run dd
8 times in a row; the kernel is "bad" if the results differ, "good" if
they are the same. I could not finish the bisect.
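For anyone who wants to repeat this kind of check, here is a minimal sketch in Python of the "read it several times and compare" test. It is an illustration only, not the dd invocation used for the bisect: the function names `read_digest` and `classify` are made up for this example, and the device path is whatever the card shows up as on your machine. Note that the page cache may serve repeated reads of the same device, so on real hardware you may need to drop caches between passes for the comparison to mean anything.

```python
import hashlib


def read_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of everything readable from path."""
    h = hashlib.sha256()
    # buffering=0 gives a raw file object, so Python adds no buffering of its own.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(chunk_size)
            if not block:  # empty bytes means end of file/device
                break
            h.update(block)
    return h.hexdigest()


def classify(path, passes=8):
    """Read path `passes` times; 'good' if all digests match, else 'bad'."""
    digests = [read_digest(path) for _ in range(passes)]
    verdict = "good" if len(set(digests)) == 1 else "bad"
    return verdict, digests
```

Calling `classify()` on the card's block device (as root, with the card unmounted) returns the verdict together with the per-pass digests, which makes it easy to see which passes disagreed.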
The last good/bad pair was:

bad: [7aeacf982203fb4dea2f3434eefdc268cfd5d6d9] [BLOCK] blk_rq_map_sg: force clear termination bit
good: [e38f981758118d829cd40cfe9c09e3fa81e422aa] exportfs: update documentation

The problem with concluding the bisect is that there is a whole series
of commits, named [SG] something, that seems to matter; but my three
attempts at a commit between the previous two ended with an MMC layer
that did not work, with this oops:

[ 81.738991] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 81.739003] printing eip: c01db437 *pde = 00000000
[ 81.739010] Oops: 0000 [#1] SMP
[ 81.739016] Modules linked in: mmc_block binfmt_misc rfcomm l2cap bluetooth ppdev i915 drm acpi_cpufreq cpufreq_conservative cpufreq_stats cpufreq_ondemand freq_table cpufreq_userspace cpufreq_powersave dock container sbs sbshc af_packet nls_iso8859_1 nls_cp437 vfat fat nls_utf8 ntfs dm_crypt dm_mod sbp2 parport_pc lp parport fuse snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss iTCO_wdt iTCO_vendor_support serio_raw sdhci snd_seq_midi snd_rawmidi snd_seq_midi_event psmouse pcspkr mmc_core snd_seq snd_timer snd_seq_device snd soundcore video output battery snd_page_alloc ac button intel_agp agpgart evdev ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_piix ehci_hcd ata_generic ohci1394 uhci_hcd ieee1394 libata scsi_mod generic usbcore r8169 thermal processor fan
[ 81.739122]
[ 81.739127] Pid: 6075, comm: mmcqd Not tainted (2.6.23-bisect #19)
[ 81.739132] EIP: 0060:[] EFLAGS: 00010246 CPU: 0
[ 81.739141] EIP is at blk_rq_map_sg+0xd7/0x190
[ 81.739145] EAX: 03619000 EBX: 00000000 ECX: c3464198 EDX: c3464698
[ 81.739150] ESI: 0361a000 EDI: 00001000 EBP: cb82fe24 ESP: cb82fdec
[ 81.739154] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 81.739159] Process mmcqd (pid: 6075, ti=cb82e000 task=cb2a5550 task.ti=cb82e000)
[ 81.739163] Stack: 00000292 c366c530 cb839a70 00002000 0361b000 c3464698 00000001 00000001
[ 81.739176] 00000000 c34e0848 01ae4698 c33ef2b0 c33ef2b0 cb2ec870 cb82fe3c f8e81e6c
[ 81.739188] 00200200 c3342580 c33ef2b0 cb2ec870 cb82ffb8 f8e816f9 7898775f 5f6f5965
[ 81.739200] Call Trace:
[ 81.739204] [] show_trace_log_lvl+0x1a/0x30
[ 81.739213] [] show_stack_log_lvl+0xb1/0xe0
[ 81.739220] [] show_registers+0xc1/0x1d0
[ 81.739226] [] die+0x11a/0x230
[ 81.739232] [] do_page_fault+0x269/0x5f0
[ 81.739239] [] error_code+0x72/0x78
[ 81.739247] [] mmc_queue_map_sg+0x2c/0xe0 [mmc_block]
[ 81.739258] [] mmc_blk_issue_rq+0x199/0x750 [mmc_block]
[ 81.739267] [] mmc_queue_thread+0x80/0xf0 [mmc_block]
[ 81.739275] [] kthread+0x42/0x70
[ 81.739282] [] kernel_thread_helper+0x7/0x10
[ 81.739289] =======================
[ 81.739292] Code: f0 89 45 d8 8b 01 2b 05 80 aa 67 c0 c1 f8 02 69 c0 c5 4e ec c4 c1 e0 0c 03 41 08 39 45 d8 0f 84 8e 00 00 00 f6 03 02 74 52 31 db <8b> 03 c7 43 0c 00 00 00 00 c7 43 08 00 00 00 00 83 e0 03 0b 01
[ 81.739358] EIP: [] blk_rq_map_sg+0xd7/0x190 SS:ESP 0068:cb82fdec

It seems to me that the two commits:

[BLOCK] blk_rq_map_sg: force clear termination bit
[BLOCK] Don't clear sg_dma_len/addr() in blk_rq_map_sg()

have the potential to fix the aforementioned oops, but in a way that
creates the reported problem for the MMC layer. It's just a gut feeling;
I do not have the kernel knowledge needed to debug this, but this
comment:

+ * If the driver previously mapped a shorter
+ * list, we could see a termination bit
+ * prematurely unless it fully inits the sg
+ * table on each mapping. We KNOW that there
+ * must be more entries here or the driver
+ * would be buggy, so force clear the
+ * termination bit to avoid doing a full
+ * sg_init_table() in drivers for each command.
+ */

rang a bell. When the bug occurs, it seems that some random page is
mapped into the device, so that... maybe the list was not supposed to
continue in this case?

Well, I hope this can help someone find the bug. I am available to
test/try whatever patches you send me.
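To make the scenario in the quoted comment concrete, here is a toy model in Python. It is only a sketch of the idea of a stale termination mark left behind by an earlier, shorter mapping; it is not kernel code and says nothing about where the actual bug is. `Entry`, `map_segments` and `walk` are hypothetical names invented for the illustration, and the `end` flag merely stands in for the termination bit.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    page: object = None  # stand-in for the mapped page
    end: bool = False    # stand-in for the termination bit


def map_segments(table, pages, clear_stale_end=True):
    """Fill table with pages and mark the last used entry as the end."""
    for i, page in enumerate(pages):
        if clear_stale_end:
            # Forget any end mark left behind by an earlier mapping.
            table[i].end = False
        table[i].page = page
    table[len(pages) - 1].end = True
    return len(pages)


def walk(table):
    """Collect pages up to and including the first entry marked as end."""
    seen = []
    for entry in table:
        seen.append(entry.page)
        if entry.end:
            break
    return seen


if __name__ == "__main__":
    table = [Entry() for _ in range(4)]
    map_segments(table, ["a", "b"])  # short mapping first
    map_segments(table, ["c", "d", "e", "f"], clear_stale_end=False)
    print(walk(table))  # the walk stops at the stale mark from the first mapping
```

In this model, reusing the table for a longer mapping without clearing leaves the old mark on the second entry, so the walk sees only `c` and `d`; with `clear_stale_end=True` it sees all four pages. The open question in this mail is the opposite case: what happens when the mark is cleared but the list really was meant to stop there.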

Romano

Complete git bisect log:

git-bisect start
# bad: [2655e2cee2d77459fcb7e10228259e4ee0328697] ata_piix: Add additional PCI identifier for 40 wire short cable
git-bisect bad 2655e2cee2d77459fcb7e10228259e4ee0328697
# good: [bbf25010f1a6b761914430f5fca081ec8c7accd1] Linux 2.6.23
git-bisect good bbf25010f1a6b761914430f5fca081ec8c7accd1
# good: [f4921aff5b174349bc36551f142a5dbac782ea3f] Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
git-bisect good f4921aff5b174349bc36551f142a5dbac782ea3f
# good: [9cf52b2921fbe62566b6b2ee79f71203749c9e5e] Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
git-bisect good 9cf52b2921fbe62566b6b2ee79f71203749c9e5e
# bad: [a98ce5c6feead6bfedefabd46cb3d7f5be148d9a] Fix synchronize_irq races with IRQ handler
git-bisect bad a98ce5c6feead6bfedefabd46cb3d7f5be148d9a
# good: [e9a404580ccaeb31dd2a976f9929c4f9eb6f3540] nfs: Fix build break with CONFIG_NFS_V4=n
git-bisect good e9a404580ccaeb31dd2a976f9929c4f9eb6f3540
# good: [668f895a85b0c3a62a690425145f13dabebebd7a] [NET]: Hide the queue_mapping field inside netif_subqueue_stopped
git-bisect good 668f895a85b0c3a62a690425145f13dabebebd7a
# bad: [ba1c28a94322865457ad59f80474615156065123] Merge branch 'sg' of git://git.kernel.dk/linux-2.6-block
git-bisect bad ba1c28a94322865457ad59f80474615156065123
# good: [e38f981758118d829cd40cfe9c09e3fa81e422aa] exportfs: update documentation
git-bisect good e38f981758118d829cd40cfe9c09e3fa81e422aa
# bad: [7aeacf982203fb4dea2f3434eefdc268cfd5d6d9] [BLOCK] blk_rq_map_sg: force clear termination bit
git-bisect bad 7aeacf982203fb4dea2f3434eefdc268cfd5d6d9

--
Sorry for the disclaimer --- ¡I cannot stop it!

--
This communication contains confidential information. It is for the
exclusive use of the intended addressee. If you are not the intended
addressee, please note that any form of distribution, copying or use of
this communication or the information in it is strictly prohibited by
law. If you have received this communication in error, please
immediately notify the sender by reply e-mail and destroy this message.
Thank you for your cooperation.