2007-11-01 11:56:55

by Romano Giannetti

Subject: 2.6.34-rc1 eat my photo SD card :-(


Hi,

I have a likely regression to report. This morning 2.6.24-rc1
ate and destroyed my SD card. I have a Toshiba laptop with a card slot,
and I have used it with 2.6.23-rcX and 2.6.23 without problems so far.
This morning I put the card in, nothing happened, and I removed it. When I
put it in again, the filesystem on it was completely scr***ed up.

I have a flight to catch now, so I have put all the dmesg output and
syslogs here:

http://www.dea.icai.upcomillas.es/romano/linux/info/2624-rc1-mmc/

Sunday I'll be back to help debug it.

Thank you very much

Romano

--
Sorry for the disclaimer --- ¡I cannot stop it!



--
La presente comunicación tiene carácter confidencial y es para el exclusivo uso del destinatario indicado en la misma. Si Ud. no es el destinatario indicado, le informamos que cualquier forma de distribución, reproducción o uso de esta comunicación y/o de la información contenida en la misma están estrictamente prohibidos por la ley. Si Ud. ha recibido esta comunicación por error, por favor, notifíquelo inmediatamente al remitente contestando a este mensaje y proceda a continuación a destruirlo. Gracias por su colaboración.

This communication contains confidential information. It is for the exclusive use of the intended addressee. If you are not the intended addressee, please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited by law. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy this message. Thank you for your cooperation.


2007-11-01 12:42:19

by Nick Piggin

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Thursday 01 November 2007 22:56, Romano Giannetti wrote:
> Hi,
>
> I have a very possible regression to signal. This morning 2.6.24-rc1
> eat and destroyed my SD card. I have a toshiba laptop with a card slot
> and I have used it with 2.6.23-rcX and 2.6.23 without problems so far.
> This morning I put the card in, nothing happened, removed it. When I put
> it in again the filesystem in it was completely scr***ed up.
>
> I have a flight waiting now, so I have put all the dmesgs and syslogs
> over there:
>
> http://www.dea.icai.upcomillas.es/romano/linux/info/2624-rc1-mmc/
>
> Sunday I'll be back to help debug it.

Thanks for the report. Is it a FAT filesystem? Is it reproducible?

2007-11-01 17:18:01

by Chuck Ebbert

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On 11/01/2007 07:56 AM, Romano Giannetti wrote:
> Hi,
>
> I have a very possible regression to signal. This morning 2.6.24-rc1
> eat and destroyed my SD card. I have a toshiba laptop with a card slot
> and I have used it with 2.6.23-rcX and 2.6.23 without problems so far.
> This morning I put the card in, nothing happened, removed it. When I put
> it in again the filesystem in it was completely scr***ed up.
>

I always flip the write-protect switch when I use mine with a computer,
and only let the camera write to the card.

2007-11-02 17:30:22

by Pierre Ossman

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Thu, 01 Nov 2007 12:56:42 +0100
Romano Giannetti wrote:

>
> Hi,
>
> I have a very possible regression to signal. This morning 2.6.24-rc1
> eat and destroyed my SD card. I have a toshiba laptop with a card slot
> and I have used it with 2.6.23-rcX and 2.6.23 without problems so far.
> This morning I put the card in, nothing happened, removed it. When I put
> it in again the filesystem in it was completely scr***ed up.
>

Data loss is never fun. I hope you didn't have anything important on the card.

> I have a flight waiting now, so I have put all the dmesgs and syslogs
> over there:
>
> http://www.dea.icai.upcomillas.es/romano/linux/info/2624-rc1-mmc/
>

I'm afraid the logs are of little help. They are just filled with noise from the file system. You also seem to have provoked the FAT code into producing an invalid pointer (the oops in the first dmesg).

Can you reproduce this? To help you I need to see the errors given by the MMC layer. You should also try reproducing it without a tainted kernel (i.e. don't load ndiswrapper).

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-11-02 17:41:34

by Pierre Ossman

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Thu, 01 Nov 2007 13:17:42 -0400
Chuck Ebbert wrote:

>
> I always flip the write-protect switch when I use mine with a computer,
> and only let the camera write to the card.

Just so no one gets any optimistic ideas: that switch is enforced in software, so it's no guarantee against bugs.

(But since my code is bug free, that shouldn't be an issue. ;))

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-11-04 09:29:56

by Romano Giannetti

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(


On Fri, 2007-11-02 at 18:28 +0100, Pierre Ossman wrote:
> On Thu, 01 Nov 2007 12:56:42 +0100
> Romano Giannetti wrote:
>
> Data loss is never fun. I hope you didn't have anything important on the card.
>

Well, a cousin-in-law's wedding. It would have been better not to lose
it, but I'll survive.

> > I have a flight waiting now, so I have put all the dmesgs and syslogs
> > over there:
> >
> > http://www.dea.icai.upcomillas.es/romano/linux/info/2624-rc1-mmc/
> >

Hmmm... the Uni server is down today. Darn.

> I'm afraid the logs are of little help. They are just filled with
> noise from the file system. You also seem to prove the FAT code to
> give an invalid pointer (the oops in the first dmesg).
>

Ok, I suspected it.

> Can you reproduce this? To help you I need to see the errors given by
> the MMC layer. You should also try reproducing it without a tainted
> kernel (i.e. don't load ndiswrapper).

I have a spare 128M card I can use to try. The one that failed was a 2G
one. I will try to reproduce without tainting the kernel (unfortunately,
the Atheros chip I have is not supported by ath5k yet, so the choice is
between ndiswrapper and Vista). Should I enable some debugging option
for the MMC layer?


> Rgds

2007-11-04 16:11:22

by Roland Dreier

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

I had something similar recently, trying to access an SD card in the
internal drive of my thinkpad X60. Fortunately, the data wasn't
actually corrupted, but when I tried to copy the picture files off the
card, I saw garbage filenames in the picture directory, and I saw this
in the kernel log:

[84797.855013] mmc0: new SD card at address b368
[84798.227299] mmcblk0: mmc0:b368 SD 501248KiB
[84798.227541] mmcblk0: p1
[84810.378736] FAT: Filesystem panic (dev mmcblk0p1)
[84810.378748] invalid access to FAT (entry 0x0000bfff)
[84810.378753] File system has been set read-only
[84810.378812] FAT: Filesystem panic (dev mmcblk0p1)
[84810.378816] invalid access to FAT (entry 0x0000ffbf)

(and so on for quite a bit longer)

When I put the same card back in the camera and used the camera's dock
to access the data via USB mass storage, everything worked fine. So
it does seem to be at least somewhat MMC-related.

Please let me know if there's further debug info that might be
helpful... this problem seems to be completely reproducible on my
system. I'm running a post 2.6.24-rc1 git tree
(v2.6.24-rc1-521-g54866f0 according to git describe).

- R.

2007-11-05 07:32:18

by Pierre Ossman

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Sun, 04 Nov 2007 10:29:43 +0100
Romano Giannetti wrote:

> > Can you reproduce this? To help you I need to see the errors given by
> > the MMC layer. You should also try reproducing it without a tainted
> > kernel (i.e. don't load ndiswrapper).
>
> I have a spare 128M card I can use to try. The one that failed was a 2G
> one.

Many of these problems are card specific, so please make sure to also test with the original card.

> I will try to reproduce without tainting the kernel (unfortunately,
> the Atheros chip I have is not supported by ath5k yet, and the choice is
> between ndiswrapper or Vista.). Should I enable some debugging option
> for the MMC layer?

Not at this point, no. The debugging output tends to be quite noisy, so it easily drowns out any temporary problems.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-11-05 07:34:00

by Pierre Ossman

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Sun, 04 Nov 2007 08:11:10 -0800
Roland Dreier wrote:

>
> When I put the same card back in the camera and used the camera's dock
> to access the data via USB mass storage, everything worked fine. So
> it does seem to be at least somewhat MMC-related.
>

Since there was no error from the mmc level there, I wouldn't be so sure. Could you try a complete fsck of the card to check that the camera is constructing a proper FAT?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-11-05 10:51:39

by Romano Giannetti

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(


On Sun, 2007-11-04 at 08:11 -0800, Roland Dreier wrote:
> I had something similar recently, trying to access an SD card in the
> internal drive of my thinkpad X60. Fortunately, the data wasn't
> actually corrupted, but when I tried to copy the picture files off the
> card, I saw garbage filenames in the picture directory, [...]

> When I put the same card back in the camera and used the camera's dock
> to access the data via USB mass storage, everything worked fine. So
> it does seem to be at least somewhat MMC-related.

It is reproducible for me, too, with the 128Mbyte card. Full logs are
here:

http://www.dea.icai.upcomillas.es/romano/linux/info/lk2624-rc1-mmc2/

A quick grep shows, at boot:

Nov 5 08:45:40 rukbat kernel: [ 26.538165] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
Nov 5 08:45:40 rukbat kernel: [ 38.456554] PM: Adding info for No Bus:mmc0
Nov 5 08:45:40 rukbat kernel: [ 38.456634] mmc0: SDHCI at 0xd0100800 irq 16 DMA

and when loading the card:

Nov 5 09:21:14 rukbat kernel: [ 1632.002723] mmc0: new SD card at address e32f
Nov 5 09:21:14 rukbat kernel: [ 1632.002947] PM: Adding info for mmc:mmc0:e32f
Nov 5 09:21:14 rukbat kernel: [ 1632.024011] mmcblk0: mmc0:e32f SD128 123008KiB
Nov 5 09:21:14 rukbat kernel: [ 1632.025327] mmcblk0: p1
Nov 5 09:21:15 rukbat hald: mounted /dev/mmcblk0p1 on behalf of uid 1153

(I've trimmed away the spam by NetworkManager).

and opening the folder in Nautilus:

Nov 5 09:21:43 rukbat kernel: [ 1654.235333] FAT: Filesystem panic (dev mmcblk0p1)
Nov 5 09:21:43 rukbat kernel: [ 1654.235893] FAT: Filesystem panic (dev mmcblk0p1)
Nov 5 09:21:43 rukbat kernel: [ 1654.235915] mmcblk0p1: rw=0, want=575135, limit=245919
...ad libitum.

And it's definitely a regression: the card can be read under 2.6.23. I
don't have a lot of time on my hands now, but maybe next weekend I can try
a bisection...

Romano




2007-11-05 12:24:28

by Pierre Ossman

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Mon, 05 Nov 2007 11:51:26 +0100
Romano Giannetti wrote:

>
> and opening the folder in Nautilus:
>
> Nov 5 09:21:43 rukbat kernel: [ 1654.235333] FAT: Filesystem panic (dev mmcblk0p1)
> Nov 5 09:21:43 rukbat kernel: [ 1654.235893] FAT: Filesystem panic (dev mmcblk0p1)
> Nov 5 09:21:43 rukbat kernel: [ 1654.235915] mmcblk0p1: rw=0, want=575135, limit=245919
> ...ad libitum.
>

Ok, now this is a bit more telling. The filesystem is indeed corrupt somehow as it references sectors waaaay outside the device (at roughly 280 MB).
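
As a rough check of that figure, the sketch below converts the `want` and `limit` values from the quoted log line into sizes. It assumes both numbers are counted in 512-byte sectors; the helper name `sectors_to_mib` is only illustrative and does not come from the thread.

```python
SECTOR_SIZE = 512  # bytes; assumption: the log line counts 512-byte sectors


def sectors_to_mib(sectors: int, sector_size: int = SECTOR_SIZE) -> float:
    """Convert a sector count into mebibytes."""
    return sectors * sector_size / (1024 * 1024)


want = 575135   # sector the FAT code asked for (from the log line)
limit = 245919  # size of mmcblk0p1 in sectors (from the same line)

print(f"want  = {sectors_to_mib(want):.1f} MiB")   # want  = 280.8 MiB
print(f"limit = {sectors_to_mib(limit):.1f} MiB")  # limit = 120.1 MiB
```

So the request lands at about 280 MiB on a partition of about 120 MiB, more than twice the size of the device.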

Did you partition and format this card in the camera or in Linux?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-11-05 13:46:45

by Romano Giannetti

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(


On Mon, 2007-11-05 at 13:22 +0100, Pierre Ossman wrote:
> On Mon, 05 Nov 2007 11:51:26 +0100
> >
>
> Ok, now this is a bit more telling. The filesystem is indeed corrupt
> somehow as it references sectors waaaay outside the device (at roughly
> 280 MB).

Yes. The problem is, when I first mounted it on 2.6.23 it worked
perfectly. If you think it's worthwhile, I can reboot into 2.6.23
and try to mount it again.

> Did you partition and format this card in the camera or in Linux?

In the camera. It happened both with a Kodak and a Panasonic Lumix.

Romano



2007-11-05 13:51:54

by Romano Giannetti

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(


On Mon, 2007-11-05 at 13:22 +0100, Pierre Ossman wrote:
> On Mon, 05 Nov 2007 11:51:26 +0100
> Romano Giannetti wrote:

Ah, I forgot: I have a dump of the card (made with dd). If you'd happen
to need it, simply tell me. dd gave no errors.

And to double check, I mounted a VFAT USB stick on the same PC, and
there are no VFAT errors. So it seems to be a VFAT-MMC interaction...

Romano




2007-11-05 15:27:53

by Pierre Ossman

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Mon, 05 Nov 2007 14:51:45 +0100
Romano Giannetti wrote:

>
> On Mon, 2007-11-05 at 13:22 +0100, Pierre Ossman wrote:
> > On Mon, 05 Nov 2007 11:51:26 +0100
> > Romano Giannetti wrote:
>
> Ah, I forgot: I have a dump of the card (made with dd). If you'd happen
> to need it, simply tell me. dd gave no errors.
>

Please try to loop back mount that image and see if the problem remains.

> And to double check, I mounted a VFAT USB stick on the same PC, and
> there are no VFAT errors. So it seems an interaction VFAT-mmc...

Could you do a dump via both MMC and USB and compare the two?
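
One way to make that comparison is a small script that reports the first byte offset at which two dump files differ. This is only a sketch: the function name `first_difference` and the file names in the usage note are placeholders, not files from this thread.

```python
from typing import Optional


def first_difference(path_a: str, path_b: str,
                     chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Find the exact byte inside the mismatching chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # One file is a prefix of the other: they differ where
                # the shorter one ends.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended at the same point without a mismatch.
                return None
            offset += len(a)
```

For example, `first_difference("dump_mmc.img", "dump_usb.img")` would return `None` for identical dumps; dividing a returned offset by 512 gives the sector to inspect.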

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-11-05 15:28:38

by Pierre Ossman

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Mon, 05 Nov 2007 14:46:33 +0100
Romano Giannetti wrote:

>
> On Mon, 2007-11-05 at 13:22 +0100, Pierre Ossman wrote:
> > Did you partition and format this card in the camera or in Linux?
>
> In the camera. It happened both with a Kodak and a Panasonic Lumix.
>

Does it work if you partition and format it in Linux?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-11-05 17:24:35

by Romano Giannetti

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(


On Mon, 2007-11-05 at 16:26 +0100, Pierre Ossman wrote:
> On Mon, 05 Nov 2007 14:46:33 +0100
> Romano Giannetti wrote:
>
> >
> > On Mon, 2007-11-05 at 13:22 +0100, Pierre Ossman wrote:
> > > Did you partition and format this card in the camera or in Linux?
> >
> > In the camera. It happened both with a Kodak and a Panasonic Lumix.
> >
>
> Does it work if you partition and format it in Linux?

Well. Now I am quite surprised. I did the following: I took an image
of the same card, formatted by the camera, under both 2.6.23 and
2.6.24-rc1 [1], after unmounting the volume, and:

(130)pern:~/software/toshiba/lk2624-rc1-mmc2% od -h image_camera_23.img | head
0000000 befa 7c00 00bf b97a 0100 0efc 0e1f f307
0000020 eaa5 7a16 0000 bebb 337b 80c9 803f 0675
0000040 c5fe f38b 07eb 3f80 7500 fe02 83c1 10c3
0000060 fb81 7bfe e572 f983 7404 810b 03f9 7401
0000100 bb0a 7aa5 2ceb 87bb eb7a 8b27 024c 148b
0000120 01b8 bb02 7c00 13cd 0573 bcbb eb7a 2e13
0000140 fea1 3d7d aa55 0574 bcbb eb7a ea05 7c00
0000160 0000 8a2e 3c07 7400 530c 07bb b400 cd0e
0000200 5b10 eb43 ebed 00fe 0000 0000 0000 0000
0000220 0000 0000 0000 0000 0000 0000 0000 0000
(0)pern:~/software/toshiba/lk2624-rc1-mmc2% od -h image_camera_24.img | head
0000000 7000 010a 004a 0000 2303 0eb8 0015 002a
0000020 7000 010d 1fdd 0000 2303 0ebc 0015 0008
0000040 7000 011c 002d 0000 2303 0ef4 8e15 0007
0000060 7000 011f 1d47 0000 2303 0f80 1315 0045
0000100 7000 013a 1fed 0000 2303 0f80 4015 0028
0000120 7000 013b 004a 0000 2303 0f84 d815 001d
0000140 7000 013c 004a 0000 2303 0f88 7015 003f
0000160 7000 0141 20c0 0000 2303 0f8c 1415 0029
0000200 7000 0143 004a 0000 2303 0f90 6315 0000
0000220 7000 014f 004a 0000 2303 0f94 7115 002d

Uf. So I reformatted (under 2.6.23) the partition p1, and now the card
works both under 2.6.23 and 2.6.24 (although now it's not detected as a
photo card, but I suppose that's normal). But again...

(0)pern:~/software/toshiba/lk2624-rc1-mmc2% od -h image_linux_23.img | head
0000000 befa 7c00 00bf b97a 0100 0efc 0e1f f307
0000020 eaa5 7a16 0000 bebb 337b 80c9 803f 0675
0000040 c5fe f38b 07eb 3f80 7500 fe02 83c1 10c3
0000060 fb81 7bfe e572 f983 7404 810b 03f9 7401
0000100 bb0a 7aa5 2ceb 87bb eb7a 8b27 024c 148b
0000120 01b8 bb02 7c00 13cd 0573 bcbb eb7a 2e13
0000140 fea1 3d7d aa55 0574 bcbb eb7a ea05 7c00
0000160 0000 8a2e 3c07 7400 530c 07bb b400 cd0e
0000200 5b10 eb43 ebed 00fe 0000 0000 0000 0000
0000220 0000 0000 0000 0000 0000 0000 0000 0000
(0)pern:~/software/toshiba/lk2624-rc1-mmc2% od -h image_linux_24.img | head
0000000 dd65 6ffc 0000 dffd ebf3 382d 0040 fbbe
0000020 472b e3d7 0000 5acb 7fd9 ef5f 0000 7fd7
0000040 b7ed aff9 0000 af67 594e ffbb 0100 7a7f
0000060 546a 03d5 0800 6fd5 f9ef 1f3e 0000 ffdf
0000100 85bd 8c8d 617d c01b cd88 1901 1f5e 09b0
0000120 06c9 f7e3 0c6d d827 a376 0b6d 1b5c 7f42
0000140 8df0 3918 4a53 9923 5305 6f5f 2101 4043
0000160 f052 9dc9 8005 102e faf1 0e0e 377d 8b8f
0000200 bb19 dacf 0000 3d7f 6ffb f0ff 0040 cdb3
0000220 cd4b b346 4000 e9f7 ee6d eade 4000 e6ff

Shouldn't they be equal? I have no camera handy now; I will try again later.
If it's of any interest, I've put the images online at

http://www.dea.icai.upcomillas.es/romano/linux/info/lk2624-rc1-mmc2/


I tried to loop mount them, but I forgot how to mount a partition of
a loop device... I have loop0 but not loop0p1... there was some "seek"
magic but I cannot find it now.
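
The "seek" magic is most likely the `offset=` option of a loop mount, which takes the byte offset of the partition inside the image. A minimal sketch for finding that offset, assuming the image begins with a classic MBR partition table and uses 512-byte sectors; `partition_offsets` is a hypothetical helper, not something from the thread:

```python
import struct
from typing import List

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def partition_offsets(mbr: bytes, sector_size: int = SECTOR_SIZE) -> List[int]:
    """Return byte offsets of the non-empty primary partitions in an MBR."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55 0xAA) found")
    offsets = []
    for i in range(4):
        # The four 16-byte partition entries start at byte 446.
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        part_type = entry[4]
        # Bytes 8..15: starting LBA and sector count, both little-endian.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if part_type != 0 and num_sectors != 0:
            offsets.append(start_lba * sector_size)
    return offsets
```

Feeding it the first 512 bytes of an image gives the value to pass as `mount -o loop,ro,offset=<bytes> image.img /mnt`.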

Romano


[1] dd if=/dev/mmcblk0 of=file bs=1M count=128




2007-11-06 09:58:52

by Romano Giannetti

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(


Hi,

I have some more data. I am really starting to think that the MMC layer
is busted. I repeated a dd of the unmounted device five or six times in a
row, and look:

(0)rukbat:~/software/toshiba/lk2624-rc1-mmc2% sudo dd if=/dev/mmcblk0 bs=1c count=128 | od -h
128+0 records in
128+0 records out
128 bytes (128 B) copied, 0.0112835 seconds, 11.3 kB/s
0000000 0000 0000 31e4 c363 d908 cb2e 0000 0000
0000020 9550 c217 2a10 c012 0100 0010 0200 0020
0000040 25a8 cb45 af00 cb2a 0000 0000 9550 c217
0000060 2a10 c012 0100 0010 0200 0020 02e0 cb45
0000100 6db8 cb2d 0000 0000 9550 c217 2a10 c012
0000120 0100 0010 0200 0020 ed00 cb4a 70b0 cba5
0000140 0000 0000 9550 c217 2a10 c012 0100 0010
0000160 0200 0020 3058 cb7e 7730 cb28 0000 0000
0000200
(0)rukbat:~/software/toshiba/lk2624-rc1-mmc2% uname -a
Linux rukbat 2.6.24-rc1 #7 SMP Sun Oct 28 23:51:49 CET 2007 i686 GNU/Linux
(0)rukbat:~/software/toshiba/lk2624-rc1-mmc2% sudo dd if=/dev/mmcblk0 bs=1c count=128 | od -h
0000000 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b
128+0 records in
128+0 records out
128 bytes (128 B) copied, 0.0114972 seconds, 11.1 kB/s
*
0000200
(0)rukbat:~/software/toshiba/lk2624-rc1-mmc2% sudo dd if=/dev/mmcblk0 bs=1c count=128 | od -h
128+0 records in
128+0 records out
128 bytes (128 B) copied, 0.0112756 seconds, 11.4 kB/s
0000000 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b
*
0000200
(0)rukbat:~/software/toshiba/lk2624-rc1-mmc2% sudo dd if=/dev/mmcblk0 bs=1c count=128 | od -h
0000000 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b
*
0000200
128+0 records in
128+0 records out
128 bytes (128 B) copied, 0.0121104 seconds, 10.6 kB/s
(0)rukbat:~/software/toshiba/lk2624-rc1-mmc2% sudo dd if=/dev/mmcblk0 bs=1c count=128 | od -h
128+0 records in
128+0 records out
128 bytes (128 B) copied, 0.0112943 seconds, 11.3 kB/s
0000000 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b
*
0000200
(0)rukbat:~/software/toshiba/lk2624-rc1-mmc2% sudo dd if=/dev/mmcblk0 bs=1c count=128 | od -h
128+0 records in
128+0 records out
128 bytes (128 B) copied, 0.0112721 seconds, 11.4 kB/s
0000000 0000 0000 71e4 c36f d908 cb2e 0000 0000
0000020 9550 c217 2a10 c012 0100 0010 0200 0020
0000040 25a8 cb45 af00 cb2a 0000 0000 9550 c217
0000060 2a10 c012 0100 0010 0200 0020 02e0 cb45
0000100 6db8 cb2d 0000 0000 9550 c217 2a10 c012
0000120 0100 0010 0200 0020 ed00 cb4a 70b0 cba5
0000140 0000 0000 9550 c217 2a10 c012 0100 0010
0000160 0200 0020 3058 cb7e 7730 cb28 0000 0000
0000200
(0)rukbat:~/software/toshiba/lk2624-rc1-mmc2% sudo dd if=/dev/mmcblk0 bs=1c count=128 | od -h
128+0 records in
128+0 records out
128 bytes (128 B) copied, 0.0112219 seconds, 11.4 kB/s
0000000 0000 0000 f1e4 c37b d908 cb2e 0000 0000
0000020 9550 c217 2a10 c012 0100 0010 0200 0020
0000040 25a8 cb45 af00 cb2a 0000 0000 9550 c217
0000060 2a10 c012 0100 0010 0200 0020 02e0 cb45
0000100 6db8 cb2d 0000 0000 9550 c217 2a10 c012
0000120 0100 0010 0200 0020 ed00 cb4a 70b0 cba5
0000140 0000 0000 9550 c217 2a10 c012 0100 0010
0000160 0200 0020 3058 cb7e 7730 cb28 0000 0000
0000200
(0)rukbat:~/software/toshiba/lk2624-rc1-mmc2% sudo dd if=/dev/mmcblk0 bs=1c count=128 | od -h
128+0 records in
128+0 records out
128 bytes (128 B) copied, 0.011198 seconds, 11.4 kB/s
0000000 3038 3030 3030 000a 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0000200

I really do not think this is normal. I will try rebooting into 2.6.23 to see what happens...

I saw in the changelog that there were DMA-related changes; those seem
to me the most probable culprit.
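
The repeated-read check above can also be scripted so that the runs are compared by digest instead of by eye. This is a sketch only: the helper names are made up for illustration, and it assumes the caller may read the device node (for example /dev/mmcblk0 as root). Caching between runs could in principle hide differences, so treat a "stable" result with some caution.

```python
import hashlib
from typing import List


def read_digests(path: str, length: int = 128, repeats: int = 5) -> List[str]:
    """Open `path` `repeats` times and hash the first `length` bytes each time."""
    digests = []
    for _ in range(repeats):
        # Reopen on every pass so each read is a separate open/read/close.
        with open(path, "rb") as f:
            digests.append(hashlib.sha256(f.read(length)).hexdigest())
    return digests


def reads_are_stable(path: str, length: int = 128, repeats: int = 5) -> bool:
    """True if every repeated read returned exactly the same bytes."""
    return len(set(read_digests(path, length, repeats))) == 1
```

On a healthy device `reads_are_stable("/dev/mmcblk0")` should return `True`; the dd runs shown above would make it return `False`.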


Romano



2007-11-06 10:28:25

by Romano Giannetti

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(


(Nick re-added to the Cc: list; sorry, I dropped you without noticing)

On Tue, 2007-11-06 at 10:58 +0100, Romano Giannetti wrote:
> Hi,
>
> I have some more data. I really start to think that the mmc layer is
> busted. I repeated a dd of the device, unmounted, five or six times in a
> row, and look:

[ dd if=/dev/mmcblk0 bs=1c count=128 | od -h output differs from time to time ]

> I really do not think this is normal. I will try to reboot to 2.6.23
> and to see what's happening...
>

Tried it. The card is corrupted now: the vfat filesystem panics and a
directory has turned into a file. But the dd results are the same every
time... The vfat error is:

SYS: Nov 6 11:04:10 rukbat hald: mounted /dev/mmcblk0p1 on behalf of uid 1153
SYS: Nov 6 11:04:10 rukbat kernel: [ 126.597563] FAT: Filesystem panic (dev mmcblk0p1)
SYS: Nov 6 11:04:10 rukbat kernel: [ 126.597572] fat_get_cluster: invalid cluster chain (i_pos 0)
SYS: Nov 6 11:04:10 rukbat kernel: [ 126.597577] File system has been set read-only
SYS: Nov 6 11:04:11 rukbat kernel: [ 127.705440] FAT: Filesystem panic (dev mmcblk0p1)
SYS: Nov 6 11:04:11 rukbat kernel: [ 127.705448] fat_get_cluster: invalid cluster chain (i_pos 0)

The first difference between the good and bad data is here:

--- good.txt 2007-11-06 11:20:59.000000000 +0100
+++ bad.txt 2007-11-06 11:20:49.000000000 +0100
@@ -48,7 +48,7 @@
0141720 6120 796e 6b20 7965 7720 6568 206e 6572
0141740 6461 0d79 000a 4f49 2020 2020 2020 5953
0141760 4d53 4453 534f 2020 5320 5359 0000 aa55
-0142000 fff8 ffff ffff ffff 0005 0006 0007 0008
+0142000 fff8 ffff 0000 ffff 0005 0006 0007 0008
0142020 0009 000a 000b 000c 000d 000e 000f 0010
0142040 0011 0012 0013 0014 0015 0016 0017 0018
0142060 0019 001a 001b 001c 001d 001e 001f 0020
@@ -64,7 +64,7 @@
0142320 0069 006a 006b ffff 0000 0000 0000 0000
0142340 0000 0000 0000 0000 0000 0000 0000 0000
*
-0201000 fff8 ffff ffff ffff 0005 0006 0007 0008
+0201000 fff8 ffff 0000 ffff 0005 0006 0007 0008
0201020 0009 000a 000b 000c 000d 000e 000f 0010
0201040 0011 0012 0013 0014 0015 0016 0017 0018
0201060 0019 001a 001b 001c 001d 001e 001f 0020
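
To read the hunks above as FAT entries rather than raw words, here is a small sketch. It rests on two assumptions that the thread does not state: that the filesystem is FAT16 (the sequential 16-bit values suggest it), and that offset 0142000 is the start of the first FAT copy, so the word index equals the cluster number. The helper names are illustrative.

```python
from typing import List, Tuple


def parse_od_words(line: str) -> List[int]:
    """Parse one `od -h` line: an octal offset followed by 16-bit hex words."""
    return [int(word, 16) for word in line.split()[1:]]


def classify_fat16(entry: int) -> str:
    """Name the meaning of a FAT16 table entry (for cluster numbers >= 2)."""
    if entry == 0x0000:
        return "free"
    if entry >= 0xFFF8:
        return "end of chain"
    if entry == 0xFFF7:
        return "bad cluster"
    if entry == 0x0001 or entry >= 0xFFF0:
        return "reserved"
    return f"next cluster {entry}"


def changed_entries(good_line: str, bad_line: str) -> List[Tuple[int, str, str]]:
    """List (index, old meaning, new meaning) for the words that differ.

    The index is a cluster number only if the line starts at FAT entry 0.
    """
    good, bad = parse_od_words(good_line), parse_od_words(bad_line)
    return [(i, classify_fat16(g), classify_fat16(b))
            for i, (g, b) in enumerate(zip(good, bad)) if g != b]


GOOD = "0142000 fff8 ffff ffff ffff 0005 0006 0007 0008"
BAD = "0142000 fff8 ffff 0000 ffff 0005 0006 0007 0008"
print(changed_entries(GOOD, BAD))  # [(2, 'end of chain', 'free')]
```

Under those assumptions, cluster 2 went from "end of chain" to "free" in both FAT copies, which would fit the "invalid cluster chain" message.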


Well... what now?

Romano





2007-11-06 19:58:51

by Willy Tarreau

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Tue, Nov 06, 2007 at 10:58:41AM +0100, Romano Giannetti wrote:
(first time)
> 0000000 0000 0000 31e4 c363 d908 cb2e 0000 0000

(fourth time)
> 0000000 0000 0000 71e4 c36f d908 cb2e 0000 0000

(fifth time)
> 0000000 0000 0000 f1e4 c37b d908 cb2e 0000 0000

Almost always, only a few bits change, and always in
the same bytes:

31 -> 71 -> f1 (|40, |80)
63 -> 6f -> 7b (|0c, |10&~4)

It looks like a hardware problem to me. Maybe one version is more
optimized and puts more stress on the device? I remember having
had comparable problems in the past with a CF connected to a
home-made IDE adapter on which the +5V wire had been cut. The CF
drew its power from the IDE signals and it would almost always
work correctly, except when reading large files. Writing to it
finally killed it.

Regards,
Willy
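
The bit arithmetic in the annotations above can be checked with a short Python sketch. The helper name is made up for illustration; the byte values are the ones quoted from the three dumps.

```python
def bit_changes(old, new):
    """Return (bits set, bits cleared) when a byte goes from old to new."""
    return (~old & new) & 0xFF, (old & ~new) & 0xFF


# Successive values of the two unstable bytes, taken from the dumps above.
first_byte = [0x31, 0x71, 0xF1]
second_byte = [0x63, 0x6F, 0x7B]

for values in (first_byte, second_byte):
    for old, new in zip(values, values[1:]):
        set_bits, cleared_bits = bit_changes(old, new)
        print("%02x -> %02x: set %02x, cleared %02x"
              % (old, new, set_bits, cleared_bits))
```

Each step touches at most three bits (0x40, 0x80, 0x0c, then 0x10 set with 0x04 cleared), which matches the |40, |80, |0c and |10&~4 notes.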

2007-11-06 21:02:37

by Romano Giannetti

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(


On Tue, 2007-11-06 at 20:51 +0100, Willy Tarreau wrote:

> It looks like a hardware problem to me. Maybe one version is more
> optimized and puts more stress on the device? I remember having
> had comparable problems in the past with a CF connected to a
> home-made IDE adapter on which the +5V wire had been cut. The CF
> drew its power from the IDE signals and it would almost always
> work correctly, except when reading large files. Writing to it
> finally killed it.
>

Obviously I cannot be sure, but I tested it a lot under 2.6.23 and Vista
and never had a single problem. If it's a HW problem, Linux 2.6.24
certainly manages to trigger it at the first try...

Romano


2007-11-06 21:48:26

by Romano Giannetti

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(


On Tue, 2007-11-06 at 20:51 +0100, Willy Tarreau wrote:
> On Tue, Nov 06, 2007 at 10:58:41AM +0100, Romano Giannetti wrote:
> (first time)
> > 0000000 0000 0000 31e4 c363 d908 cb2e 0000 0000
>
> (fourth time)
> > 0000000 0000 0000 71e4 c36f d908 cb2e 0000 0000
>
> (fifth time)
> > 0000000 0000 0000 f1e4 c37b d908 cb2e 0000 0000
>
> Almost always, only a few bits change, and always in
> the same bytes:
>
> 31 -> 71 -> f1 (|40, |80)
> 63 -> 6f -> 7b (|0c, |10&~4)


Yes, but the second, third and sixth time it was:

0000000 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b 6b6b
*
which looks to me like poisoned memory... and the error in the filesystem
was a 0xffff turned into 0x0000.

I do really suspect a software bug.

Romano
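
For reference, 0x6b is the byte the kernel's slab debugging writes over freed objects (POISON_FREE in include/linux/poison.h), so a read that returns nothing but 0x6b suggests the data came from freed kernel memory rather than from the card. Below is a minimal Python sketch for spotting such reads in a saved dump. The function name and the idea of checking a saved image are illustrative assumptions.

```python
POISON_FREE = 0x6B  # slab "freed object" fill byte from include/linux/poison.h


def poison_fraction(data, pattern=POISON_FREE):
    """Return the fraction of bytes in data equal to the poison pattern."""
    if not data:
        return 0.0
    return data.count(pattern) / len(data)


# A block like the one in the dump above: every byte is 0x6b.
block = bytes([POISON_FREE]) * 512
print(poison_fraction(block))
```

A fraction close to 1.0 over a whole sector is very unlikely for real card contents and points at the software side.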


2007-11-06 22:17:51

by Romano Giannetti

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Tue, 2007-11-06 at 22:48 +0100, Romano Giannetti wrote:

> I do really suspect a software bug.
>

Well, I started bisecting it. It will be a long shot, I suspect...

Romano

BTW: I noticed that if I change EXTRAVERSION, running make rebuilds
almost the whole kernel. Is that normal? And it seems to me that the same
thing happens if a make oldconfig results in a changed .config...





2007-11-07 06:10:20

by Willy Tarreau

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Tue, Nov 06, 2007 at 11:17:39PM +0100, Romano Giannetti wrote:
> On Tue, 2007-11-06 at 22:48 +0100, Romano Giannetti wrote:
>
> > I do really suspect a software bug.
> >
>
> Well, I started bisecting it. It will be a long shot, I suspect...
>
> Romano
>
> BTW: I noticed that if I change EXTRAVERSION, running make rebuilds
> almost the whole kernel. Is that normal?

Yes, I think so, because it changes version.h, which is included directly or
indirectly by every file.

> And it seems to me that the same
> thing happens if a make oldconfig results in a changed .config...

This should not happen IMHO. If you post a simple reproducible case, maybe
someone can investigate it.

Regards,
Willy

2007-11-07 21:53:16

by Romano Giannetti

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(


On Tue, 2007-11-06 at 23:17 +0100, Romano Giannetti wrote:
> Well, I started bisecting it. It will be a long shot, I suspect...

Well, I spent the last 36 hours (more or less) trying to bisect the SD
problem. The method I used was to insert the card, umount it, and run dd 8
times in a row; the kernel is "bad" if the reads differ, "good" if they are
all the same.
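
The good/bad test described above could be automated along these lines. This is only a sketch in Python under assumptions: the helper names are made up, the device path in the comment is a placeholder, and repeated reads of an idle, unmounted card are expected to be byte-for-byte identical.

```python
import hashlib


def read_prefix(path, nbytes):
    """Read the first nbytes from a file or block device."""
    with open(path, "rb") as device:
        return device.read(nbytes)


def reads_are_stable(read_once, attempts=8):
    """Call read_once() several times; True if every result is identical."""
    digests = {hashlib.sha1(read_once()).hexdigest() for _ in range(attempts)}
    return len(digests) == 1


# Hypothetical usage (device node and size are placeholders):
#   stable = reads_are_stable(lambda: read_prefix("/dev/mmcblk0", 1 << 20))
#   print("good" if stable else "bad")
```

Comparing digests rather than whole buffers keeps memory use flat when the region read from the card is large.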

I could not finish the bisect. The last good/bad pair was:

bad: [7aeacf982203fb4dea2f3434eefdc268cfd5d6d9]
[BLOCK] blk_rq_map_sg: force clear termination bit
good: [e38f981758118d829cd40cfe9c09e3fa81e422aa]
exportfs: update documentation

The problem with finishing the bisect is that there is a whole series of
commits, named [SG] something, that seem to matter; but my three attempts at
a commit between the previous two all ended with a non-working MMC layer and
this oops:

[ 81.738991] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 81.739003] printing eip: c01db437 *pde = 00000000
[ 81.739010] Oops: 0000 [#1] SMP
[ 81.739016] Modules linked in: mmc_block binfmt_misc rfcomm l2cap bluetooth ppdev i915 drm acpi_cpufreq cpufreq_conservative cpufreq_stats cpufreq_ondemand freq_table cpufreq_userspace cpufreq_powersave dock container sbs sbshc af_packet nls_iso8859_1 nls_cp437 vfat fat nls_utf8 ntfs dm_crypt dm_mod sbp2 parport_pc lp parport fuse snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss iTCO_wdt iTCO_vendor_support serio_raw sdhci snd_seq_midi snd_rawmidi snd_seq_midi_event psmouse pcspkr mmc_core snd_seq snd_timer snd_seq_device snd soundcore video output battery snd_page_alloc ac button intel_agp agpgart evdev ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_piix ehci_hcd ata_generic ohci1394 uhci_hcd ieee1394 libata scsi_mod generic usbcore r8169 thermal processor fan
[ 81.739122]
[ 81.739127] Pid: 6075, comm: mmcqd Not tainted (2.6.23-bisect #19)
[ 81.739132] EIP: 0060:[<c01db437>] EFLAGS: 00010246 CPU: 0
[ 81.739141] EIP is at blk_rq_map_sg+0xd7/0x190
[ 81.739145] EAX: 03619000 EBX: 00000000 ECX: c3464198 EDX: c3464698
[ 81.739150] ESI: 0361a000 EDI: 00001000 EBP: cb82fe24 ESP: cb82fdec
[ 81.739154] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 81.739159] Process mmcqd (pid: 6075, ti=cb82e000 task=cb2a5550 task.ti=cb82e000)
[ 81.739163] Stack: 00000292 c366c530 cb839a70 00002000 0361b000 c3464698 00000001 00000001
[ 81.739176] 00000000 c34e0848 01ae4698 c33ef2b0 c33ef2b0 cb2ec870 cb82fe3c f8e81e6c
[ 81.739188] 00200200 c3342580 c33ef2b0 cb2ec870 cb82ffb8 f8e816f9 7898775f 5f6f5965
[ 81.739200] Call Trace:
[ 81.739204] [<c01052fa>] show_trace_log_lvl+0x1a/0x30
[ 81.739213] [<c01053c1>] show_stack_log_lvl+0xb1/0xe0
[ 81.739220] [<c01054b1>] show_registers+0xc1/0x1d0
[ 81.739226] [<c01056da>] die+0x11a/0x230
[ 81.739232] [<c011d7e9>] do_page_fault+0x269/0x5f0
[ 81.739239] [<c02f3eea>] error_code+0x72/0x78
[ 81.739247] [<f8e81e6c>] mmc_queue_map_sg+0x2c/0xe0 [mmc_block]
[ 81.739258] [<f8e816f9>] mmc_blk_issue_rq+0x199/0x750 [mmc_block]
[ 81.739267] [<f8e821a0>] mmc_queue_thread+0x80/0xf0 [mmc_block]
[ 81.739275] [<c013d862>] kthread+0x42/0x70
[ 81.739282] [<c0104ee7>] kernel_thread_helper+0x7/0x10
[ 81.739289] =======================
[ 81.739292] Code: f0 89 45 d8 8b 01 2b 05 80 aa 67 c0 c1 f8 02 69 c0 c5 4e ec c4 c1 e0 0c 03 41 08 39 45 d8 0f 84 8e 00 00 00 f6 03 02 74 52 31 db <8b> 03 c7 43 0c 00 00 00 00 c7 43 08 00 00 00 00 83 e0 03 0b 01
[ 81.739358] EIP: [<c01db437>] blk_rq_map_sg+0xd7/0x190 SS:ESP 0068:cb82fdec

It seems to me that the two commits:

[BLOCK] blk_rq_map_sg: force clear termination bit
[BLOCK] Don't clear sg_dma_len/addr() in blk_rq_map_sg()

have the potential to fix the aforementioned oops, but in a way that creates
the reported problem for the mmc layer. It's just a gut feeling, as I don't
have the kernel knowledge needed to debug this, but this comment:

+ * If the driver previously mapped a shorter
+ * list, we could see a termination bit
+ * prematurely unless it fully inits the sg
+ * table on each mapping. We KNOW that there
+ * must be more entries here or the driver
+ * would be buggy, so force clear the
+ * termination bit to avoid doing a full
+ * sg_init_table() in drivers for each command.
+ */

rang a bell. When the bug occurs, it seems that some random page is mapped
into the device, so that... maybe the list was not supposed to continue in
this case?

Well, I hope this helps someone find the bug. I am available to
test/try whatever patches you send me.

Romano

Complete git bisect log:

git-bisect start
# bad: [2655e2cee2d77459fcb7e10228259e4ee0328697] ata_piix: Add additional PCI identifier for 40 wire short cable
git-bisect bad 2655e2cee2d77459fcb7e10228259e4ee0328697
# good: [bbf25010f1a6b761914430f5fca081ec8c7accd1] Linux 2.6.23
git-bisect good bbf25010f1a6b761914430f5fca081ec8c7accd1
# good: [f4921aff5b174349bc36551f142a5dbac782ea3f] Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
git-bisect good f4921aff5b174349bc36551f142a5dbac782ea3f
# good: [9cf52b2921fbe62566b6b2ee79f71203749c9e5e] Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
git-bisect good 9cf52b2921fbe62566b6b2ee79f71203749c9e5e
# bad: [a98ce5c6feead6bfedefabd46cb3d7f5be148d9a] Fix synchronize_irq races with IRQ handler
git-bisect bad a98ce5c6feead6bfedefabd46cb3d7f5be148d9a
# good: [e9a404580ccaeb31dd2a976f9929c4f9eb6f3540] nfs: Fix build break with CONFIG_NFS_V4=n
git-bisect good e9a404580ccaeb31dd2a976f9929c4f9eb6f3540
# good: [668f895a85b0c3a62a690425145f13dabebebd7a] [NET]: Hide the queue_mapping field inside netif_subqueue_stopped
git-bisect good 668f895a85b0c3a62a690425145f13dabebebd7a
# bad: [ba1c28a94322865457ad59f80474615156065123] Merge branch 'sg' of git://git.kernel.dk/linux-2.6-block
git-bisect bad ba1c28a94322865457ad59f80474615156065123
# good: [e38f981758118d829cd40cfe9c09e3fa81e422aa] exportfs: update documentation
git-bisect good e38f981758118d829cd40cfe9c09e3fa81e422aa
# bad: [7aeacf982203fb4dea2f3434eefdc268cfd5d6d9] [BLOCK] blk_rq_map_sg: force clear termination bit
git-bisect bad 7aeacf982203fb4dea2f3434eefdc268cfd5d6d9

2007-11-07 22:23:09

by Joshua Doll

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

Romano Giannetti wrote:
> Hi,
>
> I have a very possible regression to signal. This morning 2.6.24-rc1
> eat and destroyed my SD card. I have a toshiba laptop with a card slot
> and I have used it with 2.6.23-rcX and 2.6.23 without problems so far.
> This morning I put the card in, nothing happened, removed it. When I put
> it in again the filesystem in it was completely scr***ed up.
>
> I have a flight waiting now, so I have put all the dmesgs and syslogs
> over there:
>
> http://www.dea.icai.upcomillas.es/romano/linux/info/2624-rc1-mmc/
>
> Sunday I'll be back to help debug it.
>
> Thank you very much
>
> Romano
>
>

I think the problem is you are using a kernel from the future. :-) I
just couldn't resist.


--Joshua Doll

2007-11-07 23:44:58

by Roland Dreier

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

> Well, I spent the last 36 hours (more or less) trying to bisect the SD
> problem. The method I used was to insert the card, umount it, and run dd 8
> times in a row; the kernel is "bad" if the reads differ, "good" if they are
> all the same.
>
> I could not finish the bisect. The last good/bad pair was:
>
> bad: [7aeacf982203fb4dea2f3434eefdc268cfd5d6d9]
> [BLOCK] blk_rq_map_sg: force clear termination bit
> good: [e38f981758118d829cd40cfe9c09e3fa81e422aa]
> exportfs: update documentation

Thanks, that helps. I read over the mmc changes in between those two
commits, and I think I found the problem... could you please try the
patch below (on top of the latest kernel) and report back how it
works? Unfortunately I am traveling and I don't have an SD card with
me to test on my laptop...

Pierre, assuming Romano tests this patch successfully, please apply!

Thanks,
Roland

<-- patch below -->

mmc: Fix sg helper copy-and-paste error

Commit 45711f1a ("[SG] Update drivers to use sg helpers") had the
following bogus change in drivers/mmc/card/queue.c:

> - src_buf = page_address(src->page) + src->offset;
> + src_buf = sg_virt(dst);

(Notice that "src" is converted to "dst"). Turn this "dst" back into
the intended "src".

Cc: Jens Axboe <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
---
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 9203a0b..1b9c9b6 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -310,7 +310,7 @@ static void copy_sg(struct scatterlist *dst, unsigned int dst_len,
}

if (src_size == 0) {
- src_buf = sg_virt(dst);
+ src_buf = sg_virt(src);
src_size = src->length;
}
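
To see why the one-word fix matters, here is a small userspace model of a scatterlist-to-scatterlist copy loop in Python. It is a sketch, not the kernel source: segments are modelled as bytearrays, the `buggy` flag and the zero padding are illustrative assumptions, and the real function works on struct scatterlist entries and raw pointers.

```python
def copy_sg(dst, src, buggy=False):
    """Copy every byte of the src segments into the dst segments.

    dst and src are lists of bytearray objects. With buggy=True the source
    "pointer" is taken from the destination list, as in the faulty hunk.
    """
    dst_i = src_i = 0          # index of the current segment in each list
    dst_buf = src_buf = None   # (bytearray, offset) pairs standing in for pointers
    dst_size = src_size = 0    # bytes left in the current segments

    while src_i < len(src):
        if dst_i >= len(dst):
            raise ValueError("destination list is too short")

        if dst_size == 0:
            dst_buf = (dst[dst_i], 0)
            dst_size = len(dst[dst_i])

        if src_size == 0:
            # The regression: sg_virt(dst) instead of sg_virt(src).
            src_buf = (dst[dst_i], 0) if buggy else (src[src_i], 0)
            src_size = len(src[src_i])

        chunk = min(dst_size, src_size)
        s_seg, s_off = src_buf
        d_seg, d_off = dst_buf
        # C would read past the segment in the buggy case; the model pads
        # with zero bytes instead.
        data = bytes(s_seg[s_off:s_off + chunk]).ljust(chunk, b"\x00")
        d_seg[d_off:d_off + chunk] = data

        dst_buf = (d_seg, d_off + chunk)
        src_buf = (s_seg, s_off + chunk)
        dst_size -= chunk
        src_size -= chunk

        if dst_size == 0:
            dst_i += 1
        if src_size == 0:
            src_i += 1


src = [bytearray(b"ABCD"), bytearray(b"EFGH")]

good = [bytearray(b"\x6b" * 8)]   # destination pre-filled with stale bytes
copy_sg(good, src)
print(bytes(good[0]))

stale = [bytearray(b"\x6b" * 8)]
copy_sg(stale, src, buggy=True)
print(bytes(stale[0]))
```

With the wrong pointer the destination is copied onto itself, so it keeps whatever bytes it held before. That would be consistent with the unstable and 0x6b-filled reads reported earlier in the thread.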

2007-11-08 00:35:43

by Rafael J. Wysocki

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Wednesday, 7 of November 2007, Romano Giannetti wrote:
>
> On Tue, 2007-11-06 at 23:17 +0100, Romano Giannetti wrote:
> > Well, I started bisecting it. It will be a long shot, I suspect...
>
> Well, I spent the last 36 hours (more or less) trying to bisect the SD
> problem. The method I used was to insert the card, umount it, and run dd 8
> times in a row; the kernel is "bad" if the reads differ, "good" if they are
> all the same.
>
> I could not finish the bisect. The last good/bad pair was:
>
> bad: [7aeacf982203fb4dea2f3434eefdc268cfd5d6d9]
> [BLOCK] blk_rq_map_sg: force clear termination bit
> good: [e38f981758118d829cd40cfe9c09e3fa81e422aa]
> exportfs: update documentation
>
> The problem with finishing the bisect is that there is a whole series of
> commits, named [SG] something, that seem to matter; but my three attempts at
> a commit between the previous two all ended with a non-working MMC layer and
> this oops:

Can you please update the Bugzilla entry at
http://bugzilla.kernel.org/show_bug.cgi?id=9286 with this information?


> [ 81.738991] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 81.739003] printing eip: c01db437 *pde = 00000000
> [ 81.739010] Oops: 0000 [#1] SMP
> [ 81.739016] Modules linked in: mmc_block binfmt_misc rfcomm l2cap bluetooth ppdev i915 drm acpi_cpufreq cpufreq_conservative cpufreq_stats cpufreq_ondemand freq_table cpufreq_userspace cpufreq_powersave dock container sbs sbshc af_packet nls_iso8859_1 nls_cp437 vfat fat nls_utf8 ntfs dm_crypt dm_mod sbp2 parport_pc lp parport fuse snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss iTCO_wdt iTCO_vendor_support serio_raw sdhci snd_seq_midi snd_rawmidi snd_seq_midi_event psmouse pcspkr mmc_core snd_seq snd_timer snd_seq_device snd soundcore video output battery snd_page_alloc ac button intel_agp agpgart evdev ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_piix ehci_hcd ata_generic ohci1394 uhci_hcd ieee1394 libata scsi_mod generic usbcore r8169 thermal processor fan
> [ 81.739122]
> [ 81.739127] Pid: 6075, comm: mmcqd Not tainted (2.6.23-bisect #19)
> [ 81.739132] EIP: 0060:[<c01db437>] EFLAGS: 00010246 CPU: 0
> [ 81.739141] EIP is at blk_rq_map_sg+0xd7/0x190
> [ 81.739145] EAX: 03619000 EBX: 00000000 ECX: c3464198 EDX: c3464698
> [ 81.739150] ESI: 0361a000 EDI: 00001000 EBP: cb82fe24 ESP: cb82fdec
> [ 81.739154] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 81.739159] Process mmcqd (pid: 6075, ti=cb82e000 task=cb2a5550 task.ti=cb82e000)
> [ 81.739163] Stack: 00000292 c366c530 cb839a70 00002000 0361b000 c3464698 00000001 00000001
> [ 81.739176] 00000000 c34e0848 01ae4698 c33ef2b0 c33ef2b0 cb2ec870 cb82fe3c f8e81e6c
> [ 81.739188] 00200200 c3342580 c33ef2b0 cb2ec870 cb82ffb8 f8e816f9 7898775f 5f6f5965
> [ 81.739200] Call Trace:
> [ 81.739204] [<c01052fa>] show_trace_log_lvl+0x1a/0x30
> [ 81.739213] [<c01053c1>] show_stack_log_lvl+0xb1/0xe0
> [ 81.739220] [<c01054b1>] show_registers+0xc1/0x1d0
> [ 81.739226] [<c01056da>] die+0x11a/0x230
> [ 81.739232] [<c011d7e9>] do_page_fault+0x269/0x5f0
> [ 81.739239] [<c02f3eea>] error_code+0x72/0x78
> [ 81.739247] [<f8e81e6c>] mmc_queue_map_sg+0x2c/0xe0 [mmc_block]
> [ 81.739258] [<f8e816f9>] mmc_blk_issue_rq+0x199/0x750 [mmc_block]
> [ 81.739267] [<f8e821a0>] mmc_queue_thread+0x80/0xf0 [mmc_block]
> [ 81.739275] [<c013d862>] kthread+0x42/0x70
> [ 81.739282] [<c0104ee7>] kernel_thread_helper+0x7/0x10
> [ 81.739289] =======================
> [ 81.739292] Code: f0 89 45 d8 8b 01 2b 05 80 aa 67 c0 c1 f8 02 69 c0 c5 4e ec c4 c1 e0 0c 03 41 08 39 45 d8 0f 84 8e 00 00 00 f6 03 02 74 52 31 db <8b> 03 c7 43 0c 00 00 00 00 c7 43 08 00 00 00 00 83 e0 03 0b 01
> [ 81.739358] EIP: [<c01db437>] blk_rq_map_sg+0xd7/0x190 SS:ESP 0068:cb82fdec
>
> It seems to me that the two commits:
>
> [BLOCK] blk_rq_map_sg: force clear termination bit
> [BLOCK] Don't clear sg_dma_len/addr() in blk_rq_map_sg()
>
> have the potential to fix the aforementioned oops, but in a way that creates
> the reported problem for the mmc layer. It's just a gut feeling, as I don't
> have the kernel knowledge needed to debug this, but this comment:
>
> + * If the driver previously mapped a shorter
> + * list, we could see a termination bit
> + * prematurely unless it fully inits the sg
> + * table on each mapping. We KNOW that there
> + * must be more entries here or the driver
> + * would be buggy, so force clear the
> + * termination bit to avoid doing a full
> + * sg_init_table() in drivers for each command.
> + */
>
> rang a bell. When the bug occurs, it seems that some random page is mapped
> into the device, so that... maybe the list was not supposed to continue in
> this case?
>
> Well, I hope this helps someone find the bug. I am available to
> test/try whatever patches you send me.
>
> Romano
>
> Complete git bisect log:
>
> git-bisect start
> # bad: [2655e2cee2d77459fcb7e10228259e4ee0328697] ata_piix: Add additional PCI identifier for 40 wire short cable
> git-bisect bad 2655e2cee2d77459fcb7e10228259e4ee0328697
> # good: [bbf25010f1a6b761914430f5fca081ec8c7accd1] Linux 2.6.23
> git-bisect good bbf25010f1a6b761914430f5fca081ec8c7accd1
> # good: [f4921aff5b174349bc36551f142a5dbac782ea3f] Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
> git-bisect good f4921aff5b174349bc36551f142a5dbac782ea3f
> # good: [9cf52b2921fbe62566b6b2ee79f71203749c9e5e] Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
> git-bisect good 9cf52b2921fbe62566b6b2ee79f71203749c9e5e
> # bad: [a98ce5c6feead6bfedefabd46cb3d7f5be148d9a] Fix synchronize_irq races with IRQ handler
> git-bisect bad a98ce5c6feead6bfedefabd46cb3d7f5be148d9a
> # good: [e9a404580ccaeb31dd2a976f9929c4f9eb6f3540] nfs: Fix build break with CONFIG_NFS_V4=n
> git-bisect good e9a404580ccaeb31dd2a976f9929c4f9eb6f3540
> # good: [668f895a85b0c3a62a690425145f13dabebebd7a] [NET]: Hide the queue_mapping field inside netif_subqueue_stopped
> git-bisect good 668f895a85b0c3a62a690425145f13dabebebd7a
> # bad: [ba1c28a94322865457ad59f80474615156065123] Merge branch 'sg' of git://git.kernel.dk/linux-2.6-block
> git-bisect bad ba1c28a94322865457ad59f80474615156065123
> # good: [e38f981758118d829cd40cfe9c09e3fa81e422aa] exportfs: update documentation
> git-bisect good e38f981758118d829cd40cfe9c09e3fa81e422aa
> # bad: [7aeacf982203fb4dea2f3434eefdc268cfd5d6d9] [BLOCK] blk_rq_map_sg: force clear termination bit
> git-bisect bad 7aeacf982203fb4dea2f3434eefdc268cfd5d6d9

2007-11-08 05:56:49

by Pierre Ossman

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Wed, 07 Nov 2007 15:37:46 -0800
Roland Dreier <[email protected]> wrote:

>
> mmc: Fix sg helper copy-and-paste error
>
> Commit 45711f1a ("[SG] Update drivers to use sg helpers") had the
> following bogus change in drivers/mmc/card/queue.c:
>
> > - src_buf = page_address(src->page) + src->offset;
> > + src_buf = sg_virt(dst);
>
> (Notice that "src" is converted to "dst"). Turn this "dst" back into
> the intended "src".
>
> Cc: Jens Axboe <[email protected]>
> Signed-off-by: Roland Dreier <[email protected]>

Ouch! Well that was obviously a bug. I wonder how the hell it only explodes for Romano. I've been shuffling loads of data using -rc1 without an incident.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-11-08 07:02:20

by Jens Axboe

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Wed, Nov 07 2007, Roland Dreier wrote:
> > Well, I spent the last 36 hours (more or less) trying to bisect the SD
> > problem. The method I used was to insert the card, umount it, and run dd 8
> > times in a row; the kernel is "bad" if the reads differ, "good" if they are
> > all the same.
> >
> > I could not finish the bisect. The last good/bad pair was:
> >
> > bad: [7aeacf982203fb4dea2f3434eefdc268cfd5d6d9]
> > [BLOCK] blk_rq_map_sg: force clear termination bit
> > good: [e38f981758118d829cd40cfe9c09e3fa81e422aa]
> > exportfs: update documentation
>
> Thanks, that helps. I read over the mmc changes in between those two
> commits, and I think I found the problem... could you please try the
> patch below (on top of the latest kernel) and report back how it
> works? Unfortunately I am traveling and I don't have an SD card with
> me to test on my laptop...
>
> Pierre, assuming Romano tests this patch successfully, please apply!
>
> Thanks,
> Roland
>
> <-- patch below -->
>
> mmc: Fix sg helper copy-and-paste error
>
> Commit 45711f1a ("[SG] Update drivers to use sg helpers") had the
> following bogus change in drivers/mmc/card/queue.c:
>
> > - src_buf = page_address(src->page) + src->offset;
> > + src_buf = sg_virt(dst);
>
> (Notice that "src" is converted to "dst"). Turn this "dst" back into
> the intended "src".
>
> Cc: Jens Axboe <[email protected]>
> Signed-off-by: Roland Dreier <[email protected]>
> ---
> diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
> index 9203a0b..1b9c9b6 100644
> --- a/drivers/mmc/card/queue.c
> +++ b/drivers/mmc/card/queue.c
> @@ -310,7 +310,7 @@ static void copy_sg(struct scatterlist *dst, unsigned int dst_len,
> }
>
> if (src_size == 0) {
> - src_buf = sg_virt(dst);
> + src_buf = sg_virt(src);
> src_size = src->length;
> }
>

How embarrassing, sorry about that! Pierre, shall I shove this upstream
or will you?

--
Jens Axboe

2007-11-08 09:09:33

by Romano Giannetti

Subject: Re: *SPAM* Re: 2.6.34-rc1 eat my photo SD card :-(


On Wed, 2007-11-07 at 15:37 -0800, Roland Dreier wrote:
> .
>
> Pierre, assuming Romano tests this patch successfully, please apply!
>

Hi, the patch below solves the problem with my SD card.

Tested-by: Romano Giannetti <[email protected]>

Thanks!

Romano

> <-- patch below -->
>
> mmc: Fix sg helper copy-and-paste error
>
> Commit 45711f1a ("[SG] Update drivers to use sg helpers") had the
> following bogus change in drivers/mmc/card/queue.c:
>
> > - src_buf = page_address(src->page) + src->offset;
> > + src_buf = sg_virt(dst);
>
> (Notice that "src" is converted to "dst"). Turn this "dst" back into
> the intended "src".
>
> Cc: Jens Axboe <[email protected]>
> Signed-off-by: Roland Dreier <[email protected]>
> ---
> diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
> index 9203a0b..1b9c9b6 100644
> --- a/drivers/mmc/card/queue.c
> +++ b/drivers/mmc/card/queue.c
> @@ -310,7 +310,7 @@ static void copy_sg(struct scatterlist *dst, unsigned int dst_len,
> }
>
> if (src_size == 0) {
> - src_buf = sg_virt(dst);
> + src_buf = sg_virt(src);
> src_size = src->length;
> }
>

2007-11-08 17:37:22

by Roland Dreier

Subject: Re: *SPAM* Re: 2.6.34-rc1 eat my photo SD card :-(

> Tested-by: Romano Giannetti <[email protected]>

Thanks for testing! Pierre / Jens, please merge with Romano's
tested-by line.

Thanks,
Roland

2007-11-08 17:44:16

by Jens Axboe

Subject: Re: *SPAM* Re: 2.6.34-rc1 eat my photo SD card :-(

On Thu, Nov 08 2007, Roland Dreier wrote:
> > Tested-by: Romano Giannetti <[email protected]>
>
> Thanks for testing! Pierre / Jens, please merge with Romano's
> tested-by line.

I already did, earlier today:

http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=8578007065bd27ec077a74b5814f0fe4df040180

--
Jens Axboe

2007-11-10 10:32:43

by Pierre Ossman

Subject: Re: 2.6.34-rc1 eat my photo SD card :-(

On Thu, 8 Nov 2007 08:01:49 +0100
Jens Axboe <[email protected]> wrote:

>
> How embarrassing, sorry about that! Pierre, shall I shove this upstream
> or will you?
>

Sorry about being out of touch. My day job is killing me... :/

I see you managed to sort things out by yourselves though. :)

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org