2007-02-07 22:20:38

by Andreas Steinmetz

[permalink] [raw]
Subject: [2.6.20] tifm_7xx1/mmc not working

Hi,
I do have a problem with tifm_7xx1 and 2.6.20. First of all, the device
is working with 2.6.18.2 and the out of tree tifm-0.6 release. In this
case except for the first card insertion after suspend/reboot I do get
the following messages:

tifm_7xx1: sd card detected in socket 3
mmcblk0: mmc0:7d7f SD01G 1006080KiB
mmcblk0: p1

With 2.6.20, however, I always do get only the following which is the
same as for 2.6.18.2 on first card insert after reboot/suspend:

tifm_7xx1: sd card detected in socket 3

Am I doing something wrong here or is there a problem?

Relevant modules loaded with 2.6.20:

mmc_block 7944 0
tifm_sd 10824 0
tifm_7xx1 7296 0
sdhci 17548 0
tifm_core 7960 2 tifm_sd,tifm_7xx1
mmc_core 24096 3 mmc_block,tifm_sd,sdhci

--
Andreas Steinmetz SPAMmers use [email protected]


2007-02-08 05:59:13

by Pierre Ossman

[permalink] [raw]
Subject: Re: [2.6.20] tifm_7xx1/mmc not working

Alex Dubov is the maintainer of tifm, not me.

Andreas Steinmetz wrote:
> Hi,
> I do have a problem with tifm_7xx1 and 2.6.20. First of all, the device
> is working with 2.6.18.2 and the out of tree tifm-0.6 release. In this
> case except for the first card insertion after suspend/reboot I do get
> the following messages:
>
> tifm_7xx1: sd card detected in socket 3
> mmcblk0: mmc0:7d7f SD01G 1006080KiB
> mmcblk0: p1
>
> With 2.6.20, however, I always do get only the following which is the
> same as for 2.6.18.2 on first card insert after reboot/suspend:
>
> tifm_7xx1: sd card detected in socket 3
>
> Am I doing something wrong here or is there a problem?
>
> Relevant modules loaded with 2.6.20:
>
> mmc_block 7944 0
> tifm_sd 10824 0
> tifm_7xx1 7296 0
> sdhci 17548 0
> tifm_core 7960 2 tifm_sd,tifm_7xx1
> mmc_core 24096 3 mmc_block,tifm_sd,sdhci
>

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-08 23:35:43

by Alex Dubov

[permalink] [raw]
Subject: Re: [2.6.20] tifm_7xx1/mmc not working

I'm aware that there are some weird problems with a 2.6.20. I'm currently looking into it.

Besides, I wonder, are tifm and sdhci play nicely together? And then, we do know that suspend is
totally broken in the older versions of the driver. So it may be desirable to make a test with
sdhci unloaded and machine freshly rebooted (not resumed).


--- Pierre Ossman <[email protected]> wrote:

> Alex Dubov is the maintainer of tifm, not me.
>
> Andreas Steinmetz wrote:
> > Hi,
> > I do have a problem with tifm_7xx1 and 2.6.20. First of all, the device
> > is working with 2.6.18.2 and the out of tree tifm-0.6 release. In this
> > case except for the first card insertion after suspend/reboot I do get
> > the following messages:
> >
> > tifm_7xx1: sd card detected in socket 3
> > mmcblk0: mmc0:7d7f SD01G 1006080KiB
> > mmcblk0: p1
> >
> > With 2.6.20, however, I always do get only the following which is the
> > same as for 2.6.18.2 on first card insert after reboot/suspend:
> >
> > tifm_7xx1: sd card detected in socket 3
> >
> > Am I doing something wrong here or is there a problem?
> >
> > Relevant modules loaded with 2.6.20:
> >
> > mmc_block 7944 0
> > tifm_sd 10824 0
> > tifm_7xx1 7296 0
> > sdhci 17548 0
> > tifm_core 7960 2 tifm_sd,tifm_7xx1
> > mmc_core 24096 3 mmc_block,tifm_sd,sdhci
> >
>
> Rgds
> --
> -- Pierre Ossman
>
> Linux kernel, MMC maintainer http://www.kernel.org
> PulseAudio, core developer http://pulseaudio.org
> rdesktop, core developer http://www.rdesktop.org
>




____________________________________________________________________________________
Cheap talk?
Check out Yahoo! Messenger's low PC-to-Phone call rates.
http://voice.yahoo.com

2007-02-09 12:27:46

by Andreas Steinmetz

[permalink] [raw]
Subject: Re: [2.6.20] tifm_7xx1/mmc not working

Alex Dubov wrote:
> I'm aware that there are some weird problems with a 2.6.20. I'm currently looking into it.
>
> Besides, I wonder, are tifm and sdhci play nicely together? And then, we do know that suspend is
> totally broken in the older versions of the driver. So it may be desirable to make a test with
> sdhci unloaded and machine freshly rebooted (not resumed).
>

I tried exactly as described (fresh cold boot, sdhci never loaded), but
the described problem remains. If there is anything I can do to help to
trace this problem let me know.

--
Andreas Steinmetz SPAMmers use [email protected]

2007-02-09 15:14:52

by Alex Dubov

[permalink] [raw]
Subject: Recent and not-so problems with tifm_sd driver

Greetings.

It seems that the recent shower of error reports was caused by my not so thorough testing of the
R6 handling problems (I've sent you a patch for this).

However, there is this other problem with mmc_host_remove while transfer is in progress.
First, I disabled my "sleep on remove" hack. Then I've set up a transfer (dd if=/dev/mmcblk0
of=/dev/null) and pulled the card out. The result was a hard crash (irq handler style) soon
followed by additional one, so the numbers in this hand-grabbed dump may be inexact:
--------
elv_next_request+0x15d/0x1e0
:mmc_core:mmc_queue_thread+0x72/0x110
:mmc_core:mmc_queue_thread+0x0/0x110
keventd_create_kthread+0x0/0x90
kthread+0xd9/0x120
child_rip+0xa/0x12
keventd_create_kthread+0x0/0x90
kthread+0x0/0x120
child_rip+0x0/0x12

RIP :mmc_core:mmc_prep_request+0x4d/0x70
----------
I'm continuing to look for possible problems in my code.




____________________________________________________________________________________
It's here! Your new message!
Get new email alerts with the free Yahoo! Toolbar.
http://tools.search.yahoo.com/toolbar/features/mail/

2007-02-09 19:10:39

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

Alex Dubov wrote:
> I'm continuing to look for possible problems in my code.
>

This looks like the problem with races in mmc_block again. Add some printk:s in
the remove function so that you can see if this oops is after the remove
function (shouldn't be possible, but will cause crash if it is).

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-10 09:12:07

by Alex Dubov

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver - one more

One more problem (you may already know about it) - I was contacted by somebody from the hald
project and indeed I can confirm that on 2.6.20 kernel hald fails to take action on card
insertion. I can't see anything in my code so this may be a general mmc problem.
The problem is described here:
http://lists.freedesktop.org/archives/hal/2007-January/006960.html




____________________________________________________________________________________
Food fight? Enjoy some healthy debate
in the Yahoo! Answers Food & Drink Q&A.
http://answers.yahoo.com/dir/?link=list&sid=396545367

2007-02-10 13:01:35

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver - one more

Alex Dubov wrote:
> One more problem (you may already know about it) - I was contacted by somebody from the hald
> project and indeed I can confirm that on 2.6.20 kernel hald fails to take action on card
> insertion. I can't see anything in my code so this may be a general mmc problem.
> The problem is described here:
> http://lists.freedesktop.org/archives/hal/2007-January/006960.html
>

Odd. This might be the whole sysfs restructuring thing causing problems. Can you
check if that user has CONFIG_SYSFS_DEPRECATED on?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-11 04:47:13

by Alex Dubov

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

>
> This looks like the problem with races in mmc_block again. Add some printk:s in
> the remove function so that you can see if this oops is after the remove
> function (shouldn't be possible, but will cause crash if it is).

It just occurred to me that my synopsis of the problem was utterly lame.
Here, the correct description:
When the card is pulled out, I mark the host as "ejected" (so it fast-fails all the requests),
sleep a little for it to relax and then call mmc_remove_host. Otherwise, nasty things happen.
Here's the example not involving mmc_block at all (command 3 failed, hardware timeout was missed,
so software timeout picked it up; the sleep hack is disabled):

Feb 11 02:26:08 mortug tifm_sd tifm_sd0:3: executing opcode 0x3, arg: 0x0, mask:
0x1140
Feb 11 02:26:09 mortug tifm_sd: card failed to respond for a long period of time<
7>tifm_7xx1 0000:06:09.3: checking media set 8
Feb 11 02:26:09 mortug tifm_7xx1: demand removing card from socket 0:3
Feb 11 02:26:09 mortug tifm_sd tifm_sd0:3: Setting bus width 0, power 0
Feb 11 02:26:09 mortug tifm_sd tifm_sd0:3: Setting bus width 0, power 0
Feb 11 02:26:09 mortug general protection fault: 0000 [1] SMP
Feb 11 02:26:09 mortug CPU 0
Feb 11 02:26:09 mortug Modules linked in: tifm_sd tifm_7xx1 tifm_core mmc_block m
mc_core appletalk ipx p8022 psnap llc p8023 fglrx(P) ndiswrapper snd_pcm_oss snd_
mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device eth
1394 binfmt_misc firmware_class tg3 ohci1394 yenta_socket rsrc_nonstatic ieee1394
pcmcia_core 8250_pci i2c_piix4 i2c_core snd_atiixp snd_atiixp_modem parport_pc p
arport k8temp hwmon snd_ac97_codec ac97_bus joydev snd_pcm ehci_hcd serio_raw tsd
ev pcspkr 8250_pnp 8250 serial_core snd_timer snd ohci_hcd evdev irda crc_ccitt s
oundcore snd_page_alloc usbcore
Feb 11 02:26:09 mortug Pid: 8318, comm: kmmcd Tainted: P 2.6.20 #1
Feb 11 02:26:09 mortug RIP: 0010:[<ffffffff88178d30>] [<ffffffff88178d30>] :mmc_
core:mmc_rescan+0x440/0x1350
Feb 11 02:26:09 mortug RSP: 0018:ffff81002d6d5af0 EFLAGS: 00010202
Feb 11 02:26:09 mortug RAX: 2f4065766f6d6572 RBX: 2f4065766f6d6572 RCX: 000000000
0000000
Feb 11 02:26:09 mortug RDX: ffff81002d6d5fd8 RSI: 0000000000000286 RDI: ffff81002
d6d5a88
Feb 11 02:26:09 mortug RBP: ffff81002d6d5dc8 R08: 0000000000000000 R09: 000000000
000000a
Feb 11 02:26:09 mortug R10: 00000000ffffffff R11: ffffffff88178110 R12: ffff81002
976a000
Feb 11 02:26:09 mortug R13: ffff81002976a2c0 R14: ffff81002adee800 R15: ffff81002
d6d5dc0
Feb 11 02:26:09 mortug FS: 00002b8375adab00(0000) GS:ffffffff805d0000(0000) knlG
S:00000000f7e066c0
Feb 11 02:26:09 mortug CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 11 02:26:09 mortug CR2: 000000000050e3c0 CR3: 00000000079a2000 CR4: 000000000
00006e0
Feb 11 02:26:09 mortug Process kmmcd (pid: 8318, threadinfo ffff81002d6d4000, tas
k ffff81002ea82820)
Feb 11 02:26:09 mortug Stack: ffffffff80288770 ffff81002d6d5dc8 ffff81002976a2ac
ffff81002d6d5b28
Feb 11 02:26:09 mortug ffffffff80288a38 00000cc03f1dc6bb 0000000000962228 037ffe4
382562228
Feb 11 02:26:09 mortug 00ffffff80248e8b 00ff800000000001 00000000000004d0 0000000
000000001
Feb 11 02:26:09 mortug Call Trace:
Feb 11 02:26:09 mortug [<ffffffff80288770>] task_rq_lock+0x50/0x90
Feb 11 02:26:09 mortug [<ffffffff80288a38>] __activate_task+0x38/0x50
Feb 11 02:26:09 mortug [<ffffffff8049d762>] unix_find_other+0x172/0x200
Feb 11 02:26:09 mortug [<ffffffff8028a5e0>] __wake_up_common+0x40/0x70
Feb 11 02:26:09 mortug [<ffffffff8020f199>] __alloc_pages+0x69/0x2f0
Feb 11 02:26:09 mortug [<ffffffff802110f1>] do_wp_page+0x4b1/0x510
Feb 11 02:26:09 mortug [<ffffffff80265d3e>] thread_return+0x0/0xe2
Feb 11 02:26:09 mortug [<ffffffff88178110>] :mmc_core:mmc_wait_done+0x0/0x10
Feb 11 02:26:09 mortug [<ffffffff881788f0>] :mmc_core:mmc_rescan+0x0/0x1350
Feb 11 02:26:09 mortug [<ffffffff80250001>] run_workqueue+0xb1/0x160
Feb 11 02:26:09 mortug [<ffffffff8029e670>] keventd_create_kthread+0x0/0x90
Feb 11 02:26:09 mortug [<ffffffff8024c6d1>] worker_thread+0x161/0x1a0
Feb 11 02:26:09 mortug [<ffffffff8028a590>] default_wake_function+0x0/0x10
Feb 11 02:26:09 mortug [<ffffffff8028a590>] default_wake_function+0x0/0x10
Feb 11 02:26:09 mortug [<ffffffff8024c570>] worker_thread+0x0/0x1a0
Feb 11 02:26:09 mortug [<ffffffff80233109>] kthread+0xd9/0x120
Feb 11 02:26:09 mortug [<ffffffff80262ec8>] child_rip+0xa/0x12
Feb 11 02:26:09 mortug [<ffffffff8029e670>] keventd_create_kthread+0x0/0x90
Feb 11 02:26:09 mortug [<ffffffff80233030>] kthread+0x0/0x120
Feb 11 02:26:09 mortug [<ffffffff80262ebe>] child_rip+0x0/0x12
Feb 11 02:26:09 mortug
Feb 11 02:26:09 mortug
Feb 11 02:26:09 mortug Code: 48 8b 03 0f 18 08 49 39 dd 0f 84 0c 04 00 00 f6 83 8
c 02 00
Feb 11 02:26:09 mortug RIP [<ffffffff88178d30>] :mmc_core:mmc_rescan+0x440/0x135
0
Feb 11 02:26:09 mortug RSP <ffff81002d6d5af0>
Feb 11 02:28:09 mortug shutdown[17114]: shutting down for system halt
Feb 11 02:28:09 mortug init: Switching to runlevel: 0
Feb 11 02:28:18 mortug dhcpcd[7461]: terminating on signal 15
Feb 11 02:28:21 mortug <6>ACPI: PCI interrupt for device 0000:00:14.5 disabled
Feb 11 02:28:21 mortug ACPI: PCI interrupt for device 0000:00:14.6 disabled




____________________________________________________________________________________
Don't get soaked. Take a quick peak at the forecast
with the Yahoo! Search weather shortcut.
http://tools.search.yahoo.com/shortcuts/#loc_weather

2007-02-11 06:04:05

by Brad Campbell

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver - one more

Pierre Ossman wrote:
> Alex Dubov wrote:
>> One more problem (you may already know about it) - I was contacted by somebody from the hald
>> project and indeed I can confirm that on 2.6.20 kernel hald fails to take action on card
>> insertion. I can't see anything in my code so this may be a general mmc problem.
>> The problem is described here:
>> http://lists.freedesktop.org/archives/hal/2007-January/006960.html
>>
>
> Odd. This might be the whole sysfs restructuring thing causing problems. Can you
> check if that user has CONFIG_SYSFS_DEPRECATED on?

I've tested both with and without CONFIG_SYSFS_DEPRECATED on, both fail the same way.
hald reports that the device has no parent and decides to ignore it.

I've also tested both combinations against hal from ubuntu 6.06LTS and a more recent version

brad@bklaptop2:~$ hald --version
HAL package version: 0.5.8.1

Again with and without CONFIG_SYSFS_DEPRECATED and all combinations fail.
Hal appears to go looking for /sys/block/mmcblk0/mmcblk0p1/range

When it can't find it, it then looks for the parent of that to check what would be
/sys/block/mmcblk0/range, Complains it can't find a parent for /sys/block/mmcblk0/mmcblk0p1 and just
proceeds to ignore the event.

Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams

2007-02-11 14:52:52

by Fabio Comolli

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver - one more

Hi.
I can see this problem too. I have CONFIG_SYSFS_DEPRECATED set.

It happens only with SD cards, with MMC everything is OK.

Fabio

On 2/11/07, Brad Campbell <[email protected]> wrote:
> Pierre Ossman wrote:
> > Alex Dubov wrote:
> >> One more problem (you may already know about it) - I was contacted by somebody from the hald
> >> project and indeed I can confirm that on 2.6.20 kernel hald fails to take action on card
> >> insertion. I can't see anything in my code so this may be a general mmc problem.
> >> The problem is described here:
> >> http://lists.freedesktop.org/archives/hal/2007-January/006960.html
> >>
> >
> > Odd. This might be the whole sysfs restructuring thing causing problems. Can you
> > check if that user has CONFIG_SYSFS_DEPRECATED on?
>
> I've tested both with and without CONFIG_SYSFS_DEPRECATED on, both fail the same way.
> hald reports that the device has no parent and decides to ignore it.
>
> I've also tested both combinations against hal from ubuntu 6.06LTS and a more recent version
>
> brad@bklaptop2:~$ hald --version
> HAL package version: 0.5.8.1
>
> Again with and without CONFIG_SYSFS_DEPRECATED and all combinations fail.
> Hal appears to go looking for /sys/block/mmcblk0/mmcblk0p1/range
>
> When it can't find it, it then looks for the parent of that to check what would be
> /sys/block/mmcblk0/range, Complains it can't find a parent for /sys/block/mmcblk0/mmcblk0p1 and just
> proceeds to ignore the event.
>
> Brad
> --
> "Human beings, who are almost unique in having the ability
> to learn from the experience of others, are also remarkable
> for their apparent disinclination to do so." -- Douglas Adams
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2007-02-11 15:40:51

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver - one more

Brad Campbell wrote:
>
> I've tested both with and without CONFIG_SYSFS_DEPRECATED on, both fail
> the same way.
> hald reports that the device has no parent and decides to ignore it.
>

Works fine here. The device tree is:

/sys/devices/pnp0/00:02/mmc0/mmc0:0001/block:mmcblk0/mmcblk0p1/

and hal correctly shows the same structure.

> I've also tested both combinations against hal from ubuntu 6.06LTS and a
> more recent version
>
> brad@bklaptop2:~$ hald --version
> HAL package version: 0.5.8.1
>

I'm using 0.5.9 here, but CONFIG_SYSFS_DEPRECATED should make it work with any
ancient version of hal.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-11 15:56:36

by Brad Campbell

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver - one more

Pierre Ossman wrote:
> Brad Campbell wrote:
>> I've tested both with and without CONFIG_SYSFS_DEPRECATED on, both fail
>> the same way.
>> hald reports that the device has no parent and decides to ignore it.
>>
>
> Works fine here. The device tree is:
>
> /sys/devices/pnp0/00:02/mmc0/mmc0:0001/block:mmcblk0/mmcblk0p1/

Ok, following from that on my machine currently...

brad@bklaptop2:/$ ls /dev/mmc*
/dev/mmcblk0 /dev/mmcblk0p1

brad@bklaptop2:/$ find sys/devices | grep mmc
sys/devices/pci0000:00/0000:00:1e.0/0000:06:05.3/tifm_sd0:3/mmc_host:mmc0

brad@bklaptop2:/$ find /sys/block/ | grep mmc
/sys/block/mmcblk0
/sys/block/mmcblk0/queue
/sys/block/mmcblk0/queue/iosched
/sys/block/mmcblk0/queue/iosched/slice_idle
/sys/block/mmcblk0/queue/iosched/slice_async_rq
/sys/block/mmcblk0/queue/iosched/slice_async
/sys/block/mmcblk0/queue/iosched/slice_sync
/sys/block/mmcblk0/queue/iosched/back_seek_penalty
/sys/block/mmcblk0/queue/iosched/back_seek_max
/sys/block/mmcblk0/queue/iosched/fifo_expire_async
/sys/block/mmcblk0/queue/iosched/fifo_expire_sync
/sys/block/mmcblk0/queue/iosched/quantum
/sys/block/mmcblk0/queue/scheduler
/sys/block/mmcblk0/queue/max_sectors_kb
/sys/block/mmcblk0/queue/max_hw_sectors_kb
/sys/block/mmcblk0/queue/read_ahead_kb
/sys/block/mmcblk0/queue/nr_requests
/sys/block/mmcblk0/mmcblk0p1
/sys/block/mmcblk0/mmcblk0p1/holders
/sys/block/mmcblk0/mmcblk0p1/subsystem
/sys/block/mmcblk0/mmcblk0p1/stat
/sys/block/mmcblk0/mmcblk0p1/size
/sys/block/mmcblk0/mmcblk0p1/start
/sys/block/mmcblk0/mmcblk0p1/dev
/sys/block/mmcblk0/mmcblk0p1/uevent
/sys/block/mmcblk0/slaves
/sys/block/mmcblk0/holders
/sys/block/mmcblk0/subsystem
/sys/block/mmcblk0/device
/sys/block/mmcblk0/stat
/sys/block/mmcblk0/size
/sys/block/mmcblk0/removable
/sys/block/mmcblk0/range
/sys/block/mmcblk0/dev
/sys/block/mmcblk0/uevent

brad@bklaptop2:~$ zcat /proc/config.gz | grep CONFIG_SYSFS_DEPRECATED
CONFIG_SYSFS_DEPRECATED=y

brad@bklaptop2:~$ uname -a
Linux bklaptop2 2.6.20-git4-bkc1 #5 Sat Feb 10 23:49:52 GST 2007 i686 GNU/Linux

Not sure if any of this is relevant info, but none the less, I'm not a sysfs or hal guru.

Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams

2007-02-11 18:04:28

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver - one more

Brad Campbell wrote:
>
> brad@bklaptop2:/$ find sys/devices | grep mmc
> sys/devices/pci0000:00/0000:00:1e.0/0000:06:05.3/tifm_sd0:3/mmc_host:mmc0
>

This is strange. You should be getting more entries below that.

> /sys/block/mmcblk0/device

Where does this point?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-11 18:15:19

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

Alex Dubov wrote:
>
> It just occurred to me that my synopsis of the problem was utterly lame.
> Here, the correct description:
> When the card is pulled out, I mark the host as "ejected" (so it fast-fails all the requests),
> sleep a little for it to relax and then call mmc_remove_host. Otherwise, nasty things happen.
> Here's the example not involving mmc_block at all (command 3 failed, hardware timeout was missed,
> so software timeout picked it up; the sleep hack is disabled):
>

I believe your problem is line 915 of tifm_sd.c:

> tifm_set_drvdata(sock, NULL);

You call that before mmc_free_host() (which flushes the work queue), and I
assume something still needs it. Put in some BUG_ON() here and there and you
should be able to catch it.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-11 19:32:59

by Brad Campbell

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver - one more

Pierre Ossman wrote:
> Brad Campbell wrote:
>> brad@bklaptop2:/$ find sys/devices | grep mmc
>> sys/devices/pci0000:00/0000:00:1e.0/0000:06:05.3/tifm_sd0:3/mmc_host:mmc0
>>
>
> This is strange. You should be getting more entries below that.

I believe that should be the case..

>> /sys/block/mmcblk0/device
>
brad@bklaptop2:/sys/block/mmcblk0$ ls -laR
.:
total 0
drwxr-xr-x 6 root root 0 2007-02-11 23:29 .
drwxr-xr-x 13 root root 0 2007-02-11 23:27 ..
-r--r--r-- 1 root root 4096 2007-02-11 23:28 dev
lrwxrwxrwx 1 root root 0 2007-02-11 23:27 device -> ../../class/mmc_host/mmc0/mmc0:b368
drwxr-xr-x 2 root root 0 2007-02-11 23:27 holders
drwxr-xr-x 3 root root 0 2007-02-11 23:29 mmcblk0p1
drwxr-xr-x 3 root root 0 2007-02-11 23:27 queue
-r--r--r-- 1 root root 4096 2007-02-11 23:29 range
-r--r--r-- 1 root root 4096 2007-02-11 23:27 removable
-r--r--r-- 1 root root 4096 2007-02-11 23:29 size
drwxr-xr-x 2 root root 0 2007-02-11 23:27 slaves
-r--r--r-- 1 root root 4096 2007-02-11 23:29 stat
lrwxrwxrwx 1 root root 0 2007-02-11 23:27 subsystem -> ../../block
--w------- 1 root root 4096 2007-02-11 23:29 uevent

./holders:
total 0
drwxr-xr-x 2 root root 0 2007-02-11 23:27 .
drwxr-xr-x 6 root root 0 2007-02-11 23:29 ..

./mmcblk0p1:
total 0
drwxr-xr-x 3 root root 0 2007-02-11 23:29 .
drwxr-xr-x 6 root root 0 2007-02-11 23:29 ..
-r--r--r-- 1 root root 4096 2007-02-11 23:29 dev
drwxr-xr-x 2 root root 0 2007-02-11 23:27 holders
-r--r--r-- 1 root root 4096 2007-02-11 23:29 size
-r--r--r-- 1 root root 4096 2007-02-11 23:29 start
-r--r--r-- 1 root root 4096 2007-02-11 23:29 stat
lrwxrwxrwx 1 root root 0 2007-02-11 23:29 subsystem -> ../../../block
--w------- 1 root root 4096 2007-02-11 23:29 uevent

./mmcblk0p1/holders:
total 0
drwxr-xr-x 2 root root 0 2007-02-11 23:27 .
drwxr-xr-x 3 root root 0 2007-02-11 23:29 ..

./queue:
total 0
drwxr-xr-x 3 root root 0 2007-02-11 23:27 .
drwxr-xr-x 6 root root 0 2007-02-11 23:29 ..
drwxr-xr-x 2 root root 0 2007-02-11 23:27 iosched
-r--r--r-- 1 root root 4096 2007-02-11 23:29 max_hw_sectors_kb
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 max_sectors_kb
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 nr_requests
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 read_ahead_kb
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 scheduler

./queue/iosched:
total 0
drwxr-xr-x 2 root root 0 2007-02-11 23:27 .
drwxr-xr-x 3 root root 0 2007-02-11 23:29 ..
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 back_seek_max
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 back_seek_penalty
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 fifo_expire_async
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 fifo_expire_sync
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 quantum
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 slice_async
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 slice_async_rq
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 slice_idle
-rw-r--r-- 1 root root 4096 2007-02-11 23:29 slice_sync

./slaves:
total 0
drwxr-xr-x 2 root root 0 2007-02-11 23:27 .
drwxr-xr-x 6 root root 0 2007-02-11 23:29 ..

brad@bklaptop2:/sys/block/mmcblk0$ cd device
brad@bklaptop2:/sys/block/mmcblk0/device$ ls -laR
.:
total 0
drwxr-xr-x 3 root root 0 2007-02-11 23:30 .
drwxr-xr-x 4 root root 0 2007-02-11 23:27 ..
lrwxrwxrwx 1 root root 0 2007-02-11 23:30 block:mmcblk0 -> ../../../../block/mmcblk0
lrwxrwxrwx 1 root root 0 2007-02-11 23:30 bus -> ../../../../bus/mmc
-r--r--r-- 1 root root 4096 2007-02-11 23:30 cid
-r--r--r-- 1 root root 4096 2007-02-11 23:30 csd
-r--r--r-- 1 root root 4096 2007-02-11 23:30 date
lrwxrwxrwx 1 root root 0 2007-02-11 23:27 driver -> ../../../../bus/mmc/drivers/mmcblk
-r--r--r-- 1 root root 4096 2007-02-11 23:30 fwrev
-r--r--r-- 1 root root 4096 2007-02-11 23:30 hwrev
-r--r--r-- 1 root root 4096 2007-02-11 23:30 manfid
-r--r--r-- 1 root root 4096 2007-02-11 23:30 name
-r--r--r-- 1 root root 4096 2007-02-11 23:30 oemid
drwxr-xr-x 2 root root 0 2007-02-11 23:27 power
-r--r--r-- 1 root root 4096 2007-02-11 23:30 scr
-r--r--r-- 1 root root 4096 2007-02-11 23:30 serial
lrwxrwxrwx 1 root root 0 2007-02-11 23:27 subsystem -> ../../../../bus/mmc
--w------- 1 root root 4096 2007-02-11 23:30 uevent

./power:
total 0
drwxr-xr-x 2 root root 0 2007-02-11 23:27 .
drwxr-xr-x 3 root root 0 2007-02-11 23:30 ..
-rw-r--r-- 1 root root 4096 2007-02-11 23:30 wakeup

Alex, it's still hit and miss getting this card detected. I had to insert/remove the card over 10
times with random driver load/unloads until it created the device entries..

Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams

2007-02-12 09:14:35

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver - one more

Brad Campbell wrote:
> brad@bklaptop2:/sys/block/mmcblk0$ ls -laR
> .:
> total 0
> drwxr-xr-x 6 root root 0 2007-02-11 23:29 .
> drwxr-xr-x 13 root root 0 2007-02-11 23:27 ..
> -r--r--r-- 1 root root 4096 2007-02-11 23:28 dev
> lrwxrwxrwx 1 root root 0 2007-02-11 23:27 device ->
> ../../class/mmc_host/mmc0/mmc0:b368

Is this with CONFIG_SYSFS_DEPRECATED off? It should be pointing to the
../../devices tree.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-12 10:50:41

by Brad Campbell

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver - one more

Pierre Ossman wrote:
> Brad Campbell wrote:
>> brad@bklaptop2:/sys/block/mmcblk0$ ls -laR
>> .:
>> total 0
>> drwxr-xr-x 6 root root 0 2007-02-11 23:29 .
>> drwxr-xr-x 13 root root 0 2007-02-11 23:27 ..
>> -r--r--r-- 1 root root 4096 2007-02-11 23:28 dev
>> lrwxrwxrwx 1 root root 0 2007-02-11 23:27 device ->
>> ../../class/mmc_host/mmc0/mmc0:b368
>
> Is this with CONFIG_SYSFS_DEPRECATED off? It should be pointing to the
> ../../devices tree.
>
> Rgds

No, it's currently on. I'll recompile and do a dump with it off this evening. Either way, it's still
broken and in 2.6.19 it wasn't.

Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams

2007-02-12 11:53:45

by Alex Dubov

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver


--- Pierre Ossman <[email protected]> wrote:

> Alex Dubov wrote:
> >
> > It just occurred to me that my synopsis of the problem was utterly lame.
> > Here, the correct description:
> > When the card is pulled out, I mark the host as "ejected" (so it fast-fails all the requests),
> > sleep a little for it to relax and then call mmc_remove_host. Otherwise, nasty things happen.
> > Here's the example not involving mmc_block at all (command 3 failed, hardware timeout was
> missed,
> > so software timeout picked it up; the sleep hack is disabled):
> >
>
> I believe your problem is line 915 of tifm_sd.c:
>
> > tifm_set_drvdata(sock, NULL);
>
> You call that before mmc_free_host() (which flushes the work queue), and I
> assume something still needs it. Put in some BUG_ON() here and there and you
> should be able to catch it.
>
I removed that line altogether (it does not really needed as mmc host will not be accessed
anymore). The problem is more elaborate. Here, the card fails, mmc_host_remove is called without
sleep beforehand, and "after remove" message is printed immediately after it. Only then, mmc_block
remembers to finish its business. If I leave the sleep in place, mmc_block's stuff will get
scheduled before the mmc_remove_host and everything will be all right.
You may also notice that host is already powered off ("Setting ... power 0" message) and still
mmc_block continues to make requests like nothing happened.

Feb 12 22:06:55 mortug tifm_sd tifm_sd0:3: host_status 400, fifo_status 0
Feb 12 22:06:56 mortug tifm_sd: card failed to respond for a long period of time
Feb 12 22:06:56 mortug tifm_7xx1 0000:06:09.3: checking media set 8
Feb 12 22:06:56 mortug tifm_7xx1: demand removing card from socket 0:3
Feb 12 22:06:56 mortug tifm_sd tifm_sd0:3: Setting bus width 0, power 0
Feb 12 22:06:56 mortug tifm_sd tifm_sd0:3: after remove
Feb 12 22:06:56 mortug mmcblk1: error 1 sending read/write command
Feb 12 22:06:56 mortug end_request: I/O error, dev mmcblk1, sector 0
Feb 12 22:06:56 mortug printk: 137 messages suppressed.
Feb 12 22:06:56 mortug Buffer I/O error on device mmcblk1, logical block 0
Feb 12 22:06:56 mortug divide error: 0000 [1] SMP
Feb 12 22:06:56 mortug CPU 0
Feb 12 22:06:56 mortug Modules linked in: ...



____________________________________________________________________________________
Expecting? Get great news right away with email Auto-Check.
Try the Yahoo! Mail Beta.
http://advision.webevents.yahoo.com/mailbeta/newmail_tools.html

2007-02-12 13:21:17

by Andreas Steinmetz

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver - one more

Brad Campbell wrote:
> Alex, it's still hit and miss getting this card detected. I had to
> insert/remove the card over 10 times with random driver load/unloads
> until it created the device entries..

And for me it's still worse, no matter what I try with 2.6.20:

speedy:~ # find /sys/devices | grep tifm
/sys/devices/pci0000:00/0000:00:14.4/0000:03:09.3/tifm_sd0:3
/sys/devices/pci0000:00/0000:00:14.4/0000:03:09.3/tifm_sd0:3/mmc_host:mmc0
/sys/devices/pci0000:00/0000:00:14.4/0000:03:09.3/tifm_sd0:3/driver
/sys/devices/pci0000:00/0000:00:14.4/0000:03:09.3/tifm_sd0:3/bus
/sys/devices/pci0000:00/0000:00:14.4/0000:03:09.3/tifm_sd0:3/subsystem
/sys/devices/pci0000:00/0000:00:14.4/0000:03:09.3/tifm_sd0:3/power
/sys/devices/pci0000:00/0000:00:14.4/0000:03:09.3/tifm_sd0:3/power/wakeup
/sys/devices/pci0000:00/0000:00:14.4/0000:03:09.3/tifm_sd0:3/uevent

speedy:~ # find /sys/block | grep mmc
speedy:~ #

--
Andreas Steinmetz SPAMmers use [email protected]

2007-02-12 13:49:41

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

Alex Dubov wrote:
> I removed that line altogether (it does not really needed as mmc host will not be accessed
> anymore). The problem is more elaborate. Here, the card fails, mmc_host_remove is called without
> sleep beforehand, and "after remove" message is printed immediately after it. Only then, mmc_block
> remembers to finish its business. If I leave the sleep in place, mmc_block's stuff will get
> scheduled before the mmc_remove_host and everything will be all right.
> You may also notice that host is already powered off ("Setting ... power 0" message) and still
> mmc_block continues to make requests like nothing happened.
>

How did you do the "after remove" detection? Patch?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-13 00:44:33

by Alex Dubov

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver


--- Pierre Ossman <[email protected]> wrote:

> Alex Dubov wrote:
> > I removed that line altogether (it does not really needed as mmc host will not be accessed
> > anymore). The problem is more elaborate. Here, the card fails, mmc_host_remove is called
> without
> > sleep beforehand, and "after remove" message is printed immediately after it. Only then,
> mmc_block
> > remembers to finish its business. If I leave the sleep in place, mmc_block's stuff will get
> > scheduled before the mmc_remove_host and everything will be all right.
> > You may also notice that host is already powered off ("Setting ... power 0" message) and still
> > mmc_block continues to make requests like nothing happened.
> >
>
> How did you do the "after remove" detection? Patch?
>
(Power message is printed by ios callback)
Here's my current (SVN) remove function:

static void tifm_sd_remove(struct tifm_dev *sock)
{
struct mmc_host *mmc = tifm_get_drvdata(sock);
struct tifm_sd *host = mmc_priv(mmc);
unsigned long flags;

tasklet_kill(&host->finish_tasklet);
spin_lock_irqsave(&sock->lock, flags);
host->flags |= EJECT;
writel(0, sock->addr + SOCK_MMCSD_INT_ENABLE);
mmiowb();

if (host->req) {
writel(TIFM_FIFO_INT_SETALL,
sock->addr + SOCK_DMA_FIFO_INT_ENABLE_CLEAR);
writel(0, sock->addr + SOCK_DMA_FIFO_INT_ENABLE_SET);
host->req->cmd->error = MMC_ERR_TIMEOUT;
if (host->req->stop)
host->req->stop->error = MMC_ERR_TIMEOUT;
tasklet_schedule(&host->finish_tasklet);
}
spin_unlock_irqrestore(&sock->lock, flags);
// temporary hack
msleep(1000);
mmc_remove_host(mmc);
dev_dbg(&sock->dev, "after remove\n");

/* The meaning of the bit majority in this constant is unknown. */
writel(0xfff8 & readl(sock->addr + SOCK_CONTROL),
sock->addr + SOCK_CONTROL);

mmc_free_host(mmc);
}



____________________________________________________________________________________
Finding fabulous fares is fun.
Let Yahoo! FareChase search your favorite travel sites to find flight and hotel bargains.
http://farechase.yahoo.com/promo-generic-14795097

2007-02-13 14:40:55

by Alex Dubov

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

If we are already on the topic, I would like to report two additional issues with mmc_block:

1. If, for some reason, device driver cannot return the requested data amount, but does not sets
any error, mmc_block would retry indefinitely. Of course, its always a device driver's fault, but
may be we can set some limit on retry count (this can help a lot with debugging).

And the more serious:
2. There was a write corruption problem with tifm_sd caused by missing wait cycle (card busy/card
not busy) after stop command. It should not had been a problem (the mmc layer was spinning around
with CMD13 untill the card become not-busy), but for some reason it was. We are currently testing
this. The intersting part, however, is behavior of mmc_block given this situation:

It appears that mmc_block's instance manages to get stuck because of this. The symptoms: module
usage count is not decremented when low level driver is unloaded and partition block devices do
not get created afterwards. The fun part: the main block device gets created and deleted on card
insertion/removal and its dump is correct (dd if=/dev/mmcblk0 ...); yet partition detection does
not happens. To fix this, one have to reboot the machine or to wait about 30 minutes for mmc_block
to regain its senses (then it becomes rmmod'able again).

On the other hand, it may be some sort of generic block layer problem.




____________________________________________________________________________________
Sucker-punch spam with award-winning protection.
Try the free Yahoo! Mail Beta.
http://advision.webevents.yahoo.com/mailbeta/features_spam.html

2007-02-17 15:16:41

by Alex Dubov

[permalink] [raw]
Subject: [mmc] incorrect behavior on resume

And today: yet another problem with mmc.
It so happens that after resume mmc layer issues requests to the device before mmc_resume_host is
called at all. Moreover, this prevents the machine from resuming, unless worked around, because
software timer does not work at this stage of the resume and interrupts may not be delivered (if
card was removed, for example).
And here are some logs (condition: card is present when machine is suspended and removed before it
is resumed).
Normally, only this is seen in the log:
---------
Feb 18 01:42:09 mortug usbdev3.1_ep00: PM: resume from 0, parent usb3 still 2
Feb 18 01:42:09 mortug usbdev3.1_ep81: PM: resume from 0, parent 3-0:1.0 still 2
Feb 18 01:42:09 mortug tifm_sd0:3 : controller failed to reset
Feb 18 01:42:09 mortug tifm_sd tifm_sd0:3: resume initialize -19
Feb 18 01:42:09 mortug mmcblk0: unable to set block size to 512: 1
---------
Here, the controller failed to reset because card disappeared, so mmc_resume_host will not be
called - however mmc layer already complains that it can not set the block size. To get a better
log, I commented out the mmc_remove_host and left the card in place:

--------
Feb 18 01:02:58 mortug usbdev3.1_ep00: PM: resume from 0, parent usb3 still 2
Feb 18 01:02:58 mortug usbdev3.1_ep81: PM: resume from 0, parent 3-0:1.0 still 2
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: resume initialize 0
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: executing opcode 0x7, arg: 0xe6240000, mask: 0x2140
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: host event: host_status 80, flags 0
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: executing opcode 0x7, arg: 0xe6240000, mask: 0x2140
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: host event: host_status 80, flags 0
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: executing opcode 0x7, arg: 0xe6240000, mask: 0x2140
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: host event: host_status 80, flags 0
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: executing opcode 0x7, arg: 0xe6240000, mask: 0x2140
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: host event: host_status 80, flags 0
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: executing opcode 0x10, arg: 0x200, mask: 0x2140
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: host event: host_status 80, flags 0
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: executing opcode 0x10, arg: 0x200, mask: 0x2140
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: host event: host_status 80, flags 0
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: executing opcode 0x10, arg: 0x200, mask: 0x2140
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: host event: host_status 80, flags 0
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: executing opcode 0x10, arg: 0x200, mask: 0x2140
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: host event: host_status 80, flags 0
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: executing opcode 0x10, arg: 0x200, mask: 0x2140
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: host event: host_status 80, flags 0
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: executing opcode 0x10, arg: 0x200, mask: 0x2140
Feb 18 01:02:58 mortug tifm_sd tifm_sd0:3: host event: host_status 80, flags 0
Feb 18 01:02:58 mortug mmcblk0: unable to set block size to 512: 1
-----

It appears to me that mmc_block resumes and starts issuing requests all by itself, which is
incorrect.

For reference - correct resume sequence (card remains in place, mmc_resume_host called):

------
Feb 18 01:41:34 mortug usbdev3.1_ep00: PM: resume from 0, parent usb3 still 2
Feb 18 01:41:34 mortug usbdev3.1_ep81: PM: resume from 0, parent 3-0:1.0 still 2
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: resume initialize 0
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: ios: clock = 0, vdd = 15, bus_mode = 1, chip_select =
0, power_mode = 1, bus_width = 0
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: ios: clock = 333333, vdd = 15, bus_mode = 1,
chip_select = 0, power_mode = 2, bus_width = 0
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: ios: clock = 333333, vdd = 15, bus_mode = 1,
chip_select = 1, power_mode = 2, bus_width = 0
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: executing opcode 0x0, arg: 0x0, mask: 0x40
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: host event: host_status 1, flags 0
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: ios: clock = 333333, vdd = 15, bus_mode = 1,
chip_select = 0, power_mode = 2, bus_width = 0
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: executing opcode 0x37, arg: 0x0, mask: 0x2140
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: host event: host_status 1, flags 0
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: executing opcode 0x29, arg: 0x0, mask: 0x1340
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: host event: host_status 1001, flags 0
Feb 18 01:41:34 mortug tifm_sd tifm_sd0:3: ios: clock = 333333, vdd = 14, bus_mode = 1,
chip_select = 0, power_mode = 2, bus_width = 0
-------

My really simple resume function:
-------
static int tifm_sd_resume(struct tifm_dev *sock)
{
struct mmc_host *mmc = tifm_get_drvdata(sock);
struct tifm_sd *host = mmc_priv(mmc);
int rc;

rc = tifm_sd_initialize_host(host);
dev_dbg(&sock->dev, "resume initialize %d\n", rc);

if (!rc) {
host->eject = 0;
rc = mmc_resume_host(mmc);
}

return rc;
}





____________________________________________________________________________________
Need Mail bonding?
Go to the Yahoo! Mail Q&A for great tips from Yahoo! Answers users.
http://answers.yahoo.com/dir/?link=list&sid=396546091

2007-02-17 23:14:06

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

Alex Dubov wrote:
> If we are already on the topic, I would like to report two additional issues with mmc_block:
>
> 1. If, for some reason, device driver cannot return the requested data amount, but does not sets
> any error, mmc_block would retry indefinitely. Of course, its always a device driver's fault, but
> may be we can set some limit on retry count (this can help a lot with debugging).

Well, we could add a check that data requests get fully satisfied.

>
> And the more serious:
> 2. There was a write corruption problem with tifm_sd caused by missing wait cycle (card busy/card
> not busy) after stop command. It should not had been a problem (the mmc layer was spinning around
> with CMD13 untill the card become not-busy), but for some reason it was. We are currently testing
> this. The intersting part, however, is behavior of mmc_block given this situation:
>
> It appears that mmc_block's instance manages to get stuck because of this. The symptoms: module
> usage count is not decremented when low level driver is unloaded and partition block devices do
> not get created afterwards. The fun part: the main block device gets created and deleted on card
> insertion/removal and its dump is correct (dd if=/dev/mmcblk0 ...); yet partition detection does
> not happens. To fix this, one have to reboot the machine or to wait about 30 minutes for mmc_block
> to regain its senses (then it becomes rmmod'able again).
>
> On the other hand, it may be some sort of generic block layer problem.
>

The block layer can get a bit fuzzy when you start yanking device out from under
it. That said, the mmc block driver should be forgiving. So if you can figure
out what it is up to (and more exactly how it is provoked), I'll try to fix it.

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-17 23:35:32

by Pierre Ossman

[permalink] [raw]
Subject: Re: [mmc] incorrect behavior on resume

Alex Dubov wrote:
> And today: yet another problem with mmc.
> It so happens that after resume mmc layer issues requests to the device before mmc_resume_host is
> called at all. Moreover, this prevents the machine from resuming, unless worked around, because
> software timer does not work at this stage of the resume and interrupts may not be delivered (if
> card was removed, for example).

Now this sounds incredibly broken. A child device should never be resumed before
its parent. Pavel, can you comment?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-17 23:51:05

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

Alex Dubov wrote:
> I removed that line altogether (it does not really needed as mmc host will not be accessed
> anymore). The problem is more elaborate. Here, the card fails, mmc_host_remove is called without
> sleep beforehand, and "after remove" message is printed immediately after it. Only then, mmc_block
> remembers to finish its business. If I leave the sleep in place, mmc_block's stuff will get
> scheduled before the mmc_remove_host and everything will be all right.
> You may also notice that host is already powered off ("Setting ... power 0" message) and still
> mmc_block continues to make requests like nothing happened.
>

I don't actually think that is what happening. The block errors tend to trail a
bit behind, so the errors you are seeing are probably the result of the queue
being flushed out as you remove the card. I don't see any mmc debug messages
that indicate that is trying to send more mmc requests.

You could add a printk to the queue thread in mmc_queue.c. That will allow you
to see exactly when it exits (which should be before mmc_remove_host() returns).

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-18 00:29:25

by Pavel Machek

[permalink] [raw]
Subject: Re: [mmc] incorrect behavior on resume

On Sun 2007-02-18 00:35:33, Pierre Ossman wrote:
> Alex Dubov wrote:
> > And today: yet another problem with mmc.
> > It so happens that after resume mmc layer issues requests to the device before mmc_resume_host is
> > called at all. Moreover, this prevents the machine from resuming, unless worked around, because
> > software timer does not work at this stage of the resume and interrupts may not be delivered (if
> > card was removed, for example).
>
> Now this sounds incredibly broken. A child device should never be resumed before
> its parent. Pavel, can you comment?

No, child devices should not be resumed before their parents. Is it
true child?

What bus is mmc on? Timer should be resumed fairly early...?

Can you confirm that rmmod/insmod of mmc around suspend fixes the issue?

But this is greg's area.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-02-18 01:46:37

by Alex Dubov

[permalink] [raw]
Subject: Re: [mmc] incorrect behavior on resume


--- Pavel Machek <[email protected]> wrote:

> On Sun 2007-02-18 00:35:33, Pierre Ossman wrote:
> > Alex Dubov wrote:
> > > And today: yet another problem with mmc.
> > > It so happens that after resume mmc layer issues requests to the device before
> mmc_resume_host is
> > > called at all. Moreover, this prevents the machine from resuming, unless worked around,
> because
> > > software timer does not work at this stage of the resume and interrupts may not be delivered
> (if
> > > card was removed, for example).
> >
> > Now this sounds incredibly broken. A child device should never be resumed before
> > its parent. Pavel, can you comment?
>
> No, child devices should not be resumed before their parents. Is it
> true child?
>
> What bus is mmc on? Timer should be resumed fairly early...?
>
> Can you confirm that rmmod/insmod of mmc around suspend fixes the issue?
>
> But this is greg's area.
It sure does - if I'll do "rmmod mmc" I'd be forced to remove my driver as well (symbol
dependency) and the problem will magically disappear.

The problem here is that mmc_block's device is a child of real device (tifm_dev here), so it gets
resumed right after it. However, it correct functioning depends on mmc_core, which must be
manually resumed (mmc_resume_host). Therefore, I think this is purely mmc's problem.




____________________________________________________________________________________
Bored stiff? Loosen up...
Download and play hundreds of games for free on Yahoo! Games.
http://games.yahoo.com/games/front

2007-02-18 04:59:49

by Alex Dubov

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

> I don't actually think that is what happening. The block errors tend to trail a
> bit behind, so the errors you are seeing are probably the result of the queue
> being flushed out as you remove the card. I don't see any mmc debug messages
> that indicate that is trying to send more mmc requests.

Yes, indeed - it does not issue new requests after remove. However, mmc_remove_host does not wait
for all issued requests to complete and it is a problem.
Here's my remove procedure:
1. cut off interrupt signaling call back - no tasklets will be coming from here
2. kill request completion tasklet
3. check if there's still an unfinished request - if yes, schedule a completion tasklet
3a. if I'll sleep here, just scheduled tasklet will have a chance to complete
3b. This is what happens if I don't sleep:
(this line is printed by completion tasklet: tifm_sd0:3: num_bl 7, b_s 512, bsh 443)
------
Feb 18 15:22:40 mortug tifm_sd0:3 : card failed to respond for a long period of time (12, 1)
Feb 18 15:22:40 mortug tifm_7xx1 0000:06:09.3: checking media set 8
Feb 18 15:22:40 mortug tifm0 : demand removing card from socket 0:3
Feb 18 15:22:40 mortug mmc0: clock 0Hz busmode 1 powermode 0 cs 0 Vdd 0 width 0
Feb 18 15:22:40 mortug tifm_sd tifm_sd0:3: ios: clock = 0, vdd = 0, bus_mode = 1, chip_select = 0,
power_mode = 0, bus_width = 0
Feb 18 15:22:40 mortug tifm_sd tifm_sd0:3: after remove
Feb 18 15:22:40 mortug tifm_sd tifm_sd0:3: num_bl 7, b_s 512, bsh 443
Feb 18 15:22:40 mortug mmc0: req done (CMD18): 1/0/1: 00000900 00000000 00000000 00000000
Feb 18 15:22:40 mortug mmcblk0: error 1 sending read/write command
Feb 18 15:22:40 mortug end_request: I/O error, dev mmcblk0, sector 0
Feb 18 15:22:40 mortug printk: 79 messages suppressed.
Feb 18 15:22:40 mortug Buffer I/O error on device mmcblk0, logical block 0
Feb 18 15:22:40 mortug divide error: 0000 [1] SMP
Feb 18 15:22:40 mortug CPU 0
......
Feb 18 15:22:40 mortug Call Trace:
Feb 18 15:22:40 mortug [<ffffffff88111280>] :mmc_block:mmc_blk_issue_rq+0xf0/0x600
Feb 18 15:22:40 mortug [<ffffffff8020f199>] __alloc_pages+0x69/0x2f0
Feb 18 15:22:40 mortug [<ffffffff802110f1>] do_wp_page+0x4b1/0x510
Feb 18 15:22:40 mortug [<ffffffff8028d690>] task_rq_lock+0x50/0x90
Feb 18 15:22:40 mortug [<ffffffff8028d958>] __activate_task+0x38/0x50
Feb 18 15:22:40 mortug [<ffffffff80248ebb>] try_to_wake_up+0x42b/0x440
Feb 18 15:22:40 mortug [<ffffffff8038d43d>] elv_rb_del+0x2d/0x50
Feb 18 15:22:40 mortug [<ffffffff8038de2d>] elv_next_request+0x15d/0x1e0
Feb 18 15:22:40 mortug [<ffffffff8821c9b8>] :mmc_core:mmc_queue_thread+0xd8/0x110
Feb 18 15:22:40 mortug [<ffffffff8821c8e0>] :mmc_core:mmc_queue_thread+0x0/0x110
Feb 18 15:22:40 mortug [<ffffffff802a4d30>] keventd_create_kthread+0x0/0x90
Feb 18 15:22:40 mortug [<ffffffff80233169>] kthread+0xd9/0x120
Feb 18 15:22:40 mortug [<ffffffff80262ec8>] child_rip+0xa/0x12
Feb 18 15:22:40 mortug [<ffffffff802a4d30>] keventd_create_kthread+0x0/0x90
Feb 18 15:22:40 mortug [<ffffffff80233090>] kthread+0x0/0x120
Feb 18 15:22:40 mortug [<ffffffff80262ebe>] child_rip+0x0/0x12
Feb 18 15:22:40 mortug
Feb 18 15:22:40 mortug
Feb 18 15:22:40 mortug Code: f7 f6 41 01 c1 41 83 f8 01 19 c0 25 10 b6 fd ff 05 90 d0 03
Feb 18 15:22:40 mortug RIP [<ffffffff8821a44c>] :mmc_core:mmc_set_data_timeout+0x7c/0xb0
Feb 18 15:22:40 mortug RSP <ffff81003ad97c98>




____________________________________________________________________________________
Don't get soaked. Take a quick peak at the forecast
with the Yahoo! Search weather shortcut.
http://tools.search.yahoo.com/shortcuts/#loc_weather

2007-02-18 06:53:31

by Russell King

[permalink] [raw]
Subject: Re: [mmc] incorrect behavior on resume

On Sat, Feb 17, 2007 at 05:46:35PM -0800, Alex Dubov wrote:
> The problem here is that mmc_block's device is a child of real device
> (tifm_dev here), so it gets resumed right after it.

The host driver is supposed to call mmc_resume_host from it's resume
callback. This should be called before the child's resume callback.

> However, it correct functioning depends on mmc_core, which must be
> manually resumed (mmc_resume_host). Therefore, I think this is
> purely mmc's problem.

I don't see that - as I say above, the correct sequence is:

- host device resume
- calls mmc_resume_host()
- child's device resume (mmc_blk_resume)
- mmc_queue_resume()

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

2007-02-18 08:13:09

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

Alex Dubov wrote:
>> I don't actually think that is what happening. The block errors tend to trail a
>> bit behind, so the errors you are seeing are probably the result of the queue
>> being flushed out as you remove the card. I don't see any mmc debug messages
>> that indicate that is trying to send more mmc requests.
>
> Yes, indeed - it does not issue new requests after remove. However, mmc_remove_host does not wait
> for all issued requests to complete and it is a problem.

I don't see how that is possible. mmc_block's remove routine waits for mmcqd to
exit, so there can't be any code still alive that has a request going. (I am
also completely unable to reproduce this problem here).

Add more printk:s do verify how the code in mmc_block executes.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-18 12:18:15

by Alex Dubov

[permalink] [raw]
Subject: Re: [mmc] incorrect behavior on resume


> I don't see that - as I say above, the correct sequence is:
>
> - host device resume
> - calls mmc_resume_host()
> - child's device resume (mmc_blk_resume)
> - mmc_queue_resume()
>
Of course, I understand that this is a correct sequence. It simply was not obvious to me that host
will issue requests even if mmc_resume_host is not called.

I'm trying to implement a hotplugable host, so I assumed that if host was removed during the
suspended state there's no need to resume it.




____________________________________________________________________________________
Food fight? Enjoy some healthy debate
in the Yahoo! Answers Food & Drink Q&A.
http://answers.yahoo.com/dir/?link=list&sid=396545367

2007-02-18 14:19:05

by Alex Dubov

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

>
> I don't see how that is possible. mmc_block's remove routine waits for mmcqd to
> exit, so there can't be any code still alive that has a request going. (I am
> also completely unable to reproduce this problem here).
>
> Add more printk:s do verify how the code in mmc_block executes.
>
This is hard to trigger problem, so I'll spare you the rather lengthy log.
It happens if card timeouts and mmc_remove_host is called while mmc_register_card is still in
progress (the hint was in crash dump). If I sleep before remove, it gives the mmc_register_card
chance to finish/mark card as dead and everything's fine.
The remove callback of mmc_block is apparently not executed in this case (probably because device
has not finished registering).

My proposition: lets flush the workqueue first thing in the mmc_remove_host (and make sure that
nothing gets scheduled into it after this).




____________________________________________________________________________________
Don't pick lemons.
See all the new 2007 cars at Yahoo! Autos.
http://autos.yahoo.com/new_cars.html

2007-02-18 15:10:53

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

Alex Dubov wrote:
> This is hard to trigger problem, so I'll spare you the rather lengthy log.
> It happens if card timeouts and mmc_remove_host is called while mmc_register_card is still in
> progress (the hint was in crash dump). If I sleep before remove, it gives the mmc_register_card
> chance to finish/mark card as dead and everything's fine.
> The remove callback of mmc_block is apparently not executed in this case (probably because device
> has not finished registering).

Let's see, you manage to call mmc_remove_host() when a detection is pending in
the workqueue? That could indeed generate new request (up until mmc_free_host()
is called), but as the card is removed the mmc layer shouldn't be able to detect
anything, hence mmc_block should never get involved.

>
> My proposition: lets flush the workqueue first thing in the mmc_remove_host (and make sure that
> nothing gets scheduled into it after this).
>

This is fixed in -mm. Making sure that nothing gets scheduled is the driver's
responsibility. I've added checks to catch broken drivers though.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-18 23:48:24

by Alex Dubov

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

> > This is hard to trigger problem, so I'll spare you the rather lengthy log.
> > It happens if card timeouts and mmc_remove_host is called while mmc_register_card is still in
> > progress (the hint was in crash dump). If I sleep before remove, it gives the
> mmc_register_card
> > chance to finish/mark card as dead and everything's fine.
> > The remove callback of mmc_block is apparently not executed in this case (probably because
> device
> > has not finished registering).
>
> Let's see, you manage to call mmc_remove_host() when a detection is pending in
> the workqueue? That could indeed generate new request (up until mmc_free_host()
> is called), but as the card is removed the mmc layer shouldn't be able to detect
> anything, hence mmc_block should never get involved.
>

You'll agree, I think, that add_disk in mmc_block_probe issues a lot of requests (reads partition
table, fs superblocks and such - plenty of room for critical errors). Then, driver's remove method
will not be called before driver's probe method had finished. So mmc_block is quite involved, even
though it does not affect the problem's resolution.

Besides this, I have a new version of the tifm driver to submit. It fixes quite a few problems and
improves performance a little. Hopefully, it's better tested than the previous one.






____________________________________________________________________________________
Finding fabulous fares is fun.
Let Yahoo! FareChase search your favorite travel sites to find flight and hotel bargains.
http://farechase.yahoo.com/promo-generic-14795097

2007-02-19 07:06:22

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

Alex Dubov wrote:
>
> You'll agree, I think, that add_disk in mmc_block_probe issues a lot of requests (reads partition
> table, fs superblocks and such - plenty of room for critical errors). Then, driver's remove method
> will not be called before driver's probe method had finished. So mmc_block is quite involved, even
> though it does not affect the problem's resolution.

I agree that mmc_block's probe method will generate a whole bunch of requests.
But I don't see how that can be called given the scenario you describe.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-19 12:09:10

by Alex Dubov

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver


--- Pierre Ossman <[email protected]> wrote:

> Alex Dubov wrote:
> >
> > You'll agree, I think, that add_disk in mmc_block_probe issues a lot of requests (reads
> partition
> > table, fs superblocks and such - plenty of room for critical errors). Then, driver's remove
> method
> > will not be called before driver's probe method had finished. So mmc_block is quite involved,
> even
> > though it does not affect the problem's resolution.
>
> I agree that mmc_block's probe method will generate a whole bunch of requests.
> But I don't see how that can be called given the scenario you describe.

mmc_rescan
mmc_register_card
device_add
mmc_block_probe
mmc_block_alloc
-> queue thread starts running
add_disk
-> issues a lot of requests; card fails, my drivers calls mmc_remove_host, which in
turn calls device_del, though we are still in device_add




____________________________________________________________________________________
Have a burning question?
Go to http://www.Answers.yahoo.com and get answers from real people who know.

2007-02-19 12:34:36

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

Alex Dubov wrote:
>
> mmc_rescan
> mmc_register_card
> device_add
> mmc_block_probe
> mmc_block_alloc
> -> queue thread starts running
> add_disk
> -> issues a lot of requests; card fails, my drivers calls mmc_remove_host, which in
> turn calls device_del, though we are still in device_add
>

Ahhh, now I see. Well that's an entirely different kettle of fish.

Removing the device in this stack is not supported (we would have a whole bunch
of nasty dead lock possibilities to consider). You need to delay the removal in
some form, e.g. using the mmc workqueue.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2007-02-20 01:30:08

by Alex Dubov

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver


> >
> > mmc_rescan
> > mmc_register_card
> > device_add
> > mmc_block_probe
> > mmc_block_alloc
> > -> queue thread starts running
> > add_disk
> > -> issues a lot of requests; card fails, my drivers calls mmc_remove_host, which in

correction: my driver schedules (wakes kthread in 0.7) mmc_remove_host - noticed it only now

> > turn calls device_del, though we are still in device_add
> >
>

That's why I think that simply flushing the workqueue is enough. If workqueue is empty we know for
sure that device_add has exited.




____________________________________________________________________________________
Expecting? Get great news right away with email Auto-Check.
Try the Yahoo! Mail Beta.
http://advision.webevents.yahoo.com/mailbeta/newmail_tools.html

2007-02-20 07:03:18

by Pierre Ossman

[permalink] [raw]
Subject: Re: Recent and not-so problems with tifm_sd driver

Alex Dubov wrote:
>
> correction: my driver schedules (wakes kthread in 0.7) mmc_remove_host - noticed it only now
>

Ok, good. Then things are as they should on your part.

>
> That's why I think that simply flushing the workqueue is enough. If workqueue is empty we know for
> sure that device_add has exited.
>

>From -mm:

http://www.kernel.org/git/?p=linux/kernel/git/drzeus/mmc.git;a=commitdiff;h=e89bac488861ebadfca3a74321af19a262dcbd08

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org