2023-03-29 20:23:01

by Dennis Zhou

Subject: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()

When using dm-verity with a data partition on an eMMC device, dm-verity
races with the discovery of attached eMMC devices. This is because mmc's
probing code sets up the host data structure and then schedules a work
item to do discovery afterwards. To prevent this race on init, let's
inline the first call to detection, __mmc_rescan(), and let subsequent
detect calls be handled via the workqueue.

Signed-off-by: Dennis Zhou <[email protected]>
---
drivers/mmc/core/core.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 368f10405e13..c0fdc438c882 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
 }
 EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
 
-void mmc_rescan(struct work_struct *work)
+void __mmc_rescan(struct mmc_host *host)
 {
-	struct mmc_host *host =
-		container_of(work, struct mmc_host, detect.work);
 	int i;
 
 	if (host->rescan_disable)
@@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
 		mmc_schedule_delayed_work(&host->detect, HZ);
 }
 
+void mmc_rescan(struct work_struct *work)
+{
+	struct mmc_host *host =
+		container_of(work, struct mmc_host, detect.work);
+
+	__mmc_rescan(host);
+}
+
 void mmc_start_host(struct mmc_host *host)
 {
 	host->f_init = max(min(freqs[0], host->f_max), host->f_min);
@@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
 	}
 
 	mmc_gpiod_request_cd_irq(host);
-	_mmc_detect_change(host, 0, false);
+	host->detect_change = 1;
+	__mmc_rescan(host);
 }
 
 void __mmc_stop_host(struct mmc_host *host)

void __mmc_stop_host(struct mmc_host *host)
--
2.40.0


2023-03-29 23:14:51

by kernel test robot

Subject: Re: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()

Hi Dennis,

I love your patch! Perhaps something to improve:

[auto build test WARNING on ulf-hansson-mmc-mirror/next]
[also build test WARNING on linus/master v6.3-rc4 next-20230329]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Dennis-Zhou/mmc-inline-the-first-mmc_scan-on-mmc_start_host/20230330-042213
base: https://git.linaro.org/people/ulf.hansson/mmc-mirror.git next
patch link: https://lore.kernel.org/r/20230329202148.71107-1-dennis%40kernel.org
patch subject: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()
config: arm-randconfig-r046-20230329 (https://download.01.org/0day-ci/archive/20230330/[email protected]/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/d2de7314d2198df0c7a452546af0c15799b2d864
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Dennis-Zhou/mmc-inline-the-first-mmc_scan-on-mmc_start_host/20230330-042213
git checkout d2de7314d2198df0c7a452546af0c15799b2d864
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/mmc/core/

If you fix the issue, kindly add the following tags where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

>> drivers/mmc/core/core.c:2202:6: warning: no previous prototype for '__mmc_rescan' [-Wmissing-prototypes]
2202 | void __mmc_rescan(struct mmc_host *host)
| ^~~~~~~~~~~~
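
The warning fires because __mmc_rescan() has external linkage but no declaration of it is visible before its definition. A minimal sketch of the two usual remedies follows; the function names are hypothetical and only illustrate the rule, they are not mmc code. Giving the helper internal linkage with static is the natural fit here, since __mmc_rescan() is only called from within core.c.

```c
#include <assert.h>

/* Hypothetical names for illustration only. Building this file with
 * `gcc -Wmissing-prototypes -c` should produce no warning for either. */

/* Remedy 1: a helper used only in this file gets internal linkage.
 * -Wmissing-prototypes only applies to global (non-static) functions. */
static int probe_one_freq(unsigned int freq_hz, unsigned int f_min)
{
	return freq_hz >= f_min;
}

/* Remedy 2: a function that other files call keeps external linkage,
 * and its prototype (normally from a shared header) precedes it. */
int scan_at(unsigned int freq_hz, unsigned int f_min);

int scan_at(unsigned int freq_hz, unsigned int f_min)
{
	assert(f_min > 0);
	return probe_one_freq(freq_hz, f_min);
}
```

In the kernel case, Remedy 2 would mean adding a declaration to a header such as drivers/mmc/core/core.h, which only makes sense if another file needs to call the helper.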


vim +/__mmc_rescan +2202 drivers/mmc/core/core.c

2201
> 2202 void __mmc_rescan(struct mmc_host *host)
2203 {
2204 int i;
2205
2206 if (host->rescan_disable)
2207 return;
2208
2209 /* If there is a non-removable card registered, only scan once */
2210 if (!mmc_card_is_removable(host) && host->rescan_entered)
2211 return;
2212 host->rescan_entered = 1;
2213
2214 if (host->trigger_card_event && host->ops->card_event) {
2215 mmc_claim_host(host);
2216 host->ops->card_event(host);
2217 mmc_release_host(host);
2218 host->trigger_card_event = false;
2219 }
2220
2221 /* Verify a registered card to be functional, else remove it. */
2222 if (host->bus_ops)
2223 host->bus_ops->detect(host);
2224
2225 host->detect_change = 0;
2226
2227 /* if there still is a card present, stop here */
2228 if (host->bus_ops != NULL)
2229 goto out;
2230
2231 mmc_claim_host(host);
2232 if (mmc_card_is_removable(host) && host->ops->get_cd &&
2233 host->ops->get_cd(host) == 0) {
2234 mmc_power_off(host);
2235 mmc_release_host(host);
2236 goto out;
2237 }
2238
2239 /* If an SD express card is present, then leave it as is. */
2240 if (mmc_card_sd_express(host)) {
2241 mmc_release_host(host);
2242 goto out;
2243 }
2244
2245 for (i = 0; i < ARRAY_SIZE(freqs); i++) {
2246 unsigned int freq = freqs[i];
2247 if (freq > host->f_max) {
2248 if (i + 1 < ARRAY_SIZE(freqs))
2249 continue;
2250 freq = host->f_max;
2251 }
2252 if (!mmc_rescan_try_freq(host, max(freq, host->f_min)))
2253 break;
2254 if (freqs[i] <= host->f_min)
2255 break;
2256 }
2257
2258 /*
2259 * Ignore the command timeout errors observed during
2260 * the card init as those are excepted.
2261 */
2262 host->err_stats[MMC_ERR_CMD_TIMEOUT] = 0;
2263 mmc_release_host(host);
2264
2265 out:
2266 if (host->caps & MMC_CAP_NEEDS_POLL)
2267 mmc_schedule_delayed_work(&host->detect, HZ);
2268 }
2269

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

2023-03-29 23:42:52

by kernel test robot

Subject: Re: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()

Hi Dennis,

I love your patch! Perhaps something to improve:

[auto build test WARNING on ulf-hansson-mmc-mirror/next]
[also build test WARNING on linus/master v6.3-rc4 next-20230329]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Dennis-Zhou/mmc-inline-the-first-mmc_scan-on-mmc_start_host/20230330-042213
base: https://git.linaro.org/people/ulf.hansson/mmc-mirror.git next
patch link: https://lore.kernel.org/r/20230329202148.71107-1-dennis%40kernel.org
patch subject: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()
config: hexagon-randconfig-r045-20230329 (https://download.01.org/0day-ci/archive/20230330/[email protected]/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 67409911353323ca5edf2049ef0df54132fa1ca7)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/d2de7314d2198df0c7a452546af0c15799b2d864
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Dennis-Zhou/mmc-inline-the-first-mmc_scan-on-mmc_start_host/20230330-042213
git checkout d2de7314d2198df0c7a452546af0c15799b2d864
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash drivers/mmc/core/

If you fix the issue, kindly add the following tags where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

In file included from drivers/mmc/core/core.c:12:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
^
In file included from drivers/mmc/core/core.c:12:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
^
In file included from drivers/mmc/core/core.c:12:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
>> drivers/mmc/core/core.c:2202:6: warning: no previous prototype for function '__mmc_rescan' [-Wmissing-prototypes]
void __mmc_rescan(struct mmc_host *host)
^
drivers/mmc/core/core.c:2202:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void __mmc_rescan(struct mmc_host *host)
^
static
7 warnings generated.


vim +/__mmc_rescan +2202 drivers/mmc/core/core.c

2201
> 2202 void __mmc_rescan(struct mmc_host *host)
2203 {
2204 int i;
2205
2206 if (host->rescan_disable)
2207 return;
2208
2209 /* If there is a non-removable card registered, only scan once */
2210 if (!mmc_card_is_removable(host) && host->rescan_entered)
2211 return;
2212 host->rescan_entered = 1;
2213
2214 if (host->trigger_card_event && host->ops->card_event) {
2215 mmc_claim_host(host);
2216 host->ops->card_event(host);
2217 mmc_release_host(host);
2218 host->trigger_card_event = false;
2219 }
2220
2221 /* Verify a registered card to be functional, else remove it. */
2222 if (host->bus_ops)
2223 host->bus_ops->detect(host);
2224
2225 host->detect_change = 0;
2226
2227 /* if there still is a card present, stop here */
2228 if (host->bus_ops != NULL)
2229 goto out;
2230
2231 mmc_claim_host(host);
2232 if (mmc_card_is_removable(host) && host->ops->get_cd &&
2233 host->ops->get_cd(host) == 0) {
2234 mmc_power_off(host);
2235 mmc_release_host(host);
2236 goto out;
2237 }
2238
2239 /* If an SD express card is present, then leave it as is. */
2240 if (mmc_card_sd_express(host)) {
2241 mmc_release_host(host);
2242 goto out;
2243 }
2244
2245 for (i = 0; i < ARRAY_SIZE(freqs); i++) {
2246 unsigned int freq = freqs[i];
2247 if (freq > host->f_max) {
2248 if (i + 1 < ARRAY_SIZE(freqs))
2249 continue;
2250 freq = host->f_max;
2251 }
2252 if (!mmc_rescan_try_freq(host, max(freq, host->f_min)))
2253 break;
2254 if (freqs[i] <= host->f_min)
2255 break;
2256 }
2257
2258 /*
2259 * Ignore the command timeout errors observed during
2260 * the card init as those are excepted.
2261 */
2262 host->err_stats[MMC_ERR_CMD_TIMEOUT] = 0;
2263 mmc_release_host(host);
2264
2265 out:
2266 if (host->caps & MMC_CAP_NEEDS_POLL)
2267 mmc_schedule_delayed_work(&host->detect, HZ);
2268 }
2269

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

2023-03-29 23:48:47

by Dennis Zhou

Subject: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

When using dm-verity with a data partition on an eMMC device, dm-verity
races with the discovery of attached eMMC devices. This is because mmc's
probing code sets up the host data structure and then schedules a work
item to do discovery afterwards. To prevent this race on init, let's
inline the first call to detection, __mmc_rescan(), and let subsequent
detect calls be handled via the workqueue.

Signed-off-by: Dennis Zhou <[email protected]>
---
Sigh... fix the missing static declaration.

drivers/mmc/core/core.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 368f10405e13..fda7ee57dee3 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
 }
 EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
 
-void mmc_rescan(struct work_struct *work)
+static void __mmc_rescan(struct mmc_host *host)
 {
-	struct mmc_host *host =
-		container_of(work, struct mmc_host, detect.work);
 	int i;
 
 	if (host->rescan_disable)
@@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
 		mmc_schedule_delayed_work(&host->detect, HZ);
 }
 
+void mmc_rescan(struct work_struct *work)
+{
+	struct mmc_host *host =
+		container_of(work, struct mmc_host, detect.work);
+
+	__mmc_rescan(host);
+}
+
 void mmc_start_host(struct mmc_host *host)
 {
 	host->f_init = max(min(freqs[0], host->f_max), host->f_min);
@@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
 	}
 
 	mmc_gpiod_request_cd_irq(host);
-	_mmc_detect_change(host, 0, false);
+	host->detect_change = 1;
+	__mmc_rescan(host);
 }
 
 void __mmc_stop_host(struct mmc_host *host)
--
2.40.0
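
For readers less familiar with the pattern, here is a userspace sketch of the split this patch makes: the workqueue callback only recovers the host with container_of() and delegates to a static helper, which mmc_start_host() can also call directly. The structures are hypothetical, heavily trimmed stand-ins for the kernel's, container_of() is re-implemented with offsetof() (without the kernel macro's type checking), and the scan body is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(); the real macro
 * adds type checking on top of this offsetof() arithmetic. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily trimmed stand-ins for the kernel structures. */
struct work_struct { int pending; };
struct delayed_work { struct work_struct work; };

struct mmc_host {
	int detect_change;
	int cards_added;            /* counts simulated scans that added a card */
	struct delayed_work detect;
};

/* Prototypes for the externally visible entry points (in the kernel
 * these are declared in drivers/mmc/core/core.h). */
void mmc_rescan(struct work_struct *work);
void mmc_start_host(struct mmc_host *host);

/* Shared scan body: takes the host directly so both paths can use it.
 * static because it is only called from this file. */
static void __mmc_rescan(struct mmc_host *host)
{
	host->detect_change = 0;
	host->cards_added++;        /* stands in for the real detection work */
}

/* Workqueue callback: recover the host from the embedded work item. */
void mmc_rescan(struct work_struct *work)
{
	struct mmc_host *host;

	assert(work != NULL);
	host = container_of(work, struct mmc_host, detect.work);
	__mmc_rescan(host);
}

/* The first detection runs synchronously in the probing context; later
 * detect events would still be queued and end up in mmc_rescan(). */
void mmc_start_host(struct mmc_host *host)
{
	host->detect_change = 1;
	__mmc_rescan(host);
}
```

Calling mmc_start_host() runs the scan before it returns, while a later detect event that reaches mmc_rescan(&host->detect.work) executes the same body from the workqueue side.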

2023-03-31 12:50:22

by Ulf Hansson

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <[email protected]> wrote:
>
> When using dm-verity with a data partition on an emmc device, dm-verity
> races with the discovery of attached emmc devices. This is because mmc's
> probing code sets up the host data structure then a work item is
> scheduled to do discovery afterwards. To prevent this race on init,
> let's inline the first call to detection, __mmc_rescan(), and let
> subsequent detect calls be handled via the workqueue.

In principle, I don't mind the changes in $subject patch, as long as
they don't hurt the overall initialization/boot time. In particular, we
may have more than one mmc slot in use, so this needs to be well
tested.
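
As a rough way to see why the number of slots matters, the sketch below compares where the first-scan cost lands under a deliberately simple model: host probes run one after another, and in the deferred case the workqueue runs every scan concurrently (the best case). The scan durations are hypothetical inputs, not measurements, and host drivers that probe asynchronously would not serialize like this.

```c
#include <assert.h>
#include <stddef.h>

/* Rough model only: scan_ms[i] is a hypothetical duration of the first
 * card scan on slot i. */

/* Inline first scan, with host probes running one after another: each
 * probe blocks for its own slot's scan, so the costs add up. */
unsigned int probe_path_ms_inline(const unsigned int *scan_ms, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += scan_ms[i];
	return total;
}

/* Deferred scan, best case: probes return at once and the workqueue runs
 * all scans concurrently, so the slowest slot sets the completion time. */
unsigned int scans_done_ms_deferred(const unsigned int *scan_ms, size_t n)
{
	unsigned int longest = 0;

	for (size_t i = 0; i < n; i++)
		if (scan_ms[i] > longest)
			longest = scan_ms[i];
	return longest;
}
```

With one eMMC and one SD slot, the inline version puts both scans on the probe path, which is why a mixed setup is a useful test case.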

More importantly, though, I fail to understand how this is going to
solve the race condition. Any I/O request to an eMMC or SD card requires
the mmc block device driver to be up and running too, and that driver is
probed from a separate module/driver that's not part of mmc_rescan().

Kind regards
Uffe

2023-03-31 18:25:10

by Dennis Zhou

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

Hi Ulf,

On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <[email protected]> wrote:
> >
> > When using dm-verity with a data partition on an emmc device, dm-verity
> > races with the discovery of attached emmc devices. This is because mmc's
> > probing code sets up the host data structure then a work item is
> > scheduled to do discovery afterwards. To prevent this race on init,
> > let's inline the first call to detection, __mmc_rescan(), and let
> > subsequent detect calls be handled via the workqueue.
>
> In principle, I don't mind the changes in $subject patch, as long as
> it doesn't hurt the overall initialization/boot time. Especially, we
> may have more than one mmc-slot being used, so this needs to be well
> tested.
>

Unfortunately, I don't have a device with multiple MMC slots available.
Is this something you could help me with?

> Although, more importantly, I fail to understand how this is going to
> solve the race condition. Any I/O request to an eMMC or SD requires
> the mmc block device driver to be up and running too, which is getting
> probed from a separate module/driver that's not part of mmc_rescan().

I believe the call chain is something like this:

__mmc_rescan()
  mmc_rescan_try_freq()
    mmc_attach_mmc()
      mmc_add_card()
        device_add()
          bus_probe_device()
            mmc_blk_probe()

The initial call comes from the host probe. So effectively, if a card is
attached, we're inlining the device_add() call for that card rather than
waiting for the workqueue item to kick off.

dm is initialized from a late_initcall(), while mmc uses module_init(),
which becomes a device_initcall() when built in. So this patch resolves
the race through initcall ordering. Currently, the device_initcall()
phase finishes and we move on to the late_initcall() phase, where dm
races with the workqueue to initialize the attached eMMC device.
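
The ordering argument can be modeled as a toy program, assuming (as Ulf notes later in the thread) that the mmc block driver is built in, so mmc_blk_probe() runs as part of the scan. Everything below is hypothetical and not kernel code: worker_ran_first stands for whether the queued work item happened to run before the late_initcall() phase, which is exactly the timing nothing guarantees.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the boot sequence, not kernel code. */
struct boot_state {
	bool scan_queued;      /* first scan handed to the workqueue, not run yet */
	bool emmc_blk_ready;   /* card added and mmc_blk_probe() has run */
};

/* Stands in for __mmc_rescan() -> ... -> device_add() -> mmc_blk_probe(). */
static void scan_card(struct boot_state *s)
{
	s->emmc_blk_ready = true;
}

/* device_initcall() phase: the host probe ends in mmc_start_host(). */
static void host_probe(struct boot_state *s, bool inline_first_scan)
{
	if (inline_first_scan)
		scan_card(s);          /* patched: card added before probe returns */
	else
		s->scan_queued = true; /* before: detection deferred to a work item */
}

/* Returns whether dm's late_initcall() finds the eMMC block device. */
bool dm_finds_emmc(bool inline_first_scan, bool worker_ran_first)
{
	struct boot_state s = { false, false };

	host_probe(&s, inline_first_scan);
	if (worker_ran_first && s.scan_queued) {
		s.scan_queued = false;
		scan_card(&s);
	}
	/* late_initcall() phase: dm looks for its data partition now. */
	return s.emmc_blk_ready;
}
```

With the inline first scan the result no longer depends on worker_ran_first; without it, dm only finds the device when the worker happens to win.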

Thanks,
Dennis

2023-04-03 09:55:10

by Ulf Hansson

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <[email protected]> wrote:
>
> Hi Ulf,
>
> On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <[email protected]> wrote:
> > >
> > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > races with the discovery of attached emmc devices. This is because mmc's
> > > probing code sets up the host data structure then a work item is
> > > scheduled to do discovery afterwards. To prevent this race on init,
> > > let's inline the first call to detection, __mmc_rescan(), and let
> > > subsequent detect calls be handled via the workqueue.
> >
> > In principle, I don't mind the changes in $subject patch, as long as
> > it doesn't hurt the overall initialization/boot time. Especially, we
> > may have more than one mmc-slot being used, so this needs to be well
> > tested.
> >
>
> I unfortunately don't have a device with multiple mmcs available. Is
> this something you could help me with?

Yes, I can help test. Allow me a few days to see what I can do.

Note that just having one eMMC and one SD card should work too. It
doesn't have to be multiple eMMCs.

>
> > Although, more importantly, I fail to understand how this is going to
> > solve the race condition. Any I/O request to an eMMC or SD requires
> > the mmc block device driver to be up and running too, which is getting
> > probed from a separate module/driver that's not part of mmc_rescan().
>
> I believe the call chain is something like this:
>
> __mmc_rescan()
>   mmc_rescan_try_freq()
>     mmc_attach_mmc()
>       mmc_add_card()
>         device_add()
>           bus_probe_device()
>             mmc_blk_probe()
>
> The initial calling of this is the host probe. So effectively if there
> is a card attached, we're inlining the device_add() call for the card
> attached rather than waiting for the workqueue item to kick off.
>
> dm is a part of late_initcall() while mmc is a module_init(), when built
> in becoming a device_initcall(). So this solves a race via the initcall
> chain. In the current state, device_initcall() finishes and we move onto
> the late_initcall() phase. But now, dm is racing with the workqueue to
> init the attached emmc device.

You certainly have a point!

This should work when the mmc blk module is built-in. Even if that
doesn't solve the entire problem, it should be a step in the right
direction.

I will give it some more thought and run some tests on my side, and
then get back to you.

Kind regards
Uffe

2023-04-07 08:30:54

by Dennis Zhou

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

On Mon, Apr 03, 2023 at 11:50:41AM +0200, Ulf Hansson wrote:
> On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <[email protected]> wrote:
> >
> > Hi Ulf,
> >
> > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <[email protected]> wrote:
> > > >
> > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > probing code sets up the host data structure then a work item is
> > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > let's inline the first call to detection, __mmc_rescan(), and let
> > > > subsequent detect calls be handled via the workqueue.
> > >
> > > In principle, I don't mind the changes in $subject patch, as long as
> > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > may have more than one mmc-slot being used, so this needs to be well
> > > tested.
> > >
> >
> > I unfortunately don't have a device with multiple mmcs available. Is
> > this something you could help me with?
>
> Yes, I can help to test. Allow me a few days to see what I can do.
>
> Note that, just having one eMMC and one SD card should work too. It
> doesn't have to be multiple eMMCs.
>
> >
> > > Although, more importantly, I fail to understand how this is going to
> > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > the mmc block device driver to be up and running too, which is getting
> > > probed from a separate module/driver that's not part of mmc_rescan().
> >
> > I believe the call chain is something like this:
> >
> > __mmc_rescan()
> >   mmc_rescan_try_freq()
> >     mmc_attach_mmc()
> >       mmc_add_card()
> >         device_add()
> >           bus_probe_device()
> >             mmc_blk_probe()
> >
> > The initial calling of this is the host probe. So effectively if there
> > is a card attached, we're inlining the device_add() call for the card
> > attached rather than waiting for the workqueue item to kick off.
> >
> > dm is a part of late_initcall() while mmc is a module_init(), when built
> > in becoming a device_initcall(). So this solves a race via the initcall
> > chain. In the current state, device_initcall() finishes and we move onto
> > the late_initcall() phase. But now, dm is racing with the workqueue to
> > init the attached emmc device.
>
> You certainly have a point!
>
> This should work when the mmc blk module is built-in. Even if that
> doesn't solve the entire problem, it should be a step in the right
> direction.
>
> I will give it some more thinking and run some tests at my side, then
> I will get back to you again.
>

Hi Ulf, is there an update on testing with this patch?

Thanks,
Dennis

2023-04-11 20:31:57

by Dennis Zhou

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

On Fri, Apr 07, 2023 at 01:24:15AM -0700, Dennis Zhou wrote:
> On Mon, Apr 03, 2023 at 11:50:41AM +0200, Ulf Hansson wrote:
> > On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <[email protected]> wrote:
> > >
> > > Hi Ulf,
> > >
> > > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <[email protected]> wrote:
> > > > >
> > > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > > probing code sets up the host data structure then a work item is
> > > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > > let's inline the first call to detection, __mmc_rescan(), and let
> > > > > subsequent detect calls be handled via the workqueue.
> > > >
> > > > In principle, I don't mind the changes in $subject patch, as long as
> > > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > > may have more than one mmc-slot being used, so this needs to be well
> > > > tested.
> > > >
> > >
> > > I unfortunately don't have a device with multiple mmcs available. Is
> > > this something you could help me with?
> >
> > Yes, I can help to test. Allow me a few days to see what I can do.
> >
> > Note that, just having one eMMC and one SD card should work too. It
> > doesn't have to be multiple eMMCs.
> >
> > >
> > > > Although, more importantly, I fail to understand how this is going to
> > > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > > the mmc block device driver to be up and running too, which is getting
> > > > probed from a separate module/driver that's not part of mmc_rescan().
> > >
> > > I believe the call chain is something like this:
> > >
> > > __mmc_rescan()
> > >   mmc_rescan_try_freq()
> > >     mmc_attach_mmc()
> > >       mmc_add_card()
> > >         device_add()
> > >           bus_probe_device()
> > >             mmc_blk_probe()
> > >
> > > The first call to this comes from the host probe. So effectively, if a
> > > card is attached, we're inlining the device_add() call for the attached
> > > card rather than waiting for the workqueue item to kick off.
> > >
> > > dm is part of late_initcall(), while mmc uses module_init(), which becomes
> > > a device_initcall() when built in. So this solves a race via the initcall
> > > chain. In the current state, device_initcall() finishes and we move on to
> > > the late_initcall() phase. But now, dm is racing with the workqueue to
> > > init the attached eMMC device.
> >
> > You certainly have a point!
> >
> > This should work when the mmc blk module is built-in. Even if that
> > doesn't solve the entire problem, it should be a step in the right
> > direction.
> >
> > I will give it some more thinking and run some tests at my side, then
> > I will get back to you again.
> >
>
> Hi Ulf, is there an update on testing with this patch?
>

Ping.

Thanks,
Dennis

2023-04-12 11:06:33

by Ulf Hansson

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

On Fri, 7 Apr 2023 at 10:24, Dennis Zhou <[email protected]> wrote:
>
> [...]
>
> Hi Ulf, is there an update on testing with this patch?

Sorry, it's a busy period for me and I expect it to remain like that
for another couple of weeks.

I will try to squeeze in some time for this, but no promises. Sorry.

[...]

Kind regards
Uffe

2023-05-12 12:29:20

by Ulf Hansson

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

+ Linus,

Hi Dennis,

On Mon, 3 Apr 2023 at 11:50, Ulf Hansson <[email protected]> wrote:
>
> On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <[email protected]> wrote:
> >
> > Hi Ulf,
> >
> > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <[email protected]> wrote:
> > > >
> > > > When using dm-verity with a data partition on an eMMC device, dm-verity
> > > > races with the discovery of attached eMMC devices. This is because mmc's
> > > > probing code sets up the host data structure and then schedules a work
> > > > item to do discovery afterwards. To prevent this race on init,
> > > > let's inline the first call to detection, __mmc_rescan(), and let
> > > > subsequent detect calls be handled via the workqueue.
> > >
> > > In principle, I don't mind the changes in $subject patch, as long as
> > > it doesn't hurt the overall initialization/boot time. In particular, we
> > > may have more than one mmc-slot being used, so this needs to be well
> > > tested.
> > >
> >
> > I unfortunately don't have a device with multiple mmcs available. Is
> > this something you could help me with?
>
> Yes, I can help to test. Allow me a few days to see what I can do.
>
> Note that just having one eMMC and one SD card should work too. It
> doesn't have to be multiple eMMCs.
>
> >
> > > Although, more importantly, I fail to understand how this is going to
> > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > the mmc block device driver to be up and running too, which is getting
> > > probed from a separate module/driver that's not part of mmc_rescan().
> >
> > I believe the call chain is something like this:
> >
> > __mmc_rescan()
> >   mmc_rescan_try_freq()
> >     mmc_attach_mmc()
> >       mmc_add_card()
> >         device_add()
> >           bus_probe_device()
> >             mmc_blk_probe()
> >
> > The first call to this comes from the host probe. So effectively, if a
> > card is attached, we're inlining the device_add() call for the attached
> > card rather than waiting for the workqueue item to kick off.
> >
> > dm is part of late_initcall(), while mmc uses module_init(), which becomes
> > a device_initcall() when built in. So this solves a race via the initcall
> > chain. In the current state, device_initcall() finishes and we move on to
> > the late_initcall() phase. But now, dm is racing with the workqueue to
> > init the attached eMMC device.
>
> You certainly have a point!
>
> This should work when the mmc blk module is built-in. Even if that
> doesn't solve the entire problem, it should be a step in the right
> direction.
>
> I will give it some more thinking and run some tests at my side, then
> I will get back to you again.
>
> Kind regards
> Uffe
>
> > >
> > > >
> > > > Signed-off-by: Dennis Zhou <[email protected]>
> > > > ---
> > > > Sigh.. fix missing static declaration.
> > > >
> > > > drivers/mmc/core/core.c | 15 +++++++++++----
> > > > 1 file changed, 11 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > > index 368f10405e13..fda7ee57dee3 100644
> > > > --- a/drivers/mmc/core/core.c
> > > > +++ b/drivers/mmc/core/core.c
> > > > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > > > }
> > > > EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> > > >
> > > > -void mmc_rescan(struct work_struct *work)
> > > > +static void __mmc_rescan(struct mmc_host *host)
> > > > {
> > > > - struct mmc_host *host =
> > > > - container_of(work, struct mmc_host, detect.work);
> > > > int i;
> > > >
> > > > if (host->rescan_disable)
> > > > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > > > mmc_schedule_delayed_work(&host->detect, HZ);
> > > > }
> > > >
> > > > +void mmc_rescan(struct work_struct *work)
> > > > +{
> > > > + struct mmc_host *host =
> > > > + container_of(work, struct mmc_host, detect.work);
> > > > +
> > > > + __mmc_rescan(host);
> > > > +}
> > > > +
> > > > void mmc_start_host(struct mmc_host *host)
> > > > {
> > > > host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > > > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > > > }
> > > >
> > > > mmc_gpiod_request_cd_irq(host);
> > > > - _mmc_detect_change(host, 0, false);
> > > > + host->detect_change = 1;
> > > > + __mmc_rescan(host);
> > > > }
> > > >
> > > > void __mmc_stop_host(struct mmc_host *host)
> > > > --
> > > > 2.40.0
> > > >

My apologies for the long delay. I finally managed to test this.

I decided to pick an old arm32-based platform: an ST-Ericsson HREF,
based upon the ux500 SoC. It's quite good to use for these types of
tests as it has two soldered eMMCs, an embedded SDIO (for WiFi) and an
SD-card slot. So in total there are 4 devices that get probed.

The SDIO card isn't detected properly, but it always fails in a similar
way (thus I left it out of the data below). I tested both with and
without an SD card inserted during boot, to get some more data to
compare. Here is a summary of my tests:

v6.4-rc1 without SD card:
~2.18s - MMC1 (eMMC)
~3.33s - MMC3 (eMMC)
~5.91s - kernel boot complete

v6.4-rc1 with an SD card:
~2.18s - MMC1 (eMMC)
~3.45s - MMC3 (eMMC)
~3.57s - MMC2 (SD)
~5.76s - kernel boot complete

v6.4-rc1 + patch without SD card:
~2.24s - MMC1 (eMMC)
~3.58s - MMC3 (eMMC)
~5.96s - kernel boot complete

v6.4-rc1 + patch with an SD card:
~2.24s - MMC1 (eMMC)
~3.73s - MMC2 (SD)
~3.98s - MMC3 (eMMC)
~6.73s - kernel boot complete

Looking at these results, I was kind of surprised. I was thinking
that asynchronous probe should address the parallelism problem.
Then I discovered that it, in fact, hasn't been enabled for the mmci
driver that is being used for this platform. Huh, I was under the
assumption that it had been enabled for all mmc hosts by now. :-)

Okay, so I am going to run another round of tests, with async probe
enabled for the mmci driver too. I will let you know the results as
soon as I can.

Kind regards
Uffe

2023-06-08 21:55:14

by Dennis Zhou

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

On Fri, May 12, 2023 at 01:42:51PM +0200, Ulf Hansson wrote:
> + Linus,
>
> Hi Dennis,
>
> [...]
>
> My apologies for the long delay. I finally managed to test this.
>
> I decided to pick an old arm32-based platform: an ST-Ericsson HREF,
> based upon the ux500 SoC. It's quite good to use for these types of
> tests as it has two soldered eMMCs, an embedded SDIO (for WiFi) and an
> SD-card slot. So in total there are 4 devices that get probed.
>
> The SDIO card isn't detected properly, but it always fails in a similar
> way (thus I left it out of the data below). I tested both with and
> without an SD card inserted during boot, to get some more data to
> compare. Here is a summary of my tests:
>
> v6.4-rc1 without SD card:
> ~2.18s - MMC1 (eMMC)
> ~3.33s - MMC3 (eMMC)
> ~5.91s - kernel boot complete
>
> v6.4-rc1 with an SD card:
> ~2.18s - MMC1 (eMMC)
> ~3.45s - MMC3 (eMMC)
> ~3.57s - MMC2 (SD)
> ~5.76s - kernel boot complete
>
> v6.4-rc1 + patch without SD card:
> ~2.24s - MMC1 (eMMC)
> ~3.58s - MMC3 (eMMC)
> ~5.96s - kernel boot complete
>
> v6.4-rc1 + patch with an SD card:
> ~2.24s - MMC1 (eMMC)
> ~3.73s - MMC2 (SD)
> ~3.98s - MMC3 (eMMC)
> ~6.73s - kernel boot complete
>
> Looking at these results, I was kind of surprised. I was thinking
> that asynchronous probe should address the parallelism problem.
> Then I discovered that it, in fact, hasn't been enabled for the mmci
> driver that is being used for this platform. Huh, I was under the
> assumption that it had been enabled for all mmc hosts by now. :-)
>
> Okay, so I am going to run another round of tests, with async probe
> enabled for the mmci driver too. I will let you know the results as
> soon as I can.
>
> Kind regards
> Uffe

Hi Uffe,

Kindly, this has been pending review for way too long. It's been over 3 months.
What's going on here?

I think there's a misunderstanding too. Without this fix, the machine
doesn't even boot. I'm not sure why perf is the blocking question here.

Greg, is there another tree I can run this through?

Thanks,
Dennis

2023-06-09 06:37:08

by Greg Kroah-Hartman

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

On Thu, Jun 08, 2023 at 01:49:00PM -0700, Dennis Zhou wrote:
> On Fri, May 12, 2023 at 01:42:51PM +0200, Ulf Hansson wrote:
> > + Linus,
> >
> > Hi Dennis,
> >
> > [...]
> >
> > My apologies for the long delay. I finally managed to test this.
> >
> > I decided to pick an old arm32-based platform: an ST-Ericsson HREF,
> > based upon the ux500 SoC. It's quite good to use for these types of
> > tests as it has two soldered eMMCs, an embedded SDIO (for WiFi) and an
> > SD-card slot. So in total there are 4 devices that get probed.
> >
> > The SDIO card isn't detected properly, but it always fails in a similar
> > way (thus I left it out of the data below). I tested both with and
> > without an SD card inserted during boot, to get some more data to
> > compare. Here is a summary of my tests:
> >
> > v6.4-rc1 without SD card:
> > ~2.18s - MMC1 (eMMC)
> > ~3.33s - MMC3 (eMMC)
> > ~5.91s - kernel boot complete
> >
> > v6.4-rc1 with an SD card:
> > ~2.18s - MMC1 (eMMC)
> > ~3.45s - MMC3 (eMMC)
> > ~3.57s - MMC2 (SD)
> > ~5.76s - kernel boot complete
> >
> > v6.4-rc1 + patch without SD card:
> > ~2.24s - MMC1 (eMMC)
> > ~3.58s - MMC3 (eMMC)
> > ~5.96s - kernel boot complete
> >
> > v6.4-rc1 + patch with an SD card:
> > ~2.24s - MMC1 (eMMC)
> > ~3.73s - MMC2 (SD)
> > ~3.98s - MMC3 (eMMC)
> > ~6.73s - kernel boot complete
> >
> > Looking at these results, I was kind of surprised. I was thinking
> > that asynchronous probe should address the parallelism problem.
> > Then I discovered that it, in fact, hasn't been enabled for the mmci
> > driver that is being used for this platform. Huh, I was under the
> > assumption that it had been enabled for all mmc hosts by now. :-)
> >
> > Okay, so I am going to run another round of tests, with async probe
> > enabled for the mmci driver too. I will let you know the results as
> > soon as I can.
> >
> > Kind regards
> > Uffe
>
> Hi Uffe,
>
> Kindly, this has been pending review for way too long. It's been over 3 months.
> What's going on here?
>
> I think there's a misunderstanding too. Without this fix, the machine
> doesn't even boot. I'm not sure why perf is the blocking question here.

Well, you cannot degrade performance of existing machines that work
today, right? That would be a regression, and it seems that you are
doing that, if I read the numbers above correctly.

> Greg, is there another tree I can run this through?

Why would you want to route around a maintainer just to get a patch
applied that would have to be reverted? :)

thanks,

greg k-h

2023-06-09 07:30:34

by Dennis Zhou

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

Hi Greg,

On Fri, Jun 09, 2023 at 08:19:51AM +0200, Greg KH wrote:
> On Thu, Jun 08, 2023 at 01:49:00PM -0700, Dennis Zhou wrote:
> > On Fri, May 12, 2023 at 01:42:51PM +0200, Ulf Hansson wrote:
> > > + Linus,
> > >
> > > Hi Dennis,
> > >
> > > [...]
> > >
> > > My apologies for the long delay. I finally managed to test this.
> > >
> > > I decided to pick an old arm32-based platform: an ST-Ericsson HREF,
> > > based upon the ux500 SoC. It's quite good to use for these types of
> > > tests as it has two soldered eMMCs, an embedded SDIO (for WiFi) and an
> > > SD-card slot. So in total there are 4 devices that get probed.
> > >
> > > The SDIO card isn't detected properly, but it always fails in a similar
> > > way (thus I left it out of the data below). I tested both with and
> > > without an SD card inserted during boot, to get some more data to
> > > compare. Here is a summary of my tests:
> > >
> > > v6.4-rc1 without SD card:
> > > ~2.18s - MMC1 (eMMC)
> > > ~3.33s - MMC3 (eMMC)
> > > ~5.91s - kernel boot complete
> > >
> > > v6.4-rc1 with an SD card:
> > > ~2.18s - MMC1 (eMMC)
> > > ~3.45s - MMC3 (eMMC)
> > > ~3.57s - MMC2 (SD)
> > > ~5.76s - kernel boot complete
> > >
> > > v6.4-rc1 + patch without SD card:
> > > ~2.24s - MMC1 (eMMC)
> > > ~3.58s - MMC3 (eMMC)
> > > ~5.96s - kernel boot complete
> > >
> > > v6.4-rc1 + patch with an SD card:
> > > ~2.24s - MMC1 (eMMC)
> > > ~3.73s - MMC2 (SD)
> > > ~3.98s - MMC3 (eMMC)
> > > ~6.73s - kernel boot complete
> > >
> > > Looking at these results, I was kind of surprised. I was thinking
> > > that asynchronous probe should address the parallelism problem.
> > > Then I discovered that it, in fact, hasn't been enabled for the mmci
> > > driver that is being used for this platform. Huh, I was under the
> > > assumption that it had been enabled for all mmc hosts by now. :-)
> > >
> > > Okay, so I am going to run another round of tests, with async probe
> > > enabled for the mmci driver too. I will let you know the results as
> > > soon as I can.
> > >
> > > Kind regards
> > > Uffe
> >
> > Hi Uffe,
> >
> > Kindly this has been way too long for review. It's been over 3 months.
> > What's going on here?
> >
> > I think there's a misunderstanding too. Without this fix, the machine
> > doesn't even boot. I'm not sure why perf is the blocking question here.
>
> Well you can not degrade performance of existing machines that work
> today, right? That would be a regression and it seems that you are
> doing that if I read the numbers above correctly.
>

I agree that we shouldn't degrade performance of existing machines, but
this is a timing bug on existing platforms that have a slow enough CPU
that the eMMC doesn't finish probing before dm-verity progresses to
trying to read from the device. In my opinion it's a bit unfair to favor
performance in the common case at the cost of not supporting all use
cases. I'm just trying to get my machines to boot without having to
carry my own patch here.
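The race comes down to initcall ordering: the host probe runs during device_initcall, dm-verity runs during late_initcall, and before this patch the first card scan was left to a workqueue. A toy model of that ordering, assuming the mmc block driver is built in (the case Ulf says this fix covers). Everything here is hypothetical: the struct, the functions, and the `wq_runs_first` switch, which stands in for whether the scheduler happens to run the detect work before late_initcall starts.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of boot ordering; none of these names are kernel APIs. */
struct boot_model {
    bool inline_first_scan; /* true: behaviour with the patch applied */
    bool work_pending;      /* detect work queued but not yet executed */
    bool blk_dev_ready;     /* the block device for the card is registered */
};

/* Stands in for the __mmc_rescan() -> ... -> mmc_blk_probe() chain. */
static void model_rescan(struct boot_model *m)
{
    m->blk_dev_ready = true;
}

/* device_initcall phase: the host probe ends in mmc_start_host(). */
static void model_start_host(struct boot_model *m)
{
    if (m->inline_first_scan)
        model_rescan(m);        /* card added before the probe returns */
    else
        m->work_pending = true; /* left to the workqueue */
}

/* The workqueue runs whenever the scheduler lets it. */
static void model_run_workqueue(struct boot_model *m)
{
    if (m->work_pending) {
        m->work_pending = false;
        model_rescan(m);
    }
}

/* Returns whether dm-verity (late_initcall) finds its data device. */
static bool model_boot(bool inline_first_scan, bool wq_runs_first)
{
    struct boot_model m = { .inline_first_scan = inline_first_scan };

    model_start_host(&m);
    if (wq_runs_first)
        model_run_workqueue(&m);

    /* late_initcall: dm-verity tries to open the device now. */
    bool found = m.blk_dev_ready;

    model_run_workqueue(&m);  /* the deferred scan eventually completes */
    assert(m.blk_dev_ready);  /* the card always appears, maybe too late */
    return found;
}
```

Only the combination "scan deferred" plus "workqueue not yet run" loses the race; running the first scan inline makes the outcome independent of scheduling.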

As a path forward I can add a boolean command-line flag to handle this,
and that should hopefully take care of the regression aspect of this.


> > Greg, is there another tree I can run this through?
>
> Why would you want to route around a maintainer just to get a patch that
> would have to be reverted applied? :)
>

What's your advice here? I don't feel like I'm getting adequate
traction with Ulf. I think I've generally been quite patient here,
waiting more than 3 months for this patch to be reviewed.

Thanks,
Dennis

2023-06-09 09:04:21

by Greg Kroah-Hartman

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

On Fri, Jun 09, 2023 at 12:16:19AM -0700, Dennis Zhou wrote:
> Hi Greg,
>
> On Fri, Jun 09, 2023 at 08:19:51AM +0200, Greg KH wrote:
> > On Thu, Jun 08, 2023 at 01:49:00PM -0700, Dennis Zhou wrote:
> > > On Fri, May 12, 2023 at 01:42:51PM +0200, Ulf Hansson wrote:
> > > > + Linus,
> > > >
> > > > Hi Dennis,
> > > >
> > > > On Mon, 3 Apr 2023 at 11:50, Ulf Hansson <[email protected]> wrote:
> > > > >
> > > > > On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <[email protected]> wrote:
> > > > > >
> > > > > > Hi Ulf,
> > > > > >
> > > > > > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > > > > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <[email protected]> wrote:
> > > > > > > >
> > > > > > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > > > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > > > > > probing code sets up the host data structure then a work item is
> > > > > > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > > > > > let's inline the first call to detection, __mm_scan(), and let
> > > > > > > > subsequent detect calls be handled via the workqueue.
> > > > > > >
> > > > > > > In principle, I don't mind the changes in $subject patch, as long as
> > > > > > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > > > > > may have more than one mmc-slot being used, so this needs to be well
> > > > > > > tested.
> > > > > > >
> > > > > >
> > > > > > I unfortunately don't have a device with multiple mmcs available. Is
> > > > > > this something you could help me with?
> > > > >
> > > > > Yes, I can help to test. Allow me a few days to see what I can do.
> > > > >
> > > > > Note that, just having one eMMC and one SD card should work too. It
> > > > > doesn't have to be multiple eMMCs.
> > > > >
> > > > > >
> > > > > > > Although, more importantly, I fail to understand how this is going to
> > > > > > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > > > > > the mmc block device driver to be up and running too, which is getting
> > > > > > > probed from a separate module/driver that's not part of mmc_rescan().
> > > > > >
> > > > > > I believe the call chain is something like this:
> > > > > >
> > > > > > __mmc_rescan()
> > > > > > mmc_rescan_try_freq()
> > > > > > mmc_attach_mmc()
> > > > > > mmc_add_card()
> > > > > > device_add()
> > > > > > bus_probe_device()
> > > > > > mmc_blk_probe()
> > > > > >
> > > > > > The initial calling of this is the host probe. So effectively if there
> > > > > > is a card attached, we're inlining the device_add() call for the card
> > > > > > attached rather than waiting for the workqueue item to kick off.
> > > > > >
> > > > > > dm is a part of late_initcall() while mmc is a module_init(), when built
> > > > > > in becoming a device_initcall(). So this solves a race via the initcall
> > > > > > chain. In the current state, device_initcall() finishes and we move onto
> > > > > > the late_initcall() phase. But now, dm is racing with the workqueue to
> > > > > > init the attached emmc device.
> > > > >
> > > > > You certainly have a point!
> > > > >
> > > > > This should work when the mmc blk module is built-in. Even if that
> > > > > doesn't solve the entire problem, it should be a step in the right
> > > > > direction.
> > > > >
> > > > > I will give it some more thinking and run some tests at my side, then
> > > > > I will get back to you again.
> > > > >
> > > > > Kind regards
> > > > > Uffe
> > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Dennis Zhou <[email protected]>
> > > > > > > > ---
> > > > > > > > Sigh.. fix missing static declaration.
> > > > > > > >
> > > > > > > > drivers/mmc/core/core.c | 15 +++++++++++----
> > > > > > > > 1 file changed, 11 insertions(+), 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > > > > > > index 368f10405e13..fda7ee57dee3 100644
> > > > > > > > --- a/drivers/mmc/core/core.c
> > > > > > > > +++ b/drivers/mmc/core/core.c
> > > > > > > > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > > > > > > > }
> > > > > > > > EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> > > > > > > >
> > > > > > > > -void mmc_rescan(struct work_struct *work)
> > > > > > > > +static void __mmc_rescan(struct mmc_host *host)
> > > > > > > > {
> > > > > > > > - struct mmc_host *host =
> > > > > > > > - container_of(work, struct mmc_host, detect.work);
> > > > > > > > int i;
> > > > > > > >
> > > > > > > > if (host->rescan_disable)
> > > > > > > > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > > > > > > > mmc_schedule_delayed_work(&host->detect, HZ);
> > > > > > > > }
> > > > > > > >
> > > > > > > > +void mmc_rescan(struct work_struct *work)
> > > > > > > > +{
> > > > > > > > + struct mmc_host *host =
> > > > > > > > + container_of(work, struct mmc_host, detect.work);
> > > > > > > > +
> > > > > > > > + __mmc_rescan(host);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > void mmc_start_host(struct mmc_host *host)
> > > > > > > > {
> > > > > > > > host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > > > > > > > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > > > > > > > }
> > > > > > > >
> > > > > > > > mmc_gpiod_request_cd_irq(host);
> > > > > > > > - _mmc_detect_change(host, 0, false);
> > > > > > > > + host->detect_change = 1;
> > > > > > > > + __mmc_rescan(host);
> > > > > > > > }
> > > > > > > >
> > > > > > > > void __mmc_stop_host(struct mmc_host *host)
> > > > > > > > --
> > > > > > > > 2.40.0
> > > > > > > >
> > > >
> > > > My apologies for the long delay. I finally managed to test this.
> > > >
> > > > I decided to pick an old arm32 based platform. An ST-Ericsson HREF,
> > > > based upon the ux500 SoC. It's quite good to use for these types of
> > > > tests as it has two eMMCs soldered, an embedded SDIO (for WiFi) and an
> > > > SD-card slot. So in total there are 4 devices that get probed.
> > > >
> > > > The SDIO card isn't detected properly, but always fails in the similar
> > > > way (thus I left it out from the below data). I tested both with and
> > > > without an SD card inserted during boot, to get some more data to
> > > > compare. These are the summary from my tests:
> > > >
> > > > v6.4-rc1 without SD card:
> > > > ~2.18s - MMC1 (eMMC)
> > > > ~3.33s - MMC3 (eMMC)
> > > > ~5.91s - kernel boot complete
> > > >
> > > > v6.4-rc1 with an SD card:
> > > > ~2.18s - MMC1 (eMMC)
> > > > ~3.45s - MMC3 (eMMC)
> > > > ~3.57s - MMC2 (SD)
> > > > ~5.76s - kernel boot complete
> > > >
> > > > v6.4-rc1 + patch without SD card:
> > > > ~2.24s - MMC1 (eMMC)
> > > > ~3.58s - MMC3 (eMMC)
> > > > ~5.96s - kernel boot complete
> > > >
> > > > v6.4-rc1 + patch with an SD card:
> > > > ~2.24s - MMC1 (eMMC)
> > > > ~3.73s - MMC2 (SD)
> > > > ~3.98s - MMC3 (eMMC)
> > > > ~6.73s - kernel boot complete
> > > >
> > > > By looking at these results, I was kind of surprised. I was thinking
> > > > that the asynchronous probe should address the parallelism problem.
> > > > Then I discovered that it in fact, hasn't been enabled for the mmci
> > > > driver that is being used for this platform. Huh, I was under the
> > > > assumption that it has been enabled for all mmc hosts by now. :-)
> > > >
> > > > Okay, so I am going to run another round of tests, with async probe
> > > > enabled for the mmci driver too. I will let you know the results as
> > > > soon as I can.
> > > >
> > > > Kind regards
> > > > Uffe
> > >
> > > Hi Uffe,
> > >
> > > Kindly this has been way too long for review. It's been over 3 months.
> > > What's going on here?
> > >
> > > I think there's a misunderstanding too. Without this fix, the machine
> > > doesn't even boot. I'm not sure why perf is the blocking question here.
> >
> > Well you can not degrade performance of existing machines that work
> > today, right? That would be a regression and it seems that you are
> > doing that if I read the numbers above correctly.
> >
>
> I agree that we shouldn't degrade performance of existing machines, but
> this is a timing bug on existing platforms that have a slow enough cpu
> such that emmc doesn't finish probing before dm-verity progresses to
> trying to read off the device. In my opinion it's a bit unfair to trade
> performance in the common case for not supporting all use cases. I'm
> just trying to get my machines to boot without having to carry my own
> patch here.

I think the users of the systems you are going to slow down will take
objection to you slowing them down. What if you were them, what would
you want to see here?

> As a path forward I can add a command line flag as a bool to handle this
> and that should hopefully take care of the regresion aspect to this.

command line flags are horrible and should never be used. Why can't you
dynamically detect this type of thing and handle it that way?

And yes, we do hold off in supporting new hardware (and configurations)
to prevent existing working ones from breaking or slowing down.

What is forcing you to use dm-verity on this odd hardware? Can you not
use other configurations instead?

> > > Greg, is there another tree I can run this through?
> >
> > Why would you want to route around a maintainer just to get a patch that
> > would have to be reverted applied? :)
> >
>
> What's your advice here as I don't feel like I'm getting adequate
> traction with Ulf. I think I've generally been quite patient here
> waiting > 3 months for this patch to be reviewed.

Maintainers are overworked, that's normal. I suggest helping out in
reviewing other patches in the subsystem to reduce that burden. After
all, you are asking someone to do something for you without much in
return, is that fair?

thanks,

greg k-h

2023-06-09 09:16:23

by Linus Walleij

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

On Fri, Jun 9, 2023 at 10:53 AM Greg KH <[email protected]> wrote:
> On Fri, Jun 09, 2023 at 12:16:19AM -0700, Dennis Zhou wrote:

> > As a path forward I can add a command line flag as a bool to handle this
> > and that should hopefully take care of the regresion aspect to this.
>
> command line flags are horrible and should never be used. Why can't you
> dynamically detect this type of thing and handle it that way?

If nothing else works, if this is device tree, a machine-specific quirk can be
done:

if (of_machine_is_compatible("vendor,machine")) {...}

Yours,
Linus Walleij
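Linus's machine-quirk suggestion can be sketched in userspace. In the kernel, `of_machine_is_compatible()` from `<linux/of.h>` checks the compatible strings of the device tree root node; here it is replaced by a mock that searches a hard-coded list. The board strings and the `needs_sync_first_scan()` helper are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compatible list of the running board; in the kernel this
 * comes from the root node of the device tree. */
static const char *const board_compatible[] = {
    "vendor,machine", "vendor,soc", NULL,
};

/* Mock of of_machine_is_compatible(): true if any root compatible matches. */
static bool mock_machine_is_compatible(const char *compat)
{
    for (size_t i = 0; board_compatible[i] != NULL; i++)
        if (strcmp(board_compatible[i], compat) == 0)
            return true;
    return false;
}

/* Hypothetical quirk: only listed boards opt into the synchronous first scan. */
static bool needs_sync_first_scan(void)
{
    return mock_machine_is_compatible("vendor,machine");
}
```

The design trade-off is the one Greg raises: a quirk table has to be maintained per board, whereas detecting the condition dynamically would not.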

2023-06-12 15:28:33

by Ulf Hansson

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

On Fri, 9 Jun 2023 at 09:16, Dennis Zhou <[email protected]> wrote:
>
> Hi Greg,
>
> On Fri, Jun 09, 2023 at 08:19:51AM +0200, Greg KH wrote:
> > On Thu, Jun 08, 2023 at 01:49:00PM -0700, Dennis Zhou wrote:
> > > On Fri, May 12, 2023 at 01:42:51PM +0200, Ulf Hansson wrote:
> > > > + Linus,
> > > >
> > > > Hi Dennis,
> > > >
> > > > On Mon, 3 Apr 2023 at 11:50, Ulf Hansson <[email protected]> wrote:
> > > > >
> > > > > On Fri, 31 Mar 2023 at 20:23, Dennis Zhou <[email protected]> wrote:
> > > > > >
> > > > > > Hi Ulf,
> > > > > >
> > > > > > On Fri, Mar 31, 2023 at 02:43:10PM +0200, Ulf Hansson wrote:
> > > > > > > On Thu, 30 Mar 2023 at 01:48, Dennis Zhou <[email protected]> wrote:
> > > > > > > >
> > > > > > > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > > > > > > races with the discovery of attached emmc devices. This is because mmc's
> > > > > > > > probing code sets up the host data structure then a work item is
> > > > > > > > scheduled to do discovery afterwards. To prevent this race on init,
> > > > > > > > let's inline the first call to detection, __mm_scan(), and let
> > > > > > > > subsequent detect calls be handled via the workqueue.
> > > > > > >
> > > > > > > In principle, I don't mind the changes in $subject patch, as long as
> > > > > > > it doesn't hurt the overall initialization/boot time. Especially, we
> > > > > > > may have more than one mmc-slot being used, so this needs to be well
> > > > > > > tested.
> > > > > > >
> > > > > >
> > > > > > I unfortunately don't have a device with multiple mmcs available. Is
> > > > > > this something you could help me with?
> > > > >
> > > > > Yes, I can help to test. Allow me a few days to see what I can do.
> > > > >
> > > > > Note that, just having one eMMC and one SD card should work too. It
> > > > > doesn't have to be multiple eMMCs.
> > > > >
> > > > > >
> > > > > > > Although, more importantly, I fail to understand how this is going to
> > > > > > > solve the race condition. Any I/O request to an eMMC or SD requires
> > > > > > > the mmc block device driver to be up and running too, which is getting
> > > > > > > probed from a separate module/driver that's not part of mmc_rescan().
> > > > > >
> > > > > > I believe the call chain is something like this:
> > > > > >
> > > > > > __mmc_rescan()
> > > > > > mmc_rescan_try_freq()
> > > > > > mmc_attach_mmc()
> > > > > > mmc_add_card()
> > > > > > device_add()
> > > > > > bus_probe_device()
> > > > > > mmc_blk_probe()
> > > > > >
> > > > > > The initial calling of this is the host probe. So effectively if there
> > > > > > is a card attached, we're inlining the device_add() call for the card
> > > > > > attached rather than waiting for the workqueue item to kick off.
> > > > > >
> > > > > > dm is a part of late_initcall() while mmc is a module_init(), when built
> > > > > > in becoming a device_initcall(). So this solves a race via the initcall
> > > > > > chain. In the current state, device_initcall() finishes and we move onto
> > > > > > the late_initcall() phase. But now, dm is racing with the workqueue to
> > > > > > init the attached emmc device.
> > > > >
> > > > > You certainly have a point!
> > > > >
> > > > > This should work when the mmc blk module is built-in. Even if that
> > > > > doesn't solve the entire problem, it should be a step in the right
> > > > > direction.
> > > > >
> > > > > I will give it some more thinking and run some tests at my side, then
> > > > > I will get back to you again.
> > > > >
> > > > > Kind regards
> > > > > Uffe
> > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Dennis Zhou <[email protected]>
> > > > > > > > ---
> > > > > > > > Sigh.. fix missing static declaration.
> > > > > > > >
> > > > > > > > drivers/mmc/core/core.c | 15 +++++++++++----
> > > > > > > > 1 file changed, 11 insertions(+), 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > > > > > > > index 368f10405e13..fda7ee57dee3 100644
> > > > > > > > --- a/drivers/mmc/core/core.c
> > > > > > > > +++ b/drivers/mmc/core/core.c
> > > > > > > > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > > > > > > > }
> > > > > > > > EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> > > > > > > >
> > > > > > > > -void mmc_rescan(struct work_struct *work)
> > > > > > > > +static void __mmc_rescan(struct mmc_host *host)
> > > > > > > > {
> > > > > > > > - struct mmc_host *host =
> > > > > > > > - container_of(work, struct mmc_host, detect.work);
> > > > > > > > int i;
> > > > > > > >
> > > > > > > > if (host->rescan_disable)
> > > > > > > > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > > > > > > > mmc_schedule_delayed_work(&host->detect, HZ);
> > > > > > > > }
> > > > > > > >
> > > > > > > > +void mmc_rescan(struct work_struct *work)
> > > > > > > > +{
> > > > > > > > + struct mmc_host *host =
> > > > > > > > + container_of(work, struct mmc_host, detect.work);
> > > > > > > > +
> > > > > > > > + __mmc_rescan(host);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > void mmc_start_host(struct mmc_host *host)
> > > > > > > > {
> > > > > > > > host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > > > > > > > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > > > > > > > }
> > > > > > > >
> > > > > > > > mmc_gpiod_request_cd_irq(host);
> > > > > > > > - _mmc_detect_change(host, 0, false);
> > > > > > > > + host->detect_change = 1;
> > > > > > > > + __mmc_rescan(host);
> > > > > > > > }
> > > > > > > >
> > > > > > > > void __mmc_stop_host(struct mmc_host *host)
> > > > > > > > --
> > > > > > > > 2.40.0
> > > > > > > >
> > > >
> > > > My apologies for the long delay. I finally managed to test this.
> > > >
> > > > I decided to pick an old arm32 based platform. An ST-Ericsson HREF,
> > > > based upon the ux500 SoC. It's quite good to use for these types of
> > > > tests as it has two eMMCs soldered, an embedded SDIO (for WiFi) and an
> > > > SD-card slot. So in total there are 4 devices that get probed.
> > > >
> > > > The SDIO card isn't detected properly, but always fails in the similar
> > > > way (thus I left it out from the below data). I tested both with and
> > > > without an SD card inserted during boot, to get some more data to
> > > > compare. These are the summary from my tests:
> > > >
> > > > v6.4-rc1 without SD card:
> > > > ~2.18s - MMC1 (eMMC)
> > > > ~3.33s - MMC3 (eMMC)
> > > > ~5.91s - kernel boot complete
> > > >
> > > > v6.4-rc1 with an SD card:
> > > > ~2.18s - MMC1 (eMMC)
> > > > ~3.45s - MMC3 (eMMC)
> > > > ~3.57s - MMC2 (SD)
> > > > ~5.76s - kernel boot complete
> > > >
> > > > v6.4-rc1 + patch without SD card:
> > > > ~2.24s - MMC1 (eMMC)
> > > > ~3.58s - MMC3 (eMMC)
> > > > ~5.96s - kernel boot complete
> > > >
> > > > v6.4-rc1 + patch with an SD card:
> > > > ~2.24s - MMC1 (eMMC)
> > > > ~3.73s - MMC2 (SD)
> > > > ~3.98s - MMC3 (eMMC)
> > > > ~6.73s - kernel boot complete
> > > >
> > > > By looking at these results, I was kind of surprised. I was thinking
> > > > that the asynchronous probe should address the parallelism problem.
> > > > Then I discovered that it in fact, hasn't been enabled for the mmci
> > > > driver that is being used for this platform. Huh, I was under the
> > > > assumption that it has been enabled for all mmc hosts by now. :-)
> > > >
> > > > Okay, so I am going to run another round of tests, with async probe
> > > > enabled for the mmci driver too. I will let you know the results as
> > > > soon as I can.
> > > >
> > > > Kind regards
> > > > Uffe
> > >
> > > Hi Uffe,
> > >
> > > Kindly this has been way too long for review. It's been over 3 months.
> > > What's going on here?

I have been busier than usual. My apologies.

The main problem, though, was that I found an issue with the patch, as
the numbers above show.
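The size of the regression can be read off the HREF/ux500 figures quoted above. The snippet only encodes those numbers and computes the deltas; the struct and function names are illustrative.

```c
#include <assert.h>

/* Seconds since boot, copied from Ulf's v6.4-rc1 measurements. */
struct boot_run {
    double mmc1;          /* MMC1 (eMMC) detected */
    double mmc3;          /* MMC3 (eMMC) detected */
    double boot_complete; /* kernel boot complete */
};

static const struct boot_run base_no_sd  = { 2.18, 3.33, 5.91 };
static const struct boot_run patch_no_sd = { 2.24, 3.58, 5.96 };
static const struct boot_run base_sd     = { 2.18, 3.45, 5.76 };
static const struct boot_run patch_sd    = { 2.24, 3.98, 6.73 };

/* Extra time until boot completes once the patch is applied. */
static double boot_delta(const struct boot_run *before,
                         const struct boot_run *after)
{
    return after->boot_complete - before->boot_complete;
}

/* Extra time until the second eMMC (MMC3) shows up. */
static double mmc3_delta(const struct boot_run *before,
                         const struct boot_run *after)
{
    return after->mmc3 - before->mmc3;
}
```

With an SD card inserted, boot completes about 0.97 s later (roughly 17% of 5.76 s) and MMC3 appears about 0.53 s later; without a card, boot completes only about 0.05 s later.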

> > >
> > > I think there's a misunderstanding too. Without this fix, the machine
> > > doesn't even boot. I'm not sure why perf is the blocking question here.
> >
> > Well you can not degrade performance of existing machines that work
> > today, right? That would be a regression and it seems that you are
> > doing that if I read the numbers above correctly.
> >
>
> I agree that we shouldn't degrade performance of existing machines, but
> this is a timing bug on existing platforms that have a slow enough cpu
> such that emmc doesn't finish probing before dm-verity progresses to
> trying to read off the device. In my opinion it's a bit unfair to trade
> performance in the common case for not supporting all use cases. I'm
> just trying to get my machines to boot without having to carry my own
> patch here.
>
> As a path forward I can add a command line flag as a bool to handle this
> and that should hopefully take care of the regresion aspect to this.

Let's not go there. To be able to move forward with your suggested
approach in $subject patch *and* without causing performance
degradations, we need to enable async probe for the mmci host driver
too.

As I said, I was under the impression that we have already done so for
all mmc host drivers by now, but apparently not. So, I have just sent
a patch for that [1].
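Why async probe matters once the first scan runs inline: mmc_start_host() is reached from the host driver's probe path (via mmc_add_host()), so each host's probe now includes its card initialization. A rough model, assuming probes of different hosts can overlap completely and using made-up durations (not Ulf's measurements): synchronous probing costs about the sum of the per-host init times, asynchronous probing about the longest one. In the kernel a driver typically opts in by setting `.probe_type = PROBE_PREFER_ASYNCHRONOUS` in its `struct device_driver`; the actual mmci change is the patch linked as [1] and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hosts probed one after another: each probe blocks until its card is up. */
static unsigned serial_probe_ms(const unsigned *init_ms, size_t n)
{
    unsigned total = 0;

    for (size_t i = 0; i < n; i++)
        total += init_ms[i];
    return total;
}

/* Hosts probed asynchronously: probes overlap, so the slowest dominates.
 * This ignores any contention between hosts, an optimistic assumption. */
static unsigned async_probe_ms(const unsigned *init_ms, size_t n)
{
    unsigned longest = 0;

    for (size_t i = 0; i < n; i++)
        if (init_ms[i] > longest)
            longest = init_ms[i];
    return longest;
}
```

For hypothetical init times of 300, 500 and 200 ms, the serial model gives 1000 ms and the asynchronous one 500 ms.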

>
>
> > > Greg, is there another tree I can run this through?
> >
> > Why would you want to route around a maintainer just to get a patch that
> > would have to be reverted applied? :)
> >
>
> What's your advice here as I don't feel like I'm getting adequate
> traction with Ulf. I think I've generally been quite patient here
> waiting > 3 months for this patch to be reviewed.

Sure, you have been patient, but in this case you needed my hands-on
help too, to run specific tests. That takes time too.

Anyway, unless something unexpected happens, I plan to apply the
$subject patch tomorrow after running another round of tests.

Kind regards
Uffe

[1]
https://lore.kernel.org/linux-mmc/[email protected]/

2023-06-13 14:36:47

by Ulf Hansson

Subject: Re: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()

On Wed, 29 Mar 2023 at 22:21, Dennis Zhou <[email protected]> wrote:
>
> When using dm-verity with a data partition on an emmc device, dm-verity
> races with the discovery of attached emmc devices. This is because mmc's
> probing code sets up the host data structure then a work item is
> scheduled to do discovery afterwards. To prevent this race on init,
> let's inline the first call to detection, __mm_scan(), and let
> subsequent detect calls be handled via the workqueue.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Along with the patch for the mmci driver, this one applied too, for
next, thanks!

Note also that I took the liberty to clarify the commit message a bit.

Moreover, if we want this to be applied for stable kernels, we need to
manage that separately, as then the mmci patch is needed too. Please
ping if you need some pointers in regards to this.

Kind regards
Uffe

> ---
> drivers/mmc/core/core.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 368f10405e13..c0fdc438c882 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> }
> EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
>
> -void mmc_rescan(struct work_struct *work)
> +void __mmc_rescan(struct mmc_host *host)
> {
> - struct mmc_host *host =
> - container_of(work, struct mmc_host, detect.work);
> int i;
>
> if (host->rescan_disable)
> @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> mmc_schedule_delayed_work(&host->detect, HZ);
> }
>
> +void mmc_rescan(struct work_struct *work)
> +{
> + struct mmc_host *host =
> + container_of(work, struct mmc_host, detect.work);
> +
> + __mmc_rescan(host);
> +}
> +
> void mmc_start_host(struct mmc_host *host)
> {
> host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> }
>
> mmc_gpiod_request_cd_irq(host);
> - _mmc_detect_change(host, 0, false);
> + host->detect_change = 1;
> + __mmc_rescan(host);
> }
>
> void __mmc_stop_host(struct mmc_host *host)
> --
> 2.40.0
>
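The patch quoted above splits the work callback into a thin container_of() wrapper, mmc_rescan(), and a host-level helper, __mmc_rescan(), so that mmc_start_host() can run the first scan directly instead of queueing it. A userspace sketch of that pattern: container_of() is a simplified re-implementation, and all struct and function names are hypothetical stand-ins for the real mmc types.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): recover the outer struct from a member pointer. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the kernel's work types. */
struct work_struct { int unused; };
struct delayed_work { struct work_struct work; };

struct toy_host {
    int detect_change;
    int rescan_count;
    struct delayed_work detect;
};

/* Host-level helper, callable directly (the role of __mmc_rescan()). */
static void toy_rescan_host(struct toy_host *host)
{
    host->rescan_count++;
    host->detect_change = 0; /* toy model: the scan consumes the flag */
}

/* Workqueue callback: only recovers the host (the role of mmc_rescan()). */
static void toy_rescan_work(struct work_struct *work)
{
    struct toy_host *host = container_of(work, struct toy_host, detect.work);

    toy_rescan_host(host);
}

/* Start-up: flag a change and scan inline instead of queueing work. */
static void toy_start_host(struct toy_host *host)
{
    host->detect_change = 1;
    toy_rescan_host(host);
}
```

Keeping the wrapper this thin means the workqueue path and the start-up call share a single scan implementation.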

2023-06-15 16:09:12

by Dennis Zhou

Subject: Re: [PATCH] mmc: inline the first mmc_scan() on mmc_start_host()

On Tue, Jun 13, 2023 at 04:25:11PM +0200, Ulf Hansson wrote:
> On Wed, 29 Mar 2023 at 22:21, Dennis Zhou <[email protected]> wrote:
> >
> > When using dm-verity with a data partition on an emmc device, dm-verity
> > races with the discovery of attached emmc devices. This is because mmc's
> > probing code sets up the host data structure then a work item is
> > scheduled to do discovery afterwards. To prevent this race on init,
> > let's inline the first call to detection, __mm_scan(), and let
> > subsequent detect calls be handled via the workqueue.
> >
> > Signed-off-by: Dennis Zhou <[email protected]>
>
> Along with the patch for the mmci driver, this one applied too, for
> next, thanks!
>

Thank you Ulf! I'm good with this just being applied to for-next.

Thanks,
Dennis

> Note also that I took the liberty to clarify the commit message a bit.
>
> Moreover, if we want this to be applied for stable kernels, we need to
> manage that separately, as then the mmci patch is needed too. Please
> ping if you need some pointers in regards to this.
>
> Kind regards
> Uffe
>
> > ---
> > drivers/mmc/core/core.c | 15 +++++++++++----
> > 1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > index 368f10405e13..c0fdc438c882 100644
> > --- a/drivers/mmc/core/core.c
> > +++ b/drivers/mmc/core/core.c
> > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > }
> > EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> >
> > -void mmc_rescan(struct work_struct *work)
> > +void __mmc_rescan(struct mmc_host *host)
> > {
> > - struct mmc_host *host =
> > - container_of(work, struct mmc_host, detect.work);
> > int i;
> >
> > if (host->rescan_disable)
> > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > mmc_schedule_delayed_work(&host->detect, HZ);
> > }
> >
> > +void mmc_rescan(struct work_struct *work)
> > +{
> > + struct mmc_host *host =
> > + container_of(work, struct mmc_host, detect.work);
> > +
> > + __mmc_rescan(host);
> > +}
> > +
> > void mmc_start_host(struct mmc_host *host)
> > {
> > host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > }
> >
> > mmc_gpiod_request_cd_irq(host);
> > - _mmc_detect_change(host, 0, false);
> > + host->detect_change = 1;
> > + __mmc_rescan(host);
> > }
> >
> > void __mmc_stop_host(struct mmc_host *host)
> > --
> > 2.40.0
> >

2023-06-27 18:04:02

by Geert Uytterhoeven

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

Hi Dennis,

On Thu, Mar 30, 2023 at 1:48 AM Dennis Zhou <[email protected]> wrote:
> When using dm-verity with a data partition on an emmc device, dm-verity
> races with the discovery of attached emmc devices. This is because mmc's
> probing code sets up the host data structure then a work item is
> scheduled to do discovery afterwards. To prevent this race on init,
> let's inline the first call to detection, __mm_scan(), and let
> subsequent detect calls be handled via the workqueue.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Thanks for your patch, which is now commit 2cc83bf7d41113d9 ("mmc:
core: Allow mmc_start_host() synchronously detect a card") in
linux-next/master mmc/next next-20230614 next-20230615 next-20230616

I have bisected the following failure on Renesas Salvator-XS with R-Car H3
ES2.0 to the above commit:

renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for hardware interrupt (CMD0)
renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for hardware interrupt (CMD1)
renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for hardware interrupt (CMD0)
renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for hardware interrupt (CMD1)
mmc0: Failed to initialize a non-removable card

Reverting the commit fixes the issue for me.

> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> }
> EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
>
> -void mmc_rescan(struct work_struct *work)
> +static void __mmc_rescan(struct mmc_host *host)
> {
> - struct mmc_host *host =
> - container_of(work, struct mmc_host, detect.work);
> int i;
>
> if (host->rescan_disable)
> @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> mmc_schedule_delayed_work(&host->detect, HZ);
> }
>
> +void mmc_rescan(struct work_struct *work)
> +{
> + struct mmc_host *host =
> + container_of(work, struct mmc_host, detect.work);
> +
> + __mmc_rescan(host);
> +}
> +
> void mmc_start_host(struct mmc_host *host)
> {
> host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> }
>
> mmc_gpiod_request_cd_irq(host);
> - _mmc_detect_change(host, 0, false);
> + host->detect_change = 1;
> + __mmc_rescan(host);
> }
>
> void __mmc_stop_host(struct mmc_host *host)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-06-27 18:22:30

by Biju Das

Subject: RE: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

Hi Dennis,

Reverting commit 2cc83bf7d41113d9 ("mmc:core: Allow mmc_start_host() synchronously detect a card") fixes the issue below [1] on the RZ/G2L SMARC EVK too.

[1] Before reverting:
------------------

[ 2.469266] renesas_sdhi_internal_dmac 11c10000.mmc: mmc1 base at 0x0000000011c10000, max clock rate 133 MHz
[ 2.993112] mmc1: new ultra high speed SDR104 SDHC card at address aaaa
[ 3.014052] mmcblk1: mmc1:aaaa SC32G 29.7 GiB
[ 3.033826] mmcblk1: p1
[ 7.651447] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD52)
[ 12.768118] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD52)
[ 17.888240] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD0)
[ 23.008122] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD8)
[ 28.128098] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD5)
[ 33.248115] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD5)
[ 38.368119] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD5)
[ 43.488112] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD5)
[ 48.608110] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD55)
[ 53.728121] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD55)
[ 58.848115] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD55)
[ 63.968118] renesas_sdhi_internal_dmac 11c00000.mmc: timeout waiting for hardware interrupt (CMD55)

After reverting:
--------------
[ 2.463112] renesas_sdhi_internal_dmac 11c10000.mmc: mmc1 base at 0x0000000011c10000, max clock rate 133 MHz
[ 2.472569] rz-ssi-pcm-audio 10049c00.ssi: DMA enabled
[ 2.474124] renesas_sdhi_internal_dmac 11c00000.mmc: mmc0 base at 0x0000000011c00000, max clock rate 133 MHz
[ 2.600050] mmc0: new HS200 MMC card at address 0001
[ 2.608931] mmcblk0: mmc0:0001 G1M15M 59.3 GiB
[ 2.623432] mmcblk0: p1
[ 2.629863] mmcblk0boot0: mmc0:0001 G1M15M 31.5 MiB
[ 2.640008] mmcblk0boot1: mmc0:0001 G1M15M 31.5 MiB
[ 2.650212] mmcblk0rpmb: mmc0:0001 G1M15M 4.00 MiB, chardev (242:0)
[ 2.681609] Microchip KSZ9131 Gigabit PHY 11c20000.ethernet-ffffffff:07: attached PHY driver (mii_bus:phy_addr=11c20000.ethernet-ffffffff:07, irq=54)
[ 3.020085] mmc1: new ultra high speed SDR104 SDHC card at address aaaa
[ 3.033079] mmcblk1: mmc1:aaaa SC32G 29.7 GiB
[ 3.048402] mmcblk1: p1

Cheers,
Biju

> -----Original Message-----
> From: Geert Uytterhoeven <[email protected]>
> Sent: Tuesday, June 27, 2023 6:20 PM
> To: Dennis Zhou <[email protected]>
> Cc: Ulf Hansson <[email protected]>; Linux MMC List <linux-
> [email protected]>; Linux Kernel Mailing List <linux-
> [email protected]>; Biju Das <[email protected]>; Wolfram
> Sang <[email protected]>; Linux-Renesas <linux-renesas-
> [email protected]>
> Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on
> mmc_start_host()
>
> Hi Dennis,
>
> On Thu, Mar 30, 2023 at 1:48 AM Dennis Zhou <[email protected]> wrote:
> > When using dm-verity with a data partition on an emmc device,
> > dm-verity races with the discovery of attached emmc devices. This is
> > because mmc's probing code sets up the host data structure then a work
> > item is scheduled to do discovery afterwards. To prevent this race on
> > init, let's inline the first call to detection, __mmc_rescan(), and let
> > subsequent detect calls be handled via the workqueue.
> >
> > Signed-off-by: Dennis Zhou <[email protected]>
>
> Thanks for your patch, which is now commit 2cc83bf7d41113d9 ("mmc:
> core: Allow mmc_start_host() synchronously detect a card") in
> linux-next/master mmc/next next-20230614 next-20230615 next-20230616
>
> I have bisected the following failure on Renesas Salvator-XS with R-Car H3
> ES2.0 to the above commit:
>
> renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD0)
> renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD1)
> renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD0)
> renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD1)
> mmc0: Failed to initialize a non-removable card
>
> Reverting the commit fixes the issue for me.
>
> > --- a/drivers/mmc/core/core.c
> > +++ b/drivers/mmc/core/core.c
> > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > }
> > EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> >
> > -void mmc_rescan(struct work_struct *work)
> > +static void __mmc_rescan(struct mmc_host *host)
> > {
> > - struct mmc_host *host =
> > - container_of(work, struct mmc_host, detect.work);
> > int i;
> >
> > if (host->rescan_disable)
> > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > mmc_schedule_delayed_work(&host->detect, HZ);
> > }
> >
> > +void mmc_rescan(struct work_struct *work)
> > +{
> > + struct mmc_host *host =
> > + container_of(work, struct mmc_host, detect.work);
> > +
> > + __mmc_rescan(host);
> > +}
> > +
> > void mmc_start_host(struct mmc_host *host)
> > {
> > host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > }
> >
> > mmc_gpiod_request_cd_irq(host);
> > - _mmc_detect_change(host, 0, false);
> > + host->detect_change = 1;
> > + __mmc_rescan(host);
> > }
> >
> > void __mmc_stop_host(struct mmc_host *host)
>
> Gr{oetje,eeting}s,
>
> Geert

2023-06-30 12:34:46

by Ulf Hansson

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

On Tue, 27 Jun 2023 at 19:20, Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Dennis,
>
> On Thu, Mar 30, 2023 at 1:48 AM Dennis Zhou <[email protected]> wrote:
> > When using dm-verity with a data partition on an emmc device, dm-verity
> > races with the discovery of attached emmc devices. This is because mmc's
> > probing code sets up the host data structure then a work item is
> > scheduled to do discovery afterwards. To prevent this race on init,
> > let's inline the first call to detection, __mmc_rescan(), and let
> > subsequent detect calls be handled via the workqueue.
> >
> > Signed-off-by: Dennis Zhou <[email protected]>
>
> Thanks for your patch, which is now commit 2cc83bf7d41113d9 ("mmc:
> core: Allow mmc_start_host() synchronously detect a card") in
> linux-next/master mmc/next next-20230614 next-20230615 next-20230616
>
> I have bisected the following failure on Renesas Salvator-XS with R-Car H3
> ES2.0 to the above commit:
>
> renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD0)
> renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD1)
> renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD0)
> renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> hardware interrupt (CMD1)
> mmc0: Failed to initialize a non-removable card

Thanks for reporting!

After taking a closer look, I realize that all the renesas/tmio drivers
suffer from a similar problem. A host driver must not call
mmc_add_host() before it's ready to serve requests.

Things like initializing an irq-handler must be done before
mmc_add_host() is called, which is not the case for renesas/tmio. In
fact, there seem to be a few other host drivers that follow a similar
pattern in their probe routines.
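To make the ordering rule concrete, here is a userspace sketch of a probe routine. Everything in it is hypothetical: the fake_host struct, fake_add_host(), fake_request_irq() and the irq_ready flag. It models only the case described above, where adding the host immediately issues a request that can complete only if the interrupt handler is already installed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a host controller driver's state. */
struct fake_host {
	bool irq_ready;   /* set once the irq handler is installed */
	int timeouts;     /* requests that never saw an interrupt */
	int completed;    /* requests that completed normally */
};

/* A request can only complete if the irq handler is installed. */
static void fake_send_request(struct fake_host *h)
{
	if (h->irq_ready)
		h->completed++;
	else
		h->timeouts++;  /* like "timeout waiting for hardware interrupt" */
}

/* Hypothetical stand-in for installing the irq handler in probe. */
static void fake_request_irq(struct fake_host *h)
{
	h->irq_ready = true;
}

/* Models mmc_add_host() when the first scan runs synchronously. */
static void fake_add_host(struct fake_host *h)
{
	fake_send_request(h);
}

/* Broken order: the host is added before the irq handler exists. */
void probe_wrong_order(struct fake_host *h)
{
	fake_add_host(h);
	fake_request_irq(h);
}

/* Correct order: everything needed to serve requests comes first. */
void probe_right_order(struct fake_host *h)
{
	fake_request_irq(h);
	assert(h->irq_ready);
	fake_add_host(h);
}
```

In a real driver, the same idea means moving the irq setup (and anything else the request path depends on) above the mmc_add_host() call in the probe function.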

Note that even though the offending commit below triggers this problem
in 100% of cases (as the probe path has now become synchronous), the
risk existed before as well. Previously, mmc_add_host() ended up
punting a work item, and if that work sent a request to the host
driver *before* the irq-handler was ready, we would hit a similar
problem. I bet adding an msleep(1000) immediately after mmc_add_host()
in tmio_mmc_host_probe() would trigger this problem too. :-)
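A similar hypothetical model shows why the bug stayed latent while the first scan was punted to a work item. Normally the probe finishes installing the irq handler before the queued work runs. If something lets the worker run first, such as the msleep(1000) suggested above, the race becomes a real failure. In the sketch, the early run is modelled by draining the queue before the irq setup. The single-slot queue is an illustrative stand-in for a workqueue, not the kernel API, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_host {
	bool irq_ready;                       /* irq handler installed? */
	int timeouts;                         /* scans that timed out */
	int completed;                        /* scans that succeeded */
	void (*pending)(struct fake_host *);  /* single-slot "workqueue" */
};

/* The deferred first scan: only succeeds once the irq handler exists. */
static void fake_scan(struct fake_host *h)
{
	if (h->irq_ready)
		h->completed++;
	else
		h->timeouts++;
}

/* Old behaviour: adding the host only queues the first scan. */
static void fake_add_host_async(struct fake_host *h)
{
	h->pending = fake_scan;
}

/* Runs the queued work, if any, as a worker thread eventually would. */
static void fake_run_workqueue(struct fake_host *h)
{
	void (*fn)(struct fake_host *) = h->pending;

	h->pending = NULL;
	if (fn)
		fn(h);
}

/*
 * Buggy probe order (irq set up after adding the host). Whether it
 * fails depends only on when the worker gets to run.
 */
void probe_async(struct fake_host *h, bool worker_runs_early)
{
	fake_add_host_async(h);
	if (worker_runs_early)
		fake_run_workqueue(h);  /* models msleep(1000) at this point */
	h->irq_ready = true;        /* irq handler installed too late */
	fake_run_workqueue(h);      /* usual case: work runs after probe */
	assert(h->pending == NULL); /* all queued work has run */
}
```

With worker_runs_early false the scan succeeds, which matches the old, lucky timing. With it true the scan times out, just as it always does once the first scan is synchronous.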

That said, I am going to revert the offending commit for now to fix
these problems. Then I will try to help out and fix up the relevant
host drivers, and once that is done, we can give this whole thing
another try.

Any objections or other suggestions to this?

Kind regards
Uffe

>
> Reverting the commit fixes the issue for me.
>
> > --- a/drivers/mmc/core/core.c
> > +++ b/drivers/mmc/core/core.c
> > @@ -2185,10 +2185,8 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *gpt_sector)
> > }
> > EXPORT_SYMBOL(mmc_card_alternative_gpt_sector);
> >
> > -void mmc_rescan(struct work_struct *work)
> > +static void __mmc_rescan(struct mmc_host *host)
> > {
> > - struct mmc_host *host =
> > - container_of(work, struct mmc_host, detect.work);
> > int i;
> >
> > if (host->rescan_disable)
> > @@ -2249,6 +2247,14 @@ void mmc_rescan(struct work_struct *work)
> > mmc_schedule_delayed_work(&host->detect, HZ);
> > }
> >
> > +void mmc_rescan(struct work_struct *work)
> > +{
> > + struct mmc_host *host =
> > + container_of(work, struct mmc_host, detect.work);
> > +
> > + __mmc_rescan(host);
> > +}
> > +
> > void mmc_start_host(struct mmc_host *host)
> > {
> > host->f_init = max(min(freqs[0], host->f_max), host->f_min);
> > @@ -2261,7 +2267,8 @@ void mmc_start_host(struct mmc_host *host)
> > }
> >
> > mmc_gpiod_request_cd_irq(host);
> > - _mmc_detect_change(host, 0, false);
> > + host->detect_change = 1;
> > + __mmc_rescan(host);
> > }
> >
> > void __mmc_stop_host(struct mmc_host *host)
>
> Gr{oetje,eeting}s,
>
> Geert

2023-06-30 14:02:33

by Wolfram Sang

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()


> Note that even though the offending commit below triggers this problem
> in 100% of cases (as the probe path has now become synchronous), the
> risk existed before as well. Previously, mmc_add_host() ended up
> punting a work item, and if that work sent a request to the host
> driver *before* the irq-handler was ready, we would hit a similar
> problem. I bet adding an msleep(1000) immediately after mmc_add_host()
> in tmio_mmc_host_probe() would trigger this problem too. :-)
>
> That said, I am going to revert the offending commit for now to fix
> these problems. Then I will try to help out and fix up the relevant
> host drivers, and once that is done, we can give this whole thing
> another try.

I'll work on the TMIO/SDHI driver next week. Thanks for the input!



2023-06-30 22:16:47

by Dennis Zhou

Subject: Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

Hi Ulf,

On Fri, Jun 30, 2023 at 01:26:14PM +0200, Ulf Hansson wrote:
> On Tue, 27 Jun 2023 at 19:20, Geert Uytterhoeven <[email protected]> wrote:
> >
> > Hi Dennis,
> >
> > On Thu, Mar 30, 2023 at 1:48 AM Dennis Zhou <[email protected]> wrote:
> > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > races with the discovery of attached emmc devices. This is because mmc's
> > > probing code sets up the host data structure then a work item is
> > > scheduled to do discovery afterwards. To prevent this race on init,
> > > let's inline the first call to detection, __mmc_rescan(), and let
> > > subsequent detect calls be handled via the workqueue.
> > >
> > > Signed-off-by: Dennis Zhou <[email protected]>
> >
> > Thanks for your patch, which is now commit 2cc83bf7d41113d9 ("mmc:
> > core: Allow mmc_start_host() synchronously detect a card") in
> > linux-next/master mmc/next next-20230614 next-20230615 next-20230616
> >
> > I have bisected the following failure on Renesas Salvator-XS with R-Car H3
> > ES2.0 to the above commit:
> >
> > renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> > hardware interrupt (CMD0)
> > renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> > hardware interrupt (CMD1)
> > renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> > hardware interrupt (CMD0)
> > renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for
> > hardware interrupt (CMD1)
> > mmc0: Failed to initialize a non-removable card
>
> Thanks for reporting!
>
> After taking a closer look, I realize that all the renesas/tmio drivers
> suffer from a similar problem. A host driver must not call
> mmc_add_host() before it's ready to serve requests.
>
> Things like initializing an irq-handler must be done before
> mmc_add_host() is called, which is not the case for renesas/tmio. In
> fact, there seem to be a few other host drivers that follow a similar
> pattern in their probe routines.
>
> Note that even though the offending commit below triggers this problem
> in 100% of cases (as the probe path has now become synchronous), the
> risk existed before as well. Previously, mmc_add_host() ended up
> punting a work item, and if that work sent a request to the host
> driver *before* the irq-handler was ready, we would hit a similar
> problem. I bet adding an msleep(1000) immediately after mmc_add_host()
> in tmio_mmc_host_probe() would trigger this problem too. :-)
>

I'm deeply appreciative that you're willing to get to the bottom of the
issue.

> That said, I am going to revert the offending commit for now to fix
> these problems. Then I will try to help out and fix up the relevant
> host drivers, and once that is done, we can give this whole thing
> another try.
>
> Any objections or other suggestions to this?
>

Acked-by: Dennis Zhou <[email protected]>

Thanks,
Dennis

> Kind regards
> Uffe
>