2024-01-30 06:54:41

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH v1] mmc: sdhci: fix max req size based on spec

On 28/01/24 12:01, Shengyu Qu wrote:
> For almost 2 decades, the max allowed requests were limited to 512KB
> because of SDMA's max 512KiB boundary limit.

It is not limited by SDMA. It is limited by choice.

>
> ADMA2 and ADMA3 do not have such limit and were effectively made so any
> kind of block count would not impose interrupt and managing stress to the
> host.

The main benefit of ADMA is that it provides scatter/gather and so does
not need a bounce buffer.

>
> By limiting that to 512KiB, it effectively downgrades these DMA modes to
> SDMA.

Not really.

>
> Fix that by actually following the spec:
> When ADMA is selected tuning mode is advised. On lesser modes, 4MiB
> transfer is selected as max, so re-tuning if timer trigger or if requested
> by host interrupt, can be done in time. Otherwise, the only limit is the
> variable size of types used. In this implementation, 16MiB is used as
> maximum since tests showed that after that point, there are diminishing
> returns.
>
> Also 16MiB in worst case scenarios, when card is eMMC and its max speed is
> a generous 350MiB/s, will generate interrupts every 45ms on huge data
> transfers.
>
> A new `adma_get_req_limit` sdhci host function was also introduced, to let
> vendors override imposed limits by the generic implementation if needed.

Not in this patch?

>
> For example, on local tests with rigorous CPU/GPU burn-in tests and abrupt
> cut-offs to generate huge temperature changes (upwards/downwards) to the
> card, tested host was fine up to 128MB/s transfers on slow cards that used
> SDR104 bus timing without re-tuning.
> In that case the 4MiB limit was overridden with a more than safe 8MiB
> value.
>
> In all testing cases and boards, that change brought the following:

"all testing cases and boards" doesn't mean much to anyone else. You
need to be more explicit.

>
> Depending on bus timing and eMMC/SD specs:
> * Max Read throughput increased by 2-20%
> * Max Write throughput increased by 50-200%
> Depending on CPU frequency and transfer sizes:
> * Reduced mmcqd cpu core usage by 4-50%

The main issue with increasing the request size is that it introduces much
more latency for synchronous reads e.g. a synchronous read may have to wait
for a large write operation. Generally, that is probably a show-stopper
for unconditionally increasing the maximum request size.

>
> Above commit message comes from original author whose id is CTCaer, with
> SoB email address [email protected]. I tried to contact with the author 1
> month ago to ask for sending it to mainline or get the authority to submit
> by myself, but I didn't get any reply, so I decided to send this patch by
> myself. Original commit is here[1].

Ok, so it is not your patch and the original author is out of touch.

Is there a particular reason you wanted this patch?

>
> The author also has a patch[2] applied after this patch, which overrides
> adma size on tegra device from device tree property. But I don't have a
> tegra device to actually test that, so it is not sent, and
> adma_get_req_limit part is not included in this version of patch.

Does that mean you haven't tested this patch yourself at all?

>
> [1]: https://github.com/CTCaer/switch-l4t-kernel-4.9/commit/fa86ebbd56d30b3b6af26e1d1c3f9c411a47e98e
> [2]: https://github.com/CTCaer/switch-l4t-kernel-4.9/commit/385f9335b9a60ce471ac3291f202b1326212be3e
>
> Signed-off-by: Shengyu Qu <[email protected]>
> ---
> drivers/mmc/host/sdhci.c | 17 ++++++++++++-----
> drivers/mmc/host/sdhci.h | 4 ++--
> 2 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> index c79f73459915..f546b675c7b9 100644
> --- a/drivers/mmc/host/sdhci.c
> +++ b/drivers/mmc/host/sdhci.c
> @@ -1081,7 +1081,7 @@ static void sdhci_initialize_data(struct sdhci_host *host,
> WARN_ON(host->data);
>
> /* Sanity checks */
> - BUG_ON(data->blksz * data->blocks > 524288);
> + BUG_ON(data->blksz * data->blocks > host->mmc->max_req_size);
> BUG_ON(data->blksz > host->mmc->max_blk_size);
> BUG_ON(data->blocks > 65535);
>
> @@ -4690,11 +4690,18 @@ int sdhci_setup_host(struct sdhci_host *host)
>
> /*
> * Maximum number of sectors in one transfer. Limited by SDMA boundary
> - * size (512KiB). Note some tuning modes impose a 4MiB limit, but this
> - * is less anyway.
> + * size and by tuning modes on ADMA. On tuning mode 3 16MiB is more than
> + * enough to cover big data transfers.
> */
> - mmc->max_req_size = 524288;
> -
> + if (host->flags & SDHCI_USE_ADMA) {
> + if (host->tuning_mode != SDHCI_TUNING_MODE_3)
> + mmc->max_req_size = SZ_4M;
> + else
> + mmc->max_req_size = SZ_16M;
> + } else {
> + /* On PIO/SDMA use SDMA boundary size (512KiB). */
> + mmc->max_req_size = SZ_512K;
> + }
> /*
> * Maximum number of segments. Depends on if the hardware
> * can do scatter/gather or not.
> diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
> index a20864fc0641..98252c427feb 100644
> --- a/drivers/mmc/host/sdhci.h
> +++ b/drivers/mmc/host/sdhci.h
> @@ -346,11 +346,11 @@ struct sdhci_adma2_64_desc {
> #define ADMA2_END 0x2
>
> /*
> - * Maximum segments assuming a 512KiB maximum requisition size and a minimum
> + * Maximum segments assuming a 16MiB maximum requisition size and a minimum
> * 4KiB page size. Note this also allows enough for multiple descriptors in
> * case of PAGE_SIZE >= 64KiB.
> */
> -#define SDHCI_MAX_SEGS 128
> +#define SDHCI_MAX_SEGS 4096
>
> /* Allow for a command request and a data request at the same time */
> #define SDHCI_MAX_MRQS 2