The xarray iteration only holds RCU and thus may encounter
XA_RETRY_ENTRY if another process is modifying the xarray concurrently.
Dereferencing that entry as a folio then causes an oops.
Fix this by adding the missing xas_retry(), which makes the iteration
restart from the root node when XA_RETRY_ENTRY is encountered.
Fixes: d435d53228dd ("erofs: change to use asynchronous io for fscache readpage/readahead")
Signed-off-by: Jingbo Xu <[email protected]>
---
fs/erofs/fscache.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index fe05bc51f9f2..458c1c70ef30 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -75,11 +75,15 @@ static void erofs_fscache_rreq_unlock_folios(struct netfs_io_request *rreq)
 
 	rcu_read_lock();
 	xas_for_each(&xas, folio, last_page) {
-		unsigned int pgpos =
-			(folio_index(folio) - start_page) * PAGE_SIZE;
-		unsigned int pgend = pgpos + folio_size(folio);
+		unsigned int pgpos, pgend;
 		bool pg_failed = false;
 
+		if (xas_retry(&xas, folio))
+			continue;
+
+		pgpos = (folio_index(folio) - start_page) * PAGE_SIZE;
+		pgend = pgpos + folio_size(folio);
+
 		for (;;) {
 			if (!subreq) {
 				pg_failed = true;
--
2.19.1.6.gb485710b
On Fri, Nov 11, 2022 at 05:08:13PM +0800, Jingbo Xu wrote:
> The xarray iteration only holds RCU and thus may encounter
> XA_RETRY_ENTRY if another process is modifying the xarray concurrently.
> Dereferencing that entry as a folio then causes an oops.
>
> Fix this by adding the missing xas_retry(), which makes the iteration
> restart from the root node when XA_RETRY_ENTRY is encountered.
>
> Fixes: d435d53228dd ("erofs: change to use asynchronous io for fscache readpage/readahead")
> Signed-off-by: Jingbo Xu <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Thanks,
Gao Xiang
On 2022/11/11 17:08, Jingbo Xu wrote:
> The xarray iteration only holds RCU and thus may encounter
> XA_RETRY_ENTRY if another process is modifying the xarray concurrently.
> Dereferencing that entry as a folio then causes an oops.
>
> Fix this by adding the missing xas_retry(), which makes the iteration
> restart from the root node when XA_RETRY_ENTRY is encountered.
>
> Fixes: d435d53228dd ("erofs: change to use asynchronous io for fscache readpage/readahead")
> Signed-off-by: Jingbo Xu <[email protected]>
Reviewed-by: Jia Zhu <[email protected]>
Hi David,
Thanks for the comment.
On 11/14/22 7:44 PM, David Howells wrote:
> Jingbo Xu <[email protected]> wrote:
>
>> The xarray iteration only holds RCU
>
> I would say "the RCU read lock".
Yeah, this looks clearer. I will update the commit message in v2 later.
>
> Also, I think you've copied the code to which my dodgy-maths fix applies:
>
> https://lore.kernel.org/linux-fsdevel/166757988611.950645.7626959069846893164.stgit@warthog.procyon.org.uk/
>
Thanks for the kind reminder. Yes, this code was originally copied from
libnetfs. In erofs, req->start is currently always aligned to the folio
size, and erofs doesn't support large folios yet. Thus req->start can
never fall inside a folio for now, so I think the current code works
correctly for erofs, although the issue does exist mathematically.
Actually, I'm working on large folio support now, and the completion
routine of erofs in fscache mode will be refactored substantially. I
expect this issue to be fixed as part of that refactoring.
Thanks again for the suggestion :)
--
Thanks,
Jingbo