Message-ID: <40676c7a-daee-8ef4-340f-d8573556ae10@huawei.com>
Date: Tue, 27 Feb 2024 20:43:34 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
From: Tong Tiangen
To: David Howells, Jens Axboe
CC: Al Viro, Linus Torvalds, Christoph Hellwig, Christian Brauner, David Laight, Matthew Wilcox, Jeff Layton, Kefeng Wang
References: <20230925120309.1731676-1-dhowells@redhat.com> <20230925120309.1731676-8-dhowells@redhat.com> <4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com>
In-Reply-To: <4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com>

Hi David, Jens:

Kindly ping...

Thanks.
Tong.

On 2024/2/18 11:13, Tong Tiangen wrote:
> Hi David, Jens:
>
> Recently, I tested the x86 coredump function of a user process on the
> mainline (6.8-rc1) and found a dead-loop issue related to this patch.
>
> Let's discuss it.
>
> 1. Test steps:
> ----------------------------
>   a. Start a user process.
>   b. Use EINJ to inject a hardware memory error into a page of
>      this user process.
>   c. Send SIGBUS to this user process.
>   d. After receiving the signal, a coredump file is configured to be
>      written to tmpfs.
>
> 2. Root cause:
> ----------------------------
> The dead loop occurs in generic_perform_write(); the call path is:
>
> elf_core_dump()
>   -> dump_user_range()
>     -> dump_emit_page()
>       -> iov_iter_bvec()                // iter type set to BVEC
>         -> iov_iter_set_copy_mc(&iter)  // supports copy mc
>           -> __kernel_write_iter()
>             -> shmem_file_write_iter()
>               -> generic_perform_write()
>
> ssize_t generic_perform_write(...)
> {
>     [...]
>     do {
>         [...]
>     again:
>         //[4]
>         if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
>             status = -EFAULT;
>             break;
>         }
>         //[5]
>         if (fatal_signal_pending(current)) {
>             status = -EINTR;
>             break;
>         }
>
>         [...]
>
>         //[1]
>         copied = copy_page_from_iter_atomic(page, offset, bytes, i);
>         [...]
>
>         //[2]
>         status = a_ops->write_end(...);
>         if (unlikely(status != copied)) {
>             iov_iter_revert(i, copied - max(status, 0L));
>             if (unlikely(status < 0))
>                 break;
>         }
>         cond_resched();
>
>         if (unlikely(status == 0)) {
>             /*
>              * A short copy made ->write_end() reject the
>              * thing entirely.  Might be memory poisoning
>              * halfway through, might be a race with munmap,
>              * might be severe memory pressure.
>              */
>             if (copied)
>                 bytes = copied;
>             //----[3]
>             goto again;
>         }
>         [...]
>     } while (iov_iter_count(i));
>     [...]
> }
>
> [1] Before this patch:
>   copy_page_from_iter_atomic()
>     -> iterate_and_advance()
>       -> __iterate_and_advance(..., ((void)(K),0))
>         -> iterate_bvec macro
>           -> left = ((void)(K),0)
>
> With CONFIG_ARCH_HAS_COPY_MC, K() is copy_mc_to_kernel(), which
> returns the number of bytes not copied.
>
> Even when a memory error occurs during K(), the value of "left" is
> still 0, because the ((void)(K),0) comma expression discards K()'s
> return value, as the sketch below illustrates.
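> A minimal standalone illustration of that comma-expression idiom (a
> sketch, not the kernel macro itself; failing_copy() is a hypothetical
> stand-in for copy_mc_to_kernel()):
>
>     #include <assert.h>
>     #include <stddef.h>
>
>     static size_t failing_copy(void)
>     {
>         return 64;    /* pretend 64 bytes were NOT copied (memory error) */
>     }
>
>     int main(void)
>     {
>         /* Mirrors "left = ((void)(K),0)": K's result is discarded and
>          * the comma expression always evaluates to 0. */
>         size_t left = ((void)(failing_copy()), 0);
>         assert(left == 0);    /* the copy failure is invisible here */
>         return 0;
>     }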
>
> Therefore, the value of "copied" returned by
> copy_page_from_iter_atomic() is not 0, and the loop in
> generic_perform_write() can end normally.
>
>
> After this patch:
>   copy_page_from_iter_atomic()
>     -> iterate_and_advance2()
>       -> iterate_bvec()
>         -> remain = step()
>
> With CONFIG_ARCH_HAS_COPY_MC, step() is copy_mc_to_kernel(), which
> returns the number of bytes not copied.
>
> When a memory error occurs during step(), the value of "remain" is
> equal to the value of "part" (not a single byte was copied
> successfully). In this case, iterate_bvec() returns 0, and
> copy_page_from_iter_atomic() also returns 0. The callback
> shmem_write_end() [2] then returns 0 as well. Finally,
> generic_perform_write() takes "goto again" [3] and the loop restarts.
> Neither check [4] nor check [5] ever fires to exit the loop, so a dead
> loop occurs.
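>
> A bounded standalone model of this retry loop (a sketch under the
> assumptions above, not kernel code: copy_step() stands in for
> copy_page_from_iter_atomic() hitting a fully poisoned page, and
> "status" stands in for the ->write_end() return value):
>
>     #include <stdbool.h>
>     #include <stddef.h>
>     #include <stdio.h>
>
>     /* Old macro semantics hid the failure (full progress reported);
>      * the new inline-function semantics report 0 bytes of progress. */
>     static size_t copy_step(size_t bytes, bool new_semantics)
>     {
>         return new_semantics ? 0 : bytes;
>     }
>
>     static const char *perform_write(bool new_semantics)
>     {
>         size_t remaining = 4096, bytes = 4096;
>         int tries = 0;
>
>         while (remaining) {
>             if (++tries > 10)            /* cap that the kernel loop lacks */
>                 return "dead loop";
>             size_t copied = copy_step(bytes, new_semantics);
>             size_t status = copied;      /* write_end() echoes copied */
>             if (status == 0)
>                 continue;                /* the "goto again" path [3] */
>             remaining -= status;
>         }
>         return "terminates";
>     }
>
>     int main(void)
>     {
>         printf("before the patch: %s\n", perform_write(false));
>         printf("after the patch:  %s\n", perform_write(true));
>         return 0;
>     }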
>
> Thanks.
> Tong
>
>
> On 2023/9/25 20:03, David Howells wrote:
>> Convert the iov_iter iteration macros to inline functions to make the code
>> easier to follow.
>>
>> The functions are marked __always_inline as we don't want to end up with
>> indirect calls in the code.  This, however, leaves dealing with ->copy_mc
>> in an awkward situation since the step function (memcpy_from_iter_mc())
>> needs to test the flag in the iterator, but isn't passed the iterator.
>> This will be dealt with in a follow-up patch.
>>
>> The variable names in the per-type iterator functions have been harmonised
>> as much as possible and made clearer as to the variable purpose.
>>
>> The iterator functions are also moved to a header file so that other
>> operations that need to scan over an iterator can be added.  For instance,
>> the rbd driver could use this to scan a buffer to see if it is all zeros
>> and libceph could use this to generate a crc.
>>
>> Signed-off-by: David Howells
>> cc: Alexander Viro
>> cc: Jens Axboe
>> cc: Christoph Hellwig
>> cc: Christian Brauner
>> cc: Matthew Wilcox
>> cc: Linus Torvalds
>> cc: David Laight
>> cc: linux-block@vger.kernel.org
>> cc: linux-fsdevel@vger.kernel.org
>> cc: linux-mm@kvack.org
>> Link: https://lore.kernel.org/r/3710261.1691764329@warthog.procyon.org.uk/ # v1
>> Link: https://lore.kernel.org/r/855.1692047347@warthog.procyon.org.uk/ # v2
>> Link: https://lore.kernel.org/r/20230816120741.534415-1-dhowells@redhat.com/ # v3
>> ---
>>
>> Notes:
>>     Changes
>>     =======
>>     ver #5)
>>      - Merge in patch to move iteration framework to a header file.
>>      - Move "iter->count - progress" into individual iteration subfunctions.
>>
>>  include/linux/iov_iter.h | 274 ++++++++++++++++++++++++++
>>  lib/iov_iter.c           | 416 ++++++++++++++++-----------------------
>>  2 files changed, 449 insertions(+), 241 deletions(-)
>>  create mode 100644 include/linux/iov_iter.h
>>
>> diff --git a/include/linux/iov_iter.h b/include/linux/iov_iter.h
>> new file mode 100644
>> index 000000000000..270454a6703d
>> --- /dev/null
>> +++ b/include/linux/iov_iter.h
>> @@ -0,0 +1,274 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +/* I/O iterator iteration building functions.
>> + *
>> + * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
>> + * Written by David Howells (dhowells@redhat.com)
>> + */
>> +
>> +#ifndef _LINUX_IOV_ITER_H
>> +#define _LINUX_IOV_ITER_H
>> +
>> +#include <linux/uio.h>
>> +#include <linux/bvec.h>
>> +
>> +typedef size_t (*iov_step_f)(void *iter_base, size_t progress, size_t len,
>> +                 void *priv, void *priv2);
>> +typedef size_t (*iov_ustep_f)(void __user *iter_base, size_t progress, size_t len,
>> +                  void *priv, void *priv2);
>> +
>> +/*
>> + * Handle ITER_UBUF.
>> + */
>> +static __always_inline
>> +size_t iterate_ubuf(struct iov_iter *iter, size_t len, void *priv, void *priv2,
>> +            iov_ustep_f step)
>> +{
>> +    void __user *base = iter->ubuf;
>> +    size_t progress = 0, remain;
>> +
>> +    remain = step(base + iter->iov_offset, 0, len, priv, priv2);
>> +    progress = len - remain;
>> +    iter->iov_offset += progress;
>> +    iter->count -= progress;
>> +    return progress;
>> +}
>> +
>> +/*
>> + * Handle ITER_IOVEC.
>> + */
>> +static __always_inline
>> +size_t iterate_iovec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
>> +             iov_ustep_f step)
>> +{
>> +    const struct iovec *p = iter->__iov;
>> +    size_t progress = 0, skip = iter->iov_offset;
>> +
>> +    do {
>> +        size_t remain, consumed;
>> +        size_t part = min(len, p->iov_len - skip);
>> +
>> +        if (likely(part)) {
>> +            remain = step(p->iov_base + skip, progress, part, priv, priv2);
>> +            consumed = part - remain;
>> +            progress += consumed;
>> +            skip += consumed;
>> +            len -= consumed;
>> +            if (skip < p->iov_len)
>> +                break;
>> +        }
>> +        p++;
>> +        skip = 0;
>> +    } while (len);
>> +
>> +    iter->nr_segs -= p - iter->__iov;
>> +    iter->__iov = p;
>> +    iter->iov_offset = skip;
>> +    iter->count -= progress;
>> +    return progress;
>> +}
>> +
>> +/*
>> + * Handle ITER_KVEC.
>> + */
>> +static __always_inline
>> +size_t iterate_kvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
>> +            iov_step_f step)
>> +{
>> +    const struct kvec *p = iter->kvec;
>> +    size_t progress = 0, skip = iter->iov_offset;
>> +
>> +    do {
>> +        size_t remain, consumed;
>> +        size_t part = min(len, p->iov_len - skip);
>> +
>> +        if (likely(part)) {
>> +            remain = step(p->iov_base + skip, progress, part, priv, priv2);
>> +            consumed = part - remain;
>> +            progress += consumed;
>> +            skip += consumed;
>> +            len -= consumed;
>> +            if (skip < p->iov_len)
>> +                break;
>> +        }
>> +        p++;
>> +        skip = 0;
>> +    } while (len);
>> +
>> +    iter->nr_segs -= p - iter->kvec;
>> +    iter->kvec = p;
>> +    iter->iov_offset = skip;
>> +    iter->count -= progress;
>> +    return progress;
>> +}
>> +
>> +/*
>> + * Handle ITER_BVEC.
>> + */
>> +static __always_inline
>> +size_t iterate_bvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
>> +            iov_step_f step)
>> +{
>> +    const struct bio_vec *p = iter->bvec;
>> +    size_t progress = 0, skip = iter->iov_offset;
>> +
>> +    do {
>> +        size_t remain, consumed;
>> +        size_t offset = p->bv_offset + skip, part;
>> +        void *kaddr = kmap_local_page(p->bv_page + offset / PAGE_SIZE);
>> +
>> +        part = min3(len,
>> +                (size_t)(p->bv_len - skip),
>> +                (size_t)(PAGE_SIZE - offset % PAGE_SIZE));
>> +        remain = step(kaddr + offset % PAGE_SIZE, progress, part, priv, priv2);
>> +        kunmap_local(kaddr);
>> +        consumed = part - remain;
>> +        len -= consumed;
>> +        progress += consumed;
>> +        skip += consumed;
>> +        if (skip >= p->bv_len) {
>> +            skip = 0;
>> +            p++;
>> +        }
>> +        if (remain)
>> +            break;
>> +    } while (len);
>> +
>> +    iter->nr_segs -= p - iter->bvec;
>> +    iter->bvec = p;
>> +    iter->iov_offset = skip;
>> +    iter->count -= progress;
>> +    return progress;
>> +}
>> +
>> +/*
>> + * Handle ITER_XARRAY.
>> + */
>> +static __always_inline
>> +size_t iterate_xarray(struct iov_iter *iter, size_t len, void *priv, void *priv2,
>> +              iov_step_f step)
>> +{
>> +    struct folio *folio;
>> +    size_t progress = 0;
>> +    loff_t start = iter->xarray_start + iter->iov_offset;
>> +    pgoff_t index = start / PAGE_SIZE;
>> +    XA_STATE(xas, iter->xarray, index);
>> +
>> +    rcu_read_lock();
>> +    xas_for_each(&xas, folio, ULONG_MAX) {
>> +        size_t remain, consumed, offset, part, flen;
>> +
>> +        if (xas_retry(&xas, folio))
>> +            continue;
>> +        if (WARN_ON(xa_is_value(folio)))
>> +            break;
>> +        if (WARN_ON(folio_test_hugetlb(folio)))
>> +            break;
>> +
>> +        offset = offset_in_folio(folio, start + progress);
>> +        flen = min(folio_size(folio) - offset, len);
>> +
>> +        while (flen) {
>> +            void *base = kmap_local_folio(folio, offset);
>> +
>> +            part = min_t(size_t, flen,
>> +                     PAGE_SIZE - offset_in_page(offset));
>> +            remain = step(base, progress, part, priv, priv2);
>> +            kunmap_local(base);
>> +
>> +            consumed = part - remain;
>> +            progress += consumed;
>> +            len -= consumed;
>> +
>> +            if (remain || len == 0)
>> +                goto out;
>> +            flen -= consumed;
>> +            offset += consumed;
>> +        }
>> +    }
>> +
>> +out:
>> +    rcu_read_unlock();
>> +    iter->iov_offset += progress;
>> +    iter->count -= progress;
>> +    return progress;
>> +}
>> +
>> +/*
>> + * Handle ITER_DISCARD.
>> + */
>> +static __always_inline
>> +size_t iterate_discard(struct iov_iter *iter, size_t len, void *priv, void *priv2,
>> +              iov_step_f step)
>> +{
>> +    size_t progress = len;
>> +
>> +    iter->count -= progress;
>> +    return progress;
>> +}
>> +
>> +/**
>> + * iterate_and_advance2 - Iterate over an iterator
>> + * @iter: The iterator to iterate over.
>> + * @len: The amount to iterate over.
>> + * @priv: Data for the step functions.
>> + * @priv2: More data for the step functions.
>> + * @ustep: Function for UBUF/IOVEC iterators; given __user addresses.
>> + * @step: Function for other iterators; given kernel addresses.
>> + *
>> + * Iterate over the next part of an iterator, up to the specified length.  The
>> + * buffer is presented in segments, which for kernel iteration are broken up by
>> + * physical pages and mapped, with the mapped address being presented.
>> + *
>> + * Two step functions, @step and @ustep, must be provided, one for handling
>> + * mapped kernel addresses and the other is given user addresses which have the
>> + * potential to fault since no pinning is performed.
>> + *
>> + * The step functions are passed the address and length of the segment, @priv,
>> + * @priv2 and the amount of data so far iterated over (which can, for example,
>> + * be added to @priv to point to the right part of a second buffer). The step
>> + * functions should return the amount of the segment they didn't process (ie. 0
>> + * indicates complete processing).
>> + *
>> + * This function returns the amount of data processed (ie. 0 means nothing was
>> + * processed and the value of @len means processed to completion).
>> + */
>> +static __always_inline
>> +size_t iterate_and_advance2(struct iov_iter *iter, size_t len, void *priv,
>> +                void *priv2, iov_ustep_f ustep, iov_step_f step)
>> +{
>> +    if (unlikely(iter->count < len))
>> +        len = iter->count;
>> +    if (unlikely(!len))
>> +        return 0;
>> +
>> +    if (likely(iter_is_ubuf(iter)))
>> +        return iterate_ubuf(iter, len, priv, priv2, ustep);
>> +    if (likely(iter_is_iovec(iter)))
>> +        return iterate_iovec(iter, len, priv, priv2, ustep);
>> +    if (iov_iter_is_bvec(iter))
>> +        return iterate_bvec(iter, len, priv, priv2, step);
>> +    if (iov_iter_is_kvec(iter))
>> +        return iterate_kvec(iter, len, priv, priv2, step);
>> +    if (iov_iter_is_xarray(iter))
>> +        return iterate_xarray(iter, len, priv, priv2, step);
>> +    return iterate_discard(iter, len, priv, priv2, step);
>> +}
>> +
>> +/**
>> + * iterate_and_advance - Iterate over an iterator
>> + * @iter: The iterator to iterate over.
>> + * @len: The amount to iterate over.
>> + * @priv: Data for the step functions.
>> + * @ustep: Function for UBUF/IOVEC iterators; given __user addresses.
>> + * @step: Function for other iterators; given kernel addresses.
>> + *
>> + * As iterate_and_advance2(), but priv2 is always NULL.
>> + */
>> +static __always_inline
>> +size_t iterate_and_advance(struct iov_iter *iter, size_t len, void *priv,
>> +               iov_ustep_f ustep, iov_step_f step)
>> +{
>> +    return iterate_and_advance2(iter, len, priv, NULL, ustep, step);
>> +}
>> +
>> +#endif /* _LINUX_IOV_ITER_H */
>> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
>> index 227c9f536b94..65374ee91ecd 100644
>> --- a/lib/iov_iter.c
>> +++ b/lib/iov_iter.c
>> @@ -13,189 +13,69 @@
>>   #include
>>   #include
>>   #include
>> +#include <linux/iov_iter.h>
>> -/* covers ubuf and kbuf alike */
>> -#define iterate_buf(i, n, base, len, off, __p, STEP) {        \
>> -    size_t __maybe_unused off = 0;                \
>> -    len = n;                        \
>> -    base = __p + i->iov_offset;                \
>> -    len -= (STEP);                        \
>> -    i->iov_offset += len;                    \
>> -    n = len;                        \
>> -}
>> -
>> -/* covers iovec and kvec alike */
>> -#define iterate_iovec(i, n, base, len, off, __p, STEP) {    \
>> -    size_t off = 0;                        \
>> -    size_t skip = i->iov_offset;                \
>> -    do {                            \
>> -        len = min(n, __p->iov_len - skip);        \
>> -        if (likely(len)) {                \
>> -            base = __p->iov_base + skip;        \
>> -            len -= (STEP);                \
>> -            off += len;                \
>> -            skip += len;                \
>> -            n -= len;                \
>> -            if (skip < __p->iov_len)        \
>> -                break;                \
>> -        }                        \
>> -        __p++;                        \
>> -        skip = 0;                    \
>> -    } while (n);                        \
>> -    i->iov_offset = skip;                    \
>> -    n = off;                        \
>> -}
>> -
>> -#define iterate_bvec(i, n, base, len, off, p, STEP) {        \
>> -    size_t off = 0;                        \
>> -    unsigned skip = i->iov_offset;                \
>> -    while (n) {                        \
>> -        unsigned offset = p->bv_offset + skip;        \
>> -        unsigned left;                    \
>> -        void *kaddr = kmap_local_page(p->bv_page +    \
>> -                    offset / PAGE_SIZE);    \
>> -        base = kaddr + offset % PAGE_SIZE;        \
>> -        len = min(min(n, (size_t)(p->bv_len - skip)),    \
>> -             (size_t)(PAGE_SIZE - offset % PAGE_SIZE));    \
>> -        left = (STEP);                    \
>> -        kunmap_local(kaddr);                \
>> -        len -= left;                    \
>> -        off += len;                    \
>> -        skip += len;                    \
>> -        if (skip == p->bv_len) {            \
>> -            skip = 0;                \
>> -            p++;                    \
>> -        }                        \
>> -        n -= len;                    \
>> -        if (left)                    \
>> -            break;                    \
>> -    }                            \
>> -    i->iov_offset = skip;                    \
>> -    n = off;                        \
>> -}
>> -
>> -#define iterate_xarray(i, n, base, len, __off, STEP) {        \
>> -    __label__ __out;                    \
>> -    size_t __off = 0;                    \
>> -    struct folio *folio;                    \
>> -    loff_t start = i->xarray_start + i->iov_offset;        \
>> -    pgoff_t index = start / PAGE_SIZE;            \
>> -    XA_STATE(xas, i->xarray, index);            \
>> -                                \
>> -    len = PAGE_SIZE - offset_in_page(start);        \
>> -    rcu_read_lock();                    \
>> -    xas_for_each(&xas, folio, ULONG_MAX) {            \
>> -        unsigned left;                    \
>> -        size_t offset;                    \
>> -        if (xas_retry(&xas, folio))            \
>> -            continue;                \
>> -        if (WARN_ON(xa_is_value(folio)))        \
>> -            break;                    \
>> -        if (WARN_ON(folio_test_hugetlb(folio)))        \
>> -            break;                    \
>> -        offset = offset_in_folio(folio, start + __off);    \
>> -        while (offset < folio_size(folio)) {        \
>> -            base = kmap_local_folio(folio, offset);    \
>> -            len = min(n, len);            \
>> -            left = (STEP);                \
>> -            kunmap_local(base);            \
>> -            len -= left;                \
>> -            __off += len;                \
>> -            n -= len;                \
>> -            if (left || n == 0)            \
>> -                goto __out;            \
>> -            offset += len;                \
>> -            len = PAGE_SIZE;            \
>> -        }                        \
>> -    }                            \
>> -__out:                                \
>> -    rcu_read_unlock();                    \
>> -    i->iov_offset += __off;                    \
>> -    n = __off;                        \
>> -}
>> -
>> -#define __iterate_and_advance(i, n, base, len, off, I, K) {    \
>> -    if (unlikely(i->count < n))                \
>> -        n = i->count;                    \
>> -    if (likely(n)) {                    \
>> -        if (likely(iter_is_ubuf(i))) {            \
>> -            void __user *base;            \
>> -            size_t len;                \
>> -            iterate_buf(i, n, base, len, off,    \
>> -                        i->ubuf, (I))     \
>> -        } else if (likely(iter_is_iovec(i))) {        \
>> -            const struct iovec *iov = iter_iov(i);    \
>> -            void __user *base;            \
>> -            size_t len;                \
>> -            iterate_iovec(i, n, base, len, off,    \
>> -                        iov, (I))    \
>> -            i->nr_segs -= iov - iter_iov(i);    \
>> -            i->__iov = iov;                \
>> -        } else if (iov_iter_is_bvec(i)) {        \
>> -            const struct bio_vec *bvec = i->bvec;    \
>> -            void *base;                \
>> -            size_t len;                \
>> -            iterate_bvec(i, n, base, len, off,    \
>> -                        bvec, (K))    \
>> -            i->nr_segs -= bvec - i->bvec;        \
>> -            i->bvec = bvec;                \
>> -        } else if (iov_iter_is_kvec(i)) {        \
>> -            const struct kvec *kvec = i->kvec;    \
>> -            void *base;                \
>> -            size_t len;                \
>> -            iterate_iovec(i, n, base, len, off,    \
>> -                        kvec, (K))    \
>> -            i->nr_segs -= kvec - i->kvec;        \
>> -            i->kvec = kvec;                \
>> -        } else if (iov_iter_is_xarray(i)) {        \
>> -            void *base;                \
>> -            size_t len;                \
>> -            iterate_xarray(i, n, base, len, off,    \
>> -                            (K))    \
>> -        }                        \
>> -        i->count -= n;                    \
>> -    }                            \
>> -}
>> -#define iterate_and_advance(i, n, base, len, off, I, K) \
>> -    __iterate_and_advance(i, n, base, len, off, I, ((void)(K),0))
>> -
>> -static int copyout(void __user *to, const void *from, size_t n)
>> +static __always_inline
>> +size_t copy_to_user_iter(void __user *iter_to, size_t progress,
>> +             size_t len, void *from, void *priv2)
>>   {
>>       if (should_fail_usercopy())
>> -        return n;
>> -    if (access_ok(to, n)) {
>> -        instrument_copy_to_user(to, from, n);
>> -        n = raw_copy_to_user(to, from, n);
>> +        return len;
>> +    if (access_ok(iter_to, len)) {
>> +        from += progress;
>> +        instrument_copy_to_user(iter_to, from, len);
>> +        len = raw_copy_to_user(iter_to, from, len);
>>       }
>> -    return n;
>> +    return len;
>>   }
>> -static int copyout_nofault(void __user *to, const void *from, size_t n)
>> +static __always_inline
>> +size_t copy_to_user_iter_nofault(void __user *iter_to, size_t progress,
>> +                 size_t len, void *from, void *priv2)
>>   {
>> -    long res;
>> +    ssize_t res;
>>       if (should_fail_usercopy())
>> -        return n;
>> -
>> -    res = copy_to_user_nofault(to, from, n);
>> +        return len;
>> -    return res < 0 ? n : res;
>> +    from += progress;
>> +    res = copy_to_user_nofault(iter_to, from, len);
>> +    return res < 0 ? len : res;
>>   }
>> -static int copyin(void *to, const void __user *from, size_t n)
>> +static __always_inline
>> +size_t copy_from_user_iter(void __user *iter_from, size_t progress,
>> +               size_t len, void *to, void *priv2)
>>   {
>> -    size_t res = n;
>> +    size_t res = len;
>>       if (should_fail_usercopy())
>> -        return n;
>> -    if (access_ok(from, n)) {
>> -        instrument_copy_from_user_before(to, from, n);
>> -        res = raw_copy_from_user(to, from, n);
>> -        instrument_copy_from_user_after(to, from, n, res);
>> +        return len;
>> +    if (access_ok(iter_from, len)) {
>> +        to += progress;
>> +        instrument_copy_from_user_before(to, iter_from, len);
>> +        res = raw_copy_from_user(to, iter_from, len);
>> +        instrument_copy_from_user_after(to, iter_from, len, res);
>>       }
>>       return res;
>>   }
>> +static __always_inline
>> +size_t memcpy_to_iter(void *iter_to, size_t progress,
>> +              size_t len, void *from, void *priv2)
>> +{
>> +    memcpy(iter_to, from + progress, len);
>> +    return 0;
>> +}
>> +
>> +static __always_inline
>> +size_t memcpy_from_iter(void *iter_from, size_t progress,
>> +            size_t len, void *to, void *priv2)
>> +{
>> +    memcpy(to + progress, iter_from, len);
>> +    return 0;
>> +}
>> +
>>   /*
>>    * fault_in_iov_iter_readable - fault in iov iterator for reading
>>    * @i: iterator
>> @@ -312,23 +192,29 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
>>           return 0;
>>       if (user_backed_iter(i))
>>           might_fault();
>> -    iterate_and_advance(i, bytes, base, len, off,
>> -        copyout(base, addr + off, len),
>> -        memcpy(base, addr + off, len)
>> -    )
>> -
>> -    return bytes;
>> +    return iterate_and_advance(i, bytes, (void *)addr,
>> +                   copy_to_user_iter, memcpy_to_iter);
>>   }
>>   EXPORT_SYMBOL(_copy_to_iter);
>>   #ifdef CONFIG_ARCH_HAS_COPY_MC
>> -static int copyout_mc(void __user *to, const void *from, size_t n)
>> -{
>> -    if (access_ok(to, n)) {
>> -        instrument_copy_to_user(to, from, n);
>> -        n = copy_mc_to_user((__force void *) to, from, n);
>> +static __always_inline
>> +size_t copy_to_user_iter_mc(void __user *iter_to, size_t progress,
>> +                size_t len, void *from, void *priv2)
>> +{
>> +    if (access_ok(iter_to, len)) {
>> +        from += progress;
>> +        instrument_copy_to_user(iter_to, from, len);
>> +        len = copy_mc_to_user(iter_to, from, len);
>>       }
>> -    return n;
>> +    return len;
>> +}
>> +
>> +static __always_inline
>> +size_t memcpy_to_iter_mc(void *iter_to, size_t progress,
>> +             size_t len, void *from, void *priv2)
>> +{
>> +    return copy_mc_to_kernel(iter_to, from + progress, len);
>>   }
>>   /**
>> @@ -361,22 +247,20 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
>>           return 0;
>>       if (user_backed_iter(i))
>>           might_fault();
>> -    __iterate_and_advance(i, bytes, base, len, off,
>> -        copyout_mc(base, addr + off, len),
>> -        copy_mc_to_kernel(base, addr + off, len)
>> -    )
>> -
>> -    return bytes;
>> +    return iterate_and_advance(i, bytes, (void *)addr,
>> +                   copy_to_user_iter_mc, memcpy_to_iter_mc);
>>   }
>>   EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
>>   #endif /* CONFIG_ARCH_HAS_COPY_MC */
>> -static void *memcpy_from_iter(struct iov_iter *i, void *to, const void *from,
>> -                 size_t size)
>> +static size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
>> +                  size_t len, void *to, void *priv2)
>>   {
>> -    if (iov_iter_is_copy_mc(i))
>> -        return (void *)copy_mc_to_kernel(to, from, size);
>> -    return memcpy(to, from, size);
>> +    struct iov_iter *iter = priv2;
>> +
>> +    if (iov_iter_is_copy_mc(iter))
>> +        return copy_mc_to_kernel(to + progress, iter_from, len);
>> +    return memcpy_from_iter(iter_from, progress, len, to, priv2);
>>   }
>>   size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
>> @@ -386,30 +270,46 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
>>       if (user_backed_iter(i))
>>           might_fault();
>> -    iterate_and_advance(i, bytes, base, len, off,
>> -        copyin(addr + off, base, len),
>> -        memcpy_from_iter(i, addr + off, base, len)
>> -    )
>> -
>> -    return bytes;
>> +    return iterate_and_advance2(i, bytes, addr, i,
>> +                    copy_from_user_iter,
>> +                    memcpy_from_iter_mc);
>>   }
>>   EXPORT_SYMBOL(_copy_from_iter);
>> +static __always_inline
>> +size_t copy_from_user_iter_nocache(void __user *iter_from, size_t progress,
>> +                   size_t len, void *to, void *priv2)
>> +{
>> +    return __copy_from_user_inatomic_nocache(to + progress, iter_from, len);
>> +}
>> +
>>   size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
>>   {
>>       if (WARN_ON_ONCE(!i->data_source))
>>           return 0;
>> -    iterate_and_advance(i, bytes, base, len, off,
>> -        __copy_from_user_inatomic_nocache(addr + off, base, len),
>> -        memcpy(addr + off, base, len)
>> -    )
>> -
>> -    return bytes;
>> +    return iterate_and_advance(i, bytes, addr,
>> +                   copy_from_user_iter_nocache,
>> +                   memcpy_from_iter);
>>   }
>>   EXPORT_SYMBOL(_copy_from_iter_nocache);
>>   #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
>> +static __always_inline
>> +size_t copy_from_user_iter_flushcache(void __user *iter_from, size_t progress,
>> +                      size_t len, void *to, void *priv2)
>> +{
>> +    return __copy_from_user_flushcache(to + progress, iter_from, len);
>> +}
>> +
>> +static __always_inline
>> +size_t memcpy_from_iter_flushcache(void *iter_from, size_t progress,
>> +                   size_t len, void *to, void *priv2)
>> +{
>> +    memcpy_flushcache(to + progress, iter_from, len);
>> +    return 0;
>> +}
>> +
>>   /**
>>    * _copy_from_iter_flushcache - write destination through cpu cache
>>    * @addr: destination kernel address
>> @@ -431,12 +331,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
>>       if (WARN_ON_ONCE(!i->data_source))
>>           return 0;
>> -    iterate_and_advance(i, bytes, base, len, off,
>> -        __copy_from_user_flushcache(addr + off, base, len),
>> -        memcpy_flushcache(addr + off, base, len)
>> -    )
>> -
>> -    return bytes;
>> +    return iterate_and_advance(i, bytes, addr,
>> +                   copy_from_user_iter_flushcache,
>> +                   memcpy_from_iter_flushcache);
>>   }
>>   EXPORT_SYMBOL_GPL(_copy_from_iter_flushcache);
>>   #endif
>> @@ -508,10 +405,9 @@ size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t byte
>>           void *kaddr = kmap_local_page(page);
>>           size_t n = min(bytes, (size_t)PAGE_SIZE - offset);
>> -        iterate_and_advance(i, n, base, len, off,
>> -            copyout_nofault(base, kaddr + offset + off, len),
>> -            memcpy(base, kaddr + offset + off, len)
>> -        )
>> +        n = iterate_and_advance(i, bytes, kaddr,
>> +                    copy_to_user_iter_nofault,
>> +                    memcpy_to_iter);
>>           kunmap_local(kaddr);
>>           res += n;
>>           bytes -= n;
>> @@ -554,14 +450,25 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
>>   }
>>   EXPORT_SYMBOL(copy_page_from_iter);
>> -size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
>> +static __always_inline
>> +size_t zero_to_user_iter(void __user *iter_to, size_t progress,
>> +             size_t len, void *priv, void *priv2)
>>   {
>> -    iterate_and_advance(i, bytes, base, len, count,
>> -        clear_user(base, len),
>> -        memset(base, 0, len)
>> -    )
>> +    return clear_user(iter_to, len);
>> +}
>> -    return bytes;
>> +static __always_inline
>> +size_t zero_to_iter(void *iter_to, size_t progress,
>> +            size_t len, void *priv, void *priv2)
>> +{
>> +    memset(iter_to, 0, len);
>> +    return 0;
>> +}
>> +
>> +size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
>> +{
>> +    return iterate_and_advance(i, bytes, NULL,
>> +                   zero_to_user_iter, zero_to_iter);
>>   }
>>   EXPORT_SYMBOL(iov_iter_zero);
>> @@ -586,10 +493,9 @@ size_t copy_page_from_iter_atomic(struct page *page, size_t offset,
>>           }
>>           p = kmap_atomic(page) + offset;
>> -        iterate_and_advance(i, n, base, len, off,
>> -            copyin(p + off, base, len),
>> -            memcpy_from_iter(i, p + off, base, len)
>> -        )
>> +        n = iterate_and_advance2(i, n, p, i,
>> +                     copy_from_user_iter,
>> +                     memcpy_from_iter_mc);
>>           kunmap_atomic(p);
>>           copied += n;
>>           offset += n;
>> @@ -1180,32 +1086,64 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
>>   }
>>   EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
>> +static __always_inline
>> +size_t copy_from_user_iter_csum(void __user *iter_from, size_t progress,
>> +                size_t len, void *to, void *priv2)
>> +{
>> +    __wsum next, *csum = priv2;
>> +
>> +    next = csum_and_copy_from_user(iter_from, to + progress, len);
>> +    *csum = csum_block_add(*csum, next, progress);
>> +    return next ? 0 : len;
>> +}
>> +
>> +static __always_inline
>> +size_t memcpy_from_iter_csum(void *iter_from, size_t progress,
>> +                 size_t len, void *to, void *priv2)
>> +{
>> +    __wsum *csum = priv2;
>> +
>> +    *csum = csum_and_memcpy(to + progress, iter_from, len, *csum, progress);
>> +    return 0;
>> +}
>> +
>>   size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
>>                      struct iov_iter *i)
>>   {
>> -    __wsum sum, next;
>> -    sum = *csum;
>>       if (WARN_ON_ONCE(!i->data_source))
>>           return 0;
>> -
>> -    iterate_and_advance(i, bytes, base, len, off, ({
>> -        next = csum_and_copy_from_user(base, addr + off, len);
>> -        sum = csum_block_add(sum, next, off);
>> -        next ? 0 : len;
>> -    }), ({
>> -        sum = csum_and_memcpy(addr + off, base, len, sum, off);
>> -    })
>> -    )
>> -    *csum = sum;
>> -    return bytes;
>> +    return iterate_and_advance2(i, bytes, addr, csum,
>> +                    copy_from_user_iter_csum,
>> +                    memcpy_from_iter_csum);
>>   }
>>   EXPORT_SYMBOL(csum_and_copy_from_iter);
>> +static __always_inline
>> +size_t copy_to_user_iter_csum(void __user *iter_to, size_t progress,
>> +                  size_t len, void *from, void *priv2)
>> +{
>> +    __wsum next, *csum = priv2;
>> +
>> +    next = csum_and_copy_to_user(from + progress, iter_to, len);
>> +    *csum = csum_block_add(*csum, next, progress);
>> +    return next ? 0 : len;
>> +}
>> +
>> +static __always_inline
>> +size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
>> +               size_t len, void *from, void *priv2)
>> +{
>> +    __wsum *csum = priv2;
>> +
>> +    *csum = csum_and_memcpy(iter_to, from + progress, len, *csum, progress);
>> +    return 0;
>> +}
>> +
>>   size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
>>                    struct iov_iter *i)
>>   {
>>       struct csum_state *csstate = _csstate;
>> -    __wsum sum, next;
>> +    __wsum sum;
>>       if (WARN_ON_ONCE(i->data_source))
>>           return 0;
>> @@ -1219,14 +1157,10 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
>>       }
>>       sum = csum_shift(csstate->csum, csstate->off);
>> -    iterate_and_advance(i, bytes, base, len, off, ({
>> -        next = csum_and_copy_to_user(addr + off, base, len);
>> -        sum = csum_block_add(sum, next, off);
>> -        next ? 0 : len;
>> -    }), ({
>> -        sum = csum_and_memcpy(base, addr + off, len, sum, off);
>> -    })
>> -    )
>> +
>> +    bytes = iterate_and_advance2(i, bytes, (void *)addr, &sum,
>> +                     copy_to_user_iter_csum,
>> +                     memcpy_to_iter_csum);
>>       csstate->csum = csum_shift(sum, csstate->off);
>>       csstate->off += bytes;
>>       return bytes;
>>
>>
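
As a footnote to the step-function contract documented in
iterate_and_advance2() above: a minimal standalone model of that contract
(an illustrative sketch, not kernel code; only the iov_step_f typedef
mirrors the patch, the segment array and copy_step() are hypothetical):

    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    typedef size_t (*iov_step_f)(void *iter_base, size_t progress, size_t len,
                                 void *priv, void *priv2);

    /* A step that copies a segment into priv + progress and reports that
     * it processed everything by returning 0 unprocessed bytes. */
    static size_t copy_step(void *iter_base, size_t progress, size_t len,
                            void *priv, void *priv2)
    {
        (void)priv2;
        memcpy((char *)priv + progress, iter_base, len);
        return 0;
    }

    int main(void)
    {
        char seg1[] = "hello ", seg2[] = "world";
        char out[16] = { 0 };
        struct { void *base; size_t len; } segs[2] = {
            { seg1, 6 }, { seg2, 5 },
        };
        iov_step_f step = copy_step;
        size_t progress = 0;

        for (int i = 0; i < 2; i++) {
            size_t remain = step(segs[i].base, progress, segs[i].len,
                                 out, NULL);
            progress += segs[i].len - remain;
            if (remain)    /* short step: stop, as iterate_kvec() does */
                break;
        }
        printf("%zu bytes: %s\n", progress, out);  /* 11 bytes: hello world */
        return 0;
    }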