Message-ID: <7aee68e9-6e31-925f-68bc-73557c032a42@redhat.com>
Date: Thu, 23 Mar 2023 11:38:52 +0100
Subject: Re: [PATCH v7 4/4] mm: vmalloc: convert vread() to vread_iter()
To: Baoquan He, Lorenzo Stoakes
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, Andrew Morton, Uladzislau Rezki,
 Matthew Wilcox, Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro
References: <941f88bc5ab928e6656e1e2593b91bf0f8c81e1b.1679511146.git.lstoakes@gmail.com>
From: David Hildenbrand
Organization: Red Hat
X-Mailing-List: linux-kernel@vger.kernel.org

On 23.03.23 11:36, Baoquan He wrote:
> On 03/23/23 at 06:44am, Lorenzo Stoakes wrote:
>> On Thu, Mar 23, 2023 at 10:52:09AM +0800, Baoquan He wrote:
>>> On 03/22/23 at 06:57pm, Lorenzo Stoakes wrote:
>>>> Having previously laid the foundation for converting vread() to an iterator
>>>> function, pull the trigger and do so.
>>>>
>>>> This patch attempts to provide minimal refactoring and to reflect the
>>>> existing logic as best we can, for example we continue to zero portions of
>>>> memory not read, as before.
>>>>
>>>> Overall, there should be no functional difference other than a performance
>>>> improvement in /proc/kcore access to vmalloc regions.
>>>>
>>>> Now we have eliminated the need for a bounce buffer in read_kcore_iter(),
>>>> we dispense with it, and try to write to user memory optimistically but
>>>> with faults disabled via copy_page_to_iter_nofault(). We already have
>>>> preemption disabled by holding a spin lock. We continue faulting in until
>>>> the operation is complete.
>>>
>>> I don't understand the sentences here. In vread_iter(), the actual
>>> content reading is done in aligned_vread_iter(); otherwise we zero-fill
>>> the region. In aligned_vread_iter(), we use vmalloc_to_page() to get
>>> the mapped page and read it out, otherwise we zero-fill. In this patch,
>>> however, fault_in_iov_iter_writeable() faults in the memory of iter one
>>> time and bails out if that fails. I am wondering why we continue
>>> faulting in until the operation is complete, and how that is done.
>>
>> This is referring to what's happening in kcore.c, not vread_iter(),
>> i.e. the looped read/faultin.
>>
>> The reason we bail out if fault_in_iov_iter_writeable() fails is that this
>> would indicate an error had occurred.
>>
>> The whole point is to _optimistically_ try to perform the operation
>> assuming the pages are faulted in. Ultimately we fault in via
>> copy_to_user_nofault() which will either copy data or fail if the pages are
>> not faulted in (will discuss this below a bit more in response to your
>> other point).
>>
>> If this fails, then we fault in, and try again. We loop because there could
>> be some extremely unfortunate timing with a race on e.g. swapping out or
>> migrating pages between faulting in and trying to write out again.
>>
>> This is extremely unlikely, but to avoid any chance of breaking userland we
>> repeat the operation until it completes. In nearly all real-world
>> situations it'll either work immediately or loop once.
>
> Thanks a lot for patiently providing these helpful details. I got it now.
> I was mainly confused by the while(true) loop in the KCORE_VMALLOC case of
> read_kcore_iter().
>
> Now, is there any chance that the faulted-in memory is swapped out or
> migrated again before vread_iter()? Does fault_in_iov_iter_writeable()
> pin the memory? I couldn't find that in the code or documentation. It
> seems it only faults in memory. If so, there's a window between faulting
> in and copy_to_user_nofault().
>

See the documentation of fault_in_safe_writeable():

"Note that we don't pin or otherwise hold the pages referenced that we
fault in. There's no guarantee that they'll stay in memory for any
duration of time."

-- 
Thanks,

David / dhildenb
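
For illustration only, the "try without faulting, fault in, retry" pattern
discussed in this thread can be sketched as a small user-space simulation.
This is not the kernel code: struct sim_page, sim_copy_nofault(),
sim_fault_in() and sim_copy_retry() are hypothetical names invented for the
sketch, and residency is modelled with a plain flag. Because the fault-in
step does not pin anything, the page may be gone again before the retry,
which is why the copy sits in a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a destination page that may not be resident. */
struct sim_page {
	char data[64];
	bool resident;       /* false models "not faulted in" */
	int evictions_left;  /* times the page is lost again right after fault-in */
};

/* Models a no-fault copy: copies everything if resident, nothing otherwise. */
static size_t sim_copy_nofault(struct sim_page *dst, const char *src, size_t len)
{
	assert(len <= sizeof(dst->data));
	if (!dst->resident)
		return 0;
	memcpy(dst->data, src, len);
	return len;
}

/*
 * Models faulting in without pinning: the page becomes resident, but may be
 * evicted again before the caller retries. Always returns 0 in this
 * simulation; a real fault-in can fail, which would end the loop with an error.
 */
static int sim_fault_in(struct sim_page *dst)
{
	dst->resident = true;
	if (dst->evictions_left > 0) {
		dst->evictions_left--;
		dst->resident = false; /* simulated race: swapped out or migrated */
	}
	return 0;
}

/* Optimistic copy with retry; returns the number of fault-ins needed, or -1. */
static int sim_copy_retry(struct sim_page *dst, const char *src, size_t len)
{
	int faults = 0;

	while (sim_copy_nofault(dst, src, len) != len) {
		if (sim_fault_in(dst) != 0)
			return -1; /* a failed fault-in is a genuine error: bail out */
		faults++;
	}
	return faults;
}
```

In this model a resident page needs no fault-in, a non-resident page needs
exactly one, and each simulated eviction adds one more pass through the loop.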