2016-11-15 16:18:48

by Nikolaus Rath

Subject: fuse: max_background and congestion_threshold settings

Hello,

Could someone explain to me the meaning of the max_background and
congestion_threshold settings of the fuse module?

At first I assumed that max_background specifies the maximum number of
pending requests (i.e., requests that have been sent to userspace but
for which no reply was received yet). But looking at fs/fuse/dev.c, it
looks as if not every request is included in this number.

I also figured out that if the number of background requests (whatever
they are) exceeds the congestion threshold, fuse calls
set_bdi_congested() for the backing device. But what does this do? And
does this become a no-op if there is no backing device?


Thanks,
-Nikolaus
--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«


2016-11-16 13:09:12

by Maxim Patlasov

Subject: Re: [fuse-devel] fuse: max_background and congestion_threshold settings

Hi,


On 11/15/2016 08:18 AM, Nikolaus Rath wrote:
> Hello,
>
> Could someone explain to me the meaning of the max_background and
> congestion_threshold settings of the fuse module?
>
> At first I assumed that max_background specifies the maximum number of
> pending requests (i.e., requests that have been send to userspace but
> for which no reply was received yet). But looking at fs/fuse/dev.c, it
> looks as if not every request is included in this number.

fuse uses max_background for cases where the total number of
simultaneous requests of a given type is not limited by some other natural
means. AFAIU, these cases are: 1) async processing of direct IO; 2)
read-ahead. As an example of a "natural" limitation: when a userspace
process blocks on a sync direct IO read/write, the number of requests
fuse consumes is limited by the number of such processes (actually their
threads). In contrast, if userspace requests a 1GB direct IO read/write,
it would be unreasonable to issue 1GB/128K==8192 fuse requests
simultaneously. That's where max_background steps in.
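
The arithmetic behind that example can be sketched as follows. The 128K request size is the figure used in this thread; the value 12 for max_background is assumed purely for illustration (I believe it is the kernel default, but check your own system), and the helper name is hypothetical.

```python
# Back-of-the-envelope model of the numbers above; not kernel code.
KIB = 1024
GIB = 1024 ** 3


def requests_needed(io_bytes, request_bytes=128 * KIB):
    """Number of fuse requests a single direct IO of io_bytes splits into."""
    return -(-io_bytes // request_bytes)  # ceiling division


total = requests_needed(1 * GIB)
max_background = 12  # assumed value, for illustration only
in_flight = min(total, max_background)

print(total)      # 8192
print(in_flight)  # 12
```

So a single 1GB request still needs 8192 fuse requests in total, but only max_background of them would be outstanding at any one time.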

>
> I also figured out that if the number of background requests (whatever
> they are) exceeds the congestion threshold, fuse calls
> set_bdi_congested() for the backing device. But what does this do?

AFAIU, this is a hint for the reclaimer to avoid a busy loop:

> /*
>  * If kswapd scans pages marked for immediate
>  * reclaim and under writeback (nr_immediate), it implies
>  * that pages are cycling through the LRU faster than
>  * they are written so also forcibly stall.
>  */
> if (nr_immediate && current_may_throttle())
> 	congestion_wait(BLK_RW_ASYNC, HZ/10);


> And
> does this become a no-op if there is no backing device?

current->backing_dev_info exists (and helps to control writeback) even
if there is no "real" backing device.
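
To make the two thresholds concrete, here is a toy model of the bookkeeping discussed in this message. It is a sketch under assumptions, not the kernel implementation: the class and method names are hypothetical, the values 12 and 9 are assumed for illustration, and whether the comparison is "reaches" or "exceeds" is also an assumption.

```python
# Toy model of background-request accounting; not the kernel implementation.
class BackgroundCounter:
    def __init__(self, max_background=12, congestion_threshold=9):
        # Both values are assumptions chosen for illustration.
        self.max_background = max_background
        self.congestion_threshold = congestion_threshold
        self.num_background = 0

    def start(self):
        """Account for one more background request being queued."""
        self.num_background += 1

    def finish(self):
        """Account for one background request completing."""
        self.num_background -= 1

    @property
    def congested(self):
        # Stands in for the bdi being marked congested.
        return self.num_background >= self.congestion_threshold

    @property
    def blocked(self):
        # Stands in for new background requests having to wait.
        return self.num_background >= self.max_background


counter = BackgroundCounter()
for _ in range(9):
    counter.start()
print(counter.congested, counter.blocked)  # True False
```

In this model the congestion flag is only a hint that trips before the hard limit is hit, which matches the description of it as something the reclaimer can use to back off.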

Thanks,
Maxim

2016-11-16 19:19:31

by Nikolaus Rath

Subject: Re: [fuse-devel] fuse: max_background and congestion_threshold settings

Hi Maxim,

On Nov 15 2016, Maxim Patlasov wrote:
> On 11/15/2016 08:18 AM, Nikolaus Rath wrote:
>> Could someone explain to me the meaning of the max_background and
>> congestion_threshold settings of the fuse module?
>>
>> At first I assumed that max_background specifies the maximum number of
>> pending requests (i.e., requests that have been send to userspace but
>> for which no reply was received yet). But looking at fs/fuse/dev.c, it
>> looks as if not every request is included in this number.
>
> fuse uses max_background for cases where the total number of
> simultaneous requests of given type is not limited by some other
> natural means. AFAIU, these cases are: 1) async processing of direct
> IO; 2) read-ahead. As an example of "natural" limitation: when
> userspace process blocks on a sync direct IO read/write, the number of
> requests fuse consumed is limited by the number of such processes
> (actually their threads). In contrast, if userspace requests 1GB
> direct IO read/write, it would be unreasonable to issue 1GB/128K==8192
> fuse requests simultaneously. That's where max_background steps in.

Ah, that makes sense. Are these two cases meant as examples, or is that
an exhaustive list? Because I would have thought that other cases should
be writing of cached data (when writeback caching is enabled), and
asynchronous I/O from userspace...?

Also, I am not sure what you mean by async processing of direct
I/O. Shouldn't direct I/O always go directly to the file-system? If so,
how can it be processed asynchronously?

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

2016-11-16 20:19:57

by Nikolaus Rath

Subject: Re: [fuse-devel] fuse: max_background and congestion_threshold settings

On Nov 16 2016, Maxim Patlasov wrote:
> On 11/16/2016 11:19 AM, Nikolaus Rath wrote:
>
>> Hi Maxim,
>>
>> On Nov 15 2016, Maxim Patlasov <[email protected]> wrote:
>>> On 11/15/2016 08:18 AM, Nikolaus Rath wrote:
>>>> Could someone explain to me the meaning of the max_background and
>>>> congestion_threshold settings of the fuse module?
>>>>
>>>> At first I assumed that max_background specifies the maximum number of
>>>> pending requests (i.e., requests that have been send to userspace but
>>>> for which no reply was received yet). But looking at fs/fuse/dev.c, it
>>>> looks as if not every request is included in this number.
>>> fuse uses max_background for cases where the total number of
>>> simultaneous requests of given type is not limited by some other
>>> natural means. AFAIU, these cases are: 1) async processing of direct
>>> IO; 2) read-ahead. As an example of "natural" limitation: when
>>> userspace process blocks on a sync direct IO read/write, the number of
>>> requests fuse consumed is limited by the number of such processes
>>> (actually their threads). In contrast, if userspace requests 1GB
>>> direct IO read/write, it would be unreasonable to issue 1GB/128K==8192
>>> fuse requests simultaneously. That's where max_background steps in.
>> Ah, that makes sense. Are these two cases meant as examples, or is that
>> an exhaustive list? Because I would have thought that other cases should
>> be writing of cached data (when writeback caching is enabled), and
>> asynchronous I/O from userspace...?
>
> I think that's exhaustive list, but I can miss something.
>
> As for writing of cached data, that definitely doesn't go through
> background requests. Here we rely on flusher: fuse will allocate as
> many requests as the flusher wants to writeback.
>
> Buffered AIO READs actually block in submit_io until fully
> processed. So it's just another example of "natural" limitation I told
> above.

Not sure I understand. What is it that's blocking? It can't be the
userspace process, because then it wouldn't be asynchronous I/O...

>> Also, I am not sure what you mean with async processing of direct
>> I/O. Shouldn't direct I/O always go directly to the file-system? If so,
>> how can it be processed asynchronously?
>
> That's a nice optimization we implemented a few years ago: having
> incoming sync direct IO request of 1MB size, kernel fuse splits it
> into eight 128K requests and starts processing them in async manner,
> waiting for the completion of all of them before completing that
> incoming 1MB requests.

I see. But why isn't that also done for regular (non-direct) IO?

Thanks,
-Nikolaus
--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

2016-11-16 22:14:19

by Maxim Patlasov

Subject: Re: [fuse-devel] fuse: max_background and congestion_threshold settings

On 11/16/2016 12:19 PM, Nikolaus Rath wrote:

> On Nov 16 2016, Maxim Patlasov <[email protected]> wrote:
>> On 11/16/2016 11:19 AM, Nikolaus Rath wrote:
>>
>>> Hi Maxim,
>>>
>>> On Nov 15 2016, Maxim Patlasov <[email protected]> wrote:
>>>> On 11/15/2016 08:18 AM, Nikolaus Rath wrote:
>>>>> Could someone explain to me the meaning of the max_background and
>>>>> congestion_threshold settings of the fuse module?
>>>>>
>>>>> At first I assumed that max_background specifies the maximum number of
>>>>> pending requests (i.e., requests that have been send to userspace but
>>>>> for which no reply was received yet). But looking at fs/fuse/dev.c, it
>>>>> looks as if not every request is included in this number.
>>>> fuse uses max_background for cases where the total number of
>>>> simultaneous requests of given type is not limited by some other
>>>> natural means. AFAIU, these cases are: 1) async processing of direct
>>>> IO; 2) read-ahead. As an example of "natural" limitation: when
>>>> userspace process blocks on a sync direct IO read/write, the number of
>>>> requests fuse consumed is limited by the number of such processes
>>>> (actually their threads). In contrast, if userspace requests 1GB
>>>> direct IO read/write, it would be unreasonable to issue 1GB/128K==8192
>>>> fuse requests simultaneously. That's where max_background steps in.
>>> Ah, that makes sense. Are these two cases meant as examples, or is that
>>> an exhaustive list? Because I would have thought that other cases should
>>> be writing of cached data (when writeback caching is enabled), and
>>> asynchronous I/O from userspace...?
>> I think that's exhaustive list, but I can miss something.
>>
>> As for writing of cached data, that definitely doesn't go through
>> background requests. Here we rely on flusher: fuse will allocate as
>> many requests as the flusher wants to writeback.
>>
>> Buffered AIO READs actually block in submit_io until fully
>> processed. So it's just another example of "natural" limitation I told
>> above.
> Not sure I understand. What is it that's blocking? It can't be the
> userspace process, because then it wouldn't be asynchronous I/O...

Surprise! Alas, the Linux kernel does NOT process buffered AIO reads in
an async manner. You can verify it yourself by strace-ing a simple program
looping over io_submit + io_getevents: for direct IO (as expected)
io_submit returns immediately while io_getevents waits for the actual IO; in
contrast, for buffered IO (surprisingly) io_submit waits for the actual IO
while io_getevents returns immediately. Presumably, people are supposed
to use mmap-ed reads/writes rather than buffered AIO.


>
>>> Also, I am not sure what you mean with async processing of direct
>>> I/O. Shouldn't direct I/O always go directly to the file-system? If so,
>>> how can it be processed asynchronously?
>> That's a nice optimization we implemented a few years ago: having
>> incoming sync direct IO request of 1MB size, kernel fuse splits it
>> into eight 128K requests and starts processing them in async manner,
>> waiting for the completion of all of them before completing that
>> incoming 1MB requests.
> I see. But why isn't that also done for regular (non-direct) IO?

Regular READs are helped by async read-ahead. Regular writes go through
the writeback mechanics: the flusher calls fuse_writepages() and the latter
submits as many async write requests as needed. Everything looks fine
(but, as I wrote, those async requests are not under fuse max_background
control).

Thanks,
Maxim


>
> Thanks,
> -Nikolaus

2016-11-17 06:53:39

by Maxim Patlasov

Subject: Re: [fuse-devel] fuse: max_background and congestion_threshold settings

On 11/16/2016 11:19 AM, Nikolaus Rath wrote:

> Hi Maxim,
>
> On Nov 15 2016, Maxim Patlasov <[email protected]> wrote:
>> On 11/15/2016 08:18 AM, Nikolaus Rath wrote:
>>> Could someone explain to me the meaning of the max_background and
>>> congestion_threshold settings of the fuse module?
>>>
>>> At first I assumed that max_background specifies the maximum number of
>>> pending requests (i.e., requests that have been send to userspace but
>>> for which no reply was received yet). But looking at fs/fuse/dev.c, it
>>> looks as if not every request is included in this number.
>> fuse uses max_background for cases where the total number of
>> simultaneous requests of given type is not limited by some other
>> natural means. AFAIU, these cases are: 1) async processing of direct
>> IO; 2) read-ahead. As an example of "natural" limitation: when
>> userspace process blocks on a sync direct IO read/write, the number of
>> requests fuse consumed is limited by the number of such processes
>> (actually their threads). In contrast, if userspace requests 1GB
>> direct IO read/write, it would be unreasonable to issue 1GB/128K==8192
>> fuse requests simultaneously. That's where max_background steps in.
> Ah, that makes sense. Are these two cases meant as examples, or is that
> an exhaustive list? Because I would have thought that other cases should
> be writing of cached data (when writeback caching is enabled), and
> asynchronous I/O from userspace...?

I think that's an exhaustive list, but I may be missing something.

As for writing of cached data, that definitely doesn't go through
background requests. Here we rely on the flusher: fuse will allocate as many
requests as the flusher wants to write back.

Buffered AIO READs actually block in io_submit until fully processed. So
it's just another example of the "natural" limitation I described above.
Buffered AIO WRITEs go through the writeback mechanics anyway, so here again
we rely on the flusher behaving reasonably. And finally, direct AIO does go
through fuse background requests, as I wrote: "1) async processing of direct IO;"

>
> Also, I am not sure what you mean with async processing of direct
> I/O. Shouldn't direct I/O always go directly to the file-system? If so,
> how can it be processed asynchronously?

That's a nice optimization we implemented a few years ago: given an
incoming sync direct IO request of 1MB size, kernel fuse splits it into
eight 128K requests and starts processing them in an async manner, waiting
for the completion of all of them before completing the incoming 1MB
request. This boosts performance tremendously if the userspace fuse daemon
is able to efficiently process many requests "in parallel". This
optimization is implemented using background fuse requests. Otherwise,
given a 1GB incoming request, we would obediently allocate 8K fuse
requests in one shot -- too dangerous and not good for latency.
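
The "split, process in parallel, wait for all" pattern described above can be illustrated in userspace like this. It is only a sketch: it does not talk to fuse, process_chunk is a hypothetical stand-in for the daemon handling one request, and the thread pool size merely plays the role of max_background (12 is an assumed value).

```python
# Userspace illustration of "split, run in parallel, wait for all".
# It does not talk to fuse; process_chunk stands in for the fuse daemon.
from concurrent.futures import ThreadPoolExecutor

CHUNK = 128 * 1024


def split(offset, length, chunk=CHUNK):
    """Yield (offset, size) pieces covering [offset, offset + length)."""
    end = offset + length
    while offset < end:
        size = min(chunk, end - offset)
        yield offset, size
        offset += size


def process_chunk(piece):
    # Hypothetical stand-in for the daemon handling one request.
    _offset, size = piece
    return size


def direct_io(offset, length, max_background=12):
    """Process one large request as capped parallel chunks.

    Returns (number of chunks, total bytes processed).
    """
    pieces = list(split(offset, length))
    # The pool size caps how many chunks are in flight at once.
    with ThreadPoolExecutor(max_workers=max_background) as pool:
        # Summing the results consumes them all, so every chunk has
        # completed before the large request is reported as done.
        done = sum(pool.map(process_chunk, pieces))
    return len(pieces), done


print(direct_io(0, 1024 * 1024))  # (8, 1048576)
```

With a 1GB length the same function would produce 8192 pieces, but never more than max_background of them would be running at once.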

Thanks,
Maxim

2016-11-22 22:45:41

by Nikolaus Rath

Subject: Re: [fuse-devel] fuse: max_background and congestion_threshold settings

On Nov 16 2016, Maxim Patlasov wrote:
> On 11/16/2016 12:19 PM, Nikolaus Rath wrote:
>
>> On Nov 16 2016, Maxim Patlasov <[email protected]> wrote:
>>> On 11/16/2016 11:19 AM, Nikolaus Rath wrote:
>>>
>>>> Hi Maxim,
>>>>
>>>> On Nov 15 2016, Maxim Patlasov <[email protected]> wrote:
>>>>> On 11/15/2016 08:18 AM, Nikolaus Rath wrote:
>>>>>> Could someone explain to me the meaning of the max_background and
>>>>>> congestion_threshold settings of the fuse module?
>>>>>>
>>>>>> At first I assumed that max_background specifies the maximum number of
>>>>>> pending requests (i.e., requests that have been send to userspace but
>>>>>> for which no reply was received yet). But looking at fs/fuse/dev.c, it
>>>>>> looks as if not every request is included in this number.
>>>>> fuse uses max_background for cases where the total number of
>>>>> simultaneous requests of given type is not limited by some other
>>>>> natural means. AFAIU, these cases are: 1) async processing of direct
>>>>> IO; 2) read-ahead. As an example of "natural" limitation: when
>>>>> userspace process blocks on a sync direct IO read/write, the number of
>>>>> requests fuse consumed is limited by the number of such processes
>>>>> (actually their threads). In contrast, if userspace requests 1GB
>>>>> direct IO read/write, it would be unreasonable to issue 1GB/128K==8192
>>>>> fuse requests simultaneously. That's where max_background steps in.
>>>> Ah, that makes sense. Are these two cases meant as examples, or is that
>>>> an exhaustive list? Because I would have thought that other cases should
>>>> be writing of cached data (when writeback caching is enabled), and
>>>> asynchronous I/O from userspace...?
>>> I think that's exhaustive list, but I can miss something.
>>>
>>> As for writing of cached data, that definitely doesn't go through
>>> background requests. Here we rely on flusher: fuse will allocate as
>>> many requests as the flusher wants to writeback.
>>>
>>> Buffered AIO READs actually block in submit_io until fully
>>> processed. So it's just another example of "natural" limitation I told
>>> above.
>> Not sure I understand. What is it that's blocking? It can't be the
>> userspace process, because then it wouldn't be asynchronous I/O...
>
> Surprise! Alas, Linux kernel does NOT process buffered AIO reads in
> async manner. You can verify it yourself by strace-ing a simple
> program looping over io_submit + io_getevents: for direct IO (as
> expected) io_submit returns immediately while io_getevents waits for
> actual IO; in contrast, for buffered IO (surprisingly) io_submit waits
> for actual IO while io_getevents returns immediately. Presumably,
> people are supposed to use mmap-ed read/writes rather than buffered
> AIO.

What about buffered, asynchronous writes when writeback cache is
disabled? It sounds as if io_submit does not block (so userspace could
create an unlimited number), nor can the kernel coalesce them (since
writeback caching is disabled).

Thanks!
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

2016-11-22 23:44:34

by Nikolaus Rath

Subject: Re: [fuse-devel] fuse: max_background and congestion_threshold settings

Hi Maxim,

On Nov 22 2016, Maxim Patlasov wrote:
>>>>>>>> Could someone explain to me the meaning of the max_background and
>>>>>>>> congestion_threshold settings of the fuse module?
>>>>>>>>
>>>>>>>> At first I assumed that max_background specifies the maximum number of
>>>>>>>> pending requests (i.e., requests that have been send to userspace but
>>>>>>>> for which no reply was received yet). But looking at fs/fuse/dev.c, it
>>>>>>>> looks as if not every request is included in this number.
>>>>>>> fuse uses max_background for cases where the total number of
>>>>>>> simultaneous requests of given type is not limited by some other
>>>>>>> natural means. AFAIU, these cases are: 1) async processing of direct
>>>>>>> IO; 2) read-ahead. As an example of "natural" limitation: when
>>>>>>> userspace process blocks on a sync direct IO read/write, the number of
>>>>>>> requests fuse consumed is limited by the number of such processes
>>>>>>> (actually their threads). In contrast, if userspace requests 1GB
>>>>>>> direct IO read/write, it would be unreasonable to issue 1GB/128K==8192
>>>>>>> fuse requests simultaneously. That's where max_background steps in.
>>>>>> Ah, that makes sense. Are these two cases meant as examples, or is that
>>>>>> an exhaustive list? Because I would have thought that other cases should
>>>>>> be writing of cached data (when writeback caching is enabled), and
>>>>>> asynchronous I/O from userspace...?
>>>>> I think that's exhaustive list, but I can miss something.
>>>>>
>>>>> As for writing of cached data, that definitely doesn't go through
>>>>> background requests. Here we rely on flusher: fuse will allocate as
>>>>> many requests as the flusher wants to writeback.
>>>>>
>>>>> Buffered AIO READs actually block in submit_io until fully
>>>>> processed. So it's just another example of "natural" limitation I told
>>>>> above.
>>>> Not sure I understand. What is it that's blocking? It can't be the
>>>> userspace process, because then it wouldn't be asynchronous I/O...
>>> Surprise! Alas, Linux kernel does NOT process buffered AIO reads in
>>> async manner. You can verify it yourself by strace-ing a simple
>>> program looping over io_submit + io_getevents: for direct IO (as
>>> expected) io_submit returns immediately while io_getevents waits for
>>> actual IO; in contrast, for buffered IO (surprisingly) io_submit waits
>>> for actual IO while io_getevents returns immediately. Presumably,
>>> people are supposed to use mmap-ed read/writes rather than buffered
>>> AIO.
>> What about buffered, asynchronous writes when writeback cache is
>> disabled? It sounds as if io_submit does not block (so userspace could
>> create an unlimited number), nor can the kernel coalesce them (since
>> writeback caching is disabled).
>
> I've never looked closely at it. Do you have a particular use case or
> concern?

My only concern is to accurately describe the effects of the
"max_background" parameter in the libfuse documentation.

At the moment most FUSE filesystems don't use writeback caching (because
there is no stable libfuse release out that supports it). On the other
hand, most filesystems are probably also not too worried about the
behavior when userspace submits a large number of asynchronous write
requests. But I think it would still be important to correctly describe
this case. If io_submit does not block, and the request does not count
as a background request, wouldn't this be a bug that should be fixed? Or
is there anything else that would limit the number of such requests?

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

2016-11-23 04:35:05

by Maxim Patlasov

Subject: Re: [fuse-devel] fuse: max_background and congestion_threshold settings

On 11/22/2016 02:45 PM, Nikolaus Rath wrote:

> On Nov 16 2016, Maxim Patlasov <[email protected]> wrote:
>> On 11/16/2016 12:19 PM, Nikolaus Rath wrote:
>>
>>> On Nov 16 2016, Maxim Patlasov <[email protected]> wrote:
>>>> On 11/16/2016 11:19 AM, Nikolaus Rath wrote:
>>>>
>>>>> Hi Maxim,
>>>>>
>>>>> On Nov 15 2016, Maxim Patlasov <[email protected]> wrote:
>>>>>> On 11/15/2016 08:18 AM, Nikolaus Rath wrote:
>>>>>>> Could someone explain to me the meaning of the max_background and
>>>>>>> congestion_threshold settings of the fuse module?
>>>>>>>
>>>>>>> At first I assumed that max_background specifies the maximum number of
>>>>>>> pending requests (i.e., requests that have been send to userspace but
>>>>>>> for which no reply was received yet). But looking at fs/fuse/dev.c, it
>>>>>>> looks as if not every request is included in this number.
>>>>>> fuse uses max_background for cases where the total number of
>>>>>> simultaneous requests of given type is not limited by some other
>>>>>> natural means. AFAIU, these cases are: 1) async processing of direct
>>>>>> IO; 2) read-ahead. As an example of "natural" limitation: when
>>>>>> userspace process blocks on a sync direct IO read/write, the number of
>>>>>> requests fuse consumed is limited by the number of such processes
>>>>>> (actually their threads). In contrast, if userspace requests 1GB
>>>>>> direct IO read/write, it would be unreasonable to issue 1GB/128K==8192
>>>>>> fuse requests simultaneously. That's where max_background steps in.
>>>>> Ah, that makes sense. Are these two cases meant as examples, or is that
>>>>> an exhaustive list? Because I would have thought that other cases should
>>>>> be writing of cached data (when writeback caching is enabled), and
>>>>> asynchronous I/O from userspace...?
>>>> I think that's exhaustive list, but I can miss something.
>>>>
>>>> As for writing of cached data, that definitely doesn't go through
>>>> background requests. Here we rely on flusher: fuse will allocate as
>>>> many requests as the flusher wants to writeback.
>>>>
>>>> Buffered AIO READs actually block in submit_io until fully
>>>> processed. So it's just another example of "natural" limitation I told
>>>> above.
>>> Not sure I understand. What is it that's blocking? It can't be the
>>> userspace process, because then it wouldn't be asynchronous I/O...
>> Surprise! Alas, Linux kernel does NOT process buffered AIO reads in
>> async manner. You can verify it yourself by strace-ing a simple
>> program looping over io_submit + io_getevents: for direct IO (as
>> expected) io_submit returns immediately while io_getevents waits for
>> actual IO; in contrast, for buffered IO (surprisingly) io_submit waits
>> for actual IO while io_getevents returns immediately. Presumably,
>> people are supposed to use mmap-ed read/writes rather than buffered
>> AIO.
> What about buffered, asynchronous writes when writeback cache is
> disabled? It sounds as if io_submit does not block (so userspace could
> create an unlimited number), nor can the kernel coalesce them (since
> writeback caching is disabled).

I've never looked closely at it. Do you have a particular use case or
concern?


>
> Thanks!
> -Nikolaus
>