Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

After reading this article which states :

After a device finishes its job , (IO operation)- it notifies the CPU via interrupt.

... ... ...

However, that “completion” status only exists at the OS level; the process has its own memory space that must be notified

... ... ...

Since the library/BCL is using the standard P/Invoke overlapped I/O system, it has already registered the handle with the I/O Completion Port (IOCP), which is part of the thread pool.

... ... ...

So an I/O thread pool thread is borrowed briefly to execute the APC, which notifies the task that it’s complete.

I was interesting about the bold part :

If I understood correctly , after the the IO operation is finished , it has to notify to the actual process which executed the IO operation.

Question #1:

Does it mean that it grabs a new thread pool thread for each completed IO operation ? Or is it a dedicated number of threads for this ?

Question #2:

Looking at :

for (int i=0;i<1000;i++)
    {
      PingAsync_NOT_AWAITED(i); //notice not awaited !
    }

Does it mean that I'll have 1000 IOCP threadpool thread simultaneously ( sort of) running here , when all are finished ?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
191 views
Welcome To Ask or Share your Answers For Others

1 Answer

Does it mean that it grabs a new thread pool thread for each completed IO operation ? Or is it a dedicated number of threads for this ?

It would be terribly inefficient to create a new thread for every single I/O request, to the point of defeating the purpose. Instead, the runtime starts off with a small number of threads (the exact number depends on your environment) and adds and removes worker threads as necessary (the exact algorithm for this likewise varies with your environment). Ever major version of .NET has seen changes in this implementation, but the basic idea stays the same: the runtime does its best to create and maintain only as many threads as are necessary to service all I/O efficiently. On my system (Windows 8.1, .NET 4.5.2) a brand new console application has only 3 threads in the process on entering Main, and this number doesn't increase until actual work is requested.

Does it mean that I'll have 1000 IOCP threadpool thread simultaneously ( sort of) running here , when all are finished ?

No. When you issue an I/O request, a thread will be waiting on a completion port to get the result and call whatever callback was registered to handle the result (be it via a BeginXXX method or as the continuation of a task). If you use a task and don't await it, that task simply ends there and the thread is returned to the thread pool.

What if you did await it? The results of 1000 I/O requests won't really arrive all at the same time, since interrupts don't all arrive at the same time, but let's say the interval is much shorter than the time we need to process them. In that case, the thread pool will keep spinning up threads to handle the results until it reaches a maximum, and any further requests will end up queueing on the completion port. Depending on how you configure it, those threads may take some time to spin up.

Consider the following (deliberately awful) toy program:

static void Main(string[] args) {
    printThreadCounts();
    var buffer = new byte[1024];
    const int requestCount = 30;
    int pendingRequestCount = requestCount;
    for (int i = 0; i != requestCount; ++i) {
        var stream = new FileStream(
            @"C:Windowswin.ini",
            FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 
            buffer.Length, FileOptions.Asynchronous
        );
        stream.BeginRead(
            buffer, 0, buffer.Length,
            delegate {
                Interlocked.Decrement(ref pendingRequestCount);
                Thread.Sleep(Timeout.Infinite);
            }, null
        );
    }
    do {
        printThreadCounts();
        Thread.Sleep(1000);
    } while (Thread.VolatileRead(ref pendingRequestCount) != 0);
    Console.WriteLine(new String('=', 40));
    printThreadCounts();
}

private static void printThreadCounts() {
    int completionPortThreads, maxCompletionPortThreads;
    int workerThreads, maxWorkerThreads;
    ThreadPool.GetMaxThreads(out maxWorkerThreads, out maxCompletionPortThreads);
    ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
    Console.WriteLine(
        "Worker threads: {0}, Completion port threads: {1}, Total threads: {2}", 
        maxWorkerThreads - workerThreads, 
        maxCompletionPortThreads - completionPortThreads, 
        Process.GetCurrentProcess().Threads.Count
    );
}

On my system (which has 8 logical processors), the output is as follows (results may vary on your system):

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 0, Completion port threads: 8, Total threads: 12
Worker threads: 0, Completion port threads: 9, Total threads: 13
Worker threads: 0, Completion port threads: 11, Total threads: 15
Worker threads: 0, Completion port threads: 13, Total threads: 17
Worker threads: 0, Completion port threads: 15, Total threads: 19
Worker threads: 0, Completion port threads: 17, Total threads: 21
Worker threads: 0, Completion port threads: 19, Total threads: 23
Worker threads: 0, Completion port threads: 21, Total threads: 25
Worker threads: 0, Completion port threads: 23, Total threads: 27
Worker threads: 0, Completion port threads: 25, Total threads: 29
Worker threads: 0, Completion port threads: 27, Total threads: 31
Worker threads: 0, Completion port threads: 29, Total threads: 33
========================================
Worker threads: 0, Completion port threads: 30, Total threads: 34

When we issue 30 asynchronous requests, the thread pool quickly makes 8 threads available to handle the results, but after that it only spins up new threads at a leisurely pace of about 2 per second. This demonstrates that if you want to properly utilize system resources, you'd better make sure that your I/O processing completes quickly. Indeed, let's change our delegate to the following, which represents "proper" processing of the request:

stream.BeginRead(
    buffer, 0, buffer.Length,
    ar => {
        stream.EndRead(ar);
        Interlocked.Decrement(ref pendingRequestCount);
    }, null
);

Result:

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 0, Completion port threads: 1, Total threads: 11
========================================
Worker threads: 0, Completion port threads: 0, Total threads: 11

Again, results may vary on your system and across runs. Here we barely glimpse the completion port threads in action while the 30 requests we issued are completed without spinning up new threads. You should find that you can change "30" to "100" or even "100000": our loop can't start requests faster than they complete. Note, however, that the results are skewed heavily in our favor because the "I/O" is reading the same bytes over and over and is going to be serviced from the operating system cache and not by reading from a disk. This isn't meant to demonstrate realistic throughput, of course, only the difference in overhead.

To repeat these results with worker threads rather than completion port threads, simply change FileOptions.Asynchronous to FileOptions.None. This makes file access synchronous and the asynchronous operations will be completed on worker threads rather than using the completion port:

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 8, Completion port threads: 0, Total threads: 15
Worker threads: 9, Completion port threads: 0, Total threads: 16
Worker threads: 10, Completion port threads: 0, Total threads: 17
Worker threads: 11, Completion port threads: 0, Total threads: 18
Worker threads: 12, Completion port threads: 0, Total threads: 19
Worker threads: 13, Completion port threads: 0, Total threads: 20
Worker threads: 14, Completion port threads: 0, Total threads: 21
Worker threads: 15, Completion port threads: 0, Total threads: 22
Worker threads: 16, Completion port threads: 0, Total threads: 23
Worker threads: 17, Completion port threads: 0, Total threads: 24
Worker threads: 18, Completion port threads: 0, Total threads: 25
Worker threads: 19, Completion port threads: 0, Total threads: 26
Worker threads: 20, Completion port threads: 0, Total threads: 27
Worker threads: 21, Completion port threads: 0, Total threads: 28
Worker threads: 22, Completion port threads: 0, Total threads: 29
Worker threads: 23, Completion port threads: 0, Total threads: 30
Worker threads: 24, Completion port threads: 0, Total threads: 31
Worker threads: 25, Completion port threads: 0, Total threads: 32
Worker threads: 26, Completion port threads: 0, Total threads: 33
Worker threads: 27, Completion port threads: 0, Total threads: 34
Worker threads: 28, Completion port threads: 0, Total threads: 35
Worker threads: 29, Completion port threads: 0, Total threads: 36
========================================
Worker threads: 30, Completion port threads: 0, Total threads: 37

The thread pool spins up one worker thread per second rather than the two it started for completion port threads. Obviously these numbers are implementation-dependent and may change in new releases.

Finally, let's demonstrate the use of ThreadPool.SetMinThreads to ensure a minimum number of threads is available to complete requests. If we go back to FileOptions.Asynchronous and add ThreadPool.SetMinThreads(50, 50) to the Main of our toy program, the result is:

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 0, Completion port threads: 31, Total threads: 35
========================================
Worker threads: 0, Completion port threads: 30, Total threads: 35

Now, instead of patiently adding one thread every two seconds, the thread pool keeps spinning up threads until the maximum is reached (which doesn't happen in this case, so the final count stays at 30). Of course, all of these 30 threads are stuck in infinite waits -- but if this had been a real system, those 30 threads would now presumably be doing useful if not terribly efficient work. I wouldn't try this with 100000 requests, though.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...