I feel like I'm missing something here...

I slightly altered some code to change it from using std::thread to std::async and noticed a substantial performance increase. I wrote up a simple test which I assumed would perform nearly identically whether it uses std::thread or std::async.

std::atomic<int> someCount = 0;
const int THREADS = 200;
std::vector<std::thread> threadVec(THREADS);
std::vector<std::future<void>> futureVec(THREADS);

// Work item: bump the shared counter 100 times.
auto lam = [&]()
{
    for (int i = 0; i < 100; ++i)
        someCount++;
};

// Version 1: one std::thread per task.
for (int i = 0; i < THREADS; ++i)
    threadVec[i] = std::thread(lam);
for (int i = 0; i < THREADS; ++i)
    threadVec[i].join();

// Version 2: one std::async task per work item.
for (int i = 0; i < THREADS; ++i)
    futureVec[i] = std::async(std::launch::async, lam);
for (int i = 0; i < THREADS; ++i)
    futureVec[i].get();

I didn't get too deep into analysis, but some preliminary results suggested that the std::async code ran around 10x faster. The results varied slightly with optimizations off, and I also tried switching the execution order of the two versions.
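
For reference, a minimal timing harness along these lines (a sketch using std::chrono::steady_clock, not the exact code from my test run) is how the two variants can be compared; the absolute numbers will of course vary by machine and compiler:

#include <atomic>
#include <chrono>
#include <future>
#include <iostream>
#include <thread>
#include <vector>

int main()
{
    std::atomic<int> someCount{0};
    const int THREADS = 200;
    auto lam = [&]() {
        for (int i = 0; i < 100; ++i)
            someCount++;
    };

    // Time the std::thread version.
    auto t0 = std::chrono::steady_clock::now();
    {
        std::vector<std::thread> threadVec(THREADS);
        for (int i = 0; i < THREADS; ++i)
            threadVec[i] = std::thread(lam);
        for (int i = 0; i < THREADS; ++i)
            threadVec[i].join();
    }
    auto t1 = std::chrono::steady_clock::now();

    // Time the std::async version.
    {
        std::vector<std::future<void>> futureVec(THREADS);
        for (int i = 0; i < THREADS; ++i)
            futureVec[i] = std::async(std::launch::async, lam);
        for (int i = 0; i < THREADS; ++i)
            futureVec[i].get();
    }
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "std::thread: " << ms(t1 - t0).count() << " ms\n"
              << "std::async:  " << ms(t2 - t1).count() << " ms\n";
}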

Is this some Visual Studio compiler issue? Or is there some deeper implementation issue I'm overlooking that would account for this performance difference? I thought that std::async was a wrapper around the std::thread calls?


Also, given these differences, what would be the way to get the best performance here? (std::thread and std::async are not the only ways to create threads.)

What if I wanted detached threads? (std::async can't do that, as far as I'm aware.)
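
For context, detaching a plain std::thread looks like the sketch below. As far as I can tell, the future returned by std::async(std::launch::async, ...) blocks in its destructor until the task finishes, so there is no real equivalent of detach there.

#include <chrono>
#include <iostream>
#include <thread>

int main()
{
    // A detached thread keeps running on its own; the std::thread object
    // no longer owns it and must not be joined afterwards.
    std::thread worker([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        std::cout << "background work done\n";
    });
    worker.detach();

    // Give the detached thread time to finish before main() exits,
    // since nothing else will wait for it.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}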


1 Answer

When you use std::async you are not necessarily creating new threads; instead, the implementation can reuse threads from a thread pool. Creating and destroying a thread is a very expensive operation, on the order of 200,000 CPU cycles on Windows. On top of that, remember that having many more threads than CPU cores means the operating system has to spend more time creating them and scheduling them onto the available cores.
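
To illustrate the oversubscription point, a common alternative is to launch only about std::thread::hardware_concurrency() threads and split the same total work among them. The sketch below (the chunking scheme is just illustrative, not part of the measured code) does exactly that:

#include <algorithm>
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main()
{
    std::atomic<int> someCount{0};
    const int TASKS = 200;          // same total work as 200 threads x 100 increments
    const int ITERS_PER_TASK = 100;

    // Use roughly one thread per core instead of one thread per task.
    unsigned numThreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    workers.reserve(numThreads);

    for (unsigned t = 0; t < numThreads; ++t)
    {
        workers.emplace_back([&, t] {
            // Each worker handles every numThreads-th task.
            for (int task = static_cast<int>(t); task < TASKS; task += static_cast<int>(numThreads))
                for (int i = 0; i < ITERS_PER_TASK; ++i)
                    someCount++;
        });
    }
    for (auto& w : workers)
        w.join();

    std::cout << "someCount = " << someCount << '\n';  // expect 20000
}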

UPDATE: To show that the number of threads used with std::async is much smaller than with std::thread, I modified the test code to count the number of unique thread ids used in each case, as shown below. On my PC this gives:

Number of threads used running std::threads = 200
Number of threads used to run std::async = 4

The number of threads used by std::async varies between 2 and 4 on my PC, which basically means std::async reuses threads instead of creating new ones every time. Curiously, if I increase the computing time of the lambda by replacing 100 with 1,000,000 iterations in the for loop, the number of async threads goes up to 9, while the raw-thread version always reports 200. It is also worth keeping in mind that "Once a thread has finished, the value of std::thread::id may be reused by another thread."

Here is the testing code:

#include <atomic>
#include <vector>
#include <future>
#include <thread>
#include <mutex>
#include <unordered_set>
#include <iostream>

int main()
{
    std::atomic<int> someCount = 0;
    const int THREADS = 200;
    std::vector<std::thread> threadVec(THREADS);
    std::vector<std::future<void>> futureVec(THREADS);

    std::unordered_set<std::thread::id> uniqueThreadIdsAsync;
    std::unordered_set<std::thread::id> uniqueThreadsIdsThreads;
    std::mutex mutex;

    // Work item: bump the shared counter, then record which thread ran it.
    auto lam = [&](bool isAsync)
    {
        for (int i = 0; i < 100; ++i)
            someCount++;

        auto threadId = std::this_thread::get_id();
        if (isAsync)
        {
            std::lock_guard<std::mutex> lg(mutex);
            uniqueThreadIdsAsync.insert(threadId);
        }
        else
        {
            std::lock_guard<std::mutex> lg(mutex);
            uniqueThreadsIdsThreads.insert(threadId);
        }
    };

    // Version 1: one raw std::thread per task.
    for (int i = 0; i < THREADS; ++i)
        threadVec[i] = std::thread(lam, false);

    for (int i = 0; i < THREADS; ++i)
        threadVec[i].join();
    std::cout << "Number of threads used running std::threads = " << uniqueThreadsIdsThreads.size() << std::endl;

    // Version 2: std::async with the default launch policy (launch::async | launch::deferred).
    for (int i = 0; i < THREADS; ++i)
        futureVec[i] = std::async(lam, true);
    for (int i = 0; i < THREADS; ++i)
        futureVec[i].get();
    std::cout << "Number of threads used to run std::async = " << uniqueThreadIdsAsync.size() << std::endl;
}
