I am currently experimenting with the STL data structures. However, I am still not sure when to use which one and when a certain combination makes sense. Currently I am trying to figure out when using a std::multimap makes sense. As far as I can see, one can easily build one's own multimap implementation by combining std::map and std::vector. So I am left with the question of when each of these data structures should be used.
- Simplicity: A std::multimap is definitely simpler to use, because one does not have to handle the additional nesting. However, to access a range of elements as a bulk, one might need to copy the data from the iterators into another data structure (for example a std::vector).
- Speed: The locality of the vector most likely makes iterating over the range of equal elements much faster, because cache usage is better. However, I am guessing that std::multimap implementations also use a lot of optimization tricks behind the scenes to make iterating over equal elements as fast as possible. Getting to the correct element range is probably also optimized for std::multimap.
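To make the difference in handling concrete, here is a minimal sketch of fetching all values for one key from each variant. The names `NestedMap`, `values_of` and `demo_values_of` are made up for this illustration and are not part of the standard library.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical alias for the hand-rolled variant: one vector of values per key.
using NestedMap = std::map<std::uint32_t, std::vector<std::uint64_t>>;

// std::multimap: equal_range yields an iterator pair covering all entries
// with this key. Getting the values as one contiguous block needs a copy.
std::vector<std::uint64_t> values_of(
        const std::multimap<std::uint32_t, std::uint64_t>& m, std::uint32_t key) {
    std::vector<std::uint64_t> out;
    auto range = m.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second);
    }
    return out;
}

// std::map of vectors: the values already sit in one vector, but the extra
// nesting level and the "key not present" case have to be handled by hand.
std::vector<std::uint64_t> values_of(const NestedMap& m, std::uint32_t key) {
    auto it = m.find(key);
    return it == m.end() ? std::vector<std::uint64_t>() : it->second;
}

// Small self-check: both containers hold the same logical content.
void demo_values_of() {
    std::multimap<std::uint32_t, std::uint64_t> mm;
    mm.emplace(7, 10);
    mm.emplace(7, 20);
    mm.emplace(9, 30);

    NestedMap nested;
    nested[7].push_back(10);
    nested[7].push_back(20);
    nested[9].push_back(30);

    assert(values_of(mm, 7) == values_of(nested, 7));
    assert(values_of(mm, 8).empty());
}
```

The multimap version never has to think about a missing key, while the nested version returns the stored vector directly once the key is found.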
In order to try out the speed issues I did some simple comparisons using the following program:
#include <stdint.h>
#include <stdlib.h>  // srand, rand
#include <time.h>    // clock, clock_t
#include <iostream>
#include <map>
#include <utility>
#include <vector>
typedef std::map<uint32_t, std::vector<uint64_t> > my_mumap_t;
const uint32_t num_partitions = 100000;
const size_t num_elements = 500000;
int main() {
    // Fixed seed so that every run uses the same key/value pairs.
    srand( 1337 );
    std::vector<std::pair<uint32_t,uint64_t>> values;
    for( size_t i = 0; i < num_elements; ++i ) {
        uint32_t key = rand() % num_partitions;
        uint64_t value = rand();
        values.push_back( std::make_pair( key, value ) );
    }
    clock_t start;
    clock_t stop;
    // Variant 1: std::multimap
    {
        start = clock();
        std::multimap< uint32_t, uint64_t > mumap;
        for( auto iter = values.begin(); iter != values.end(); ++iter ) {
            mumap.insert( *iter );
        }
        stop = clock();
        std::cout << "Filling std::multimap: " << stop - start << " ticks" << std::endl;
        // Sum the values of every partition via equal_range.
        std::vector<uint64_t> sums;
        start = clock();
        for( uint32_t i = 0; i < num_partitions; ++i ) {
            uint64_t sum = 0;
            auto range = mumap.equal_range( i );
            for( auto iter = range.first; iter != range.second; ++iter ) {
                sum += iter->second;
            }
            sums.push_back( sum );
        }
        stop = clock();
        std::cout << "Reading std::multimap: " << stop - start << " ticks" << std::endl;
    }
    // Variant 2: std::map of std::vector
    {
        start = clock();
        my_mumap_t mumap;
        for( auto iter = values.begin(); iter != values.end(); ++iter ) {
            mumap[ iter->first ].push_back( iter->second );
        }
        stop = clock();
        std::cout << "Filling my_mumap_t: " << stop - start << " ticks" << std::endl;
        // Sum the values of every partition by walking its vector.
        std::vector<uint64_t> sums;
        start = clock();
        for( uint32_t i = 0; i < num_partitions; ++i ) {
            uint64_t sum = 0;
            // Note: operator[] inserts an empty vector if key i is absent and
            // is called twice here; both costs are part of the measured time.
            auto range = std::make_pair( mumap[i].begin(), mumap[i].end() );
            for( auto iter = range.first; iter != range.second; ++iter ) {
                sum += *iter;
            }
            sums.push_back( sum );
        }
        stop = clock();
        std::cout << "Reading my_mumap_t: " << stop - start << " ticks" << std::endl;
    }
}
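A side note on the second read loop: operator[] on a std::map default-constructs and inserts an entry for a missing key, so that loop also measures an insertion for every partition that received no values. A minimal sketch of a read that leaves the map untouched, using find(), could look like this; the function names `sum_for_key` and `demo_sum_for_key` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <numeric>
#include <vector>

typedef std::map<std::uint32_t, std::vector<std::uint64_t> > my_mumap_t;

// Sums all values stored under `key` without modifying the map.
// find() returns end() for a missing key instead of inserting an empty vector.
std::uint64_t sum_for_key(const my_mumap_t& mumap, std::uint32_t key) {
    my_mumap_t::const_iterator it = mumap.find(key);
    if (it == mumap.end()) {
        return 0;
    }
    return std::accumulate(it->second.begin(), it->second.end(), std::uint64_t(0));
}

// Small self-check: looking up a missing key does not grow the map.
void demo_sum_for_key() {
    my_mumap_t m;
    m[3].push_back(4);
    m[3].push_back(5);
    assert(sum_for_key(m, 3) == 9);
    assert(sum_for_key(m, 42) == 0);
    assert(m.size() == 1);
}
```

Taking the map by const reference also makes the compiler reject an accidental operator[] inside the read path.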
As I suspected, the results depend mainly on the ratio between num_partitions and num_elements, so I am still at a loss here. Here are some example outputs:
For num_partitions = 100000 and num_elements = 1000000:
Filling std::multimap: 1440000 ticks
Reading std::multimap: 230000 ticks
Filling my_mumap_t: 1500000 ticks
Reading my_mumap_t: 170000 ticks
For num_partitions = 100000 and num_elements = 500000:
Filling std::multimap: 580000 ticks
Reading std::multimap: 150000 ticks
Filling my_mumap_t: 770000 ticks
Reading my_mumap_t: 140000 ticks
For num_partitions = 100000 and num_elements = 200000:
Filling std::multimap: 180000 ticks
Reading std::multimap: 90000 ticks
Filling my_mumap_t: 290000 ticks
Reading my_mumap_t: 130000 ticks
For num_partitions = 1000 and num_elements = 1000000:
Filling std::multimap: 970000 ticks
Reading std::multimap: 150000 ticks
Filling my_mumap_t: 710000 ticks
Reading my_mumap_t: 10000 ticks
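The numbers above are raw clock() ticks, which measure processor time and only become seconds after dividing by CLOCKS_PER_SEC. As an alternative, here is a rough sketch of a wall-clock timer, assuming C++11 <chrono> is available (the program already relies on auto); `time_ms` and `demo_time_ms` are hypothetical helper names.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// steady_clock is monotonic, so the measured duration is never negative.
template <typename Work>
double time_ms(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Small self-check: the callable runs exactly once.
void demo_time_ms() {
    int calls = 0;
    const double ms = time_ms([&calls] { ++calls; });
    assert(calls == 1);
    assert(ms >= 0.0);
}
```

Each start/stop pair in the program could then be replaced by one call that wraps the fill or read loop in a lambda.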
I am unsure how to interpret these results. How would you go about deciding on the correct data structure? Are there any additional constraints on the decision that I might have missed?
Question from: https://stackoverflow.com/questions/8342445/when-does-using-a-stdmultimap-make-sense