Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

The normal way to map a function in a numpy.narray like np.array[map(some_func,x)] or vectorize(f)(x) can't provide an index. The following code is just a simple example that is commonly seen in many applications.

dis_mat = np.zeros([feature_mat.shape[0], feature_mat.shape[0]])

for i in range(feature_mat.shape[0]):
    for j in range(i, feature_mat.shape[0]):
        dis_mat[i, j] = np.linalg.norm(
            feature_mat[i, :] - feature_mat[j, :]
        )
        dis_mat[j, i] = dis_mat[i, j]

Is there a way to speed it up?


Thank you for your help! The quickest way to speed up this code is this, using the function that @user2357112 commented about:

    from scipy.spatial.distance import pdist,squareform
    dis_mat = squareform(pdist(feature_mat))

@Julien's method is also good if feature_mat is small, but when the feature_mat is 1000 by 2000, then it needs nearly 40 GB of memory.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
624 views
Welcome To Ask or Share your Answers For Others

1 Answer

SciPy comes with a function specifically to compute the kind of pairwise distances you're computing. It's scipy.spatial.distance.pdist, and it produces the distances in a condensed format that basically only stores the upper triangle of the distance matrix, but you can convert the result to square form with scipy.spatial.distance.squareform:

from scipy.spatial.distance import pdist, squareform

distance_matrix = squareform(pdist(feature_mat))

This has the benefit of avoiding the giant intermediate arrays required with a direct vectorized solution, so it's faster and works on larger inputs. It loses the timing to an approach that uses algebraic manipulations to have dot handle the heavy lifting, though.

pdist also supports a wide variety of alternate distance metrics, if you decide you want something other than Euclidean distance.

# Manhattan distance!
distance_matrix = squareform(pdist(feature_mat, 'cityblock'))

# Cosine distance!
distance_matrix = squareform(pdist(feature_mat, 'cosine'))

# Correlation distance!
distance_matrix = squareform(pdist(feature_mat, 'correlation'))

# And more! Check out the docs.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...