Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Just a short question that I can't find the answer to before i head off for the day,

When i do something like this:

v1 = float_list_python = ... # <some list of floats>
v2 = float_array_NumPy = ... # <some numpy.ndarray of floats>
                             # I guess they don't have to be floats - 
                             # but some object that also has a native 
                             # object in C, so that numpy can just use
                             # that

If i want to multiply these vectors by a scalar, my understanding has always been that the python list is a list of object references, and so looping through the list to do the multiplication must fetch the locations of all the floats, and then must get the floats in order to do it - which is one of the reasons it's slow.

If i do the same thing in NumPy, then, well, i'm not sure what happens. There are a number of things i imagine could happen:

  1. It splits the multpilication up across the cores.
  2. It vectorises the multications (as well?)

The documentation i've found suggests that many of the primitives in numpy take advantage of the first option there whenever they can (i don't have a computer on hand at the moment i can test it on). And my intuition tells me that number 2 should happen whenever it's possible.

So my question is, if I create a NumPy array of python objects, will it still at least perform operations on the list in parallel? I know that if you create an array of objects that have native C types, then it will actually create a contiguous array in memory of the actual objects, and that if you create an numpy array of python objects it will create an array of references, but i don't see why this would rule out parallel operations on said list, and cannot find anywhere that explicitly states that.

EDIT: I feel there's a bit of confusion over what i'm asking. I understand what vectorisation is, I understand that it is a compiler optimisation, and not something you necesarily program in (though aligning the data such that it's contiguous in memory is important). On the grounds of vectorisation, all i wanted to know was whether or not numpy uses it. If i do something like np_array1 * np_array2 does the underlying library call use vectorisation (presuming that dtype is a compatible type).

For the splitting up over the cores, all i mean there, is if i again do something like np_array1 * np_array2, but this time dtype=object: would it divide that work up amongst there cores?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
109 views
Welcome To Ask or Share your Answers For Others

1 Answer

numpy is fast because it performs numeric operations like this in fast compiled C code. In contrast the list operation operates at the interpreted Python level (streamlined as much as possible with Python bytecodes etc).

A numpy array of numeric type stores those numbers in a data buffer. At least in the simple cases this is just a block of bytes that C code can step through efficiently. The array also has shape and strides information that allows multidimensional access.

When you multiply the array by a scalar, it, in effect, calls a C function titled something like 'multiply_array_by_scalar', which does the multiplication in fast compiled code. So this kind of numpy operation is fast (compared to Python list code) regardless of the number of cores or other multi-processing/threading enhancements.

Arrays of objects do not have any special speed advantage (compared to lists), at least not at this time.

Look at my answer to a question about creating an array of arrays, https://stackoverflow.com/a/28284526/901925 I had to use iteration to initialize the values.

Have you done any time experiments? For example, construct an array, say (1000,2). Use tolist() to create an equivalent list of lists. And make a similar array of objects, with each object being a (2,) array or list (how much work did that take?). Now do something simple like len(x) for each of those sub lists.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...