Python NumPy vs core list and core array.array


For those who, for some reason (cost?), don't want to use Wolfram Mathematica instead, which also supports symbolic algebra and much more (and can call Python anyway).


Perhaps the most famous contributed Python package is NumPy, which offers support for numerical and scientific computing: mathematical functions, improved arrays, vectors, matrices, and linear algebra. Under the hood, most of NumPy is optimised, compiled C.

Few Python packages have as many online guides as NumPy! Apart from the NumPy docs themselves, some good starting points are W3Schools and GeeksForGeeks: Python Lists VS Numpy Arrays.

There's also a comprehensive online book Learning Scientific Programming with Python (2nd edition) by Christian Hill with a quiz for each section.

By popular convention it is imported as:

import numpy as np

a = np.array([1, 2, 3])   # creates an np.ndarray object
print(a)

Both the core Python array.array and the np.ndarray class are more memory-efficient for numerical data than list, and are also more efficient for numerical computing.
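A rough way to see the memory difference (exact sizes vary by platform and Python version, so the numbers below are only indicative):

```python
import array
import sys

import numpy as np

n = 1000
lst = list(range(n))                 # list of pointers to boxed Python ints
arr = array.array("i", range(n))     # packed C ints
nda = np.arange(n, dtype=np.int32)   # packed C ints plus a small ndarray header

# The list stores only pointers; each element is a separate Python object.
print(sys.getsizeof(lst))   # pointer table alone, excluding the int objects
print(sys.getsizeof(arr))   # roughly n * 4 bytes plus a small header
print(nda.nbytes)           # exactly n * 4 = 4000 bytes of element data
```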

Unlike core Python lists, the core Python array.array requires all array elements to be of the same type, specified on creation.
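For example (typecode 'i' below declares signed C int as the element type):

```python
import array

a = array.array("i", [1, 2, 3])   # typecode 'i': every element a signed C int
a.append(4)                       # fine: an int

try:
    a.append("x")                 # not an int: rejected at insertion time
except TypeError as err:
    print(err)

print(a.tolist())                 # [1, 2, 3, 4]
```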

Under the hood, an np.ndarray is a NumPy class, whereas np.array() is a function for constructing np.ndarray objects from its arguments. The 'nd' in ndarray stands for N-dimensional array.

NumPy arrays are also homogeneous (they hold elements of the same datatype), but the np.array() function may in fact accept heterogeneous elements: it simply converts the elements to the most applicable datatype, as long as no explicit and incompatible type indicator was provided.

It is sometimes incorrectly stated that a NumPy array can be heterogeneous, because you can pass a heterogeneous list as an argument to the np.array() function. However, the resulting np.ndarray object is always ultimately homogeneous. Try this:

a_np_mixed = np.array(["mixed", 1, 2, 3])
print(a_np_mixed)
['mixed' '1' '2' '3']

print(type(a_np_mixed))
<class 'numpy.ndarray'>

print(a_np_mixed.dtype.type)
<class 'numpy.str_'>

Note how the above has created an ndarray that is indeed now homogeneous w.r.t. numpy.str_ (after conversion).

Compare with:

a_np_i = np.array([1, 2, 3])
print(a_np_i)
[1 2 3]

print(type(a_np_i))
<class 'numpy.ndarray'>

print(a_np_i.dtype.type)
<class 'numpy.int64'>

Note how the above has created an ndarray that is homogeneous w.r.t. numpy.int64 and indeed only contains integers.

NumPy arrays should always be created via np.array(...). There are in fact ways to HACK creating a NumPy array using np.ndarray directly with some tricky arguments, but it's ill-advised.

NumPy arrays can be N-dimensional. NumPy arrays are stored contiguously in memory, which means that all rows of 2D arrays must have the same number of column slots (and similarly for 3D etc.).

The order for 2D arrays is rows then columns:

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
[[1 2 3]
 [4 5 6]]

print(arr.shape)
(2, 3)

Resizing and adding elements

The core Python array.array can be "resized" using the insert() or append() methods. So it can be created empty then progressively populated.
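For example:

```python
import array

a = array.array("d")        # empty array of C doubles
a.append(1.5)               # grow by appending at the end
a.append(2.5)
a.insert(0, 0.5)            # insert at index 0
print(a.tolist())           # [0.5, 1.5, 2.5]
```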

The official NumPy docs state that:

Most NumPy arrays have some restrictions ... Once created, the total size of the array can’t change.
The above statement is inconsistent w.r.t. NumPy's own API docs!

For example, the docs for the numpy.ndarray.resize method state:

Change shape and size of array in-place.

An np.ndarray has a fixed size on creation, but one can change the dimensions using either the ndarray.resize() method (which does not make a copy and pads any extra slots with zeros) or the np.resize() function, which does make a new object and pads using the values of the source ndarray object (not zeros).
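A sketch contrasting the two (note that ndarray.resize may refuse to resize in place if other references to the array exist):

```python
import numpy as np

a = np.array([1, 2, 3])
a.resize(5)                 # in place: extra slots are zero-filled
print(a)                    # [1 2 3 0 0]

b = np.array([1, 2, 3])
c = np.resize(b, 5)         # new array: pads by repeating the source values
print(c)                    # [1 2 3 1 2]
print(b)                    # [1 2 3] -- original unchanged
```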

Both the np.append() and np.insert() functions create new arrays, which may be a performance consideration with large arrays.
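For example:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, [4, 5])      # returns a NEW array
c = np.insert(a, 1, 99)       # also a new array; 99 inserted before index 1

print(b)                      # [1 2 3 4 5]
print(c)                      # [ 1 99  2  3]
print(a)                      # [1 2 3] -- original untouched
```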

Some methods such as ndarray.reshape appear to change the dimensions but in fact just create "views" of an underlying ndarray.base array. The base of an array that owns its memory is None.

Taking a slice also just creates a "view":

>>> x = np.array([1,2,3,4])
>>> x.base is None
True

>>> y = x[2:]
>>> y.base is x
True

Some other ways to create views that appear to "change" the dimensions of an ndarray include the np.expand_dims function and use of the np.newaxis alias for None.

Copy array (shallow and deep)

To create a full copy of a purely numerical array use the numpy.ndarray.copy method or the numpy.copy function (beware that they have different defaults).

The above only make a shallow copy of an array containing objects (dtype=object); the "copy" therefore still reflects changes to the underlying objects. To make a deep copy, use the core library function copy.deepcopy.
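A sketch of the shallow-copy pitfall with dtype=object (the object array is built element by element here so it definitely holds references to the inner lists):

```python
import copy

import numpy as np

inner = [1, 2]
a = np.empty(2, dtype=object)   # 1-D array of Python object references
a[0] = inner
a[1] = [3, 4]

shallow = a.copy()              # new array, but the SAME inner list objects
deep = copy.deepcopy(a)         # inner objects copied too

inner.append(99)                # mutate the underlying object
print(shallow[0])               # [1, 2, 99] -- the "copy" sees the change
print(deep[0])                  # [1, 2]     -- insulated from it
```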

Which methods create views and which create copies?

It's not always obvious without checking the docs which methods create copies and which just create views (although there's probably an implementation rhyme to the reason). For example, numpy.ndarray.swapaxes just creates a view, while numpy.ndarray.flatten returns a copy of the array collapsed into one dimension. One can achieve something similar with ndarray.reshape, but it creates a view (where possible) rather than a copy.
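A quick way to tell them apart is to inspect .base, as in this sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

v = a.swapaxes(0, 1)      # a view: shares memory with a
print(v.base is a)        # True

f = a.flatten()           # always a copy, collapsed to 1-D
print(f.base is None)     # True
f[0] = 99                 # mutating the copy...
print(a[0, 0])            # 1 -- ...leaves the original alone
```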

Creating specific kinds of arrays

There are heaps of different Array creation routines for creating various kinds of commonly used arrays easily, including empty, zeros, ones, full (all one value), identity etc. And one can create arrays "like" another array, that is, with the same dimensions, but populated with different values (such as used for masking).
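A few of those routines in action (a minimal sketch):

```python
import numpy as np

print(np.zeros((2, 3)))       # 2x3 array of 0.0
print(np.ones(4))             # [1. 1. 1. 1.]
print(np.full((2, 2), 7))     # every slot holds 7
print(np.identity(3))         # 3x3 identity matrix

a = np.array([[1, 2], [3, 4]])
print(np.zeros_like(a))       # same shape and dtype as a, all zeros
```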

Replacing values

There are many different approaches. Some basic 1D cases:

>>> a1 = np.array([1, 2, 3, 4, 5])
[1 2 3 4 5]

>>> a1[2] = 7  # single index
[1 2 7 4 5]

>>> a1[a1==4] = 6  # conditional
[1 2 7 6 5]

>>> a1[[0,1]] = [9,8]  # explicit range
[9 8 7 6 5]

>>> a1[range(2)] = [11,10]  # generate range
[11 10  7  6  5]

>>> a2 = np.arange(5)
[0 1 2 3 4]

>>> np.put(a2, [0, 2], [-44, -55])
[-44   1 -55   3   4]

Some 2D cases:

>>> a3 = np.array([[1, 2, 3], [4, 5, 6]])
[[1 2 3]
 [4 5 6]]

>>> a3[0][2] = 7
[[1 2 7]
 [4 5 6]]

>>> a3[0][range(2)] = [8,9]
[[8 9 7]
 [4 5 6]]

The following offers a good quick start for conditional value replacement: How to Replace Values in a NumPy Array?.
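As a minimal sketch of conditional replacement, contrasting in-place boolean-mask assignment with np.where (which builds a new array instead):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

a[a > 4] = 0                  # boolean-mask assignment: modifies in place
print(a)                      # [[1 2 3]
                              #  [4 0 0]]

b = np.where(a == 0, -1, a)   # np.where builds a NEW array
print(b)                      # [[ 1  2  3]
                              #  [ 4 -1 -1]]
```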

NumPy matrix is apparently no more (sort of)

NumPy has a specialised 2D-array class numpy.matrix with special operators, such as * (matrix multiplication) and ** (matrix power), however:

It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future.

(What impact this may have on SciPy and scipy.linalg is not clear.)

The numpy.matrix has the following convenient short-cuts:

numpy.matrix.T for transpose() (non-conjugated).

numpy.matrix.H for the (complex) conjugate transpose of self.

numpy.matrix.I for the (multiplicative) inverse of invertible self.

numpy.matrix.A for self as an ndarray object.

Note that of those a regular ndarray only directly has:

ndarray.T for transpose().

The operations for those other cases can all be achieved with ndarray, but you can HACK in the same shortcuts.
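For example, the matrix shortcuts have these plain-ndarray equivalents (a sketch; .H and .I have no direct ndarray attribute):

```python
import numpy as np

m = np.array([[1 + 1j, 2], [3, 4 - 2j]])

t = m.T                     # transpose, same as matrix.T
h = m.conj().T              # conjugate transpose: the matrix.H equivalent
i = np.linalg.inv(m)        # multiplicative inverse: the matrix.I equivalent

# Sanity check: m @ i should be (numerically) the identity matrix
print(np.allclose(m @ i, np.eye(2)))   # True
```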

NumPy handles complex numbers

The engineering-friendly j has the special meaning of indicating the imaginary part:

>>> com = np.array([1 + 2j, 1 - 3j])
[1.+2.j 1.-3.j]

>>> type(com)
<class 'numpy.ndarray'>

>>> com.dtype.type
<class 'numpy.complex128'>

And there's support for most basic complex operations such as addition, subtraction, multiplication, division etc.

Access the real and imaginary parts using np.real() and np.imag():

>>> np.real(com[1])
1.0

>>> np.imag(com[1])
-3.0

Conjugate with np.conj():

>>> np.conj(com[1])
(1+3j)

Related

Visit also these summaries of Pandas and SciPy.
