NumPy Quickstart


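NumPy's array class is called ndarray. As the name suggests, it is an n-dimensional array. Its most important attributes are:

  • ndarray.ndim: the number of dimensions (axes)
  • ndarray.shape: the shape, a tuple of integers giving the size along each dimension
  • ndarray.size: the total number of elements in the array
  • ndarray.dtype: the data type of the elements
  • ndarray.itemsize: the size of each element in bytes, equivalent to ndarray.dtype.itemsize
  • ndarray.data: the buffer holding the actual elements; normally not used directly
  • ndarray.T: the transpose, returned as a view
  • ndarray.flat: a flat iterator over the array; assigning to it overwrites the array's elements (values are repeated if too few are given)
  • ndarray.real / ndarray.imag: the real / imaginary part of a complex array
  • ndarray.nbytes: the total number of bytes consumed by the elements
  • ndarray.base: the base array if this array is a view of another array (None otherwise)
  • ndarray.flags: information about the array's memory layout
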
>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.itemsize          # 8 bytes per element when the default integer dtype is int64
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>

>>> b.flat
<numpy.flatiter at 0x20456dd3670>
>>> b.flat = 1
>>> b
array([1, 1, 1])
>>> b.flat = [1,2]
>>> b
array([1, 2, 1])
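The session above exercises only some of the attributes in the list. The following is a minimal sketch that checks the others. It assumes an explicit int64 dtype so that itemsize is 8 regardless of platform, and the variable names are illustrative only.

```python
import numpy as np

# A small array that owns its memory (built directly from a nested list)
a = np.array([[0, 1, 2], [3, 4, 5]], dtype=np.int64)

print(a.ndim, a.shape, a.size)        # number of axes, shape tuple, element count
print(a.dtype, a.itemsize, a.nbytes)  # int64, 8 bytes per element, 6 * 8 = 48 bytes in total

t = a.T                               # the transpose is a view, not a copy
print(t.shape)                        # (3, 2)
print(t.base is a)                    # True: t borrows a's memory
print(a.flags['OWNDATA'], t.flags['OWNDATA'])  # a owns its buffer, t does not

z = np.array([1 + 2j, 3 - 4j])
print(z.real, z.imag)                 # real and imaginary parts as float arrays
```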


Array Creation

An array can be created from a regular Python list or tuple with the array function; the dtype is inferred from the types of the elements:

>>> import numpy as np
>>> a = np.array([2,3,4])
>>> a
array([2, 3, 4])
>>> a.dtype
dtype('int64')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')

A frequent error is calling array with multiple numeric arguments rather than a single sequence:

>>> a = np.array(1,2,3,4)    # WRONG
>>> a = np.array([1,2,3,4])  # RIGHT

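array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on: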

>>> b = np.array([(1.5,2,3), (4,5,6)])
>>> b
array([[ 1.5,  2. ,  3. ],
       [ 4. ,  5. ,  6. ]])

The type of the array can also be specified explicitly at creation time:

>>> c = np.array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])

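Often the elements of an array are initially unknown, but its size is known. NumPy therefore offers several functions that create arrays with initial placeholder content, such as zeros, ones and empty. This minimizes the need to grow arrays, which is an expensive operation. empty creates an array whose initial contents are arbitrary and depend on the current state of memory. By default, the dtype of the created array is float64.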

>>> np.zeros( (3,4) )
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
>>> np.ones( (2,3,4), dtype=np.int16 )                # dtype can also be specified
array([[[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]],
       [[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]]], dtype=int16)
>>> np.empty( (2,3) )                                 # uninitialized, output may vary
array([[  3.73603959e-262,   6.02658058e-154,   6.55490914e-260],
       [  5.30498948e-313,   3.14673309e-307,   1.00000000e+000]])
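To make the point about growing arrays concrete, here is a minimal sketch contrasting a preallocated array with one grown element by element through np.append, which allocates and copies a new array on every call. The helper names squares_preallocated and squares_by_append are hypothetical, chosen only for this illustration.

```python
import numpy as np

def squares_preallocated(n):
    # Allocate once, then write into the existing buffer
    out = np.empty(n, dtype=np.int64)   # contents are arbitrary until assigned
    for i in range(n):
        out[i] = i * i
    return out

def squares_by_append(n):
    # np.append returns a new, larger array on every call, copying all data so far
    out = np.zeros(0, dtype=np.int64)
    for i in range(n):
        out = np.append(out, i * i)
    return out

print(squares_preallocated(5))   # [ 0  1  4  9 16]
```

Both functions produce the same values; only the preallocated version avoids repeated reallocation and copying.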

NumPy provides arange, a function analogous to Python's built-in range, which likewise accepts start, stop and step arguments:

>>> np.arange( 10, 30, 5 )
array([10, 15, 20, 25])
>>> np.arange( 0, 2, 0.3 )                 # it accepts float arguments
array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8])

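Note that when arange is used with floating point arguments, it is generally not possible to predict the number of elements obtained, because of finite floating point precision. It is therefore usually better to use linspace, which takes the number of elements as an argument instead of the step: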

>>> from numpy import pi
>>> np.linspace( 0, 2, 9 )                 # 9 numbers from 0 to 2
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ,  1.25,  1.5 ,  1.75,  2.  ])
>>> x = np.linspace( 0, 2*pi, 100 )        # useful to evaluate function at lots of points
>>> f = np.sin(x)
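A small sketch of the pitfall, using the interval [1.0, 1.3) and step 0.1 purely as an example: arange derives its length from ceil((stop - start) / step) evaluated in floating point, so whether an element near stop appears depends on rounding. linspace always returns exactly the requested number of points and includes stop exactly.

```python
import numpy as np

start, stop, step = 1.0, 1.3, 0.1

r = np.arange(start, stop, step)
# The length comes from a floating point computation, so rounding error
# can add or drop an element close to `stop`
expected_len = int(np.ceil((stop - start) / step))
print(len(r), expected_len)

l = np.linspace(start, stop, 4)   # exactly 4 points, both endpoints included
print(l)
```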

Printing Arrays

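When you print an array, NumPy displays it in a similar way to nested lists, but with the layout adjusted to show its dimensions: the last axis is printed from left to right, the second-to-last from top to bottom, and the remaining axes also from top to bottom.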

>>> a = np.arange(6)                         # 1d array
>>> print(a)
[0 1 2 3 4 5]
>>> b = np.arange(12).reshape(4,3)           # 2d array
>>> print(b)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
>>> c = np.arange(24).reshape(2,3,4)         # 3d array
>>> print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]
 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

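If an array is too large to be printed, NumPy automatically skips its central part and prints only the corners. To force NumPy to print the entire array, change the printing options with set_printoptions: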

>>> import sys
>>> np.set_printoptions(threshold=sys.maxsize)       # newer NumPy rejects threshold=np.nan
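set_printoptions changes the setting globally. As a sketch, the np.printoptions context manager (available since NumPy 1.15) applies a setting only inside a with block and then restores the previous options. The 2000-element array is an arbitrary example larger than the default threshold of 1000 elements.

```python
import sys
import numpy as np

before = np.get_printoptions()['threshold']
big = np.arange(2000)
print('...' in str(big))      # True: arrays above the threshold are summarized

# Temporarily print everything; previous settings come back after the block
with np.printoptions(threshold=sys.maxsize):
    full = str(big)
print('...' in full)          # False: every element was printed
print(np.get_printoptions()['threshold'] == before)   # True: setting restored
```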

Basic Operations

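Arithmetic operators on arrays apply elementwise: a new array is created and filled with the result.

Note that in NumPy the * operator is elementwise. For the matrix product, use the @ operator (Python 3.5 or newer) or the dot function or method: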

>>> A = np.array( [[1,1],
...             [0,1]] )
>>> B = np.array( [[2,0],
...             [3,4]] )
>>> A*B                         # elementwise product
array([[2, 0],
       [0, 4]])
>>> A @ B                       # matrix product
array([[5, 4],
       [3, 4]])
>>> np.dot(A, B)                # another matrix product
array([[5, 4],
       [3, 4]])

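Operations such as += and *= modify the existing array in place rather than creating a new one, because Python dispatches them to in-place methods such as __iadd__ and __imul__.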
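A minimal sketch of in-place semantics, using arbitrary example values: the array object keeps its identity and its dtype, so an in-place operation whose result would need a wider type is refused (recent NumPy raises a TypeError subclass under the default same_kind casting rule), while the out-of-place form simply creates a new array.

```python
import numpy as np

a = np.ones(3, dtype=np.int64)
before = id(a)
a += 2                  # a.__iadd__: the result is written into a's own buffer
print(id(a) == before)  # True: still the same array object
print(a.dtype)          # int64: in-place operations keep the target's dtype

b = np.array([0.5, 1.5, 2.5])
refused = False
try:
    a += b              # a float64 result cannot be cast back into int64 in place
except TypeError as exc:    # NumPy raises a TypeError subclass here
    refused = True
    print("in-place add refused:", exc)

c = a + b               # out-of-place: a new float64 array is created
print(c.dtype)          # float64
```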

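Many unary operations, such as computing the sum of all the elements, are implemented as methods of ndarray, e.g. sum, min and max. By default these apply to the whole array as if it were a flat list of numbers, but by specifying the axis parameter you can apply an operation along a given axis, for example to each column or each row: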

>>> b = np.arange(12).reshape(3,4)
>>> b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> b.sum(axis=0)                            # sum of each column
array([12, 15, 18, 21])
>>> b.min(axis=1)                            # min of each row
array([0, 4, 8])
>>> b.cumsum(axis=1)                         # cumulative sum along each row
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])

Universal Functions

Functions such as sin, cos, exp, sqrt and add are called universal functions (ufuncs). In NumPy they operate elementwise on an array and produce an array as output.
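A short sketch of ufunc behavior with made-up input values. Besides being called elementwise, every ufunc carries methods such as reduce, accumulate and outer; for these 1-D inputs np.add.reduce and np.add.accumulate match sum and cumsum.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([10.0, 20.0, 30.0])

print(np.sqrt(x))             # elementwise: [1. 2. 3.]
print(np.add(x, y))           # the same as x + y

# Methods available on every ufunc
print(np.add.reduce(x))       # 14.0, the same as x.sum()
print(np.add.accumulate(x))   # running totals, the same as x.cumsum()
print(np.multiply.outer([1, 2], [1, 2, 3]))  # all pairwise products, shape (2, 3)
```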

Indexing, Slicing and Iterating

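One-dimensional arrays can be indexed, sliced and iterated over much like Python lists. Multidimensional arrays can have one index per axis; these indices are given as a comma-separated tuple. When fewer indices are provided than the number of axes, the missing indices are treated as complete slices (:).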

>>> def f(x,y):
...     return 10*x+y
>>> b = np.fromfunction(f,(5,4),dtype=int)
>>> b
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
>>> b[2,3]
23
>>> b[0:5, 1]                       # each row in the second column of b
array([ 1, 11, 21, 31, 41])
>>> b[ : ,1]                        # equivalent to the previous example
array([ 1, 11, 21, 31, 41])
>>> b[1:3, : ]                      # each column in the second and third row of b
array([[10, 11, 12, 13],
       [20, 21, 22, 23]])
>>> b[-1]                                  # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])

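b[i] can also be written as b[i, ...]: the dots (...) stand for as many colons as needed to produce a complete indexing tuple. Iterating over a multidimensional array is done with respect to the first axis. To iterate over every element of the array instead, use the flat attribute, an iterator over all the elements of the array in flattened order: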

>>> for element in b.flat:
...     print(element)
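A sketch checking the equivalences described above on an arbitrary 2x3x4 array: the ellipsis expands to full slices, plain iteration yields the sub-arrays along the first axis, and flat yields every element in row-major order.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# `...` expands to as many full slices as needed
print(np.array_equal(c[1, ...], c[1, :, :]))   # True
print(np.array_equal(c[..., 2], c[:, :, 2]))   # True

# Plain iteration walks the first axis: two 3x4 blocks here
print([block.shape for block in c])             # [(3, 4), (3, 4)]

# .flat walks every element in row-major (C) order
print([int(v) for v in c.flat] == list(range(24)))   # True
```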

Shape Manipulation

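The shape of an array can be changed in several ways. The following three commands all return a modified array and leave the original array unchanged (here a is a 3x4 array of floats):

>>> a = np.array([[2., 8., 0., 6.], [4., 5., 1., 1.], [8., 9., 3., 6.]])
>>> a.ravel()  # returns the array, flattened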
array([ 2.,  8.,  0.,  6.,  4.,  5.,  1.,  1.,  8.,  9.,  3.,  6.])
>>> a.reshape(6,2)  # returns the array with a modified shape
array([[ 2.,  8.],
       [ 0.,  6.],
       [ 4.,  5.],
       [ 1.,  1.],
       [ 8.,  9.],
       [ 3.,  6.]])
>>> a.T  # returns the array, transposed
array([[ 2.,  4.,  8.],
       [ 8.,  5.,  9.],
       [ 0.,  1.,  3.],
       [ 6.,  1.,  6.]])
>>> a.T.shape
(4, 3)
>>> a.shape
(3, 4)

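ravel returns a view of the original array whenever possible, so it uses no extra memory for the data, but modifying the view's data also modifies the original array. flatten always returns a copy.

reshape returns an array with a modified shape, whereas the ndarray.resize method modifies the array itself in place. Because resize works in place, it first checks that no other references or views point to the array; this check can be disabled with refcheck=False.

If a dimension is given as -1 in a reshaping operation, NumPy computes that dimension automatically from the others.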
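The following sketch illustrates these three points on an example array, using np.shares_memory (NumPy 1.11 or newer) to tell views from copies. The resize call passes refcheck=False only because the freshly created array m has no views; in general disabling the check is unsafe.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)

r = a.ravel()       # a view when possible (a is C-contiguous here)
f = a.flatten()     # always a new copy
print(np.shares_memory(r, a), np.shares_memory(f, a))   # True False

r[0] = 100.0        # writing through the view changes a
f[1] = -1.0         # writing to the copy leaves a alone
print(a[0, 0], a[0, 1])          # 100.0 1.0

print(a.reshape(2, -1).shape)    # (2, 6): the -1 is inferred from the size

m = np.arange(12)
# resize works in place; refcheck=False skips the reference check,
# which is acceptable here only because nothing else views m's buffer
m.resize((3, 4), refcheck=False)
print(m.shape)      # (3, 4)
```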

Stacking together different arrays

Several arrays can be stacked together along different axes with vstack (vertically) and hstack (horizontally):

>>> a = np.floor(10*np.random.random((2,2)))
>>> a
array([[ 8.,  8.],
       [ 0.,  0.]])
>>> b = np.floor(10*np.random.random((2,2)))
>>> b
array([[ 1.,  8.],
       [ 0.,  4.]])
>>> np.vstack((a,b))
array([[ 8.,  8.],
       [ 0.,  0.],
       [ 1.,  8.],
       [ 0.,  4.]])
>>> np.hstack((a,b))
array([[ 8.,  8.,  1.,  8.],
       [ 0.,  0.,  0.,  4.]])

column_stack stacks 1D arrays as columns into a 2D array. For 2D arrays it is equivalent to hstack:

>>> from numpy import newaxis
>>> np.column_stack((a,b))     # with 2D arrays
array([[ 8.,  8.,  1.,  8.],
       [ 0.,  0.,  0.,  4.]])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b))     # returns a 2D array
array([[ 4., 3.],
       [ 2., 8.]])
>>> np.hstack((a,b))           # the result is different
array([ 4., 2., 3., 8.])
>>> a[:,newaxis]               # view a as a 2D column vector
array([[ 4.],
       [ 2.]])
>>> np.column_stack((a[:,newaxis],b[:,newaxis]))
array([[ 4.,  3.],
       [ 2.,  8.]])
>>> np.hstack((a[:,newaxis],b[:,newaxis]))   # the result is the same
array([[ 4.,  3.],
       [ 2.,  8.]])

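On the other hand, row_stack is equivalent to vstack for any input arrays (it is an alias of vstack and is deprecated as of NumPy 2.0). For arrays with two or more dimensions, hstack concatenates along the second axis and vstack along the first axis, while concatenate takes an optional argument giving the axis along which the concatenation should happen: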

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])

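Note: r_ and c_ are also useful for building arrays by stacking numbers along one axis. By default r_ joins along the first axis and c_ along the last axis, similar to vstack and hstack for 2D input, but an optional argument lets you choose the axis. They also accept slice (range) literals such as 1:4: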

>>> np.r_[1:4,0,4]
array([1, 2, 3, 0, 4])
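A few r_ and c_ calls as a sketch, with arbitrary inputs a and b: for 1-D input r_ behaves like hstack and c_ like column_stack, a complex step in a slice is read as a number of points (as in linspace), and a leading string such as '0,2' selects the concatenation axis and the minimum number of dimensions.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.r_[a, b])         # joined end to end, like hstack for 1-D input
print(np.c_[a, b])         # 1-D inputs become columns, like column_stack
print(np.r_[0:1:5j])       # complex "step" = number of points, like linspace(0, 1, 5)
print(np.r_['0,2', a, b])  # join along axis 0 after upgrading inputs to at least 2-D
```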

Splitting one array into several smaller ones

hsplit splits an array along its horizontal axis, either into a given number of equally shaped arrays or after the columns you specify:

>>> a = np.floor(10*np.random.random((2,12)))
>>> a
array([[ 9.,  5.,  6.,  3.,  6.,  8.,  0.,  7.,  9.,  7.,  2.,  7.],
       [ 1.,  4.,  9.,  2.,  2.,  1.,  0.,  6.,  2.,  2.,  4.,  0.]])
>>> np.hsplit(a,3)   # Split a into 3
[array([[ 9.,  5.,  6.,  3.],
       [ 1.,  4.,  9.,  2.]]), array([[ 6.,  8.,  0.,  7.],
       [ 2.,  1.,  0.,  6.]]), array([[ 9.,  7.,  2.,  7.],
       [ 2.,  2.,  4.,  0.]])]
>>> np.hsplit(a,(3,4))   # Split a after the third and the fourth column
[array([[ 9.,  5.,  6.],
       [ 1.,  4.,  9.]]), array([[ 3.],
       [ 2.]]), array([[ 6.,  8.,  0.,  7.,  9.,  7.,  2.,  7.],
       [ 2.,  1.,  0.,  6.,  2.,  2.,  4.,  0.]])]

vsplit splits along the vertical axis, and array_split lets you choose the axis along which to split.
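A sketch of the splitting variants on example arrays: vsplit works along the first axis, split (like hsplit and vsplit) requires an equal division and raises ValueError otherwise, and array_split accepts an uneven division along any axis.

```python
import numpy as np

m = np.arange(12).reshape(4, 3)
top, bottom = np.vsplit(m, 2)         # split along the first (vertical) axis
print(top.shape, bottom.shape)        # (2, 3) (2, 3)

v = np.arange(7)
split_failed = False
try:
    np.split(v, 3)                    # 7 elements cannot be divided into 3 equal parts
except ValueError as exc:
    split_failed = True
    print("np.split:", exc)

parts = np.array_split(v, 3)          # unequal pieces are allowed
print([p.tolist() for p in parts])    # [[0, 1, 2], [3, 4], [5, 6]]

left, right = np.array_split(m, 2, axis=1)   # any axis can be chosen
print(left.shape, right.shape)        # (4, 2) (4, 1)
```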

Copies and Views

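Simple assignments make no copy of array objects or of their data. Python passes mutable objects as references, so function calls make no copy either.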

>>> a = np.arange(12)
>>> b = a            # no new object is created
>>> b is a           # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4    # changes the shape of a
>>> a.shape
(3, 4)

>>> def f(x):
...     print(id(x))
>>> id(a)                           # id is a unique identifier of an object
>>> f(a)                            # prints the same id: f received a itself

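View or Shallow Copy

Different array objects can share the same data. The view method creates a new array object that looks at the same data. Changing attributes of the view, such as its shape, does not affect the original array, but changing its data does.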

>>> c = a.view()
>>> c is a
False
>>> c.base is a                        # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>> c.shape = 2,6                      # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c[0,4] = 1234                      # a's data changes
>>> a
array([[   0,    1,    2,    3],
       [1234,    5,    6,    7],
       [   8,    9,   10,   11]])

Slicing an array returns a view of it:

>>> s = a[ : , 1:3]     # spaces added for clarity; could also be written "s = a[:,1:3]"
>>> s[:] = 10           # s[:] is a view of s. Note the difference between s=10 and s[:]=10
>>> a
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])


The copy method makes a complete copy of the array and its data.

>>> d = a.copy()                          # a new array object with new data is created
>>> d is a
False
>>> d.base is a                           # d doesn't share anything with a
False
>>> d[0,0] = 9999
>>> a
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])
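As a sketch of how to tell these cases apart, np.shares_memory reports whether two arrays overlap in memory. Slices share memory, copies do not, and indexing with an index array (fancy indexing, covered later) also returns a copy. Calling copy on a small slice is also the usual way to let a large original array be freed; the array sizes below are arbitrary.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

s = a[:, 1:3]                   # basic slice: a view
d = a.copy()                    # full copy
fancy = a[[0, 2]]               # index array (fancy indexing): a copy
print(np.shares_memory(s, a), np.shares_memory(d, a), np.shares_memory(fancy, a))
# True False False

# Keep only a small piece of a large array: copy it so the big buffer can be freed
big = np.arange(1_000_000)
head = big[:10].copy()
del big                          # head does not keep the large buffer alive
print(head.base is None)         # True: head owns its data
```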


Less Basic

Broadcasting Rules

Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape.


The first rule of broadcasting is that if all input arrays do not have the same number of dimensions, a “1” will be repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.

The second rule of broadcasting ensures that arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the “broadcast” array.

After application of the broadcasting rules, the sizes of all arrays must match. More details can be found in the Broadcasting section of the NumPy documentation.
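A minimal sketch of both rules with arbitrarily chosen shapes (3, 1) and (4,). np.broadcast_to (NumPy 1.10 or newer) produces the stretched operands explicitly, which confirms what the two rules describe; shapes that still disagree after the rules raise ValueError.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)

# Rule 1: row is treated as shape (1, 4); rule 2: both size-1 axes are stretched
table = col * 10 + row
print(table.shape)                 # (3, 4)
print(table)

# The stretched operands behave like explicit copies made with broadcast_to
col_b = np.broadcast_to(col, (3, 4))
row_b = np.broadcast_to(row, (3, 4))
print(np.array_equal(table, col_b * 10 + row_b))   # True

# Sizes that still differ after the rules (3 vs 4 on the last axis) are rejected
mismatch = False
try:
    np.arange(3) + np.arange(4)
except ValueError as exc:
    mismatch = True
    print("cannot broadcast:", exc)
```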

Fancy indexing and index tricks

Indexing with Arrays of Indices

The first technique: indexing with arrays of indices

>>> a = np.arange(12)**2                       # the first 12 square numbers
>>> i = np.array( [ 1,1,3,8,5 ] )              # an array of indices
>>> a[i]                                       # the elements of a at the positions i
array([ 1,  1,  9, 64, 25])
>>> j = np.array( [ [ 3, 4], [ 9, 7 ] ] )      # a bidimensional array of indices
>>> a[j]                                       # the same shape as j
array([[ 9, 16],
       [81, 49]])

Index arrays can also be given for more than one dimension; the index arrays for each dimension must have the same shape:

>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array( [ [0,1],                        # indices for the first dim of a
...                 [1,2] ] )
>>> j = np.array( [ [2,1],                        # indices for the second dim
...                 [3,3] ] )
>>> a[i,j]                                     # i and j must have equal shape
array([[ 2,  5],
       [ 7, 11]])
>>> a[i,2]
array([[ 2,  6],
       [ 6, 10]])
>>> a[:,j]                                     # i.e., a[ : , j]
array([[[ 2,  1],
        [ 3,  3]],
       [[ 6,  5],
        [ 7,  7]],
       [[10,  9],
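        [11, 11]]])

Another common use of index arrays is finding, for several time-dependent series, the time at which each series reaches its maximum:
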
>>> time = np.linspace(20, 145, 5)                 # time scale
>>> data = np.sin(np.arange(20)).reshape(5,4)      # 4 time-dependent series
>>> time
array([  20.  ,   51.25,   82.5 ,  113.75,  145.  ])
>>> data
array([[ 0.        ,  0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ,  0.6569866 ],
       [ 0.98935825,  0.41211849, -0.54402111, -0.99999021],
       [-0.53657292,  0.42016704,  0.99060736,  0.65028784],
       [-0.28790332, -0.96139749, -0.75098725,  0.14987721]])
>>> ind = data.argmax(axis=0)                  # index of the maxima for each series
>>> ind
array([2, 0, 3, 1])
>>> time_max = time[ind]                       # times corresponding to the maxima
>>> data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]...
>>> time_max
array([  82.5 ,   20.  ,  113.75,   51.25])
>>> data_max
array([ 0.98935825,  0.84147098,  0.99060736,  0.6569866 ])
>>> np.all(data_max == data.max(axis=0))
True

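Index arrays can also be used as targets of assignment. When an index is repeated, the assignment is carried out for each occurrence, so the last value assigned wins. Be careful with Python's += construct, which may not do what you expect: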

>>> a = np.arange(5)
>>> a[[0,0,2]]+=1
>>> a
array([1, 1, 3, 3, 4])

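Even though 0 occurs twice in the index list, element 0 is incremented only once. This is because Python evaluates a[i] += 1 as a[i] = a[i] + 1: the right-hand side is computed once, and the same value is then assigned to index 0 twice.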
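When repeated indices should accumulate, the unbuffered ufunc method np.add.at can be used instead. The sketch below reuses the example above and adds a small counting example with made-up indices, cross-checked against np.bincount.

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1            # buffered: index 0 is incremented only once
print(a)                     # [1 1 3 3 4]

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)   # unbuffered: every occurrence of an index counts
print(b)                     # [2 1 3 3 4]

# A typical use: counting occurrences (np.bincount does this directly for ints)
idx = [1, 3, 1, 1]
counts = np.zeros(4, dtype=np.int64)
np.add.at(counts, idx, 1)
print(counts)                # [0 3 0 1]
```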

Indexing with Boolean Arrays

The second technique: indexing with boolean arrays. Each boolean array must have the same length as the axis it indexes:

>>> a = np.arange(12).reshape(3,4)
>>> b1 = np.array([False,True,True])             # first dim selection
>>> b2 = np.array([True,False,True,False])       # second dim selection
>>> a[b1,:]                                   # selecting rows
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a[b1]                                     # same thing
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a[:,b2]                                   # selecting columns
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
>>> a[b1,b2]                                  # a weird thing to do
array([ 4, 10])
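The most common use of boolean indexing is a mask with the same shape as the array, built from a comparison. A sketch with an arbitrary threshold of 4: the mask selects elements (returned as a 1-D array) and can also be assigned through, and a mask whose length does not match the axis raises IndexError.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
mask = a > 4                    # boolean array with the same shape as a
print(a[mask])                  # selected elements, flattened: [ 5  6  7  8  9 10 11]
a[mask] = 0                     # assign through the mask
print(a)

bad_mask = False
try:
    a[np.array([True, False])]  # length 2 does not match the 3 rows
except IndexError as exc:
    bad_mask = True
    print("bad mask:", exc)
```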

An example that generates an image of the Mandelbrot set:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> def mandelbrot( h,w, maxit=20 ):
...     """Returns an image of the Mandelbrot fractal of size (h,w)."""
...     y,x = np.ogrid[ -1.4:1.4:h*1j, -2:0.8:w*1j ]
...     c = x+y*1j
...     z = c
...     divtime = maxit + np.zeros(z.shape, dtype=int)
...     for i in range(maxit):
...         z = z**2 + c
...         diverge = abs(z) > 2                     # who is diverging (|z| > 2)
...         div_now = diverge & (divtime==maxit)  # who is diverging now
...         divtime[div_now] = i                  # note when
...         z[diverge] = 2                        # avoid diverging too much
...     return divtime
>>> plt.imshow(mandelbrot(400,400))

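The ix_() function

The ix_ function combines several 1-D arrays so that a result can be computed for every n-uplet, i.e. every combination of one element taken from each array. For example, to compute a+b*c for all triplets taken from a, b and c: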

>>> a = np.array([2,3,4,5])
>>> b = np.array([8,5,4])
>>> c = np.array([5,4,6,8,3])
>>> ax,bx,cx = np.ix_(a,b,c)
>>> ax
array([[[2]],
       [[3]],
       [[4]],
       [[5]]])
>>> bx
array([[[8],
        [5],
        [4]]])
>>> cx
array([[[5, 4, 6, 8, 3]]])
>>> ax.shape, bx.shape, cx.shape
((4, 1, 1), (1, 3, 1), (1, 1, 5))
>>> result = ax+bx*cx
>>> result
array([[[42, 34, 50, 66, 26],
        [27, 22, 32, 42, 17],
        [22, 18, 26, 34, 14]],
       [[43, 35, 51, 67, 27],
        [28, 23, 33, 43, 18],
        [23, 19, 27, 35, 15]],
       [[44, 36, 52, 68, 28],
        [29, 24, 34, 44, 19],
        [24, 20, 28, 36, 16]],
       [[45, 37, 53, 69, 29],
        [30, 25, 35, 45, 20],
        [25, 21, 29, 37, 17]]])
>>> result[3,2,4]
17
>>> a[3]+b[2]*c[4]
17

ix_ can also be used to implement a reduce over several vectors:

>>> def ufunc_reduce(ufct, *vectors):
...    vs = np.ix_(*vectors)
...    r = ufct.identity
...    for v in vs:
...        r = ufct(r,v)
...    return r

>>> ufunc_reduce(np.add,a,b,c)
array([[[15, 14, 16, 18, 13],
        [12, 11, 13, 15, 10],
        [11, 10, 12, 14,  9]],
       [[16, 15, 17, 19, 14],
        [13, 12, 14, 16, 11],
        [12, 11, 13, 15, 10]],
       [[17, 16, 18, 20, 15],
        [14, 13, 15, 17, 12],
        [13, 12, 14, 16, 11]],
       [[18, 17, 19, 21, 16],
        [15, 14, 16, 18, 13],
        [14, 13, 15, 17, 12]]])

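Compared with the usual ufunc.reduce, the advantage of this version is that it relies on the broadcasting rules, so it avoids creating an intermediate argument array whose size is the output size times the number of vectors.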

Indexing with strings

See Structured arrays in the NumPy documentation.


Linear Algebra

>>> import numpy as np
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[ 1.  2.]
 [ 3.  4.]]

>>> a.transpose()
array([[ 1.,  3.],
       [ 2.,  4.]])

>>> np.linalg.inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])

>>> u = np.eye(2) # unit 2x2 matrix; "eye" represents "I"
>>> u
array([[ 1.,  0.],
       [ 0.,  1.]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])

>>> j @ j        # matrix product
array([[-1.,  0.],
       [ 0., -1.]])

>>> np.trace(u)  # trace
2.0

>>> y = np.array([[5.], [7.]])
>>> np.linalg.solve(a, y)
array([[-3.],
       [ 4.]])

>>> np.linalg.eig(j)
(array([ 0.+1.j,  0.-1.j]), array([[ 0.70710678+0.j        ,  0.70710678-0.j        ],
       [ 0.00000000-0.70710678j,  0.00000000+0.70710678j]]))
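A sketch for checking these results numerically, reusing the values of a, y and j from above: np.allclose compares up to floating point tolerance, solve is verified by multiplying back, and each eigenpair (w[k], v[:, k]) is checked against j @ v[:, k] == w[k] * v[:, k] by broadcasting over the columns of v.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0], [7.0]])

x = np.linalg.solve(a, y)                    # solves a @ x = y without forming inv(a)
print(np.allclose(a @ x, y))                 # True
print(np.allclose(x, np.linalg.inv(a) @ y))  # True; solve is generally the better choice

j = np.array([[0.0, -1.0], [1.0, 0.0]])
w, v = np.linalg.eig(j)                      # eigenvalues w, eigenvectors as columns of v
# v * w scales column k of v by w[k], so this checks every eigenpair at once
print(np.allclose(j @ v, v * w))             # True
```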

Tricks and Tips

“Automatic” Reshaping

>>> a = np.arange(30)
>>> a.shape = 2,-1,3  # -1 means "whatever is needed"
>>> a.shape
(2, 5, 3)
>>> a
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]],
       [[15, 16, 17],
        [18, 19, 20],
        [21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])

Vector Stacking



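Histograms

The NumPy histogram function takes an array and returns a pair of vectors: the histogram values and the bin edges. matplotlib also has a histogram function, hist, which differs from NumPy's: the main difference is that hist plots the histogram automatically, while numpy.histogram only computes the data.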

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> # Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
>>> mu, sigma = 2, 0.5
>>> v = np.random.normal(mu,sigma,10000)
>>> # Plot a normalized histogram with 50 bins
>>> plt.hist(v, bins=50, density=True)       # matplotlib version (plot)
>>> # Compute the histogram with numpy and then plot it
>>> (n, bins) = np.histogram(v, bins=50, density=True)  # NumPy version (no plot)
>>> plt.plot(.5*(bins[1:]+bins[:-1]), n)
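As a sketch, assuming a seeded generator from np.random.default_rng (NumPy 1.17 or newer) instead of the legacy np.random.normal, the following checks that np.histogram returns one more bin edge than counts and that a density histogram integrates to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(2.0, 0.5, 10_000)          # mean 2, standard deviation 0.5

n, bins = np.histogram(v, bins=50, density=True)
print(len(n), len(bins))                  # 50 51: one more edge than counts
centers = 0.5 * (bins[1:] + bins[:-1])    # bin centers, as used for plt.plot above
# With density=True the bar areas (height * width) sum to 1
print(np.isclose(np.sum(n * np.diff(bins)), 1.0))   # True
```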


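Custom dtypes

When creating an array you can define a custom (structured) dtype whose fields may have different types, including strings, for example: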

>>> import numpy as np
>>> n_drops = 5
>>> rain_drops = np.zeros(n_drops, dtype=[('position', float, 2),
...                                       ('size', float),
...                                       ('growth', float),
...                                       ('color', float, 4),
...                                       ('name', 'U1')])    # Unicode string of length 1
>>> rain_drops
array([([0., 0.], 0., 0., [0., 0., 0., 0.], ''),
       ([0., 0.], 0., 0., [0., 0., 0., 0.], ''),
       ([0., 0.], 0., 0., [0., 0., 0., 0.], ''),
       ([0., 0.], 0., 0., [0., 0., 0., 0.], ''),
       ([0., 0.], 0., 0., [0., 0., 0., 0.], '')],
      dtype=[('position', '<f8', (2,)), ('size', '<f8'), ('growth', '<f8'), ('color', '<f8', (4,)), ('name', '<U1')])
>>> rain_drops[0]
([0., 0.], 0., 0., [0., 0., 0., 0.], '')
>>> rain_drops[0]['position']
array([0., 0.])
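A further sketch of working with structured fields, using a hypothetical three-record dtype: indexing by field name returns a view of that column, sub-array fields gain a trailing dimension, and fields can drive boolean selection.

```python
import numpy as np

drops = np.zeros(3, dtype=[('position', float, 2),
                           ('size', float),
                           ('name', 'U8')])

drops['size'] = [1.0, 2.5, 0.5]          # a field name gives a view of that column
drops['position'][:, 0] = np.arange(3)   # sub-array field: shape (3, 2)
drops['name'][1] = 'big'                 # one field of one record
print(drops['position'].shape)           # (3, 2)
print(drops[drops['size'] > 1.0]['name'])  # boolean selection driven by a field
```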


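Basic statistics

The session below uses arr11, a 4x3 integer array which, as its cumulative sums show, equals 5 - np.arange(1, 13).reshape(4, 3), i.e. [[4, 3, 2], [1, 0, -1], [-2, -3, -4], [-5, -6, -7]]. arr12 is another 4x3 array whose definition is not shown.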

In [109]: np.sum(arr11)   # sum of all elements
Out[109]: -18

In [110]: np.sum(arr11, axis=0)    # sum of each column (note: axis=0)
Out[110]: array([ -2,  -6, -10])

In [111]: np.sum(arr11, axis=1)     # sum of each row (note: axis=1)
Out[111]: array([  9,   0,  -9, -18])

In [112]: np.cumsum(arr11)  # running total over all elements in row-major order (left to right, then top to bottom)
Out[112]: array([  4,   7,   9,  10,  10,   9,   7,   4,   0,  -5, -11, -18], dtype=int32)

In [113]: np.cumsum(arr11, axis=0)  # cumulative sum down each column; returns a 2D array
Out[113]:
array([[  4,   3,   2],
       [  5,   3,   1],
       [  3,   0,  -3],
       [ -2,  -6, -10]], dtype=int32)

In [114]: np.cumprod(arr11, axis=1)  # cumulative product along each row; returns a 2D array
Out[114]:
array([[   4,   12,   24],
       [   1,    0,    0],
       [  -2,    6,  -24],
       [  -5,   30, -210]], dtype=int32)

In [115]: np.min(arr11)   # minimum of all elements
Out[115]: -7

In [116]: np.max(arr11, axis=0)  # maximum of each column
Out[116]: array([4, 3, 2])

In [117]: np.mean(arr11)  # mean of all elements
Out[117]: -1.5

In [118]: np.mean(arr11, axis=1)  # mean of each row
Out[118]: array([ 3.,  0., -3., -6.])

In [119]: np.median(arr11)   # median of all elements
Out[119]: -1.5

In [120]: np.median(arr11, axis=0)   # median of each column
Out[120]: array([-0.5, -1.5, -2.5])

In [121]: np.var(arr12)   # variance of all elements
Out[121]: 5.354166666666667

In [122]: np.std(arr12, axis=1)   # standard deviation of each row
Out[122]: array([ 2.49443826,  1.88561808,  1.69967317,  2.1602469 ])
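np.var and np.std divide by N by default (population statistics); pass ddof=1 for the sample versions that divide by N - 1. The sketch below uses arr11 as reconstructed from the cumulative sums above (an assumption, since its definition is not shown). keepdims=True keeps the reduced axis so that the result broadcasts back against the original array.

```python
import numpy as np

arr11 = 5 - np.arange(1, 13).reshape(4, 3)   # reconstructed example array

print(np.var(arr11))           # population variance: divides by N (ddof=0 is the default)
print(np.var(arr11, ddof=1))   # sample variance: divides by N - 1
n = arr11.size
print(np.isclose(np.var(arr11, ddof=1), np.var(arr11) * n / (n - 1)))  # True

# keepdims=True keeps the reduced axis with length 1, so it broadcasts back
row_means = arr11.mean(axis=1, keepdims=True)  # shape (4, 1)
centered = arr11 - row_means
print(np.allclose(centered.mean(axis=1), 0))   # True
```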


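NumPy set operations on 1-D arrays:

  • unique(x): the unique elements of x, returned sorted
  • intersect1d(x, y): the elements common to x and y (intersection)
  • union1d(x, y): the union of x and y
  • setdiff1d(x, y): the set difference, i.e. the elements of x that are not in y
  • setxor1d(x, y): the symmetric difference, i.e. the elements that are in exactly one of the two arrays
  • in1d(x, y): a boolean array telling whether each element of x is contained in y (deprecated as of NumPy 2.0 in favor of isin)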
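A sketch of these functions on two small made-up arrays; it uses np.isin, the newer name for the in1d functionality. The outputs in the comments follow from the definitions above.

```python
import numpy as np

x = np.array([3, 1, 2, 3, 5])
y = np.array([2, 3, 4])

print(np.unique(x))            # [1 2 3 5]: sorted, duplicates removed
print(np.intersect1d(x, y))    # [2 3]
print(np.union1d(x, y))        # [1 2 3 4 5]
print(np.setdiff1d(x, y))      # [1 5]
print(np.setxor1d(x, y))       # [1 4 5]
print(np.isin(x, y))           # [ True False  True  True False]
```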

The numpy.random module

Some commonly used random functions:

rand(d0, d1, ..., dn)               Random values in a given shape. Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).
randn(d0, d1, ..., dn)              Return a sample (or samples) from the “standard normal” distribution.
randint(low[, high, size, dtype])   Return random integers from low (inclusive) to high (exclusive).
random_integers(low[, high, size])  Random integers of type np.int_ between low and high, inclusive (deprecated; use randint(low, high + 1)).
random_sample([size])               Return random floats in the half-open interval [0.0, 1.0).
random([size])                      Return random floats in the half-open interval [0.0, 1.0).
ranf([size])                        Return random floats in the half-open interval [0.0, 1.0).
sample([size])                      Return random floats in the half-open interval [0.0, 1.0).
choice(a[, size, replace, p])       Generates a random sample from a given 1-D array
bytes(length)                       Return random bytes.
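These are the legacy functions. Since NumPy 1.17 the documentation recommends creating a Generator with np.random.default_rng and calling its methods instead. The sketch below maps several of the functions above onto Generator methods; the seed 42 and the choice probabilities are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # seeded: the same numbers on every run

u = rng.random((2, 3))                  # floats in [0.0, 1.0), like rand / random_sample
g = rng.standard_normal(4)              # standard normal samples, like randn
k = rng.integers(0, 10, size=5)         # 0 <= k < 10, like randint
k_incl = rng.integers(1, 6, size=5, endpoint=True)   # 1 <= k <= 6, high included
c = rng.choice([10, 20, 30], size=4, replace=True, p=[0.5, 0.25, 0.25])
raw = rng.bytes(8)                      # 8 random bytes

print(u.shape, g.shape, k, k_incl, c, len(raw))
```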

