# Kunpeng Song

Rutgers CS

Welcome to my blog~!

# Pytorch Summary sheet

## Tensor

### Tensor initialize

1. torch.empty(dim,dim); torch.zeros(dim,dim), torch.ones(dim,dim)
>>> torch.empty(5, 3)  # uninitialized memory: values are arbitrary (torch.zeros guarantees zeros)
tensor([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
>>> torch.ones(2,3)
tensor([[ 1.,  1.,  1.],
[ 1.,  1.,  1.]])

2. tensor.new_ones(dim,dim, dtype = tensor.dtype): Create a tensor based on an existing tensor. Reuses the properties of the input tensor, e.g. dtype, unless new values are provided by the user.
>>> x = torch.ones(2,3,dtype=torch.int8)  #default dtype is torch.float32
>>> x.dtype
torch.int8
>>> x.new_ones()
Traceback TypeError: new_ones() missing 1 required positional argument: "size" # size is required
>>> x.new_ones(1, 3)
tensor([[ 1,  1,  1]], dtype=torch.int8)
>>> x.new_ones(1, 3, dtype=torch.double)
tensor([[ 1.,  1.,  1.]], dtype=torch.float64)

3. torch.tensor(list) ; torch.tensor(nparray)
4. torch.rand(dim,dim): random numbers from the uniform distribution U(0, 1) on the interval [0, 1)
>>> torch.rand(2, 3)
tensor([[0.5728, 0.5375, 0.0494],
[0.2820, 0.1853, 0.8619]]) #default dtype is torch.float32
>>> torch.rand(4)
tensor([ 0.5204,  0.2503,  0.3525,  0.5673])

5. torch.randn(size, dtype=None): random numbers from the standard normal distribution N(0, 1) (mean 0, variance 1)
>>> torch.randn(4)
tensor([-2.1436,  0.9966,  2.3426, -0.6366])
>>> torch.randn(2, 3)
tensor([[ 1.5954,  2.8929, -1.0923],
[ 1.1719, -0.4709, -0.1996]])
>>> torch.randn(2,2,3)
tensor([[[ 0.5827,  1.7046, -0.8720],
[ 0.2199,  0.5719,  1.9939]],
[[ 0.5509,  1.1897,  0.2434],
[-1.1203,  1.8258,  1.0824]]])
>>> torch.randn(2,1,3)
tensor([[[-0.2905,  0.0681, -0.8215]],
[[-0.4649,  0.5093,  0.5316]]])
>>> torch.randn(1,2,3)
tensor([[[-0.1372, -0.2624, -0.6892],
[-1.9984, -0.4962, -1.1629]]]) # pay attention to the shape of tensor

6. torch.randn_like(tensor, dtype = tensor.dtype); tensor.size(); tensor.shape

Returns a tensor with the same size as input that is filled with random numbers from a normal distribution N(0,1)

>>> x = torch.ones(2,3)
>>> x.dtype
torch.float32 #default dtype is torch.float32
>>> y = torch.randn_like(x) # do not need size
>>> y
tensor([[-0.0686,  1.9222, -1.7793],
[-2.1696,  1.5362,  0.0112]]) # same shape
>>> y.dtype
torch.float32 #same dtype as default
>>> z = torch.randn_like(x,dtype=torch.double) # specify double as dtype
>>> z
tensor([[ 1.0053,  0.8341,  2.8000],
[ 0.6961,  0.0417,  0.9913]], dtype=torch.float64) #double is torch.float64
>>> z.dtype
torch.float64
>>> z.size() # needs parentheses, unlike numpy's x.size!
torch.Size([2, 3]) # This is a tuple, so it supports all tuple operations.
>>> z.shape # no parentheses needed. Same as numpy
torch.Size([2, 3]) # Same as size()

7. torch.manual_seed(num) : fix the random seed for reproducible results.

>>> torch.manual_seed(2)
>>> print(torch.rand(2)) # always the same numbers; without torch.manual_seed, different numbers every run

>>> import random
>>> Seed = random.randint(1, 10000)
>>> print(Seed)
6895
>>> print(random.seed(Seed))
None
>>> print(torch.manual_seed(Seed))
<torch._C.Generator object at 0x7ff6bed50d90>
>>> print(random.random())
0.14592352920524942
>>> print(torch.randn(1))
tensor([-1.0107])



# run these two rows in the same block; the result is always the same
>>> torch.manual_seed(Seed)
>>> torch.rand(2)
tensor([0.9825, 0.2720])

# run this line in the next jupyter notebook block; the result keeps changing every time
>>> torch.rand(2)
tensor([0.3530, 0.6419])

8. torch.arange(start=0, end, step=1) : torch.range() is similar, but deprecated (outdated).

>>> torch.arange(5)
tensor([ 0,  1,  2,  3,  4]) # dtype for integer arguments depends on the torch version
>>> torch.arange(1, 4)
tensor([ 1,  2,  3])
>>> torch.arange(1, 2.5, 0.5)
tensor([ 1.0000,  1.5000,  2.0000])
>>> np.arange(5) # similar
array([0, 1, 2, 3, 4])
>>> torch.arange(5,11,2)
tensor([ 5.,  7.,  9.]) # notice: end excluded!

9. torch.randperm(n) : Returns a random permutation of integers from 0 to n - 1.
>>> torch.randperm(4)
tensor([2, 1, 0, 3])  # dtype = torch.int64

10. torch.tensor( list ): Construct a tensor directly from a list
>>> torch.tensor([5.5, 3])
tensor([5.5000, 3.0000])

11. torch.cat( (tensor1, tensor2) ) : concatenate tensors; notice: a tuple is needed

>>> a = torch.tensor([1,2])
>>> b = torch.tensor([4,3])
>>> torch.cat((a,b,a))
tensor([1, 2, 4, 3, 1, 2])


more: 12. torch.detach() or tensor1.detach()

Returns a new Tensor, detached from the current graph.

The result will never require gradient. Returned Tensor uses the same data tensor as the original one.
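
A minimal example:

>>> a = torch.ones(2, requires_grad=True)
>>> b = a.detach()
>>> b.requires_grad
False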

### Tensor iteration

1. torch.numel (tensor) : Returns the total number of elements in the input tensor. Same as a.numel()
>>> a = torch.randn(1, 2, 3, 4, 5) # same as  a.numel()
>>> torch.numel(a)
120
>>> a = torch.zeros(4,4)
>>> torch.numel(a)
16


### Tensor math

1. + and torch.add()
>>> x = torch.ones(2,3)
>>> y = torch.randn(2,3)
>>> x+y # same as torch.add(x,y)
tensor([[ 0.0066,  1.3231,  1.2001],
[ 1.1485, -0.0490,  2.0391]])
>>> z = torch.rand(2,3,dtype=torch.double) # different dtype. Same as torch.float64
>>> z+x
Traceback Expected object of type torch.DoubleTensor # cannot use +
# torch.add(x,z) also gives this Error


>>> a = torch.randn(4)
>>> a
tensor([-0.9950,  1.6279, -1.5262, -0.1706])
>>> torch.add(a, 20) # broadcast with a scalar
tensor([ 19.0050,  21.6279,  18.4738,  19.8294])
>>> b = torch.randn(4)
>>> b
tensor([-0.8575, -0.2146,  0.6487,  0.6497])
>>> torch.add(a, b) # same shape: element-wise add
tensor([-1.8525,  1.4133, -0.8775,  0.4791])
>>> torch.add(a, 10, b) # out = tensor1 + value × tensor2
tensor([-9.5702, -0.5183,  4.9613,  6.3262])
>>> c = torch.randn(2,1)
>>> c
tensor([[ 1.7452],
[-2.0755]])
>>> torch.add(a, c) # different shapes: broadcast to a matrix
tensor([[ 0.7502,  3.3731,  0.2190,  1.5746],
[-3.0705, -0.4476, -3.6017, -2.2461]])


t1 = torch.randn(1, 3)
t2 = torch.randn(3, 1)
print(t1)
print(t2)
print('###')
print(t1+t2)
print('###')
t3 = torch.randn(3,3)
print(t3)
print(t1+t3)
>>>
tensor([[ 0.0989,  0.2018, -0.5701]])
tensor([[ 0.2082],
[-1.1686],
[ 0.0156]])
###
tensor([[ 0.3071,  0.4100, -0.3619], # broadcast to matrix
[-1.0696, -0.9667, -1.7387],
[ 0.1145,  0.2175, -0.5545]])
###
tensor([[-0.4906,  1.5755, -0.4574],
[-2.2614, -0.9916, -1.6035],
[-0.2944, -1.4336,  0.1889]])
tensor([[-0.3917,  1.7774, -1.0275], # 1x3 + 3x3: broadcast to 3x3
[-2.1625, -0.7898, -2.1736],
[-0.1954, -1.2318, -0.3813]])


Some variations:

• torch.addcmul(tensor, value=1, tensor1, tensor2) : out_i = tensor_i + value × tensor1_i × tensor2_i
• torch.addcdiv(tensor, value=1, tensor1, tensor2) : out_i = tensor_i + value × tensor1_i / tensor2_i
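
For example (using the old positional value argument from the signature above):

>>> t = torch.zeros(3)
>>> t1 = torch.tensor([1., 2., 3.])
>>> t2 = torch.tensor([2., 2., 2.])
>>> torch.addcmul(t, 1, t1, t2) # t + 1 * t1 * t2
tensor([2., 4., 6.])
>>> torch.addcdiv(t, 1, t1, t2) # t + 1 * t1 / t2
tensor([0.5000, 1.0000, 1.5000])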

Other element-wise math functions

1. torch.abs(tensor) : element-wise absolute value of the given input tensor. Same as tensor.abs()

2. sin cos tan asin acos atan

3. torch.ceil(tensor) : Returns a new tensor with the ceil of the elements of input, the smallest integer greater than or equal to each element.

4. torch.floor(tensor) : similar, but the largest integer smaller than or equal to each element; torch.round(tensor) : closest integer

5. torch.clamp(tensor, min = min, max = max) : clamps all elements in input into the range [min, max].

6. torch.clamp(tensor, min = min) : clamps all elements in input to be larger than or equal to min. (Similarly for max.)
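
For example:

>>> a = torch.tensor([-1.5, 0.3, 2.7])
>>> torch.clamp(a, min=-1, max=1)
tensor([-1.0000,  0.3000,  1.0000])
>>> torch.clamp(a, min=0)
tensor([0.0000, 0.3000, 2.7000])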

7. torch.mul(tensor, value) ; torch.mul(tensor1, tensor2); Similarly div

8. torch.exp(tensor) : element-wise exponential e^(tensor_i)

9. torch.fmod(tensor, value) : Computes the element-wise remainder of division.
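
For example (the result takes the sign of the dividend):

>>> torch.fmod(torch.tensor([-3., -2., -1., 1., 2., 3.]), 2)
tensor([-1., -0., -1.,  1.,  0.,  1.])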

10. torch.frac(tensor) : Computes the fractional portion of each element in input.

torch.frac(torch.tensor([1, 2.5, -3.2]))
tensor([ 0.0000,  0.5000, -0.2000])

11. torch.lerp(start, end, weight) : does a linear interpolation of the two tensors start and end based on a scalar weight and returns the resulting out tensor: out_i = start_i + weight × (end_i − start_i)

>>> start = torch.arange(1., 5.)
>>> end = torch.empty(4).fill_(10)
>>> start
tensor([ 1.,  2.,  3.,  4.])
>>> end
tensor([ 10.,  10.,  10.,  10.])
>>> torch.lerp(start, end, 0.5)
tensor([ 5.5000,  6.0000,  6.5000,  7.0000])

12. torch.log : natural log (base e) ; torch.log10 : log base 10 ; torch.log2 : log base 2 ; torch.neg : negation

13. torch.pow(tensor, value) : element-wise power by value ; torch.pow(tensor, tensor) : element-wise power by a paired tensor
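
For example:

>>> torch.pow(torch.tensor([1., 2., 3.]), 2)
tensor([1., 4., 9.])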

14. torch.reciprocal(tensor) : 1 / tensor

15. torch.remainder(tensor, divisor) : computes the element-wise remainder of division; the result takes the sign of the divisor. (divisor: value or tensor)
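
Compare with fmod above:

>>> torch.remainder(torch.tensor([-3., -2., -1., 1., 2., 3.]), 2)
tensor([ 1.,  0.,  1.,  1.,  0.,  1.])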

16. torch.sqrt(tensor) : sqrt(tensor) ; torch.rsqrt(tensor) : 1 / sqrt(tensor)

17. torch.sigmoid(tensor) : element-wise sigmoid.

18. torch.sign(tensor) : the sign of each element: 1, -1, or 0.

19. torch.trunc(tensor) : Returns a new tensor with the truncated integer values of the elements of input.

>>> a = torch.randn(4)
>>> a
tensor([ 3.4742,  0.5466, -0.8008, -0.9079])
>>> torch.trunc(a)
tensor([ 3.,  0., -0., -0.]) # simply deletes the fraction


### Reductions

1. torch.argmax(tensor, dim=None) : Returns the indices of the maximum values of a tensor across a dimension.

>>> a = torch.randn(3, 4)
>>> a
tensor([[ 0.7913,  0.9236,  0.0348,  0.5403],
[ 0.4394, -0.4464, -0.6759, -0.3718],
[-1.4133,  1.0612,  1.0322, -2.5048]])
>>> torch.argmax(a) # argmax in the flatten tensor
tensor(9)
>>> torch.argmax(a, dim=0) # col
tensor([ 0,  2,  2,  0])
>>> torch.argmax(a, dim=1) # row
tensor([ 1,  0,  1])
# same as argmin

2. torch.prod(tensor, dim=None) : product of all elements
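
For example:

>>> a = torch.tensor([[1., 2.], [3., 4.]])
>>> torch.prod(a)
tensor(24.)
>>> torch.prod(a, dim=0)
tensor([3., 8.])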

3. torch.cumprod(tensor,dim,dtype=None) : cumulative product of elements of input in the dimension dim

>>> a = torch.randn(7)
>>> a
tensor([ 0.6001,  0.2069, -0.1919,  0.9792,  0.6727,  1.0062,  0.4126])
>>> torch.cumprod(a, dim=0)
tensor([ 0.6001,  0.1241, -0.0238, -0.0233, -0.0157, -0.0158, -0.0065])
>>> a[5] = 0.0
>>> torch.cumprod(a, dim=0) # cumulative: from a[5] on, all entries are zero
tensor([ 0.6001,  0.1241, -0.0238, -0.0233, -0.0157, -0.0000, -0.0000])

>>> a = torch.randn(3,4)
>>> a[0,2] = 0
>>> a[2,1] = 0
>>> a
tensor([[-0.4318,  0.4427,  0.0000, -0.1199],
[-0.2790,  0.9716, -0.9334,  1.1384],
[-1.1763,  0.0000, -1.4534, -1.2983]])
>>> torch.cumprod(a,dim=0) # col
tensor([[-0.4318,  0.4427,  0.0000, -0.1199],
[ 0.1205,  0.4302, -0.0000, -0.1365],
[-0.1417,  0.0000,  0.0000,  0.1773]])
>>> torch.cumprod(a,dim=1) # row
tensor([[-0.4318, -0.1912, -0.0000,  0.0000],
[-0.2790, -0.2711,  0.2530,  0.2880],
[-1.1763, -0.0000,  0.0000, -0.0000]])

4. torch.cumsum(tensor, dim, dtype = None) : similar to cumprod
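
For example:

>>> a = torch.tensor([1., 2., 3., 4.])
>>> torch.cumsum(a, dim=0)
tensor([ 1.,  3.,  6., 10.])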

5. torch.dist(tensor1, tensor2, p=2) : returns the p-norm of (tensor1 - tensor2); the result is a scalar tensor.

6. torch.mean(tensor, dim=None, keepdim=False) :

>>> a = torch.rand(3,2)
>>> a
tensor([[ 0.5877,  0.6770],
[ 0.9455,  0.4647],
[ 0.0451,  0.9079]])
>>> torch.mean(a)
tensor(0.6047)
>>> torch.mean(a,dim=1)
tensor([ 0.6323,  0.7051,  0.4765])
>>> torch.mean(a,1) #same above
tensor([ 0.6323,  0.7051,  0.4765])
>>> torch.mean(a, 1, True) #keep dimension
tensor([[ 0.6323],
[ 0.7051],
[ 0.4765]])
>>> torch.mean(a,dim=0)
tensor([ 0.5261,  0.6832])
# similarly: torch.median();

7. torch.norm(tensor, p='fro', dim=None, keepdim=False, out=None) : returns the matrix norm (Frobenius norm by default).

8. torch.std(tensor, dim=None) : returns the standard deviation of all elements in the input tensor. Similarly: var()
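
For example:

>>> a = torch.tensor([[1., 2.], [3., 4.]])
>>> torch.norm(a) # Frobenius norm: sqrt(1+4+9+16)
tensor(5.4772)
>>> torch.std(a)
tensor(1.2910)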

9. torch.sum(tensor, dim, keepdim=False) : Returns the sum of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, reduce over all of them. (dim = tuple is disabled on iLab)
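
For example:

>>> a = torch.ones(2, 3)
>>> torch.sum(a)
tensor(6.)
>>> torch.sum(a, dim=0)
tensor([2., 2., 2.])
>>> torch.sum(a, dim=1, keepdim=True)
tensor([[3.],
[3.]])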

10. torch.unique(tensor,sorted=False,return_inverse=False,dim=None) : Returns the unique scalar elements of the input tensor as a 1-D tensor.

• sorted (bool) – Whether to sort the unique elements in ascending order
• return_inverse (bool) – Whether to also return the indices for where elements in the original input ended up in the returned unique list.
>>> output = torch.unique(torch.tensor([1, 3, 2, 3], dtype=torch.long))
>>> output
tensor([ 2,  3,  1])
>>> a = torch.tensor([[1, 3], [2, 3]])
>>> output, inverse_indices = torch.unique(a, sorted=True, return_inverse=True)
>>> output
tensor([ 1,  2,  3])
>>> inverse_indices
tensor([[ 0,  2],
[ 1,  2]])

11. A LOT OF FUNCTIONS. SUMMARIZE LATER.

12. torch.matmul(tensor1,tensor2) : Matrix multiplication of two tensors. Cases when they have different dimensions:

• If tensor1 is 1-D and tensor2 is 2-D: a 1 is prepended to tensor1's shape for the matrix multiply (turning the vector into a matrix); after the multiply, the prepended dimension is removed (turning the result back into a vector).
• If tensor1 is 2-D and tensor2 is 1-D: the matrix-vector product is returned.
• If at least one argument is N-D (N > 2): a batched matrix multiply is performed (see the examples below).

>>> a = torch.rand(3)
>>> b = torch.rand(3)
>>> torch.matmul(a,b) # vector * vector
tensor(0.2872) # size() = torch.Size([])

>>> a = torch.rand(3,4)
>>> b = torch.rand(4)
>>> torch.matmul(a,b).size() # matrix * vector
torch.Size([3])

>>> a = torch.rand(3)
>>> b = torch.rand(3,4)
>>> torch.matmul(a,b) # vector * matrix
tensor([ 1.1128,  0.9645,  0.6328,  1.0789])
# To show the above more clearly:
>>> torch.tensor([[1,2,3]]).size() # this is a matrix
torch.Size([1, 3])
>>> torch.tensor([1,2,3]).size() # this is a vector
torch.Size([3])
# And vector is different from matrix:
# (1) vector @ matrix gives a vector
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([[1, 2], [3, 4], [4, 5]])
>>> torch.matmul(a, b)
tensor([19, 25])
# (2) matrix @ matrix gives a matrix
>>> a = torch.tensor([[1, 2, 3]])
>>> torch.matmul(a, b)
tensor([[19, 25]])

>>> a = torch.rand(3,4)
>>> b = torch.rand(4,5)
>>> torch.matmul(a,b).size()
torch.Size([3, 5])

>>> # batched matrix x broadcasted vector
>>> tensor1 = torch.randn(10, 3, 4)
>>> tensor2 = torch.randn(4)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([10, 3])
>>> # batched matrix x batched matrix
>>> tensor1 = torch.randn(10, 3, 4)
>>> tensor2 = torch.randn(10, 4, 5)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([10, 3, 5])
>>> # batched matrix x broadcasted matrix
>>> tensor1 = torch.randn(10, 3, 4)
>>> tensor2 = torch.randn(4, 5)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([10, 3, 5])

# To understand the dimensionality of a tensor:
#	a torch.Size([10, 3, 4]) : 10 layers, 3 rows per layer, 4 elements per row
#	b torch.Size([10, 4, 5]) : 10 layers, 4 rows per layer, 5 elements per row
#	torch.matmul(a, b) : layer-by-layer matrix mul --> torch.Size([10, 3, 5])
#	c torch.Size([4, 5]) : 2-D --> treated as a 3-D tensor with 1 layer
#	torch.matmul(a, c) : c is broadcast, then layer-by-layer mul --> torch.Size([10, 3, 5])

13. torch.mm(mat1, mat2) : matrix multiplication of the matrices mat1 and mat2: (m x n) (n x p) --> (m x p).

14. torch.mv(mat, vec) : matrix-vector product; mat is (n x m), vec is 1-D of size m --> 1-D of size n. Not broadcastable.
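
For example (shapes only; values are random):

>>> torch.mm(torch.randn(2, 3), torch.randn(3, 4)).size()
torch.Size([2, 4])
>>> torch.mv(torch.randn(2, 3), torch.randn(3)).size()
torch.Size([2])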

15. torch.addmm(beta=1, mat, alpha=1, mat1, mat2) : matrix multiplication (not element-wise) of the matrices mat1 and mat2; the matrix mat is added to the final result.

• out = β·mat + α·(mat1 @ mat2)
• alpha and beta are optional, but mat is required.
>>> M = torch.rand(2,3)
>>> a = torch.rand(2,4)
>>> b = torch.rand(4,3)
>>> torch.addmm(M, a, b).size()
torch.Size([2, 3])

16. torch.addmv(beta=1, tensor, alpha=1, mat, vec) : matrix-vector product of the matrix mat and the vector vec; the tensor is added to the final result.

• out = β·tensor + α·(mat @ vec)
• alpha and beta are optional, but tensor is required.
17. torch.addr(beta=1, mat, alpha=1, vec1, vec2) : computes the outer product of vectors vec1 and vec2 and adds it to the matrix mat

• out = β·mat + α·(vec1 ⊗ vec2)
• If vec1 is a vector of size n and vec2 is a vector of size m, then mat must be broadcastable with a matrix of size (n x m) and out will be a matrix of size(n×m).
>>> a = torch.arange(1,4) # tensor([ 1.,  2.,  3.]), 1-D
>>> b = torch.arange(1,3) # tensor([ 1.,  2.]), 1-D
>>> M = torch.zeros(3, 2)
>>> torch.addr(M, a, b)
tensor([[ 1.,  2.],
[ 2.,  4.],
[ 3.,  6.]])
##attention: The following won't work!
>>> a = torch.rand(1,4) # this is a 2-D matrix, not a 'vector'
>>> b = torch.rand(1,3)
Traceback Expected 1-D argument vec1, but got 2-D

18. torch.dot(tensor1,tensor2) : dot product (inner product) of two tensors. Returning is a scalar tensor.
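
For example:

>>> torch.dot(torch.tensor([2, 3]), torch.tensor([2, 1]))
tensor(7)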

19. torch.squeeze(input, dim=None, out=None) : Returns a tensor with all the dimensions of input of size 1 removed.

• If input is of shape (A × 1 × B × C × 1 × D), the out tensor will be of shape (A × B × C × D).
• When dim is given, the squeeze is done only in the given dimension. If input is of shape (A × 1 × B), squeeze(input, 0) leaves the tensor unchanged, but squeeze(input, 1) squeezes it to the shape (A × B).
>>> a = torch.randn(1,2,2,2)
>>> a
tensor([[[[-0.3458,  1.3588],
[-0.2800, -1.1741]],
[[-0.9958, -1.0715],
[-0.0628, -0.3520]]]])
>>> a.view(-1, 1)
tensor([[-0.3458],
[ 1.3588],
[-0.2800],
[-1.1741],
[-0.9958],
[-1.0715],
[-0.0628],
[-0.3520]])
>>> a.view(-1, 1).shape
torch.Size([8, 1])
>>> a.view(-1, 1).squeeze(1)
# tensor.squeeze(dim=None)  SAME AS torch.squeeze(tensor,dim=None)
tensor([-0.3458,  1.3588, -0.2800, -1.1741, -0.9958, -1.0715, -0.0628,
-0.3520])


more: 20. torch.Tensor.item() : Returns the value of this tensor as a standard Python number. This only works for tensors with one element.

>>> x = torch.tensor([1.0]) # only for tensors with one element.
>>> x.item()
1.0

21. torch.Tensor.tolist() : Returns the tensor as a (nested) list. Tensors are automatically moved to the CPU first.
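
For example:

>>> torch.tensor([[1., 2.], [3., 4.]]).tolist()
[[1.0, 2.0], [3.0, 4.0]]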

22. tensor.expand( sizes ) :

Returns a new view of the self tensor with singleton dimensions expanded to a larger size, without allocating new memory. Passing -1 as the size for a dimension means not changing the size of that dimension. The tensor can also be expanded to a larger number of dimensions, and the new ones will be appended at the front. For the new dimensions, the size cannot be set to -1.

>>> x = torch.tensor([[1.], [2.], [3.]])
>>> x.size()
torch.Size([3, 1])
>>> x.expand(3, 4)
1  1  1  1
2  2  2  2
3  3  3  3
[torch.FloatTensor of size 3x4]
>>> x.expand(-1, 4)   # -1 means not changing the size of that dimension
1  1  1  1
2  2  2  2
3  3  3  3
[torch.FloatTensor of size 3x4]
>>> x.expand(2,3,4) # add new dim at the very outside dim
tensor([[[1., 1., 1., 1.],
[2., 2., 2., 2.],
[3., 3., 3., 3.]],

[[1., 1., 1., 1.],
[2., 2., 2., 2.],
[3., 3., 3., 3.]]])
>>> y = torch.tensor([1,2,3,4])
>>> y
tensor([1, 2, 3, 4])
>>> y.expand(3,3,-1)
tensor([[[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]],

[[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]],

[[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]]])

23. torch.transpose( tensor, dim0, dim1) :

Returns a tensor that is a transposed version of input. The given dimensions dim0 and dim1 are swapped.

>>> x = torch.randn(2, 3)
>>> x
tensor([[ 1.0028, -0.9893,  0.5809],
[-0.1669,  0.7299,  0.4942]])
>>> torch.transpose(x, 0, 1)
tensor([[ 1.0028, -0.1669],
[-0.9893,  0.7299],
[ 0.5809,  0.4942]])


Notice: the returned tensor shares storage with the original and may be non-contiguous; torch.view() cannot be applied to a non-contiguous tensor.

24. tensor.contiguous() :

Some tensors are not stored in one contiguous memory block (e.g. after expand() or transpose()). torch.view() cannot be applied to them; they need to be converted to a contiguous tensor beforehand.

>>> x = torch.tensor([[1.], [2.], [3.]])
>>> x
tensor([[1.],
[2.],
[3.]])
>>> y = x.expand(2,3,4)
>>> y
tensor([[[1., 1., 1., 1.],
[2., 2., 2., 2.],
[3., 3., 3., 3.]],

[[1., 1., 1., 1.],
[2., 2., 2., 2.],
[3., 3., 3., 3.]]])
>>> y.is_contiguous()
False
>>> y.view(3,-1)
RuntimeError: invalid argument 2: view size is not compatible with input tensor size and stride (at least one dimension spans across two contiguous subspaces)
>>> y.contiguous().view(3,-1)
tensor([[1., 1., 1., 1., 2., 2., 2., 2.],
[3., 3., 3., 3., 1., 1., 1., 1.],
[2., 2., 2., 2., 3., 3., 3., 3.]])

25. torch.numel( tensor ) : equals tensor.nelement()

Returns the total number of elements in the input tensor.

>>> a = torch.randn(1, 2, 3, 4, 5)
>>> torch.numel(a)
120

26. To be continued.

## NN

To be continued.

References:

PyTorch tensor docs: https://pytorch.org/docs/stable/torch.html#torch.fmod

1. torch.backends.cudnn.benchmark = True # enables benchmark mode in cudnn: the built-in auto-tuner finds the best algorithm for your hardware.

2. import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--dataset', required=True, help='cifar10 | lsun | mnist | imagenet')
parser.add_argument('--dataroot', required=True, help='path to dataset') # needed for opt.dataroot below
parser.add_argument('--batchSize', type=int, default=64, help='input batch size')
opt = parser.parse_args()

root = opt.dataroot
...

3. dataset = torchvision.datasets.ImageFolder('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(dataset,
                                          batch_size=4,
                                          shuffle=True)

Use some common datasets and generate a data loader (https://pytorch.org/docs/stable/torchvision/datasets.html#imagefolder)

4. torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0) : returns (batch_x, batch_y)

import torch.utils.data as Data

torch.manual_seed(1)    # reproducible
BATCH_SIZE = 5
x = torch.linspace(1, 10, 10)       # x data (torch tensor)
y = torch.linspace(10, 1, 10)       # y data (torch tensor)
# convert to torch Dataset
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    dataset=torch_dataset,      # torch TensorDataset format
    batch_size=BATCH_SIZE,      # mini batch size
    shuffle=True,
    num_workers=2,
)
for epoch in range(3):   # iterate over all training data 3 times
    for step, (batch_x, batch_y) in enumerate(loader):
        # training code goes here
        print('Epoch: ', epoch, '| Step: ', step, '| batch x: ',
              batch_x.numpy(), '| batch y: ', batch_y.numpy())
"""
Epoch:  0 |Step:  0 |batch x:[ 6.  7.  2.  3.  1.] | batch y:[  5.   4.   9.   8.  10.]
Epoch:  0 |Step:  1 | batch x:[  9.  10.   4.   8.  5.] | batch y:[ 2.  1.  7.  3.  6.]
Epoch:  1 |Step:  0 | batch x:  [  3.  4.  2.  9.  10.] | batch y:[ 8.  7.  9.  2.  1.]
Epoch:  1 |Step:  1 | batch x:  [ 1.  7.  8.  5.  6.] | batch y [ 10.  4.  3.  6.   5.]
Epoch:  2 |Step:  0 | batch x:  [ 3.  9.  2.  6.  7.] | batch y:  [ 8.  2.  9.  5.  4.]
Epoch:  2 |Step:  1 | batch x:  [ 10.  4.  8.  1.  5.] | batch y:[  1.  7.  3. 10.  6.]
"""


https://morvanzhou.github.io/tutorials/machine-learning/torch/3-05-train-on-batch/

5. >>> w = torch.empty(3, 5)
>>> nn.init.normal_(w)


Fills the input Tensor in-place with values drawn from the normal distribution N(0, 1).

6. x = torch.randn(3, requires_grad=True)

>>> print(x.requires_grad)
True
>>> print((x ** 2).requires_grad)
True
>>> with torch.no_grad():
...     print((x ** 2).requires_grad)
False # the wrapper "with torch.no_grad()" temporarily sets all requires_grad flags to False

7. weight initialization : define a name-check function that applies the initialization selectively, then traverse the whole set of modules with net.apply().

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        xavier(m.weight.data)  # assuming e.g. xavier = nn.init.xavier_uniform_
        xavier(m.bias.data)
# Then traverse the whole set of Modules.
net = Net()             # generate an instance network from the Net class
net.apply(weights_init) # apply weight init

# another example:
# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:        # if conv: initialize weight to N(0, 0.02)
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1: # if BN: initialize weight to N(1, 0.02)
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)                # also init bias
netG = Generator(ngpu).to(device)
netG.apply(weights_init)
netD = Discriminator(ngpu).to(device)
netD.apply(weights_init)

# others:
# check if a module is an instance of a class
def weights_init(m):
    if isinstance(m, nn.Conv2d):
        xavier(m.weight.data)
        xavier(m.bias.data)

8. nn.ReLU(inplace=False) :

inplace – can optionally do the operation in-place, modifying the input directly, without allocating any additional output. It can sometimes slightly decrease the memory usage, but may not always be a valid operation. Default: False

>>> m = nn.ReLU()
>>> tensor = torch.randn(2)
>>> output = m(tensor)

9. nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True) :

>>> a = torch.randn(1,3,3,3)
>>> conv1 = torch.nn.Conv2d(3,1,3,1,0) # no padding
>>> conv1(a).size()
torch.Size([1, 1, 1, 1])
>>> conv2 = torch.nn.Conv2d(3,1,3,1,1) # padding by FLOOR(kernel_size/2)
>>> conv2(a).shape
torch.Size([1, 1, 3, 3])
>>> conv3 = torch.nn.Conv2d(3,1,3,1) # default is no-padding
>>> conv3(a).shape
torch.Size([1, 1, 1, 1])

10. nn.ConvTranspose2d (in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1) :

This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).

>>> a = torch.randn(1, 16, 12, 12)

>>> downsample = nn.Conv2d(16, 16, 3, stride=2)
>>> b = downsample(a)
>>> b.size()
torch.Size([1, 16, 5, 5])

>>> downsample = nn.Conv2d(16, 16, 3, stride=2,padding=1)
>>> b = downsample(a)
>>> b.size()
torch.Size([1, 16, 6, 6])

>>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2)
>>> b = upsample(a)#, output_size=input.size())
>>> b.size()
torch.Size([1, 16, 25, 25])

>>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
>>> b = upsample(a)#, output_size=input.size())
>>> b.size()
torch.Size([1, 16, 23, 23])

11. conv kernel size : see the stride/padding notes in the trouble shooting section below.

12. specify GPU:

# in model
class Discriminator(nn.Module):
    def __init__(self, ngpu): # ngpu: number of GPUs
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # ...
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )
    def forward(self, input):
        if self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)
        return output

13. cuda() :

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
features = features.to(device)
targets = targets.to(device)


https://discuss.pytorch.org/t/how-to-check-if-model-is-on-cuda/180/7
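
One quick way to check whether a model sits on the GPU (per the thread above; netD is the model defined earlier):

>>> next(netD.parameters()).is_cuda
True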

14. net.load_state_dict(state_dict, strict=True) :

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module's state_dict() function.

netD = Discriminator(ngpu).to(device)
netD.apply(weights_init)
if opt.ResumeNetD == 'True':
    netD.load_state_dict(torch.load(opt.netDpath)) # opt.netDpath: hypothetical path to a saved state_dict
print(netD)


https://pytorch.org/docs/stable/torchvision/datasets.html

15. nn.Sequential(*args) : A sequential container. Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can also be passed in.

# Example of using Sequential
model = nn.Sequential(
    nn.Conv2d(1,20,5),
    nn.ReLU(),
    nn.Conv2d(20,64,5),
    nn.ReLU()
)
# Example of using Sequential with OrderedDict
from collections import OrderedDict
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1,20,5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20,64,5)),
    ('relu2', nn.ReLU())
]))


In user-defined models, use this function in __init__() :

class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )
    def forward(self, input):
        if input.is_cuda and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)
        return output

16. torch.stack( (tensor1, tensor2) ) : stacks the tensors along a new dimension (a "tensor of tensors")

>>> a = torch.tensor([1,2])
>>> b = torch.tensor([2,3])
>>> torch.stack((a,b))
tensor([[1, 2],
[2, 3]])
>>> torch.cat((a,b))
tensor([1, 2, 2, 3])

17. tensor.backward() :

>>> x = torch.tensor([1.], requires_grad=True)
>>> y = x*2
>>> y.backward(retain_graph=True)
>>> print(x.grad)
tensor([2.])
>>> y.backward(retain_graph=True)
>>> y.backward() # the graph is freed after this call
>>> print(x.grad) # gradients accumulate: 2 + 2 + 2
tensor([6.])
>>> y.backward()
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.


see retain_graph=True below.

18. backward(retain_graph=True) : needed when we want both d(d)/d(a) and d(e)/d(a):

from torch.autograd import Variable
a = Variable(torch.rand(1, 4), requires_grad=True)
b = a**2
c = b*2
d = c.mean()
e = c.sum()
d.backward(retain_graph=True) # fine
e.backward(retain_graph=True) # fine
d.backward() # also fine
e.backward() # error will occur!


When we do d.backward(), that is fine. After this computation, the part of the graph that computes d will be freed by default to save memory. So if we then do e.backward(), the error message pops up. In order to do e.backward(), we have to set the parameter retain_graph to True in d.backward().

A real use case is multi-task learning, where you have multiple losses which may sit at different layers. Suppose you have two losses, loss1 and loss2, residing in different layers. In order to backprop the gradients of loss1 and loss2 w.r.t. the learnable weights of your network independently, you have to use retain_graph=True in the backward() call of the first back-propagated loss:

loss1.backward(retain_graph=True)
loss2.backward() # now the graph is freed, ready for next process of batch SGD
optimizer.step() # update the network parameters


https://stackoverflow.com/questions/46774641/what-does-the-parameter-retain-graph-mean-in-the-variables-backward-method

More examples:

• This works when final_loss = loss1 + loss2 and we need to update both model 1 and model 2 according to this final loss.
• This does not work when we need to update model 1 according to only loss1, and model 2 according to loss1 + loss2 (or only loss2), i.e. update the models separately. This approach would introduce loss2 when updating model 1, which is not intended.

https://discuss.pytorch.org/t/what-exactly-does-retain-variables-true-in-loss-backward-do/3508/12

19. tensor.detach()

>>> x = torch.tensor([1.], requires_grad=True)
>>> y = x**2 # value 1; dy/dx = 2x
>>> z = 2*y  # value 2; dz/dy = 2
>>> w = z**3 # value 8; dw/dz = 3*z**2
>>> # detach it, so the gradient w.r.t. p does not affect z!
>>> p = z.detach()
>>> q = torch.tensor([2.], requires_grad=True)
>>> pq = p*q
# Graph:
# x --> y --> z --> w
#          \--> p --> pq
>>> pq.backward(retain_graph=True)
>>> print(x.grad) # p is detached, so no gradient flows back to x
None
>>> w.backward()
>>> print(x.grad) # dw/dx = 48*x**5 = 48 at x = 1
tensor([48.])

# replace p = z.detach() with:
>>> p = z
tensor([8.])  # x.grad after pq.backward(): now affected, q * dz/dx = 2 * 4
tensor([56.]) # x.grad after w.backward(): accumulated, 8 + 48


more summary:

1. tensor.detach() creates a tensor that shares storage with tensor but does not require grad. tensor.clone() creates a copy of tensor that imitates the original tensor's requires_grad field. Use detach() when attempting to remove a tensor from a computation graph, and clone() to copy the tensor while keeping the copy a part of the computation graph it came from.
2. tensor.data returns a new tensor that shares storage with tensor. However, it always has requires_grad=False (even if the original tensor had requires_grad=True).
3. You should try not to call tensor.data in 0.4.0. What are your use cases for tensor.data?
4. tensor.clone() makes a copy of tensor. variable.clone() and variable.detach() in 0.3.1 act the same as tensor.clone() and tensor.detach() in 0.4.0.

https://discuss.pytorch.org/t/clone-and-detach-in-v0-4-0/16861

## utils

1. save an image batch as a grid image

import torchvision.utils as vutils

fixed_noise = torch.randn(batchSize, nz, 1, 1).to(device)
# in the train loop
fake = netG(fixed_noise)
vutils.save_image(fake.detach(),
                  '%s/find_para_dim64%d.png' % (outfile_path, epoch),
                  normalize=True)

2. datasets

https://pytorch.org/docs/stable/torchvision/datasets.html

3. custom dataset examples

https://github.com/utkuozbulak/pytorch-custom-dataset-examples

4. transforms.Compose([])

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0,), (1,)),
])


https://pytorch.org/docs/0.4.0/torchvision/transforms.html

https://pytorch.org/docs/stable/torchvision/transforms.html

• torchvision.transforms.CenterCrop(size) : Crops the given PIL Image at the center.
• torchvision.transforms.FiveCrop(size) : Crop the given PIL Image into four corners and the central crop
• torchvision.transforms.RandomCrop(size) : Crop the given PIL Image at a random location.
• torchvision.transforms.Resize(size, interpolation=2) : Resize the input PIL Image to the given size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, smaller edge of the image will be matched to this number. i.e, if height > width, then image will be rescaled to (size * height / width, size)

Notice:

'transforms' only operate on PIL images, e.g. when building a dataset. They cannot operate on tensors:

import torch
import torchvision.transforms as transforms
b = torch.ones(1, 10, 10)
transforms.Resize((5, 5))(b)  # note: Resize(5, 5) would pass 5 as the interpolation argument

Traceback
TypeError: img should be PIL Image. Got <class 'torch.Tensor'>


This issue can be solved by converting the tensor to a PIL image beforehand:

import torch
import torchvision.transforms as transforms
import torchvision.transforms.functional as F
a = torch.ones(2,6,6)
b = F.to_pil_image(a)
c = transforms.Resize((3, 3))(b)


But this is still not perfect: we cannot inspect the result c directly, since it is a PIL image, not a tensor.

Notice: 'transforms' are very powerful, but they operate on PIL images; that is why we normally can't see the direct result as with other functions. The issue can be solved by converting back to a tensor:

c = transforms.ToTensor()(transforms.Resize((3, 3))(b))

Now: Enjoy the amazing power provided by 'torchvision.transforms'!

## Datasets

### CIFAR-10

def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict


http://www.cs.utoronto.ca/~kriz/cifar.html

>>> data_dict = unpickle("data_batch_1")
>>> print(data_dict[b'batch_label'])
>>> print(data_dict[b'labels'][12:14])
>>> print(data_dict[b'data'].shape)
>>> print(data_dict[b'filenames'][12:14])
b'training batch 1 of 5'
[7, 2] # labels are numbers from 0 to 9: 7 = horse, 2 = bird
(10000, 3072)
[b'quarter_horse_s_000672.png', b'passerine_s_000343.png']


Some strange things:

import numpy as np
import matplotlib.pyplot as plt

img = np.ones((32,32,3)).astype('uint8')
row = data_dict[b'data'][12] # one image per row; index 12 is the horse from above
img[:,:,0] = row[:1024].reshape(32,32)
img[:,:,1] = row[1024:2048].reshape(32,32)
img[:,:,2] = row[2048:3072].reshape(32,32)
plt.axis('off')
plt.imshow(img)

img = np.ones((3,1024)).astype('uint8')
img[0,:] = row[:1024]
img[1,:] = row[1024:2048]
img[2,:] = row[2048:3072]
img = np.rot90(img,3)
img = img.reshape(32,32,3)
plt.axis('off')
plt.imshow(img)

import pickle
from torch.utils.data import Dataset
import os
import torch
import torch.utils.data as Data

dataroot = 'common/user/ks1418'
batchSize = 64

def get_img(img_data):
    # torch uses C x H x W
    img = np.array([img_data[:1024].reshape(32,32),
                    img_data[1024:2048].reshape(32,32),
                    img_data[2048:3072].reshape(32,32)])
    return img

def get_dataset():
    dataset = []
    files = ["data_batch_1","data_batch_2","data_batch_3","data_batch_4","data_batch_5"]
    for file in files:
        with open(os.path.join(dataroot, file), 'rb') as fo:
            data_dict = pickle.load(fo, encoding='bytes')
        for i in range(len(data_dict[b'data'])):
            if data_dict[b'labels'][i] == 7: # only use 'horse' images to train
                img_data = data_dict[b'data'][i]
                # items in the dataset should be tuples of tensors: the first element
                # is the image, the second is the label (not used, but a tuple is needed)
                dataset.append((torch.tensor(get_img(img_data), dtype=torch.float32),
                                torch.tensor(8, dtype=torch.float32)))
    return dataset

class MyDataSet(Dataset):
    def __init__(self):
        self.dataset = get_dataset() # read in all training images
    def __getitem__(self, index):
        return self.dataset[index]   # return by index
    def __len__(self):
        return len(self.dataset)

dataset = MyDataSet()
loader = Data.DataLoader(
    dataset=dataset,       # torch Dataset format
    batch_size=batchSize,  # mini batch size
    shuffle=True,
    num_workers=2,
)


## trouble shooting

1. Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.FloatTensor) should be the same

Try moving the data to torch.float, i.e. data = data.to(device=device, dtype=torch.float).

2. The tensor size and the CNN input size do not match. Take care of the shape of the tensor; usually the tensor is too small:

>>> m = torch.nn.Conv2d(3, 5, 4, 1).to(device)
>>> good = torch.randn((1, 3, 10, 10)).to(device)
>>> small = torch.randn((1, 3, 2 ,2)).to(device)
>>> print(m(good).shape)
>>> m(small) # RuntimeError: the 2x2 input is smaller than the 4x4 kernel

torch.Size([1, 5, 7, 7])

• When stride = 1, to keep the same image size, only use kernel size = 3 (with padding = 1).
• When stride = 2 and doing conv or convtranspose to scale down/up: conv: any kernel; convtranspose: only kernel 4.
• Kernel size = 4 is safe in both conv and convtranspose (a quick check below):
  • conv:
    • use padding = 1 to get input/2
    • use padding = 0 to get input/2 - 1
  • convtranspose:
    • use padding = 1 to get 2*input
    • use padding = 0 to get 2*input + 2
• Kernel size = 3 is only applicable in conv; use padding = 1.
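
A quick sanity check of these rules (shapes only; values are random):

>>> a = torch.randn(1, 3, 16, 16)
>>> torch.nn.Conv2d(3, 3, 4, 2, 1)(a).shape          # kernel 4, stride 2, padding 1: input/2
torch.Size([1, 3, 8, 8])
>>> torch.nn.ConvTranspose2d(3, 3, 4, 2, 1)(a).shape # kernel 4, stride 2, padding 1: 2*input
torch.Size([1, 3, 32, 32])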

3. gradient vanish : if the loss is always 0 and the results are almost white noise, it is probably because of gradient vanishing. Use spectral norm after each convolution to control the size of the gradient update:

from torch.nn.utils import spectral_norm

# ... inside nn.Sequential, wrap each convolution, e.g.:
#     spectral_norm(nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False)),
# ...


But this leads to slower training. At the beginning it may seem to have collapsed, but gradually different results come out. Also, a deeper CNN trains much slower than a shallow one and is more likely to be mistaken for a collapse. (Actually it is indeed more likely to collapse, but at least it won't suffer from gradient vanishing.) (Figure omitted: left is one layer deeper than right; both use spectral norm.)

4. model collapse : deeper and more complex models tend to collapse.

WGAN solved this issue almost completely. Train D more:

for epoch in range(niter):
    # inside each batch:
    for j in range(5): # train D 5 times for every G update
        # ... D training step ...
    # ... then one G training step ...

5. About batch size : avoid a too-big batch size when the training data set is small.

Eg: 360 coast images as training data, different batch sizes at the same training time: batch size 64 vs. batch size 16 (figures omitted).

6. one of the variables needed for gradient computation has been modified by an inplace operation

This means some variable A was overwritten, but backward() still needs its original value.

output = netD(real_img)
errD_real = criterion(output, label)

noise = torch.randn(batch_size, nz, 1, 1).to(device)
fake = netG(noise)
output = netD(fake.detach()) # 'output' is overwritten here
errD_fake = criterion(output, label)
errD = errD_real + errD_fake
errD.backward() # needs the original value of 'output'


To make life easier, call .backward() right after calculating each term of the loss function.

errD_real = criterion(output, label)
errD_real.backward()
# ...
errD_fake = criterion(output, label)
errD_fake.backward()
optimizerD.step()

7. ValueError: only one element tensors can be converted to Python scalars:

Notice: you cannot convert a list of tensors to a tensor of tensors with torch.tensor():

>>> a = torch.tensor([1,2,3])
>>> b = torch.tensor([2,4,3])
>>> c = [a,b]
>>> torch.tensor(c)
ValueError: only one element tensors can be converted to Python scalars
>>> c = torch.zeros(2,3)
>>> c[0] = a
>>> c[1] = b
>>> c
tensor([[1., 2., 3.],
[2., 4., 3.]])
# alternatively, torch.stack([a, b]) accepts a list of tensors

8. torchvision normalize not working as expected:

https://stackoverflow.com/questions/53332663/torchvision-0-2-1-transforms-normalize-does-not-work-as-expected/53337082#53337082

The transforms will not be called until the dataloader derived from the dataset (with the transform attached) is iterated.

9. torchvision.datasets.ImageFolder:

RuntimeError: Found 0 files in subfolders of: /common/users/ks1418/images/coast. Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif

The image path is correct but it still cannot find the training images. The key is the folder structure: at least one subfolder is needed. Changing image_path to '/common/users/ks1418/images/' will work.

10. CUDA out of memory, specify GPU

>>> CUDA_VISIBLE_DEVICES=0,2 python landscape_no_seg.py
# use cuda parallel in code and set ngpu=2
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 1; 10.91 GiB total capacity; 8.79 GiB already allocated; 1.59 GiB free; 2.43 MiB cached) (malloc at /opt/conda/conda-bld/pytorch_1550780889552/work/aten/src/THC/THCCachingAllocator.cpp:231)

>>> torch.cuda.device_count()
4
# The Rutgers CS official doc says there are 8 GPUs per linux machine,
# but one student can only use 4 of them, so cuda device_count() is 4.


CUDA will enumerate the visible devices starting at zero. In the case above, devices 0 and 2 will appear as devices 0 and 1.

So if you do CUDA_VISIBLE_DEVICES=2, your GPU #2 will be denoted as cuda:0 inside the program.

The 2 GB in "Tried to allocate 2.00 GiB" is related to the model size, not the batch size. I reduced the number of filters and the above problem was solved.

More:

• Smaller batch size

• torch.cuda.empty_cache() every few minibatches

• CUDA_VISIBLE_DEVICES=0 python find_para_landscape.py

multi GPU:

1. CUDA_VISIBLE_DEVICES=2,3 python myscript.py # GPU 2 is cuda 0, GPU 3 is cuda 1
2. net = torch.nn.DataParallel(model, device_ids=[0, 1])
11. NVIDIA memory not deallocated after keyboard interrupt

Check the GPU state with nvidia-smi and choose the GPU with more free memory. Check the Linux processes:

• RSS : resident set size, the non-swapped physical memory a task has used (in kilobytes).
• VSZ : virtual memory size of the process in KiB (1024-byte units).

Kill process using Linux command: https://blog.csdn.net/andy572633/article/details/7211546

• ps -aux (similar to ps -ef) : see all processes

• ps -aux | grep ks1418 : see only processes containing 'ks1418'. ('|' means pipeline: the output of the left command is passed to grep to filter by keyword.)

"When investigating, we found that there's actually a bug in python multiprocessing that might keep the child process hanging around as zombie processes. It is not even visible to nvidia-smi."

The solution is kill all python, or to ps -elf | grep python and find them and kill -9 [pid] to them.

https://discuss.pytorch.org/t/pytorch-doesnt-free-gpus-memory-of-it-gets-aborted-due-to-out-of-memory-error/13775/12

try:

• Logging out of iLab kills all ks1418-related processes and releases some of the GPU memory requested by my keyboard-interrupted python programs.

• Change model structure and adopt a smaller model.

• Try to specify GPU id by:

CUDA_VISIBLE_DEVICES=2 python myscript.py # if no multi GPU in code

12. transforms centercrop and resize:

Training images are not all of the same size: some 2048 x 2048, some 256 x 256, and some may be rectangular rather than square. We would like to use 64 x 64 pixel images to test our idea. How to deal with this training image set?

imagenet_data = dataset.ImageFolder(dataroot, transform=transforms.Compose([
    transforms.CenterCrop(64),
    transforms.Resize(64),
    transforms.ToTensor(),
    transforms.Normalize((0,), (1,)),
]))
data_loader = Data.DataLoader(imagenet_data,
                              batch_size=batchSize,
                              shuffle=True,
                              num_workers=nworkers)
# This will NOT work!
# I waited for a long time with this code, but only got a lot of bad images.

# The key point is the order of "centercrop and resize".
# Remember centercrop only cuts a small patch from the center of the image. It is NOT a maximum center cut followed by a resize!
# So if we call 'centercrop' before resize, we get a lot of nonsense training images.

# The following is the correct version:
imagenet_data = dataset.ImageFolder(dataroot, transform=transforms.Compose([
    transforms.Resize(64),     # resize first
    transforms.CenterCrop(64), # centercrop second
    transforms.ToTensor(),
    transforms.Normalize((0,), (1,)),
]))



Also, this code is not perfect, because it applies the 'transforms compose' every time it trains on an image.

I found it much faster to prepare the 64 x 64 training images beforehand:

# use torchvision transforms to prepare training images
import os
import torch.utils.data as Data
import torchvision.datasets as dataset
import torchvision.transforms as transforms
import torchvision.utils as vutils

dataroot = '/common/users/ks1418/images/no_seg/street' # original images are in its subfolder 'image'. Attention: we need at least one subfolder when using ImageFolder.
outf = '/common/users/ks1418/images/no_seg/street_64/image' # save new images here
names = []
for img in os.listdir(os.path.join(dataroot, 'image')):
    names.append(img) # preserve the image names. Attention: the names are not kept aligned; the permutation of the new image names differs from before.
imagenet_data = dataset.ImageFolder(dataroot, transform=transforms.Compose([
    transforms.Resize(64),
    transforms.CenterCrop(64),
    transforms.ToTensor(), # This IS needed! Otherwise you get "tensor or list of tensors expected, got <class 'PIL.Image.Image'>" when calling vutils.save_image.
]))
for cnt in range(len(imagenet_data)):
    img = imagenet_data[cnt][0] # each item is an (image_tensor, label) tuple
    vutils.save_image(img, os.path.join(outf, names[cnt]), normalize=False)

# Or, for paired images: in 'dataroot' we have two folders: image, annot
for cnt in range(int(len(imagenet_data)/2)):
    img = imagenet_data[int(len(imagenet_data)/2) + cnt][0]
    vutils.save_image(img, os.path.join(outf, 'image', names[cnt]), normalize=False)
    img = imagenet_data[cnt][0]
    vutils.save_image(img, os.path.join(outf, 'annot', names[cnt]), normalize=False)
# imagenet_data contains both image and annot: the first half are annots, (annot_image_tensor, label); the last half are images, (image_tensor, label)


Notice: You may not want to use 'transforms' inside CNN layers, because:

• You would have to convert the tensor to a PIL image before calling 'transforms'.

• You would need to copy the tensor to the CPU for the above process, which is computationally very expensive.

• Use nn.Upsample instead (it is also capable of downsampling):

nn.Upsample(size=16, mode='bilinear')(tensor1)
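
For example, downsampling a 32x32 feature map to 16x16:

>>> t = torch.ones(1, 3, 32, 32)
>>> nn.Upsample(size=16, mode='bilinear')(t).shape
torch.Size([1, 3, 16, 16])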

13. Multi GPU not faster.

netG = Generator()
netD = Discriminator()

if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    netD = nn.DataParallel(netD)
    netG = nn.DataParallel(netG)

netG.to(device)
netG.apply(weights_init)
netD.to(device)
netD.apply(weights_init)


Attention:

If you use batch_size=30 using a single GPU, then when you use DataParallel with 3 GPUs, you should use batch_size=90 to make a fair comparison. The point of using DataParallel is that you can use a larger batch_size which then requires less number of iterations to complete one full epoch.

## …

Questions:

1. torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

2. input image normalization: 0 mean, 1 var works perfectly and the result image is not bluish. Why do they use 0.5, 0.5?

3. torch.nn.functional.interpolate and upsample

Main References:

https://pytorch.org
https://github.com
https://discuss.pytorch.org
https://stackoverflow.com
https://blog.csdn.net