Just some personal notes on PyTorch; I will sort them out once I get the time.


PyTorch consists of four main packages:

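  1. torch: a general-purpose tensor package similar to NumPy, which also supports GPU computation when tensors are cast to torch.cuda.FloatTensor.
  2. torch.autograd: builds the computational graph and computes gradients automatically.
  3. torch.nn: a neural network library with common layers and loss functions.
  4. torch.optim: an optimization library with common algorithms such as SGD and Adam.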
import torch # arrays on GPU
import torch.autograd as autograd #build a computational graph
import torch.nn as nn ## neural net library
import torch.nn.functional as F ## most non-linearities are here
import torch.optim as optim # optimization package
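  1. Torch tensors replace NumPy arrays by adding GPU support: torch.Tensor(xxx).
  2. requires_grad=True: gradients are not computed by default. autograd provides the Variable object, which holds data and grad; reset a gradient with w1.grad.data.zero_().
  3. optim only manages the gradients and the update: optimizer.zero_grad() clears the gradients and optimizer.step() applies the update; the gradients themselves are computed by calling backward() on a Variable.
  4. nn contains functional and Module: Module layers are generally stateful (they hold parameters), while functional ones are stateless.
  5. nn.Sequential, model.parameters()
  6. Define a model by subclassing torch.nn.Module and implementing a forward() function that accepts a Variable() as input and produces a Variable() as output.
  7. Save and load parameters: torch.save(the_model.state_dict(), PATH), model.load_state_dict(torch.load(PATH))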
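A minimal sketch of items 2 to 7 above. It assumes a current PyTorch release in which Variable has been merged into Tensor, so it uses plain tensors throughout. The TinyNet class, its layer sizes, the random data, and the temp-file path are all hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class TinyNet(nn.Module):
    """Hypothetical two-layer model used only for illustration."""

    def __init__(self, d_in=4, d_hidden=8, d_out=2):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)   # stateful Module: holds weight and bias
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        # F.relu is stateless, so it comes from nn.functional
        return self.fc2(F.relu(self.fc1(x)))


torch.manual_seed(0)
model = TinyNet()
optimizer = optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 4)
target = torch.randint(0, 2, (16,))

for _ in range(5):
    optimizer.zero_grad()                      # clear old gradients
    loss = F.cross_entropy(model(x), target)
    loss.backward()                            # autograd fills p.grad
    optimizer.step()                           # optim only applies the update

# Save and restore only the parameters (the state_dict), not the whole object
path = os.path.join(tempfile.mkdtemp(), "tiny_net.pt")
torch.save(model.state_dict(), path)

restored = TinyNet()
restored.load_state_dict(torch.load(path))
```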



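  • All computation in PyTorch is based on Tensors. A Variable is only a wrapper: computing with a Variable really means computing with the Tensor inside it, and a Variable stands for the Tensor it stores (e.g. the weights). Once this is clear, grad can be manipulated freely.
  • retain_variables=False (the default): backward() frees the graph's intermediate buffers after it runs. (This argument was later renamed retain_graph.)
  • Every Variable has two flags, requires_grad and volatile, which work in opposite directions. If any input has a flag set to True, the resulting Variable has it set as well, and the two cannot both be True; both default to False. When you are sure you are in the inference stage, mark the input as volatile to speed up computation. Even when the inputs keep the default False, a Variable along the way that requires grad (such as a weight) makes the result require grad too. (volatile was removed in PyTorch 0.4; use torch.no_grad() instead.)
  • grad_fn, is_leaf.
  • Do not manipulate Variables directly.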
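A short sketch of the flags above, assuming current PyTorch. Any input that requires grad makes the result require grad. grad_fn and is_leaf tell you where a tensor came from. torch.no_grad() plays the role that volatile=True used to. The tensors are random placeholders.

```python
import torch

w = torch.randn(3, requires_grad=True)   # leaf created by the user, tracked
x = torch.randn(3)                       # requires_grad defaults to False

y = (w * x).sum()
print(y.requires_grad)   # True: one input requires grad, so the result does too
print(w.is_leaf, y.is_leaf)
print(y.grad_fn)         # the backward node that produced y

y.backward()
print(torch.equal(w.grad, x))   # d(sum(w * x))/dw = x

# Modern replacement for volatile=True at inference time
with torch.no_grad():
    z = (w * x).sum()
print(z.requires_grad, z.grad_fn)   # nothing is tracked inside no_grad
```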


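  • Broadcasting requires that each tensor has at least one dimension, and that
  • when iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes are equal, one of them is 1, or one of them does not exist.
  • In-place operations cannot broadcast the tensor being modified: it must already have the broadcast shape.
  • A Tensor does not record its computation history, while a Variable does. So use Variables in the forward computation, and take the underlying Tensor out for computations in the backward/update step; the numerical results are the same either way.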
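A sketch of the broadcasting rules and the in-place restriction. The shapes are arbitrary examples.

```python
import torch

x = torch.ones(5, 3, 4, 1)
y = torch.ones(3, 1, 1)
# Compare from the trailing dim: 1 vs 1, 4 vs 1, 3 vs 3, 5 vs (missing)
broadcast_shape = (x + y).shape      # torch.Size([5, 3, 4, 1])

a = torch.ones(1, 3)
b = torch.ones(3, 1)
out_of_place_shape = (a + b).shape   # out-of-place result grows to (3, 3)

inplace_failed = False
try:
    a.add_(b)                        # in-place would need a to grow from (1, 3) to (3, 3)
except RuntimeError as err:
    inplace_failed = True
    print("in-place broadcast failed:", err)
```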

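  • .unsqueeze(0) inserts a dimension of size 1 at the given position; .squeeze() removes all dimensions of size 1.
  • An nn.Conv2d layer is an object (a Module), so its attributes can be accessed directly, e.g. self.conv1.bias.
  • variable.data gives the underlying Tensor; update it in memory with the in-place sub_, using variable.grad.data.
  • For images, packages such as Pillow and OpenCV are useful.
  • For audio, packages such as scipy and librosa are useful.
  • For text, either raw Python or Cython based loading, or NLTK and SpaCy are useful.
  • For vision there is torchvision: torch.utils.data.DataLoader, torchvision.datasets, and import torchvision.transforms as transforms.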
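A sketch of the unsqueeze/squeeze, conv1.bias and in-place sub_ bullets. The layer sizes and the learning rate are arbitrary. The update goes through p.data as these notes describe. In current PyTorch, wrapping the update in torch.no_grad() is the more common idiom.

```python
import torch
import torch.nn as nn

t = torch.zeros(3, 4)
u = t.unsqueeze(0)                        # new size-1 dim at position 0 -> (1, 3, 4)
squeezed = u.unsqueeze(-1).squeeze()      # squeeze() drops every size-1 dim -> (3, 4)

conv1 = nn.Conv2d(3, 6, kernel_size=5)
print(conv1.weight.shape, conv1.bias.shape)   # attributes of the layer object

# Manual SGD step: subtract in place on the underlying data,
# so the update itself is not recorded by autograd.
conv1(torch.randn(1, 3, 8, 8)).sum().backward()   # output is (1, 6, 4, 4)
lr = 0.01
before = conv1.bias.detach().clone()
for p in conv1.parameters():
    p.data.sub_(lr * p.grad.data)
```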
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
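  • Use iter() to create an iterator (e.g. from a DataLoader), then next() to fetch the data.
  • _, predicted = torch.max(xx, 1) gives the predicted class indices.
  • There are two ways to count classification errors: tally the results per class, or build a list as long as the number of samples and compute a confusion matrix from it.
  • CUDA: just call the net's cuda() method, then convert the input tensors to CUDA.
  • .clamp(min, max)
  • mm: matrix multiplication.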
N, D_in, H, D_out = 64, 1000, 100, 10  # example sizes
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
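A sketch that ties together mm, clamp, torch.max, the two ways of counting errors, and iter()/next(). It computes the same kind of forward pass as a Linear, ReLU, Linear Sequential model by hand, without biases. The sizes, weights, and labels are random placeholders. The two-batch list stands in for a DataLoader.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 8, 5, 16, 3   # hypothetical sizes

x = torch.randn(N, D_in)
w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

# mm is 2-D matrix multiplication; clamp(min=0) acts as a ReLU
h_relu = x.mm(w1).clamp(min=0)
scores = h_relu.mm(w2)                      # (N, D_out)

# torch.max over dim 1 returns (values, indices); the indices are the predicted classes
_, predicted = torch.max(scores, 1)
labels = torch.randint(0, D_out, (N,))

# Way 1: per-class tallies
correct_per_class = torch.zeros(D_out)
total_per_class = torch.zeros(D_out)
for c in range(D_out):
    mask = labels == c
    total_per_class[c] = mask.sum()
    correct_per_class[c] = (predicted[mask] == c).sum()

# Way 2: confusion matrix, rows = true class, columns = predicted class
confusion = torch.zeros(D_out, D_out, dtype=torch.long)
for t, p in zip(labels, predicted):
    confusion[t, p] += 1

# iter()/next() pull one item from any iterable, e.g. a DataLoader
batches = iter([(x[:4], labels[:4]), (x[4:], labels[4:])])
first_x, first_y = next(batches)
```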


import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.autograd import Variable
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os


model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

use_gpu = torch.cuda.is_available()
if use_gpu:
    model_conv = model_conv.cuda()

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of the final layer are being optimized, as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)


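# state_dict() returns references to the live tensors, so clone them to keep a snapshot
best_model_wts = {k: v.clone() for k, v in model_conv.state_dict().items()}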


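Note the LR decay here: just call exp_lr_scheduler.step() once per epoch. (The original tutorial called it at the start of each epoch; since PyTorch 1.1 it should be called after optimizer.step(), i.e. at the end of the epoch.)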
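A sketch of how StepLR behaves with the settings used above (lr=0.001, step_size=7, gamma=0.1). It assumes PyTorch 1.1 or later, so scheduler.step() comes after optimizer.step(). The nn.Linear model and the random batches are stand-ins for model_conv.fc and the real data.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

model = nn.Linear(4, 2)   # stand-in for model_conv.fc
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

lrs = []
for epoch in range(15):
    lrs.append(optimizer.param_groups[0]["lr"])   # LR used during this epoch
    # one stand-in "epoch" of training
    optimizer.zero_grad()
    model(torch.randn(8, 4)).sum().backward()
    optimizer.step()
    scheduler.step()                              # once per epoch, after optimizer.step()
```

Epochs 0 to 6 train at 0.001, epochs 7 to 13 at 0.0001, and epoch 14 at 0.00001.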



A custom Dataset should override:
  • __len__ so that len(dataset) returns the size of the dataset.
  • __getitem__ to support indexing, such that dataset[i] can be used to get the i-th sample.

import os
import torch
import pandas as pd
from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

plt.ion()   # interactive mode

class FaceLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir, self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        # .ix and .as_matrix() were removed from pandas; use .iloc and .to_numpy()
        landmarks = self.landmarks_frame.iloc[idx, 1:].to_numpy().astype('float')
        landmarks = landmarks.reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}

        if self.transform:
            sample = self.transform(sample)

        return sample


face_dataset = FaceLandmarksDataset(csv_file='faces/face_landmarks.csv',

def show_landmarks(image, landmarks):
    """Show image with landmarks"""
    plt.imshow(image)
    plt.scatter(landmarks[:, 0], landmarks[:, 1], s=10, marker='.', c='r')
    plt.pause(0.001)  # pause a bit so that plots are updated

fig = plt.figure()

for i in range(len(face_dataset)):
    sample = face_dataset[i]

    print(i, sample['image'].shape, sample['landmarks'].shape)

    ax = plt.subplot(1, 4, i + 1)
    plt.tight_layout()
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')
    show_landmarks(**sample)

    if i == 3:
        plt.show()
        break


A transform class needs two methods:

  • __init__
  • __call__
class Rescale(object):
    """Rescale the image in a sample to a given size.

    Args:
        output_size (tuple or int): Desired output size. If tuple, output is
            matched to output_size. If int, smaller of image edges is matched
            to output_size keeping aspect ratio the same.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size

        new_h, new_w = int(new_h), int(new_w)

        img = transform.resize(image, (new_h, new_w))

        # h and w are swapped for landmarks because for images,
        # x and y axes are axis 1 and 0 respectively
        landmarks = landmarks * [new_w / w, new_h / h]

        return {'image': img, 'landmarks': landmarks}

class RandomCrop(object):
    """Crop randomly the image in a sample.

    Args:
        output_size (tuple or int): Desired output size. If int, square crop
            is made.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        if isinstance(output_size, int):
            self.output_size = (output_size, output_size)
        else:
            assert len(output_size) == 2
            self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        new_h, new_w = self.output_size

        # randint's upper bound is exclusive, so +1 allows a crop flush with the edge
        top = np.random.randint(0, h - new_h + 1)
        left = np.random.randint(0, w - new_w + 1)

        image = image[top: top + new_h,
                      left: left + new_w]

        landmarks = landmarks - [left, top]

        return {'image': image, 'landmarks': landmarks}

class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image),
                'landmarks': torch.from_numpy(landmarks)}

transformed_dataset = FaceLandmarksDataset(csv_file='faces/face_landmarks.csv',

for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]

    print(i, sample['image'].size(), sample['landmarks'].size())

    if i == 3:
        break


  • batching the data
  • shuffling the data
  • loading the data in parallel with multiprocessing workers


dataloader = DataLoader(transformed_dataset, batch_size=4,
                        shuffle=True, num_workers=4)

# Helper function to show a batch
def show_landmarks_batch(sample_batched):
    """Show image with landmarks for a batch of samples."""
    images_batch, landmarks_batch = \
            sample_batched['image'], sample_batched['landmarks']
    batch_size = len(images_batch)
    im_size = images_batch.size(2)

    grid = utils.make_grid(images_batch)
    plt.imshow(grid.numpy().transpose((1, 2, 0)))

    for i in range(batch_size):
        plt.scatter(landmarks_batch[i, :, 0].numpy() + i * im_size,
                    landmarks_batch[i, :, 1].numpy(),
                    s=10, marker='.', c='r')

    plt.title('Batch from dataloader')

for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, sample_batched['image'].size(),
          sample_batched['landmarks'].size())

    # observe 4th batch and stop.
    if i_batch == 3:
        plt.figure()
        show_landmarks_batch(sample_batched)
        plt.axis('off')
        plt.ioff()
        plt.show()
        break



import torch
from torchvision import transforms, datasets

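data_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),  # batched images must share one size
        transforms.ToTensor(),              # Normalize expects a tensor
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])
hymenoptera_dataset = datasets.ImageFolder(root='hymenoptera_data/train',
                                           transform=data_transform)
dataset_loader = torch.utils.data.DataLoader(hymenoptera_dataset,
                                             batch_size=4, shuffle=True)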

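In short, a Dataset provides three things: __len__, __getitem__, and transform. A DataLoader also provides three: batching (batch_size), shuffling (shuffle), and parallel loading (num_workers).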
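A minimal, self-contained sketch of the Dataset/transform/DataLoader split summarized above. SquaresDataset and the Scale transform are hypothetical. num_workers=0 keeps loading in the main process; a positive value would enable parallel loading.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical dataset: sample i is {"x": i, "y": i ** 2}."""

    def __init__(self, n, transform=None):
        self.n = n
        self.transform = transform

    def __len__(self):                 # enables len(dataset)
        return self.n

    def __getitem__(self, idx):        # enables dataset[idx]
        sample = {"x": float(idx), "y": float(idx) ** 2}
        if self.transform:
            sample = self.transform(sample)
        return sample


class Scale:
    """Hypothetical transform: __init__ stores the config, __call__ applies it."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, sample):
        return {k: v * self.factor for k, v in sample.items()}


dataset = SquaresDataset(10, transform=Scale(2.0))
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_sizes = []
for batch in loader:
    # the default collate turns a list of dicts into a dict of batched tensors
    assert isinstance(batch["x"], torch.Tensor)
    batch_sizes.append(batch["x"].shape[0])
```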


  • GRU(input_size, hidden_size, num_layers, batch_first, dropout, bidirectional); the shapes below assume batch_first=False
  • h_0 (num_layers * num_directions, batch, hidden_size)
  • input (seq_len, batch, input_size)
  • output (seq_len, batch, hidden_size * num_directions)
  • h_n (num_layers * num_directions, batch, hidden_size)


>>> rnn = nn.GRU(10, 20, 2)
>>> input = Variable(torch.randn(5, 3, 10))
>>> h0 = Variable(torch.randn(2, 3, 20))
>>> output, hn = rnn(input, h0)
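A quick shape check for the GRU bullets above. It assumes current PyTorch and uses plain tensors instead of Variable. The sizes reuse the example (10, 20, 2), with bidirectional=True added so that num_directions is 2.

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
gru = nn.GRU(input_size, hidden_size, num_layers, bidirectional=True)
num_directions = 2

x = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(num_layers * num_directions, batch, hidden_size)
output, hn = gru(x, h0)
# output: (seq_len, batch, hidden_size * num_directions)
# hn:     (num_layers * num_directions, batch, hidden_size)

# With batch_first=True only input/output swap their first two dims; h0/hn do not
gru_bf = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
out_bf, hn_bf = gru_bf(torch.randn(batch, seq_len, input_size))
```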
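  • In Python, only def, class, and lambda (plus modules and, in Python 3, comprehensions) create a new scope. if/elif/else, try/except, and for/while do not, so variables defined inside those statements can be used directly outside them.
  • bmm: batched matrix multiplication.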
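A sketch of bmm and of the scoping rule above. The tensor sizes are arbitrary.

```python
import torch

a = torch.randn(10, 3, 4)
b = torch.randn(10, 4, 5)
c = torch.bmm(a, b)      # one matrix product per batch element -> (10, 3, 5)

# Same result as multiplying each pair separately
same = all(torch.allclose(c[k], a[k].mm(b[k]), atol=1e-5) for k in range(10))

# Scope: names bound inside for/if blocks stay visible afterwards
for i in range(3):
    last_seen = i
if True:
    flag = "set inside if"
print(i, last_seen, flag)
```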


import re
import unicodedata


# Turn a Unicode string to plain ASCII, thanks to
# http://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )


# Lowercase, trim, and remove non-letter characters
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s
  • reversed
  • sorted
  • startswith
  1. Read text file and split into lines, split lines into pairs
  2. Normalize text, filter by length and content
  3. Make word lists from sentences in pairs
  • torch.LongTensor().view
  • SOS, EOS: note that in seq2seq every sentence gets an EOS token appended, while SOS is only fed as the decoder's first input.
  • decoder_output.data.topk(1): topv, topi = decoder_output.data.topk(1)
  • clamp_(min, max)
  • Variable(torch.randn(content_img.data.size())).type(dtype)
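  • Note that Variables are not Parameters. Create one with nn.Parameter(xx_Variable.data), i.e. pass a Tensor in.
  • Note that only Variables can call backward(); an nn.Module itself only has forward().
  • To backpropagate from an intermediate loss, wrap the loss computation in an nn.Module, put it inside the model, and then call backward() on the loss it computes.
  • pdb.set_trace()
  • %load_ext autoreload
  • %autoreload 2
  • import pdb
  • pdb.run('xxfun()')
  •   p (print): show the value of a variable
      n (next): go to the next line
      s (step): single-step, entering function calls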
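A sketch of the Parameter and intermediate-loss bullets, assuming current PyTorch. LossProbe is a hypothetical pass-through module modeled on the content-loss pattern from the neural style tutorial. It stores a loss in forward(), and backward() is then called on that stored loss tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# nn.Parameter wraps a tensor so that a module registers it as trainable
w = torch.randn(3)
holder = nn.Module()
holder.plain = w                          # plain tensor: just an attribute
holder.param = nn.Parameter(w.clone())    # Parameter: shows up in parameters()
param_names = [name for name, _ in holder.named_parameters()]


class LossProbe(nn.Module):
    """Hypothetical module: records an MSE loss and returns its input unchanged."""

    def __init__(self, target):
        super().__init__()
        self.target = target.detach()     # fixed value, not something to optimize
        self.loss = None

    def forward(self, x):
        self.loss = F.mse_loss(x, self.target)
        return x


torch.manual_seed(0)
layer = nn.Linear(4, 4)
probe = LossProbe(torch.zeros(2, 4))
model = nn.Sequential(layer, probe, nn.ReLU())

out = model(torch.randn(2, 4))
probe.loss.backward()                     # backprop from the intermediate loss only
```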


  • torch/nn/modules/module.py
  • torch/autograd/variable.py
  • torch/optim
  • torch/utils/data 有loader和dataset类
  • torch/nn/modules/linear.py
  • torchvision/ImageFolder
  • RNN+LSTM+GRU: torch/nn/modules/rnn.py

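  • retain_graph: set it to True in .backward() only when you need to backpropagate through the same graph again; for second-order derivatives, also pass create_graph=True.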
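A sketch of when retain_graph and create_graph matter. It uses y = x**3 at x = 2, so dy/dx = 12 and d2y/dx2 = 12. It is written against current PyTorch.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

y.backward(retain_graph=True)   # keep the saved buffers so the graph can be reused
first = x.grad.item()           # 3 * x**2 = 12
y.backward()                    # works because the graph was retained
accumulated = x.grad.item()     # gradients accumulate: 12 + 12 = 24

second_backward_failed = False
z = x ** 3
z.backward()                    # buffers are freed after this call
try:
    z.backward()
except RuntimeError:
    second_backward_failed = True

# Second derivative: create_graph=True builds a graph for the gradient itself
w = torch.tensor(2.0, requires_grad=True)
(dw,) = torch.autograd.grad(w ** 3, w, create_graph=True)   # 3 * w**2
(d2w,) = torch.autograd.grad(dw, w)                         # 6 * w
```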

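  • HalfTensor is a half-precision (16-bit) float tensor, i.e. half the width of a 32-bit FloatTensor.
  • CharTensor is signed (8-bit integers).
  • torch.Tensor defaults to FloatTensor.
  • A Tensor does not hold the data itself: it wraps a torch.Storage and only provides a view for reading and manipulating that data.
  • t(): transpose of a 2-D tensor.
  • Tensor and Storage are implemented in TH (the C library), so there is no Python source for them; they inherit from C classes.
  • A Storage is one-dimensional data. Every torch.Tensor has a corresponding storage of the same data type, e.g. torch.FloatStorage.
  • torch.numel, eye, from_numpy, linspace, logspace, ones, ones_like; note that sizes are passed as *size (e.g. torch.ones(2, 3)).
  • cat(sequence, dim=0), chunk(tensor, chunks, dim=0), torch.masked_select(x, mask), nonzero, squeeze(input, dim), stack; read the dimensions from right to left (the last dimension is the innermost).
  • rand, randn, randperm (sizes again passed as *size).
  • In-place random sampling.
  • get_num_threads, set_num_threads.
  • topk returns (values, indices).
  • sort: ascending by default, along the last dim by default.
  • min: element-wise minimum of two tensors, with built-in broadcasting. It can also be used like topk: min(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor); without keepdim=True the reduced dimension is squeezed out.
  • torch.lt and eq are comparison (logical) operations.
  • kthvalue.
  • Example output, a tuple of (values, indices):
    values:  4 5 6  [torch.FloatTensor of size 1x3]
    indices: 1 1 1  [torch.LongTensor of size 1x3]
    • Note that topk and kthvalue return a tuple of two tensors (values, indices); kthvalue gives the k-th smallest element, topk the k largest.
    • eq differs from equal: equal returns a single True only when both tensors have the same size and elements, while eq compares element-wise.
    • var computes the variance; without dim it uses all elements (sum behaves the same way). Likewise std, prod, and norm (the p-norm, p=); to compute the norm along a dim, pass p first, e.g. norm(p, dim). mode, median, mean; for mode and median the default dim is -1.
    • torch.dist(x1, x2, p): the p-norm of (x1 - x2), broadcasting automatically.
    • cumsum, cumprod: require dim.
    • trunc (keeps the integer part), tanh, tan, sqrt, sinh, sign, sin, sigmoid, rsqrt (reciprocal of the square root), round, reciprocal, remainder(divisor), fmod, neg, log1p, log, frac, floor, exp, erfinv, erf, ceil.
    • pow.
    • Broadcasting allows only three cases per dimension: the sizes are equal, one of them is 1, or one of them does not exist. In old PyTorch versions, if the shapes were not broadcastable but had the same numel, the tensors were flattened to 1-D for the computation (this fallback was deprecated and later removed).
    • mul: element-wise product; lerp: linear interpolation, lerp(start, end, weight, out=None); clamp(min=, max=).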
torch.Tensor.bernoulli_() - in-place version of torch.bernoulli()
torch.Tensor.cauchy_() - numbers drawn from the Cauchy distribution
torch.Tensor.exponential_() - numbers drawn from the exponential distribution
torch.Tensor.geometric_() - elements drawn from the geometric distribution
torch.Tensor.log_normal_() - samples from the log-normal distribution
torch.Tensor.normal_() - in-place version of torch.normal()
torch.Tensor.random_() - numbers sampled from the discrete uniform distribution
torch.Tensor.uniform_() - numbers sampled from the uniform distribution
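A sketch that exercises several of the tensor functions listed above on small hand-picked tensors. It assumes current PyTorch, where eq returns bool tensors rather than the old uint8. It shows a view sharing memory instead of calling storage() directly, because the storage API has changed across releases.

```python
import torch

v = torch.tensor([1.0, 5.0, 3.0, 4.0])

top_vals, top_idx = v.topk(2)          # k largest: values [5, 4], indices [1, 3]
kth_val, kth_idx = v.kthvalue(2)       # 2nd smallest: value 3 at index 2
sorted_vals, sorted_idx = v.sort()     # ascending by default, along the last dim

m = torch.tensor([[3.0, 1.0], [2.0, 4.0]])
min_vals, min_idx = m.min(dim=1)                 # reduced dim squeezed: shape (2,)
min_keep, _ = m.min(dim=1, keepdim=True)         # keepdim keeps it: shape (2, 1)
pairwise_min = torch.min(m, torch.tensor(2.5))   # element-wise, with broadcasting

elementwise = torch.eq(v, torch.tensor([1.0, 0.0, 3.0, 0.0]))   # tensor of bools
identical = torch.equal(v, v.clone())                            # one Python bool

dist = torch.dist(torch.tensor([0.0, 0.0]), torch.tensor([3.0, 4.0]), p=2)

# A view shares memory with the original tensor
base = torch.arange(6)
view = base.view(2, 3)
view[0, 0] = 100
shares_memory = base[0].item() == 100

# In-place random sampling fills an existing tensor
u = torch.empty(1000).uniform_(0, 1)
```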


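If importing cv2 fails like this:

>>> import cv2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named cv2

then you need to add the directory containing cv2.so to PYTHONPATH, for example:

$ export PYTHONPATH=/usr/local/lib/python2.7/site-packages/:$PYTHONPATH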

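  • ps aux --sort=-pcpu | head -n 10 sorts processes by CPU usage, highest first, and shows the first 10 lines.
  • Images read by OpenCV and skimage are both (h, w, c) ndarrays, but OpenCV uses BGR channel order while skimage uses RGB.
  • It is best to process images with skimage or OpenCV, so the format is uniformly ndarray.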
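A numpy-only sketch of the layout differences above, so it runs without OpenCV installed. A hypothetical all-blue BGR image is flipped to RGB (cv2.cvtColor(img, cv2.COLOR_BGR2RGB) does the same in OpenCV) and then moved to PyTorch's (c, h, w) order. The np.ascontiguousarray call matters because torch.from_numpy rejects arrays with negative strides, which [..., ::-1] produces.

```python
import numpy as np

# Hypothetical 2x3 image in OpenCV's BGR order: pure blue everywhere
bgr = np.zeros((2, 3, 3), dtype=np.uint8)   # (h, w, c)
bgr[..., 0] = 255                            # channel 0 is blue in BGR

rgb = bgr[..., ::-1]                                   # reverse channels: BGR -> RGB
chw = np.ascontiguousarray(rgb.transpose(2, 0, 1))     # (h, w, c) -> (c, h, w)
```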

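==Script automation with alias (csh/tcsh)==
alias format: alias name text
\!:1 refers to the 1st argument
\!:2* refers to the 2nd and all remaining arguments (\!* refers to all arguments)
e.g.: alias test_val '@ z = (\!*); echo $z'
test_val 1 + 1

alias mykill 'ps -u xddai | grep \!:1 | awk '"'"'{print $1}'"'"' | xargs kill'
e.g. mykill java kills all java processes owned by user xddai; it expands to:
ps -u xddai | grep java | awk '{print $1}' | xargs kill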

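  • contiguous: some tensors (for example the result of transpose() or t()) share memory with another tensor but are not laid out in row-major order; only their strides changed. view() requires a contiguous memory layout, so call .contiguous() first, which copies the data into one contiguous block when necessary. (reshape() does this automatically.)
  • super() calls methods of the parent class, e.g. its __init__.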
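A sketch of the contiguous/view interaction, plus a minimal super() call. The Base and Child classes are hypothetical.

```python
import torch

t = torch.arange(6).view(2, 3).t()   # transpose only swaps strides, no copy
print(t.is_contiguous())             # False

view_failed = False
try:
    t.view(6)
except RuntimeError:
    view_failed = True               # view() needs a compatible memory layout

flat = t.contiguous().view(6)        # copy into row-major order, then view
flat_reshape = t.reshape(6)          # reshape copies when needed


class Base:
    def __init__(self):
        self.tag = "base"


class Child(Base):
    def __init__(self):
        super().__init__()           # run the parent's __init__ first
        self.extra = "child"


child = Child()
```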



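import numpy as np
import torch
import torch.nn as nn

# th_batch_map_offsets and th_generate_grid come from the deformable-convolution
# code this class was copied from; they are not reproduced in these notes.


class ConvOffset2D(nn.Conv2d):
    """ConvOffset2D

    Convolutional layer responsible for learning the 2D offsets and outputting the
    deformed feature map using bilinear interpolation.

    Note that this layer does not perform convolution on the deformed feature
    map. See get_deform_cnn in cnn.py for usage.
    """

    def __init__(self, filters, init_normal_stddev=0.01, **kwargs):
        """
        Parameters
        ----------
        filters : int
            Number of channels of the input feature map
        init_normal_stddev : float
            Standard deviation of the normal kernel initialization
        **kwargs
            Passed to the superclass. See the Conv2d layer in PyTorch
        """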
        self.filters = filters
        self._grid_param = None
        super(ConvOffset2D, self).__init__(self.filters, self.filters*2, 3, padding=1, bias=False, **kwargs)
        self.weight.data.copy_(self._init_weights(self.weight, init_normal_stddev))

    def forward(self, x):
        """Return the deformed featured map"""
        x_shape = x.size()
        offsets = super(ConvOffset2D, self).forward(x)

        # offsets: (b*c, h, w, 2)
        offsets = self._to_bc_h_w_2(offsets, x_shape)

        # x: (b*c, h, w)
        x = self._to_bc_h_w(x, x_shape)

        # x_offset: (b*c, h, w)
        x_offset = th_batch_map_offsets(x, offsets, grid=self._get_grid(x))

        # x_offset: (b, c, h, w)
        x_offset = self._to_b_c_h_w(x_offset, x_shape)

        return x_offset

    def _get_grid(self, x):
        batch_size, input_size= x.size(0), x.size(1)
        dtype, cuda = x.data.type(), x.data.is_cuda
        if self._grid_param == (batch_size, input_size, dtype, cuda):
            return self._grid
        self._grid_param = (batch_size, input_size, dtype, cuda)
        self._grid = th_generate_grid(batch_size, input_size, dtype, cuda)
        return self._grid

    @staticmethod
    def _init_weights(weights, std):
        fan_out = weights.size(0)
        fan_in = weights.size(1) * weights.size(2) * weights.size(3)
        w = np.random.normal(0.0, std, (fan_out, fan_in))
        return torch.from_numpy(w.reshape(weights.size()))

    @staticmethod
    def _to_bc_h_w_2(x, x_shape):
        """(b, 2c, h, w) -> (b*c, h, w, 2)"""
        x = x.contiguous().view(-1, int(x_shape[2]), int(x_shape[3]), 2)
        return x

    @staticmethod
    def _to_bc_h_w(x, x_shape):
        """(b, c, h, w) -> (b*c, h, w)"""
        x = x.contiguous().view(-1, int(x_shape[2]), int(x_shape[3]))
        return x

    @staticmethod
    def _to_b_c_h_w(x, x_shape):
        """(b*c, h, w) -> (b, c, h, w)"""
        x = x.contiguous().view(-1, int(x_shape[1]), int(x_shape[2]), int(x_shape[3]))
        return x
  • np.tile: repeat an array
  • np.expand_dims: insert a new axis
  • np.stack: join a list of arrays along a new axis
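A sketch of the three numpy helpers on small arrays.

```python
import numpy as np

a = np.array([1, 2, 3])
tiled = np.tile(a, (2, 2))                 # repeat 2x along rows, 2x along columns -> (2, 6)
expanded = np.expand_dims(a, axis=0)       # insert a new leading axis -> (1, 3)
stacked = np.stack([a, a * 10, a * 100])   # join a list along a new axis 0 -> (3, 3)
```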