"Perceptron is a a linear classification model of dichotomy, the input is the egienvector of instance and the output is other category of instance(take +1 and -1). The perceptron corresponds to a separate hyperplane in which the instance is divided into two classes in the input space. The perceptron aims to find the hyperplane. In order to find the hyperplane, the loss function based on misclassification is introduced. The gradient descent method is used to optimize the loss function (optimization).Perceptron learning algorithm is simple and easy to implement. It can be divided into primitive form and dual form. Perceptron prediction is a discriminant model, beacause it uses the peceptron model through learning to predict the new instance. Perceptronsis proposed by Rosenblatt in 1957, are the basis of neural networks and support vector machines.\n",
"It imitate neurons in the biological nervous system, which can receive signals from multiple sources and then convert them into signals that are easy to transmit for output (which in the biological body is represented as electrical signals).\n",
"Psychologist Rosenblatt conceive perception machine, as a simplified mathematical model to explain how neurons in the brain work: it took a set of binary input values (nearby neurons), multiply each input value by a continuous value weight (near each neuron synaptic strength), and setting up a threshold, if the weighted input value is more than the threshold, the output is 1, otherwise 0 (similarly in the neurons discharge process). For perceptrons, most of the input values are either data or outputs from other perceptrons.\n",
"Donald Hebb proposed an unexpected and far-reaching idea that knowledge and learning occur in the brain mainly through the formation and change of synapses between neurons, which is briefly described as Hebb's law:\n",
 When cell A's axons">
"> When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.\n",
"Perception machine has not completely follow the idea. **However, by the weight of the input value, we can have a very simple and intuitive learning plan: the training set of a given an input/output instance, perception machine should \"learn\" a function: for each example, if the output value is much lower than the instance, then increase the weight of it, otherwise if the value is much higher than the instance, reduce the weight of it.**\n"
"Assume that the input space(eigenvector) is $X \\subseteq R^n$, then the output space is $Y=\\{-1, +1\\}$. Input $x \\in X$ stands for the eigenvector of instance which is correspond to the point in input sapce; Input $y \\in Y$ stancds for the category of instance. The function from input space to output space is:\n",
"This is called perceptron. Among this, parameter $w$ is called weigth vector, $b$ is called bias. $w·x$ represent the dot product of $w$ and $x$. $sign$ is symbol function, which is:\n",
"Perceptron model is linear classification model, it assumes that the space is all linear classification model defined in egienspace, which is the function set ${f|f(x)=w·x+b}$. Liner function $w·x+b=0$ correspond to a hyperplane $S$ in eigen space $Rn$, $w$ is a normal vector of the hyperplane, and B is a truncation of the hyperplane. This hyperpalne divide eigen space into two parts. The points on both sides are positive and negative. Hyperplane S is called the separation hyperplane, as shown in the figure below:\n",
"Assume that the training data is linear separable, the goal of perceptron learing is to get a hyperpalne that can totally split the positve and negative point in the training data, that is to get the parameter w and b. This need a learning strategy, which is define loss function and minimize the loss function.\n",
"A natural selection of the loss function is the total number of misclassified points. However, the loss function obtained in this way is not a continuous differentiable function of parameters W and B, so it is not suitable for optimization. Another choice for the loss function is the sum of the distances from the misclassification point to the classification plane.\n",
"\n",
"Firstly, for any poitn $x_0$ the distance for it to hyperplane is:\n",
"\n",
"\n",
"首先,对于任意一点xo到超平面的距离为\n",
"$$\n",
"$$\n",
"\\frac{1}{||w||} | w \\cdot xo + b |\n",
"\\frac{1}{||w||} | w \\cdot xo + b |\n",
"$$\n",
"$$\n",
"\n",
"\n",
"其次,对于误分类点$(x_i,y_i)$来说 $-y_i(w \\cdot x_i + b) > 0$\n",
"Next, for the misclassified point $(x_i,y_i)$:\n",
"\n",
"$-y_i(w \\cdot x_i + b) > 0$\n",
"\n",
"In this way, assume that the total misclassified point of hyperplane S is set M:\n",
"\n",
"\n",
"这样,假设超平面S的总的误分类点集合为M,那么所有误分类点到S的距离之和为\n",
"$$\n",
"$$\n",
"-\\frac{1}{||w||} \\sum_{x_i \\in M} y_i (w \\cdot x_i + b)\n",
"-\\frac{1}{||w||} \\sum_{x_i \\in M} y_i (w \\cdot x_i + b)\n",
"$$\n",
"$$\n",
"不考虑1/||w||,就得到了感知机学习的损失函数。\n",
"\n",
"\n",
"### 经验风险函数\n",
"Without the consideration of 1/||w||, we can get the loss function of perceptron learning.\n",
"\n",
"### Empirical risk function\n",
"\n",
"Given a dataset $T = \\{(x_1,y_1), (x_2, y_2), ... (x_N, y_N)\\}$(among them $x_i \\in R^n$, $y_i \\in \\{-1, +1\\},i=1,2...N$), the loss function of perceptron $sign(w·x+b)$ is defined as:\n",
"Among them M is the set of misclassified point, and the loss funciton is the [empirical risk function](https://blog.csdn.net/zhzhx1204/article/details/70163099) of perceptron learning.\n"
"Optimization problem: Given a dataset $T = \\{(x_1,y_1), (x_2, y_2), ... (x_N, y_N)\\}$(among them $x_i \\in R^n$, $y_i \\in \\{-1, +1\\},i=1,2...N$), calcualte parameter w,b to make it the solve of loss function(M is the set of misclassified point): \n",
"\n",
"\n",
"$$\n",
"$$\n",
"min_{w,b} L(w, b) = - \\sum_{x_i \\in M} y_i (w \\cdot x_i + b)\n",
"min_{w,b} L(w, b) = - \\sum_{x_i \\in M} y_i (w \\cdot x_i + b)\n",
"Perceptron learnign is driven by misclassified, it specifically use [random gradient descent method](https://blog.csdn.net/zbc1090549839/article/details/38149561). Firstly, randomly choose $w_0$、$b_0$. After that, use gradient descent method to constantly minimize object function, the minimization process is not a gradient descent of all the misclassification points in M all at once, instead, it randomly choose one misclassified point at a time to make it gradient descent.\n",
"\n",
"Assume that misclassified set M is fixed, then the depeth of loss function $L(w,b)$ is:\n",
"In the formula $\\eta$(0 ≤ $ \\eta $ ≤ 1) is step length(In statistics, it is learning rate). TThe greater the step size is, the faster the gradient descends and the more it approaches the minimum point. If the step length is too large, it may cross the minimum point and lead to divergence of the function. If the step size is too small, it may take a long time to reach the minimum.\n",
"Visually explain: when a instance point is being misclassified, adjust w,b to make hyperplane move to the side of misclassified point, so that the distance between misclassified point and hyperplane will be reduced untill pass thorough the point and correctly classify it.\n",
"Neurons are essentially the same as perceptron, only whenm we talk about perceptron, their activation function is step function; While when we talk about neurons, the activation function usually choose sigmoid function or tanh function. As shown in the figure below:\n",
"The way to calculate the output of a neurons and calculate the output of perceptron is the same. Assume that the input of nurons is vector $\\vec{x}$, and weight vector is $\\vec{w}$(bias term is $w_0$), activation function is sigmoid function, then the output of y is:\n",
"\n",
"$$\n",
"$$\n",
"y = sigmod(\\vec{w}^T \\cdot \\vec{x})\n",
"y = sigmod(\\vec{w}^T \\cdot \\vec{x})\n",
"$$\n",
"$$\n",
"\n",
"\n",
"sigmoid函数的定义如下:\n",
"The definitation of sigmoid function is as following:\n",
"$$\n",
"$$\n",
"sigmod(x) = \\frac{1}{1+e^{-x}}\n",
"sigmod(x) = \\frac{1}{1+e^{-x}}\n",
"$$\n",
"$$\n",
"将其带入前面的式子,得到\n",
"\n",
"Put this into the former formula, we obtain:\n",
"\n",
"$$\n",
"$$\n",
"y = \\frac{1}{1+e^{-\\vec{w}^T \\cdot \\vec{x}}}\n",
"y = \\frac{1}{1+e^{-\\vec{w}^T \\cdot \\vec{x}}}\n",
"$$\n",
"$$\n",
"\n",
"\n",
"sigmoid函数是一个非线性函数,值域是(0,1)。函数图像如下图所示\n",
"Sigmoid is a nolinear function, the domain is (0,1). The function of grapgh is shown as following:\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"sigmoid函数的导数是:\n",
"The derivative of sigmoid function is:\n",
"\\begin{eqnarray}\n",
"\\begin{eqnarray}\n",
"y & = & sigmod(x) \\tag{1} \\\\\n",
"y & = & sigmod(x) \\tag{1} \\\\\n",
"y' & = & y(1-y)\n",
"y' & = & y(1-y)\n",
"\\end{eqnarray}\n",
"\\end{eqnarray}\n",
"\n",
"\n",
"We can see that the derivative of sigmoid function is very interesting, it can use sigmoid function itself to represent. In this way, once the value of sigmoid funtion is being calcualted, it is very convenient to calculate the value of its derivative.\n",
"Neural is actually multiple neurons connected according to certain rules. The upper graph shows a fully connected neural networks. By observing the upper graph, we can find the rule of it including:\n",
"* Neurons are laid out in layers. The leftmost layer, called the input layer, receives input data; The rightmost layer is called the output layer, from which we can get the neural network output data. The layers between the input and output layers are called hidden layers because they are not visible to the outside world.\n",
"* Neurons in the same layer do not have connection with each other.\n",
"* All the neurons in Nth layer is connect to all neurons in N-1 layer(this is the meaning of full connected), the output of N-1 layer neurons is the input of N layer's input.\n",
"All the rules defined the construction of fully connected neural networks. In fact, there exist many other kind of construction neural network, such as CNN, RNN, they all have different connect rules.\n",