Deep Learning: Binary Classification with Logistic Regression
How Images Are Stored
To store a color image, a computer keeps three separate matrices corresponding to the Red, Green, and Blue (RGB) color channels.
Finally, the image is usually represented by a single feature vector x that contains the three unrolled R, G, B matrices. If each of the R, G, B matrices is 64×64, then x has dimension \(n_x = 64 \times 64 \times 3 = 12288\).
\[ \text{A training example is a pair } (x,y), \quad x \in \mathbb R^{n_x} \ (\mathbb R \text{ denotes the real numbers}), \quad y \in \{0,1\} \\ m \text{ training examples: } \{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\cdots,(x^{(m)},y^{(m)})\} \\ X = \begin{bmatrix} x^{(1)}_1 & x^{(2)}_1 & \cdots & x^{(m)}_1 \\ x^{(1)}_2 & x^{(2)}_2 & \cdots & x^{(m)}_2 \\ \vdots & \vdots & & \vdots \\ x^{(1)}_{n_x} & x^{(2)}_{n_x} & \cdots & x^{(m)}_{n_x} \end{bmatrix} = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(m)} \end{bmatrix} \in \mathbb R^{n_x \times m} \quad (n_x \text{ features, } m \text{ training examples}) \\ Y = [y^{(1)} \ y^{(2)} \ \cdots \ y^{(m)}] \in \mathbb R^{1 \times m} \]
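As a quick illustration, a minimal sketch of unrolling images into feature vectors with NumPy (the random arrays simply stand in for real images):

```python
import numpy as np

# A random 64x64 RGB "image" standing in for a real photo (values in [0, 1])
image = np.random.rand(64, 64, 3)

# Unroll the R, G, B matrices into a single feature vector x
x = image.reshape(-1, 1)      # shape (12288, 1), i.e. (64*64*3, 1)
print(x.shape)                # (12288, 1)

# Stacking m such vectors column by column gives X with shape (n_x, m)
images = np.random.rand(10, 64, 64, 3)   # m = 10 made-up images
X = images.reshape(10, -1).T             # one flattened image per column
print(X.shape)                            # (12288, 10)
```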
Logistic Regression
\[ \text{Given } x, \text{ find } \hat y = P(y=1 \mid x), \quad x \in \mathbb R^{n_x} \ (x \text{ is an } n_x\text{-dimensional vector}) \\ \text{Parameters: } w \in \mathbb R^{n_x}, \ b \in \mathbb R \\ \text{Output of linear regression: } \hat y = w^T x + b \\ \text{Logistic regression: } \hat y = \sigma(w^Tx+b) \\ \sigma(z) = \frac{1}{1+e^{-z}} \ \text{(an S-shaped curve that maps any real } z \text{ into } (0,1)) \]
\[
\text{Alternatively, set } x_0 = 1, \text{ so that } x \in \mathbb R^{n_x +1} \\
\hat y = \sigma(\theta^Tx) \\
\theta = \begin{bmatrix}\theta_0 \\ \theta_1 \\ \vdots \\ \theta_{n_x} \end{bmatrix} \\
\theta_0 \text{ plays the role of } b, \text{ and } \theta_1,\dots,\theta_{n_x} \text{ correspond to } w
\]
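A minimal sketch of this forward computation in NumPy (the sigmoid is written out explicitly; w, b, and x are made-up values for illustration):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up parameters and a single input example
w = np.array([[0.5], [-1.2], [0.3]])   # shape (n_x, 1)
b = 0.1
x = np.array([[1.0], [2.0], [0.5]])    # one example, shape (n_x, 1)

y_hat = sigmoid(np.dot(w.T, x) + b)    # predicted P(y = 1 | x)
print(y_hat)                           # a value in (0, 1)
```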
Cost Function
\[ \text{Loss (error) function: } L(\hat y,y) = \frac{1}{2}(\hat y-y)^2 \quad \text{(squared error)} \\ \text{For logistic regression we use instead: } L(\hat y,y) = -\big(y\log\hat y+ (1-y)\log(1-\hat y)\big) \\ \text{As with the squared error, the goal is to make } L(\hat y,y) \text{ as small as possible.} \\ \text{Cost function: } J(w,b) = \frac{1}{m}\sum_{i=1}^m L(\hat y^{(i)},y^{(i)}) = -\frac{1}{m} \sum_{i=1}^m\big[ y^{(i)}\log\hat y^{(i)} + (1-y^{(i)})\log(1-\hat y^{(i)})\big] \]
(The squared error is not used here because it makes J(w, b) non-convex for logistic regression, so gradient descent could get stuck in local optima.)
Loss function: measures the error on a single training example.
Cost function: measures how well the parameters w, b perform over the entire training set.
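A minimal sketch of evaluating the cost, assuming A holds the predictions \(\hat y^{(i)}\) for all m examples and Y the corresponding labels:

```python
import numpy as np

def cost(A, Y):
    """Cross-entropy cost J(w, b) averaged over m examples.

    A: predictions, shape (1, m); Y: labels in {0, 1}, shape (1, m).
    """
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# Tiny made-up example
A = np.array([[0.9, 0.2, 0.7]])
Y = np.array([[1, 0, 1]])
print(cost(A, Y))   # small, since the predictions match the labels well
```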
Gradient Descent
Once the cost function is defined, we need to find parameters w, b that minimize J(w, b).
Consider w first, temporarily ignoring b:
\(\text{Repeat } \{ \\ \quad w := w - \alpha \frac{dJ(w)}{dw} \\ \}\)
(Here ":=" denotes the assignment that updates w, and \(\alpha\) is the learning rate; when writing code, the derivative \(\frac{dJ(w)}{dw}\) is conventionally stored in a variable named dw.)
For J(w, b), the derivatives become partial derivatives:
\(w := w-\alpha\frac{\partial J(w,b)}{\partial w} \\ b := b-\alpha\frac{\partial J(w,b)}{\partial b}\)
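To make the update rule concrete, here is a minimal sketch of gradient descent on a simple one-parameter function \(J(w) = (w-3)^2\) (a made-up toy function, not the logistic cost):

```python
# Gradient descent on J(w) = (w - 3)^2, whose minimum is at w = 3
w = 0.0
alpha = 0.1                   # learning rate

for _ in range(100):
    dw = 2 * (w - 3)          # dJ/dw
    w = w - alpha * dw        # w := w - alpha * dw

print(w)                      # close to 3.0
```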

Over m examples (written out here with just two features \(x_1, x_2\); an implementation sketch follows below): \[ \begin{array}{l} J=0;\ dw_1=0;\ dw_2=0;\ db=0 \\ \text{For } i=1 \text{ to } m:\\ \quad z^{(i)} = w^Tx^{(i)}+b \\ \quad a^{(i)} = \sigma(z^{(i)}) \\ \quad J += -[y^{(i)}\log a^{(i)} + (1-y^{(i)})\log(1-a^{(i)})] \\ \quad dz^{(i)} = a^{(i)} - y^{(i)} \\ \quad dw_1 += x_1^{(i)}dz^{(i)} \\ \quad dw_2 += x_2^{(i)}dz^{(i)} \\ \quad db += dz^{(i)} \\ J /= m;\ dw_1 /= m;\ dw_2 /= m;\ db /= m \\ dw_1 = \frac{\partial J}{\partial w_1} \\ w_1 := w_1 - \alpha\, dw_1 \\ w_2 := w_2 - \alpha\, dw_2 \\ b := b - \alpha\, db \end{array} \]
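A direct Python transcription of this loop, as a sketch: it assumes two features, X of shape (2, m) with one example per column, and Y of shape (1, m):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_step_with_loops(w, b, X, Y, alpha):
    """One gradient descent step using explicit for loops (two features)."""
    m = X.shape[1]
    J = 0.0
    dw1, dw2, db = 0.0, 0.0, 0.0
    for i in range(m):
        z = w[0] * X[0, i] + w[1] * X[1, i] + b       # z = w^T x + b
        a = sigmoid(z)
        J += -(Y[0, i] * np.log(a) + (1 - Y[0, i]) * np.log(1 - a))
        dz = a - Y[0, i]
        dw1 += X[0, i] * dz
        dw2 += X[1, i] * dz
        db += dz
    J, dw1, dw2, db = J / m, dw1 / m, dw2 / m, db / m
    w = np.array([w[0] - alpha * dw1, w[1] - alpha * dw2])   # w := w - alpha * dw
    b = b - alpha * db
    return w, b, J
```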
Using Matrices (Vectorization)
Large numbers of explicit for loops make the code very slow; replacing them with matrix operations (vectorization) fixes this.
For \(z=w^Tx+b\):
Non-vectorized approach (in Python):
```python
# Non-vectorized: accumulate w^T x term by term
z = 0
for i in range(n):
    z += w[i] * x[i]
z += b
```
Vectorized approach:
```python
import numpy as np

z = np.dot(w, x) + b
```
Verifying the speedup in code
```python
import time
import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)

tic = time.time()
c = np.dot(a, b)                      # vectorized dot product
print("np.dot:   %f ms" % (1000 * (time.time() - tic)))

tic = time.time()
c = 0
for i in range(1000000):              # explicit Python loop
    c += a[i] * b[i]
print("for loop: %f ms" % (1000 * (time.time() - tic)))
```
Vectorized Logistic Regression
\[ X = \begin{bmatrix} \vdots & \vdots & & \vdots \\ x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ \vdots & \vdots & & \vdots \end{bmatrix} \quad (\text{shape } n_x \times m) \\ w^T = [w_1 \ w_2 \ \cdots \ w_{n_x}] \\ Z = [z^{(1)} \ z^{(2)} \ \cdots \ z^{(m)}] = w^TX + [b \ b \ \cdots \ b] \\ A = [a^{(1)} \ a^{(2)} \ \cdots \ a^{(m)}] = \sigma(Z) \]
```python
Z = np.dot(w.T, X) + b    # the scalar b is broadcast across all m columns
A = 1 / (1 + np.exp(-Z))  # sigmoid applied elementwise
```
Vectorized Gradient Computation for Logistic Regression
\(dz^{(1)} = a^{(1)} - y^{(1)}, \quad dz^{(2)} = a^{(2)} - y^{(2)}, \ \cdots \\ dZ = [dz^{(1)} \ dz^{(2)} \ \cdots \ dz^{(m)}] \\ A = [a^{(1)} \cdots a^{(m)}], \quad Y = [y^{(1)} \cdots y^{(m)}]\\ dZ = A-Y = [a^{(1)}-y^{(1)} \ \cdots \ a^{(m)}-y^{(m)}]\)
\(dw = 0 \\ dw += x^{(1)}dz^{(1)} \\ dw += x^{(2)}dz^{(2)} \\ \cdots \\ dw /= m \\ dw = \frac{1}{m}XdZ^T = \frac{1}{m}\begin{bmatrix} \cdots & \cdots & \cdots \\ x^{(1)} & \cdots & x^{(m)} \\ \cdots & \cdots & \cdots \end{bmatrix} \begin{bmatrix}dz^{(1)} \\ \vdots \\ dz^{(m)}\end{bmatrix}\)
\(db = 0 \\ db += dz^{(1)} \\ db += dz^{(2)} \\ \cdots \\ db /= m \\ db = \frac{1}{m}\sum_{i=1}^{m}dz^{(i)} = \frac{1}{m}\,\texttt{np.sum}(dZ)\)
```python
db = np.sum(dZ) / m        # vectorized computation of db
dw = np.dot(X, dZ.T) / m   # vectorized computation of dw
```
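Putting the pieces together, a minimal sketch of a fully vectorized training loop (the learning rate, iteration count, and the tiny dataset are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, alpha=0.01, num_iterations=1000):
    """Vectorized logistic regression training.

    X: shape (n_x, m), one example per column; Y: shape (1, m), labels in {0, 1}.
    """
    n_x, m = X.shape
    w = np.zeros((n_x, 1))
    b = 0.0
    for _ in range(num_iterations):
        # Forward pass over all m examples at once
        Z = np.dot(w.T, X) + b
        A = sigmoid(Z)
        # Vectorized gradients
        dZ = A - Y
        dw = np.dot(X, dZ.T) / m
        db = np.sum(dZ) / m
        # Parameter update
        w = w - alpha * dw
        b = b - alpha * db
    return w, b

# Tiny made-up dataset: 2 features, 4 examples
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0, 0.0]])
Y = np.array([[0, 0, 1, 1]])
w, b = train(X, Y, alpha=0.1, num_iterations=2000)
print(sigmoid(np.dot(w.T, X) + b))   # predictions roughly track the labels in Y
```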
A Note on Vectors in Python
```python
import numpy as np

a = np.random.randn(5)   # a rank-1 array: a.shape == (5,)
```
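Rank-1 arrays behave unintuitively (for example, a and a.T have the same shape), so it is safer to work with explicit column or row vectors; a short sketch of the difference:

```python
import numpy as np

a = np.random.randn(5)           # rank-1 array, shape (5,)
print(a.shape, a.T.shape)        # (5,) (5,) -- the transpose changes nothing
print(np.dot(a, a.T))            # a scalar (inner product), not a matrix

b = np.random.randn(5, 1)        # explicit column vector, shape (5, 1)
print(b.shape, b.T.shape)        # (5, 1) (1, 5)
print(np.dot(b, b.T).shape)      # (5, 5) outer product, as expected

assert b.shape == (5, 1)         # cheap sanity check on shapes
a = a.reshape(5, 1)              # convert a rank-1 array into a column vector
```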