1. Notation

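We first write down the feature matrix $X$ of the training samples, where $N$ is the number of samples and $p$ is the number of features. Each row is one sample, and each column is one feature dimension:
$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Np} \end{pmatrix}_{N \times p}$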

Next we write down the label vector $Y$ of the training samples and the weight vector $w$, where the weight vector collects the coefficients of the linear regression:
$Y = \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{pmatrix}_{N \times 1}$

$w = \begin{pmatrix} w_{1} \\ w_{2} \\ \vdots \\ w_{p} \end{pmatrix}_{p \times 1}$
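To simplify the algebra, we fold the bias $b$ in $y_{i} = x_{i}w + b$ into $w$ and $x$, where $x_{i}$ denotes the $i$-th row of $X$. Setting $w_{0} = b$ and prepending a constant column $x_{i0} = 1$ to $X$ gives $y_{i} = x_{i}w$, and the notation becomes:

$X = \begin{pmatrix} x_{10} & x_{11} & x_{12} & \cdots & x_{1p} \\ x_{20} & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{N0} & x_{N1} & x_{N2} & \cdots & x_{Np} \end{pmatrix}_{N \times (p+1)}$

$w = \begin{pmatrix} w_{0} \\ w_{1} \\ w_{2} \\ \vdots \\ w_{p} \end{pmatrix}_{(p+1) \times 1}$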
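
The bias trick can be checked numerically. The following is a minimal NumPy sketch rather than part of the derivation itself; the array values and the names `X_aug` and `w_aug` are made up for illustration.

```python
import numpy as np

# Toy data (illustrative values only): N = 3 samples, p = 2 features.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])   # weights w_1..w_p
b = 2.0                     # bias term

# Prepend a column of ones (x_i0 = 1) and put the bias first in w (w_0 = b).
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])   # shape (N, p + 1)
w_aug = np.concatenate([[b], w])                   # shape (p + 1,)

# The explicit-bias form and the augmented form give the same predictions.
pred_with_bias = X @ w + b
pred_augmented = X_aug @ w_aug
print(np.allclose(pred_with_bias, pred_augmented))
```

Putting the ones column first matches the convention here, where the index 0 is reserved for the bias; appending it last would work equally well as long as `w_aug` is ordered the same way.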

2. Derivation

The least-squares loss is the sum of squared residuals over all samples:
$L(w) = \sum_{i=1}^{N} (x_{i}w - y_{i})^{2}$
We look for the weight vector that minimizes it:
$\hat{w} = \arg\min_{w} L(w) = \arg\min_{w} \sum_{i=1}^{N} (x_{i}w - y_{i})^{2}$
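In matrix form, the loss is:
$L(w) = (Xw - Y)^{T} (Xw - Y)$
Why is it the transpose multiplied by the original vector? Since $Y$ is a column vector, $Xw - Y$ is also a column vector. By the definition of matrix multiplication, only a $1 \times N$ row vector times an $N \times 1$ column vector yields a scalar. Writing $r = Xw - Y$ with entries $r_{i} = x_{i}w - y_{i}$, we get $r^{T} r = \sum_{i=1}^{N} r_{i}^{2}$, which is exactly the sum of squared residuals.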
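
As a sanity check, the element-wise sum and the matrix form of the loss can be compared numerically. This is a minimal NumPy sketch on assumed toy data; the function names `loss_sum` and `loss_matrix` are hypothetical, and `X` is taken to already contain the leading column of ones.

```python
import numpy as np

def loss_sum(X, w, Y):
    """Sum of squared residuals, one sample (row of X) at a time."""
    return sum((X[i] @ w - Y[i]) ** 2 for i in range(X.shape[0]))

def loss_matrix(X, w, Y):
    """The same loss written as (Xw - Y)^T (Xw - Y)."""
    r = X @ w - Y   # residual vector, shape (N,)
    return r @ r    # for 1-D arrays, r @ r is the inner product r^T r

# Illustrative values only: N = 3 samples, bias column plus p = 2 features.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 3.0, 4.0],
              [1.0, 5.0, 6.0]])
w = np.array([2.0, 0.5, -1.0])
Y = np.array([1.0, 0.0, -1.0])

print(loss_sum(X, w, Y), loss_matrix(X, w, Y))
```

The two functions should agree up to floating-point rounding. The matrix version avoids the Python-level loop, which is one practical reason the derivation moves to matrix notation.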