Repository: blmoistawinde/ml_equations_latex (https://github.com/blmoistawinde/ml_equations_latex), language: HTML 99.9%

# Classical ML Equations in LaTeX

A collection of classical ML equations in LaTeX. Some of them are provided with simple notes and paper links. Hoping to help with writing such as papers and blogs. Better viewed at https://blmoistawinde.github.io/ml_equations_latex/

## Model

### RNNs (LSTM, GRU)

Encoder hidden state $h_t$ at time step $t$, with input token embedding $x_t$:

$$h_t = RNN_{enc}(x_t, h_{t-1})$$

Decoder hidden state $s_t$ at time step $t$, with input token embedding $y_t$:

$$s_t = RNN_{dec}(y_t, s_{t-1})$$
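The recurrence above can be sketched with a vanilla RNN cell in NumPy. This is an illustrative toy, not code from the repo: `rnn_step`, the weight names, and the dimensions are all assumptions.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One vanilla RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                        # toy sizes, chosen for illustration
W_xh = rng.normal(size=(d_h, d_in))
W_hh = rng.normal(size=(d_h, d_h))
b_h = np.zeros(d_h)

h = np.zeros(d_h)                       # h_0
for x_t in rng.normal(size=(5, d_in)):  # 5 input token embeddings
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```

An LSTM or GRU replaces this single `tanh` update with gated updates, but the interface (current input and previous state in, new state out) is the same.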
### Attentional Seq2seq

The attention weight $\alpha_{ij}$, the $i$th decoder step over the $j$th encoder step, resulting in context vector $c_i$:

$$e_{ij} = a(s_{i-1}, h_j)$$

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}$$

$$c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j$$
$a$ is a specific attention function, which can be:

#### Bahdanau Attention

Paper: Neural Machine Translation by Jointly Learning to Align and Translate

$$e_{ij} = v_a^T \tanh(W_a s_{i-1} + U_a h_j)$$
#### Luong (Dot-Product) Attention

Paper: Effective Approaches to Attention-based Neural Machine Translation

If $s_{i-1}$ and $h_j$ have the same number of dimensions:

$$e_{ij} = s_{i-1}^T h_j$$

otherwise:

$$e_{ij} = s_{i-1}^T W_a h_j$$
Finally, the output $o_t$ is produced by:

$$s_t = \tanh(W[s_{t-1}; y_t; c_t])$$

$$o_t = \mathrm{softmax}(V s_t)$$
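The Bahdanau scoring, softmax normalization, and context-vector steps above can be sketched in NumPy for a single decoder step. All names and dimensions here are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def bahdanau_attention(s_prev, H, W_a, U_a, v_a):
    # scores e_j = v_a^T tanh(W_a s_{i-1} + U_a h_j), one per encoder state h_j
    e = np.tanh(s_prev @ W_a.T + H @ U_a.T) @ v_a
    alpha = softmax(e)        # attention weights over the encoder steps
    c = alpha @ H             # context vector c_i = sum_j alpha_ij h_j
    return alpha, c

rng = np.random.default_rng(0)
d_h, d_s, d_a, T = 4, 3, 5, 6          # toy sizes
H = rng.normal(size=(T, d_h))          # encoder hidden states h_1..h_T
s_prev = rng.normal(size=d_s)          # previous decoder state s_{i-1}
W_a = rng.normal(size=(d_a, d_s))
U_a = rng.normal(size=(d_a, d_h))
v_a = rng.normal(size=d_a)
alpha, c = bahdanau_attention(s_prev, H, W_a, U_a, v_a)
```

The weights `alpha` sum to 1 over the encoder steps, so `c` is a convex combination of the encoder states.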
### Transformer

Paper: Attention Is All You Need

#### Scaled Dot-Product Attention

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
where $d_k$ is the dimension of the key vector $k$ and query vector $q$.

#### Multi-head Attention

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)W^O$$

where

$$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$$
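Scaled dot-product attention maps directly to a few matrix operations. A minimal NumPy sketch (the `softmax` helper and the toy shapes are assumptions for illustration):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))   # 2 queries, d_k = 8 (toy sizes)
K = rng.normal(size=(5, 8))   # 5 keys, same dimension as the queries
V = rng.normal(size=(5, 4))   # 5 matching values, d_v = 4
out = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention runs this routine $h$ times on learned projections of $Q$, $K$, $V$ and concatenates the results before the final $W^O$ projection.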
### Generative Adversarial Networks (GAN)

Paper: Generative Adversarial Networks

Minmax game objective:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
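In practice the two expectations are estimated over mini-batches of discriminator outputs. A small NumPy sketch of the empirical objective value (the function name and inputs are assumptions, not repo code):

```python
import numpy as np

def gan_objective(d_real, d_fake):
    # Empirical V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))],
    # with d_real = D(x) on real samples and d_fake = D(G(z)) on fakes
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

v = gan_objective(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

At the theoretical optimum $D(x) = 1/2$ everywhere, the value is $2\log\frac{1}{2} = -\log 4$.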
### Variational Auto-Encoder (VAE)

Paper: Auto-Encoding Variational Bayes

#### Reparameterization trick

To produce a latent variable $z$ such that $z \sim \mathcal{N}(\mu, \sigma^2)$, we sample $\epsilon \sim \mathcal{N}(0, 1)$, then $z$ is produced by

$$z = \mu + \epsilon \cdot \sigma$$
Above is for the 1-D case. For a multi-dimensional (vector) case we use:

$$\epsilon \sim \mathcal{N}(0, I)$$

$$z = \mu + \epsilon \odot \sigma$$
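The point of the trick is that the randomness is moved into $\epsilon$, so $\mu$ and $\sigma$ stay differentiable. A minimal NumPy sketch of the sampling step (the helper name is an assumption):

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    # z = mu + epsilon * sigma, epsilon ~ N(0, I); sampling is pushed into
    # epsilon so gradients can flow through mu and sigma
    eps = rng.standard_normal(np.shape(mu))
    return mu + eps * sigma

rng = np.random.default_rng(0)
mu = np.array([0.0, 2.0])
sigma = np.array([1.0, 0.5])
samples = np.array([reparameterize(mu, sigma, rng) for _ in range(20000)])
```

Drawing many samples and checking their empirical mean and standard deviation against `mu` and `sigma` confirms that `z` has the intended distribution.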
## Activations

### Sigmoid

Related to Logistic Regression. For single-label/multi-label binary classification.

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
### Tanh

$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
### Softmax

For multi-class single-label classification.

$$\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
### ReLU

$$\mathrm{ReLU}(x) = \max(0, x)$$
### GELU

$$\mathrm{GELU}(x) = x \cdot \Phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution.
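The activations above are one-liners in NumPy. A sketch for quick reference; note that GELU is implemented here with its widely used tanh approximation of $x \cdot \Phi(x)$ rather than the exact Gaussian CDF:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

def relu(x):
    return np.maximum(0.0, x)

def gelu(x):
    # tanh approximation of x * Phi(x)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))
```

In practice `np.tanh` is preferred over the explicit exponential form of `tanh`, which overflows for large `|x|`.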
## Loss

### Regression

Below, $x$ and $y$ are $D$-dimensional vectors, and $x_i$ denotes the value on the $i$th dimension of $x$.

#### Mean Absolute Error (MAE)

$$\mathrm{MAE}(x, y) = \sum_{i=1}^{D} |x_i - y_i|$$
#### Mean Squared Error (MSE)

$$\mathrm{MSE}(x, y) = \sum_{i=1}^{D} (x_i - y_i)^2$$
#### Huber loss

Less sensitive to outliers than the MSE, as it treats the error as squared only inside an interval.

$$L_\delta(a) = \begin{cases} \frac{1}{2}a^2 & \text{for } |a| \le \delta \\ \delta\left(|a| - \frac{1}{2}\delta\right) & \text{otherwise} \end{cases}$$

where $a = x_i - y_i$ is the per-dimension error.
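The three regression losses can be sketched in NumPy (summing over dimensions, matching the equations above; the function names are illustrative):

```python
import numpy as np

def mae(x, y):
    return np.abs(x - y).sum()

def mse(x, y):
    return ((x - y) ** 2).sum()

def huber(x, y, delta=1.0):
    r = np.abs(x - y)
    return np.where(r <= delta,
                    0.5 * r ** 2,                   # quadratic inside the interval
                    delta * r - 0.5 * delta ** 2    # linear outside it
                    ).sum()

x = np.array([0.0, 0.0])
y = np.array([1.0, 2.0])
```

With these inputs and `delta=1.0`, the second dimension's error of 2 falls in the linear branch, which is exactly where Huber is gentler than MSE.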
### Classification

#### Cross Entropy

$$\mathrm{CE}(x, y) = -\sum_{c=1}^{M} y_c \log(x_c)$$

where $y$ is the ground-truth distribution (often one-hot) and $x$ the predicted probabilities over $M$ classes.
#### Negative Loglikelihood

$$\mathrm{NLL}(x, c) = -\log(x_c)$$

Minimizing negative loglikelihood is equivalent to Maximum Likelihood Estimation (MLE). Here $x_c$ is a scalar instead of a vector: it is the predicted probability on the single dimension $c$ where the ground truth lies. It is thus equivalent to cross entropy (see wiki).
#### Hinge loss

Used in Support Vector Machines (SVM).

$$\mathrm{Hinge}(y, \hat{y}) = \max(0, 1 - y \cdot \hat{y})$$
#### KL/JS divergence

$$D_{KL}(p \,\|\, q) = \sum_i p_i \log \frac{p_i}{q_i}$$

$$D_{JS}(p \,\|\, q) = \frac{1}{2} D_{KL}\left(p \,\Big\|\, \frac{p+q}{2}\right) + \frac{1}{2} D_{KL}\left(q \,\Big\|\, \frac{p+q}{2}\right)$$
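Cross entropy and the two divergences translate directly into NumPy. A sketch, assuming dense probability vectors with no zero entries in the denominators:

```python
import numpy as np

def cross_entropy(y_true, x_pred):
    # CE = -sum_c y_c log(x_c); y_true is often a one-hot ground truth
    return -np.sum(y_true * np.log(x_pred))

def kl_divergence(p, q):
    return np.sum(p * np.log(p / q))

def js_divergence(p, q):
    # symmetrized, smoothed KL against the mixture m = (p + q) / 2
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike KL, the JS divergence is symmetric in its arguments, which is one reason it appears in the GAN analysis.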
## Regularization

The $Error$ below can be any of the above losses.

### L1 regularization

A regression model that uses the L1 regularization technique is called Lasso Regression.

$$Loss = Error(y, \hat{y}) + \lambda \sum_{i=1}^{N} |w_i|$$
### L2 regularization

A regression model that uses the L2 regularization technique is called Ridge Regression.

$$Loss = Error(y, \hat{y}) + \lambda \sum_{i=1}^{N} w_i^2$$
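Both penalties are a one-line addition on top of any base loss. A NumPy sketch (the function names and the `kind` switch are illustrative assumptions):

```python
import numpy as np

def l1_penalty(w, lam):
    # Lasso term: lambda * sum_i |w_i|
    return lam * np.abs(w).sum()

def l2_penalty(w, lam):
    # Ridge term: lambda * sum_i w_i^2
    return lam * (w ** 2).sum()

def regularized_loss(base_loss, w, lam, kind="l2"):
    penalty = l1_penalty(w, lam) if kind == "l1" else l2_penalty(w, lam)
    return base_loss + penalty
```

L1 tends to drive weights exactly to zero (sparse solutions), while L2 shrinks them smoothly toward zero.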
## Metrics

Some of them overlap with losses, like MAE and KL-divergence.

### Classification

#### Accuracy, Precision, Recall, F1

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
#### Sensitivity, Specificity and AUC

$$\mathrm{Sensitivity} = \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
AUC is calculated as the Area Under the $\mathrm{Sensitivity}$ (TPR)-$(1 - \mathrm{Specificity})$ (FPR) Curve.

### Regression

MAE, MSE: see the equations above.
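The classification metrics above all derive from the four confusion-matrix counts. A minimal sketch (the function name is illustrative; it assumes no count combination divides by zero):

```python
def classification_metrics(tp, fp, tn, fn):
    # All metrics follow from the confusion-matrix counts
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also sensitivity / TPR
    specificity = tn / (tn + fp)       # 1 - FPR
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}
```

AUC is the one metric here that cannot be computed from a single confusion matrix: it requires sweeping the decision threshold and integrating TPR over FPR.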