Week 14

$$\gdef \sam #1 {\mathrm{softargmax}(#1)}$$ $$\gdef \vect #1 {\boldsymbol{#1}} $$ $$\gdef \matr #1 {\boldsymbol{#1}} $$ $$\gdef \E {\mathbb{E}} $$ $$\gdef \V {\mathbb{V}} $$ $$\gdef \R {\mathbb{R}} $$ $$\gdef \N {\mathbb{N}} $$ $$\gdef \relu #1 {\texttt{ReLU}(#1)} $$ $$\gdef \D {\,\mathrm{d}} $$ $$\gdef \deriv #1 #2 {\frac{\D #1}{\D #2}}$$ $$\gdef \pd #1 #2 {\frac{\partial #1}{\partial #2}}$$ $$\gdef \set #1 {\left\lbrace #1 \right\rbrace} $$

Lecture part A

In this section, we discuss structured prediction. We first introduce the energy-based factor graph and efficient inference for it. Then we give some examples of simple energy-based factor graphs with “shallow” factors. Finally, we discuss the Graph Transformer Net.

Lecture part B

The second half of the lecture continues to apply graphical model methods to energy-based models. After comparing several loss functions, we discuss how the Viterbi algorithm and the forward algorithm apply to graph transformer networks. We then turn to the Lagrangian formulation of backpropagation and, finally, to variational inference for energy-based models.
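As a rough illustration of the Viterbi algorithm mentioned above, the following minimal sketch finds the lowest-energy path through a small trellis by dynamic programming (min-sum). The function name `viterbi_min_energy` and the `unary` / `pairwise` energy tables are hypothetical, chosen only for this example; they are not taken from the lecture.

```python
def viterbi_min_energy(unary, pairwise):
    """Return (energy, path) of the minimum-energy state sequence.

    unary[t][s]    : energy of being in state s at step t (illustrative input)
    pairwise[p][s] : energy of moving from state p to state s (illustrative input)
    """
    n_states = len(unary[0])
    cost = list(unary[0])          # best energy of any path ending in each state
    back = []                      # back-pointers, one list per step after the first
    for t in range(1, len(unary)):
        new_cost, pointers = [], []
        for s in range(n_states):
            # Best predecessor for state s at step t.
            best_prev = min(range(n_states), key=lambda p: cost[p] + pairwise[p][s])
            new_cost.append(cost[best_prev] + pairwise[best_prev][s] + unary[t][s])
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)
    # Trace the best path backwards from the lowest-energy final state.
    last = min(range(n_states), key=lambda s: cost[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return cost[last], path


# Toy trellis: 3 steps, 2 states, switching states costs 1.
unary = [[0, 2], [3, 0], [0, 2]]
pairwise = [[0, 1], [1, 0]]
energy, path = viterbi_min_energy(unary, pairwise)
```

Replacing the `min` over predecessors with a soft minimum (a negative log-sum-exp) would give a sketch of the forward algorithm instead, which accumulates energy over all paths rather than keeping only the best one.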

Practicum

When training highly parametrized models such as deep neural networks, there is a risk of overfitting to the training data, which leads to a larger generalization error. To reduce overfitting, we can introduce regularization into training: it discourages certain solutions and so limits the extent to which our models fit noise.
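To make the idea concrete, here is a minimal sketch of one common form of regularization, an L2 penalty on the weights, applied to a one-parameter linear model trained by gradient descent. The choice of L2, the function name `fit_l2`, and the hyperparameters `lam`, `lr`, and `steps` are assumptions made for illustration only.

```python
def fit_l2(xs, ys, lam, lr=0.05, steps=500):
    """Fit y ~ w * x by gradient descent on mean squared error + lam * w**2.

    lam is the (illustrative) regularization strength; lam = 0 disables the penalty.
    """
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Gradient of the L2 penalty lam * w**2, which pulls w towards zero.
        grad += 2 * lam * w
        w -= lr * grad
    return w


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w_plain = fit_l2(xs, ys, lam=0.0)   # unregularized fit
w_reg = fit_l2(xs, ys, lam=1.0)     # penalized fit, smaller in magnitude
```

The penalized weight ends up smaller in magnitude than the unregularized one: the penalty trades a slightly worse fit on the training data for a solution that is less free to chase noise.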