Tags: Activation Function (1), Backpropagation (3), Batch Normalization (1), Bengio (1), Bias (1), Bias-Variance Tradeoff (1), Bigram (1), Broadcasting (2), Chain Rule (1), Cross Entropy (1), Curse of Dimensionality (1), Derivative (1), Efficient Inference Engine (1), Exploding Gradient (1), Gradient (1), Gradient Descent (2), Gradient Saturation (1), Kaiming He Initialization (1), L1 (1), L2 (1), Language Model (1), Linear Transformation (1), Linearity (1), Log Likelihood (1), Loss Function (2), Matrix (1), Maximum Likelihood Estimation (1), Mini-Batch Training (1), MLP (1), Model Compression (1), Model Smoothing (2), N-gram (1), Negative Log Likelihood (1), Negative Log-Likelihood (1), Neural Network (2), NLL (1), One Hot Encoding (1), OOV (1), Optimization (3), Out of Vocabulary (1), Overfitting (1), Pruning (1), PyTorch (2), Quantization (1), Regularization (2), Sparsity (1), Special Tokens (1), Tanh (1), Tokenizer (1), Train Validation Test Split (1), Underfitting (1), Vanishing Gradient (1), Variance (1), Vector (1), Weight Initialization (2), Word Embedding (1), Xavier Initialization (1)