diff --git a/2022 ML/04 Sequence as input/Machine Learning HW4.pdf b/2022 ML/04 Sequence as input/Machine Learning HW4.pdf
new file mode 100644
index 0000000..acc22a1
Binary files /dev/null and b/2022 ML/04 Sequence as input/Machine Learning HW4.pdf differ
diff --git a/2022 ML/04 Sequence as input/hw04.ipynb b/2022 ML/04 Sequence as input/hw04.ipynb
new file mode 100644
index 0000000..d43a1f5
--- /dev/null
+++ b/2022 ML/04 Sequence as input/hw04.ipynb
@@ -0,0 +1,769 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "C_jdZ5vHJ4A9"
+ },
+ "source": [
+ "# Task description\n",
+ "- Classify the speakers of given features.\n",
+ "- Main goal: Learn how to use transformer.\n",
+ "- Baselines:\n",
+ " - Easy: Run sample code and know how to use transformer.\n",
+ " - Medium: Know how to adjust parameters of transformer.\n",
+ " - Strong: Construct [conformer](https://arxiv.org/abs/2005.08100) which is a variety of transformer. \n",
+ " - Boss: Implement [Self-Attention Pooling](https://arxiv.org/pdf/2008.01077v1.pdf) & [Additive Margin Softmax](https://arxiv.org/pdf/1801.05599.pdf) to further boost the performance.\n",
+ "\n",
+ "- Other links\n",
+ " - Kaggle: [link](https://www.kaggle.com/t/ac77388c90204a4c8daebeddd40ff916)\n",
+ " - Slide: [link](https://docs.google.com/presentation/d/1HLAj7UUIjZOycDe7DaVLSwJfXVd3bXPOyzSb6Zk3hYU/edit?usp=sharing)\n",
+ " - Data: [link](https://github.com/MachineLearningHW/ML_HW4_Dataset)\n",
+ "\n",
+ "# Download dataset\n",
+ "- Data is [here](https://github.com/MachineLearningHW/ML_HW4_Dataset)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "LhLNWB-AK2Z5"
+ },
+ "outputs": [],
+ "source": [
+ "\"\"\"\n",
+ "If the links below become inaccessible, please connect TAs.\n",
+ "\"\"\"\n",
+ "\n",
+ "!wget https://github.com/MachineLearningHW/ML_HW4_Dataset/raw/0.0.1/Dataset.tar.gz.partaa\n",
+ "!wget https://github.com/MachineLearningHW/ML_HW4_Dataset/raw/0.0.1/Dataset.tar.gz.partab\n",
+ "!wget https://github.com/MachineLearningHW/ML_HW4_Dataset/raw/0.0.1/Dataset.tar.gz.partac\n",
+ "!wget https://github.com/MachineLearningHW/ML_HW4_Dataset/raw/0.0.1/Dataset.tar.gz.partad\n",
+ "\n",
+ "!cat Dataset.tar.gz.parta* > Dataset.tar.gz\n",
+ "\n",
+ "!tar zxvf Dataset.tar.gz"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ENWVAUDVJtVY"
+ },
+ "source": [
+ "## Fix Random Seed"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "E6burzCXIyuA"
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import torch\n",
+ "import random\n",
+ "\n",
+ "def set_seed(seed):\n",
+ " np.random.seed(seed)\n",
+ " random.seed(seed)\n",
+ " torch.manual_seed(seed)\n",
+ " if torch.cuda.is_available():\n",
+ " torch.cuda.manual_seed(seed)\n",
+ " torch.cuda.manual_seed_all(seed)\n",
+ " torch.backends.cudnn.benchmark = False\n",
+ " torch.backends.cudnn.deterministic = True\n",
+ "\n",
+ "set_seed(87)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "k7dVbxW2LASN"
+ },
+ "source": [
+ "# Data\n",
+ "\n",
+ "## Dataset\n",
+ "- Original dataset is [Voxceleb2](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html).\n",
+ "- The [license](https://creativecommons.org/licenses/by/4.0/) and [complete version](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/files/license.txt) of Voxceleb2.\n",
+ "- We randomly select 600 speakers from Voxceleb2.\n",
+ "- Then preprocess the raw waveforms into mel-spectrograms.\n",
+ "\n",
+ "- Args:\n",
+ " - data_dir: The path to the data directory.\n",
+ " - metadata_path: The path to the metadata.\n",
+ " - segment_len: The length of audio segment for training. \n",
+ "- The architecture of data directory \\\\\n",
+ " - data directory \\\\\n",
+ " |---- metadata.json \\\\\n",
+ " |---- testdata.json \\\\\n",
+ " |---- mapping.json \\\\\n",
+ " |---- uttr-{random string}.pt \\\\\n",
+ "\n",
+ "- The information in metadata\n",
+ " - \"n_mels\": The dimention of mel-spectrogram.\n",
+ " - \"speakers\": A dictionary. \n",
+ " - Key: speaker ids.\n",
+ " - value: \"feature_path\" and \"mel_len\"\n",
+ "\n",
+ "\n",
+ "For efficiency, we segment the mel-spectrograms into segments in the traing step."
+ ]
+ },
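+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Before defining the dataset class, it can help to peek at the metadata files. The cell below is a minimal sanity check (not part of the original sample code), assuming the archive has been extracted to `./Dataset`; it only reads the keys documented above (`speaker2id`, `n_mels`, `speakers`)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "from pathlib import Path\n",
+ "\n",
+ "data_dir = Path(\"./Dataset\")\n",
+ "\n",
+ "# mapping.json: speaker name <-> integer id.\n",
+ "mapping = json.load((data_dir / \"mapping.json\").open())\n",
+ "print(\"number of speakers:\", len(mapping[\"speaker2id\"]))\n",
+ "\n",
+ "# metadata.json: per-speaker lists of utterances (feature_path, mel_len).\n",
+ "metadata = json.load((data_dir / \"metadata.json\").open())\n",
+ "print(\"n_mels:\", metadata[\"n_mels\"])\n",
+ "first_speaker = next(iter(metadata[\"speakers\"]))\n",
+ "print(first_speaker, metadata[\"speakers\"][first_speaker][:1])"
+ ]
+ },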
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "KpuGxl4CI2pr"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import json\n",
+ "import torch\n",
+ "import random\n",
+ "from pathlib import Path\n",
+ "from torch.utils.data import Dataset\n",
+ "from torch.nn.utils.rnn import pad_sequence\n",
+ " \n",
+ " \n",
+ "class myDataset(Dataset):\n",
+ "\tdef __init__(self, data_dir, segment_len=128):\n",
+ "\t\tself.data_dir = data_dir\n",
+ "\t\tself.segment_len = segment_len\n",
+ "\t\n",
+ "\t\t# Load the mapping from speaker neme to their corresponding id. \n",
+ "\t\tmapping_path = Path(data_dir) / \"mapping.json\"\n",
+ "\t\tmapping = json.load(mapping_path.open())\n",
+ "\t\tself.speaker2id = mapping[\"speaker2id\"]\n",
+ "\t\n",
+ "\t\t# Load metadata of training data.\n",
+ "\t\tmetadata_path = Path(data_dir) / \"metadata.json\"\n",
+ "\t\tmetadata = json.load(open(metadata_path))[\"speakers\"]\n",
+ "\t\n",
+ "\t\t# Get the total number of speaker.\n",
+ "\t\tself.speaker_num = len(metadata.keys())\n",
+ "\t\tself.data = []\n",
+ "\t\tfor speaker in metadata.keys():\n",
+ "\t\t\tfor utterances in metadata[speaker]:\n",
+ "\t\t\t\tself.data.append([utterances[\"feature_path\"], self.speaker2id[speaker]])\n",
+ " \n",
+ "\tdef __len__(self):\n",
+ "\t\t\treturn len(self.data)\n",
+ " \n",
+ "\tdef __getitem__(self, index):\n",
+ "\t\tfeat_path, speaker = self.data[index]\n",
+ "\t\t# Load preprocessed mel-spectrogram.\n",
+ "\t\tmel = torch.load(os.path.join(self.data_dir, feat_path))\n",
+ "\n",
+ "\t\t# Segmemt mel-spectrogram into \"segment_len\" frames.\n",
+ "\t\tif len(mel) > self.segment_len:\n",
+ "\t\t\t# Randomly get the starting point of the segment.\n",
+ "\t\t\tstart = random.randint(0, len(mel) - self.segment_len)\n",
+ "\t\t\t# Get a segment with \"segment_len\" frames.\n",
+ "\t\t\tmel = torch.FloatTensor(mel[start:start+self.segment_len])\n",
+ "\t\telse:\n",
+ "\t\t\tmel = torch.FloatTensor(mel)\n",
+ "\t\t# Turn the speaker id into long for computing loss later.\n",
+ "\t\tspeaker = torch.FloatTensor([speaker]).long()\n",
+ "\t\treturn mel, speaker\n",
+ " \n",
+ "\tdef get_speaker_number(self):\n",
+ "\t\treturn self.speaker_num"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "668hverTMlGN"
+ },
+ "source": [
+ "## Dataloader\n",
+ "- Split dataset into training dataset(90%) and validation dataset(10%).\n",
+ "- Create dataloader to iterate the data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "B7c2gZYoJDRS"
+ },
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "from torch.utils.data import DataLoader, random_split\n",
+ "from torch.nn.utils.rnn import pad_sequence\n",
+ "\n",
+ "\n",
+ "def collate_batch(batch):\n",
+ "\t# Process features within a batch.\n",
+ "\t\"\"\"Collate a batch of data.\"\"\"\n",
+ "\tmel, speaker = zip(*batch)\n",
+ "\t# Because we train the model batch by batch, we need to pad the features in the same batch to make their lengths the same.\n",
+ "\tmel = pad_sequence(mel, batch_first=True, padding_value=-20) # pad log 10^(-20) which is very small value.\n",
+ "\t# mel: (batch size, length, 40)\n",
+ "\treturn mel, torch.FloatTensor(speaker).long()\n",
+ "\n",
+ "\n",
+ "def get_dataloader(data_dir, batch_size, n_workers):\n",
+ "\t\"\"\"Generate dataloader\"\"\"\n",
+ "\tdataset = myDataset(data_dir)\n",
+ "\tspeaker_num = dataset.get_speaker_number()\n",
+ "\t# Split dataset into training dataset and validation dataset\n",
+ "\ttrainlen = int(0.9 * len(dataset))\n",
+ "\tlengths = [trainlen, len(dataset) - trainlen]\n",
+ "\ttrainset, validset = random_split(dataset, lengths)\n",
+ "\n",
+ "\ttrain_loader = DataLoader(\n",
+ "\t\ttrainset,\n",
+ "\t\tbatch_size=batch_size,\n",
+ "\t\tshuffle=True,\n",
+ "\t\tdrop_last=True,\n",
+ "\t\tnum_workers=n_workers,\n",
+ "\t\tpin_memory=True,\n",
+ "\t\tcollate_fn=collate_batch,\n",
+ "\t)\n",
+ "\tvalid_loader = DataLoader(\n",
+ "\t\tvalidset,\n",
+ "\t\tbatch_size=batch_size,\n",
+ "\t\tnum_workers=n_workers,\n",
+ "\t\tdrop_last=True,\n",
+ "\t\tpin_memory=True,\n",
+ "\t\tcollate_fn=collate_batch,\n",
+ "\t)\n",
+ "\n",
+ "\treturn train_loader, valid_loader, speaker_num"
+ ]
+ },
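+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A toy illustration (not part of the pipeline) of what `collate_batch` does with variable-length inputs: `pad_sequence` right-pads every mel-spectrogram in the batch to the length of the longest one with the fill value -20."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "from torch.nn.utils.rnn import pad_sequence\n",
+ "\n",
+ "# Two fake mel-spectrograms with different lengths: (length, n_mels=40).\n",
+ "mel_a = torch.zeros(3, 40)\n",
+ "mel_b = torch.zeros(5, 40)\n",
+ "padded = pad_sequence([mel_a, mel_b], batch_first=True, padding_value=-20)\n",
+ "print(padded.shape) # torch.Size([2, 5, 40])\n",
+ "print(padded[0, 3:, 0]) # tensor([-20., -20.]) -> padding frames"
+ ]
+ },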
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5FOSZYxrMqhc"
+ },
+ "source": [
+ "# Model\n",
+ "- TransformerEncoderLayer:\n",
+ " - Base transformer encoder layer in [Attention Is All You Need](https://arxiv.org/abs/1706.03762)\n",
+ " - Parameters:\n",
+ " - d_model: the number of expected features of the input (required).\n",
+ "\n",
+ " - nhead: the number of heads of the multiheadattention models (required).\n",
+ "\n",
+ " - dim_feedforward: the dimension of the feedforward network model (default=2048).\n",
+ "\n",
+ " - dropout: the dropout value (default=0.1).\n",
+ "\n",
+ " - activation: the activation function of intermediate layer, relu or gelu (default=relu).\n",
+ "\n",
+ "- TransformerEncoder:\n",
+ " - TransformerEncoder is a stack of N transformer encoder layers\n",
+ " - Parameters:\n",
+ " - encoder_layer: an instance of the TransformerEncoderLayer() class (required).\n",
+ "\n",
+ " - num_layers: the number of sub-encoder-layers in the encoder (required).\n",
+ "\n",
+ " - norm: the layer normalization component (optional)."
+ ]
+ },
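+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal shape check of the two modules described above (illustrative only; the hyperparameters match the sample model below): by default, `nn.TransformerEncoderLayer` expects input of shape (length, batch size, d_model), which is why the classifier's `forward` below permutes the axes."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "\n",
+ "layer = nn.TransformerEncoderLayer(d_model=80, nhead=2, dim_feedforward=256)\n",
+ "encoder = nn.TransformerEncoder(layer, num_layers=2) # a stack of N identical layers\n",
+ "\n",
+ "x = torch.randn(128, 32, 80) # (length, batch size, d_model)\n",
+ "print(encoder(x).shape) # torch.Size([128, 32, 80])"
+ ]
+ },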
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "iXZ5B0EKJGs8"
+ },
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "import torch.nn.functional as F\n",
+ "\n",
+ "\n",
+ "class Classifier(nn.Module):\n",
+ "\tdef __init__(self, d_model=80, n_spks=600, dropout=0.1):\n",
+ "\t\tsuper().__init__()\n",
+ "\t\t# Project the dimension of features from that of input into d_model.\n",
+ "\t\tself.prenet = nn.Linear(40, d_model)\n",
+ "\t\t# TODO:\n",
+ "\t\t# Change Transformer to Conformer.\n",
+ "\t\t# https://arxiv.org/abs/2005.08100\n",
+ "\t\tself.encoder_layer = nn.TransformerEncoderLayer(\n",
+ "\t\t\td_model=d_model, dim_feedforward=256, nhead=2\n",
+ "\t\t)\n",
+ "\t\t# self.encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=2)\n",
+ "\n",
+ "\t\t# Project the the dimension of features from d_model into speaker nums.\n",
+ "\t\tself.pred_layer = nn.Sequential(\n",
+ "\t\t\tnn.Linear(d_model, d_model),\n",
+ "\t\t\tnn.ReLU(),\n",
+ "\t\t\tnn.Linear(d_model, n_spks),\n",
+ "\t\t)\n",
+ "\n",
+ "\tdef forward(self, mels):\n",
+ "\t\t\"\"\"\n",
+ "\t\targs:\n",
+ "\t\t\tmels: (batch size, length, 40)\n",
+ "\t\treturn:\n",
+ "\t\t\tout: (batch size, n_spks)\n",
+ "\t\t\"\"\"\n",
+ "\t\t# out: (batch size, length, d_model)\n",
+ "\t\tout = self.prenet(mels)\n",
+ "\t\t# out: (length, batch size, d_model)\n",
+ "\t\tout = out.permute(1, 0, 2)\n",
+ "\t\t# The encoder layer expect features in the shape of (length, batch size, d_model).\n",
+ "\t\tout = self.encoder_layer(out)\n",
+ "\t\t# out: (batch size, length, d_model)\n",
+ "\t\tout = out.transpose(0, 1)\n",
+ "\t\t# mean pooling\n",
+ "\t\tstats = out.mean(dim=1)\n",
+ "\n",
+ "\t\t# out: (batch, n_spks)\n",
+ "\t\tout = self.pred_layer(stats)\n",
+ "\t\treturn out"
+ ]
+ },
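+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For the Strong and Boss baselines, below is an illustrative sketch rather than a reference solution: a Conformer-style convolution module (its `kernel_size=31` default follows the Conformer paper) that could be combined with the self-attention layer above, and a Self-Attention Pooling layer that could replace the mean pooling in `forward`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "import torch.nn.functional as F\n",
+ "\n",
+ "\n",
+ "class ConformerConvModule(nn.Module):\n",
+ "\t\"\"\"Sketch of the convolution module from the Conformer paper.\"\"\"\n",
+ "\tdef __init__(self, d_model, kernel_size=31, dropout=0.1):\n",
+ "\t\tsuper().__init__()\n",
+ "\t\tself.norm = nn.LayerNorm(d_model)\n",
+ "\t\tself.pointwise1 = nn.Conv1d(d_model, 2 * d_model, 1)\n",
+ "\t\tself.depthwise = nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2, groups=d_model)\n",
+ "\t\tself.batchnorm = nn.BatchNorm1d(d_model)\n",
+ "\t\tself.pointwise2 = nn.Conv1d(d_model, d_model, 1)\n",
+ "\t\tself.dropout = nn.Dropout(dropout)\n",
+ "\n",
+ "\tdef forward(self, x):\n",
+ "\t\t# x: (batch size, length, d_model)\n",
+ "\t\tout = self.norm(x).transpose(1, 2) # -> (batch size, d_model, length)\n",
+ "\t\tout = F.glu(self.pointwise1(out), dim=1) # gated linear unit halves the channels\n",
+ "\t\tout = F.silu(self.batchnorm(self.depthwise(out))) # swish activation\n",
+ "\t\tout = self.dropout(self.pointwise2(out)).transpose(1, 2)\n",
+ "\t\treturn x + out # residual connection\n",
+ "\n",
+ "\n",
+ "class SelfAttentionPooling(nn.Module):\n",
+ "\t\"\"\"Sketch of attention-weighted pooling over the time axis.\"\"\"\n",
+ "\tdef __init__(self, d_model):\n",
+ "\t\tsuper().__init__()\n",
+ "\t\tself.attention = nn.Linear(d_model, 1)\n",
+ "\n",
+ "\tdef forward(self, x):\n",
+ "\t\t# x: (batch size, length, d_model)\n",
+ "\t\tweights = torch.softmax(self.attention(x).squeeze(-1), dim=-1) # (batch size, length)\n",
+ "\t\treturn (weights.unsqueeze(-1) * x).sum(dim=1) # (batch size, d_model)"
+ ]
+ },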
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "W7yX8JinM5Ly"
+ },
+ "source": [
+ "# Learning rate schedule\n",
+ "- For transformer architecture, the design of learning rate schedule is different from that of CNN.\n",
+ "- Previous works show that the warmup of learning rate is useful for training models with transformer architectures.\n",
+ "- The warmup schedule\n",
+ " - Set learning rate to 0 in the beginning.\n",
+ " - The learning rate increases linearly from 0 to initial learning rate during warmup period."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ykt0N1nVJJi2"
+ },
+ "outputs": [],
+ "source": [
+ "import math\n",
+ "\n",
+ "import torch\n",
+ "from torch.optim import Optimizer\n",
+ "from torch.optim.lr_scheduler import LambdaLR\n",
+ "\n",
+ "\n",
+ "def get_cosine_schedule_with_warmup(\n",
+ "\toptimizer: Optimizer,\n",
+ "\tnum_warmup_steps: int,\n",
+ "\tnum_training_steps: int,\n",
+ "\tnum_cycles: float = 0.5,\n",
+ "\tlast_epoch: int = -1,\n",
+ "):\n",
+ "\t\"\"\"\n",
+ "\tCreate a schedule with a learning rate that decreases following the values of the cosine function between the\n",
+ "\tinitial lr set in the optimizer to 0, after a warmup period during which it increases linearly between 0 and the\n",
+ "\tinitial lr set in the optimizer.\n",
+ "\n",
+ "\tArgs:\n",
+ "\t\toptimizer (:class:`~torch.optim.Optimizer`):\n",
+ "\t\tThe optimizer for which to schedule the learning rate.\n",
+ "\t\tnum_warmup_steps (:obj:`int`):\n",
+ "\t\tThe number of steps for the warmup phase.\n",
+ "\t\tnum_training_steps (:obj:`int`):\n",
+ "\t\tThe total number of training steps.\n",
+ "\t\tnum_cycles (:obj:`float`, `optional`, defaults to 0.5):\n",
+ "\t\tThe number of waves in the cosine schedule (the defaults is to just decrease from the max value to 0\n",
+ "\t\tfollowing a half-cosine).\n",
+ "\t\tlast_epoch (:obj:`int`, `optional`, defaults to -1):\n",
+ "\t\tThe index of the last epoch when resuming training.\n",
+ "\n",
+ "\tReturn:\n",
+ "\t\t:obj:`torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.\n",
+ "\t\"\"\"\n",
+ "\tdef lr_lambda(current_step):\n",
+ "\t\t# Warmup\n",
+ "\t\tif current_step < num_warmup_steps:\n",
+ "\t\t\treturn float(current_step) / float(max(1, num_warmup_steps))\n",
+ "\t\t# decadence\n",
+ "\t\tprogress = float(current_step - num_warmup_steps) / float(\n",
+ "\t\t\tmax(1, num_training_steps - num_warmup_steps)\n",
+ "\t\t)\n",
+ "\t\treturn max(\n",
+ "\t\t\t0.0, 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress))\n",
+ "\t\t)\n",
+ "\n",
+ "\treturn LambdaLR(optimizer, lr_lambda, last_epoch)"
+ ]
+ },
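+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A tiny sanity check of the schedule (a throwaway optimizer and illustrative step counts, not the training configuration): the learning rate should climb linearly to the initial value during warmup and then follow a half-cosine down to 0."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "from torch.optim import AdamW\n",
+ "\n",
+ "dummy = [torch.nn.Parameter(torch.zeros(1))]\n",
+ "optimizer = AdamW(dummy, lr=1e-3)\n",
+ "scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=10, num_training_steps=100)\n",
+ "\n",
+ "for step in range(100):\n",
+ "\toptimizer.step()\n",
+ "\tscheduler.step()\n",
+ "\tif step + 1 in (5, 10, 55, 100):\n",
+ "\t\tprint(f\"step {step + 1}: lr = {scheduler.get_last_lr()[0]:.6f}\")"
+ ]
+ },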
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-LN2XkteM_uH"
+ },
+ "source": [
+ "# Model Function\n",
+ "- Model forward function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "N-rr8529JMz0"
+ },
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "\n",
+ "\n",
+ "def model_fn(batch, model, criterion, device):\n",
+ "\t\"\"\"Forward a batch through the model.\"\"\"\n",
+ "\n",
+ "\tmels, labels = batch\n",
+ "\tmels = mels.to(device)\n",
+ "\tlabels = labels.to(device)\n",
+ "\n",
+ "\touts = model(mels)\n",
+ "\n",
+ "\tloss = criterion(outs, labels)\n",
+ "\n",
+ "\t# Get the speaker id with highest probability.\n",
+ "\tpreds = outs.argmax(1)\n",
+ "\t# Compute accuracy.\n",
+ "\taccuracy = torch.mean((preds == labels).float())\n",
+ "\n",
+ "\treturn loss, accuracy"
+ ]
+ },
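+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For the Boss baseline, Additive Margin Softmax replaces the plain cross-entropy above: logits become cosine similarities between L2-normalized embeddings and class weights, the target-class logit is penalized by a margin m, and everything is scaled by s before the softmax. The sketch below assumes the model returns embeddings instead of `pred_layer` logits; `s=30.0` and `m=0.4` are common values from the paper, not tuned for this task."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "import torch.nn.functional as F\n",
+ "\n",
+ "\n",
+ "class AMSoftmaxLoss(nn.Module):\n",
+ "\t\"\"\"Sketch of Additive Margin Softmax over speaker embeddings.\"\"\"\n",
+ "\tdef __init__(self, d_model, n_spks, s=30.0, m=0.4):\n",
+ "\t\tsuper().__init__()\n",
+ "\t\tself.weight = nn.Parameter(torch.randn(n_spks, d_model))\n",
+ "\t\tself.s = s\n",
+ "\t\tself.m = m\n",
+ "\n",
+ "\tdef forward(self, embeddings, labels):\n",
+ "\t\t# Cosine similarity between normalized embeddings and class weights.\n",
+ "\t\tcosine = F.linear(F.normalize(embeddings), F.normalize(self.weight)) # (batch, n_spks)\n",
+ "\t\t# Subtract the margin m from the target class only.\n",
+ "\t\tmargin = torch.zeros_like(cosine)\n",
+ "\t\tmargin.scatter_(1, labels.unsqueeze(1), self.m)\n",
+ "\t\treturn F.cross_entropy(self.s * (cosine - margin), labels)"
+ ]
+ },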
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cwM_xyOtNCI2"
+ },
+ "source": [
+ "# Validate\n",
+ "- Calculate accuracy of the validation set."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "YAiv6kpdJRTJ"
+ },
+ "outputs": [],
+ "source": [
+ "from tqdm import tqdm\n",
+ "import torch\n",
+ "\n",
+ "\n",
+ "def valid(dataloader, model, criterion, device): \n",
+ "\t\"\"\"Validate on validation set.\"\"\"\n",
+ "\n",
+ "\tmodel.eval()\n",
+ "\trunning_loss = 0.0\n",
+ "\trunning_accuracy = 0.0\n",
+ "\tpbar = tqdm(total=len(dataloader.dataset), ncols=0, desc=\"Valid\", unit=\" uttr\")\n",
+ "\n",
+ "\tfor i, batch in enumerate(dataloader):\n",
+ "\t\twith torch.no_grad():\n",
+ "\t\t\tloss, accuracy = model_fn(batch, model, criterion, device)\n",
+ "\t\t\trunning_loss += loss.item()\n",
+ "\t\t\trunning_accuracy += accuracy.item()\n",
+ "\n",
+ "\t\tpbar.update(dataloader.batch_size)\n",
+ "\t\tpbar.set_postfix(\n",
+ "\t\t\tloss=f\"{running_loss / (i+1):.2f}\",\n",
+ "\t\t\taccuracy=f\"{running_accuracy / (i+1):.2f}\",\n",
+ "\t\t)\n",
+ "\n",
+ "\tpbar.close()\n",
+ "\tmodel.train()\n",
+ "\n",
+ "\treturn running_accuracy / len(dataloader)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "g6ne9G-eNEdG"
+ },
+ "source": [
+ "# Main function"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Usv9s-CuJSG7"
+ },
+ "outputs": [],
+ "source": [
+ "from tqdm import tqdm\n",
+ "\n",
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "from torch.optim import AdamW\n",
+ "from torch.utils.data import DataLoader, random_split\n",
+ "\n",
+ "\n",
+ "def parse_args():\n",
+ "\t\"\"\"arguments\"\"\"\n",
+ "\tconfig = {\n",
+ "\t\t\"data_dir\": \"./Dataset\",\n",
+ "\t\t\"save_path\": \"model.ckpt\",\n",
+ "\t\t\"batch_size\": 32,\n",
+ "\t\t\"n_workers\": 8,\n",
+ "\t\t\"valid_steps\": 2000,\n",
+ "\t\t\"warmup_steps\": 1000,\n",
+ "\t\t\"save_steps\": 10000,\n",
+ "\t\t\"total_steps\": 70000,\n",
+ "\t}\n",
+ "\n",
+ "\treturn config\n",
+ "\n",
+ "\n",
+ "def main(\n",
+ "\tdata_dir,\n",
+ "\tsave_path,\n",
+ "\tbatch_size,\n",
+ "\tn_workers,\n",
+ "\tvalid_steps,\n",
+ "\twarmup_steps,\n",
+ "\ttotal_steps,\n",
+ "\tsave_steps,\n",
+ "):\n",
+ "\t\"\"\"Main function.\"\"\"\n",
+ "\tdevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+ "\tprint(f\"[Info]: Use {device} now!\")\n",
+ "\n",
+ "\ttrain_loader, valid_loader, speaker_num = get_dataloader(data_dir, batch_size, n_workers)\n",
+ "\ttrain_iterator = iter(train_loader)\n",
+ "\tprint(f\"[Info]: Finish loading data!\",flush = True)\n",
+ "\n",
+ "\tmodel = Classifier(n_spks=speaker_num).to(device)\n",
+ "\tcriterion = nn.CrossEntropyLoss()\n",
+ "\toptimizer = AdamW(model.parameters(), lr=1e-3)\n",
+ "\tscheduler = get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)\n",
+ "\tprint(f\"[Info]: Finish creating model!\",flush = True)\n",
+ "\n",
+ "\tbest_accuracy = -1.0\n",
+ "\tbest_state_dict = None\n",
+ "\n",
+ "\tpbar = tqdm(total=valid_steps, ncols=0, desc=\"Train\", unit=\" step\")\n",
+ "\n",
+ "\tfor step in range(total_steps):\n",
+ "\t\t# Get data\n",
+ "\t\ttry:\n",
+ "\t\t\tbatch = next(train_iterator)\n",
+ "\t\texcept StopIteration:\n",
+ "\t\t\ttrain_iterator = iter(train_loader)\n",
+ "\t\t\tbatch = next(train_iterator)\n",
+ "\n",
+ "\t\tloss, accuracy = model_fn(batch, model, criterion, device)\n",
+ "\t\tbatch_loss = loss.item()\n",
+ "\t\tbatch_accuracy = accuracy.item()\n",
+ "\n",
+ "\t\t# Updata model\n",
+ "\t\tloss.backward()\n",
+ "\t\toptimizer.step()\n",
+ "\t\tscheduler.step()\n",
+ "\t\toptimizer.zero_grad()\n",
+ "\n",
+ "\t\t# Log\n",
+ "\t\tpbar.update()\n",
+ "\t\tpbar.set_postfix(\n",
+ "\t\t\tloss=f\"{batch_loss:.2f}\",\n",
+ "\t\t\taccuracy=f\"{batch_accuracy:.2f}\",\n",
+ "\t\t\tstep=step + 1,\n",
+ "\t\t)\n",
+ "\n",
+ "\t\t# Do validation\n",
+ "\t\tif (step + 1) % valid_steps == 0:\n",
+ "\t\t\tpbar.close()\n",
+ "\n",
+ "\t\t\tvalid_accuracy = valid(valid_loader, model, criterion, device)\n",
+ "\n",
+ "\t\t\t# keep the best model\n",
+ "\t\t\tif valid_accuracy > best_accuracy:\n",
+ "\t\t\t\tbest_accuracy = valid_accuracy\n",
+ "\t\t\t\tbest_state_dict = model.state_dict()\n",
+ "\n",
+ "\t\t\tpbar = tqdm(total=valid_steps, ncols=0, desc=\"Train\", unit=\" step\")\n",
+ "\n",
+ "\t\t# Save the best model so far.\n",
+ "\t\tif (step + 1) % save_steps == 0 and best_state_dict is not None:\n",
+ "\t\t\ttorch.save(best_state_dict, save_path)\n",
+ "\t\t\tpbar.write(f\"Step {step + 1}, best model saved. (accuracy={best_accuracy:.4f})\")\n",
+ "\n",
+ "\tpbar.close()\n",
+ "\n",
+ "\n",
+ "if __name__ == \"__main__\":\n",
+ "\tmain(**parse_args())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "NLatBYAhNNMx"
+ },
+ "source": [
+ "# Inference\n",
+ "\n",
+ "## Dataset of inference"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "background_save": true
+ },
+ "id": "efS4pCmAJXJH"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import json\n",
+ "import torch\n",
+ "from pathlib import Path\n",
+ "from torch.utils.data import Dataset\n",
+ "\n",
+ "\n",
+ "class InferenceDataset(Dataset):\n",
+ "\tdef __init__(self, data_dir):\n",
+ "\t\ttestdata_path = Path(data_dir) / \"testdata.json\"\n",
+ "\t\tmetadata = json.load(testdata_path.open())\n",
+ "\t\tself.data_dir = data_dir\n",
+ "\t\tself.data = metadata[\"utterances\"]\n",
+ "\n",
+ "\tdef __len__(self):\n",
+ "\t\treturn len(self.data)\n",
+ "\n",
+ "\tdef __getitem__(self, index):\n",
+ "\t\tutterance = self.data[index]\n",
+ "\t\tfeat_path = utterance[\"feature_path\"]\n",
+ "\t\tmel = torch.load(os.path.join(self.data_dir, feat_path))\n",
+ "\n",
+ "\t\treturn feat_path, mel\n",
+ "\n",
+ "\n",
+ "def inference_collate_batch(batch):\n",
+ "\t\"\"\"Collate a batch of data.\"\"\"\n",
+ "\tfeat_paths, mels = zip(*batch)\n",
+ "\n",
+ "\treturn feat_paths, torch.stack(mels)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "tl0WnYwxNK_S"
+ },
+ "source": [
+ "## Main funcrion of Inference"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "background_save": true
+ },
+ "id": "i8SAbuXEJb2A"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "import csv\n",
+ "from pathlib import Path\n",
+ "from tqdm.notebook import tqdm\n",
+ "\n",
+ "import torch\n",
+ "from torch.utils.data import DataLoader\n",
+ "\n",
+ "def parse_args():\n",
+ "\t\"\"\"arguments\"\"\"\n",
+ "\tconfig = {\n",
+ "\t\t\"data_dir\": \"./Dataset\",\n",
+ "\t\t\"model_path\": \"./model.ckpt\",\n",
+ "\t\t\"output_path\": \"./output.csv\",\n",
+ "\t}\n",
+ "\n",
+ "\treturn config\n",
+ "\n",
+ "\n",
+ "def main(\n",
+ "\tdata_dir,\n",
+ "\tmodel_path,\n",
+ "\toutput_path,\n",
+ "):\n",
+ "\t\"\"\"Main function.\"\"\"\n",
+ "\tdevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+ "\tprint(f\"[Info]: Use {device} now!\")\n",
+ "\n",
+ "\tmapping_path = Path(data_dir) / \"mapping.json\"\n",
+ "\tmapping = json.load(mapping_path.open())\n",
+ "\n",
+ "\tdataset = InferenceDataset(data_dir)\n",
+ "\tdataloader = DataLoader(\n",
+ "\t\tdataset,\n",
+ "\t\tbatch_size=1,\n",
+ "\t\tshuffle=False,\n",
+ "\t\tdrop_last=False,\n",
+ "\t\tnum_workers=8,\n",
+ "\t\tcollate_fn=inference_collate_batch,\n",
+ "\t)\n",
+ "\tprint(f\"[Info]: Finish loading data!\",flush = True)\n",
+ "\n",
+ "\tspeaker_num = len(mapping[\"id2speaker\"])\n",
+ "\tmodel = Classifier(n_spks=speaker_num).to(device)\n",
+ "\tmodel.load_state_dict(torch.load(model_path))\n",
+ "\tmodel.eval()\n",
+ "\tprint(f\"[Info]: Finish creating model!\",flush = True)\n",
+ "\n",
+ "\tresults = [[\"Id\", \"Category\"]]\n",
+ "\tfor feat_paths, mels in tqdm(dataloader):\n",
+ "\t\twith torch.no_grad():\n",
+ "\t\t\tmels = mels.to(device)\n",
+ "\t\t\touts = model(mels)\n",
+ "\t\t\tpreds = outs.argmax(1).cpu().numpy()\n",
+ "\t\t\tfor feat_path, pred in zip(feat_paths, preds):\n",
+ "\t\t\t\tresults.append([feat_path, mapping[\"id2speaker\"][str(pred)]])\n",
+ "\n",
+ "\twith open(output_path, 'w', newline='') as csvfile:\n",
+ "\t\twriter = csv.writer(csvfile)\n",
+ "\t\twriter.writerows(results)\n",
+ "\n",
+ "\n",
+ "if __name__ == \"__main__\":\n",
+ "\tmain(**parse_args())"
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "collapsed_sections": [],
+ "name": "hw04.ipynb",
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/2022 ML/HW DATA.txt b/2022 ML/HW DATA.txt
new file mode 100644
index 0000000..dd8f075
--- /dev/null
+++ b/2022 ML/HW DATA.txt
@@ -0,0 +1,3 @@
+The dataset is too large, so it has been placed in an online cloud drive.
+
+To keep all course materials and quality e-books updated and maintained in real time, an online cloud-drive folder has been created on the WeChat public account 【啥都会一点的研究生】; this lesson corresponds to item 【05】.
\ No newline at end of file
diff --git a/README.md b/README.md
index e4634b1..3f43a3e 100644
--- a/README.md
+++ b/README.md
@@ -7,10 +7,17 @@
## Important Notice
```
+(Important notice, stated here once for everything) To keep all course materials and quality e-books updated and maintained in real time, an online cloud-drive folder has been created on the WeChat public account 【啥都会一点的研究生】; this lesson corresponds to item 【05】.
+
+All data for the 2021 & 2022 homework has been packaged; since the files are too large, it has also been placed in the cloud drive mentioned above.
+
+The cloud drive covers everything this course requires, and dead links will be updated promptly. Happy studying!
+
2022 only makes small additions on top of 2021, with the 2021 content serving as prerequisite knowledge; 2022 videos are tagged as such in their titles.
The ppt/pdf files support direct-link download.
```
+[![BILIBILI](https://raw.githubusercontent.com/Fafa-DL/readme-data/main/gzh.png)](https://space.bilibili.com/46880349)
## Changelog
@@ -33,6 +40,7 @@ ppt/pdf支持直链下载。
|2022/02/21|Updated Lecture 1: Introduction of Deep Learning supplementary content; major GitHub layout rework|
|2022/02/25|Updated Lecture 2: What to do if my network fails to train supplementary content and HW2|
|2022/03/05|Updated Lecture 3: Images input, HW3|
+|2022/03/05|Updated Lecture 4: Sequence as input, HW4<br>All 2021 & 2022 homework data has been packaged and placed on the WeChat public account 【啥都会一点的研究生】|
****
@@ -58,6 +66,7 @@ ppt/pdf支持直链下载。
|---|---|---|---|---|
|Lecture 1|[(Part 1) Introduction to basic machine learning concepts](https://www.bilibili.com/video/BV1Wv411h7kN?p=3)<br>[(Part 2) Introduction to basic machine learning concepts](https://www.bilibili.com/video/BV1Wv411h7kN?p=4)|Video:<br>[2022-Machine learning course rules](https://www.bilibili.com/video/BV1Wv411h7kN?p=1)<br>[2022-Colab tutorial](https://www.bilibili.com/video/BV1Wv411h7kN?p=5)<br>[2022-Pytorch Tutorial 1](https://www.bilibili.com/video/BV1Wv411h7kN?p=6)<br>[2022-Pytorch Tutorial 2](https://www.bilibili.com/video/BV1Wv411h7kN?p=7)<br>PDF:<br>[Rules](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/rule%20(v2).pdf)<br>[Chinese class course intro](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/introduction%20(v2).pdf)<br>[Pytorch Tutorial 1](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/Pytorch%20Tutorial%201.pdf)<br>[Pytorch Tutorial 2](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/Pytorch%20Tutorial%202.pdf)<br>[Colab Tutorial](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/Colab%20Tutorial%202022.pdf)<br>[Environment Setup](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/EnvironmentSetup.pdf)|[Introduction to deep learning](https://www.bilibili.com/video/BV1Wv411h7kN?p=13)<br>[Backpropagation](https://www.bilibili.com/video/BV1Wv411h7kN?p=14)<br>[Prediction: Pokémon](https://www.bilibili.com/video/BV1Wv411h7kN?p=15)<br>[Classification: Pokémon](https://www.bilibili.com/video/BV1Wv411h7kN?p=16)<br>[Logistic regression](https://www.bilibili.com/video/BV1Wv411h7kN?p=17)|[Video](https://www.bilibili.com/video/BV1Wv411h7kN?p=11)<br>[Slide](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/HW01.pdf)<br>[Code](https://colab.research.google.com/drive/1FTcG6CE-HILnvFztEFKdauMlPKfQvm5Z#scrollTo=YdttVRkAfu2t)<br>[Submission](https://www.kaggle.com/t/a3ebd5b5542f0f55e828d4f00de8e59a)|
|Lecture 2|[(1) Local minima and saddle points](https://www.bilibili.com/video/BV1Wv411h7kN?p=19)<br>[(2) Batch and momentum](https://www.bilibili.com/video/BV1Wv411h7kN?p=20)<br>[(3) Automatically adjusting the learning rate](https://www.bilibili.com/video/BV1Wv411h7kN?p=21)<br>[(4) The loss function can also matter](https://www.bilibili.com/video/BV1Wv411h7kN?p=22)|Video:<br>[2022-Revisiting the Pokémon/Digimon classifier: a brief look at machine learning theory](https://www.bilibili.com/video/BV1Wv411h7kN?p=23)<br>PDF:<br>[Theory](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/theory%20(v7).pdf)|[Gradient Descent (Demo by AOE)](https://www.bilibili.com/video/BV1Wv411h7kN?p=24)<br>[Beyond Adam (part 1)](https://www.bilibili.com/video/BV1Wv411h7kN?p=26)<br>[Beyond Adam (part 2)](https://www.bilibili.com/video/BV1Wv411h7kN?p=27)|[Video](https://www.bilibili.com/video/BV1Wv411h7kN?p=28)<br>[Slide](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/hw2_slides%202022.pdf)<br>[Code](https://colab.research.google.com/drive/1hmTFJ8hdcnqRz_0oJSXjTGhZLVU-bS1a?usp=sharing)<br>[Submission](https://www.kaggle.com/c/ml2022spring-hw2)|
-|Lecture 3|[Convolutional neural networks (CNN)](https://www.bilibili.com/video/BV1Wv411h7kN?p=31)|Video:<br>[Why you still overfit with a validation set](https://www.bilibili.com/video/BV1Wv411h7kN?p=32)<br>[Machine learning where you can have your cake and eat it](https://www.bilibili.com/video/BV1Wv411h7kN?p=33)<br>PDF:<br>[Validation](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/validation.pdf)<br>[Why Deep](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/whydeep%20(v3).pdf)|[RNN (part 1)](https://www.bilibili.com/video/BV1Wv411h7kN?p=34)<br>[RNN (part 2)](https://www.bilibili.com/video/BV1Wv411h7kN?p=35)<br>[GNN (part 1)](https://www.bilibili.com/video/BV1Wv411h7kN?p=36)<br>[GNN (part 2)](https://www.bilibili.com/video/BV1Wv411h7kN?p=37)|[Video](https://www.bilibili.com/video/BV1Wv411h7kN?p=38)<br>[Slide](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/Machine%20Learning%20HW3%20-%20Image%20Classification.pdf)<br>[Code](https://colab.research.google.com/drive/15hMu9YiYjE_6HY99UXon2vKGk2KwugWu)<br>[Submission](https://www.kaggle.com/c/ml2022spring-hw3b)|
+|Lecture 3|[Convolutional neural networks (CNN)](https://www.bilibili.com/video/BV1Wv411h7kN?p=31)|Video:<br>[Why you still overfit with a validation set](https://www.bilibili.com/video/BV1Wv411h7kN?p=32)<br>[Machine learning where you can have your cake and eat it](https://www.bilibili.com/video/BV1Wv411h7kN?p=33)<br>PDF:<br>[Validation](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/validation.pdf)<br>[Why Deep](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/whydeep%20(v3).pdf)|[Spatial Transformer Layer](https://www.bilibili.com/video/BV1Wv411h7kN?p=34)|[Video](https://www.bilibili.com/video/BV1Wv411h7kN?p=35)<br>[Slide](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/Machine%20Learning%20HW3%20-%20Image%20Classification.pdf)<br>[Code](https://colab.research.google.com/drive/15hMu9YiYjE_6HY99UXon2vKGk2KwugWu)<br>[Submission](https://www.kaggle.com/c/ml2022spring-hw3b)|
+|Lecture 4|[Self-attention (Part 1)](https://www.bilibili.com/video/BV1Wv411h7kN?p=41)<br>[Self-attention (Part 2)](https://www.bilibili.com/video/BV1Wv411h7kN?p=42)|Video:<br>[None]<br>PDF:<br>[None]|[RNN (part 1)](https://www.bilibili.com/video/BV1Wv411h7kN?p=40)<br>[RNN (part 2)](https://www.bilibili.com/video/BV1Wv411h7kN?p=41)<br>[GNN (part 1)](https://www.bilibili.com/video/BV1Wv411h7kN?p=42)<br>[GNN (part 2)](https://www.bilibili.com/video/BV1Wv411h7kN?p=43)|[Video](https://www.bilibili.com/video/BV1Wv411h7kN?p=45)<br>[Slide](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2022-course-data/Machine%20Learning%20HW4.pdf)<br>[Code](https://colab.research.google.com/drive/1gC2Gojv9ov9MUQ1a1WDpVBD6FOcLZsog?usp=sharing)<br>[Submission](https://www.kaggle.com/c/ml2022spring-hw4)|
****