lizeyujack is back

Damn · 07 Mar 2023

I’m back from pancreatic minimally invasive surgery😢.

更改链接：

阅读文章：Representation Learning with Contrastive Predictive Coding.

什么是autoregressive model? old value have specific latent effect on present values, these nums can be predicted by a statistic model- auto regressive model.

how to training with torch-leaf:

bash /cluster/home/lizeyu/Efficient/speechcommand.sh

运行后，会在/cluster/home/lizeyu/Efficient/out/models文件下生成对应model-name的子文件夹。

几个重要的参数：compression:efficient leaf（使用TBN）和leaf（使用PCEN）使用的不同（如果使用TBN会将两个图拼接在一起）

生成的前端提取图可以展示在：/cluster/home/lizeyu/Efficient/out/runs中

运行tensorboard --logdir /cluster/home/lizeyu/Efficient/out/runs指令可以执行可视化每步的样例。

efficient leaf中的前端代码部分：/cluster/home/lizeyu/Efficient/engine.py

我尝试更改了，124行代码：打印出：output shape print torch.Size([128, 1, 40, 1334]) top 16th ouput print torch.Size([16, 1, 40, 1334])

因为训练的batch size设置为128，所以文中数据的第一个size为128，但是我只需要展示几个例子。比如前16个，所以我使用for循环只取前16个。目前遇到两个问题：

音频提取之后的数据规模为16000，即，没有训练之后的数据，我需要确认一下有没有预处理操作。
使用frontend展开之后的数据目前shape为4013343 为WHC数据

查看tf如何处理speechcommand数据集的？问题：为什么我使用不同的采样率提取出来的图像的维度是不一样的？推荐使用/cluster/home/lizeyu/leafff/leaf-audio

生成的数据是二维的。但是在torch中使用二维数据的时候，默认C的数值为1。所以默认显示黑白图像。不同于plt直接绘图产生的图像