fMRI-based disease classification using ML and DL methods
Organize the pre-processed fMRI data into the file structure described below; the code then automatically handles data preparation, data splitting, model training, and result visualization.
Note: so that an interrupted run does not have to start over, intermediate files are generated automatically while the program runs. Even if the process is interrupted, the results of the previous step are kept on disk. Intermediate files include HC/MDD_splice_along_time, model, etc.; see Intermediate file.
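The caching idea described above can be sketched as follows. This is a minimal illustration, not the repo's implementation: the helper name `load_or_compute`, the pickle format, and the `.tmp` suffix are all assumptions made for the example.

```python
import pickle
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the cached result if it exists; otherwise compute and cache it.

    `load_or_compute` is a hypothetical helper, not a function from this repo.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # A previous run already finished this step: reuse its result.
        with cache_path.open("rb") as fh:
            return pickle.load(fh)

    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so that an interruption never leaves
    # a half-written intermediate file behind.
    tmp_path = cache_path.with_name(cache_path.name + ".tmp")
    with tmp_path.open("wb") as fh:
        pickle.dump(result, fh)
    tmp_path.replace(cache_path)
    return result
```

Each pipeline step would wrap its expensive computation in such a call, so a rerun skips every step whose intermediate file is already present.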
Disease classification from fMRI data using ML and DL methods
Once the pre-processed data are organized into the corresponding folders, data splitting, model training, and plotting of the model results all run automatically.
Raw fMRI data are first processed with the DPABI toolbox (including brain parcellation).
The GRETNA toolbox is then used to extract SFC (static functional connectivity) and DFC (dynamic functional connectivity) features.
This yields an SFC matrix and a DFC matrix for each subject, stored as index.mat files (for example 0001.mat). The final input format should be as follows: each .mat file under SFC is 2-dimensional, and each .mat file under DFC is 3-dimensional, with one of the dimensions being time.
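The file layout in this repo can be used as a reference; note that some of its folders are intermediate files generated at run time. The initial input layout is listed below. SFC and DFC data are kept apart, and within each, HC and MDD are kept apart. So that intermediate files can sit next to the inputs, the HC data go in HC_Data, not in HC: HC is the parent directory for everything belonging to the HC class (the HC data and the generated intermediate files). The same holds for MDD and MDD_Data.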
- SFC_Data
  - HC
    - HC_Data
      - 0001.mat
      - 0002.mat
      - 0003.mat
    - Intermediate_HC_Data (generated at run time)
  - MDD
    - MDD_Data
      - 0001.mat
      - 0002.mat
      - 0003.mat
    - Intermediate_MDD_Data (generated at run time)
- DFC_Data
  - HC
    - HC_Data
      - 0001.mat
      - 0002.mat
      - 0003.mat
    - Intermediate_HC_Data (generated at run time)
  - MDD
    - MDD_Data
      - 0001.mat
      - 0002.mat
      - 0003.mat
    - Intermediate_MDD_Data (generated at run time)
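A quick way to confirm that the inputs follow this layout before starting a long run is sketched below. The function name `list_subject_files` is hypothetical and not part of this repo; only the folder names come from the tree above.

```python
from pathlib import Path


def list_subject_files(root, group):
    """Return the sorted .mat files under <root>/<group>/<group>_Data.

    `root` is SFC_Data or DFC_Data and `group` is "HC" or "MDD".
    Hypothetical helper for illustration only.
    """
    data_dir = Path(root) / group / f"{group}_Data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"expected input folder {data_dir}")
    # Only the subject matrices are inputs; generated folders such as
    # Intermediate_HC_Data live one level up and are ignored here.
    return sorted(data_dir.glob("*.mat"))
```

For example, `list_subject_files("SFC_Data", "HC")` lists the files in `SFC_Data/HC/HC_Data` and raises `FileNotFoundError` if the data were placed directly in `SFC_Data/HC`.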
After feature selection, the selected features are interpreted physiologically: each selected feature is mapped back to the pair of brain regions whose functional connection it represents, and those regions are printed.
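The mapping from a selected feature back to a region pair can be sketched as below. It assumes the full N×N connectivity matrix was flattened in row-major order, which may differ from what this repo does (for instance, if only the upper triangle is kept). The function name and the optional `labels` list are hypothetical.

```python
def feature_to_regions(index, n_regions, labels=None):
    """Map an index in a row-major flattened N x N matrix to a region pair.

    Assumption: the whole matrix was flattened, so feature k corresponds to
    entry (k // N, k % N). Hypothetical helper, not code from this repo.
    """
    if not 0 <= index < n_regions * n_regions:
        raise IndexError(f"feature index {index} out of range")
    row, col = divmod(index, n_regions)
    if labels is None:
        return row, col
    # `labels` would hold the atlas region names in matrix order.
    return labels[row], labels[col]
```

With a 90-region atlas, `feature_to_regions(91, 90)` gives the zero-based pair `(1, 1)`; passing the atlas region names as `labels` returns names instead of indices.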
After training and testing, the grid-search process and the AUC curve are plotted.
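For reference, the AUC value itself can be computed without any plotting library: it equals the fraction of (positive, negative) sample pairs in which the positive sample gets the higher score, with ties counting one half. The sketch below uses that definition directly; it is not the code this repo uses, and the label convention (1 for MDD, 0 for HC) is an assumption.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: 1 for the positive class (assumed MDD), 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

This pairwise form is quadratic in the number of samples, which is fine for cohort-sized test sets.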
Classification model pipeline
Main function. Inputs: the paths to the HC and MDD files, the FC type (SFC/DFC), and the model type (SVM/LSTM). It then automatically performs feature processing, t-test feature selection, SVM-RFE feature selection, SVM classification, and a grid search for the best hyperparameters; prints the physiological interpretation of the selected features (the brain regions involved); and plots the AUC of the test results.
python main.py SVM SFC "SFC_Data\HC\HC_Data" "SFC_Data\MDD\MDD_Data" --threshold 0.2 --atlas AAL
usage: main.py [-h] [--threshold THRESHOLD] [--atlas ATLAS] {SVM,LSTM,oLSTM} {DFC,SFC} hc mdd
Diagnose depression from pre-processed data. Three methods are available: SFC/DFC with feature selection and an SVM, DFC with an LSTM, or a direct LSTM.
positional arguments:
{SVM,LSTM,oLSTM} Classification method
{DFC,SFC} Functional connectivity feature type
hc Directory of FC files for the healthy control (HC) group
mdd Directory of FC files for the major depressive disorder (MDD) group
optional arguments:
-h, --help Show help
--threshold THRESHOLD, -t THRESHOLD t-test threshold
--atlas ATLAS, -a ATLAS Brain parcellation atlas to use. It must match the atlas used in [DPABI](http://rfmri.org/dpabi) to produce the input data. The default is AAL90.
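A parser that produces a command line of this shape can be sketched with `argparse` as follows. This is a reconstruction from the usage text, not the actual contents of main.py: the destination names `method` and `fc`, the `float` type for the threshold, and its missing default are assumptions.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the usage text (not the repo's own code)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Positional arguments; the names `method` and `fc` are hypothetical.
    parser.add_argument("method", choices=["SVM", "LSTM", "oLSTM"],
                        help="classification method")
    parser.add_argument("fc", choices=["DFC", "SFC"],
                        help="functional connectivity feature type")
    parser.add_argument("hc", help="directory of HC functional connectivity files")
    parser.add_argument("mdd", help="directory of MDD functional connectivity files")
    # Optional arguments; the float type is an assumption.
    parser.add_argument("--threshold", "-t", type=float,
                        help="t-test threshold")
    parser.add_argument("--atlas", "-a", default="AAL90",
                        help="brain parcellation atlas (default: AAL90)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

`argparse` matches choices and option names case-sensitively, so with a parser like this one `SFC` is accepted while `sfc` is rejected, and the flag must be spelled `--threshold`.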
See Rationale for background.
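The matrix obtained by flattening the SFC/DFC matrix of every subject in one class and stacking the results.
For SFC, each subject's matrix is flattened directly to one dimension; stacking all subjects gives a 2-dimensional matrix.
For DFC, each matrix in a subject's DFC is flattened to one dimension and all of that subject's vectors are concatenated; the concatenated vectors of all subjects are then stacked into a matrix in which one dimension is the subject index and the other is the FC features.
The DFC matrices of one class: for each subject, every FC in the DFC is flattened and the results are stacked into a 2-dimensional matrix with time as one dimension; the matrices of all subjects are then stacked into a 3-dimensional matrix whose dimensions are subject index, time, and flattened FC.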
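The three reshaping steps above can be sketched with NumPy as follows. The function names are hypothetical, and the sketch assumes each DFC array has shape (regions, regions, time) with time as the last axis; the source only says that one of the three dimensions is time.

```python
import numpy as np


def stack_sfc(subjects):
    """SFC: (R, R) per subject -> (n_subjects, R*R)."""
    return np.stack([m.reshape(-1) for m in subjects])


def splice_dfc_along_time(subjects):
    """DFC: (R, R, T) per subject -> (n_subjects, T*R*R).

    Every time window is flattened and the windows are concatenated in
    time order. Assumes time is the last axis.
    """
    rows = []
    for m in subjects:
        windows = np.moveaxis(m, -1, 0)     # (T, R, R)
        rows.append(windows.reshape(-1))    # T*R*R values
    return np.stack(rows)


def stack_dfc_sequences(subjects):
    """DFC: (R, R, T) per subject -> (n_subjects, T, R*R).

    Keeps time as its own axis, the shape a sequence model would consume.
    """
    return np.stack(
        [np.moveaxis(m, -1, 0).reshape(m.shape[-1], -1) for m in subjects]
    )
```

The 2-dimensional outputs suit feature selection followed by an SVM, while the 3-dimensional output keeps the time ordering that an LSTM needs.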
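Stores the trained model, so that the later physiological interpretation and visualization steps can load it directly for further computation.
SFC result figures
DFC result figures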
p = Pool(processes=15)
# Change 15 to the number of CPU cores on your computer
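Instead of editing the number by hand, the core count can be queried at run time. The sketch below is one way to do it; `worker_count` and `square` are hypothetical names used only for illustration, and the repo itself hard-codes the value.

```python
import os
from multiprocessing import Pool


def worker_count(reserve=0):
    """Number of worker processes: all CPU cores minus `reserve`, at least 1."""
    total = os.cpu_count() or 1   # os.cpu_count() may return None
    return max(1, total - reserve)


def square(x):
    """Placeholder task standing in for the real per-subject work."""
    return x * x


if __name__ == "__main__":
    # The __main__ guard is required on platforms that spawn worker processes.
    with Pool(processes=worker_count()) as p:
        print(p.map(square, range(5)))
```

Passing `reserve=1` leaves one core free for the rest of the system during long runs.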
[^1]: Castellazzi, Gloria, et al. “A machine learning approach for the differential diagnosis of Alzheimer and Vascular Dementia Fed by MRI selected features.” Frontiers in neuroinformatics 14 (2020): 25.