Supplementary materials for "Reporting bias when using real data sets to analyze classification performance"
Mohammadmahdi R. Yousefi¹, Jianping Hua², Chao Sima² and Edward R. Dougherty¹,²
¹ Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
² Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA.
Supplementary materials for the real data
Results
Synthetic Data
| Classifier | Simulation parameters | Number of features | Model 1: 20 global and 100 heterogeneous markers | Model 2: 20 global and no heterogeneous markers |
|---|---|---|---|---|
| LDA | Equal variances (σ₀ = 0.6, σ₁ = 0.6), t-test, 60 samples | 5 | 1st minimum; 2nd minimum | 1st minimum; 2nd minimum |
| LDA | Equal variances (σ₀ = 0.6, σ₁ = 0.6), t-test, 60 samples | 10 | 1st minimum; 2nd minimum | 1st minimum; 2nd minimum |
| LDA | Unequal variances (σ₀ = 0.6, σ₁ = 1.2), t-test, 60 samples | 5 | 1st minimum; 2nd minimum | 1st minimum; 2nd minimum |
| LDA | Unequal variances (σ₀ = 0.6, σ₁ = 1.2), t-test, 60 samples | 10 | 1st minimum; 2nd minimum | 1st minimum; 2nd minimum |
| 3NN | Equal variances (σ₀ = 0.6, σ₁ = 0.6), t-test, 60 samples | 5 | 1st minimum; 2nd minimum | 1st minimum; 2nd minimum |
| 3NN | Unequal variances (σ₀ = 0.6, σ₁ = 1.2), t-test, 60 samples | 5 | 1st minimum; 2nd minimum | 1st minimum; 2nd minimum |
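The synthetic-data panels above use t-test feature selection on 60 samples, with class parameters σ₀ and σ₁. The Python below is a minimal sketch of that step. It assumes the t-test ranks features by the absolute two-sample t statistic and keeps the top k. Several choices are illustrative assumptions rather than the paper's method: the Welch form of the statistic (which does not assume equal class variances), the 30/30 class split, and treating σ₀ and σ₁ as standard deviations. The hypothetical `simulate` helper and its mean shift are also illustrative, and `simulate` is not the paper's Model 1 or Model 2.

```python
import math
import random
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample t statistic in Welch form (no equal-variance assumption)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Unbiased sample variance of each class.
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)


def t_test_select(x0: List[List[float]], x1: List[List[float]], k: int) -> List[int]:
    """Indices of the k features with the largest |t| between class 0 and class 1.

    Each sample is a list of feature values; all samples have the same length.
    """
    scores = []
    for j in range(len(x0[0])):
        t = welch_t([s[j] for s in x0], [s[j] for s in x1])
        scores.append((abs(t), j))
    scores.sort(reverse=True)  # largest |t| first
    return [j for _, j in scores[:k]]


def simulate(n_per_class: int, n_features: int, n_informative: int, shift: float,
             sigma0: float, sigma1: float,
             seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical two-class Gaussian data (NOT the paper's Model 1 or Model 2).

    Only the first n_informative features differ in mean, by `shift`;
    sigma0 and sigma1 are treated as per-class standard deviations (assumption).
    """
    rng = random.Random(seed)
    x0 = [[rng.gauss(0.0, sigma0) for _ in range(n_features)]
          for _ in range(n_per_class)]
    x1 = [[rng.gauss(shift if j < n_informative else 0.0, sigma1)
           for j in range(n_features)]
          for _ in range(n_per_class)]
    return x0, x1


if __name__ == "__main__":
    # 60 samples split 30/30 (assumed), unequal-variance setting sigma0=0.6, sigma1=1.2.
    x0, x1 = simulate(30, 50, 5, shift=3.0, sigma0=0.6, sigma1=1.2)
    print(sorted(t_test_select(x0, x1, k=5)))
```

With a large hypothetical shift such as `shift=3.0`, the five shifted features are the ones the t-test is expected to keep. Smaller shifts make the ranking noisier.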
Real Data
| Classifier | Feature selection, sample size | Number of features | Minimum |
|---|---|---|---|
| LDA | t-test, 60 samples | 5 | 1st; 2nd |
| LDA | t-test, 60 samples | 10 | 1st; 2nd |
| LDA | t-test+SFS, 60 samples | 5 | 1st; 2nd |
| LDA | t-test+SFS, 60 samples | 10 | 1st; 2nd |
| 3NN | t-test, 60 samples | 5 | 1st; 2nd |
| 3NN | t-test+SFS, 60 samples | 5 | 1st; 2nd |

| Classifier | Feature selection, sample size | Number of features | Minimum |
|---|---|---|---|
| LDA | t-test and t-test+SFS, 60 samples | 5 | 1st; 2nd |
| LDA | t-test and t-test+SFS, 60 samples | 10 | 1st; 2nd |
| 3NN | t-test and t-test+SFS, 60 samples | 5 | 1st; 2nd |

| Feature selection, sample size | Classifiers | Number of features | Minimum |
|---|---|---|---|
| t-test, 60 samples | LDA and 3NN | 5 | 1st; 2nd |
| t-test, 60 samples | LDA | 10 | 1st; 2nd |
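The real-data panels also use t-test+SFS, read here as a t-test prefilter followed by sequential forward selection (SFS). The sketch below shows greedy SFS over a candidate list, such as the features kept by a t-test. It assumes leave-one-out 3NN error as the selection criterion. The criterion, the tie handling and the names `knn_loo_error` and `sfs` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def knn_loo_error(samples: List[List[float]], labels: Sequence[int],
                  features: Sequence[int], k: int = 3) -> float:
    """Leave-one-out error of a k-NN rule (labels 0/1, odd k) on a feature subset."""
    n = len(samples)
    errors = 0
    for i in range(n):
        # Squared Euclidean distance from sample i to every other sample.
        dists = []
        for j in range(n):
            if j != i:
                d = sum((samples[i][f] - samples[j][f]) ** 2 for f in features)
                dists.append((d, labels[j]))
        dists.sort()
        votes = sum(lab for _, lab in dists[:k])  # number of class-1 neighbours
        predicted = 1 if 2 * votes > k else 0
        errors += predicted != labels[i]
    return errors / n


def sfs(samples: List[List[float]], labels: Sequence[int],
        candidates: Sequence[int], n_select: int, k: int = 3) -> List[int]:
    """Greedy sequential forward selection driven by leave-one-out k-NN error."""
    selected: List[int] = []
    remaining = list(candidates)
    while len(selected) < n_select and remaining:
        # Add the candidate whose inclusion gives the lowest error;
        # min() keeps the first candidate in `remaining` on ties.
        best = min(remaining,
                   key=lambda f: knn_loo_error(samples, labels, selected + [f], k))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each step adds the single candidate that most lowers the criterion and never removes a feature once added. On small samples, leave-one-out error takes few distinct values, so ties are common and the tie rule matters.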



