
Yousefi09a

Supplementary materials for

Reporting bias when using real data sets to analyze classification performance

Mohammadmahdi R. Yousefi¹, Jianping Hua², Chao Sima² and Edward R. Dougherty¹,²

¹ Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA.
² Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ, 85004, USA.

 

Supplementary materials for the real data

 

Results

Synthetic Data:

Real Data: 

 

Synthetic Data

Simulation Parameters

| Classifier | Feature selection | Samples | Variances | Model 1: 20 global and 100 heterogeneous markers | Model 2: 20 global and no heterogeneous markers |
|---|---|---|---|---|---|
| LDA | t-test | 60 | σ0=0.6, σ1=0.6 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| LDA | t-test | 60 | σ0=0.6, σ1=0.6 | 10 features (1st minimum), 10 features (2nd minimum) | 10 features (1st minimum), 10 features (2nd minimum) |
| LDA | t-test | 60 | σ0=0.6, σ1=1.2 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| LDA | t-test | 60 | σ0=0.6, σ1=1.2 | 10 features (1st minimum), 10 features (2nd minimum) | 10 features (1st minimum), 10 features (2nd minimum) |
| LDA | t-test | 120 | σ0=0.6, σ1=0.6 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| LDA | t-test | 120 | σ0=0.6, σ1=0.6 | 10 features (1st minimum), 10 features (2nd minimum) | 10 features (1st minimum), 10 features (2nd minimum) |
| LDA | t-test | 120 | σ0=0.6, σ1=1.2 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| LDA | t-test | 120 | σ0=0.6, σ1=1.2 | 10 features (1st minimum), 10 features (2nd minimum) | 10 features (1st minimum), 10 features (2nd minimum) |
| LDA | t-test+SFS | 60 | σ0=0.6, σ1=0.6 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| LDA | t-test+SFS | 60 | σ0=0.6, σ1=0.6 | 10 features (1st minimum), 10 features (2nd minimum) | 10 features (1st minimum), 10 features (2nd minimum) |
| LDA | t-test+SFS | 60 | σ0=0.6, σ1=1.2 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| LDA | t-test+SFS | 60 | σ0=0.6, σ1=1.2 | 10 features (1st minimum), 10 features (2nd minimum) | 10 features (1st minimum), 10 features (2nd minimum) |
| LDA | t-test+SFS | 120 | σ0=0.6, σ1=0.6 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| LDA | t-test+SFS | 120 | σ0=0.6, σ1=0.6 | 10 features (1st minimum), 10 features (2nd minimum) | 10 features (1st minimum), 10 features (2nd minimum) |
| LDA | t-test+SFS | 120 | σ0=0.6, σ1=1.2 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| LDA | t-test+SFS | 120 | σ0=0.6, σ1=1.2 | 10 features (1st minimum), 10 features (2nd minimum) | 10 features (1st minimum), 10 features (2nd minimum) |
| 3NN | t-test | 60 | σ0=0.6, σ1=0.6 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| 3NN | t-test | 60 | σ0=0.6, σ1=1.2 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| 3NN | t-test | 120 | σ0=0.6, σ1=0.6 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| 3NN | t-test | 120 | σ0=0.6, σ1=1.2 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| 3NN | t-test+SFS | 60 | σ0=0.6, σ1=0.6 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| 3NN | t-test+SFS | 60 | σ0=0.6, σ1=1.2 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| 3NN | t-test+SFS | 120 | σ0=0.6, σ1=0.6 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |
| 3NN | t-test+SFS | 120 | σ0=0.6, σ1=1.2 | 5 features (1st minimum), 5 features (2nd minimum) | 5 features (1st minimum), 5 features (2nd minimum) |

 

Simulation Parameters Model 1: 20 Global and 100 heterogeneous markers Model 2: 20 Global and no heterogeneous markers
 LDA

 equal variances (σ0=0.6, σ1=0.6)

 t-test, 60 samples
 t-test, 120 samples
 t-test+SFS, 60 samples
 t-test+SFS, 120 samples

 5 features (1st minimum)  5 features (2nd minimum)  5 features (1st minimum)  5 features (2nd minimum)
 10 features (1st minimum)  10 features (2nd minimum)  10 features (1st minimum)  10 features (2nd minimum)

 unequal variances (σ0=0.6, σ1=1.2)

 t-test, 60 samples
 t-test, 120 samples
 t-test+SFS, 60 samples
 t-test+SFS, 120 samples

 5 features (1st minimum)  5 features (2nd minimum)  5 features (1st minimum)  5 features (2nd minimum)
 10 features (1st minimum)  10 features (2nd minimum)  10 features (1st minimum)  10 features (2nd minimum)
 3NN

 equal variances (σ0=0.6, σ1=0.6)

 t-test, 60 samples
 t-test, 120 samples
 t-test+SFS, 60 samples
 t-test+SFS, 120 samples

 5 features (1st minimum)  5 features (2nd minimum)  5 features (1st minimum)  5 features (2nd minimum)

 unequal variances (σ0=0.6, σ1=1.2)

 t-test, 60 samples
 t-test, 120 samples
 t-test+SFS, 60 samples
 t-test+SFS, 120 samples

 5 features (1st minimum)  5 features (2nd minimum)  5 features (1st minimum)  5 features (2nd minimum)

 


 

Real Data

 LDA  t-test, 60 samples  5 features (1st minimum)  5 features (2nd minimum)
 10 features (1st minimum)  10 features (2nd minimum)
 t-test+SFS, 60 samples  5 features (1st minimum)  5 features (2nd minimum)
 10 features (1st minimum)  10 features (2nd minimum)
 3NN  t-test, 60 samples  5 features (1st minimum)  5 features (2nd minimum)
 t-test+SFS, 60 samples  5 features (1st minimum)  5 features (2nd minimum)

 

 LDA  t-test, 60 samples
 t-test+SFS, 60 samples

 
 5 features (1st minimum)  5 features (2nd minimum)
 10 features (1st minimum)  10 features (2nd minimum)
 3NN  t-test, 60 samples
 t-test+SFS, 60 samples
 5 features (1st minimum)  5 features (2nd minimum)

 

 t-test, 60 samples
 t-test+SFS, 60 samples

 LDA, 3NN, 5 features (1st minimum)  LDA, 3NN, 5 features (2nd minimum)
 LDA, 10 features (1st minimum)  LDA, 10 features (2nd minimum)

 
