2024-03-20
This page contains the additional material related to the work presented in the article:
Jesús M. Pérez, Olatz Arbelaitz and Javier Muguerza. "Driven PCTBagging: Seeking greater discriminating capacity for the same level of interpretability". XX Conference of the Spanish Association for Artificial Intelligence (CAEPIA'24).
All the tables of results can be downloaded as an OpenDocument Spreadsheet (ODS) file.
Table of Contents
Table 1: Description of imbalanced datasets
2. Subsample numbers by data set to achieve the selected coverage value
Table 2: Subsample amounts for imbalanced data sets
Figure 1: Average AUC values for the 33 datasets
Table 3: AUC values for all algorithms over 33 datasets
Figure 2: Average balanced accuracy values for the 33 datasets
Table 4: Balanced Accuracy values for all algorithms over 33 datasets
Figure 3: Average True Positive Rate values for the 33 datasets
Table 5: True Positive Rate values for all algorithms over 33 datasets
Figure 4: Average values of the number of internal nodes for the 33 datasets
Table 6: Internal Nodes values for the algorithms with explaining capacity over 33 datasets
Figure 5: Average construction time values for the 33 datasets
Table 8: Elapsed Time Training Rate values for all algorithms over 33 datasets
This section contains the table with the characteristics for the 33 datasets from the KEEL repository used in this study. We present the datasets from the second (Imbalanced) context.
Table 1: Description of imbalanced datasets
| Data set | #Atts. | #Examples | Imbalance | Size of Min. Class | Size of Maj. Class |
|---|---|---|---|---|---|
| Abalone19 | 8 | 4174 | 0.77% | 32 | 4142 |
| Yeast6 | 8 | 1484 | 2.49% | 37 | 1447 |
| Yeast5 | 8 | 1484 | 2.96% | 44 | 1440 |
| Yeast4 | 8 | 1484 | 3.43% | 51 | 1433 |
| Yeast2vs8 | 8 | 482 | 4.15% | 20 | 462 |
| Glass5 | 9 | 214 | 4.2% | 9 | 205 |
| Abalone9vs18 | 8 | 731 | 5.65% | 41 | 690 |
| Glass4 | 9 | 214 | 6.07% | 13 | 201 |
| Ecoli4 | 7 | 336 | 6.74% | 23 | 313 |
| Glass2 | 9 | 214 | 8.78% | 19 | 195 |
| Vowel0 | 13 | 988 | 9.01% | 89 | 899 |
| Page-blocks0 | 10 | 5472 | 10.23% | 560 | 4912 |
| Ecoli3 | 7 | 336 | 10.88% | 37 | 299 |
| Yeast3 | 8 | 1484 | 10.98% | 163 | 1321 |
| Glass6 | 9 | 214 | 13.55% | 29 | 185 |
| Segment0 | 19 | 2308 | 14.26% | 329 | 1979 |
| Ecoli2 | 7 | 336 | 15.48% | 52 | 284 |
| New-thyroid1 | 5 | 215 | 16.28% | 35 | 180 |
| New-thyroid2 | 5 | 215 | 16.89% | 36 | 179 |
| Ecoli1 | 7 | 336 | 22.92% | 77 | 259 |
| Vehicle0 | 18 | 846 | 23.64% | 200 | 646 |
| Glass0123vs456 | 9 | 214 | 23.83% | 51 | 163 |
| Haberman | 3 | 306 | 27.42% | 84 | 222 |
| Vehicle1 | 18 | 846 | 28.37% | 240 | 606 |
| Vehicle2 | 18 | 846 | 28.37% | 240 | 606 |
| Vehicle3 | 18 | 846 | 28.37% | 240 | 606 |
| Yeast1 | 8 | 1484 | 28.91% | 429 | 1055 |
| Glass0 | 9 | 214 | 32.71% | 70 | 144 |
| Iris0 | 4 | 150 | 33.33% | 50 | 100 |
| Pima | 8 | 768 | 34.84% | 268 | 500 |
| Ecoli0vs1 | 7 | 220 | 35% | 77 | 143 |
| Wisconsin | 9 | 683 | 35% | 239 | 444 |
| Glass1 | 9 | 214 | 35.51% | 76 | 138 |
| Mean | 9.39 | 919.94 | 17.61% | 120 | 799.94 |
| Median | 8 | 482 | 15.48% | 52 | 444 |
The table in this section shows the number of subsamples computed for each data set to reach a coverage value of 99%.
Table 2: Subsample amounts for imbalanced data sets
| Data set | Original Size | Original %Min | Training sample Size | Training sample Min. Class Size | Training sample Maj. Class Size | Subsample set Size | Subsample set Number |
|---|---|---|---|---|---|---|---|
| Abalone19 | 4174 | 0.77 | 3340 | 26 | 3314 | 52 | 585 |
| Yeast6 | 1484 | 2.49 | 1188 | 30 | 1158 | 60 | 176 |
| Yeast5 | 1484 | 2.96 | 1189 | 36 | 1153 | 72 | 146 |
| Yeast4 | 1484 | 3.43 | 1188 | 41 | 1147 | 82 | 127 |
| Yeast2vs8 | 482 | 4.15 | 387 | 17 | 370 | 34 | 98 |
| Glass5 | 214 | 4.2 | 173 | 8 | 165 | 16 | 93 |
| Abalone9vs18 | 731 | 5.65 | 586 | 34 | 552 | 68 | 73 |
| Glass4 | 214 | 6.07 | 172 | 11 | 161 | 22 | 66 |
| Ecoli4 | 336 | 6.74 | 270 | 19 | 251 | 38 | 59 |
| Glass2 | 214 | 8.78 | 173 | 16 | 157 | 32 | 43 |
| Vowel0 | 988 | 9.01 | 792 | 72 | 720 | 144 | 44 |
| Page-blocks0 | 5472 | 10.23 | 4378 | 448 | 3930 | 896 | 39 |
| Ecoli3 | 336 | 10.88 | 270 | 30 | 240 | 60 | 35 |
| Yeast3 | 1484 | 10.98 | 1188 | 131 | 1057 | 262 | 35 |
| Glass6 | 214 | 13.55 | 173 | 24 | 149 | 48 | 27 |
| Segment0 | 2308 | 14.26 | 1848 | 264 | 1584 | 528 | 26 |
| Ecoli2 | 336 | 15.48 | 270 | 42 | 228 | 84 | 23 |
| New-thyroid1 | 215 | 16.28 | 173 | 29 | 144 | 58 | 21 |
| New-thyroid2 | 215 | 16.89 | 173 | 30 | 143 | 60 | 20 |
| Ecoli1 | 336 | 22.92 | 270 | 62 | 208 | 124 | 14 |
| Vehicle0 | 846 | 23.64 | 677 | 160 | 517 | 320 | 13 |
| Glass0123vs456 | 214 | 23.83 | 172 | 41 | 131 | 82 | 13 |
| Haberman | 306 | 27.42 | 246 | 68 | 178 | 136 | 10 |
| Vehicle1 | 846 | 28.37 | 678 | 193 | 485 | 386 | 10 |
| Vehicle2 | 846 | 28.37 | 678 | 193 | 485 | 386 | 10 |
| Vehicle3 | 846 | 28.37 | 678 | 193 | 485 | 386 | 10 |
| Yeast1 | 1484 | 28.91 | 1188 | 344 | 844 | 688 | 9 |
| Glass0 | 214 | 32.71 | 172 | 56 | 116 | 112 | 7 |
| Iris0 | 150 | 33.33 | 121 | 40 | 81 | 80 | 7 |
| Pima | 768 | 34.84 | 616 | 215 | 401 | 430 | 6 |
| Ecoli0vs1 | 220 | 35 | 177 | 62 | 115 | 124 | 6 |
| Wisconsin | 683 | 35 | 548 | 192 | 356 | 384 | 6 |
| Glass1 | 214 | 35.51 | 172 | 61 | 111 | 122 | 6 |
| Mean | 919.94 | 17.61 | 737.09 | 96.61 | 640.48 | 193.21 | 56 |
| Median | 482 | 15.48 | 387 | 42 | 356 | 84 | 23 |
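The counts in Table 2 are consistent with the following relation, given here as an assumption rather than as the authors' stated procedure. Each subsample is class-balanced, so its size is twice the minority class size of the training sample, and it draws that many majority class examples at random. The number of subsamples is then the smallest n for which the expected share of majority examples drawn at least once, 1 - (1 - min/maj)^n, reaches the coverage value. The function name below is hypothetical.

```python
import math


def subsamples_for_coverage(min_size: int, maj_size: int, coverage: float = 0.99) -> int:
    """Smallest number of balanced subsamples reaching the requested coverage.

    Assumption: each subsample draws `min_size` of the `maj_size` majority
    examples uniformly at random, so a given majority example is missed by
    one subsample with probability 1 - min_size / maj_size.
    """
    if min_size >= maj_size:
        # A single subsample can already hold every majority example.
        return 1
    miss = 1.0 - min_size / maj_size
    # Solve 1 - miss**n >= coverage for the smallest integer n.
    return math.ceil(math.log(1.0 - coverage) / math.log(miss))


# Training-sample class sizes taken from Table 2.
for name, min_size, maj_size in [
    ("Abalone19", 26, 3314),
    ("Vowel0", 72, 720),
    ("Glass1", 61, 111),
]:
    subsample_size = 2 * min_size  # balanced: as many majority as minority examples
    number = subsamples_for_coverage(min_size, maj_size)
    print(name, subsample_size, number)
# Abalone19 52 585
# Vowel0 144 44
# Glass1 122 6
```

Under this reading, the number of subsamples depends only on the ratio between the two class sizes, which is why the most imbalanced data sets need hundreds of subsamples while the nearly balanced ones need fewer than ten.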
This section includes the complete figures of the average values and the full tables of the results related to the algorithms compared in the study (C4.5, CTC, Bagging and Driven PCTBagging for 6 different criteria and with 11 consolidation percentages) for the discriminating capacity, structural complexity, and computational cost measures.
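For reference, the three discriminating capacity measures reported below can be sketched with their standard definitions. The page does not state how they were computed, so the function names, the toy counts and the toy scores are illustrative assumptions, with the minority class taken as the positive class.

```python
def true_positive_rate(tp: int, fn: int) -> float:
    """Share of positive (minority) examples that are classified as positive."""
    return tp / (tp + fn)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of the true positive rate and the true negative rate."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as the probability that a positive example outscores a negative one.

    Ties count as half a win (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy confusion matrix and scores, not taken from the study.
print(true_positive_rate(tp=3, fn=1))             # 0.75
print(balanced_accuracy(tp=3, fn=1, tn=9, fp=3))  # 0.75
print(auc([0.9, 0.8], [0.1, 0.85]))               # 0.75
```

Unlike plain accuracy, all three measures give the minority class the same weight as the majority class, which matters for data sets as imbalanced as those in Table 1.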
Table 3: AUC values for all algorithms over 33 datasets
This table can be downloaded as an OpenDocument Spreadsheet (ODS) file.
Figure 2: Average balanced accuracy values for the 33 datasets
Table 4: Balanced Accuracy values for all algorithms over 33 datasets
This table can be downloaded as an OpenDocument Spreadsheet (ODS) file.
Figure 3: Average True Positive Rate values for the 33 datasets
Table 5: True Positive Rate values for all algorithms over 33 datasets
This table can be downloaded as an OpenDocument Spreadsheet (ODS) file.
Figure 4: Average values of the number of internal nodes for the 33 datasets
Table 6: Internal Nodes values for the algorithms with explaining capacity over 33 datasets
This table can be downloaded as an OpenDocument Spreadsheet (ODS) file.
Table 7: Average values of the number of internal nodes of all the trees of the ensembles over 33 datasets
This table can be downloaded as an OpenDocument Spreadsheet (ODS) file.
Figure 5: Average construction time values for the 33 datasets
Table 8: Elapsed Time Training Rate values for all algorithms over 33 datasets
This table can be downloaded as an OpenDocument Spreadsheet (ODS) file.