This page contains additional material for the work presented in the article:

Jesús M. Pérez, Olatz Arbelaitz and Javier Muguerza. "Driven PCTBagging: Seeking greater discriminating capacity for the same level of interpretability". XX Conference of the Spanish Association for Artificial Intelligence (CAEPIA'24).

All the tables of results can be downloaded as an OpenDocument Spreadsheet (ODS) file.

Table of Contents

1. Datasets characteristics

Table 1: Description of imbalanced datasets

2. Subsample numbers by data set to achieve the selected coverage value

Table 2: Subsample amounts for imbalanced data sets

3. Results

3.1. Discriminating capacity

Figure 1: Average AUC values for the 33 datasets

Table 3: AUC values for all algorithms over 33 datasets

Figure 2: Average balanced accuracy values for the 33 datasets

Table 4: Balanced Accuracy values for all algorithms over 33 datasets

Figure 3: Average True Positive Rate values for the 33 datasets

Table 5: True Positive Rate values for all algorithms over 33 datasets

3.2. Structural complexity

Figure 4: Average values of the number of internal nodes for the 33 datasets

Table 6: Internal Nodes values for the algorithms with explaining capacity over 33 datasets

Table 7: Average values of the number of internal nodes of all the trees of the ensembles over 33 datasets

3.3. Computational cost

Figure 5: Average construction time values for the 33 datasets

Table 8: Elapsed Time Training Rate values for all algorithms over 33 datasets

1. Datasets characteristics

This section contains the table with the characteristics of the 33 datasets from the KEEL repository used in this study. The datasets presented are those from the second (Imbalanced) context.

2. Subsample numbers by data set to achieve the selected coverage value

The table in this section shows the number of subsamples computed for each data set to reach a coverage value of 99%.
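As a rough illustration of how such subsample counts can arise, the sketch below makes two assumptions that this page does not state explicitly: each subsample is class-balanced, so it draws as many majority-class examples as the training sample has minority-class examples, and coverage is the expected fraction of majority-class examples that appear in at least one subsample. Under these assumptions the count is ceil(ln(1 - coverage) / ln(1 - min/maj)). The function name `subsamples_for_coverage` is hypothetical; the input sizes are taken from the training-sample columns of Table 2.

```python
import math


def subsamples_for_coverage(min_size: int, maj_size: int, coverage: float = 0.99) -> int:
    """Number of balanced subsamples needed to reach the requested coverage.

    Assumption (not stated on this page): each subsample draws `min_size`
    majority-class examples without replacement, so one majority example is
    left out of a single subsample with probability 1 - min_size / maj_size.
    """
    if not 0 < coverage < 1:
        raise ValueError("coverage must be strictly between 0 and 1")
    if not 0 < min_size < maj_size:
        raise ValueError("expected 0 < min_size < maj_size")
    p_drawn = min_size / maj_size
    return math.ceil(math.log(1 - coverage) / math.log(1 - p_drawn))


# Training-sample class sizes (minority, majority) copied from Table 2.
rows = {"Abalone19": (26, 3314), "Vowel0": (72, 720), "Glass1": (61, 111)}

for name, (n_min, n_maj) in rows.items():
    # A balanced subsample holds n_min examples of each class.
    print(name, 2 * n_min, subsamples_for_coverage(n_min, n_maj))
```

For these three rows the sketch gives subsample sizes of 52, 144 and 122 and counts of 585, 44 and 6, which agree with the corresponding entries in Table 2. This agreement supports the assumed formula but does not confirm that it is the procedure the authors used.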

3. Results

This section includes the complete figures of the average values and the full tables of results for the algorithms compared in the study (C4.5, CTC, Bagging, and Driven PCTBagging with 6 different criteria and 11 consolidation percentages), covering the discriminating capacity, structural complexity, and computational cost measures.
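For readers less familiar with two of the discriminating capacity measures reported below, the following sketch computes the True Positive Rate and the balanced accuracy from confusion-matrix counts using their standard definitions. It assumes the minority class is treated as the positive class, which this page does not state; the function names and the counts in the example are illustrative and are not taken from the results.

```python
def true_positive_rate(tp: int, fn: int) -> float:
    """Fraction of positive examples that are classified as positive."""
    return tp / (tp + fn)


def true_negative_rate(tn: int, fp: int) -> float:
    """Fraction of negative examples that are classified as negative."""
    return tn / (tn + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of the per-class recalls, so both classes weigh the same."""
    return (true_positive_rate(tp, fn) + true_negative_rate(tn, fp)) / 2


# Illustrative counts only: 40 positive and 200 negative test examples.
tp, fn, tn, fp = 30, 10, 180, 20

print(true_positive_rate(tp, fn))
print(balanced_accuracy(tp, fn, tn, fp))
```

Because balanced accuracy averages the recall of each class, a classifier that always predicts the majority class scores 0.5 regardless of how imbalanced the data set is, which is why it is often preferred over plain accuracy on imbalanced data.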

3.1. Discriminating capacity

This table can be downloaded as an OpenDocument Spreadsheet (ODS) file by clicking on the following link

3.2. Structural complexity

This table can be downloaded as an OpenDocument Spreadsheet (ODS) file by clicking on the following link

3.3. Computational cost

This table can be downloaded as an OpenDocument Spreadsheet (ODS) file by clicking on the following link

Table 1: Description of imbalanced datasets

Data set	#Atts.	#Examples	Imbalance	Size of Min. Class	Size of Maj. Class
Abalone19	8	4174	0.77%	32	4142
Yeast6	8	1484	2.49%	37	1447
Yeast5	8	1484	2.96%	44	1440
Yeast4	8	1484	3.43%	51	1433
Yeast2vs8	8	482	4.15%	20	462
Glass5	9	214	4.2%	9	205
Abalone9vs18	8	731	5.65%	41	690
Glass4	9	214	6.07%	13	201
Ecoli4	7	336	6.74%	23	313
Glass2	9	214	8.78%	19	195
Vowel0	13	988	9.01%	89	899
Page-blocks0	10	5472	10.23%	560	4912
Ecoli3	7	336	10.88%	37	299
Yeast3	8	1484	10.98%	163	1321
Glass6	9	214	13.55%	29	185
Segment0	19	2308	14.26%	329	1979
Ecoli2	7	336	15.48%	52	284
New-thyroid1	5	215	16.28%	35	180
New-thyroid2	5	215	16.89%	36	179
Ecoli1	7	336	22.92%	77	259
Vehicle0	18	846	23.64%	200	646
Glass0123vs456	9	214	23.83%	51	163
Haberman	3	306	27.42%	84	222
Vehicle1	18	846	28.37%	240	606
Vehicle2	18	846	28.37%	240	606
Vehicle3	18	846	28.37%	240	606
Yeast1	8	1484	28.91%	429	1055
Glass0	9	214	32.71%	70	144
Iris0	4	150	33.33%	50	100
Pima	8	768	34.84%	268	500
Ecoli0vs1	7	220	35%	77	143
Wisconsin	9	683	35%	239	444
Glass1	9	214	35.51%	76	138
Mean	9.39	919.94	17.61%	120	799.94
Median	8	482	15.48%	52	444

Table 2: Subsample amounts for imbalanced data sets

	Original		Training sample			Subsample set
Data set	Size	%Min	Size	Min. Class Size	Maj. Class Size	Size	Number
Abalone19	4174	0.77	3340	26	3314	52	585
Yeast6	1484	2.49	1188	30	1158	60	176
Yeast5	1484	2.96	1189	36	1153	72	146
Yeast4	1484	3.43	1188	41	1147	82	127
Yeast2vs8	482	4.15	387	17	370	34	98
Glass5	214	4.2	173	8	165	16	93
Abalone9vs18	731	5.65	586	34	552	68	73
Glass4	214	6.07	172	11	161	22	66
Ecoli4	336	6.74	270	19	251	38	59
Glass2	214	8.78	173	16	157	32	43
Vowel0	988	9.01	792	72	720	144	44
Page-blocks0	5472	10.23	4378	448	3930	896	39
Ecoli3	336	10.88	270	30	240	60	35
Yeast3	1484	10.98	1188	131	1057	262	35
Glass6	214	13.55	173	24	149	48	27
Segment0	2308	14.26	1848	264	1584	528	26
Ecoli2	336	15.48	270	42	228	84	23
New-thyroid1	215	16.28	173	29	144	58	21
New-thyroid2	215	16.89	173	30	143	60	20
Ecoli1	336	22.92	270	62	208	124	14
Vehicle0	846	23.64	677	160	517	320	13
Glass0123vs456	214	23.83	172	41	131	82	13
Haberman	306	27.42	246	68	178	136	10
Vehicle1	846	28.37	678	193	485	386	10
Vehicle2	846	28.37	678	193	485	386	10
Vehicle3	846	28.37	678	193	485	386	10
Yeast1	1484	28.91	1188	344	844	688	9
Glass0	214	32.71	172	56	116	112	7
Iris0	150	33.33	121	40	81	80	7
Pima	768	34.84	616	215	401	430	6
Ecoli0vs1	220	35	177	62	115	124	6
Wisconsin	683	35	548	192	356	384	6
Glass1	214	35.51	172	61	111	122	6
Mean	919.94	17.61	737.09	96.61	640.48	193.21	56
Median	482	15.48	387	42	356	84	23

Complete results and additional material for the article “Driven PCTBagging: Seeking greater discriminating capacity for the same level of interpretability”
