Watson Studio Local benchmarks

The following benchmarks compare the training performance (CPU, memory, total time) of different machine learning models on various Spark and scikit-learn configurations.
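Each run below reports a tuple of the form (model name, accuracy, train time in seconds, test time in seconds). The benchmark script itself is not shown, so the following is only a minimal sketch of how such a tuple might be produced. The `benchmark` function, the `MajorityClassifier` stand-in model, and the tiny dataset are all hypothetical and exist purely for illustration.

```python
import time


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def benchmark(model, X_train, y_train, X_test, y_test):
    """Return (name, accuracy, train_seconds, test_seconds) for one model."""
    start = time.time()
    model.fit(X_train, y_train)
    train_time = time.time() - start

    start = time.time()
    predictions = model.predict(X_test)
    test_time = time.time() - start

    correct = sum(1 for p, t in zip(predictions, y_test) if p == t)
    accuracy = correct / len(y_test)
    return (type(model).__name__, accuracy, train_time, test_time)


# Tiny made-up dataset, for illustration only.
X_train, y_train = [[0.1], [0.2], [0.9]], [0.0, 0.0, 1.0]
X_test, y_test = [[0.15], [0.85]], [0.0, 1.0]

result = benchmark(MajorityClassifier(), X_train, y_train, X_test, y_test)
print(result)
```

Any object exposing `fit` and `predict` could be passed to `benchmark` in the same way.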

Spark 1G

Spark 1G performance

Cluster: 
Number of nodes: 6
Number of computing nodes: 3
Total Memory: 93G
Total CPU:    24

Spark config:
- "spark.cores.max", "12"
- "spark.dynamicAllocation.initialExecutors", "3"
- "spark.executor.cores", "3"
- "spark.executor.memory", "10g"
- "spark.driver.memory","20g"
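As a rough sanity check on these settings, the sketch below works out how many executors the core cap allows and how much heap they would claim in total. It assumes that `spark.cores.max` is the binding limit on executor count (as it is in Spark standalone mode) and it ignores per-executor memory overhead, so the figures are approximate. The helper `parse_gb` is a hypothetical name introduced here.

```python
# Settings copied from the Spark config listed above.
spark_conf = {
    "spark.cores.max": "12",
    "spark.dynamicAllocation.initialExecutors": "3",
    "spark.executor.cores": "3",
    "spark.executor.memory": "10g",
    "spark.driver.memory": "20g",
}


def parse_gb(value):
    """Convert a size such as '10g' to an integer number of gigabytes."""
    if not value.lower().endswith("g"):
        raise ValueError("only 'g' suffixes are handled in this sketch")
    return int(value[:-1])


cores_max = int(spark_conf["spark.cores.max"])
cores_per_executor = int(spark_conf["spark.executor.cores"])

# Upper bound on concurrently running executors imposed by the core cap.
max_executors = cores_max // cores_per_executor

executor_heap_gb = max_executors * parse_gb(spark_conf["spark.executor.memory"])
total_heap_gb = executor_heap_gb + parse_gb(spark_conf["spark.driver.memory"])

print("max executors:", max_executors)
print("executor heap (GB):", executor_heap_gb)
print("executor + driver heap (GB):", total_heap_gb)
```

Under these assumptions the job can hold at most 4 executors and about 60G of heap, which fits within the 93G reported for the cluster.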

+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+-----+
|         feature-0|         feature-1|         feature-2|         feature-3|         feature-4|         feature-5|         feature-6|         feature-7|         feature-8|         feature-9|label|
+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+-----+
| 0.989004252659766|0.9953677775517561|0.8989891554766021|0.9570283155342225|0.9279935974147131| 0.963743714080638|0.9551375087525289|0.9368142598679606|0.9553326111983994|0.9144579195459672|  8.0|
|0.7128287098648072|0.6911934939230833|0.7730268366147294|0.7223557300436229|0.7509027924929264|0.6771896127719912|0.6846048247500502|0.6955520428897617|0.6896335898440052| 0.716709909773148|  5.0|
|0.7605480360198263|0.7106410524354252|0.6961861632824142|0.7000117139651633|0.7375596451709303|0.7483861302181797|0.6718046849169641|0.7490144988489712|0.7632004596275657| 0.677716089015945|  5.0|
|0.6991799573764926|0.7081818019195293|0.6830983368462993|0.7059591466713946|0.6890492579596641|0.7279171488316457|0.7029215581616208|0.7457635660790121|0.7185676822069076|  0.76156844924655|  5.0|
|0.8948304862366993| 0.968712492610978|  0.93639433689214|  0.97452222515561|0.9445643975774145|0.8980241068202129| 0.977835313593961| 0.985066408942899|0.8930083817881528|0.9677882775453642|  7.0|
+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+-----+
only showing top 5 rows

data load time:  18.880s
================================================================================
DecisionTree Classifier
________________________________________________________________________________
Training: 
DecisionTreeClassifier_45669c7701d6533eb928
train time: 61.600s
test time:  0.184s
Accuracy = 0.624136 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       0.0|  0.0|[0.11111395446960...|
|       1.0|  0.0|[0.11112254492407...|
|       1.0|  1.0|[0.11112968588384...|
|       0.0|  0.0|[0.11113247726233...|
|       0.0|  0.0|[0.11113588069976...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111395446960547,"feature-1":0.13357554988540363,"feature-2":0.13264447250464487,"feature-3":0.18485241273761344,"feature-4":0.17151580359363663,"feature-5":0.15307238937394696,"feature-6":0.1929945128865126,"feature-7":0.1308362556010035,"feature-8":0.16030620968839052,"feature-9":0.2018223653530566,"label":0.0,"features":{"type":1,"values":[0.11111395446960547,0.13357554988540363,0.13264447250464487,0.18485241273761344,0.17151580359363663,0.15307238937394696,0.1929945128865126,0.1308362556010035,0.16030620968839052,0.2018223653530566]}}
[('DecisionTreeClassifier', 0.6241357212266664, 61.59977197647095, 0.18398308753967285)]
================================================================================
RandomForest Classifier
________________________________________________________________________________
Training: 
RandomForestClassifier_42849a1f2593b2ae943e
train time: 83.970s
test time:  0.170s
Accuracy = 0.739294 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       0.0|  0.0|[0.11111395446960...|
|       1.0|  0.0|[0.11112254492407...|
|       1.0|  1.0|[0.11112968588384...|
|       0.0|  0.0|[0.11113247726233...|
|       0.0|  0.0|[0.11113588069976...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111395446960547,"feature-1":0.13357554988540363,"feature-2":0.13264447250464487,"feature-3":0.18485241273761344,"feature-4":0.17151580359363663,"feature-5":0.15307238937394696,"feature-6":0.1929945128865126,"feature-7":0.1308362556010035,"feature-8":0.16030620968839052,"feature-9":0.2018223653530566,"label":0.0,"features":{"type":1,"values":[0.11111395446960547,0.13357554988540363,0.13264447250464487,0.18485241273761344,0.17151580359363663,0.15307238937394696,0.1929945128865126,0.1308362556010035,0.16030620968839052,0.2018223653530566]}}
[('DecisionTreeClassifier', 0.6241357212266664, 61.59977197647095, 0.18398308753967285), ('RandomForestClassifier', 0.7392938828072085, 83.96990203857422, 0.17013311386108398)]
================================================================================
Multilayer perceptron classifier
________________________________________________________________________________
Training: 
MultilayerPerceptronClassifier_4c4284c90112a2ddf273
train time: 230.339s
test time:  0.117s
Accuracy = 0.734227 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       1.0|  0.0|[0.11111395446960...|
|       1.0|  0.0|[0.11112254492407...|
|       1.0|  1.0|[0.11112968588384...|
|       1.0|  0.0|[0.11113247726233...|
|       1.0|  0.0|[0.11113588069976...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111395446960547,"feature-1":0.13357554988540363,"feature-2":0.13264447250464487,"feature-3":0.18485241273761344,"feature-4":0.17151580359363663,"feature-5":0.15307238937394696,"feature-6":0.1929945128865126,"feature-7":0.1308362556010035,"feature-8":0.16030620968839052,"feature-9":0.2018223653530566,"label":0.0,"features":{"type":1,"values":[0.11111395446960547,0.13357554988540363,0.13264447250464487,0.18485241273761344,0.17151580359363663,0.15307238937394696,0.1929945128865126,0.1308362556010035,0.16030620968839052,0.2018223653530566]}}
[('DecisionTreeClassifier', 0.6241357212266664, 61.59977197647095, 0.18398308753967285), ('RandomForestClassifier', 0.7392938828072085, 83.96990203857422, 0.17013311386108398), ('MultilayerPerceptronClassifier', 0.7342265536421766, 230.33888816833496, 0.11662507057189941)]
================================================================================
Naive Bayes classifier
________________________________________________________________________________
Training: 
NaiveBayes_46c08dda163ece366f57
train time: 23.237s
test time:  0.125s
Accuracy = 0.111285 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       3.0|  0.0|[0.11111395446960...|
|       3.0|  0.0|[0.11112254492407...|
|       3.0|  1.0|[0.11112968588384...|
|       3.0|  0.0|[0.11113247726233...|
|       3.0|  0.0|[0.11113588069976...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111395446960547,"feature-1":0.13357554988540363,"feature-2":0.13264447250464487,"feature-3":0.18485241273761344,"feature-4":0.17151580359363663,"feature-5":0.15307238937394696,"feature-6":0.1929945128865126,"feature-7":0.1308362556010035,"feature-8":0.16030620968839052,"feature-9":0.2018223653530566,"label":0.0,"features":{"type":1,"values":[0.11111395446960547,0.13357554988540363,0.13264447250464487,0.18485241273761344,0.17151580359363663,0.15307238937394696,0.1929945128865126,0.1308362556010035,0.16030620968839052,0.2018223653530566]}}
[('DecisionTreeClassifier', 0.6241357212266664, 61.59977197647095, 0.18398308753967285), ('RandomForestClassifier', 0.7392938828072085, 83.96990203857422, 0.17013311386108398), ('MultilayerPerceptronClassifier', 0.7342265536421766, 230.33888816833496, 0.11662507057189941), ('NaiveBayes', 0.11128540934687232, 23.237314224243164, 0.1246500015258789)]
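
The results line above is a Python list of (model, accuracy, train seconds, test seconds) tuples, which makes it easy to turn into a readable summary. The following sketch parses the final 1G line with `ast.literal_eval` and prints it sorted by accuracy; the column layout is an arbitrary choice made here.

```python
import ast

# Final results line from the Spark 1G run above, copied verbatim.
raw = (
    "[('DecisionTreeClassifier', 0.6241357212266664, 61.59977197647095, 0.18398308753967285), "
    "('RandomForestClassifier', 0.7392938828072085, 83.96990203857422, 0.17013311386108398), "
    "('MultilayerPerceptronClassifier', 0.7342265536421766, 230.33888816833496, 0.11662507057189941), "
    "('NaiveBayes', 0.11128540934687232, 23.237314224243164, 0.1246500015258789)]"
)

# literal_eval only evaluates Python literals, so it is safe for pasted log text.
results = ast.literal_eval(raw)

print("{:<32}{:>10}{:>12}{:>11}".format("model", "accuracy", "train (s)", "test (s)"))
for name, accuracy, train_s, test_s in sorted(results, key=lambda r: r[1], reverse=True):
    print("{:<32}{:>10.3f}{:>12.1f}{:>11.3f}".format(name, accuracy, train_s, test_s))

best = max(results, key=lambda r: r[1])
fastest = min(results, key=lambda r: r[2])
```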

CPU/memory usage for computing node 1:

Spark 1G computing node 1 performance

CPU/memory usage for computing node 2:

Spark 1G computing node 2 performance

CPU/memory usage for computing node 3:

Spark 1G computing node 3 performance

Spark 10G

Spark 10G performance

Cluster: 
Number of nodes: 6
Number of computing nodes: 3
Total Memory: 93G
Total CPU:    24

Spark config:
- "spark.cores.max", "12"
- "spark.dynamicAllocation.initialExecutors", "3"
- "spark.executor.cores", "3"
- "spark.executor.memory", "10g"
- "spark.driver.memory","20g"

+-------------------+-------------------+-------------------+------------------+-------------------+-------------------+------------------+-------------------+-------------------+-------------------+-----+
|          feature-0|          feature-1|          feature-2|         feature-3|          feature-4|          feature-5|         feature-6|          feature-7|          feature-8|          feature-9|label|
+-------------------+-------------------+-------------------+------------------+-------------------+-------------------+------------------+-------------------+-------------------+-------------------+-----+
| 0.3354375845249969|0.37020937155609673| 0.4104664233640112|0.3544423373026981| 0.4241302594908546|0.42830316945636926|0.3766371757230972| 0.3596804265605827|0.43385088256332144|  0.426189278744473|  3.0|
| 0.5798682488671502|  0.600326999736997| 0.5861009576728922|0.5720663795397962| 0.6394915076901379| 0.5997483169163339|0.5862630800074884| 0.6305120087881222| 0.6598122571392385| 0.6051241305933726|  4.0|
|0.48052402991857407| 0.5063386088894373|  0.504242632037987|0.5124342176415759| 0.5319882583215219| 0.5244008820561251|0.4795040335050947| 0.5543544358403774|  0.554115431672455|0.48962021738379036|  4.0|
|0.22597517392973515|0.33062822724082547|0.28965651147336713|0.3129544237685775|0.24754955239467163|0.27307634321726765|0.2338103191224497| 0.2659056893887223| 0.2908938218025195|0.24894491784303469|  1.0|
|0.42506373758975674| 0.3806845466325415|0.39213017857202176|0.3858984999635536|0.43947256624042075| 0.4134888059786186|0.3352964408735879|0.37313866201672596|0.41695511107216227| 0.3764819904693926|  3.0|
+-------------------+-------------------+-------------------+------------------+-------------------+-------------------+------------------+-------------------+-------------------+-------------------+-----+
only showing top 5 rows

data load time:  70.581s
================================================================================
DecisionTree Classifier
________________________________________________________________________________
Training: 
DecisionTreeClassifier_4119a36702635c6a2d92
train time: 296.804s
test time:  0.274s
Accuracy = 0.623728 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       0.0|  0.0|[0.11111422099982...|
|       1.0|  0.0|[0.11113066012990...|
|       1.0|  0.0|[0.11113333773373...|
|       0.0|  0.0|[0.11113429800377...|
|       0.0|  0.0|[0.11113613851969...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111422099982567,"feature-1":0.11923761941247307,"feature-2":0.1167587442497096,"feature-3":0.1463104906924988,"feature-4":0.16240131325221807,"feature-5":0.1421517160152414,"feature-6":0.18504213273005424,"feature-7":0.13740978017875663,"feature-8":0.20897787218269454,"feature-9":0.21872938883973533,"label":0.0,"features":{"type":1,"values":[0.11111422099982567,0.11923761941247307,0.1167587442497096,0.1463104906924988,0.16240131325221807,0.1421517160152414,0.18504213273005424,0.13740978017875663,0.20897787218269454,0.21872938883973533]}}
[('DecisionTreeClassifier', 0.6237284081554948, 296.8044500350952, 0.2738039493560791)]
================================================================================
RandomForest Classifier
________________________________________________________________________________
Training: 
RandomForestClassifier_4cf49d901a844153cca0
train time: 434.964s
test time:  0.190s
Accuracy = 0.72293 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       0.0|  0.0|[0.11111422099982...|
|       1.0|  0.0|[0.11113066012990...|
|       1.0|  0.0|[0.11113333773373...|
|       0.0|  0.0|[0.11113429800377...|
|       0.0|  0.0|[0.11113613851969...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111422099982567,"feature-1":0.11923761941247307,"feature-2":0.1167587442497096,"feature-3":0.1463104906924988,"feature-4":0.16240131325221807,"feature-5":0.1421517160152414,"feature-6":0.18504213273005424,"feature-7":0.13740978017875663,"feature-8":0.20897787218269454,"feature-9":0.21872938883973533,"label":0.0,"features":{"type":1,"values":[0.11111422099982567,0.11923761941247307,0.1167587442497096,0.1463104906924988,0.16240131325221807,0.1421517160152414,0.18504213273005424,0.13740978017875663,0.20897787218269454,0.21872938883973533]}}
[('DecisionTreeClassifier', 0.6237284081554948, 296.8044500350952, 0.2738039493560791), ('RandomForestClassifier', 0.7229298790348314, 434.9638080596924, 0.1899569034576416)]
================================================================================
Multilayer perceptron classifier
________________________________________________________________________________
Training: 
MultilayerPerceptronClassifier_41218e2a84ed0f0fc970
train time: 1845.551s
test time:  0.194s
Accuracy = 0.702271 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       1.0|  0.0|[0.11111422099982...|
|       1.0|  0.0|[0.11113066012990...|
|       1.0|  0.0|[0.11113333773373...|
|       1.0|  0.0|[0.11113429800377...|
|       1.0|  0.0|[0.11113613851969...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111422099982567,"feature-1":0.11923761941247307,"feature-2":0.1167587442497096,"feature-3":0.1463104906924988,"feature-4":0.16240131325221807,"feature-5":0.1421517160152414,"feature-6":0.18504213273005424,"feature-7":0.13740978017875663,"feature-8":0.20897787218269454,"feature-9":0.21872938883973533,"label":0.0,"features":{"type":1,"values":[0.11111422099982567,0.11923761941247307,0.1167587442497096,0.1463104906924988,0.16240131325221807,0.1421517160152414,0.18504213273005424,0.13740978017875663,0.20897787218269454,0.21872938883973533]}}
[('DecisionTreeClassifier', 0.6237284081554948, 296.8044500350952, 0.2738039493560791), ('RandomForestClassifier', 0.7229298790348314, 434.9638080596924, 0.1899569034576416), ('MultilayerPerceptronClassifier', 0.702271125636979, 1845.5510170459747, 0.1942911148071289)]
================================================================================
Naive Bayes classifier
________________________________________________________________________________
Training: 
NaiveBayes_47db8b4d8794e011a5f1
train time: 182.231s
test time:  0.274s
Accuracy = 0.111094 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       8.0|  0.0|[0.11111422099982...|
|       8.0|  0.0|[0.11113066012990...|
|       8.0|  0.0|[0.11113333773373...|
|       8.0|  0.0|[0.11113429800377...|
|       8.0|  0.0|[0.11113613851969...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111422099982567,"feature-1":0.11923761941247307,"feature-2":0.1167587442497096,"feature-3":0.1463104906924988,"feature-4":0.16240131325221807,"feature-5":0.1421517160152414,"feature-6":0.18504213273005424,"feature-7":0.13740978017875663,"feature-8":0.20897787218269454,"feature-9":0.21872938883973533,"label":0.0,"features":{"type":1,"values":[0.11111422099982567,0.11923761941247307,0.1167587442497096,0.1463104906924988,0.16240131325221807,0.1421517160152414,0.18504213273005424,0.13740978017875663,0.20897787218269454,0.21872938883973533]}}
[('DecisionTreeClassifier', 0.6237284081554948, 296.8044500350952, 0.2738039493560791), ('RandomForestClassifier', 0.7229298790348314, 434.9638080596924, 0.1899569034576416), ('MultilayerPerceptronClassifier', 0.702271125636979, 1845.5510170459747, 0.1942911148071289), ('NaiveBayes', 0.11109381912662347, 182.2313630580902, 0.27434706687927246)]
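
The 1G and 10G runs use the same cluster and Spark config, so their train times can be compared directly. The sketch below computes the slowdown factor per model from the rounded train times printed in the two runs; it is simple arithmetic on the reported numbers, not an additional measurement.

```python
# (train seconds on 1G, train seconds on 10G), taken from the two runs above.
train_times = {
    "DecisionTreeClassifier": (61.600, 296.804),
    "RandomForestClassifier": (83.970, 434.964),
    "MultilayerPerceptronClassifier": (230.339, 1845.551),
    "NaiveBayes": (23.237, 182.231),
}

# Ratio of 10G train time to 1G train time for each model.
slowdown = {name: t10 / t1 for name, (t1, t10) in train_times.items()}

for name, factor in slowdown.items():
    print("{:<32}{:>6.1f}x".format(name, factor))
```

For every model the factor stays below 10, that is, train time grew more slowly than the data size between these two runs.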

CPU/memory usage

CPU/memory usage for computing node 1:

Spark 10G computing node 1 performance

CPU/memory usage for computing node 2:

Spark 10G computing node 2 performance

CPU/memory usage for computing node 3:

Spark 10G computing node 3 performance

Spark 100G

Cluster: 
Number of nodes: 6
Number of computing nodes: 6
Total Memory: 156G
Total CPU:    24

Spark config:
- "spark.cores.max", "12"
- "spark.dynamicAllocation.initialExecutors", "6"
- "spark.executor.cores", "4"
- "spark.executor.memory", "20g"
- "spark.driver.memory","20g"

+-------------------+-------------------+-------------------+------------------+-------------------+------------------+------------------+-------------------+-------------------+-------------------+-----+
|          feature-0|          feature-1|          feature-2|         feature-3|          feature-4|         feature-5|         feature-6|          feature-7|          feature-8|          feature-9|label|
+-------------------+-------------------+-------------------+------------------+-------------------+------------------+------------------+-------------------+-------------------+-------------------+-----+
|0.13127255170763547|0.14840479169954554|0.12079972535203576|0.2193642768983671|0.22047786603768219|0.1552880949369035|0.1556900992441466|0.12946275162880627|0.21835473825347357|0.18064131908136474|  1.0|
| 0.7586591771310679| 0.7238592496635637| 0.7519515173495599|0.7590457457758281| 0.7325192974259364|0.6992064240123037|0.6705297329957295|  0.736623660254258| 0.7443096756484171| 0.7485795738337745|  6.0|
| 1.0076245012783538| 1.0681225893380943| 1.0572184602195078|1.0539513032194112| 1.0154517128486822|1.1061944603672609|  1.05310030697752| 1.0249307354865218| 1.0774196947538925| 1.0138093756891224|  8.0|
| 0.8379034147384927| 0.8356543127578707| 0.8057539822777369|0.8766350135270392| 0.8886019423663701|0.7829345672168186|0.8465105830922769| 0.8367542390338352|  0.852374392500937| 0.8860022290031622|  7.0|
| 1.0880168929111156| 1.0977624319399963|  1.052081314664641|1.0039713508532517| 1.0489643018565384|1.0839021696813795|1.0957279111290055| 1.0214993718641856| 1.0212466375773153|   1.03933494340771|  8.0|
+-------------------+-------------------+-------------------+------------------+-------------------+------------------+------------------+-------------------+-------------------+-------------------+-----+
only showing top 5 rows

data load time:  3340.904s
================================================================================
DecisionTree Classifier
________________________________________________________________________________
Training: 
DecisionTreeClassifier_4174b859c0565910af44
train time: 15047.069s
test time:  0.400s
Accuracy = 0.621548 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       1.0|  1.0|[0.11111123608384...|
|       0.0|  0.0|[0.11111256373457...|
|       1.0|  1.0|[0.11111279432466...|
|       1.0|  0.0|[0.11111429440031...|
|       1.0|  0.0|[0.11111861228207...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111123608384078,"feature-1":0.21955310736354489,"feature-2":0.16576053668262797,"feature-3":0.2034375736372112,"feature-4":0.15987334982766666,"feature-5":0.21755236938105182,"feature-6":0.14515929439887915,"feature-7":0.1879961151379847,"feature-8":0.16703210189013706,"feature-9":0.1783399960581859,"label":1.0,"features":{"type":1,"values":[0.11111123608384078,0.21955310736354489,0.16576053668262797,0.2034375736372112,0.15987334982766666,0.21755236938105182,0.14515929439887915,0.1879961151379847,0.16703210189013706,0.1783399960581859]}}
[('DecisionTreeClassifier', 0.6215479812883834, 15047.068594932556, 0.39990687370300293)]
================================================================================
RandomForest Classifier
________________________________________________________________________________
Training: 
RandomForestClassifier_4973a8845c59e4d6dc39
train time: 12702.197s
test time:  0.317s
Accuracy = 0.717335 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       1.0|  1.0|[0.11111123608384...|
|       1.0|  0.0|[0.11111256373457...|
|       1.0|  1.0|[0.11111279432466...|
|       1.0|  0.0|[0.11111429440031...|
|       0.0|  0.0|[0.11111861228207...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111123608384078,"feature-1":0.21955310736354489,"feature-2":0.16576053668262797,"feature-3":0.2034375736372112,"feature-4":0.15987334982766666,"feature-5":0.21755236938105182,"feature-6":0.14515929439887915,"feature-7":0.1879961151379847,"feature-8":0.16703210189013706,"feature-9":0.1783399960581859,"label":1.0,"features":{"type":1,"values":[0.11111123608384078,0.21955310736354489,0.16576053668262797,0.2034375736372112,0.15987334982766666,0.21755236938105182,0.14515929439887915,0.1879961151379847,0.16703210189013706,0.1783399960581859]}}
[('DecisionTreeClassifier', 0.6215479812883834, 15047.068594932556, 0.39990687370300293), ('RandomForestClassifier', 0.7173351515208543, 12702.196943044662, 0.31724119186401367)]
================================================================================
Multilayer perceptron classifier
________________________________________________________________________________
Training: 
MultilayerPerceptronClassifier_4e1bb61fa2e43239635f
train time: 19420.338s
test time:  0.299s
Accuracy = 0.612698 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       0.0|  0.0|[0.11111861228207...|
|       0.0|  1.0|[0.11112612435708...|
|       0.0|  0.0|[0.11112624033425...|
|       0.0|  0.0|[0.11112922579363...|
|       0.0|  0.0|[0.11112935903956...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111861228207576,"feature-1":0.21399466709708517,"feature-2":0.1902775874713078,"feature-3":0.1595339044642154,"feature-4":0.17015861896659923,"feature-5":0.1399296033770442,"feature-6":0.1860862935670538,"feature-7":0.20609405541119763,"feature-8":0.13593457431261974,"feature-9":0.1446929110750308,"label":0.0,"features":{"type":1,"values":[0.11111861228207576,0.21399466709708517,0.1902775874713078,0.1595339044642154,0.17015861896659923,0.1399296033770442,0.1860862935670538,0.20609405541119763,0.13593457431261974,0.1446929110750308]}}
[('MultilayerPerceptronClassifier', 0.6126977834141878, 19420.33825492859, 0.29893994331359863)]

================================================================================
Naive Bayes classifier
________________________________________________________________________________
Training: 
NaiveBayes_4a36ae3e87bed15a3013
train time: 4718.321s
test time:  0.643s
Accuracy = 0.111128 
+----------+-----+--------------------+
|prediction|label|            features|
+----------+-----+--------------------+
|       6.0|  0.0|[0.11111429440031...|
|       6.0|  0.0|[0.11111861228207...|
|       6.0|  0.0|[0.11112724470427...|
|       6.0|  1.0|[0.11113540604010...|
|       6.0|  0.0|[0.11113943660499...|
+----------+-----+--------------------+
only showing top 5 rows

{"feature-0":0.11111429440031335,"feature-1":0.13107952951160362,"feature-2":0.22107280028217008,"feature-3":0.11763368867660354,"feature-4":0.2169663310209506,"feature-5":0.1733765361613333,"feature-6":0.19526618204853924,"feature-7":0.13651544440937152,"feature-8":0.14896657645302724,"feature-9":0.19901396688888556,"label":0.0,"features":{"type":1,"values":[0.11111429440031335,0.13107952951160362,0.22107280028217008,0.11763368867660354,0.2169663310209506,0.1733765361613333,0.19526618204853924,0.13651544440937152,0.14896657645302724,0.19901396688888556]}}
[('NaiveBayes', 0.11112847232115351, 4718.32071185112, 0.6434900760650635)]
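
In the 100G run the accumulated results list restarts at the multilayer perceptron and again at Naive Bayes, so the four models end up spread over three lists. The sketch below merges them and prints train times as hours:minutes:seconds, which is easier to read at this scale. The `hms` helper is a hypothetical name introduced here.

```python
import ast

# The last results line of each of the three lists in the 100G run, copied verbatim.
partial_lines = [
    "[('DecisionTreeClassifier', 0.6215479812883834, 15047.068594932556, 0.39990687370300293), "
    "('RandomForestClassifier', 0.7173351515208543, 12702.196943044662, 0.31724119186401367)]",
    "[('MultilayerPerceptronClassifier', 0.6126977834141878, 19420.33825492859, 0.29893994331359863)]",
    "[('NaiveBayes', 0.11112847232115351, 4718.32071185112, 0.6434900760650635)]",
]


def hms(seconds):
    """Format a duration in seconds as h:mm:ss, dropping fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return "{}:{:02d}:{:02d}".format(hours, minutes, secs)


# Merge all partial lists into one mapping keyed by model name.
merged = {}
for line in partial_lines:
    for name, accuracy, train_s, test_s in ast.literal_eval(line):
        merged[name] = (accuracy, train_s, test_s)

for name, (accuracy, train_s, test_s) in merged.items():
    print("{:<32}{:>8.3f}{:>10}".format(name, accuracy, hms(train_s)))
```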

CPU/memory usage for computing node 1:

Spark 100G computing node 1 performance

CPU/memory usage for computing node 2:

Spark 100G computing node 2 performance

CPU/memory usage for computing node 3:

Spark 100G computing node 3 performance

CPU/memory usage for computing node 4:

Spark 100G computing node 4 performance

CPU/memory usage for computing node 5:

Spark 100G computing node 5 performance

CPU/memory usage for computing node 6:

Spark 100G computing node 6 performance

Scikit-learn 1G

Scikit-learn 1G performance

Cluster:

Number of nodes: 6
Number of computing nodes: 3
Total Memory: 93G
Total CPU: 48

================================================================================
Ridge Classifier
________________________________________________________________________________
Training: 
RidgeClassifier(alpha=1.0, class_weight=None, copy_X=True, fit_intercept=True,
        max_iter=None, normalize=False, random_state=None, solver='lsqr',
        tol=0.01)
train time: 7.447s
test time:  0.418s
accuracy:   0.222
dimensionality: 10
density: 1.000000
classification report:
/opt/conda/lib/python2.7/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
             precision    recall  f1-score   support

        0.0       0.00      0.00      0.00     82156
        1.0       0.23      1.00      0.38    164616
        2.0       0.00      0.00      0.00    164672
        3.0       0.00      0.00      0.00    164524
        4.0       0.00      0.00      0.00    165020
        5.0       0.00      0.00      0.00    164364
        6.0       0.00      0.00      0.00    163659
        7.0       0.00      0.00      0.00    164068
        8.0       0.22      1.00      0.37    164098
        9.0       0.00      0.00      0.00     82289

avg / total       0.05      0.22      0.08   1479466

confusion matrix:
[[     0  82156      0      0      0      0      0      0      0      0]
 [     0 164616      0      0      0      0      0      0      0      0]
 [     0 164672      0      0      0      0      0      0      0      0]
 [     0 164524      0      0      0      0      0      0      0      0]
 [     0 137047  27973      0      0      0      0      0      0      0]
 [     0      0   3411      0      0      0      0      0 160953      0]
 [     0      0      0      0      0      0      0      0 163659      0]
 [     0      0      0      0      0      0      0      0 164068      0]
 [     0      0      0      0      0      0      0      0 164098      0]
 [     0      0      0      0      0      0      0      0  82289      0]]
()
[('RidgeClassifier', 0.22218422052281026, 7.447027206420898, 0.41751885414123535)]
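
The reported accuracy can be recovered from the confusion matrix: it is the sum of the diagonal (correct predictions) divided by the total number of samples. The sketch below redoes that arithmetic for the RidgeClassifier matrix above, assuming the usual scikit-learn layout where rows are true labels and columns are predicted labels.

```python
# Confusion matrix printed for RidgeClassifier above
# (rows: true labels 0..9, columns: predicted labels 0..9).
confusion = [
    [0,  82156,     0, 0, 0, 0, 0, 0,      0, 0],
    [0, 164616,     0, 0, 0, 0, 0, 0,      0, 0],
    [0, 164672,     0, 0, 0, 0, 0, 0,      0, 0],
    [0, 164524,     0, 0, 0, 0, 0, 0,      0, 0],
    [0, 137047, 27973, 0, 0, 0, 0, 0,      0, 0],
    [0,      0,  3411, 0, 0, 0, 0, 0, 160953, 0],
    [0,      0,     0, 0, 0, 0, 0, 0, 163659, 0],
    [0,      0,     0, 0, 0, 0, 0, 0, 164068, 0],
    [0,      0,     0, 0, 0, 0, 0, 0, 164098, 0],
    [0,      0,     0, 0, 0, 0, 0, 0,  82289, 0],
]

# Correct predictions sit on the diagonal.
correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

print("correct:", correct, "of", total)
print("accuracy: {:.3f}".format(accuracy))
```

Only classes 1.0 and 8.0 contribute to the diagonal, which is why the accuracy stays near 0.22 even though both of those classes have a recall of 1.00.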
================================================================================
Perceptron
________________________________________________________________________________
Training: 
Perceptron(alpha=0.0001, class_weight=None, eta0=1.0, fit_intercept=True,
      max_iter=None, n_iter=50, n_jobs=-1, penalty=None, random_state=0,
      shuffle=True, tol=None, verbose=0, warm_start=False)
/opt/conda/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py:117: DeprecationWarning: n_iter parameter is deprecated in 0.19 and will be removed in 0.21. Use max_iter and tol instead.
  DeprecationWarning)
train time: 84.775s
test time:  0.437s
accuracy:   0.314
dimensionality: 10
density: 1.000000
classification report:
             precision    recall  f1-score   support

        0.0       1.00      0.95      0.97     82156
        1.0       0.00      0.00      0.00    164616
        2.0       0.00      0.00      0.00    164672
        3.0       0.00      0.00      0.00    164524
        4.0       0.00      0.00      0.00    165020
        5.0       0.16      1.00      0.28    164364
        6.0       0.00      0.00      0.00    163659
        7.0       0.00      0.00      0.00    164068
        8.0       0.52      1.00      0.69    164098
        9.0       1.00      0.71      0.83     82289

avg / total       0.19      0.31      0.21   1479466

confusion matrix:
[[ 77778      0      0      0      0   4378      0      0      0      0]
 [     0      0      0      0      0 164616      0      0      0      0]
 [     0      0      0      0      0 164672      0      0      0      0]
 [     0      0      0      0      0 164524      0      0      0      0]
 [     0      0      0      0      0 165020      0      0      0      0]
 [     0      0      0      0      0 164364      0      0      0      0]
 [     0      0      0      0      0 159823      0      0   3836      0]
 [     0      0      0      0      0  42288      0      0 121780      0]
 [     0      0      0      0      0      0      0      0 164098      0]
 [     0      0      0      0      0      0      0      0  23572  58717]]
()
[('RidgeClassifier', 0.22218422052281026, 7.447027206420898, 0.41751885414123535), ('Perceptron', 0.31427352842174133, 84.77450895309448, 0.4368560314178467)]
================================================================================
Passive-Aggressive
________________________________________________________________________________
Training: 
PassiveAggressiveClassifier(C=1.0, average=False, class_weight=None,
              fit_intercept=True, loss='hinge', max_iter=None, n_iter=50,
              n_jobs=-1, random_state=None, shuffle=True, tol=None,
              verbose=0, warm_start=False)
/opt/conda/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py:117: DeprecationWarning: n_iter parameter is deprecated in 0.19 and will be removed in 0.21. Use max_iter and tol instead.
  DeprecationWarning)
train time: 89.539s
test time:  0.912s
accuracy:   0.283
dimensionality: 10
density: 1.000000
classification report:
             precision    recall  f1-score   support

        0.0       0.99      1.00      1.00     82156
        1.0       1.00      0.37      0.54    164616
        2.0       0.43      0.47      0.45    164672
        3.0       0.00      0.00      0.00    164524
        4.0       0.00      0.00      0.00    165020
        5.0       0.15      1.00      0.26    164364
        6.0       0.00      0.00      0.00    163659
        7.0       0.00      0.00      0.00    164068
        8.0       0.00      0.00      0.00    164098
        9.0       1.00      0.42      0.59     82289

avg / total       0.29      0.28      0.23   1479466

confusion matrix:
[[ 82129     27      0      0      0      0      0      0      0      0]
 [   465  60598 103517      0      0     36      0      0      0      0]
 [     0      0  76934      0      0  87738      0      0      0      0]
 [     0      0      0      0      0 164524      0      0      0      0]
 [     0      0      0      0      0 165020      0      0      0      0]
 [     0      0      0      0      0 164364      0      0      0      0]
 [     0      0      0      0      0 163659      0      0      0      0]
 [     0      0      0      0      0 164068      0      0      0      0]
 [     0      0      0      0      0 164098      0      0      0      0]
 [     0      0      0      0      0  47680      0      0      0  34609]]
()
[('RidgeClassifier', 0.22218422052281026, 7.447027206420898, 0.41751885414123535), ('Perceptron', 0.31427352842174133, 84.77450895309448, 0.4368560314178467), ('PassiveAggressiveClassifier', 0.28296290688667397, 89.53925013542175, 0.9124600887298584)]
================================================================================
kNN
________________________________________________________________________________
Training: 
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=-1, n_neighbors=10, p=2,
           weights='uniform')
train time: 36.609s
test time:  944.290s
accuracy:   0.947
classification report:
             precision    recall  f1-score   support

        0.0       0.93      0.97      0.95     82156
        1.0       0.95      0.95      0.95    164616
        2.0       0.95      0.95      0.95    164672
        3.0       0.95      0.95      0.95    164524
        4.0       0.95      0.95      0.95    165020
        5.0       0.95      0.95      0.95    164364
        6.0       0.95      0.95      0.95    163659
        7.0       0.95      0.95      0.95    164068
        8.0       0.95      0.95      0.95    164098
        9.0       0.96      0.93      0.95     82289

avg / total       0.95      0.95      0.95   1479466

confusion matrix:
[[ 79390   2766      0      0      0      0      0      0      0      0]
 [  5910 155965   2741      0      0      0      0      0      0      0]
 [     0   5889 155998   2785      0      0      0      0      0      0]
 [     0      0   5827 155849   2848      0      0      0      0      0]
 [     0      0      0   5837 156321   2862      0      0      0      0]
 [     0      0      0      0   5910 155659   2795      0      0      0]
 [     0      0      0      0      0   5820 155097   2742      0      0]
 [     0      0      0      0      0      0   5814 155433   2821      0]
 [     0      0      0      0      0      0      0   5896 155354   2848]
 [     0      0      0      0      0      0      0      0   5715  76574]]
()
[('RidgeClassifier', 0.22218422052281026, 7.447027206420898, 0.41751885414123535), ('Perceptron', 0.31427352842174133, 84.77450895309448, 0.4368560314178467), ('PassiveAggressiveClassifier', 0.28296290688667397, 89.53925013542175, 0.9124600887298584), ('KNeighborsClassifier', 0.9473958847313828, 36.609216928482056, 944.2896919250488)]
================================================================================
Random forest
________________________________________________________________________________
Training: 
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
train time: 423.830s
test time:  17.271s
accuracy:   0.942
classification report:
             precision    recall  f1-score   support

        0.0       0.94      0.95      0.94     82156
        1.0       0.94      0.94      0.94    164616
        2.0       0.94      0.94      0.94    164672
        3.0       0.94      0.94      0.94    164524
        4.0       0.94      0.94      0.94    165020
        5.0       0.94      0.94      0.94    164364
        6.0       0.94      0.94      0.94    163659
        7.0       0.94      0.94      0.94    164068
        8.0       0.94      0.94      0.94    164098
        9.0       0.94      0.94      0.94     82289

avg / total       0.94      0.94      0.94   1479466

confusion matrix:
[[ 77639   4517      0      0      0      0      0      0      0      0]
 [  5013 154987   4616      0      0      0      0      0      0      0]
 [     0   5027 155269   4376      0      0      0      0      0      0]
 [     0      0   5028 154949   4547      0      0      0      0      0]
 [     0      0      0   4977 155480   4563      0      0      0      0]
 [     0      0      0      0   5023 154747   4594      0      0      0]
 [     0      0      0      0      0   5052 154209   4398      0      0]
 [     0      0      0      0      0      0   5015 154651   4402      0]
 [     0      0      0      0      0      0      0   5129 154287   4682]
 [     0      0      0      0      0      0      0      0   4925  77364]]
()
[('RidgeClassifier', 0.22218422052281026, 7.447027206420898, 0.41751885414123535), ('Perceptron', 0.31427352842174133, 84.77450895309448, 0.4368560314178467), ('PassiveAggressiveClassifier', 0.28296290688667397, 89.53925013542175, 0.9124600887298584), ('KNeighborsClassifier', 0.9473958847313828, 36.609216928482056, 944.2896919250488), ('RandomForestClassifier', 0.9419493249591406, 423.82988715171814, 17.270785808563232)]
================================================================================
LinearSVC
L2 penalty
________________________________________________________________________________
Training: 
LinearSVC(C=1.0, class_weight=None, dual=False, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.001,
     verbose=0)
train time: 144.780s
test time:  0.441s
accuracy:   0.437
dimensionality: 10
density: 1.000000
classification report:
             precision    recall  f1-score   support

        0.0       1.00      0.94      0.97     82156
        1.0       0.65      1.00      0.79    164616
        2.0       0.28      0.50      0.36    164672
        3.0       0.00      0.00      0.00    164524
        4.0       0.00      0.00      0.00    165020
        5.0       0.00      0.00      0.00    164364
        6.0       0.00      0.00      0.00    163659
        7.0       0.29      0.50      0.37    164068
        8.0       0.65      1.00      0.79    164098
        9.0       1.00      0.93      0.96     82289

avg / total       0.32      0.44      0.36   1479466

confusion matrix:
[[ 76946   5210      0      0      0      0      0      0      0      0]
 [     0 164616      0      0      0      0      0      0      0      0]
 [     0  82508  82164      0      0      0      0      0      0      0]
 [     0      0 164524      0      0      0      0      0      0      0]
 [     0      0  45731 119289      0      0      0      0      0      0]
 [     0      0      0   2538      0      0 122089  39737      0      0]
 [     0      0      0      0      0      0      0 163659      0      0]
 [     0      0      0      0      0      0      0  82255  81813      0]
 [     0      0      0      0      0      0      0      0 164098      0]
 [     0      0      0      0      0      0      0      0   6003  76286]]
()
[('RidgeClassifier', 0.22218422052281026, 7.447027206420898, 0.41751885414123535), ('Perceptron', 0.31427352842174133, 84.77450895309448, 0.4368560314178467), ('PassiveAggressiveClassifier', 0.28296290688667397, 89.53925013542175, 0.9124600887298584), ('KNeighborsClassifier', 0.9473958847313828, 36.609216928482056, 944.2896919250488), ('RandomForestClassifier', 0.9419493249591406, 423.82988715171814, 17.270785808563232), ('LinearSVC', 0.43689074301132974, 144.78039693832397, 0.44084811210632324)]
________________________________________________________________________________
Training: 
SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1,
       eta0=0.0, fit_intercept=True, l1_ratio=0.15,
       learning_rate='optimal', loss='hinge', max_iter=None, n_iter=50,
       n_jobs=-1, penalty='l2', power_t=0.5, random_state=None,
       shuffle=True, tol=None, verbose=0, warm_start=False)
/opt/conda/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py:117: DeprecationWarning: n_iter parameter is deprecated in 0.19 and will be removed in 0.21. Use max_iter and tol instead.
  DeprecationWarning)
train time: 88.187s
test time:  0.411s
accuracy:   0.388
dimensionality: 10
density: 1.000000
classification report:
             precision    recall  f1-score   support

        0.0       0.56      1.00      0.72     82156
        1.0       0.55      0.60      0.57    164616
        2.0       0.00      0.00      0.00    164672
        3.0       0.00      0.00      0.00    164524
        4.0       0.20      1.00      0.33    165020
        5.0       0.00      0.00      0.00    164364
        6.0       0.00      0.00      0.00    163659
        7.0       0.00      0.00      0.00    164068
        8.0       0.62      1.00      0.76    164098
        9.0       1.00      0.77      0.87     82289

avg / total       0.24      0.39      0.27   1479466

confusion matrix:
[[ 82156      0      0      0      0      0      0      0      0      0]
 [ 65455  99161      0      0      0      0      0      0      0      0]
 [     0  82508      0      0  82164      0      0      0      0      0]
 [     0      0      0      0 164524      0      0      0      0      0]
 [     0      0      0      0 165020      0      0      0      0      0]
 [     0      0      0      0 164364      0      0      0      0      0]
 [     0      0      0      0 163659      0      0      0      0      0]
 [     0      0      0      0  82255      0      0      0  81813      0]
 [     0      0      0      0      0      0      0      0 164098      0]
 [     0      0      0      0      0      0      0      0  19306  62983]]
()
[('RidgeClassifier', 0.22218422052281026, 7.447027206420898, 0.41751885414123535), ('Perceptron', 0.31427352842174133, 84.77450895309448, 0.4368560314178467), ('PassiveAggressiveClassifier', 0.28296290688667397, 89.53925013542175, 0.9124600887298584), ('KNeighborsClassifier', 0.9473958847313828, 36.609216928482056, 944.2896919250488), ('RandomForestClassifier', 0.9419493249591406, 423.82988715171814, 17.270785808563232), ('LinearSVC', 0.43689074301132974, 144.78039693832397, 0.44084811210632324), ('SGDClassifier', 0.38758443925037817, 88.18716382980347, 0.4109640121459961)]
================================================================================
NearestCentroid (aka Rocchio classifier)
________________________________________________________________________________
Training: 
NearestCentroid(metric='euclidean', shrink_threshold=None)
train time: 1.749s
test time:  0.509s
accuracy:   0.890
classification report:
             precision    recall  f1-score   support

        0.0       0.50      1.00      0.67     82156
        1.0       1.00      0.51      0.67    164616
        2.0       1.00      1.00      1.00    164672
        3.0       1.00      1.00      1.00    164524
        4.0       1.00      1.00      1.00    165020
        5.0       1.00      1.00      1.00    164364
        6.0       1.00      1.00      1.00    163659
        7.0       1.00      1.00      1.00    164068
        8.0       1.00      0.51      0.67    164098
        9.0       0.50      1.00      0.67     82289

avg / total       0.94      0.89      0.89   1479466

confusion matrix:
[[ 82156      0      0      0      0      0      0      0      0      0]
 [ 80862  83754      0      0      0      0      0      0      0      0]
 [     0    220 164452      0      0      0      0      0      0      0]
 [     0      0     59 164330    135      0      0      0      0      0]
 [     0      0      0      0 164625    395      0      0      0      0]
 [     0      0      0      0      0 164347     17      0      0      0]
 [     0      0      0      0      0     13 163646      0      0      0]
 [     0      0      0      0      0      0    362 163706      0      0]
 [     0      0      0      0      0      0      0    126  83033  80939]
 [     0      0      0      0      0      0      0      0      0  82289]]
()
[('RidgeClassifier', 0.22218422052281026, 7.447027206420898, 0.41751885414123535), ('Perceptron', 0.31427352842174133, 84.77450895309448, 0.4368560314178467), ('PassiveAggressiveClassifier', 0.28296290688667397, 89.53925013542175, 0.9124600887298584), ('KNeighborsClassifier', 0.9473958847313828, 36.609216928482056, 944.2896919250488), ('RandomForestClassifier', 0.9419493249591406, 423.82988715171814, 17.270785808563232), ('LinearSVC', 0.43689074301132974, 144.78039693832397, 0.44084811210632324), ('SGDClassifier', 0.38758443925037817, 88.18716382980347, 0.4109640121459961), ('NearestCentroid', 0.8897385948713928, 1.7490949630737305, 0.508552074432373)]
================================================================================
Naive Bayes
________________________________________________________________________________
Training: 
MultinomialNB(alpha=0.01, class_prior=None, fit_prior=True)
train time: 2.456s
test time:  0.402s
accuracy:   0.111
dimensionality: 10
density: 1.000000
classification report:
             precision    recall  f1-score   support

        0.0       0.00      0.00      0.00     82156
        1.0       0.00      0.00      0.00    164616
        2.0       0.11      1.00      0.20    164672
        3.0       0.00      0.00      0.00    164524
        4.0       0.00      0.00      0.00    165020
        5.0       0.00      0.00      0.00    164364
        6.0       0.00      0.00      0.00    163659
        7.0       0.00      0.00      0.00    164068
        8.0       0.00      0.00      0.00    164098
        9.0       0.00      0.00      0.00     82289

avg / total       0.01      0.11      0.02   1479466

confusion matrix:
[[     0      0  82156      0      0      0      0      0      0      0]
 [     0      0 164616      0      0      0      0      0      0      0]
 [     0      0 164672      0      0      0      0      0      0      0]
 [     0      0 164524      0      0      0      0      0      0      0]
 [     0      0 165020      0      0      0      0      0      0      0]
 [     0      0 164364      0      0      0      0      0      0      0]
 [     0      0 163659      0      0      0      0      0      0      0]
 [     0      0 164068      0      0      0      0      0      0      0]
 [     0      0 164098      0      0      0      0      0      0      0]
 [     0      0  82289      0      0      0      0      0      0      0]]
()
________________________________________________________________________________
Training: 
BernoulliNB(alpha=0.01, binarize=0.0, class_prior=None, fit_prior=True)
train time: 3.554s
test time:  0.733s
accuracy:   0.111
dimensionality: 10
density: 1.000000
classification report:
             precision    recall  f1-score   support

        0.0       0.00      0.00      0.00     82156
        1.0       0.00      0.00      0.00    164616
        2.0       0.11      1.00      0.20    164672
        3.0       0.00      0.00      0.00    164524
        4.0       0.00      0.00      0.00    165020
        5.0       0.00      0.00      0.00    164364
        6.0       0.00      0.00      0.00    163659
        7.0       0.00      0.00      0.00    164068
        8.0       0.00      0.00      0.00    164098
        9.0       0.00      0.00      0.00     82289

avg / total       0.01      0.11      0.02   1479466

confusion matrix:
[[     0      0  82156      0      0      0      0      0      0      0]
 [     0      0 164616      0      0      0      0      0      0      0]
 [     0      0 164672      0      0      0      0      0      0      0]
 [     0      0 164524      0      0      0      0      0      0      0]
 [     0      0 165020      0      0      0      0      0      0      0]
 [     0      0 164364      0      0      0      0      0      0      0]
 [     0      0 163659      0      0      0      0      0      0      0]
 [     0      0 164068      0      0      0      0      0      0      0]
 [     0      0 164098      0      0      0      0      0      0      0]
 [     0      0  82289      0      0      0      0      0      0      0]]
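
The accuracy, precision and recall figures in the reports above can be recomputed from the printed confusion matrices. The following is a minimal sketch, not part of the benchmark script: it copies the NearestCentroid matrix from the log and assumes the usual scikit-learn layout (rows are true labels, columns are predicted labels), which agrees with the row sums matching the support column. The helper names `accuracy`, `recall` and `precision` are illustrative.

```python
# Confusion matrix copied from the NearestCentroid run above.
# Assumed layout: rows = true label, columns = predicted label.
confusion = [
    [82156, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [80862, 83754, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 220, 164452, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 59, 164330, 135, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 164625, 395, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 164347, 17, 0, 0, 0],
    [0, 0, 0, 0, 0, 13, 163646, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 362, 163706, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 126, 83033, 80939],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 82289],
]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / float(total)


def recall(matrix, label):
    """Correct predictions for `label` divided by its true count (row sum)."""
    row_total = sum(matrix[label])
    return matrix[label][label] / float(row_total) if row_total else 0.0


def precision(matrix, label):
    """Correct predictions for `label` divided by all predictions of it (column sum)."""
    col_total = sum(row[label] for row in matrix)
    return matrix[label][label] / float(col_total) if col_total else 0.0


print("accuracy:          %.3f" % accuracy(confusion))
print("class 1 recall:    %.2f" % recall(confusion, 1))
print("class 0 precision: %.2f" % precision(confusion, 0))
```

Read this way, the NearestCentroid matrix shows that almost all of its errors come from the two boundary pairs: about half of class 1 is predicted as class 0, and about half of class 8 is predicted as class 9.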

CPU/memory usage for computing node 1:

scikit learn 1G computing node 1 performance

CPU/memory usage for computing node 2:

scikit learn 1G computing node 2 performance

CPU/memory usage for computing node 3:

scikit learn 1G computing node 3 performance
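
The list printed after each model in the 1G log accumulates one tuple per classifier. Comparing the values with the "accuracy", "train time" and "test time" lines shows the order is (name, accuracy, train seconds, test seconds). As a rough sketch for reading those results side by side, the snippet below copies the last list printed in the 1G run and ranks it by accuracy. The two Naive Bayes runs are not in that list because the log prints no list after them. The `summarize` helper is hypothetical and not part of the benchmark.

```python
# Last results list printed in the scikit learn 1G log above:
# (name, accuracy, train seconds, test seconds).
results = [
    ('RidgeClassifier', 0.22218422052281026, 7.447027206420898, 0.41751885414123535),
    ('Perceptron', 0.31427352842174133, 84.77450895309448, 0.4368560314178467),
    ('PassiveAggressiveClassifier', 0.28296290688667397, 89.53925013542175, 0.9124600887298584),
    ('KNeighborsClassifier', 0.9473958847313828, 36.609216928482056, 944.2896919250488),
    ('RandomForestClassifier', 0.9419493249591406, 423.82988715171814, 17.270785808563232),
    ('LinearSVC', 0.43689074301132974, 144.78039693832397, 0.44084811210632324),
    ('SGDClassifier', 0.38758443925037817, 88.18716382980347, 0.4109640121459961),
    ('NearestCentroid', 0.8897385948713928, 1.7490949630737305, 0.508552074432373),
]


def summarize(rows):
    """Return the rows sorted by accuracy (best first) plus formatted table lines."""
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    lines = ["%-28s %8s %10s %10s" % ("model", "accuracy", "train (s)", "test (s)")]
    for name, acc, train_s, test_s in ranked:
        lines.append("%-28s %8.3f %10.1f %10.1f" % (name, acc, train_s, test_s))
    return ranked, lines


ranked, lines = summarize(results)
print("\n".join(lines))

# Model with the smallest combined train + test time.
fastest = min(results, key=lambda r: r[2] + r[3])
print("fastest end to end: %s" % fastest[0])
```

On these numbers kNN and random forest lead on accuracy, but at very different costs: kNN spends most of its time in prediction, random forest in training, and NearestCentroid stays close behind on accuracy while finishing in a few seconds.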

Scikit learn 10G

10 features, 10 labels, 8 workers:

================================================================================
Ridge Classifier
________________________________________________________________________________
Training: 
RidgeClassifier(alpha=1.0, class_weight=None, copy_X=True, fit_intercept=True,
        max_iter=None, normalize=False, random_state=None, solver='lsqr',
        tol=0.01)
train time: 65.248s
test time:  3.180s
accuracy:   0.222
dimensionality: 10
density: 1.000000
classification report:
             precision    recall  f1-score   support

        0.0       0.00      0.00      0.00    820779
        1.0       0.22      1.00      0.36   1644015
        2.0       0.00      0.00      0.00   1643217
        3.0       0.00      0.00      0.00   1646764
        4.0       0.00      0.00      0.00   1644785
        5.0       0.00      0.00      0.00   1642756
        6.0       0.00      0.00      0.00   1644958
        7.0       0.00      0.00      0.00   1643869
        8.0       0.22      1.00      0.36   1643210
        9.0       0.00      0.00      0.00    820150

avg / total       0.05      0.22      0.08  14794503

confusion matrix:
[[      0  820779       0       0       0       0       0       0       0
        0]
 [      0 1644015       0       0       0       0       0       0       0
        0]
 [      0 1643217       0       0       0       0       0       0       0
        0]
 [      0 1646764       0       0       0       0       0       0       0
        0]
 [      0 1640142       0       0       0       0       0       0    4643
        0]
 [      0       0       0       0       0       0       0       0 1642756
        0]
 [      0       0       0       0       0       0       0       0 1644958
        0]
 [      0       0       0       0       0       0       0       0 1643869
        0]
 [      0       0       0       0       0       0       0       0 1643210
        0]
 [      0       0       0       0       0       0       0       0  820150
        0]]
()
[('RidgeClassifier', 0.22219232372996917, 65.24785089492798, 3.1803510189056396)]
================================================================================
Perceptron
________________________________________________________________________________
Training: 
Perceptron(alpha=0.0001, class_weight=None, eta0=1.0, fit_intercept=True,
      max_iter=None, n_iter=50, n_jobs=-1, penalty=None, random_state=0,
      shuffle=True, tol=None, verbose=0, warm_start=False)
/opt/conda/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py:117: DeprecationWarning: n_iter parameter is deprecated in 0.19 and will be removed in 0.21. Use max_iter and tol instead.
  DeprecationWarning)
train time: 776.782s
test time:  3.127s
accuracy:   0.219
dimensionality: 10
density: 1.000000
classification report:
             precision    recall  f1-score   support

        0.0       1.00      0.92      0.96    820779
        1.0       0.00      0.00      0.00   1644015
        2.0       0.00      0.00      0.00   1643217
        3.0       0.13      1.00      0.23   1646764
        4.0       0.00      0.00      0.00   1644785
        5.0       0.00      0.00      0.00   1642756
        6.0       0.00      0.00      0.00   1644958
        7.0       0.00      0.00      0.00   1643869
        8.0       0.95      0.01      0.03   1643210
        9.0       1.00      0.99      0.99    820150

avg / total       0.23      0.22      0.14  14794503

confusion matrix:
[[ 759188       0       0   61591       0       0       0       0       0
        0]
 [      0       0       0 1644015       0       0       0       0       0
        0]
 [      0       0       0 1643217       0       0       0       0       0
        0]
 [      0       0       0 1646764       0       0       0       0       0
        0]
 [      0       0       0 1644785       0       0       0       0       0
        0]
 [      0       0       0 1642756       0       0       0       0       0
        0]
 [      0       0       0 1644958       0       0       0       0       0
        0]
 [      0       0       0 1641159       0    2710       0       0       0
        0]
 [      0       0       0  943759       0  195903       0  482078   21470
        0]
 [      0       0       0     472       0    1630       0    6634    1220
   810194]]
()
[('RidgeClassifier', 0.22219232372996917, 65.24785089492798, 3.1803510189056396), ('Perceptron', 0.21883911882676965, 776.7821178436279, 3.1268820762634277)]
================================================================================
Passive-Aggressive
________________________________________________________________________________
Training: 
PassiveAggressiveClassifier(C=1.0, average=False, class_weight=None,
              fit_intercept=True, loss='hinge', max_iter=None, n_iter=50,
              n_jobs=-1, random_state=None, shuffle=True, tol=None,
              verbose=0, warm_start=False)
/opt/conda/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py:117: DeprecationWarning: n_iter parameter is deprecated in 0.19 and will be removed in 0.21. Use max_iter and tol instead.
  DeprecationWarning)
train time: 824.810s
test time:  3.123s
accuracy:   0.312
dimensionality: 10
density: 1.000000
classification report:
             precision    recall  f1-score   support

        0.0       1.00      0.97      0.98    820779
        1.0       0.66      1.00      0.79   1644015
        2.0       1.00      0.00      0.00   1643217
        3.0       0.00      0.00      0.00   1646764
        4.0       0.00      0.00      0.00   1644785
        5.0       0.15      1.00      0.26   1642756
        6.0       0.00      0.00      0.00   1644958
        7.0       0.00      0.00      0.00   1643869
        8.0       0.00      0.00      0.00   1643210
        9.0       1.00      0.66      0.79    820150

avg / total       0.31      0.31      0.22  14794503

confusion matrix:
[[ 793470   27309       0       0       0       0       0       0       0
        0]
 [      0 1644015       0       0       0       0       0       0       0
        0]
 [      0  828109      38       0       0  815070       0       0       0
        0]
 [      0       0       0       0       0 1646764       0       0       0
        0]
 [      0       0       0       0       0 1644785       0       0       0
        0]
 [      0       0       0       0       0 1642756       0       0       0
        0]
 [      0       0       0       0       0 1644958       0       0       0
        0]
 [      0       0       0       0       0 1643869       0       0       0
        0]
 [      0       0       0       0       0 1643210       0       0       0
        0]
 [      0       0       0       0       0  280617       0       0       0
   539533]]
()
[('RidgeClassifier', 0.22219232372996917, 65.24785089492798, 3.1803510189056396), ('Perceptron', 0.21883911882676965, 776.7821178436279, 3.1268820762634277), ('PassiveAggressiveClassifier', 0.3122654407518793, 824.8098239898682, 3.123262882232666)]
================================================================================
kNN
________________________________________________________________________________
Training: 
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=-1, n_neighbors=10, p=2,
           weights='uniform')
train time: 353.195s
test time:  10401.894s
accuracy:   0.960
classification report:
             precision    recall  f1-score   support

        0.0       0.95      0.97      0.96    820779
        1.0       0.96      0.96      0.96   1644015
        2.0       0.96      0.96      0.96   1643217
        3.0       0.96      0.96      0.96   1646764
        4.0       0.96      0.96      0.96   1644785
        5.0       0.96      0.96      0.96   1642756
        6.0       0.96      0.96      0.96   1644958
        7.0       0.96      0.96      0.96   1643869
        8.0       0.96      0.96      0.96   1643210
        9.0       0.97      0.95      0.96    820150

avg / total       0.96      0.96      0.96  14794503

confusion matrix:
[[ 799670   21109       0       0       0       0       0       0       0
        0]
 [  45006 1577600   21409       0       0       0       0       0       0
        0]
 [      0   44482 1577181   21554       0       0       0       0       0
        0]
 [      0       0   44587 1580494   21683       0       0       0       0
        0]
 [      0       0       0   44524 1578785   21476       0       0       0
        0]
 [      0       0       0       0   44411 1576984   21361       0       0
        0]
 [      0       0       0       0       0   44713 1578679   21566       0
        0]
 [      0       0       0       0       0       0   44115 1578086   21668
        0]
 [      0       0       0       0       0       0       0   44474 1577159
    21577]
 [      0       0       0       0       0       0       0       0   44592
   775558]]
()
[('RidgeClassifier', 0.22219232372996917, 65.24785089492798, 3.1803510189056396), ('Perceptron', 0.21883911882676965, 776.7821178436279, 3.1268820762634277), ('PassiveAggressiveClassifier', 0.3122654407518793, 824.8098239898682, 3.123262882232666), ('KNeighborsClassifier', 0.9598292014270435, 353.1945149898529, 10401.893601894379)]
================================================================================