
Spark on Angel: The Core Accelerator for Machine Learning on Spark


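The core concept of Spark is the RDD, and one of the RDD's key properties is immutability, which sidesteps the many complex concurrency problems of a distributed environment. For data analysis this abstraction works well: it handles most distribution problems, simplifies the operators, and delivers high-performance distributed data processing.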

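In machine learning, however, the RDD's weakness shows quickly. The core of machine learning is iteration and parameter updates. Thanks to its in-memory computation, which logically never writes intermediate results to disk, the RDD handles iteration well, but its immutability is a poor fit for parameters that must be updated over and over. This fundamental mismatch has kept Spark's MLlib library developing slowly: since 2015 it has seen no substantial innovation, and its performance is not good either.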

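For this reason, Angel gave priority to Spark when designing its ecosystem. When V1.0.0 was released it already included Spark on Angel, which builds on Angel to add parameter server (PS) capability to Spark, bringing mutability into an otherwise immutable model.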

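Taking L-BFGS as an example, we analyze the problems Spark has in implementing machine learning algorithms, and how Spark on Angel removes the bottlenecks Spark hits in machine learning tasks, making machine learning on Spark more powerful.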

1. The L-BFGS algorithm


2. L-BFGS on Spark


3. L-BFGS on Spark on Angel

3.1 Implementation framework

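Spark on Angel uses Angel's PS-Service to introduce the PS role into Spark, reducing the algorithm's dependence on the driver. The two-loop recursion step is handed to the PS, and the driver is responsible only for task scheduling, which greatly reduces the dependence on driver performance.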

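Angel PS consists of a group of distributed nodes. Each vector and matrix is split into multiple partitions stored on different nodes, and operations between vectors and matrices are supported.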


3.2 Performance analysis

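Throughout the algorithm the driver only schedules tasks, while the expensive two-loop recursion runs on the PS. Gradient aggregation (Aggregate) and model synchronization take place between the executors and the PS, so every operation becomes distributed. During network transfer, a high-dimensional PSVector is cut into small data blocks before being sent to the target nodes, and this many-to-many transfer between nodes greatly speeds up gradient aggregation and model synchronization. Spark on Angel thus avoids both the single-point driver bottleneck in Spark and the problem of sending high-dimensional vectors over the network.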
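The block-wise aggregation described above can be illustrated with a small single-process simulation. This is only a sketch in plain Python, not Angel code: the names `split_into_blocks`, `aggregate_on_servers`, and `pull` are hypothetical, and the contiguous, equal-sized partitioning is an assumption made for illustration. Angel's actual partitioning scheme and network protocol are not shown.

```python
def split_into_blocks(vector, num_servers):
    """Cut a vector into contiguous blocks, one per (hypothetical) PS node."""
    size = -(-len(vector) // num_servers)  # ceiling division
    return [vector[i * size:(i + 1) * size] for i in range(num_servers)]


def aggregate_on_servers(executor_gradients, num_servers):
    """Sum gradients block by block, the way each PS node would sum
    only the partition it owns instead of one driver summing everything."""
    dim = len(executor_gradients[0])
    server_blocks = split_into_blocks([0.0] * dim, num_servers)
    for grad in executor_gradients:
        # Each executor sends block k of its gradient to server k.
        for server_id, block in enumerate(split_into_blocks(grad, num_servers)):
            target = server_blocks[server_id]
            for j, value in enumerate(block):
                target[j] += value
    return server_blocks


def pull(server_blocks):
    """Reassemble the full vector from the per-server blocks."""
    return [value for block in server_blocks for value in block]
```

In this sketch no single node ever receives every full gradient vector; each simulated server handles only its own slice, which is the property the text attributes to the PS-based design.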

4. “Light, easy, strong, fast”: Spark on Angel

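Spark on Angel is a “plug-in” that Angel designed to address Spark's shortcomings in training machine learning models. It makes no “invasive” changes to Spark and is an independent framework. Its characteristics can be summed up in four words: “light”, “easy”, “strong”, and “fast”.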

4.1 Light: a “plug-in” framework

As a plug-in, Spark on Angel makes no invasive changes to the RDD in Spark. It is a framework that depends on Spark and Angel, yet its logic is independent of both. Using it is therefore simple for Spark users: only three changes to the Spark submit script are needed. For details, see the Spark on Angel Quick Start document in Angel's GitHub repository.

A submitted Spark on Angel job is still essentially a Spark job, and the whole job executes just as it would in Spark:

source ${Angel_HOME}/bin/spark-on-angel-env.sh

$SPARK_HOME/bin/spark-submit \
    --master yarn-cluster \
    --conf spark.ps.jars=$SONA_ANGEL_JARS \
    --conf spark.ps.instances=20 \
    --conf spark.ps.cores=4 \
    --conf spark.ps.memory=10g \
    --jars $SONA_SPARK_JARS \
    ....

Spark on Angel can be this lightweight thanks to Angel's encapsulation of PS-Service, which lets the Spark driver and executors exchange data with Angel PS through PsAgent and PSClient.


4.2 Strong: powerful functionality, with support for the breeze library

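breeze is a numerical computing library for machine learning, implemented in Scala. Most of the numerical optimization algorithms in Spark MLlib are carried out by calling breeze. Both the Spark and the Spark on Angel implementations call breeze.optimize.LBFGS; the Spark on Angel implementation passes in a BreezePSVector.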

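BreezePSVector is a Vector that lives on Angel PS. It implements the methods of breeze NumericOps, including common operations such as dot, scale, axpy, and add. As a result, the high-dimensional vector operations in the two-loop recursion of LBFGS[BreezePSVector] are operations between BreezePSVectors, and these are all carried out in a distributed fashion on Angel PS.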
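To make concrete why dot, scale, axpy, and add are enough, here is a minimal sketch of the standard L-BFGS two-loop recursion written only in terms of those operations. It uses plain Python lists rather than Scala, breeze, or BreezePSVector, and the function names (`dot`, `axpy`, `scale`, `two_loop_direction`) are hypothetical helpers for illustration, not the breeze or Angel API.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def axpy(alpha, x, y):
    """Return alpha * x + y as a new vector."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def scale(alpha, x):
    """Return alpha * x as a new vector."""
    return [alpha * xi for xi in x]


def two_loop_direction(grad, s_hist, y_hist):
    """Return the L-BFGS search direction -H * grad.

    s_hist[i] holds x_{i+1} - x_i and y_hist[i] holds g_{i+1} - g_i,
    ordered from oldest to newest.
    """
    q = list(grad)
    coeffs = []  # (alpha, rho) pairs, newest first
    # First loop: walk the history from newest to oldest.
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = axpy(-alpha, y, q)
        coeffs.append((alpha, rho))
    # Scale by the usual initial Hessian guess gamma * I.
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
        q = scale(gamma, q)
    # Second loop: walk the history from oldest to newest.
    for (alpha, rho), s, y in zip(reversed(coeffs), s_hist, y_hist):
        beta = rho * dot(y, q)
        q = axpy(alpha - beta, s, q)
    return scale(-1.0, q)
```

With an empty history the direction reduces to plain steepest descent. Every step above is a vector-vector operation, which is why a vector type that runs those operations on the PS can keep the whole recursion off the driver.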

[Figure: L-BFGS implementation in Spark]

4.3 Easy: a simple programming interface

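Another reason Spark is so popular in big data is that its programming model is simple and easy to understand, and Spark on Angel inherits this trait. A Spark on Angel job is essentially a Spark job, so the code follows the same logic as in Spark; when an operation on a PSVector is needed, the code calls the corresponding interface.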

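Comparing the LBFGS implementations on Spark and on Spark on Angel, the overall structure of the code is the same; the main differences are the Aggregate of the gradient vector and the pull/push of the model. Converting a Spark algorithm into a Spark on Angel job therefore requires changing only a small amount of code.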

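L-BFGS requires the user to implement a DiffFunction. The calculate interface of DiffFunction takes the model weights w as input, traverses the training data, and returns the loss and the gradient.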

For the complete code, see SparseLogistic on GitHub.
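As a rough illustration of what such a function computes, the sketch below returns the average logistic loss and its gradient for a weight vector. It is written in plain Python on a single machine, not in Scala against breeze's DiffFunction, and it is not the SparseLogistic code referenced above. The sample layout (a list of `(features, label)` pairs with labels in {0, 1}) and the choice of logistic regression are assumptions made for the example.

```python
import math


def calculate(weights, samples):
    """Return (loss, gradient) of the average logistic loss at `weights`.

    `samples` is a list of (features, label) pairs, label in {0, 1}.
    """
    total_loss = 0.0
    grad = [0.0] * len(weights)
    for features, label in samples:
        z = sum(w * x for w, x in zip(weights, features))
        # Numerically stable log(1 + exp(z)).
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        # Numerically stable sigmoid(z).
        if z >= 0:
            prob = 1.0 / (1.0 + math.exp(-z))
        else:
            ez = math.exp(z)
            prob = ez / (1.0 + ez)
        # Per-sample negative log-likelihood: log(1 + e^z) - label * z.
        total_loss += softplus - label * z
        for j, x in enumerate(features):
            grad[j] += (prob - label) * x
    n = len(samples)
    return total_loss / n, [g / n for g in grad]
```

In a distributed setting the loop over samples would run on the executors and the per-partition gradients would then be summed, which is the Aggregate step the text contrasts between Spark and Spark on Angel.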

[Figure: DiffFunction implementation in Spark]

4.4 Fast: strong performance

We implemented LR with three optimization methods, SGD, LBFGS, and OWLQN, and compared them experimentally on Spark and on Spark on Angel. For the experiment code, see SparseLRWithX.scala on GitHub.

Dataset: a dataset from an internal Tencent business, with 230 million samples and 50 million dimensions.

Experiment settings:

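Note 1: In the resource configuration for the three groups of comparison experiments, we tried to make sure every job ran with ample resources, so the configured resources exceed what is actually needed.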

Note 2: When running the Spark jobs, the spark.driver.maxResultSize parameter must be increased; Spark on Angel does not need this parameter configured.


The results show that Spark on Angel trains LR models more than 50% faster than Spark, and the more complex the model, the larger the speedup.

5. Conclusion

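Spark on Angel provides an efficient, low-cost way to overcome the bottlenecks Spark encounters in machine learning. We will keep improving Spark on Angel, and everyone is welcome to join us on GitHub.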

The Angel project is on GitHub under the name Angel; if you like it, please give it a Star on GitHub.