Spark??????RDD??RDD?????????????????????????????????????????????????????????????????????????????????????????????????
??????????RDD??????????????????????????RDD????????????????????????????????RDD?????????????????????????????????????Spark?MLlib????????????2015???????????????????
???Angel???????????????Spark??V1.0.0????????????Spark on Angel??????Angel?Spark???PS???????????????????????
????L-BFGS??????Spark?????????????????Spark on Angel?????Spark????????????????Spark??????????
1. L-BFGS????
2.L-BFGS?Spark??
3.L-BFGS?Spark on Angel??
3.1????
Spark on Angel??Angel PS-Service????Spark??PS?????????????driver????two-loop recursion???????PS??driver???????????????driver??????
Angel PS?????????????vector?matrix??????partition??????????????vector?matrix?????;
3.2????
???????driver????????????two-loop recursion???PS???????Aggregate???????executor?PS???????????????????????????PSVector???????????????????????????????????????????????? ??Spark on Angel?????Spark?driver?????????????????????
4.“????”?Spark on Angel
Spark on Angel?Angel???Spark?????????????????“??”????Spark?“???”???????????????? “?”?“?”?“?”?“?” ???Spark on Angel????
4.1? — “???”???
Spark on Angel?Angel???Spark?????????????????“??”?Spark on Angel???Spark??RDD????????Spark on Angel????Spark?Angel?????????????Spark?Angel? ???Spark????Spark on Angel????????Spark??????????????????Angel?Github Spark on Angel Quick Start??
???????Spark on Angel????????????Spark?????????????Spark????
source ${Angel_HOME}/bin/spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.instances=20 \
--conf spark.ps.cores=4 \
--conf spark.ps.memory=10g \
--jars $SONA_SPARK_JARS \
....
Spark on Angel????????????????Angel?PS-Service?????Spark?driver?executor????PsAgent?PSClient?Angel PS??????
4.2? — ???????breeze?
breeze??scala????????????????Spark MLlib????????????????breeze??????????Spark?Spark on Angel??????????breeze.optimize.LBFGS????Spark????BreezePSVector?
BreezePSVector??Angel PS??Vector??Vector???breeze NumericOps????????? dot?scale?axpy?add???????LBFGS[BreezePSVector] two-loop recursion????????????BreezePSVector???????BreezePSVector?????Angel PS???????
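The two-loop recursion mentioned above can be expressed using only the vector primitives named in this section (dot, scale, axpy). The following is a minimal sketch on plain local arrays, not the Spark on Angel or breeze implementation: the names `TwoLoopSketch`, `Vec`, and `direction` are hypothetical, and a real `BreezePSVector` would run the same operations on the parameter server rather than in local memory.

```scala
object TwoLoopSketch {
  type Vec = Array[Double]

  // The vector primitives the recursion relies on.
  def dot(a: Vec, b: Vec): Double = a.indices.map(i => a(i) * b(i)).sum
  def scale(a: Vec, c: Double): Vec = a.map(_ * c)
  // Returns y + alpha * x as a new array.
  def axpy(alpha: Double, x: Vec, y: Vec): Vec =
    y.indices.map(i => y(i) + alpha * x(i)).toArray

  /** Computes the L-BFGS search direction -H * grad.
    * `history` holds (s_k, y_k) pairs ordered oldest to newest, where
    * s_k is the parameter difference and y_k the gradient difference.
    */
  def direction(grad: Vec, history: List[(Vec, Vec)]): Vec = {
    var q = grad.clone()
    // First loop: walk the history from newest to oldest.
    val alphas = history.reverse.map { case (s, y) =>
      val rho = 1.0 / dot(y, s)
      val alpha = rho * dot(s, q)
      q = axpy(-alpha, y, q)
      (alpha, rho)
    }
    // Initial inverse-Hessian scaling taken from the newest pair.
    val gamma = history.lastOption
      .map { case (s, y) => dot(s, y) / dot(y, y) }
      .getOrElse(1.0)
    var r = scale(q, gamma)
    // Second loop: walk the history from oldest to newest.
    history.zip(alphas.reverse).foreach { case ((s, y), (alpha, rho)) =>
      val beta = rho * dot(y, r)
      r = axpy(alpha - beta, s, r)
    }
    scale(r, -1.0)
  }
}
```

With an empty history the sketch falls back to plain steepest descent (`-grad`). Every step is a dot, scale, or axpy, which is why an optimizer written against such an interface can run unchanged on a vector type that implements them remotely.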
Spark?L-BFGS??
4.3? — ??????
Spark??????????????????????????????????Spark on Angel?????????? Spark on Angel?????Spark????????????Spark????;????PSVector???????????????
???????LBFGS?Spark?Spark on Angel??????????????????????????????Aggregate??? ?pull/push? ??????Spark??????Spark on Angel?????????????????
L-BFGS??????DiffFunction?DiffFunction?calculate??????? ?????????? loss ? gradient?
?????????Github SparseLogistic
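As an illustration of the `DiffFunction` contract, where `calculate` returns the loss together with its gradient, here is a minimal local sketch of logistic loss passed to `breeze.optimize.LBFGS`. It is not the SparseLogistic code referenced above: it assumes breeze is on the classpath, uses local `DenseVector`s rather than RDDs or PSVectors, and the object name and toy data set are hypothetical.

```scala
import breeze.linalg.DenseVector
import breeze.optimize.{DiffFunction, LBFGS}

object LogisticDiffFunctionSketch {
  // Hypothetical toy data: (features, label in {0, 1}); deliberately not separable.
  val data: Seq[(DenseVector[Double], Double)] = Seq(
    (DenseVector(1.0, 0.5), 1.0),
    (DenseVector(1.0, -1.0), 0.0),
    (DenseVector(1.0, 2.0), 1.0),
    (DenseVector(1.0, 1.5), 0.0)
  )

  val logistic: DiffFunction[DenseVector[Double]] =
    new DiffFunction[DenseVector[Double]] {
      // Returns (loss, gradient) of the summed logistic loss at weights w.
      def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
        var loss = 0.0
        val grad = DenseVector.zeros[Double](w.length)
        for ((x, y) <- data) {
          val margin = w dot x
          val p = 1.0 / (1.0 + math.exp(-margin)) // predicted probability
          loss += math.log1p(math.exp(margin)) - y * margin
          grad += x * (p - y)
        }
        (loss, grad)
      }
    }

  def main(args: Array[String]): Unit = {
    // m is the number of (s, y) correction pairs kept by L-BFGS.
    val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 100, m = 5)
    val w = lbfgs.minimize(logistic, DenseVector.zeros[Double](2))
    println(s"weights = $w, loss = ${logistic.calculate(w)._1}")
  }
}
```

In a distributed setting the loop over `data` would be replaced by an aggregation over the training set, but the shape of `calculate` stays the same.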
Spark?DiffFunction??
4.4? — ????
???????SGD?LBFGS?OWLQN???????LR???Spark?Spark on Angel???????? ????????Github SparseLRWithX.scala .
??????????????????2.3????5????
?????
??1?????????????????????????????????????????????????????;
??2???Spark????????spark.driver.maxResultSize??;?Spark on Angel?????????
???????Spark on Angel???Spark???LR????50%?????;??????????????????
5.??
Spark on Angel??????????????Spark????????????;???????Spark on Angel??????????????Github???????????
Angel??Github?Angel??????Github????Star?