Abstract

For numerical optimization problems in machine learning, whose scale and dimensionality are typically very large, traditional methods have difficulty producing solutions efficiently. In recent years, a great deal of research has addressed large-scale machine learning problems, and one of the most important classes of methods is stochastic algorithms. Optimization methods fall into two broad categories: first-order gradient methods and second-order Newton methods. The study and refinement of first-order methods is comparatively mature and complete. First-order methods can be further divided into primal methods and dual methods: representative primal methods include SAG, SAGA, and SVRG, while the dual methods include SDCA and SPDC. In addition, acceleration methods such as Catalyst and Katyusha attain the best convergence rate achievable by first-order methods. Second-order methods are currently an important research direction in machine learning; their convergence rate is better than that of first-order methods, but they present some practical difficulties, and the most practical among them is the L-BFGS method together with its stochastic variants. This thesis gives a detailed and comprehensive account of the various stochastic algorithms in machine learning, surveys their development, research directions, and current hot topics, and finally compares several common stochastic algorithms through numerical experiments to give the reader a direct sense of their numerical performance.

Keywords: large-scale machine learning, stochastic algorithms, optimization methods

Stochastic Optimization Algorithms in Machine Learning

Abstract (English)

For optimization problems in the field of machine learning, traditional methods have difficulty solving high-dimensional, big-data problems. In recent years there has been much research on large-scale machine learning problems, especially on stochastic algorithms. Generally, stochastic methods can be divided into two classes: one is first-order gradient methods and the other is second-order Newton methods. First-order methods have seen more improvement and research and are relatively mature and complete. They split into two classes: in the primal class, SVRG, SAG, and SAGA are representative, while SDCA and SPDC represent the dual class. Furthermore, acceleration methods such as Catalyst and Katyusha, which achieve the optimal convergence rate for first-order methods, have been put forward in the last two years. Second-order methods are an important research area; they offer better convergence but not better practical performance, because they have to compute the Hessian matrix. One useful method is L-BFGS and its variants. In this thesis, the author introduces the stochastic algorithms of machine learning in detail. In the end, numerical experiments compare some common algorithms and give readers a direct view of their behavior.

Key Words: large-scale machine learning problems, stochastic algorithms, optimization methods

Contents

Abstract
Abstract (English)
Introduction
1 Preliminaries
2 First-Order Methods
  2.1 Gradient Descent
  2.2 Stochastic Gradient Method
  2.3 Variance Reduction Methods
    2.3.1 Stochastic Average Gradient (SAG)
    2.3.2 Stochastic Variance Reduced Gradient (SVRG)
    2.3.3 SAGA
  2.4 Stochastic Dual Coordinate Ascent (SDCA)
  2.5 Stochastic Primal-Dual Coordinate (SPDC)
  2.6 The Acceleration Method Catalyst
  2.7 Katyusha
3 Second-Order Methods
  3.1 Newton's Method and Quasi-Newton Methods