Comments on: Gradient descent
https://qchu.wordpress.com/2018/02/07/gradient-descent/
"A good stock of examples, as large as possible, is indispensable for a thorough understanding of any concept, and when I want to learn something new, I make it my first job to build one." - Paul HalmosSat, 28 Apr 2018 12:14:48 +0000hourly1http://wordpress.com/By: rajeshd007
https://qchu.wordpress.com/2018/02/07/gradient-descent/#comment-8823
Tue, 20 Mar 2018 06:22:40 +0000

Having different learning rates for different parameters is, by definition, not gradient descent, unless there is a mathematically sensible way of choosing them. In a sense you are playing catch-22: you are searching for the best parameters, yet a priori you are making assumptions about their relative scaling (by having different learning rates), which makes your entire solution a heuristic one rather than gradient descent.
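The per-parameter scheme this comment describes amounts to multiplying the gradient by a fixed diagonal matrix, which is the same as plain gradient descent after rescaling coordinates. A minimal sketch (the quadratic objective and the specific rates are invented for illustration, not taken from the post):

```python
import numpy as np

# Hypothetical badly-scaled objective: f(x, y) = x^2 + 100*y^2.
# Its gradient components differ in scale by a factor of 100.
def grad(p):
    x, y = p
    return np.array([2.0 * x, 200.0 * y])

p = np.array([1.0, 1.0])
rates = np.array([0.1, 0.001])  # a different learning rate per parameter

for _ in range(500):
    # Elementwise product = diagonal preconditioning, not plain gradient descent.
    p = p - rates * grad(p)

print(p)  # both coordinates are driven toward zero
```

With a single shared rate small enough for the stiff `y` direction, progress along `x` would be far slower; the per-parameter rates here are exactly the kind of a priori scaling assumption the comment objects to.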
By: Robert
https://qchu.wordpress.com/2018/02/07/gradient-descent/#comment-8579
Fri, 16 Feb 2018 14:40:14 +0000

There are various "gradient descent strategies" that are not stateless: e.g. momentum. Can you transform an optimization problem in such a way (this obviously requires making the parameter space larger) that momentum becomes SGD?
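For reference, the momentum update the comment refers to (standard "heavy ball" momentum) carries a velocity vector as extra state. A minimal sketch, with an invented one-dimensional objective and hyperparameters:

```python
import numpy as np

def grad(p):
    return 2.0 * p  # gradient of f(p) = p^2

p = np.array([1.0])
v = np.zeros_like(p)   # the extra state that makes momentum non-stateless
beta, lr = 0.9, 0.05

for _ in range(200):
    v = beta * v - lr * grad(p)  # velocity accumulates a decaying sum of past gradients
    p = p + v

print(p)  # converges toward the minimizer at 0
```

Viewed on the enlarged space of pairs (p, v), each step is a fixed stateless map, but that map is not in general the gradient step of any function of (p, v), which is what makes the question nontrivial.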
By: Qiaochu Yuan
https://qchu.wordpress.com/2018/02/07/gradient-descent/#comment-8508
Thu, 08 Feb 2018 17:38:56 +0000

Fixed, thanks!
By: Kram Einsnulldreizwei
https://qchu.wordpress.com/2018/02/07/gradient-descent/#comment-8506
Thu, 08 Feb 2018 11:38:13 +0000

Just a heads up: that final link is broken. You accidentally put the URL twice.