The history dependent adaptive learning rate SGD methods are analyzed and implemented under asynchronized distributed setting. Given the delayed updates, AdaGrad and AdaDelta are reformualted accordingly, and a theoretical proof establies both the O(1/sqrt(T)) convergence rate and the sub-linear regret bound of AdaGrad. Furtuer, proposed methods are tested on MNIST digit classification using Softmax regression and Feedforward Neural Network, and sequence-to-sequence regression using Recurrent Neural Network. Experiments show the general effectiveness of different methods, while AdaGrad is empirically more stable and robust for complex problem.
Epidemic modeling is a fundamental problem in network science. In this work, we solve an optimal resource allocation problem for controlling an epidemic out- break using the SIS epidemic model. Semidefinite and geometric programming approaches are considered for undirected and directed graphs respectively for de- termining the minimum cost of intervention that causes an exponential decay in the outbreak. Our semidefinite programming implementation confirms prior re- sults that immunization cost is strongly correlated with node degree, suggesting that higher degree nodes should be targeted for vaccine distribution first. A dis- tributed ADMM approach is considered for efficiently solving the GP formulation by leveraging local consensus to arrive at a global optimal solution.
Maximum margin clustering is an extension of support vector machine for unsu- pervised clustering. It separates unlabeled data into two approximately equally- sized groups by finding a hyperplane with the largest margin. Although many existing works has shown promising results in computer vision applications, there is no convergence guarantee on the methods due to their non-convex formulations. In this project, we solve this problem directly in non-convex formulation with an algorithm called proximal alternating linearized minimization, which has been proven to have global convergence to stationary points for some non-convex and non-smooth problems. We propose an algorithm to solve the maximum margin clustering problem and we prove that our algorithm enjoys global convergence. Experiment results show that our proposed algorithm runs faster than recent state- of-the-art implementation of maximum margin clustering.
For the linear programming problem of minimum cost flow where the cost for each network edge is a constant, we explored and implemented two classical algo- rithms, negative cycle canceling algorithm and successive shortest path algorithm. To compare the time efficiency of these two algorithms, we tested the performance on different randomly generated graphs. For the more generalized minimum convex cost flow problem which cannot be solved by classical algorithms, we studied the accelerated dual descent (ADD) algorithm proposed by Zargham et al, which is based on approximating the Hessian inverse using matrix splitting techniques. We further implemented this algorithm and applied it to solve a physical problem of resistor network with quadratic costs. The results from ADD algorithm agree with brute force calculation on small testing networks, but it is much more effi- cient and applicable to larger resistor networks where physics based brute force calculation is almost impossible.
Tomography and gas reconstruction problem aim at recovering the density field of an object given measurements from different angles, which is often an integration of texxture properties along the light/x-ray path. Since most of the objects have continuous density in nature, we formulate the problem into a total-variance based regression problem, which is related to a generalized lasso problem. However, our problem is more difficult because there are additional constraints and the matrix in the regression and the penalty terms are often rank deficient. In this project, we solve the problem by (1) adding L-2 regularization term and (2) developing method that avoids matrix inversion operator. Also, since it is not a one to one mapping from dual solution to primal solution, we use a path following algorithm to keep solution on both side feasible. We apply several solvers to synthetic data and discuss the convergence rate and performance of each method.
Probabilistic Soft Logic (PSL [5]) is a highly flexible formulation that, like Markov Logic Networks, brings together probabilistic graphical models and pred- icate logic to allow collectively reasoning about random variables connected by logical rules. PSL relaxes Markov Logic Networks by using soft truth values (con- tinuous values in the interval [0,1]), in such a way that finding the most probable configuration becomes a convex optimization problem. Although the convexity of the problem guarantees convergence to the global minimum, convergence can still be slow for the enormous logical graphs that can be tackled with PSL. The current state-of-the-art method for solving the PSL optimization problem is a con- sensus optimization method that is both fast and distributed and is based on the alternating directions method of multipliers (ADMM) framework. In this project we derive and implement that consensus optimization method, and we further ex- tend it by introducing to it some stochastic optimization ideas, and demonstrate improved empirical performance.
Many solutions to constrained optimization problems require finding the solution to a linear system Ax = b. It is commonly the case that solving a linear system is the most costly step in optimizing the objective function and consequently, much work has been done in speeding up various solvers by exploiting various properties or structures within the matrix. For example, one can use the LU or QR decomposition to solve general linear systems of the form Ax = b. However, if the matrix A satisfies certain properties, then often the computation can be sped up by using a different ma- trix factorization. For example, for positive definite matrices there are solvers using the Cholesky decomposition, or the for more general symmetric matrices there are solvers using the LDLT de- composition. Furthermore, more recent work has focused on optimizing solvers for Ax = b where A is a sparse matrix, resulting in sparse versions of the aforementioned solvers.
Within the context of convex optimization, many LP or QP solvers require solving the system Ax = b where A is a particular kind of quasi semidefinite matrix (??). While some work has been done in this area when A is dense, the case when A is sparse is not as well studied. This immediately raises the following question: Can we exploit the structure on sparse matrices found in LP and QP solvers to get better performance when solving linear systems? Intuitively, one should get better performance than a generic symmetric matrix due to the significant amount of additional structure. However, quasi semidefinite is not as strong as positive definite so we can not always use the Cholesky decomposition. Presumably, one should be able to obtain better better performance than general symmetric solvers, but not faster than positive definite solvers.
Stochastic Gradient Descent (SGD) is the most commonly employed method in deep neural network (DNN) training. However, SGD suffers the issues of lower layer gradient vanish and local curvature. Second-order methods could use local curvature information to scale the gradient update, but it is extremely expensive to calculate and store Hessian for DNN. Hessian-free method could solve the second order update step without estimating Hessian explicitly. In this project, we aims at implementing Hessian-free training for DNN and comparing it with SGD on classification task.