We define the group lasso estimator for the natural parameters of the exponential families of distributions representing hierarchical log-linear models under the multinomial sampling scheme. This estimator arises as the unique solution of a convex penalized likelihood program with a group lasso penalty. We illustrate how an estimator of the underlying log-linear model can be constructed in a straightforward way from the blocks of non-zero coefficients recovered by the group lasso procedure.
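As an illustration only, the penalized program can be sketched for a two-way 3×3 table as the average negative multinomial log-likelihood plus a penalty λ Σ_g √|g| ‖θ_g‖₂ over the coefficient blocks. Several choices here are assumptions, not taken from the abstract: the dummy (baseline) coding of the main-effect and interaction blocks, the √|g| group weights, the proximal-gradient solver (one standard way to solve a group lasso program, not necessarily the authors' procedure), and the hypothetical helper names `design_3x3`, `fit_group_lasso` and `selected_groups`.

```python
import itertools

import numpy as np


def design_3x3():
    """Hypothetical dummy-coded design for a 3x3 table (level 0 is the baseline).

    Columns 0-1 hold the main effect of A, 2-3 the main effect of B and
    4-7 the A:B interaction. There is no intercept column because, under
    multinomial sampling, the normalizing constant absorbs it.
    """
    rows = []
    for a, b in itertools.product(range(3), range(3)):
        ea = [float(a == 1), float(a == 2)]
        eb = [float(b == 1), float(b == 2)]
        rows.append(ea + eb + [x * y for x in ea for y in eb])
    groups = {"A": [0, 1], "B": [2, 3], "A:B": [4, 5, 6, 7]}
    return np.array(rows), groups


def neg_loglik(theta, X, counts):
    """Average negative multinomial log-likelihood, up to an additive constant."""
    eta = X @ theta
    m = eta.max()
    log_norm = m + np.log(np.exp(eta - m).sum())  # stable log-sum-exp
    return -(counts @ eta) / counts.sum() + log_norm


def fit_group_lasso(X, counts, groups, lam, n_iter=5000):
    """Proximal-gradient sketch (assumed solver) for the log-linear group lasso."""
    theta = np.zeros(X.shape[1])
    freqs = counts / counts.sum()
    # The Hessian X^T (diag(p) - p p^T) X is bounded by X^T X, so 1/||X||_2^2 is a safe step.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        eta = X @ theta
        p = np.exp(eta - eta.max())
        p /= p.sum()  # cell probabilities implied by the current natural parameters
        v = theta - step * (X.T @ (p - freqs))  # gradient step on the likelihood term
        for idx in groups.values():
            # Group soft-thresholding: shrinks a whole block, or sets it exactly to zero.
            norm = np.linalg.norm(v[idx])
            thresh = step * lam * np.sqrt(len(idx))
            v[idx] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * v[idx]
        theta = v
    return theta


def selected_groups(theta, groups, tol=1e-10):
    """Names of the interaction blocks whose coefficients are non-zero."""
    return [name for name, idx in groups.items() if np.linalg.norm(theta[idx]) > tol]


if __name__ == "__main__":
    X, groups = design_3x3()
    # Hypothetical counts with a strong diagonal (A = B) association.
    counts = np.array([50, 1, 1, 1, 50, 1, 1, 1, 50], dtype=float)
    for lam in (0.01, 100.0):
        theta_hat = fit_group_lasso(X, counts, groups, lam)
        print(lam, selected_groups(theta_hat, groups))
```

For these hypothetical counts, a very large λ zeroes every block, while λ = 0.01 keeps the A:B block, so the selected model contains the interaction. Proximal gradient is used only because group soft-thresholding yields exactly zero blocks, which is what lets the non-zero interactions be read off directly.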
We investigate the asymptotic properties of the group lasso estimator and of the associated model selection criterion in a double-asymptotic framework, in which both the sample size and the model complexity grow simultaneously. We provide conditions guaranteeing that the group lasso estimator is norm consistent and that group lasso model selection is consistent, in the sense that, with overwhelming probability as the sample size increases, it correctly identifies all the sets of non-zero interactions among the variables. Provided the sequence of true underlying models is sparse enough, recovery is possible even if the number of cells grows larger than the sample size. Finally, we derive some central-limit-type results for the log-linear group lasso estimator.