On my To Do list from Wednesday.
(1) Find a network clustering algorithm. Ask Andrew for one.
Update: Using Mclust instead, but I have to deal with a warning because of the dimension-to-sample-size ratio. It uses BIC to choose the number of clusters, which is better than what I was doing before.
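Mclust’s BIC-based selection can be sketched in Python with scikit-learn’s GaussianMixture – only a rough analogue (Mclust also searches over covariance parameterizations), and the two-blob data here is a hypothetical stand-in for whatever is actually being clustered:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical stand-in data: two well-separated Gaussian blobs in 2-D.
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(3, 0.3, (50, 2))])

# Fit mixtures with 1..5 components and keep the BIC-best one,
# roughly what Mclust does under the hood.
fits = [GaussianMixture(n_components=k, random_state=0).fit(X)
        for k in range(1, 6)]
best = min(fits, key=lambda m: m.bic(X))
print(best.n_components)
```

The appeal over an elbow heuristic is that BIC trades fit against the number of parameters automatically, so no plot-squinting is needed.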
Got one – it’s not great, but it uses hierarchical clustering and a plot to choose the number of clusters. I wrote a function to “choose” the optimal number of clusters based on slopes (to find the “elbow”). It’s slow, but usable for now.
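My actual function isn’t shown here, but the slope idea – cluster hierarchically, then pick k where the merge height drops most sharply – can be sketched like this (toy data and the `elbow` helper are hypothetical illustrations, not the real code):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(1)
# Hypothetical data: three well-separated blobs.
X = np.vstack([rng.normal((0, 0), 0.2, (30, 2)),
               rng.normal((4, 4), 0.2, (30, 2)),
               rng.normal((0, 4), 0.2, (30, 2))])

# Ward linkage; column 2 holds the merge heights.
heights = linkage(X, method="ward")[:, 2]

def elbow(heights, max_k=10):
    """Pick the number of clusters at the 'elbow': where the
    merge height drops most sharply (the steepest slope)."""
    h = heights[::-1][:max_k]   # largest merge heights first
    gaps = -np.diff(h)          # drop between successive heights
    # Keeping k clusters forgoes the top k-1 merges; the biggest
    # drop after the (k-1)-th largest merge marks the elbow.
    return int(np.argmax(gaps)) + 2

print(elbow(heights))
```

This is O(n²) or worse in the linkage step, which matches the “slow but usable” experience.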
(2) Incorporate rho into the MCMC by using the estimate in the Ado paper. Not the MLE, but the “dumb” estimate.
Yes, it was another headache, but it’s done. Will post about that below.
(3) Run clustering algorithm on original Y, newY from model with rho, newY from model without rho. Compare plots too.
I realized that I was making my life difficult. Instead of using the estimates for the sender and receiver parameters, I was regenerating them from the thetas. So in fact things really aren’t *that* bad.
Here are networks generated with and without rho. We can see that using rho we do much better.
Also reconstruct the adjacency matrices like Airoldi did.
Look how nice! And both seem fine. It’s odd, then, that the other plots were weird. Looking into this now…
Original vs 2 threshold plots – with rho
Original vs 2 threshold plots – no rho
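The threshold comparison behind those plots can be sketched as follows – here `P` (posterior tie probabilities), `Y`, and the agreement score are all hypothetical stand-ins, not the actual model output:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
# Hypothetical posterior mean tie probabilities P (n x n) and an
# observed binary network Y drawn from them.
P = rng.uniform(size=(n, n))
Y = (rng.uniform(size=(n, n)) < P).astype(int)

def reconstruct(P, threshold):
    """Binarize posterior tie probabilities at a threshold."""
    return (P > threshold).astype(int)

# Compare the observed network against reconstructions at two
# thresholds, e.g. via the fraction of dyads that agree.
for t in (0.3, 0.5):
    newY = reconstruct(P, t)
    agreement = (newY == Y).mean()
    print(f"threshold {t}: agreement {agreement:.2f}")
```

Plotting `Y`, `reconstruct(P, 0.3)`, and `reconstruct(P, 0.5)` side by side as heatmaps gives the original-vs-two-thresholds picture.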
(4) How well do these models fit? How can we tell? See 5 and 6.
(5) Posterior Predictive Checks: simulate theta from the posterior (i.e., step i in the MCMC), then generate a new Y using the sampling distribution. Calculate T(newY) for each step – this is the null distribution of T. Compare with T(Y).
Three values of T: density, density of outside-group ties, and number of networks. Upon first glance things look GOOD, but the number-of-networks statistic was still being computed, so I’m running that now… and it’s terribly slow. I’ll post plots tomorrow.
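The check in (5) can be sketched as below – only the density statistic is shown, and the “posterior draws” are faked by jittering a known `P_true` rather than coming from a real MCMC run:

```python
import numpy as np

rng = np.random.default_rng(3)
n, S = 30, 200

# Hypothetical setup: a true tie-probability matrix and an
# observed network Y drawn from it.
P_true = rng.uniform(0.05, 0.4, size=(n, n))
Y = (rng.uniform(size=(n, n)) < P_true).astype(int)

def density(Y):
    """T(Y): overall tie density, ignoring the diagonal."""
    off = ~np.eye(len(Y), dtype=bool)
    return Y[off].mean()

# Null distribution of T: one simulated network per "posterior draw"
# (here, a jittered copy of P_true standing in for a real MCMC draw).
T_null = []
for _ in range(S):
    P_s = np.clip(P_true + rng.normal(0, 0.02, size=(n, n)), 0, 1)
    newY = (rng.uniform(size=(n, n)) < P_s).astype(int)
    T_null.append(density(newY))
T_null = np.asarray(T_null)

# Posterior predictive p-value: where T(Y) falls in the null.
ppp = (T_null >= density(Y)).mean()
print(round(ppp, 2))
```

An extreme `ppp` (near 0 or 1) flags that the model cannot reproduce that feature of the observed network; the other two statistics slot into the same loop.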
Posterior Predictive Measures – No Rho
*******
Putting analyses into the JEBS paper now that Brian is finished with his edits. I’m happy to be almost done with that paper (or at least until it gets returned with a revise-and-resubmit, fingers crossed). I’m just redoing some of the plots now; tomorrow I plan on dropping them into the LaTeX file, rewriting some of the analyses, and proofing the entire thing.
Also on board for this week is finally coding the HMMSBM. But in light of all of this, the question remains: where should the intervention coefficient lie? Perhaps the idea is to go from Dirichlet(0.2, 0.2, 0.2, 0.2) to Dirichlet(1000, 1, 1, 1)? Maybe – we’ll see what happens.
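A quick simulation shows what that parameter change would do to membership vectors (the two alpha settings are the hypothetical ones from the question above):

```python
import numpy as np

rng = np.random.default_rng(4)

# Membership draws under the two candidate intervention settings.
before = rng.dirichlet([0.2, 0.2, 0.2, 0.2], size=1000)
after = rng.dirichlet([1000, 1, 1, 1], size=1000)

# alpha = (0.2, ..., 0.2) is sparse and symmetric: each draw piles
# its mass on one random block.
# alpha = (1000, 1, 1, 1) pins nearly all mass on the first block.
print(after[:, 0].mean())   # close to 1000/1003
```

So the intervention would amount to moving nodes from “mostly in some block” to “almost certainly in block 1,” which is one concrete way to encode it.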
And then we’ll be able to start writing the MMSBM paper. I just need to keep the momentum for a few more weeks (like 5-6) and then I can breathe for the 2 full weeks that I’ll be taking off.