Posted on Thursday, 26th January 2012

Please read sections 4.3 and 5.4-5.5, and post a comment.
As always, students without a strong math background may skim
the more technical material and try to focus on the concepts.

Posted in Class | Comments (18)

  1. Rob Rasmussen Says:

    I am still unsure from the details section on pg 111 as to why the function psi in the KL divergence is necessarily logarithmic.

  2. yid8 Says:

    In experiments, are there examples of real data that follow a beta distribution?

  3. Shubham Debnath Says:

    For the definition of entropy, I was wondering why base 2…I’ve always used natural log. I’ve often seen Boltzmann’s constant as well.

    Is there any use of Gibbs free energy or enthalpy in statistics?
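
    A quick numerical illustration (just a sketch, assuming Python with NumPy) of why the base is only a matter of convention: base 2 gives entropy in bits, the natural log gives nats, and the two differ by the constant factor ln 2. Boltzmann's constant plays a similar unit-setting role in the thermodynamic formula.

        import numpy as np

        p = np.array([0.5, 0.25, 0.125, 0.125])   # an arbitrary discrete distribution

        H_bits = -np.sum(p * np.log2(p))   # entropy with base-2 logs (bits)
        H_nats = -np.sum(p * np.log(p))    # entropy with natural logs (nats)

        print(H_bits)              # 1.75
        print(H_nats / np.log(2))  # the same 1.75: dividing by ln 2 converts nats to bits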

  4. Eric VanEpps Says:

    I appreciated the section on mutual information as being a way to describe how uncertainty about one variable can be reduced by knowing about another variable in ways other than the linear relationship described by correlations. That said, I don’t know how I would talk about mutual information in an experimental setting. What statistics are used to refer to mutual information? Is there a standard for what level of mutual information is “enough” to be meaningful?
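
    For concreteness, a minimal sketch (assuming NumPy; the joint distribution below is invented) of what mutual information measures for a discrete pair: it is zero exactly under independence and is reported in bits when base-2 logs are used. In practice it is usually estimated from a binned contingency table, and whether a given value is meaningful is often judged against a shuffle-based null distribution rather than a fixed threshold.

        import numpy as np

        # a hypothetical joint distribution of two binary variables X and Y
        p_xy = np.array([[0.4, 0.1],
                         [0.1, 0.4]])

        p_x = p_xy.sum(axis=1)   # marginal distribution of X
        p_y = p_xy.sum(axis=0)   # marginal distribution of Y

        # I(X;Y) = sum over (x,y) of p(x,y) * log2[ p(x,y) / (p(x)p(y)) ]
        mi = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))
        print(mi)   # about 0.28 bits; it would be exactly 0 under independence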

  5. Sharlene Flesher Says:

    Is the spectral decomposition mentioned in section 4.3.1 (A = PDP^T) the same as projecting the data from matrix A onto the eigenvector?
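
    The two operations are related but not identical. A minimal sketch (assuming NumPy; the data are simulated): the spectral decomposition factors the covariance matrix itself, while projecting the data onto the eigenvectors is a separate step that yields uncorrelated components whose variances are the diagonal entries of D.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 2.0]], size=5000)

        S = np.cov(X, rowvar=False)   # sample covariance matrix of the data
        D, P = np.linalg.eigh(S)      # spectral decomposition: S = P diag(D) P^T

        print(np.allclose(S, P @ np.diag(D) @ P.T))   # True: this factors S itself

        Z = X @ P                        # separately, project the data onto the eigenvectors
        print(np.cov(Z, rowvar=False))   # diag(D) up to rounding: uncorrelated components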

  6. Ben Dichter Says:

    I have never seen the Beta distribution before. I don’t quite understand what it looks like and when it might be used. Could you give a concrete example?
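
    One concrete use, sketched under assumptions (SciPy available; the counts are invented): because the Beta distribution lives on [0, 1], it is a natural model for a probability itself, for example a neuron's response probability or a subject's success rate. Starting from a flat Beta(1, 1) prior and observing 7 successes in 10 trials gives a Beta(8, 4) posterior.

        from scipy import stats

        posterior = stats.beta(1 + 7, 1 + 3)   # Beta(8, 4) after 7 successes, 3 failures
        print(posterior.mean())                # 8/12, about 0.67
        print(posterior.interval(0.95))        # roughly (0.39, 0.89)

        # the shape depends on the parameters: Beta(1, 1) is flat on [0, 1],
        # Beta(0.5, 0.5) is U-shaped, and Beta(8, 4) is unimodal with mode 0.7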

  7. dzhou Says:

    Can you talk about spectral decomposition? What is an example of data for which you would analyze the variance matrix using spectral decomposition rather than other methods?

  8. mpanico Says:

    I think PDF graphs describing all of the new distributions would have helped me to understand their use more intuitively.

  9. Matt Bauman Says:

    In the KL discrepancy, it seems strange to me that you’re evaluating the deviation between two pdfs by using some random variable X from some third, unknown, distribution. I would think that the value of D_KL(f,g) would depend on your choice of X. What am I missing? … (reading on) … Ah! In your illustration of two normal distributions, it appears as though the pdf of X should be f. Is that correct?
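
    In the standard definition the expectation is indeed taken with X distributed according to the first argument f, which is also why D_KL(f, g) and D_KL(g, f) need not agree. A quick numerical check (a sketch, assuming SciPy):

        from scipy import stats
        from scipy.integrate import quad

        f = stats.norm(0, 1)   # the distribution X actually follows
        g = stats.norm(1, 1)   # the distribution f is being compared against

        # D_KL(f, g) = integral of f(x) * [log f(x) - log g(x)] dx, an expectation under f
        kl_fg = quad(lambda x: f.pdf(x) * (f.logpdf(x) - g.logpdf(x)), -10, 10)[0]
        kl_gf = quad(lambda x: g.pdf(x) * (g.logpdf(x) - f.logpdf(x)), -10, 10)[0]

        print(kl_fg)   # 0.5, matching (mu_f - mu_g)^2 / (2 sigma^2) for equal variances
        print(kl_gf)   # also 0.5 here, but the two differ once the variances differ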

  10. suchitra Says:

    Could you please go over the section where you talk about using spectral decomposition to analyze a covariance matrix?

  11. amarkey Says:

    I had thought the geometric distribution also had the memoryless property – is this the case? Also, I’m curious: does the geometric distribution apply to any neural phenomena?
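
    The geometric distribution does satisfy the discrete analogue of the memoryless property, P(X > m + n | X > m) = P(X > n). A quick check (a sketch, assuming SciPy), where X might stand for something like the number of time bins until the next spike in a Bernoulli spike-train model:

        from scipy import stats

        X = stats.geom(0.3)   # number of trials until the first success
        m, n = 4, 6

        lhs = X.sf(m + n) / X.sf(m)   # P(X > m+n | X > m); sf is the survival function
        rhs = X.sf(n)                 # P(X > n)
        print(lhs, rhs)               # equal: the geometric distribution is memoryless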

  12. skennedy Says:

    I don’t follow the logic at the top of page 120 for the proof in 4.3.4. Don’t we have to specify which x (x1 or x2) is classified to f(x) to minimize the error, instead of a generic x? f(x2) > g(x2), so does that mean that for x2 we should always classify x2 as f(x)? What about x1? Do we specify x1 as g(x) since g(x1) > f(x1)?
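
    A small sketch of the rule in question, assuming it is: classify an observed x to f whenever f(x) >= g(x) (two normal densities are used here purely for illustration, with SciPy assumed). The rule is applied pointwise, so different observed values can receive different labels.

        from scipy import stats

        f = stats.norm(0, 1)
        g = stats.norm(3, 1)

        def classify(x):
            # assign x to whichever density is larger at that point (equal priors)
            return "f" if f.pdf(x) >= g.pdf(x) else "g"

        for x in [-1.0, 1.0, 1.4, 1.6, 4.0]:
            print(x, classify(x))   # boundary at x = 1.5, where the two densities cross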

  13. Thomas Kraynak Says:

    The KL discrepancy seems like a very useful tool for considering differences in distributions. I’m not sure exactly what are the requirements that the distributions must fit in order to apply this, could you explain them in more depth?

  14. suchitra Says:

    Could you also go over how the inverse Gaussian distribution relates to the integrate-and-fire neuron model?
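
    A small simulation sketch of the connection (assuming NumPy; all parameters are invented): in a noisy integrate-and-fire model the membrane voltage drifts toward threshold with Gaussian noise, and the first time it crosses the threshold, i.e. the interspike interval, follows an inverse Gaussian distribution whose mean is threshold divided by drift.

        import numpy as np

        rng = np.random.default_rng(1)
        dt, drift, sigma, threshold = 0.001, 1.0, 0.5, 1.0   # made-up parameters

        def first_passage_time():
            v, t = 0.0, 0.0
            while v < threshold:   # accumulate noisy input until threshold is reached
                v += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
                t += dt
            return t               # the time of the "spike"

        isis = np.array([first_passage_time() for _ in range(2000)])
        print(isis.mean())   # close to threshold / drift = 1.0, the inverse Gaussian mean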

  15. Rich Says:

    In the description of the “memoryless” property – considering the probability that a channel will remain open for the interval h – I’m not sure why the length of time a channel will remain open starting at time t must be greater than t (X > t). It would make more sense to me, given the discussion, that X > h, so that P(X > t+h | X > t) = P(X > h). Is the X > t requirement a typo?
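
    For the exponential case, the conditioning event X > t simply says the channel is still open at time t (X being its total open duration); given that, the chance of surviving an additional h is P(X > t + h | X > t) = P(X > t + h) / P(X > t) = P(X > h), whatever t is. A quick check (a sketch, assuming SciPy):

        from scipy import stats

        X = stats.expon(scale=2.0)   # total open duration, exponential with mean 2
        h = 0.5

        for t in [0.1, 1.0, 5.0]:
            # condition on X > t, i.e. the channel is still open at time t
            print(X.sf(t + h) / X.sf(t), X.sf(h))   # both equal P(X > h), for every t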

  16. nsnyder Says:

    Using the Kullback-Leibler discrepancy to quantify the dependence between two random vectors, it seems that there is a requirement that the variances be the same. Is this true? Or is that only the case for the two normal distributions used in the illustration?

  17. kmatula Says:

    Section 5.4.7 discusses the degrees of freedom needed for the t and F distributions to be approximately normal (12 is specified for t, but for F it just says when F is large). I remember from learning the Central Limit Theorem that an experiment with over 30 observations (29 degrees of freedom) can be considered normal. In practice, does this mean that t-tests of experiments with between 12 and 30 participants would lead to similarly viable results (because the t distribution is approximately normal) even though the experiment had not yet reached the sample size the CLT rule of thumb calls for? If so, does this fact have any practical significance for doing hypothesis tests in experiments with small n?
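
    For a sense of scale, a quick comparison (a sketch, assuming SciPy) of the two-sided 5% critical values, which is where the difference between the t and normal distributions shows up in a test: at 12 degrees of freedom the t critical value is still about 2.18 rather than 1.96, which is why small-sample t-tests use the t quantiles directly rather than a normal approximation.

        from scipy import stats

        print(stats.norm.ppf(0.975))           # 1.96, the normal critical value
        for df in [5, 12, 29, 100]:
            print(df, stats.t.ppf(0.975, df))  # about 2.57, 2.18, 2.05, 1.98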

  18. rexntien Says:

    On page 114 at the top, shouldn’t we say that information about Y is associated with X whenever abs(rho) > 0?

    We say that Bayes’ classifiers have the lowest expected number of misclassifications. However, I see other classifiers such as LDA being used. Why might we prefer something other than an “optimal” estimator?
