Adrian Dobra
Considerable effort has been devoted to developing sound procedures for assessing the size of the World Wide Web. The problem is compounded by the fact that sampling directly from the Web is not possible. Several groups of researchers have devised sampling schemes that consist of running a number of queries on several major search engines. Although the resulting datasets are of the best quality currently attainable, the methods the experimenters employed to analyze them are not satisfactory. In this paper we present new approaches to analyzing datasets collected by query-based sampling, approaches founded on a hierarchical Bayes formulation of the Rasch model. We show that our procedures respect the real-world constraints of the problem and consequently allow us to make more credible inferences.
Keywords: World Wide Web evaluation; Clustering; Contingency tables; Rasch model; Markov chain Monte Carlo methods.
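By way of illustration, here is a minimal sketch of the kind of hierarchical Bayes Rasch formulation the abstract refers to; the notation, priors, and interpretation below are assumptions for exposition, not taken from the paper. Let $y_{ij} \in \{0,1\}$ indicate whether Web page $i$ is indexed by search engine $j$. The Rasch model posits

\[
\Pr(y_{ij} = 1 \mid \theta_i, \beta_j) = \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)},
\]

where $\theta_i$ measures how easily page $i$ is captured and $\beta_j$ measures how selective engine $j$'s coverage is. A hierarchical Bayes layer, say

\[
\theta_i \mid \mu, \sigma^2 \sim N(\mu, \sigma^2), \qquad \mu \sim N(\mu_0, \tau_0^2), \qquad \sigma^2 \sim \text{Inverse-Gamma}(a_0, b_0),
\]

pools information across pages, with posterior inference carried out by Markov chain Monte Carlo, consistent with the keywords above. In the Web-size setting, the capture histories across $k$ engines form a $2^k$ contingency table whose all-zero cell (pages found by no engine) is unobserved, and the total size of the Web is estimated from the posterior distribution of that missing cell.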