Department of Statistics
Dietrich College of Humanities and Social Sciences

How Big is the World Wide Web?

Publication Date

August 2000

Publication Type

Tech Report

Author(s)

Adrian Dobra

Abstract

Considerable effort has been devoted to developing sound procedures for assessing the size of the World Wide Web. The problem is compounded by the fact that sampling directly from the Web is not possible. Several groups of researchers have devised sampling schemes that consist of running a number of queries on several major search engines. Although the resulting datasets are about as good as can be obtained, the methods the experimenters used to analyze them are not satisfactory. In this paper we present new approaches to analyzing datasets collected by query-based sampling, approaches founded on a hierarchical Bayes formulation of the Rasch model. We show that our procedures respect real-world constraints and consequently allow us to make more credible inferences.