Xiao Hui Tai

I am a Ph.D. student in the Department of Statistics and Data Science at Carnegie Mellon University. My research develops methods for comparing unstructured data, applied to forensics and cybercrime.

I am originally from Singapore, where I was previously a government statistician at the Singapore Department of Statistics, part of the Ministry of Trade and Industry.


My thesis, Forensic Data Matching Problems, is advised by Bill Eddy. Briefly, I draw a link between matching problems in forensics and record linkage in statistics and computer science. I then tackle two problems in detail: predicting whether pairs of cartridge cases were fired from the same gun, and matching seller accounts on anonymous marketplaces. Work on the second problem is joint with Nicolas Christin’s group at CyLab, CMU’s university-wide security institute.

My thesis document is here, and slides are here.

Published/Accepted Papers

Xiao Hui Tai, Kyle Soska and Nicolas Christin. Adversarial Matching of Dark Net Market Vendor Accounts. 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), to appear, 2019. [Code] [Video]

Xiao Hui Tai. Record Linkage and Matching Problems in Forensics. IEEE 18th International Conference on Data Mining Workshops (ICDMW), 2018.

Xiao Hui Tai and William F. Eddy. A Fully Automatic Method for Comparing Cartridge Case Images. Journal of Forensic Sciences, 2018. [Code]


Xiao Hui Tai and Kayla Frisoli. Benchmarking Minimax Linkage.

Book Chapters and Magazine Articles

Xiao Hui Tai. Firearms: casings. In Open Forensic Science in R, 2019.

Carriquiry, A., Hofmann, H., Tai, X. H. and VanderPlas, S. Machine learning in forensic applications. Significance, 2019.

Working Papers

Robin Mejia, Xiao Hui Tai, Jay Kadane, Anjali Mazumder, and Bill Eddy. US Forensic Databases: Data, organization, and uses in the fields of ballistics, fingerprints, and DNA.


Xiao Hui Tai
Ph.D. Student
Department of Statistics and Data Science
Carnegie Mellon University
FMS 323
Pittsburgh, PA

Last updated: 2019-06-14.