A Language of Life: Characterizing People using Cell Phone Tracks
Abstract
Mobile devices can produce continuous streams of data which are often specific to the person carrying them. We show that cell phone tracks from the MIT Reality dataset can be used to reliably characterize individual people. We treat each person’s data as a separate language and build a standard n-gram language model for each “author.” We then compute the perplexity of an unlabelled sample under each person’s language model and assign the sample to the user whose model yields the lowest perplexity. This technique achieves 85% precision and can also be used for clustering. We further show how language models can be used for predicting movement and propose metrics to measure the accuracy of the predictions. Finally, we develop an alternative method for identifying individuals by counting the subsequences in a sample which are unique to their authors: we build a generalized suffix tree of the training set and count each subsequence of a sample that is unique to some person as evidence that this person is the author. We present the identification and prediction as a part of a HUMBLE human behavior modelling framework, outline general modelling goals, and show how our methods help. Our results suggest that people’s medium-scale movement patterns, at the granularity of cell tower footprints, can be used to characterize individuals.
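The perplexity-based identification step can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a bigram model (n = 2) with add-one smoothing, and the tower labels, the toy tracks, and the function names `train_bigram`, `perplexity`, and `identify` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def train_bigram(seq):
    """Count bigrams and their left contexts in one person's tower sequence."""
    contexts = Counter(seq[:-1])          # how often each tower precedes another
    bigrams = Counter(zip(seq, seq[1:]))  # how often each transition occurs
    return contexts, bigrams

def perplexity(model, sample, vocab_size):
    """Perplexity of a sample under an add-one smoothed bigram model (assumed smoothing)."""
    contexts, bigrams = model
    pairs = list(zip(sample, sample[1:]))
    if not pairs:
        raise ValueError("sample must contain at least two observations")
    log_prob = 0.0
    for prev, cur in pairs:
        # Add-one smoothing keeps unseen transitions from getting zero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))

def identify(models, sample, vocab_size):
    """Assign the sample to the person whose model yields the lowest perplexity."""
    return min(models, key=lambda person: perplexity(models[person], sample, vocab_size))

# Toy training tracks; the labels stand in for cell tower identifiers.
tracks = {
    "alice": ["home", "cafe", "lab", "cafe", "home", "cafe", "lab", "home"],
    "bob": ["dorm", "gym", "lab", "gym", "dorm", "gym", "dorm", "lab"],
}
vocab = {tower for seq in tracks.values() for tower in seq}
models = {person: train_bigram(seq) for person, seq in tracks.items()}

sample = ["home", "cafe", "lab"]
guess = identify(models, sample, len(vocab))
print(guess)
```

In this toy setting the sample is attributed to "alice", because both of its transitions appear in her training track and none appear in the other track. A real experiment would use longer sequences, a higher-order model, and a smoothing method chosen for the data.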