Our high-level combined methodology is summarized in the figure to the right. We begin
by working with migration researchers to understand the broad factors people consider when
deciding whether or not to migrate (Step 1).
Our next step is to identify data sources for
relevant variables (Step 2).
Unfortunately, data collection and survey research are dangerous
and challenging in conflict zones and regions with high poverty or political instability. To fill this
gap, we identify alternative sources of data that can serve as indirect indicators of movement
for variables that cannot reasonably be collected directly (Step 3).
Specifically, we focus on using
social media and newspaper data sources to fill this void. Step 4 focuses on determining which
conversation topics/signals are reasonable indirect indicators for specific conflict situations. For
example, is conversation about violence in Iraq a reasonable proxy for death counts? Does a
relationship between the two exist? This step helps us map our indirect signals to the variables
that have been identified as theoretically important or as proxies for gaps in the data. This in
turn makes our modeling approach useful both for prediction and for understanding relationships
among drivers of movement.
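One simple way to check whether a conversation signal tracks a measured variable (the Step 4 question) is to correlate the two series at several time lags. The sketch below is only an illustration of that idea, not the procedure used in this work: the function names `pearson` and `lagged_correlations`, the monthly granularity, and all of the numbers are assumptions made up for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlations(signal, target, max_lag):
    """Correlate signal[t] with target[t + lag] for lag = 0..max_lag."""
    results = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` signal points and the first `lag` target points
        # so that each signal value is paired with the target `lag` steps later.
        shifted_signal = signal[: len(signal) - lag]
        shifted_target = target[lag:]
        results[lag] = pearson(shifted_signal, shifted_target)
    return results


# Hypothetical monthly values, invented purely for illustration.
violence_mentions = [120, 340, 310, 150, 90, 400, 380, 200]
death_counts = [15, 20, 52, 47, 25, 14, 60, 58]

correlations = lagged_correlations(violence_mentions, death_counts, max_lag=2)
# Lag with the strongest (positive or negative) linear relationship.
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
```

With these invented numbers the strongest correlation falls at a one-month lag. A strong correlation alone does not establish that a signal is a valid proxy, so a check like this would be a first screen before expert review.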
Next, we consider different models for generating movement
predictions (Step 5).
Because we have spatial data and temporal data at different time
resolutions and spatial scales, we need models that can handle this variation effectively. Finally,
we must validate our results using any ground-truth data that can be obtained or through
manual validation (Step 6).
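Before any model can combine sources recorded at different time resolutions, the series need to be brought onto a common time step. The sketch below shows one minimal way to do that for the temporal dimension only, by summing daily observations into months and keeping the months both series share. It is an assumption-laden illustration, not the modeling approach described here: the helper names `aggregate_to_months` and `align`, the choice of summation, and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_to_months(daily_values):
    """Sum (date, value) observations into {(year, month): total}."""
    monthly = defaultdict(float)
    for day, value in daily_values:
        monthly[(day.year, day.month)] += value
    return dict(monthly)


def align(monthly_signal, monthly_target):
    """Keep only months present in both series, in chronological order."""
    shared = sorted(set(monthly_signal) & set(monthly_target))
    return [(month, monthly_signal[month], monthly_target[month]) for month in shared]


# Hypothetical inputs, invented purely for illustration:
# a daily social media signal and a monthly movement count.
daily_mentions = [
    (date(2016, 1, 3), 40), (date(2016, 1, 17), 60),
    (date(2016, 2, 5), 25), (date(2016, 2, 20), 35),
    (date(2016, 3, 9), 80),
]
monthly_movement = {(2016, 1): 1200, (2016, 2): 700, (2016, 4): 950}

aligned = align(aggregate_to_months(daily_mentions), monthly_movement)
for month, mentions, movement in aligned:
    print(month, mentions, movement)
```

Summing is only one possible aggregation; averages or maxima may suit other signals better, and spatial scales would need an analogous mapping step (for example, from cities to provinces).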
Singh, L., Wahedi, L., Wang, Y., Kirov, C., Wei, Y., Martin, S., Donato, K., Liu, Y., and Kawintiranon, K. (2019). Blending Noisy Social Media Signals with Traditional Movement Variables to Predict Forced Migration. ACM International Conference on Knowledge Discovery and Data Mining (KDD), Anchorage, Alaska.