Research Methods

Our high-level combined methodology is summarized in the accompanying figure. We begin by working with migration researchers to understand the broad factors people consider when deciding whether to migrate (Step 1).

Our next step is to identify data sources for relevant variables (Step 2).

Unfortunately, data collection and survey research are dangerous and challenging in conflict zones and regions with high poverty or political instability. To fill this gap, we identify alternative sources of data that can serve as indirect indicators of movement for variables that cannot reasonably be collected directly (Step 3).

Specifically, we focus on social media and newspaper data sources to fill this void. Step 4 focuses on determining which conversation topics or signals are reasonable indirect indicators for specific conflict situations. For example, is conversation about violence in Iraq a reasonable proxy for death counts? That is, does a relationship between the two exist? This step helps us map our indirect signals to the variables that have been identified as theoretically important or as proxies for gaps in the data. This in turn makes our modeling approach useful both for prediction and for understanding relationships among drivers of movement.
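One simple way to probe whether such a relationship exists is to correlate the indirect signal with the directly measured variable at several time lags. The sketch below is only an illustration under stated assumptions, not the procedure used in this project: the function names, the weekly granularity, and all of the numbers are hypothetical, and Pearson correlation is just one of many possible measures of association.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlations(signal, target, max_lag):
    """Correlate signal[t] with target[t + lag] for each lag in 0..max_lag."""
    results = {}
    for lag in range(max_lag + 1):
        shifted_target = target[lag:]
        aligned_signal = signal[:len(shifted_target)]
        results[lag] = pearson(aligned_signal, shifted_target)
    return results


# Hypothetical weekly values, invented purely for illustration.
violence_posts = [120, 340, 310, 150, 90, 400, 380, 130]  # posts mentioning violence
death_counts = [10, 12, 35, 30, 14, 9, 41, 37]            # reported deaths

correlations = lagged_correlations(violence_posts, death_counts, max_lag=2)

# The lag with the largest absolute correlation is a candidate lead time.
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
```

With these invented numbers, the association is weak at lag 0 and strong at a one-week lag, which is the kind of pattern that would suggest the conversation signal leads the measured variable. A strong correlation alone does not establish that a signal is a valid proxy; it only flags a candidate for closer inspection.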

Next, we consider different models for generating movement predictions (Step 5).

Because we have spatial and temporal data at different time resolutions and spatial scales, we need models that can handle this variation effectively. Finally, we validate our results using any ground-truth data that can be obtained, or through manual validation (Step 6).
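To make the resolution mismatch concrete, the sketch below aggregates a daily signal to monthly totals so that it can be compared with a variable reported only monthly. It is a minimal illustration under assumptions: the function names and data are hypothetical, summing is only one possible aggregation rule, and the sketch covers the temporal side only, not differences in spatial scale.

```python
from collections import defaultdict
from datetime import date


def aggregate_to_months(daily_counts):
    """Sum (date, count) observations into {(year, month): total}."""
    monthly = defaultdict(int)
    for day, count in daily_counts:
        monthly[(day.year, day.month)] += count
    return dict(monthly)


def align_series(monthly_signal, monthly_target):
    """Keep only months present in both series, in chronological order."""
    shared = sorted(set(monthly_signal) & set(monthly_target))
    return [(m, monthly_signal[m], monthly_target[m]) for m in shared]


# Hypothetical data, invented for illustration only.
daily_posts = [
    (date(2016, 1, 3), 40),
    (date(2016, 1, 20), 60),
    (date(2016, 2, 11), 25),
    (date(2016, 3, 5), 80),
]
monthly_displacement = {(2016, 1): 1500, (2016, 2): 900}

# Each row is ((year, month), signal_total, target_value).
aligned = align_series(aggregate_to_months(daily_posts), monthly_displacement)
```

Months that appear in only one series, such as March in this example, are dropped rather than imputed. Whether to drop, interpolate, or model missing periods explicitly is a design choice that depends on the downstream model.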

[Figure: overview of the combined methodology, Steps 1 through 6]


Singh, L., Wahedi, L., Wang, Y., Kirov, C., Wei, Y., Martin, S., Donato, K., Liu, Y., and Kawintiranon, K. (2019). Blending Noisy Social Media Signals with Traditional Movement Variables to Predict Forced Migration. ACM International Conference on Knowledge Discovery and Data Mining (KDD), Anchorage, Alaska.