By Alia Aghajanian
This post is mainly a response to the surprise and suspicion I encounter when telling colleagues about my fieldwork. Yes, I am an economist, and yes, I have gone to the field!
Most PhD students at IDS spend the better part of the second year of their programme conducting primary research with real research participants. This research can take the form of life history interviews, focus group discussions, participant observation, ethnographic research and so on. So how does a research project that relies on numbers, rather than people, involve fieldwork[1]? Hopefully, this post will shed some light on what happens when an economist goes to the field, by describing the data collection conducted for my PhD thesis.
Two months ago I conducted a household survey amongst residents of a Palestinian refugee camp in Lebanon (Nahr el Bared) who had been displaced to various locations throughout the North of Lebanon and beyond by a conflict that destroyed the camp in 2007. The idea behind the research is to evaluate the consequences of returning home after conflict-induced displacement, in terms of social and economic reintegration. While some qualitative research has shown that formerly displaced persons often face difficulties upon returning home[2], the quantitative literature has yet to address this question in a rigorous manner.
Once I had finessed my research questions, I set to work designing my household survey. Palestinian refugees in Lebanon are registered with the United Nations Relief and Works Agency (UNRWA), and after the conflict and the initiation of the reconstruction of the camp, UNRWA created a database of displaced refugees (including their addresses) which is updated on a quarterly basis. Using this database I was able to create my sample, and I aimed for 600 households – around 10% of the population. In order to guarantee that the sample collected was representative of the geographical distribution of the population under study, I set the sample size in each geographical area equal to 10% of its actual population size.
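This kind of proportional allocation can be sketched in a few lines; the area names and household counts below are purely illustrative, not the actual UNRWA figures:

```python
def allocate_sample(area_populations, percent=10):
    """Give each area a quota equal to `percent`% of its households,
    rounded up, so the sample mirrors the geographical distribution
    of the population under study."""
    # -(-a // b) is ceiling division using only integers
    return {area: -(-n * percent // 100) for area, n in area_populations.items()}

# Illustrative counts for three field sites (not real data)
areas = {"Area A": 2500, "Area B": 2000, "Area C": 1500}
quotas = allocate_sample(areas)
print(quotas)                # per-area interview targets
print(sum(quotas.values()))  # -> 600, i.e. 10% of 6,000 households
```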
After designing the sample I started work on the questionnaire. Many months went into developing this, as I drew on various standardised questionnaires and adapted their questions to suit my context. In particular, I was interested in measuring social capital, and after an extensive literature review I developed a relatively large set of questions aimed at capturing its different elements, from trust to social interactions. Unfortunately, a household survey does not allow the luxury of changing the questionnaire once data collection starts, so many run-throughs with friends, supervisors, and finally members of the data collection team ensured that it was as close to perfect as possible (inevitably, in hindsight there are many questions I wish I had included!).
Planning the logistics of the actual data collection came next. As much as I would have liked to interview all 600 households myself, time would not allow it[3]. So I needed to hire a team that I could trust and, more importantly, that the research participants could trust. In an ideal world, data collectors would be complete strangers to the respondents, to ensure confidentiality and unbiased responses. On the other hand, the refugee camps in which we were conducting research were not always safe environments and could quickly turn hostile; I did not think it a good idea to bring a team of strangers to the field sites and draw unnecessary attention to ourselves. For this reason, data collectors were local residents of the camps in the sample, but were assigned to sectors as far as possible from those they lived in.
Prior to the start of fieldwork we set up meetings with relevant persons in the field-site communities (camp services officers, leaders of political parties, and members of the camps’ popular committees) to obtain the necessary permissions and explain the aim of the household survey. I was questioned on the importance of my research, what new information it could provide, and what policy implications it could have, as a great deal of research has already been conducted on the residents of Nahr el-Bared[4]. Coming just a few days before fieldwork began, these meetings were a reminder of the commitment I was making to the research participants. While no direct policy changes would necessarily result from my research (a point made clear to consenting participants), I committed to understanding how these households had been affected by the conflict, subsequent displacement, and return, and to communicating my findings not only to the academic world, but also to policy makers who are (or should be) concerned with the reintegration of refugees and displaced persons around the world.
With the necessary permissions granted, we arrived in the field with a specific target of households to be interviewed at each field site. But we still needed to ensure that households were selected randomly within these areas. In areas where displaced households were clustered together, data collectors used what is known as the “right-hand-rule”. This means that after each completed interview, the data collector counted eight[5] consecutive households along his/her right-hand path and interviewed the eighth. Whenever data collectors reached an intersection, they took the lane on their right-hand side and completed the route by turning right until they rejoined the main route. In this way all areas of the camp were covered, and every household had an equal probability of being selected. In areas where households were spread apart, however, households were tracked using addresses, phone numbers, and the key contacts of the field supervisors[6].
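The right-hand-rule walk amounts to a systematic sample along the walk path. A minimal sketch, assuming the households are listed in the order the walk encounters them and the first interview is chosen at random within the first interval:

```python
import random

def systematic_selection(walk_order, interval=8, seed=None):
    """Pick every `interval`-th household along the right-hand walk,
    starting from a random household within the first interval, so
    that each household has the same probability of selection."""
    rng = random.Random(seed)
    start = rng.randrange(interval)
    return walk_order[start::interval]

# 40 households in the order the right-hand path passes them (illustrative)
households = [f"hh_{i:02d}" for i in range(40)]
print(systematic_selection(households, interval=8, seed=7))
```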
As it was important that these protocols were followed correctly and consistently, and that data collectors were not interviewing the first household they could find (or worse, sitting under a tree and completing the required number of questionnaires), the field supervisors, coordinators and I conducted random checks on the data collectors and asked them to retrace their path with us.
Data collectors interviewed households using tablet PCs programmed with the structured questionnaire (which in turn required many weeks of testing!). This meant that at the end of each day I was able to extract all the data collected and convert it to an Excel file. To the annoyance of the data collectors, I was then able to check for inconsistencies and clear them up the next morning: for example, the sex of an individual not matching his/her name, or a 50-year-old woman recorded as the mother of a 45-year-old man. If we were not able to resolve these issues together, the data collectors went back to the household for further clarification.
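These nightly consistency checks can be mimicked with a few rules over the extracted data; the record layout and the 13-year minimum mother-child age gap below are illustrative assumptions, not the actual checks:

```python
def consistency_issues(household, min_gap=13):
    """Flag implausible parent-child age gaps for follow-up the next
    morning (e.g. a 50-year-old mother of a 45-year-old man)."""
    members = {m["id"]: m for m in household}
    issues = []
    for m in household:
        mother = members.get(m.get("mother_id"))
        if mother is not None and mother["age"] - m["age"] < min_gap:
            issues.append(
                f"{m['id']}: mother only {mother['age'] - m['age']} years older"
            )
    return issues

family = [
    {"id": "p1", "age": 50, "sex": "F", "mother_id": None},
    {"id": "p2", "age": 45, "sex": "M", "mother_id": "p1"},
]
print(consistency_issues(family))  # flags the implausible 5-year gap
```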
In addition to making data validation easier, there was an all-round consensus among the team that the tablet PCs were a great bonus to the data collection. Many questions in the questionnaire depended on the responses to previous questions, but there was no need for data collectors to navigate these difficult and complex skip codes, as the programme[7] we used applied them automatically. The inevitable errors that occur during data entry were minimised, as responses were only entered once. In addition, I was able to observe the data in real time, which was quite satisfying after a long, hard day of data collection. Rather than finding the tablet PCs intimidating, respondents were initially curious about the tablets themselves, and data collectors found it much easier to introduce themselves and the research project once this attention had been caught. Even during training, data collectors were eager to learn about this new and exciting technology, practising the questionnaire amongst themselves and later at home with their families.
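Skip codes work roughly like this: each question carries a relevance condition, and the software only shows a question when that condition holds. A toy sketch (the question names and conditions are invented, and this is not ODK's actual API):

```python
def next_question(answers):
    """Return the next question to show, given the answers so far.
    Each entry pairs a question name with its relevance condition;
    irrelevant questions are skipped automatically."""
    questionnaire = [
        ("employed",      lambda a: True),
        ("employer_type", lambda a: a.get("employed") == "yes"),
        ("job_search",    lambda a: a.get("employed") == "no"),
    ]
    for name, relevant in questionnaire:
        if name not in answers and relevant(answers):
            return name
    return None  # end of questionnaire

print(next_question({}))                  # -> employed
print(next_question({"employed": "no"}))  # -> job_search (employer_type skipped)
```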
After two weeks of intense data collection, we were done! I now have a dataset of 590[8] households ready to analyse. Because the sample is representative of the larger population, this dataset will allow me to describe that population's demographic characteristics: for example, I will be able to estimate employment rates and education levels, as well as observe community levels of social capital, cohesion, and integration. Collecting a representative and unbiased sample also allows me to draw statistical conclusions, such as whether the average literacy rate is significantly different for females and males. More importantly, I hope to estimate causal relationships and their statistical significance: for example, what is the effect of returning, compared to prolonged displacement, on levels of trust in neighbours, and is this effect significant?
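The female/male literacy comparison mentioned above is a standard two-proportion z-test; a minimal sketch with made-up counts (not results from the survey):

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Z statistic for the difference between two proportions
    (e.g. female vs male literacy rates), using the pooled
    standard error; |z| > 1.96 is significant at the 5% level."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Made-up counts: 420/500 literate females vs 460/500 literate males
z = two_proportion_z(420, 500, 460, 500)
print(round(z, 2), "significant at 5%:", abs(z) > 1.96)
```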
While my sample is relatively large, it can be argued that my research will not have the depth that qualitative research allows. A properly collected large dataset increases representativeness and reduces bias, but more focused qualitative research can provide insights that are not limited to a structured questionnaire. While I acknowledge this trade-off between depth and breadth, I still believe the two methods can complement each other, rather than conveniently ignore or debase one another. In fact, much of the qualitative research on return migration and return after displacement has guided the hypotheses that I will test empirically with the data collected.
Unfortunately, not many economists get to collect their own primary data. Organising a household survey is expensive and time-consuming, and researchers need to think twice before conducting a large one. Is it worth the required resources? Is it worth the time of the respondents? Could an existing dataset answer the research questions just as effectively? Chris Blattman has written an interesting blog post on the questions researchers (especially, I think, quantitative researchers) should ask themselves before going to the field. On the other hand, collecting your own data can be a very rewarding experience, attaching human faces to what can otherwise be just numbers.
[1] In fact, most applied analysis conducted by micro-economists uses secondary datasets that were originally collected in the field. This post gives an example of how such datasets are collected.
[2] Often refugees and internally displaced persons are away for so long that “home” has become a foreign environment. See Black, Richard and Saskia Gent. 2006. "Sustainable Return in Post-Conflict Contexts." International Migration, 44(3), 15-38.
[3] The role that I played during the fieldwork was that of a field supervisor, assigning tasks for each member of the team at the beginning of the day and checking up on the data collection process and resulting data throughout the day - and unfortunately long into the night.
[4] Having said that, during one meeting the relevant person seemed to be more interested in an upcoming Premier League football match than the relevance of my household survey.
[5] This number was calculated by dividing the total number of households in the population by the sample size, and allowing a margin for refusals and unavailable households. Data collectors asked neighbours or knocked directly on the doors of dwellings to enquire whether households were originally from Nahr el Bared camp: Nahr el Bared families live alongside other Palestinian families and, more recently, Syrian refugees, and we did not want non-Nahr el Bared households to inflate the count. If a household was not from Nahr el-Bared, it was not included in the count.
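The arithmetic behind the interval of eight can be sketched as follows; the population of 6,000 households and the 20% refusal margin are illustrative assumptions, chosen to be consistent with the 600-household target being roughly 10% of the population:

```python
import math

def walk_interval(population, target_sample, refusal_margin=0.2):
    """Sampling interval for the right-hand-rule walk: the base
    interval population/target, shrunk by inflating the target to
    allow for refusals and unavailable households."""
    return math.floor(population / (target_sample * (1 + refusal_margin)))

print(walk_interval(6000, 600))  # -> 8
```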
[6] Before heading to the field the supervisors met with various key informants, such as the Palestinian popular committee for relevant camps, the relevant camp services officer, local NGOs who had provided aid to displaced persons, the national Palestinian scouts group, notable media persons, and notables of Palestinian political parties.
[7] I used ODK, an open source programme developed by researchers at the University of Washington.
[8] This is slightly lower than the target because some households could not be traced. While this is a very small “attrition rate”, I will include robustness checks later in my analysis to correct for it.