Deciphering Migrants’ Letters

Félix Krawatzek and Gwendolyn Sasse reflect on the research that informed their essay, “Integration and Identities: The Effects of Time, Migrant Networks, and Political Crises on Germans in the United States.”

Creating a digital corpus of migrant letters from scratch can be a daunting task, even if one starts from excellent primary sources. The collection we work with was started by Walter D. Kamphoefner and Wolfgang Helbich in the 1980s and is housed in the Forschungsbibliothek Gotha, Germany, administered by Ursula Lehmkuhl. At over eight thousand letters, this is probably the single largest collection of migrant letters, and certainly the most comprehensive one of “ordinary” German-speaking migrants. Fortunately, we could rely on typewritten transcripts of the handwritten letters when creating our digital corpus. However, even the typewritten versions posed various challenges when they were transferred into a digital database. At times, historians who worked with a subset of the transcribed letters had annotated them by hand, and this added information on individual pages complicated the digitization of the overall corpus. These and other difficulties ruled out fully automated optical character recognition of the digital image files. Instead, we used specially designed software, Intranda, which let us identify terms with no matches in a dictionary and so speed up the correction process. Nonetheless, irregular fonts, pale ink, hyphens, line breaks, punctuation, and page breaks in the typewritten transcripts made text recognition a challenge. We frequently had to compare them with the original scans of the letters, and this slowed the creation of the digital corpus.
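The dictionary check described above can be illustrated with a minimal Python sketch. It is not Intranda's implementation: the function name `flag_unknown_tokens`, the toy lexicon, and the rule for rejoining words hyphenated across line breaks are all assumptions made for illustration. The idea is simply to list every word in the recognized text that has no dictionary match, so that a human corrector can check it against the scan.

```python
import re


def flag_unknown_tokens(text, lexicon):
    """Return words in OCR output that have no match in the lexicon.

    Hypothetical helper for illustration; words are compared
    case-insensitively and each unknown word is reported once,
    in order of first appearance.
    """
    # Rejoin words that the typescript hyphenated across a line break,
    # e.g. "ge-\nsund" becomes "gesund".
    joined = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Keep runs of letters only (this includes umlauts and ß).
    tokens = re.findall(r"[^\W\d_]+", joined)
    known = {word.lower() for word in lexicon}
    seen = set()
    flagged = []
    for token in tokens:
        key = token.lower()
        if key not in known and key not in seen:
            seen.add(key)
            flagged.append(token)
    return flagged


# Toy example: "Emte" is a plausible misreading of "Ernte" (harvest).
sample = "Lieber Bruder, wir sind ge-\nsund und die Emte war gut."
toy_lexicon = {"lieber", "bruder", "wir", "sind", "gesund",
               "und", "die", "ernte", "war", "gut"}
suspects = flag_unknown_tokens(sample, toy_lexicon)
```

In practice the lexicon would have to accommodate historical and dialect spellings, which is one reason a manual comparison with the scans remains necessary.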

Once we had retrieved and assembled the individual letters and were satisfied that the corpus could be searched adequately, we collected metadata for each letter and the individual letter writers in a database. This step brought new challenges, such as variations in the spelling of names or letter writers changing their names as they became Americanized, imprecise locations (“deep in the woods near Michigan”; “in a cornfield near Delaware”), and changing place names or the same settlement names recurring in different U.S. states. A socioeconomic classification of the writers remains difficult since today’s categories do not provide a close fit for historical experiences and the ways in which they were recorded. For example, one letter writer described himself as a “pianist and factory worker,” while others simply report that they “go to the shop” every day without indicating a specific profession or activity. This part of the database is therefore incomplete, and we continue to ponder how to classify these self-reported professions to best describe the socioeconomic mobility that was communicated across the Atlantic.

The time-consuming preparation of the digital corpus allowed us to look at our sources from different angles, using a blend of text-analytical methods and varying the levels of analysis. This let us balance analyses of the entire corpus with those of individual stories. Our publications in several academic journals in political science, migration studies, and interdisciplinary studies trace the path of our thematic exploration of the corpus to date. This step-by-step process reinforced our commitment to co-author a monograph that will move more smoothly between different levels of analysis and integrate the details of individual migration experiences with patterns characteristic of the overall corpus.

Initially, we expected to use topic modeling and clustering techniques along with corpus-analytic techniques, but it became apparent that for our first paper we needed a different method, one that facilitated a detailed understanding of the structure and content of the corpus. We chose a qualitative coding strategy and developed a detailed initial coding tree. We coded a set of one hundred letters by attributing interpretive tags to parts of them and then carefully reducing the number of codes in an iterative coding process. Once the complexity of the coding scheme corresponded to the questions we were most interested in asking, we coded one thousand letters that capture the overall geographic and temporal variations in the larger corpus. This subset of the corpus is the basis of our CSSH article, and its appendix explains our method.

Our initial thematic focus was on the transatlantic and German-American migrant networks and their compositions, development, and intersections over a period of 150 years. Migrant networks are prominent in migration research, but they usually feature as a factor that explains other outcomes. They are rarely treated as something worth exploring from within. The letters provide us with a good basis to study the strong and weak ties underpinning different types of migrant networks and their changing functions over time. Our first article on this theme, “Migrantische Netzwerke und Integration: Das transnationale Kommunikationsfeld deutscher Einwandererfamilien in den USA,” was published in the leading German comparative politics journal, Zeitschrift für vergleichende Politikwissenschaft (2018). There we focused on the interconnections between the transatlantic family and friendship networks and the German-American networks. While the respective importance of these networks varied over time, they coexisted and reinforced each other.

Our second article, “The Simultaneity of Feeling German and Being American: Analyzing 150 Years of Private Migrant Correspondence,” was published in Migration Studies (2018). We engaged with another central theme in migration research, integration into the host country, through an analysis of migrant perceptions of social and cultural integration and the remaking of migrant identities. We applied a fine-grained computerized text analysis commonly used in corpus linguistics that concentrates on collocations, the statistically relevant correlations between specific key words derived from our coding process and their neighboring terms. The second terms from the collocation pairs were, in turn, used as search terms applied to the corpus, which allowed for a better understanding of the contexts of the keywords. Our analysis, which once again benefitted from the extended time period captured by the collection of letters, highlighted the intrinsic, strong incentives for social and cultural integration in the absence of pressure exerted through restrictive host-state policies. We further identified how political events and crises affecting both the country of origin and the destination country act as a catalyst in mobilizing and redefining migrant identities in relation to both sending states and host states.
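For readers unfamiliar with collocation analysis, the following Python sketch shows the general idea under stated assumptions. It is not the procedure used in the article: the function name `collocates`, the window size, the minimum count, and the choice of pointwise mutual information (PMI) as the association measure are all illustrative, and the probability estimate is deliberately simplified. The sketch counts the words that appear within a few tokens of a keyword and ranks them by how much more often they co-occur with it than their overall frequencies would suggest.

```python
import math
from collections import Counter


def collocates(tokens, keyword, window=2, min_count=2):
    """Rank words found within `window` tokens of `keyword` by PMI.

    Illustrative sketch only: the joint probability is approximated
    as (co-occurrence count / corpus size), and pairs seen fewer
    than `min_count` times are dropped as unreliable.
    """
    total = len(tokens)
    freq = Counter(tokens)
    pair_counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        start = max(0, i - window)
        end = min(total, i + window + 1)
        for j in range(start, end):
            if j != i:
                pair_counts[tokens[j]] += 1
    scores = {}
    for word, count in pair_counts.items():
        if count < min_count:
            continue
        # PMI = log2( P(keyword, word) / (P(keyword) * P(word)) )
        scores[word] = math.log2((count * total) / (freq[keyword] * freq[word]))
    # Highest-scoring collocates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpus; a real run would use the tokenized letters.
toy_tokens = ("the new heimat is good the old heimat is far "
              "the heimat is dear").split()
ranked = collocates(toy_tokens, "heimat", window=1, min_count=2)
```

The top-ranked collocates can then be used as search terms in their own right, which is the second step the paragraph above describes.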

Our CSSH article (2018), “Integration and Identities: The Effects of Time, Migrant Networks, and Political Crises on Germans in the United States,” further develops the themes of integration and transnationalism by combining structural quantitative insights into the corpus patterns with a deep qualitative understanding of individual letters and by embedding the findings into the historical scholarship on German immigration in the United States and comparative cases. We show that international crises affecting both the origin and destination countries are a key cross-temporal factor that can influence migrant perceptions of integration. Such moments of crisis can both renew migrants’ identification with their old homeland and lead them to reassess their attachment to their new homeland. This dynamic calls into question the usefulness of strict empirical or conceptual distinctions made on the basis of migrant generation.

We are completing a fourth article that analyzes the variation in destination characteristics, the importance of which has been highlighted in recent social science research on migrant transnational behavior, across four U.S. states (Illinois, Wisconsin, Ohio, and Missouri) and the corresponding integration narratives expressed in the migrant letters from those states. We focus on a concrete policy difference across the four states, namely, legislation on language instruction in schools. We collected data about schooling policies across the United States between the 1830s and the 1920s. We coded the types of schools affected (public and/or private), the target language of the policy, the type of provision, and the target groups. The corpus linguistic perspective on the use of terms relating to schooling and to the sense of belonging did not allow us to identify clear patterns by state or policy regime. We followed up by identifying two different integration trajectories derived from an in-depth qualitative analysis of two letter series, one from Missouri and one from Illinois. These series illuminated ways in which migrant families dealt with the availability of German-language instruction in schools (or lack thereof) and the coping strategies they developed in response. With or without an explicit state policy on language, the migrants recognized the advantages of learning the language of the host country while they also valued bilingualism and maintained German customs and social networks.

One thing is certain: our continuing exploration of the corpus will keep us interested and busy for several years. Thus far, each step has generated new directions of study and combinations of methodological approaches. The corpus has enabled us to converse with a wider literature and range of scholars than either of us had previously experienced, making the project much bigger and more rewarding than we could have anticipated at the outset.