Codes and Documentation
Below are the Stata .do files used to harmonize every country for which we have relevant data. Given a country's raw data, these codes produce the harmonized version we use. Four sets of code are provided:
A conceptual code that can harmonize a generic country's dataset. It requires you to enter, for each harmonized variable we use, the name of the corresponding variable in the country's raw data.
Harmonization code for each country in our dataset. Clicking the country name below automatically downloads the file.
Crosswalks for industry and occupation used to harmonize those variables.
Our weight adjustment code. It modifies observation weights to adjust for any potential bias created when using the panel component of the data. This turns out not to matter for the results (see the paper for details), but it is generally best practice.
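The conceptual code pairs each harmonized variable with the matching variable in a country's raw data. The Python sketch below illustrates that mapping step outside Stata. The harmonized names (`age`, `sex`, `employment_status`, `weight`) and the raw names (`P03`, `FEX`, and so on) are hypothetical placeholders, not the names used in the actual .do files. It first checks the mapping for gaps and then renames one observation.

```python
# Hypothetical list of harmonized variables; the real list lives in the conceptual .do file.
HARMONIZED_VARS = ["age", "sex", "employment_status", "weight"]


def check_varmap(varmap, raw_columns):
    """Return (harmonized vars with no mapping, mapped raw names absent from the raw data)."""
    unmapped = [v for v in HARMONIZED_VARS if v not in varmap]
    missing_raw = [raw for raw in varmap.values() if raw not in raw_columns]
    return unmapped, missing_raw


def rename_record(record, varmap):
    """Rename one raw observation's fields to harmonized names, dropping unmapped fields."""
    return {harm: record[raw] for harm, raw in varmap.items()}


# Usage with hypothetical raw variable names for one country.
varmap = {"age": "P03", "sex": "P04", "employment_status": "ESTADO", "weight": "FEX"}
raw_columns = {"P03", "P04", "ESTADO", "FEX", "REGION"}
print(check_varmap(varmap, raw_columns))
print(rename_record({"P03": 34, "P04": 2, "ESTADO": 1, "FEX": 120.5, "REGION": 5}, varmap))
```

Checking the mapping before renaming catches typos in raw variable names early. Otherwise they would only surface as errors partway through a long harmonization run.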
To fully build the dataset we use in "Labor Market Dynamics and Development," you would:
Download the raw data from statistical agencies.
Run the harmonization code for each individual country, C. This creates the datasets CrossSectional_C.dta and RawPanel_C.dta.
Run the ex post weight adjustment code over all countries. This creates AdjPanel_C.dta, which is the same as RawPanel_C.dta with new weights.
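Because the weight adjustment runs over all countries at once, it helps to confirm that every country's harmonization outputs exist first. The Python sketch below checks for the file names listed in the steps above. The flat output directory, the country codes, and the function name `missing_outputs` are assumptions for illustration. Since actual folder structures differ by country, the paths would need adapting.

```python
from pathlib import Path

# Expected outputs per stage, using the file-name patterns from the build steps.
# Assumption: all outputs sit in a single flat directory.
STAGES = [
    ("harmonization", ["CrossSectional_{c}.dta", "RawPanel_{c}.dta"]),
    ("weight adjustment", ["AdjPanel_{c}.dta"]),
]


def missing_outputs(data_dir, countries):
    """Map each country with gaps to a list of (stage, filename) pairs not found in data_dir."""
    data_dir = Path(data_dir)
    report = {}
    for c in countries:
        missing = [
            (stage, pattern.format(c=c))
            for stage, patterns in STAGES
            for pattern in patterns
            if not (data_dir / pattern.format(c=c)).exists()
        ]
        if missing:
            report[c] = missing
    return report
```

For example, `missing_outputs("harmonized", ["MEX", "IDN"])` (hypothetical directory and codes) returns an empty dict once every stage has finished for both countries.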
A few notes and words of caution:
Be sure to set up your directory structure to match ours to avoid path errors. We generally keep whatever structure the data arrive in, so the structure differs across countries.
Some countries have multiple cleaning files. This is mostly related to survey re-designs over time.
We write our own crosswalks to harmonize industry and occupation, and they are included here. If your data extend beyond what our crosswalks cover (e.g., you add 2023 data and then apply our crosswalk, or a country changes its coding scheme), some codes may link incorrectly or not at all.
The occupation and industry crosswalks include both 1-digit and 2-digit harmonizations. We use the 1-digit version in our work. Use the 2-digit version with caution: it requires some judgment calls about how to allocate local codes. We think it is approximately the best harmonization available for this set of countries, but whether its absolute quality is sufficient likely depends on the research application.
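The crosswalk cautions above concern local codes that fail to link, and the choice between 1-digit and 2-digit output. The Python sketch below shows both using a tiny hypothetical crosswalk. The codes in `CROSSWALK` and the function `apply_crosswalk` are illustrative and not taken from our files. It applies the crosswalk at either level and reports local codes that find no link.

```python
# Hypothetical crosswalk: local occupation code -> (1-digit, 2-digit) harmonized code.
CROSSWALK = {
    "1110": ("1", "11"),
    "2210": ("2", "22"),
    "5220": ("5", "52"),
}


def apply_crosswalk(local_codes, crosswalk, digits=1):
    """Return harmonized codes (None where no link exists) and the sorted unmatched local codes."""
    if digits not in (1, 2):
        raise ValueError("digits must be 1 or 2")
    idx = digits - 1  # position of the requested level in each crosswalk entry
    harmonized = []
    unmatched = set()
    for code in local_codes:
        link = crosswalk.get(code)
        if link is None:
            harmonized.append(None)
            unmatched.add(code)
        else:
            harmonized.append(link[idx])
    return harmonized, sorted(unmatched)


# "9999" stands in for a code introduced after the crosswalk was written.
print(apply_crosswalk(["1110", "2210", "9999"], CROSSWALK))
print(apply_crosswalk(["1110", "2210", "9999"], CROSSWALK, digits=2))
```

A check like this only catches codes that fail to link. If a country reuses an existing code with a new meaning, the code still links, but to the wrong category, so a coding-scheme change still needs to be checked against the agency's documentation.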