The possibilities of linking together sets of data

I saw multiple interesting presentations at ASA this year that linked together several datasets to develop robust analyses and interesting findings. These data sources included government data, data collected by the researchers, and other available data. Linking them unlocks a lot of possibilities for answering research questions.


But how might this happen more regularly? Or, put differently, how might more researchers use multiple datasets in a single project? Here are some quick thoughts on what could help make this possible:

-More access to data. Some data is publicly available. Other data is restricted for a variety of reasons. Having more big datasets accessible opens up possibilities. Even knowing where to request data is a process in itself, on top of whatever applications and/or resources might be needed to access it.

-Having the know-how to put datasets together. It takes work to become familiar with a single dataset. Merging data requires additional work. I do not know whether it would be useful to offer more instruction in doing this, or whether how useful it is depends on which individual datasets are involved.

-Asking research questions gets more interesting and complicated with more variables and layers at play. Constructing sets of questions that build on the strengths of the combined data is a skill.

-Including more – but concise and understandable – explanations of how the data was merged in publications can help demystify the process.
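To make the merging and explaining steps a little more concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: the two tiny datasets, the `county` key, and the `link` helper are invented for illustration and do not come from any of the presentations. It assumes both sources share a clean, unique identifier, which is often the hard part in practice.

```python
# Hypothetical data: a researcher-collected survey and a government table,
# both keyed by an invented county code. None of this is from a real source.
survey = [
    {"county": "001", "respondents": 120},
    {"county": "002", "respondents": 85},
    {"county": "004", "respondents": 40},
]
government = [
    {"county": "001", "median_income": 52000},
    {"county": "002", "median_income": 61000},
    {"county": "003", "median_income": 47000},
]


def link(left, right, key):
    """Left join: keep every row of `left`, adding fields from `right` on a key match."""
    # Assumes each key value appears at most once in `right`.
    lookup = {row[key]: row for row in right}
    merged, unmatched = [], []
    for row in left:
        match = lookup.get(row[key])
        if match is None:
            unmatched.append(row[key])  # record keys that found no partner
        merged.append({**row, **(match or {})})
    return merged, unmatched


merged, unmatched = link(survey, government, "county")
print(f"{len(merged)} survey rows, {len(unmatched)} without a government match: {unmatched}")
```

Counting and naming the unmatched records is the kind of short detail that could go into a methods section: how many cases linked, how many did not, and on what key.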

And with all of this data innovation, it is interesting to consider how projects that link multiple datasets complement projects that draw on only one source of data.
