Enhancing tourism research with data science: a brief toolkit with the Python programming language

Maria Fernanda Bernal Salazar
Elisa Baraibar-Diez
Jesús Collado Agudo
Abstract

Data science is a cross-cutting field that draws on several disciplines and is transforming how knowledge is generated and analyzed. Applying it as a methodological framework offers numerous opportunities but also poses important challenges for researchers. In tourism science, further effort is needed to fully implement such innovative methodological approaches. This paper first highlights the importance of data science in tourism science research. It then provides a brief guide to relevant data science tools and models, and to the main Python libraries available to researchers who are new to this area. The guide includes specific applications for analyzing online tourist reviews on the TripAdvisor travel platform.

Author Biographies

Elisa Baraibar-Diez, University of Cantabria

Elisa Baraibar-Diez, Ph.D., is a Full Professor of Business Organization in the Business Administration Department at the University of Cantabria. Her main lines of research are corporate transparency, social responsibility and sustainability, social impact, reputation, and corporate governance. She has been coordinator of the Official Master in Business Administration (MBA) of the University of Cantabria and is currently Vice Dean of Planning, Digitization and International Relations GADE of the Faculty of Economic and Business Sciences at the University of Cantabria. She is a researcher in the R+D+i Economic Management Group for the Sustainable Development of the Primary Sector. Her research activity has generated 23 publications in national and international journals (17 indexed in the JCR and SJR indices), 38 books or book chapters, and 42 presentations at national and international conferences. She has participated in 34 research projects and contracts and holds one recognized six-year research period (2011-2017). She is currently supervising two doctoral theses and has carried out research stays at the Institut für Management of Humboldt Universität (Berlin), at Sun Yat-sen University (Guangzhou, China), and at La Trobe University (Melbourne, Australia).

Jesús Collado Agudo, University of Cantabria

Jesús Collado Agudo holds a degree in Business Administration and Management (1998) and a PhD (2004) from the University of Cantabria. His main lines of work are relationship marketing, distribution channels, and tourism marketing. He is currently Dean of the Faculty of Economic and Business Sciences and a researcher in the Marketing Intelligence R+D+i Group. His research activity has led to the publication of 16 scientific articles in recognized international and national journals and 4 book chapters. He has supervised two doctoral theses and is currently supervising a third. He has also participated in more than 30 research projects with public and private funding.

Keywords:
Data science, Tourism research, Big data, User Generated Content, UGC