Enhancing tourism research with data science: a brief toolkit with the Python programming language
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Data science is a cross-cutting field that draws on several disciplines and is transforming how knowledge is generated and analyzed. Applying it as a methodological framework offers numerous opportunities but also poses important challenges for researchers. In tourism science, further effort is needed to fully implement such innovative methodological approaches. This paper first highlights the importance of data science in tourism research. Second, it provides a brief guide to relevant data science tools and models, and to the main Python libraries available to researchers who are new to this area. The guide includes specific applications for analyzing online tourist reviews on the TripAdvisor travel platform.
Article Details
Elisa Baraibar-Diez, University of Cantabria
Elisa Baraibar-Diez, Ph.D., is a Full Professor of Business Organization in the Business Administration Department at the University of Cantabria. Her main lines of research are corporate transparency, social responsibility and sustainability, social impact, reputation, and corporate governance. She has been coordinator of the Official Master in Business Administration (MBA) of the University of Cantabria and is currently Vice Dean of Planning, Digitization and International Relations GADE of the Faculty of Economic and Business Sciences at the University of Cantabria. She is a researcher in the R+D+i Economic Management Group for the Sustainable Development of the Primary Sector. Her research activity has generated 23 publications in national and international journals (17 indexed in the JCR and SJR indices), 38 books or book chapters, and 42 presentations at national and international conferences. She has participated in 34 research projects and contracts and has a six-year research period (2011-2017). She is currently supervising 2 doctoral theses and has carried out research stays at the Institut für Management of Humboldt Universität (Berlin), at Sun Yat-sen University (Guangzhou, China), and at La Trobe University (Melbourne, Australia).
Jesús Collado Agudo, University of Cantabria
Jesús Collado Agudo holds a degree in Business Administration and Management (1998) and a PhD from the University of Cantabria (2004). His main lines of work are relationship marketing, distribution channels, and tourism marketing. He is currently Dean of the Faculty of Economic and Business Sciences and a researcher in the Marketing Intelligence R+D+i Group. His research activity has led to 16 scientific articles in international and national journals of recognized prestige and 4 book chapters. He has supervised two doctoral theses and is currently supervising another. He has also participated in more than 30 research projects with public and private funding.