class: center, middle, inverse, title-slide # Using R in Latin America: the great, the good, the bad, and the ugly ## Uso de R en Latinoamérica: fortalezas, desafíos y debilidades ### Virginia A. García Alonso; Paola Corrales; Claudia A. Huaylla; Andrea Gómez Vargas; Joselyn Chávez; Denisse Fierro Arcos --- ## The initiative .pull-left[ The R environment is globally used. How is it used in Latin America where * the language is different, * infrastructure is not guaranteed, * access to resources is scarcer, * ...? Objective: to get to know the community of R users in Latin America. ] .pull-right[ <img src="data:image/png;base64,#fig/iconocontexto.png" title="Illustrative image of Latin America with the text 'First Latin American survey on the use of R'" alt="Illustrative image of Latin America with the text 'First Latin American survey on the use of R'" width="100%" /> ] ??? Hi! My name is Andrea and, together with Paola, Denisse, Virginia, Claudia and Joselyn, I will be presenting our work, Using R in Latin America: the great, the good, the bad and the ugly. What was our work about? We know that the R programming language is used globally in a wide variety of projects. However, how widely is R used in regions such as Latin America where English is not the official language, and where infrastructure and access to various resources, including technology, are not guaranteed? In August 2020, members of the R user community in Latin America got together to create a survey with the aim of learning more about Latin American R users. We sought to identify what our potential strengths are and what challenges we face. 
--- ## The survey .pull-left[ * 3 languages + Spanish + Portuguese + English * 18 people collaborated in its design * 31 optional questions ] .pull-right[ * 5 axes of interest + Interest in R + Demographics + Education + R and the community + Use of R ] <img src="data:image/png;base64,#fig/convocatoria2020.png" title="Flyers used to promote the survey with the text 'Participate in the first Latin American survey on the use of R.' in Spanish, English and Portuguese, along with the R-forwards logo and the logos of the social networks used: Twitter, Slack and Telegram." alt="Flyers used to promote the survey with the text 'Participate in the first Latin American survey on the use of R.' in Spanish, English and Portuguese, along with the R-forwards logo and the logos of the social networks used: Twitter, Slack and Telegram." width="75%" style="display: block; margin: auto;" /> ??? With our aim and objectives in mind, we designed a survey of 31 questions that addressed five main axes: respondents' interest in R, demographic information, their academic level, R community engagement, and the area in which they use R. In order to promote inclusiveness, the survey was conducted in three languages: Spanish, Portuguese and English, so that people could respond in the language they felt most comfortable with. The survey was shared through the social media channels of R groups, such as Twitter, Telegram and Slack. People born and/or residing in Latin America were invited to participate in this survey. This project was possible thanks to the great work and support of many people who added questions, helped to correct errors and translated questions. The survey became an R-Forwards project, which is the branch of the R Foundation concerned with improving inclusion and diversity in the R community. --- class: chapter-slide # Who responded to the survey? ??? Who completed the survey? 
--- ## Demographics More than 900 respondents From 26 countries (people born and residing in Latin America) + 3% of people now reside in North America, Europe and Oceania .center.middle[ <img src="data:image/png;base64,#presentation_english_files/figure-html/unnamed-chunk-3-1.svg" title="World map showing with color gradient the number of respondents by country of birth, from zero to 180. Most of the respondents were born in Latin America, with Argentina, Brazil, Colombia and Mexico being the countries with the highest number of responses. Also highlighted are the United States and some countries in Europe and Asia where some of the respondents currently residing in Latin America were born." alt="World map showing with color gradient the number of respondents by country of birth, from zero to 180. Most of the respondents were born in Latin America, with Argentina, Brazil, Colombia and Mexico being the countries with the highest number of responses. Also highlighted are the United States and some countries in Europe and Asia where some of the respondents currently residing in Latin America were born." style="display: block; margin: auto;" /> ] ??? Over 900 people completed the survey. They came from 26 different countries, mostly from Latin America, of whom a small proportion were born outside the region, but currently reside in a Latin American country. About 3% of Latin American born responders currently reside in North America, Europe and Oceania. --- ## Age, gender and academic degree .pull-left[ <img src="data:image/png;base64,#presentation_english_files/figure-html/unnamed-chunk-4-1.svg" title="Vertical bar graph showing the age of respondents on the x-axis and the frequency of responses on the y-axis. The bars have different colors according to the gender of the respondents (male, female, other gender). The bars on the x-axis range from 19 to 69 years of age, with maximum frequencies near age 30. 
The number of responses is similar for males and females, representing almost all the responses between them." alt="Vertical bar graph showing the age of respondents on the x-axis and the frequency of responses on the y-axis. The bars have different colors according to the gender of the respondents (male, female, other gender). The bars on the x-axis range from 19 to 69 years of age, with maximum frequencies near age 30. The number of responses is similar for males and females, representing almost all the responses between them." width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="data:image/png;base64,#presentation_english_files/figure-html/unnamed-chunk-5-1.svg" title="Pie chart showing the highest academic level of the survey respondents. 35% have a master's degree, 33% have a university degree, 19% have a doctorate, while the rest of the population has a secondary, tertiary or technical degree." alt="Pie chart showing the highest academic level of the survey respondents. 35% have a master's degree, 33% have a university degree, 19% have a doctorate, while the rest of the population has a secondary, tertiary or technical degree." width="85%" style="display: block; margin: auto 0 auto auto;" /> ] ??? The survey was completed by people between 19 and 69 years old, with most being under 40. Respondents who identified with a gender were almost evenly divided between females and males, and 2% identified as gender diverse, identified with no gender, or preferred not to answer. From an education perspective, most people had a tertiary education, with 87% saying that they had a bachelor's, master's or doctoral degree. This proportion is well over the average for Latin America. About 13% said that they had a technical degree, an associate degree or a high school diploma. All respondents had completed at least high school. --- ## How do they use R? 
.pull-left[ <img src="data:image/png;base64,#presentation_english_files/figure-html/unnamed-chunk-6-1.svg" title="Horizontal bar chart showing the sector in which the users who responded to the survey work on the y-axis, the percentage of total responses on the x-axis and the main activities in which they use R in different colors. Almost 60% of the survey respondents are engaged in Research or Development, about 20% work in the education sector, 12% work in industry or the private sector, 10% work in government agencies and 2% work in other areas. In all sectors, between 85 and 90% of the use of R corresponds to data analysis and visualization, followed by its use for document presentation, web page development and package development." alt="Horizontal bar chart showing the sector in which the users who responded to the survey work on the y-axis, the percentage of total responses on the x-axis and the main activities in which they use R in different colors. Almost 60% of the survey respondents are engaged in Research or Development, about 20% work in the education sector, 12% work in industry or the private sector, 10% work in government agencies and 2% work in other areas. In all sectors, between 85 and 90% of the use of R corresponds to data analysis and visualization, followed by its use for document presentation, web page development and package development." width="100%" /> ] .pull-right[ <img src="data:image/png;base64,#presentation_english_files/figure-html/unnamed-chunk-7-1.svg" title="Vertical bar graph showing on the 'x' axis the years of experience using R and on the 'y' axis the percentage of total responses. 23% have less than 2 years of experience, 36% have between 2 and 5 years of experience. 27% have between 5 and 10 years of experience, 13% have more than 10 years of experience and 1% did not answer." alt="Vertical bar graph showing on the 'x' axis the years of experience using R and on the 'y' axis the percentage of total responses. 
23% have less than 2 years of experience, 36% have between 2 and 5 years of experience. 27% have between 5 and 10 years of experience, 13% have more than 10 years of experience and 1% did not answer." width="100%" /> ] ??? The majority of respondents, about 60%, use R mainly in the field of research and development. Education was the second most common area where R was used, but people also used it in the public and private sectors and in a variety of industries such as design, finance and journalism, just to name a few. We also found that most people use R mainly for data analysis and visualization. We then asked how long they had used R. 40% of respondents said they had used R for 5 years or more, which seems to be more common among people with the highest levels of education. However, about 50% of all people surveyed mentioned that they were relatively inexperienced with R, with one fifth having used the language for two years or less. --- class: chapter-slide # What challenges do people using R face? ??? So, now we know a little bit about how the respondents use R, but does living in Latin America entail additional challenges in their experience with R? --- ## What we know about English <img src="data:image/png;base64,#presentation_english_files/figure-html/unnamed-chunk-8-1.svg" title="Vertical bar graph showing on the x-axis the native language of the survey respondents and on the y-axis the percentage of responses. Each native language bar is subdivided into colors according to the level of English proficiency. 85% of respondents speak Spanish as their first language, almost 13% speak Portuguese, 1% speak other languages such as Quechua and Dutch, and less than 1% speak English as their native language. Of those who speak Spanish or Portuguese, about 80% have an intermediate to advanced level of English, about 10% are bilingual, 9% have a basic level, and less than 1% have no knowledge of English. 
While those whose first language is English have an intermediate, advanced or bilingual level, those whose first language is Spanish or Portuguese have an intermediate to advanced level of English." alt="Vertical bar graph showing on the x-axis the native language of the survey respondents and on the y-axis the percentage of responses. Each native language bar is subdivided into colors according to the level of English proficiency. 85% of respondents speak Spanish as their first language, almost 13% speak Portuguese, 1% speak other languages such as Quechua and Dutch, and less than 1% speak English as their native language. Of those who speak Spanish or Portuguese, about 80% have an intermediate to advanced level of English, about 10% are bilingual, 9% have a basic level, and less than 1% have no knowledge of English. While those whose first language is English have an intermediate, advanced or bilingual level, those whose first language is Spanish or Portuguese have an intermediate to advanced level of English." style="display: block; margin: auto;" /> ??? R was built mostly by English speakers and most package documentation is available only in English, so we wondered whether knowledge of English would be a barrier to learning and using R for Latin Americans, who mostly speak Spanish or Portuguese. Spanish was the first language of 85% of survey respondents, followed by Portuguese and English. About 1% of people reported Quechua, Dutch and other official Latin American languages to be their first language. However, the majority of Spanish and Portuguese speakers reported having an advanced level of English, with a minority classifying themselves as beginners and less than 1% reporting that they do not speak English at all. Surprisingly, when we asked if English was a barrier to learning about R or seeking support with programming issues, less than 25% said that was true for them. 
Due to the structure of our survey and to the demographics of people who completed it, we could not assess whether knowledge of English was essential to learn R, if they already spoke English and that allowed them to learn R, or if our results are linked to the profile of our respondents, highly educated and bilingual people. --- ## Participation in conferences <img src="data:image/png;base64,#presentation_english_files/figure-html/unnamed-chunk-9-1.svg" title="Horizontal bar graph showing on the 'y' axis some barriers that respondents face in attending R conferences and on the 'x' axis the number of people who responded affirmatively to each type of barrier; each bar is colored with the percentage of people who were able to attend or not attend the event. About 30% of respondents find the cost of attending conferences to be high, considering the registration fee and travel costs. About 7% find the event too time-consuming, 5% responded that they do not feel comfortable attending the events, while less than 3% find other difficulties such as not feeling represented in the presentations or lack of technical resources to follow the event. Less than half of the people who encountered difficulties were able to attend the event." alt="Horizontal bar graph showing on the 'y' axis some barriers that respondents face in attending R conferences and on the 'x' axis the number of people who responded affirmatively to each type of barrier; each bar is colored with the percentage of people who were able to attend or not attend the event. About 30% of respondents find the cost of attending conferences to be high, considering the registration fee and travel costs. About 7% find the event too time-consuming, 5% responded that they do not feel comfortable attending the events, while less than 3% find other difficulties such as not feeling represented in the presentations or lack of technical resources to follow the event. 
Less than half of the people who encountered difficulties were able to attend the event." /> ??? We also wondered if conference attendance represented a challenge for Latin American R users, mostly because international conferences tend to take place almost exclusively in the US and Europe. We asked whether people knew of and/or had attended any of 10 major R-related international events. Most people said they had no knowledge of these events and thus had not attended any in the last five years. Even LatinR, which we expected to be more widely recognized as it is the major R event in the region, was only known by 14% of all respondents. When asked about barriers preventing event participation, over 30% of respondents said that the events were too expensive, and only half had actually attended a conference. The total duration of events, discomfort related either to not speaking the language in which the conference was delivered or to a lack of diversity, and a lack of resources were also identified as barriers preventing people from attending these events. --- class: chapter-slide # R communities ??? We also wanted to explore the role of R communities in Latin America. --- ## Which communities do they belong to? 40% of the people who responded to the survey are part of some R community. <img src="data:image/png;base64,#presentation_english_files/figure-html/unnamed-chunk-10-1.svg" title="Horizontal bar chart showing on the y-axis the name of various R communities and on the x-axis the percentage of membership in each community, taking as total the number of people who claimed to belong to an R community. Of the 349 people who belong to a community, more than 40% belong to a local R-Ladies chapter, almost 20% belong to LatinR, about 15% belong to the RUG R User Group, about 3% belong to ConectaR, 2% to MiR, 1% to RainbowR and about 4% belong to other communities not mentioned in the survey options." 
alt="Horizontal bar chart showing on the y-axis the name of various R communities and on the x-axis the percentage of membership in each community, taking as total the number of people who claimed to belong to an R community. Of the 349 people who belong to a community, more than 40% belong to a local R-Ladies chapter, almost 20% belong to LatinR, about 15% belong to the RUG R User Group, about 3% belong to ConectaR, 2% to MiR, 1% to RainbowR and about 4% belong to other communities not mentioned in the survey options." style="display: block; margin: auto;" /> ??? Roughly 40% of respondents belonged to at least one community, of which at least a quarter declared they were members of two or more communities, with some being members of at least five. R-Ladies chapters were the best represented community group in our survey, with 43% of community members saying they were members of a local chapter. This proportion is almost the same as the next three community groups combined: LatinR, R User Groups and RSpatial_ES. Latin American users were not only members of local community groups, about 11% mentioned they were part of an international R community. However, it must be noted that these results are conservative as some people responded that they participate in community events, but did not consider themselves part of the community simply because they were not playing an active organizational role. --- ## Communities and gender .pull-left[ <img src="data:image/png;base64,#presentation_english_files/figure-html/unnamed-chunk-11-1.svg" title="Bar graph showing the proportion of survey respondents who belong to any R community separated by the gender with which they identify. The 'x' axis shows female, male, and other miscellaneous genders grouped into the 'Self-identifies' classification; the 'y' axis shows the relative percentage of people who identify within each gender. 
60% of people who belong to an R community identify with the female gender, almost 40% identify with the male gender, and about 2% self-identify." alt="Bar graph showing the proportion of survey respondents who belong to any R community separated by the gender with which they identify. The 'x' axis shows female, male, and other miscellaneous genders grouped into the 'Self-identifies' classification; the 'y' axis shows the relative percentage of people who identify within each gender. 60% of people who belong to an R community identify with the female gender, almost 40% identify with the male gender, and about 2% self-identify." width="95%" /> ] .pull-right[ <img src="data:image/png;base64,#presentation_english_files/figure-html/unnamed-chunk-12-1.svg" title="Bar graph showing the proportion of survey respondents separated by the gender with which they identify. The 'x' axis shows female, male, and other miscellaneous genders grouped into the 'Self-identifies' classification; the 'y' axis shows the relative percentage of respondents who identify within each gender. About 46% of the survey respondents identify with the female gender, almost 52% identify with the male gender, and about 3% belong to other genders." alt="Bar graph showing the proportion of survey respondents separated by the gender with which they identify. The 'x' axis shows female, male, and other miscellaneous genders grouped into the 'Self-identifies' classification; the 'y' axis shows the relative percentage of respondents who identify within each gender. About 46% of the survey respondents identify with the female gender, almost 52% identify with the male gender, and about 3% belong to other genders." width="95%" /> ] ??? Focusing on the composition of the communities by gender, we found that women represented just over 60% of the people who responded that they belonged to any community. 
This was striking because it is a higher proportion than the share of women among all survey respondents, which was almost equal to that of men, at approximately 48%. However, it is unclear whether this difference is due to the fact that a significant number of R-Ladies responded to the survey, as many chapters actively publicized the initiative, or if it is truly representative of the community of R users. Additionally, about 10% of the people identified themselves as part of the LGBTQI+ community, which is similar to the overall survey results. These results are encouraging as they suggest that R communities are providing spaces where women and other underrepresented groups feel safe to learn and share their knowledge with other members of the community. --- ## Most used social networks <img src="data:image/png;base64,#presentation_english_files/figure-html/unnamed-chunk-13-1.svg" title="Pie chart showing the social networks through which users keep up with the R community. More than 35% use Twitter, 15% use facebook, 12% use meetup, 11% use slack, 10% use instagram, 8% use telegram, 2% use other networks, and 4% do not use any social network." alt="Pie chart showing the social networks through which users keep up with the R community. More than 35% use Twitter, 15% use facebook, 12% use meetup, 11% use slack, 10% use instagram, 8% use telegram, 2% use other networks, and 4% do not use any social network." style="display: block; margin: auto;" /> Others: LinkedIn, R bloggers, Reddit, Stackoverflow, RStudio Community, YouTube, WhatsApp, other local networks. ??? In addition, we were interested to know which social network is the most used among R users. Ninety-one percent of the people surveyed responded that they use some social network to communicate with the R community or keep up to date with the latest news. The majority, 70%, use Twitter, the most popular network, followed by Facebook, Meetup and Slack, among others. 
--- class: chapter-slide # Some conclusions ??? So far we have presented some of the results of this first Latin American survey on the use of R, describing the people who use it, identifying some of the barriers they encountered and recognizing their participation in the communities and networks. We are very happy that the initiative could be carried out and that we had a great number of responses! We would like to thank both the people who participated in its construction and those who completed it. But what do we want to highlight? --- ## What we learned from doing the survey <br> * The choice of questions is complex + Represent the diversity of the region. + Reduce the number of questions. + Ask questions that yield better-targeted answers. * Data cleaning is time-consuming! + Handling categorical data is a challenge. * Possible analyses should be planned in advance. ??? First, we wanted to share what we learned about constructing the survey. In the process of formulating the survey we became aware of some aspects that we had not considered at the beginning of the initiative. The selection of questions was very complex and it is possible that the final version with 31 questions was too long. Along the way we learned a lot about how to ask the questions taking into account the diversity of each country, but we are sure that some questions could be improved to obtain better-targeted answers. The organization and data manipulation of a survey of this magnitude is very time-consuming. This is usually the case with most data sets, but many of us were analyzing a large amount of categorical data for the first time, among other factors that made the analysis a time-consuming procedure. We also found room for improvement in the survey design that could facilitate the generation of correlations and new analyses between some responses. We recommend devoting even more time to planning the analyses to be carried out prior to the preparation of the survey. 
--- ## What we learned about the R community * Important role of communities and social networks! + Using any given network is being part of a community + Participating in meetups and webinars is being part of a community + High participation of women: is it really an advantage? * There are some challenges we have to overcome + English is one of them (?) + Resources and infrastructure - Help for conferences in #clinicadecharlas at LatinR Slack - R-Ladies review system #### We still need to reach more people to detect possible barriers and challenges! ??? And what did we learn about the respondents? We were struck (though not surprised) by the important role of communities and networks. More than half of the people surveyed indicated that they belonged to at least one community and that that same community helped them solve problems! Networks also play a key role, even acting as communities themselves. In that context we also noticed that many people are part of some of the mentioned communities without knowing it! Since participating in meetups, webinars and other activities also implies being part of a community, we can surely put more emphasis on that. Women represent the majority of people who are part of communities, at least according to the group of people who responded to the survey. While it is positive that women are well represented in them and are possibly creating safe spaces for other women and minorities, as R-Ladies chapters do, we must consider that this volunteer work falls on minorities. We found studies that show that when there are projects to improve diversity, the unpaid and unrecognized work often falls on people who are part of minority groups, and this additional work can put them at a disadvantage because it means they cannot devote the same amount of time to career development as people from majority groups. 
In terms of the challenges faced by the respondents, we confirmed that there are challenges in terms of resources and infrastructure that limit the use of R and participation in conferences. It is important to undertake and continue actions to promote the participation of the Latin American community in international events, for example by allowing talk presentations in languages other than English, as in useR! 2021. Submitting a paper or abstract to R conferences can be a major challenge for many people. That is why we suggest implementing and promoting initiatives such as #clinicadecharlas within the LatinR Slack, where people can share ideas about their proposals, or the R-Ladies review system. We consider that we still lack information to answer some of the initial questions. In the future, we would like to identify and survey people who are aware of the existence of R but for various reasons do not use it, in order to identify areas where the R community can improve the experience of those who want to learn how to use the language. --- ## What we still need to further explore <br> * How do we facilitate the integration of the missing population? * What can we improve so that minorities feel better represented in R events and communities? * Should the teaching of R be promoted from basic or intermediate academic grades? ??? In this same context, we consider that it is still necessary to explore certain key issues in more depth. How do we facilitate the integration of the missing population so that they use R and get involved in its communities? To do so, we must first determine the reasons why people have not done so up to now. But we believe that presenting the benefits of using R and belonging to these communities, such as solving problems, accessing free training, and even job opportunities, can be a good strategy to expand inclusion. 
Continuing with the goal of improving inclusion, we would also be interested in exploring what we can improve so that minorities feel better represented in R events and communities. There remains the challenge of broadening participation within and outside the R communities towards a more heterogeneous and inclusive composition in the region, including greater participation of people with disabilities, LGBTIQ+ people, and people of African, Afro-Latin American, Caribbean and Indigenous descent who have an interest in R in particular or in data science in general. A final topic to explore in more depth relates to the educational level. Eighty-eight percent of the people surveyed have a high degree of education, a proportion that is approximately twice as high as that observed in Latin America. This made us wonder whether the teaching of R should be promoted from basic or intermediate academic grades, since the results suggest that there are inequalities in who has access to learning programming in R (and potentially in other languages). This is probably related to the technological gap in Latin America, where on average less than 50% of the population has access to the Internet. Including programming in the basic education curriculum or offering face-to-face training opportunities may help to close this gap. --- class: chapter-slide # Thank you! <br> Obrigado! <br> ¡Muchas gracias! <img src="data:image/png;base64,#fig/marmot_user.png" title="useR's maRmote waving a hand" alt="useR's maRmote waving a hand" width="50%" style="display: block; margin: auto 0 auto auto;" /> ??? There is still a lot of research to be done, but this is the end of our presentation for useR! 2021. Thank you very much for your attention. We are at your disposal for any questions you may have.