Hi! My name is Andrea, and together with Paola, Denisse, Virginia, Claudia and Joselyn I will be presenting our work, Using R in Latin America: the great, the good, the bad and the ugly.
What was our work about?
We know that the R programming language is used globally in a wide variety of projects. However, how widely is R used in regions such as Latin America where English is not the official language, and where infrastructure and access to various resources, including technology, are not guaranteed?
In August 2020, members of the R user community in Latin America got together to create a survey to learn more about Latin American R users. We sought to identify our potential strengths and the challenges we face.
The R environment is globally used.
How is it used in Latin America where
Objective: to get to know the community of R users in Latin America.
With our aim and objectives in mind, we designed a survey of 31 questions addressing five main axes: respondents' interest in R, demographic information, academic level, R community engagement, and the area in which they use R.
To promote inclusiveness, the survey was conducted in three languages, Spanish, Portuguese and English, so that people could respond in the language they felt most comfortable with.
The survey was shared through the social media channels of R groups, such as Twitter, Telegram and Slack. People born and/or resident in Latin America were invited to participate.
This project was possible thanks to the great work and support of many people who added questions, helped correct errors and translated questions. The survey became an R-Forwards project; R-Forwards is the branch of the R Foundation concerned with improving inclusion and diversity in the R community.
Who completed the survey?
More than 900 respondents
From 26 countries (people born and residing in Latin America)
Over 900 people completed the survey. They came from 26 different countries, most of them in Latin America; a small proportion of respondents were born outside the region but currently reside in a Latin American country.
About 3% of Latin American-born respondents currently reside in North America, Europe or Oceania.
The survey was completed by people between 19 and 69 years old, with most being under 40. Respondents who identified with a gender were almost evenly divided between women and men, and 2% identified as gender diverse, identified with no gender, or preferred not to answer.
From an education perspective, most people had a tertiary education, with 87% saying that they had a bachelor's, master's or doctoral degree. This proportion is well above the average for Latin America. About 13% said that they had a technical or associate degree or a high school diploma. All respondents had completed at least high school.
The majority of respondents, about 60%, use R mainly in the field of research and development. Education was the second most common area where R was used, but people also used it in the public and private sectors and in a variety of industries such as design, finance and journalism just to name a few.
We also found most people use R mainly for data analysis and visualization.
We then asked how long they had used R. 40% of respondents said they had used R for 5 years or more, which seems to be more common among people with the highest levels of education. However, about 50% of all people surveyed mentioned that they were relatively inexperienced with R, with one fifth having used the language for two years or less.
So, now we know a little bit about how the respondents use R, but does living in Latin America entail additional challenges during their R experience?
R was built mostly by English speakers and most package documentation is available only in English, so we wondered whether knowledge of English would be a barrier to learning and using R by Latin Americans, who mostly speak Spanish or Portuguese.
Spanish was the first language of 85% of survey respondents, followed by Portuguese and English. About 1% of people reported Quechua, Dutch or other official Latin American languages as their first language. However, the majority of Spanish and Portuguese speakers reported having an advanced level of English, with a minority classifying themselves as beginners and less than 1% reporting that they did not speak English at all. Surprisingly, when we asked if English was a barrier to learning about R or seeking support with programming issues, less than 25% said that was true for them.
Due to the structure of our survey and the demographics of the people who completed it, we could not assess whether knowledge of English is essential to learning R, whether respondents already spoke English and that allowed them to learn R, or whether our results simply reflect the profile of our respondents: highly educated, bilingual people.
We also wondered if conference attendance represented a challenge for Latin American R users, mostly because international conferences tend to take place almost exclusively in the US and Europe.
We asked if people knew of and/or had attended any of 10 major R-related international events. Most people said they had no knowledge of these events and thus had not attended them in the last five years. Even LatinR, which we expected to be more widely recognized as it is the major R event in the region, was only known by 14% of all respondents.
When asked about barriers preventing event participation, over 30% of respondents said that the events were too expensive, and only half have actually attended a conference. The total duration of events, discomfort related to either not speaking the language in which the conference was delivered or to a lack of diversity, and a lack of resources were also identified as major barriers preventing people from attending these events.
We also wanted to explore the role of R communities in Latin America.
40% of the people who responded to the survey are part of some R community.
Roughly 40% of respondents belonged to at least one community, of which at least a quarter declared they were members of two or more communities, with some being members of at least five.
R-Ladies chapters were the best represented community group in our survey, with 43% of community members saying they were members of a local chapter. This proportion is almost the same as the next three community groups combined: LatinR, R User Groups and RSpatial_ES.
Latin American users were not only members of local community groups: about 11% mentioned they were part of an international R community.
However, it must be noted that these results are conservative as some people responded that they participate in community events, but did not consider themselves part of the community simply because they were not playing an active organizational role.
Focusing on the composition of the communities by gender, we found that women represented just over 60% of the people who responded that they belonged to any community. This was striking because it is a higher proportion than that of women among all survey respondents, which was approximately 48%, almost equal to that of men. However, it is unclear whether this difference is due to the fact that a significant number of R-Ladies members responded to the survey, as many chapters actively publicized the initiative, or if it is truly representative of the community of R users.
Additionally, about 10% of the people identified themselves as part of the LGBTQI+ community, which is similar to the overall survey results.
These results are encouraging, as they suggest that R communities are providing spaces where women and other underrepresented groups feel safe to learn and share their knowledge with other members of the community.
Others: LinkedIn, R bloggers, Reddit, Stackoverflow, RStudio Community, YouTube, WhatsApp, other local networks.
In addition, we were interested to know which social network is the most used among R users.
Ninety-one percent of the people surveyed responded that they use some social network to communicate with the R community or keep up to date with the latest news. Twitter is the most popular network, used by 70% of people, followed by Facebook, Meetup and Slack, among others.
So far we have presented some of the results of this first Latin American survey on the use of R, describing the people who use it, identifying some of the barriers they encountered and recognizing their participation in the communities and networks.
We are very happy that the initiative could be carried out and that we received so many responses! We would like to thank once again both the people who helped build the survey and those who completed it.
But what do we want to highlight?
The choice of questions is complex
Data cleaning is time consuming!
Possible analyses should be planned in advance.
First, we wanted to share what we learned about constructing the survey.
In the process of formulating the survey we became aware of some aspects that we had not considered at the beginning of the initiative. The selection of questions was very complex and it is possible that the final version with 33 questions was too long. Along the way we learned a lot about how to ask the questions taking into account the diversity of each country, but we are sure that some questions could be improved to obtain better targeted answers.
Organizing and manipulating the data from a survey of this magnitude is very time consuming. This is usually the case with most data sets, but for many of us this was the first time analyzing a large amount of categorical data, which, among other factors, made the analysis a time-consuming procedure.
We also found room for improvement in the survey design that could facilitate the generation of correlations and new analyses between some responses. We recommend devoting even more time to planning the analyses to be carried out prior to the preparation of the survey.
Important role of communities and social networks!
There are some challenges we have to go through
And what did we learn about the respondents?
We were struck (though not surprised) by the important role of communities and networks. More than half of the people surveyed indicated that they belonged to at least one community and that that same community helped them solve problems! Networks also play a key role, even acting as communities themselves. In that context we also noticed that many people are part of some of the mentioned communities without knowing it! Since participating in meetups, webinars and other activities also implies being part of a community, we can surely put more emphasis on that.
Women represent the majority of people who are part of communities, at least according to the group of people who responded to the survey. While it is positive that women are well represented in them and are possibly creating safe spaces for other women and minorities, as R-Ladies chapters do, we must consider that this volunteer work falls on minorities. We found studies showing that when there are projects to improve diversity, the unpaid and unrecognized work often falls on people who are part of minority groups, and this additional work can put them at a disadvantage because it means they cannot devote the same amount of time to career development as people from majority groups.
In terms of the challenges faced by the respondents we corroborated that there are challenges in terms of resources and infrastructure that condition the use of R and participation in conferences. It is important to undertake and continue actions to promote the participation of the Latin American community in international events, for example by allowing talk presentations in languages other than English, as in useR! 2021. Submitting a paper or abstract to R conferences can be a major challenge for many people. That is why we suggest implementing and promoting initiatives such as #clinicadecharlas within the LatinR Slack where people can share ideas about their proposals or also the R-Ladies review system.
We consider that we still lack information to answer some of the initial questions. In the future, we would like to identify and survey people who are aware of the existence of R but for various reasons do not use it, in order to identify areas for improvement from the R community to improve the experience of those who want to learn how to use the language.
How do we facilitate the integration of the missing population?
What can we improve so that minorities feel better represented in R events and communities?
Should the teaching of R be promoted from basic or intermediate academic grades?
In this same context, we consider that it is still necessary to deepen certain key issues.
How do we facilitate the integration of the missing population, so that they use R and get involved in its communities? To do so, we must first determine the reasons why people have not done so up to now. But we believe that presenting the benefits of using R and belonging to these communities, such as solving problems, accessing free training, and even job opportunities, can be a good strategy to expand inclusion.
Continuing with the goal of improving inclusion, we would also like to look more deeply into what we can improve so that minorities feel better represented in R events and communities. There remains the challenge of broadening participation within and outside the R communities towards a more heterogeneous and inclusive composition in the region, including greater participation of people with disabilities, LGBTIQ+ people, and people of African, Afro-Latin American, Caribbean and indigenous descent who have an interest in R in particular or in data science in general.
A final topic to explore in more depth relates to educational level. Eighty-eight percent of the people surveyed have a high level of education, a proportion that is approximately twice as high as that observed in Latin America. This made us wonder whether the teaching of R should be promoted from basic or intermediate academic grades, since the results suggest that there are inequalities in who has access to learning programming in R (and potentially in other languages). This is probably related to the technological gap in Latin America, where on average less than 50% of the population has access to the Internet. Including programming in the basic education curriculum or offering face-to-face training opportunities may help to close this gap.
There is still a lot of research to be done, but this is the end of our presentation for useR! 2021. Thank you very much for your attention; we are happy to answer any questions you may have.