Merging tables when variables are stored in different ways

Hola Amigos,

I am trying to merge different data frames with one another (I am very much a beginner).
In some of them, variables are are stored as follows: different variables are stored in different columns

Country     Year     GDP     Life Expectancy
France       2014    987          67.6
France       2015    1002         67.7

Whilst in some others, variables are stored with one column for all variable names, and the corresponding value is stored in another column:

Country     Year               Variable Name           Value
France      2014     Public Health Expenditure            28 
France      2014     Percentage of Smokers                24
France      2015     Public Health Expenditure            27
France      2015     Percentage of Smokers                24

I would like to group them in a single table to perform a regression.
Ideally, I think my table would look like this:

Country     Year       GDP     LifeExp    HealthSpendings    % of Smokers
France       2014      987       67.6            28               24
France       2015     1002       67.7            27               24

I am not a native English speaker (and very far from being proficient in the R language too !), so even just helping me to reformulate my question could help me find the resources I need online :blush: Many thanks for your help !

Using the package tidyverse you can use the pivot_wider function to change the shape of your second table

Thank you so much for your reactivity, pivot_wider does wonders !
I bumped into another problem now though:

One of my tables is stored as follow:

                       2000          2001        2002
FRA     GDP            990           1000        1010              
FRA     Expenditure    100           110         120

I used pivot_longer to change it to what I will call an "annoying table":

          
FRA      GDP         2000    990
FRA      Expenditure 2000    100
FRA      GDP         2001    1000
FRA      Expenditure 2001    110

This table looks exactly the way the other looked like initially.
And whereas pivot_wider did wonders with these other tables, I does not work with the annoying one.
It tried pivot_wider to get this annoying table to:

            GDP         Expenditure
FRA 2000    990         100
FRA 2001   1000         110

instead, I get the error
Column 2 must be named. Use .name_repair to specify repair

My script is the following - it uses data downloaded from the World Development Indicators (WDI):

WDI3<- pivot_longer(WDI2,
  cols = c("1999","2000","2001","2002","2003","2004","2005","2006","2007",
"2008","2009","2010","2011","2012","2013","2014","2015"),
  names_to = "Year",
  values_to = "Value")

WDI_filtered <-subset(WDI3,select = c("Series.Name","Country.Name",
"Country.Code","Year","Value"))
head(WDI_filtered)
class(UNESCO)
class(WDI_filtered)
WDI_wide <- pivot_wider(WDI_filtered,names_from = Series.Name,
values_from = Value,values_fill = NULL,names_repair = Series)

The last step (creating WDI_wide) yields the error
any clue ?

Quizá te ayudaría usar las funciones de reshape2 para darle el formato adecuado a tu segunda tabla, de manera que quede como la primera tabla que mostraste en tu pregunta. Esta página es bastante ilustrativa, espero que te ayude. https://seananderson.ca/2013/10/19/reshape/

Adicionalmente, veo que la tabla que esperas tiene nombres de las variables diferentes a los originales, sería recomendable que estén homogéneos y dentro de lo posible te recomiendo usar nombres con guiones bajos o puntos en lugar de espacio, es mucho más fácil para acceder a las variables, por ejemplo Life_Expectancy o Life.Expectancy.

Saludos!!

Muchas gracias, esto funciona perfectamente ! Y gracias por tu consejo a proposito de los nombres de las variables

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.