Simple question about duplicate row names

Hello,

I am a beginner using RStudio for an introductory econometrics course. I am using the Wages data set from the Ecdat package, which contains panel data, and I want to run a multiple regression and then use fixed effects for individuals and time. Whenever I try to use the fixed effects function of the plm package, I get the following error message:

Error in `row.names<-.data.frame`(`*tmp*`, value = c("X1.2", "X1.3", "X1.4",  : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘X1’, ‘X1.1’, ‘X1.2’, ‘X1.3’, ‘X1.4’, ‘X1.5’, ‘X1.6’, ‘X2’, ‘X2.1’, ‘X2.2’, ‘X2.3’, ‘X2.4’, ‘X2.5’, ‘X2.6’, ‘X3’, ‘X3.1’, ‘X3.2’, ‘X3.3’, ‘X3.4’, ‘X3.5’, ‘X3.6’, ‘X4’, ‘X4.1’, ‘X4.2’, ‘X4.3’, ‘X4.4’, ‘X4.5’, ‘X4.6’, ‘X5’, ‘X5.1’, ‘X5.2’, ‘X5.3’, ‘X5.4’, ‘X5.5’, ‘X5.6’, ‘X6’, ‘X6.1’, ‘X6.2’, ‘X6.3’, ‘X6.4’, ‘X6.5’, ‘X6.6’, ‘X7’, ‘X7.1’, ‘X7.2’, ‘X7.3’, ‘X7.4’, ‘X7.5’, ‘X7.6’, ‘X8’

I looked around on different forums trying to find a way to make the values unique when setting the names of the rows, but I haven't had any success so far. Here's what I've done so far:

Wages <- plm.data(Wages, index = 595) #in order to get the id and time for the data from Ecdat

rownames(Wages) <- make.names(Wages[,1], unique = TRUE) #in order to try to make the row names unique

I've also created the following dummies:

allwage <- Wages$lwage

skin <- Wages$black

edu <- Wages$ed

years <- Wages$exp

Which I then tried to use with fixed effects in the following regression:

theFEline <- plm(allwage ~ skin, + edu + years, data = Wages, index = c("id","time"), model="within")

And that's when I got the error message.

I have a very limited idea of what I'm doing, and even less of an idea what to do next. Any help will be most appreciated.

Björn

Hello @Burrez

Looks like you have a comma after skin that doesn't belong. Try:

theFEline <- plm(allwage ~ skin + edu + years, data = Wages, index = c("id","time"), model="within")
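
As a rough illustration of why that comma matters: R treats everything after it as a separate argument rather than as part of the formula. The sketch below uses a made-up function, show_matching(), which only reports how R matched the arguments and fits nothing. It assumes plm() takes formula, data and subset in that order, as lm() does; I have not traced this through plm itself.

```r
# Hypothetical stand-in (not part of plm): same leading arguments as lm(),
# but it only returns the matched call instead of fitting a model.
show_matching <- function(formula, data, subset, ...) match.call()

# The call as posted, with the comma after `skin`.
with_comma <- show_matching(allwage ~ skin, + edu + years, data = Wages)

# The corrected call, with the comma removed.
without_comma <- show_matching(allwage ~ skin + edu + years, data = Wages)

# match.call() does not evaluate its arguments, so Wages need not exist here.
print(with_comma)
print(without_comma)
```

In the first call the formula stops at skin and the remaining terms end up matched to subset. If plm() behaves the same way, a numeric subset would select rows by position and could repeat them, which would at least be consistent with the duplicate row names error.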

Also note

  1. plm::plm.data appears to be a deprecated function - probably better to use the suggested plm::pdata.frame. If you are not getting this warning then try to update plm to the newest version using install.packages('plm')

  2. Renaming your variables is not needed (but okay if you want different names). The following code would work too as long as you specify data = Wages:

theFEline <- plm(lwage ~ black + ed + exp, data = Wages, index = c("id","time"), model="within")

  3. I can't figure out why you have edu and skin as predictors - these are fixed per individual, correct? I am not familiar with plm() but it seems like the individual effect is already being estimated.
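
To make that last point concrete, here is a small base R sketch with made-up numbers (the toy data frame is purely illustrative, not taken from Wages). It applies the within transformation by hand, i.e. subtracts each individual's own mean, which is my understanding of what a fixed effects (within) model does before estimating.

```r
# Toy balanced panel: 3 individuals observed for 2 periods each.
# All values are invented for illustration.
toy <- data.frame(
  id   = rep(1:3, each = 2),
  time = rep(1:2, times = 3),
  ed   = rep(c(12, 16, 10), each = 2),  # constant within each individual
  exp  = c(5, 6, 10, 11, 2, 3)          # changes from period to period
)

# Within transformation: subtract each individual's own mean.
within_ed  <- toy$ed  - ave(toy$ed,  toy$id)
within_exp <- toy$exp - ave(toy$exp, toy$id)

print(cbind(toy, within_ed, within_exp))
```

The demeaned ed column is zero everywhere, so there is no variation left from which to estimate a coefficient, while exp still varies. The same reasoning would apply to any regressor that never changes for a given individual.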

Thank you so much Lewis, this made everything so much clearer, especially what you said in 3. Thanks to your timely advice I can now move on with my research paper. You just made my Christmas :)

Happy to help.

A tip for future questions - try to create a complete, minimal, reproducible example (also known as a reprex). This ensures helpers can quickly run your code and hopefully reproduce the problem you are having - which is essential to providing help. More importantly for you - it also increases the likelihood someone will try to help :).

You gave a good amount of detail to your question, but it took some effort for me to reproduce it and see what happened (finding the right packages and so on). Plus, I noticed that pesky comma, so I had a good idea what the issue was without having to run the code. An example of a reprex for your question would be something like:

library(Ecdat)
library(plm)
Wages <- plm.data(Wages, index = 595) #in order to get the id and time for the data from Ecdat

rownames(Wages) <- make.names(Wages[,1], unique = TRUE) #in order to try to make the row names unique

allwage <- Wages$lwage
skin <- Wages$black
edu <- Wages$ed
years <- Wages$exp

theFEline <- plm(allwage ~ skin, + edu + years, data = Wages, index = c("id","time"), model="within")

The main difference between this and what you provided is that it can just be copied and pasted in its entirety and run in a fresh new session of R on any machine.
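
For what it's worth, a version of the same script with the comma removed and plm.data swapped out might look like the sketch below. This is not something I ran for the original question. It assumes pdata.frame() accepts index = 595 the same way plm.data() did (the number of individuals in a balanced, stacked panel, adding "id" and "time"), and the name Wag is just mine.

```r
library(plm)

# Load the Ecdat copy of the data explicitly by package name.
data("Wages", package = "Ecdat")

# Assumption: index = 595 is the number of individuals, as in the original
# plm.data() call, so that "id" and "time" index variables are added.
Wag <- pdata.frame(Wages, index = 595)

# Individual fixed effects (within) model using the original column names.
# The index is already stored in Wag, so it is not passed again here.
theFEline <- plm(lwage ~ black + ed + exp, data = Wag, model = "within")
summary(theFEline)
```

I would expect the summary to report coefficients only for regressors that vary within an individual, so do not be surprised if black and ed are missing from it.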


Thanks again for the great advice. I actually posted this question on a few different websites and it was quickly voted down on most of them. This led me to believe that I hadn't done a very good job formulating the question, but it wasn't until I read your feedback just now that I realized what the problem actually was. This will definitely help me get better and quicker answers to questions in the future. Thank you so much :)

@Burrez See this post by @hadley regarding cross-posting. If you do cross-post, you should link to your original question so that community members do not spend time answering it if you have already gotten an answer.


Oh, I didn't even think about that, but now that you mention it, that's obviously the only right way to go about it. I'll make sure to keep that in mind in the future, thanks a bunch! :)
