Linking CRSP and Compustat in R


Hello everyone,

I have a question with regards to combining two data bases with a linking table:
Has anyone ever tried to combine the monthly stock return data table of CRSP with the yearly accounting data tables of Compustat through the CCM table? WRDS only offers a SAS solution, but unfortunately not one for R.
In detail: I have a table with all stock returns month-wise and their respective PERMNO-code (CRSP). I also have the yearly accounting data of these stocks with their GVKEY (Compustat). Obviously, the accounting data is the same for the company during the year and only changes every 12 months. Now I have the linking table which assigns each GVKEY to a PERMNO. I would like to combine them the following way:

Company | Month/Year | Return for this month | Accounting data columns | Stock Return columns for the prior 20 months

Does anyone have experience in this field?
Thanks in advance!



There are two ways of doing it. If you have ccm then use the link permno, if not use cusip. Just translate the sas code into r, it should be very straightforward.



so I have to admit, I am not that proficient with R and having troubles translating the code. Maybe just as clues:

  1. Which packages do I need?
  2. I don't fully understand how the SAS code does it ( I have all three files as an excel, so I don't have to do a sql query, right? Does that change the code significantly?


I have this same problem 3 years ago. Are you a PhD student in finance or accounting



I am a finance major. So how did you solve it 3 years ago. For me it says, that my data is too large I think (if I run the final merge command based on the PERMNO, the console gives back an error "Error: cannot allocate vector of size 317.9 Mb").



When you download data in CSV and text from wrds, and then read it into r, you may exceed the memory limit. What is your computer RAM, how many rows does your crsp data have? I use wrds cloud with r, so I do not have this issue. If you really want do it locally, use dplyr and data.table



A good introduction to seeking help from coding forums is to pose your questions as a [reproducible example (what many folks call a reprex for short)](FAQ: What's a reproducible example (`reprex`) and how do I do one?, the reprex-r-package is a handy tool to help make this easy).

With a question like this, you might provide samples (small samples) of the two tables you'd like to merge, and a copy of the merged data you'd like to get. (Here's a discussion of how you can take data from your R console and offer it in a reprex-friendly format for coding forums like this, I prefer dput)

In terms of tools for merging two datasets, I personally like the tidyverse's dplyr's two-table verbs for stuff like this.. But I'd be worried about suggesting this until we more clearly understand your desired outcome - since the time series model you want to apply later may require your data is structured in a particular way.



Here is some R code to replicate the Fama French 1992 paper.

The author links the Compustat data with the CRSP data. He also uses the CCM LNKHIST data as you require. Unfortunately I don`t have access to the CCM data so I have to link the data through the GVKEY, CUSIP / some other method.

I will post my code if anybody is interested.



I am one of the authors of this code. If anyone has question about it, feel free to ask.

There are two ways of merging compustat and CRSP. If you do not have the CCM data, this code does not work for you.