Summation of row based on different id and same date

No is used grouped by

b<-NewData%>% 
  mutate(firmMarket=NewData$cshoc*NewData$prccd)%>%
  group_by(gvkey,datadate)%>%
  mutate(TotalMarket=sum(firmMarket,na.rm=TRUE))%>%
  mutate(weightage=firmMarket/TotalMarket)%>%
  mutate(PriceWeightage=weightage*prccd)%>%
  group_by(gvkey,datadate)%>%
  mutate(MarketIndex=sum(PriceWeightage,na.rm = TRUE))

And does this work?

mpg %>%
select(model,hwy) %>%
 group_by(model) %>%
 mutate(sum_hwy_by_model = sum(hwy,na.rm=TRUE))
# A tibble: 234 x 3
# Groups:   model [38]
   model        hwy sum_hwy_by_model
   <chr>      <int>           <int>
 1 a4            29             198
 2 a4            29             198
 3 a4            31             198
 4 a4            30             198
 5 a4            26             198
 6 a4            26             198
 7 a4            27             198
 8 a4 quattro    26             206
 9 a4 quattro    25             206
10 a4 quattro    28             206
...

its giving same output.

How about you give me some of NewData, and we see if grouped sums can be calculated.?

Ok, NewData look like this with many more gvkeys, these are in the monthly data set, but have a daily data set. Along with that i have a very big data set I tried to do dout() it took a long process to compile and too much information to be displayed here.

gvkey datadate cshoc prccd
7041 31/1/2000 42000000 3.72
7041 29/2/2000 42000000 4.4
7041 31/3/2000 42000000 4.04
7041 28/4/2000 42000000 3.44
7041 31/5/2000 42000000 3.24
15632 31/1/2000 2367508585 3.96
15632 29/2/2000 2367508585 4.2
15632 31/3/2000 2367508585 4
15632 28/4/2000 2367508585 3.96
15632 31/5/2000 2367508585 3.82
203244 31/1/2000 269308002 1.64
203244 29/2/2000 269308002 1.53
203244 31/3/2000 269308002 1.55
203244 28/4/2000 269308002 1.45
203244 29/5/2000 269308002 1.42
212452 31/1/2000 17095000 2.24
212452 29/2/2000 17107000 3.06
212452 31/3/2000 17245000 6
212452 28/4/2000 17356000 7.5
212452 31/5/2000 17515000 10.9

now based on this i need to create new column named "firmMarket", "TotalMarket", "weightage", "PriceWeigtage". So the major problem is in calculating "TotalMarket". Kindly refer to the image i am uploading in this sum is required based on time and id.

Kindly help in solving this problem. Thanks in advance.

Hello,
I'm sure you shared this image with the best intentions, but perhaps you didnt realise what it implies.
If someone wished to use example data to test code against, they would type it out from your screenshot...

This is very unlikely to happen, and so it reduces the likelihood you will receive the help you desire.
Therefore please see this guide on how to reprex data. Key to this is use of either datapasta, or dput() to share your data as code

gvkey datadate cshoc prccd
7041 31/1/2000 42000000 3.72
7041 29/2/2000 42000000 4.4
7041 31/3/2000 42000000 4.04
7041 28/4/2000 42000000 3.44
7041 31/5/2000 42000000 3.24
15632 31/1/2000 2367508585 3.96
15632 29/2/2000 2367508585 4.2
15632 31/3/2000 2367508585 4
15632 28/4/2000 2367508585 3.96
15632 31/5/2000 2367508585 3.82
203244 31/1/2000 269308002 1.64
203244 29/2/2000 269308002 1.53
203244 31/3/2000 269308002 1.55
203244 28/4/2000 269308002 1.45
203244 29/5/2000 269308002 1.42
212452 31/1/2000 17095000 2.24
212452 29/2/2000 17107000 3.06
212452 31/3/2000 17245000 6
212452 28/4/2000 17356000 7.5
212452 31/5/2000 17515000 10.9

Sir, this is the data set. Image i just used to show how the sum is done in TotalMarket. You can use this data it is not in image formate.
Thanks

Did you read the article? It has been posted three times in the thread, yet you still haven't done it.

`datapasta::df_paste(head(NewData, 10)[, c('gvkey', 'datadate','cshoc','prccd')])`

 data.frame(
        gvkey = c(7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041),
     datadate = c(20000103,20000104,20000105,20000106,
                  20000111,20000112,20000113,20000114,20000117,20000118),
        cshoc = c(4.2e+07,4.2e+07,4.2e+07,4.2e+07,4.2e+07,
                  4.2e+07,4.2e+07,4.2e+07,4.2e+07,4.2e+07),
        prccd = c(3.86, 4.28, 4, 4.04, 4, 3.96, 3.92, 3.96, 4.06, 4.14)
 )

Is this a good example?.. they are all the same gvkey, and all entries have unique datadates ?

NewData<- data.frame(
  gvkey = c(7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041),
  datadate = c(20000103,20000104,20000105,20000106,
               20000111,20000112,20000113,20000114,20000117,20000118),
  cshoc = c(4.2e+07,4.2e+07,4.2e+07,4.2e+07,4.2e+07,
            4.2e+07,4.2e+07,4.2e+07,4.2e+07,4.2e+07),
  prccd = c(3.86, 4.28, 4, 4.04, 4, 3.96, 3.92, 3.96, 4.06, 4.14)
)

mutate(NewData,
       firmMarket=cshoc*prccd)%>%
  group_by(gvkey,datadate)%>%
  mutate(TotalMarket=sum(firmMarket,na.rm=TRUE))%>%
  mutate(weightage=firmMarket/TotalMarket)
# A tibble: 10 x 7
# Groups:   gvkey, datadate [10]
   gvkey datadate    cshoc prccd firmMarket TotalMarket weightage
   <dbl>    <dbl>    <dbl> <dbl>      <dbl>       <dbl>     <dbl>
 1  7041 20000103 42000000  3.86 162120000   162120000          1
 2  7041 20000104 42000000  4.28 179760000   179760000          1
 3  7041 20000105 42000000  4    168000000   168000000          1
 4  7041 20000106 42000000  4.04 169680000   169680000          1
 5  7041 20000111 42000000  4    168000000   168000000          1
 6  7041 20000112 42000000  3.96 166320000   166320000          1
 7  7041 20000113 42000000  3.92 164640000   164640000          1
 8  7041 20000114 42000000  3.96 166320000   166320000          1
 9  7041 20000117 42000000  4.06 170520000.  170520000.         1
10  7041 20000118 42000000  4.14 173880000   173880000          1

There are more than 1000 gvkeys with different entries in the data set and have unique datadate. But in this example, if you see the value for TotlaMaket is the same as firmMarket. And when I run your code

mutate(NewData,
       firmMarket=cshoc*prccd)%>%
  group_by(gvkey,datadate)%>%
  mutate(TotalMarket=sum(firmMarket,na.rm=TRUE))%>%
  mutate(weightage=firmMarket/TotalMarket)%>%
  mutate(PriceWeightage=weightage*prccd)%>%
  group_by(gvkey,datadate)%>%
  mutate(MarketIndex=sum(PriceWeightage,na.rm = TRUE))

I get this output

# A tibble: 1,048,575 x 9
# Groups:   gvkey, datadate [1,031,402]
   gvkey datadate    cshoc prccd firmMarket TotalMarket    weightage PriceWeightage MarketIndex
   <dbl>    <dbl>    <dbl> <dbl>      <dbl>       <dbl>        <dbl>          <dbl>       <dbl>
 1  7041 20000103 42000000  3.86 162120000      4.33e15 0.0000000375    0.000000145        17.7
 2  7041 20000104 42000000  4.28 179760000      4.33e15 0.0000000415    0.000000178        17.7
 3  7041 20000105 42000000  4    168000000      4.33e15 0.0000000388    0.000000155        17.7
 4  7041 20000106 42000000  4.04 169680000      4.33e15 0.0000000392    0.000000158        17.7
 5  7041 20000111 42000000  4    168000000      4.33e15 0.0000000388    0.000000155        17.7
 6  7041 20000112 42000000  3.96 166320000      4.33e15 0.0000000384    0.000000152        17.7
 7  7041 20000113 42000000  3.92 164640000      4.33e15 0.0000000380    0.000000149        17.7
 8  7041 20000114 42000000  3.96 166320000      4.33e15 0.0000000384    0.000000152        17.7
 9  7041 20000117 42000000  4.06 170520000.     4.33e15 0.0000000394    0.000000160        17.7
10  7041 20000118 42000000  4.14 173880000      4.33e15 0.0000000402    0.000000166        17.7
# ... with 1,048,565 more rows

I don't know what is missing to calculate TotalMarket values.

totalmarket is the same as full markey because its a summation by the groups gvkey AND datadate.
to not use datadate to split the groups simply remove it

mutate(NewData,
       firmMarket=cshoc*prccd)%>%
  group_by(gvkey)%>%
  mutate(TotalMarket=sum(firmMarket,na.rm=TRUE))%>%
  mutate(weightage=firmMarket/TotalMarket)
# A tibble: 10 x 7
# Groups:   gvkey [1]
   gvkey datadate    cshoc prccd firmMarket TotalMarket weightage
   <dbl>    <dbl>    <dbl> <dbl>      <dbl>       <dbl>     <dbl>
 1  7041 20000103 42000000  3.86 162120000   1689240000    0.0960
 2  7041 20000104 42000000  4.28 179760000   1689240000    0.106 
 3  7041 20000105 42000000  4    168000000   1689240000    0.0995
 4  7041 20000106 42000000  4.04 169680000   1689240000    0.100 
 5  7041 20000111 42000000  4    168000000   1689240000    0.0995
 6  7041 20000112 42000000  3.96 166320000   1689240000    0.0985
 7  7041 20000113 42000000  3.92 164640000   1689240000    0.0975
 8  7041 20000114 42000000  3.96 166320000   1689240000    0.0985
 9  7041 20000117 42000000  4.06 170520000.  1689240000    0.101 
10  7041 20000118 42000000  4.14 173880000   1689240000    0.103

To make TotalMarket values more clear, please see the image. In this, we have different gvkeys with datadates and all data will have sum based on datadate. Also if you see a different gvkey with the same date have the same values. So I want TotlaMarket value should be the same for the same date for different gvkeys.

I hope I am able to make my question clear.

Thanks

you are saying you want the total to be the sum of firm values by date (not by gvkey)

mutate(NewData,
       firmMarket=cshoc*prccd)%>%
  group_by(datadate)%>%
  mutate(TotalMarket=sum(firmMarket,na.rm=TRUE))%>%
  mutate(weightage=firmMarket/TotalMarket)

No no, let see this way ,
if i do sum of firmMarket values based on gvkey and datadate and store them in TotlaMarket. So for all gvkey and datadate all values will be stored in TotalMarket and value will be same for same date with different gvkey.

This contradicts different gvkeys having the same TotalMarkey values on matching days...

Yes, want to keep same values for different gvkeys for matching days. Is it possible to do.!!!

Thanks

yes, I already showed you, but it seems you didnt understand

if is use your mentioned code,
this is my output

# A tibble: 1,048,575 x 7
# Groups:   gvkey [288]
   gvkey datadate    cshoc prccd firmMarket TotalMarket    weightage
   <dbl>    <dbl>    <dbl> <dbl>      <dbl>       <dbl>        <dbl>
 1  7041 20000103 42000000  3.86 162120000      4.33e15 0.0000000375
 2  7041 20000104 42000000  4.28 179760000      4.33e15 0.0000000415
 3  7041 20000105 42000000  4    168000000      4.33e15 0.0000000388
 4  7041 20000106 42000000  4.04 169680000      4.33e15 0.0000000392
 5  7041 20000111 42000000  4    168000000      4.33e15 0.0000000388
 6  7041 20000112 42000000  3.96 166320000      4.33e15 0.0000000384
 7  7041 20000113 42000000  3.92 164640000      4.33e15 0.0000000380
 8  7041 20000114 42000000  3.96 166320000      4.33e15 0.0000000384
 9  7041 20000117 42000000  4.06 170520000.     4.33e15 0.0000000394
10  7041 20000118 42000000  4.14 173880000      4.33e15 0.0000000402
# ... with 1,048,565 more rows

i am not understanding why TotlaMarket values are coming in "4.33e15" for all. But as per your output, it should be "168924000".

You refer now to code I posted before your statements about total market being across gvkey by grouping datadate only, I provided different code after those statements. It is those codes which would show the same TotalMarket value for every matching datadate across varying gvkeys... (yet you complained about them also, in a way that eludes my understanding)