pa package in R - performance attribution

kamkwong · February 14, 2019, 2:57pm

I tried to load my own data set (uploadNB, see image) into the pa package to run the analysis. It didn't work, even though I followed the format of the package's original "year" data frame (see image). It produced the error message "non-conformable arrays ... Length of logical index must be 1 or 440, not 0".

Error_uploadNB-2.PNG

I would really appreciate it if someone here could help me spot the issue.

Best

achan_k

mara · February 14, 2019, 4:04pm

I'm not familiar with the package, but it looks like the arguments for brinson() have specific requirements:

x A data frame containing the data from which brinson analysis will be conducted.

date.var A character vector which indicates the name of the column in x to be used as a date for each observation. If the unique number of levels of date.var is one, a class object of brinson will be formed. If it is more than one, a class object of brinsonMulti will be formed.

cat.var A character vector which indicates the name of the column in x to be used as categorical variables.

bench.weight A character vector which indicates the name of the column or columns in x to be used as benchmark weight.

portfolio.weight A character vector which indicates the name of the column or columns in x to be used as portfolio weight.

ret.var A character vector which indicates the name of the column in x to be used as return variable.
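The requirements above can be checked before calling brinson(). Below is a minimal sketch in base R, assuming the column names from the vignette example. check_brinson_input() is a hypothetical helper, not part of pa, and the type checks (Date for the date column, numeric for weights and returns) are inferred from the bundled example data rather than stated in the argument descriptions.

```r
# Hypothetical helper (not part of pa): report problems that seem likely to
# trip up brinson(), based on the argument descriptions above.
check_brinson_input <- function(x, date.var, cat.var, bench.weight,
                                portfolio.weight, ret.var) {
  if (!is.data.frame(x)) {
    return("x is not a data frame")
  }
  problems <- character(0)

  # Every named column must exist in x
  needed <- c(date.var, cat.var, bench.weight, portfolio.weight, ret.var)
  missing_cols <- setdiff(needed, names(x))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("missing column(s):", paste(missing_cols, collapse = ", ")))
  }

  # Assumption: the date column should be of class Date, as in the bundled data
  if (date.var %in% names(x) && !inherits(x[[date.var]], "Date")) {
    problems <- c(problems, paste(date.var, "is not of class Date"))
  }

  # Assumption: weights and returns should be numeric
  for (v in intersect(c(bench.weight, portfolio.weight, ret.var), names(x))) {
    if (!is.numeric(x[[v]])) {
      problems <- c(problems, paste(v, "is not numeric"))
    }
  }
  problems
}

# Toy one-row data frame with the date stored as text
toy <- data.frame(date = "1/31/2018", sector = "Utilities",
                  benchmark = 0, portfolio = 0.0071, return = -0.059)
check_brinson_input(toy, "date", "sector", "benchmark", "portfolio", "return")
#> [1] "date is not of class Date"
```

An empty result only means these basic checks passed; it does not guarantee that brinson() will accept the data.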

I'm able to create a reproducible example from the vignette, so the package itself seems to be working as it should:

library(pa)
#> Loading required package: grid
data(jan)
br.single <- brinson(x = jan, date.var = "date",
                     cat.var = "sector",
                     bench.weight = "benchmark",
                     portfolio.weight = "portfolio",
                     ret.var = "return")
br.single
#> Period:                              2010-01-01
#> Methodology:                         Brinson
#> Securities in the portfolio:         200
#> Securities in the benchmark:         1000

Created on 2019-02-14 by the reprex package (v0.2.1.9000)

Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.

kamkwong · February 14, 2019, 4:23pm

Dear Mara

Thank you so much for your reply.

As you mentioned, the package is bundled with the data sets "jan", "quarter", and "year", and there is no problem running the analysis with data(jan) or data(year). It is just my own data, "uploadNB", that doesn't work in the same way. I suspect my "uploadNB" data frame has a format problem that I can't see. I followed the "jan" format exactly when creating mine, but apparently I got something wrong.

"jan" or "year" data frame image

Original pa package year data.PNG

Same as mine, right?

Sorry, I'm new to R programming.

Best

achan_k

mara · February 14, 2019, 4:41pm

Yes, if you take a look at the links re. reprex in my previous post, they'll help you make a small reproducible example so I can try to see what's going on in your data.

The idea is just to give the minimal amount needed to reproduce the problem. If you're reading it in from Excel, you can copy and paste the first few lines using datapasta, but please do try to give reprex a whirl, as it will make it much easier to help you.

kamkwong · February 15, 2019, 4:56pm

Dear Mara

Thanks for your tips. The datapasta package is wonderful, I have to say. I installed both datapasta and reprex and followed exactly what the video shows. I can bring the Excel data in through the clipboard, and I named it "noges". But what comes next after using "Reprex selection" and copying the data to the clipboard? How can I create a data frame like "year", which appears under "Data" in the Global Environment pane at the top right? My Excel data, "noges", appears under "Values", just below "Data", and I couldn't run any pa package command on "noges".

Noges.PNG

How can I change the "noges" value into a data frame?

I would be very grateful if you could give me some insight on this.

Best

achan_k ><

mara · February 15, 2019, 5:08pm

So, you're almost there with getting me a reprex. The content in the viewer pane is actually on your clipboard after you run reprex. If you paste it here (the content, not a screenshot), we'll be able to help you out.

kamkwong · February 15, 2019, 5:59pm

Dear Mara..

Here is the content I copied from the viewer pane.

The data seems to be so large that my PC keeps going to "not responding". Is it fine if I copy just part of it? Many thanks, Mara.

data_nog <- tibble::tribble(
       ~bbgid,              ~name,                  ~sector, ~portfolio, ~benchmark,        ~date, ~return, ~country,
    "1635 HK", "SHANGHAI DAZHO-H",              "Utilities",     0.0071,          0,  "1/31/2018",  -0.059,     "CN",
    "2343 HK",    "PACIFIC BASIN",            "Industrials",      0.013,          0,  "1/31/2018",   0.071,     "HK",
     "388 HK",             "HKEX",             "Financials",     0.0629,          0,  "1/31/2018",   0.236,     "HK",
     "547 HK",   "DIGITAL DOMAIN", "Communication Services",     0.0101,          0,  "1/31/2018",  0.0402,     "HK",
    "6758 JP",        "SONY CORP", "Consumer Discretionary",     0.0399,          0,  "1/31/2018",  0.0248,     "JP",
    "6858 HK",   "HONMA GOLF LTD", "Consumer Discretionary",     0.0013,          0,  "1/31/2018",  0.0997,     "JP",
     "860 HK", "WE SOLUTIONS LTD", "Consumer Discretionary",     0.0084,          0,  "1/31/2018",  0.2579,     "HK",
    "IPGP US",    "IPG PHOTONICS", "Information Technology",     0.0287,          0,  "1/31/2018",  0.1766,     "US",
    "NVDA US",      "NVIDIA CORP", "Information Technology",     0.0343,          0,  "1/31/2018",  0.2703,     "US",
 "USD Curncy",              "USD",                   "cash",     0.0455,          0,  "1/31/2018",       0,       NA,
    "1635 HK", "SHANGHAI DAZHO-H",              "Utilities",     0.0061,          0,  "2/28/2018", -0.0376,     "CN",
    "2343 HK",    "PACIFIC BASIN",            "Industrials",     0.0203,          0,  "2/28/2018",  0.2044,     "HK",
    "3008 TT", "LARGAN PRECISION", "Information Technology",     0.0309,          0,  "2/28/2018", -0.0774,     "TW",
     "388 HK",             "HKEX",             "Financials",     0.0826,          0,  "2/28/2018", -0.0418,     "HK",
     "547 HK",   "DIGITAL DOMAIN", "Communication Services",     0.0099,          0,  "2/28/2018", -0.0608,     "HK",
    "6758 JP",        "SONY CORP", "Consumer Discretionary",     0.0106,          0,  "2/28/2018",  0.0447,     "JP",
     "860 HK", "WE SOLUTIONS LTD", "Consumer Discretionary",     0.0062,          0,  "2/28/2018",  -0.085,     "HK",
     "868 HK",      "XINYI GLASS", "Consumer Discretionary",     0.0224,          0,  "2/28/2018",  0.0235,     "HK",
    "IPGP US",    "IPG PHOTONICS", "Information Technology",     0.0395,          0,  "2/28/2018",  -0.025,     "US",
    "NVDA US",      "NVIDIA CORP", "Information Technology",     0.0352,          0,  "2/28/2018", -0.0148,     "US",
 "USD Curncy",              "USD",                   "cash",      0.047,          0,  "2/28/2018",       0,       NA,
    "1635 HK", "SHANGHAI DAZHO-H",              "Utilities",     0.0047,          0,  "3/30/2018",       0,     "CN",
    "2343 HK",    "PACIFIC BASIN",            "Industrials",     0.0233,          0,  "3/30/2018", -0.0367,     "HK",
    "3008 TT", "LARGAN PRECISION", "Information Technology",     0.0292,          0,  "3/29/2018",  -0.141,     "TW",
     "388 HK",             "HKEX",             "Financials",     0.0769,          0,  "3/30/2018", -0.1014,     "HK",
     "547 HK",   "DIGITAL DOMAIN", "Communication Services",     0.0107,          0,  "3/30/2018",  0.0059,     "HK",
    "6758 JP",        "SONY CORP", "Consumer Discretionary",     0.0104,          0,  "3/30/2018", -0.0517,     "JP",
     "868 HK",      "XINYI GLASS", "Consumer Discretionary",     0.0225,          0,  "3/30/2018", -0.0296,     "HK",
    "IPGP US",    "IPG PHOTONICS", "Information Technology",      0.039,          0,  "3/30/2018", -0.0499,     "US",
    "NVDA US",      "NVIDIA CORP", "Information Technology",      0.035,          0,  "3/30/2018",  -0.043,     "US",
 "USD Curncy",              "USD",                   "cash",     0.0478,          0,  "3/31/2018",       0,       NA,
    "1635 HK", "SHANGHAI DAZHO-H",              "Utilities",     0.0037,          0,  "4/30/2018",  0.0098,     "CN",
     "200 HK",   "MELCO INTL DEV", "Consumer Discretionary",     0.0109,          0,  "4/30/2018",  0.2873,     "HK",
    "2343 HK",    "PACIFIC BASIN",            "Industrials",     0.0234,          0,  "4/30/2018",       0,     "HK",
     "388 HK",             "HKEX",             "Financials",     0.0778,          0,  "4/30/2018",  0.0193,     "HK",
     "547 HK",   "DIGITAL DOMAIN", "Communication Services",     0.0104,          0,  "4/30/2018", -0.0351,     "HK",
    "6758 JP",        "SONY CORP", "Consumer Discretionary",     0.0107,          0,  "4/30/2018",  0.0494,     "JP",
     "799 HK",          "IGG INC", "Communication Services",      0.012,          0,  "4/30/2018",  0.1136,     "SG",
     "868 HK",      "XINYI GLASS", "Consumer Discretionary",     0.0218,          0,  "4/30/2018", -0.0321,     "HK",
    "IPGP US",    "IPG PHOTONICS", "Information Technology",     0.0358,          0,  "4/30/2018", -0.0872,     "US",
    "NVDA US",      "NVIDIA CORP", "Information Technology",     0.0341,          0,  "4/30/2018", -0.0289,     "US",
 "USD Curncy",              "USD",                   "cash",     0.0109,          0,  "4/30/2018",       0,       NA
)

mara · February 16, 2019, 1:36pm

So when you do the tribble paste, you don't need all of the data. Just select the first ten lines or so.
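If datapasta is awkward with a large table, base R can produce a similar copy-and-paste snippet. Here is a small sketch, assuming the data is already in a data frame; big_data is a made-up stand-in for the real data.

```r
# Made-up stand-in for a large data frame
big_data <- data.frame(
  bbgid     = paste("ID", 1:500),
  portfolio = seq(0.001, 0.5, length.out = 500),
  stringsAsFactors = FALSE
)

# Keep only the first ten rows
small <- head(big_data, 10)

# dput() prints R code that recreates `small`;
# that printed code is what gets pasted into the post
dput(small)
```

Anyone reading the post can then assign the pasted output to a name and work with the same ten rows.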

kamkwong · February 16, 2019, 2:47pm

Dear Mara

Please correct me if I'm wrong, but it seems you don't need to run my data set (the reprex) to see my problem? Using datapasta as you recommended, I already imported and created my data set (noges), but I think this is the issue: built-in data such as "jan" is listed under "Data" in the Global Environment pane at the top right, whereas my imported data, "noges", is listed under "Values". I can use the pa package on its built-in "jan" data frame but not on "noges"; the error says there is no such object.

Is there a way to simply convert the object "noges" under "Values" into a data frame like "jan" under "Data", so that I can use it to run the package's analysis?

Best

Noges-2.PNG

kamkwong · February 16, 2019, 2:48pm

I simply couldn't run the analysis on "noges".

error

error.PNG

mara · February 16, 2019, 3:10pm

It looks like you're assigning a character string that contains the data_nog assignment code to a vector. I can't see exactly what you're doing without a reprex, but I'm guessing you have the entire tibble call in quotes or something to that effect.
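The distinction can be seen with class(). In this toy sketch (object names and values are made up), code kept inside quotes is only text, which RStudio lists under "Values", while running the same code produces a data frame, listed under "Data".

```r
# The code is wrapped in quotes, so R stores it as text and never runs it
as_text <- 'data.frame(sector = c("Utilities", "Financials"), portfolio = c(0.0071, 0.0629))'
class(as_text)
#> [1] "character"
is.data.frame(as_text)
#> [1] FALSE

# The same code without the surrounding quotes is executed
as_df <- data.frame(sector = c("Utilities", "Financials"),
                    portfolio = c(0.0071, 0.0629))
class(as_df)
#> [1] "data.frame"
is.data.frame(as_df)
#> [1] TRUE
```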

mara · February 16, 2019, 3:14pm

Run this exact code, and you will see that it creates a data frame:

data_nog <- tibble::tribble(
  ~bbgid, ~name, ~sector, ~portfolio, ~benchmark, ~date, ~return, ~country,
  "1635 HK", "SHANGHAI DAZHO-H", "Utilities", 0.0071, 0, "1/31/2018", -0.059, "CN",
  "2343 HK", "PACIFIC BASIN", "Industrials", 0.013, 0, "1/31/2018", 0.071, "HK",
  "388 HK", "HKEX", "Financials", 0.0629, 0, "1/31/2018", 0.236, "HK",
  "547 HK", "DIGITAL DOMAIN", "Communication Services", 0.0101, 0, "1/31/2018", 0.0402, "HK",
  "6758 JP", "SONY CORP", "Consumer Discretionary", 0.0399, 0, "1/31/2018", 0.0248, "JP",
  "6858 HK", "HONMA GOLF LTD", "Consumer Discretionary", 0.0013, 0, "1/31/2018", 0.0997, "JP",
  "860 HK", "WE SOLUTIONS LTD", "Consumer Discretionary", 0.0084, 0, "1/31/2018", 0.2579, "HK",
  "IPGP US", "IPG PHOTONICS", "Information Technology", 0.0287, 0, "1/31/2018", 0.1766, "US",
  "NVDA US", "NVIDIA CORP", "Information Technology", 0.0343, 0, "1/31/2018", 0.2703, "US")

Created on 2019-02-16 by the reprex package (v0.2.1.9000)

kamkwong · February 16, 2019, 3:15pm

Dear Mara

Here is the reprex for the first 10 lines or so. Many thanks.

noges <- tibble::tribble(
  ~bbgid, ~name, ~sector, ~portfolio, ~benchmark, ~date, ~return, ~country,
  "6869 HK", "YOFC-H", "Information Technology", 0, 6e-04, "1/31/2018", 0.0111, "CN",
  "YZJSGD SP", "YANGZIJIANG SHIP", "Industrials", 0, 0.002, "1/31/2018", 0.0884, "CN",
  "YLLG SP", "YANLORD LAND GRO", "Real Estate", 0, 7e-04, "1/31/2018", 0.142, "SG",
  "200869 CH", "YANTAI CHANGYU-B", "Consumer Staples", 0.0071, 4e-04, "1/31/2018", 0.0467, "CN",
  "1230 HK", "YASHILI INT'L", "Consumer Staples", 0.013, 2e-04, "1/31/2018", 0.0133, "CN",
  "YINGLI SP", "YING LI INTERNAT", "Real Estate", 0.0629, 1e-04, "1/31/2018", 0.0331, "SG",
  "123 HK", "YUEXIU PROPERTY", "Real Estate", 0.0101, 0.001, "1/31/2018", 0.1507, "HK",
  "1052 HK", "YUEXIU TRANSPORT", "Industrials", 0.0399, 4e-04, "1/31/2018", 0.0017, "HK",
  "1628 HK", "YUZHOU PROPERTIE", "Real Estate", 0.0013, 8e-04, "1/31/2018", 0.3861, "CN",
  "ZTO US", "ZTO EXPRESS -ADR", "Industrials", 0.0084, 0.0018, "1/31/2018", -0.0025, "CN",
  "576 HK", "ZHEJIANGEXPRE-H", "Industrials", 0.0287, 0.0013, "1/31/2018", 0.078, "CN",
  "900915 CH", "ZHONGLU CO LTD-B", "Consumer Discretionary", 0, 1e-04, "1/31/2018", 0, "CN",
  "881 HK", "ZHONGSHENG GROUP", "Consumer Discretionary", 0, 0.0012, "1/31/2018", 0.102, "CN",
  "1458 HK", "ZHOU HEI YA INTE", "Consumer Staples", 0.0309, 7e-04, "1/31/2018", -0.0463, "CN",
  "3898 HK", "ZHUZHOU CRRC T-H", "Industrials", 0, 0.0021, "1/31/2018", -0.1485, "CN",
  "1157 HK", "ZOOMLION HEAVY-H", "Industrials", 0, 4e-04, "1/31/2018", 0.0209, "CN",
  "KANG US", "IKANG HEALTH-ADR", "Health Care", 0, 3e-04, "1/31/2018", 0.0196, "CN"
)
head(noges)
#> # A tibble: 6 x 8
#>   bbgid   name       sector       portfolio benchmark date   return country
#>   <chr>   <chr>      <chr>            <dbl>     <dbl> <chr>   <dbl> <chr>
#> 1 6869 HK YOFC-H     Information~    0       0.000600 1/31/~ 0.0111 CN
#> 2 YZJSGD~ YANGZIJIA~ Industrials     0       0.002    1/31/~ 0.0884 CN
#> 3 YLLG SP YANLORD L~ Real Estate     0       0.0007   1/31/~ 0.142  SG
#> 4 200869~ YANTAI CH~ Consumer St~    0.0071  0.0004   1/31/~ 0.0467 CN
#> 5 1230 HK YASHILI I~ Consumer St~    0.013   0.0002   1/31/~ 0.0133 CN
#> 6 YINGLI~ YING LI I~ Real Estate     0.0629  0.0001   1/31/~ 0.0331 SG

mara · February 16, 2019, 3:19pm

Right. So now the issue is that you need to run your analysis. As I mentioned at the beginning, I'm not very familiar with the package you're using (or the function), but you need to ensure that your arguments (variables) meet the specifications outlined for the function (see my first response at the top). If you look at the example data, you can see that several of the variables referenced are of specific classes or types, such as Date. I believe you'll need to format your data accordingly:

library(pa)
#> Loading required package: grid
data_nog <- tibble::tribble(
  ~bbgid, ~name, ~sector, ~portfolio, ~benchmark, ~date, ~return, ~country,
  "1635 HK", "SHANGHAI DAZHO-H", "Utilities", 0.0071, 0, "1/31/2018", -0.059, "CN",
  "2343 HK", "PACIFIC BASIN", "Industrials", 0.013, 0, "1/31/2018", 0.071, "HK",
  "388 HK", "HKEX", "Financials", 0.0629, 0, "1/31/2018", 0.236, "HK",
  "547 HK", "DIGITAL DOMAIN", "Communication Services", 0.0101, 0, "1/31/2018", 0.0402, "HK",
  "6758 JP", "SONY CORP", "Consumer Discretionary", 0.0399, 0, "1/31/2018", 0.0248, "JP",
  "6858 HK", "HONMA GOLF LTD", "Consumer Discretionary", 0.0013, 0, "1/31/2018", 0.0997, "JP",
  "860 HK", "WE SOLUTIONS LTD", "Consumer Discretionary", 0.0084, 0, "1/31/2018", 0.2579, "HK",
  "IPGP US", "IPG PHOTONICS", "Information Technology", 0.0287, 0, "1/31/2018", 0.1766, "US",
  "NVDA US", "NVIDIA CORP", "Information Technology", 0.0343, 0, "1/31/2018", 0.2703, "US")

data(jan)
str(jan)
#> 'data.frame':    3000 obs. of  15 variables:
#>  $ barrid   : Factor w/ 51132 levels "ARGAAB1","ARGAAC4",..: 43300 25132 25228 45045 45734 7535 7479 44276 40286 41347 ...
#>  $ name     : Factor w/ 51315 levels "                                   ",..: 41025 34789 34790 11177 27326 8029 29073 5498 4141 17301 ...
#>  $ return   : num  0.0234 -0.0791 -0.0867 -0.07 -0.0723 ...
#>  $ date     : Date, format: "2010-01-01" "2010-01-01" ...
#>  $ sector   : Ord.factor w/ 10 levels "Energy"<"Materials"<..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ momentum : num  -0.052 1.757 1.757 0.382 0.629 ...
#>  $ value    : num  1.057 0.524 0.524 0.394 0.394 ...
#>  $ size     : num  0.25 -0.265 -0.265 -0.281 -0.074 0.715 0.172 0 -0.213 0.227 ...
#>  $ growth   : num  0.972 1.233 1.233 1.105 1.317 ...
#>  $ cap.usd  : num  2.72e+10 9.23e+09 9.12e+09 1.27e+10 1.71e+10 ...
#>  $ yield    : num  0 2.162 2.162 0.546 0.978 ...
#>  $ country  : Factor w/ 55 levels "ARE","ARG","AUS",..: 54 38 38 54 54 11 11 54 54 54 ...
#>  $ currency : Factor w/ 44 levels "AREC","ARGC",..: 43 28 28 43 43 9 9 43 43 43 ...
#>  $ portfolio: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ benchmark: num  0.001259 0.000427 0.000422 0.000589 0.000791 ...
str(data_nog)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    9 obs. of  8 variables:
#>  $ bbgid    : chr  "1635 HK" "2343 HK" "388 HK" "547 HK" ...
#>  $ name     : chr  "SHANGHAI DAZHO-H" "PACIFIC BASIN" "HKEX" "DIGITAL DOMAIN" ...
#>  $ sector   : chr  "Utilities" "Industrials" "Financials" "Communication Services" ...
#>  $ portfolio: num  0.0071 0.013 0.0629 0.0101 0.0399 0.0013 0.0084 0.0287 0.0343
#>  $ benchmark: num  0 0 0 0 0 0 0 0 0
#>  $ date     : chr  "1/31/2018" "1/31/2018" "1/31/2018" "1/31/2018" ...
#>  $ return   : num  -0.059 0.071 0.236 0.0402 0.0248 ...
#>  $ country  : chr  "CN" "HK" "HK" "HK" ...

Created on 2019-02-16 by the reprex package (v0.2.1.9000)
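The two str() outputs can also be compared programmatically. This is a rough sketch in base R: compare_classes() is a hypothetical helper, and the two small data frames are stand-ins for jan and data_nog so that the example runs without pa.

```r
# Hypothetical helper: first class of each column shared by two data frames
compare_classes <- function(reference, candidate) {
  shared <- intersect(names(reference), names(candidate))
  first_class <- function(df) {
    unname(vapply(df[shared], function(col) class(col)[1], character(1)))
  }
  out <- data.frame(
    column    = shared,
    reference = first_class(reference),
    candidate = first_class(candidate),
    stringsAsFactors = FALSE
  )
  out$same <- out$reference == out$candidate
  out
}

# Stand-ins that mimic the column types shown in the str() output above
ref <- data.frame(
  date   = as.Date("2010-01-01"),
  sector = factor("Energy", levels = c("Energy", "Materials"), ordered = TRUE),
  return = 0.0234
)
cand <- data.frame(date = "1/31/2018", sector = "Utilities", return = -0.059,
                   stringsAsFactors = FALSE)

# One row per shared column; `same` is FALSE where the classes differ
compare_classes(ref, cand)
```

With the real objects, the call would be compare_classes(jan, data_nog), and the rows where `same` is FALSE are the columns worth converting first.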

mara · February 16, 2019, 3:29pm

Here's an example in which I've done some reformatting (creating dates from the date variable and a factor variable for sector), after which I'm able to run the brinson() function on the sample of data I used:

library(tidyverse)
library(pa)
#> Loading required package: grid

data_nog <- tibble::tribble(
  ~bbgid, ~name, ~sector, ~portfolio, ~benchmark, ~date, ~return, ~country,
  "1635 HK", "SHANGHAI DAZHO-H", "Utilities", 0.0071, 0, "1/31/2018", -0.059, "CN",
  "2343 HK", "PACIFIC BASIN", "Industrials", 0.013, 0, "1/31/2018", 0.071, "HK",
  "388 HK", "HKEX", "Financials", 0.0629, 0, "1/31/2018", 0.236, "HK",
  "547 HK", "DIGITAL DOMAIN", "Communication Services", 0.0101, 0, "1/31/2018", 0.0402, "HK",
  "6758 JP", "SONY CORP", "Consumer Discretionary", 0.0399, 0, "1/31/2018", 0.0248, "JP",
  "6858 HK", "HONMA GOLF LTD", "Consumer Discretionary", 0.0013, 0, "1/31/2018", 0.0997, "JP",
  "860 HK", "WE SOLUTIONS LTD", "Consumer Discretionary", 0.0084, 0, "1/31/2018", 0.2579, "HK",
  "IPGP US", "IPG PHOTONICS", "Information Technology", 0.0287, 0, "1/31/2018", 0.1766, "US",
  "NVDA US", "NVIDIA CORP", "Information Technology", 0.0343, 0, "1/31/2018", 0.2703, "US")


data_nog <- data_nog %>%
  mutate(
    date = as.Date(date, "%m/%d/%Y"),
    sector = as_factor(sector)
    )

nog.single <- brinson(x = data_nog, date.var = "date",
        cat.var = "sector",
        bench.weight = "benchmark",
        portfolio.weight = "portfolio",
        ret.var = "return")

nog.single
#> Period:                              2018-01-31
#> Methodology:                         Brinson
#> Securities in the portfolio:         9
#> Securities in the benchmark:         0

Created on 2019-02-16 by the reprex package (v0.2.1.9000)
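The sample above covers a single date. The longer paste earlier in the thread spans several month-ends, and a few March rows carry different days (3/29, 3/30, and 3/31). Since brinson() chooses between a single-period and a multi-period object based on the number of unique dates, it may be worth tabulating the dates after conversion. The sketch below uses a handful of dates from that paste; whether such stray dates actually cause trouble in pa is an unverified assumption.

```r
# A few date strings taken from the larger paste earlier in the thread
raw_dates <- c("1/31/2018", "2/28/2018", "3/30/2018", "3/29/2018",
               "3/30/2018", "3/31/2018", "4/30/2018")

dates <- as.Date(raw_dates, format = "%m/%d/%Y")

# Rows per exact date: March is split across three different days
table(dates)

# Rows per calendar month: one group per intended period
table(format(dates, "%Y-%m"))
```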

kamkwong · February 16, 2019, 3:41pm

Dear Mara

Haha, I'm still digesting what you told me earlier. You are great.

So it seems the issue was the format of the columns in my original Excel file, and you could spot the problem once I gave you my reprex, right? I finally understand what you were doing.

I can tell the difference in the date format by eye, but you could even tell there was a problem with the "sector" format?

Secondly, is it true that I need to type these few lines every time I run my analysis?
data_nog <- data_nog %>%
  mutate(
    date = as.Date(date, "%m/%d/%Y"),
    sector = as_factor(sector)
  )
Or can I go back to my original Excel file, format the "date" and "sector" columns there, and load it into R again with datapasta, or in some simpler way such as read_csv?

Best

mara · February 16, 2019, 3:53pm

You can use read_csv() and parse the variables into the formats you want by specifying the column types. See the readr documentation on column types.
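As a rough sketch of what that could look like, assuming the data were exported from Excel to a CSV file with the same column names used in this thread (the temporary file below stands in for that export):

```r
library(readr)

# Stand-in for a CSV exported from Excel (two rows from the data in this thread)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "bbgid,name,sector,portfolio,benchmark,date,return,country",
  "1635 HK,SHANGHAI DAZHO-H,Utilities,0.0071,0,1/31/2018,-0.059,CN",
  "388 HK,HKEX,Financials,0.0629,0,1/31/2018,0.236,HK"
), csv_path)

# Parse date and sector into the desired types while reading;
# the remaining columns are left to readr's type guessing
data_nog <- read_csv(
  csv_path,
  col_types = cols(
    date   = col_date(format = "%m/%d/%Y"),
    sector = col_factor()
  )
)

str(data_nog$date)
str(data_nog$sector)
```

Specifying the types at import means the mutate() step does not have to be repeated after every load.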

I couldn't "see" the difference from the input data (that's why a reprex is so useful). I could see it once I ran str() on the example data used in the pa documentation and on the data you gave me. If you look at the earlier reprex where I do that, you'll see that the type of each column is printed next to the column name, e.g.

#>   $ sector   : Ord.factor w/ 10 levels "Energy"<"Materials"<..: 1

Since the example uses a factor variable for sector, I guessed that your actual data should as well. Note that, if you use readr, it will never guess to read in a column of strings as a factor. You can specify that when you read in the data (e.g. using parse_factor()), or afterward.

You can read more about factor variables in the factors chapter of R for Data Science.

If you're unfamiliar with importing and manipulating data in R, you might want to look through that book more generally (or any other of a number of great free, online R resources — Google should help you out there!)

kamkwong · February 16, 2019, 4:13pm

Dear Mara

I really appreciate your help, which has definitely aroused my interest in studying R further. Frankly, this is a really great place to start and to chat with all of you who are willing to help out whenever we run into problems. Again, thank you so much.

Best

kamkwong ><

kamkwong · February 17, 2019, 3:26pm

Dear mara

Sorry to bother you again. Back to the same issue with my data frame: I'm trying to import a smaller amount of data from Excel using read.table() with the clipboard, but I'm still not good enough with the factor syntax. Would you mind doing me a favor and showing me the mutate() or factor conversion for "bbgid", "name", "sector", "date", and "country"? Below is the str() output for my own data, "my_data", and for the data set "jan".

I tried your mutate() and factor conversion again, but it is not working this time.

Best

kamkwong · February 17, 2019, 3:29pm

str(my_data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	17 obs. of  8 variables:
 $ bbgid    : chr  "6869 HK" "YZJSGD SP" "YLLG SP" "200869 CH" ...
 $ name     : chr  "YOFC-H" "YANGZIJIANG SHIP" "YANLORD LAND GRO" "YANTAI CHANGYU-B" ...
 $ sector   : chr  "Information Technology" "Industrials" "Real Estate" "Consumer Staples" ...
 $ portfolio: num  0 0 0 0.0071 0.013 0.0629 0.0101 0.0399 0.0013 0.0084 ...
 $ benchmark: num  0.0006 0.002 0.0007 0.0004 0.0002 0.0001 0.001 0.0004 0.0008 0.0018 ...
 $ date     : chr  "1/31/2018" "1/31/2018" "1/31/2018" "1/31/2018" ...
 $ return   : num  0.0111 0.0884 0.142 0.0467 0.0133 ...
 $ country  : chr  "CN" "CN" "SG" "CN" ...

str(jan)
'data.frame':	3000 obs. of  15 variables:
 $ barrid   : Factor w/ 51132 levels "ARGAAB1","ARGAAC4",..: 43300 25132 25228 45045 45734 7535 7479 44276 40286 41347 ...
 $ name     : Factor w/ 51315 levels "                                   ",..: 41025 34789 34790 11177 27326 8029 29073 5498 4141 17301 ...
 $ return   : num  0.0234 -0.0791 -0.0867 -0.07 -0.0723 ...
 $ date     : Date, format: "2010-01-01" "2010-01-01" ...
 $ sector   : Ord.factor w/ 10 levels "Energy"<"Materials"<..: 1 1 1 1 1 1 1 1 1 1 ...
 $ momentum : num  -0.052 1.757 1.757 0.382 0.629 ...
 $ value    : num  1.057 0.524 0.524 0.394 0.394 ...
 $ size     : num  0.25 -0.265 -0.265 -0.281 -0.074 0.715 0.172 0 -0.213 0.227 ...
 $ growth   : num  0.972 1.233 1.233 1.105 1.317 ...
 $ cap.usd  : num  2.72e+10 9.23e+09 9.12e+09 1.27e+10 1.71e+10 ...
 $ yield    : num  0 2.162 2.162 0.546 0.978 ...
 $ country  : Factor w/ 55 levels "ARE","ARG","AUS",..: 54 38 38 54 54 11 11 54 54 54 ...
 $ currency : Factor w/ 44 levels "AREC","ARGC",..: 43 28 28 43 43 9 9 43 43 43 ...
 $ portfolio: num  0 0 0 0 0 0 0 0 0 0 ...
 $ benchmark: num  0.001259 0.000427 0.000422 0.000589 0.000791 ...
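Comparing the two str() outputs, the columns that differ are bbgid, name, sector, and country (character versus factor) and date (character versus Date). One possible way to convert them in base R is shown below on a two-row stand-in for my_data. Whether brinson() needs all of these to be factors is an assumption; earlier in the thread, converting only date and sector was enough for the sample data.

```r
# Two-row stand-in for my_data, using values shown in str(my_data)
my_data <- data.frame(
  bbgid     = c("6869 HK", "YZJSGD SP"),
  name      = c("YOFC-H", "YANGZIJIANG SHIP"),
  sector    = c("Information Technology", "Industrials"),
  portfolio = c(0, 0),
  benchmark = c(0.0006, 0.002),
  date      = c("1/31/2018", "1/31/2018"),
  return    = c(0.0111, 0.0884),
  country   = c("CN", "CN"),
  stringsAsFactors = FALSE
)

# Character columns to factors
factor_cols <- c("bbgid", "name", "sector", "country")
my_data[factor_cols] <- lapply(my_data[factor_cols], factor)

# Text dates (month/day/year) to Date
my_data$date <- as.Date(my_data$date, format = "%m/%d/%Y")

str(my_data)
```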