count and compare

Polita · May 8, 2020, 5:32pm

I want to count and compare the two column records:

cod ser_ven asig_cod
1512 15 fhi
1513 30 sri
1514 800 fhi
1515 20
1516 sin
1517 30 otr
1518 sin
1519 38
1520 80 otr
1521 745
and then give me the total of paired records (they have information in the columns) and indicate which records do not have information in both column 1 and column 2 or vice versa.

I really appreciate any help.
Thank you

siddharthprabhu · May 8, 2020, 5:52pm

I'm not clear about the structure of your data. The table you've posted appears to have 3 columns; which ones do you want to compare?

Polita · May 8, 2020, 5:59pm

Sorry I did not stick the table well, the columns I want to compare:

ser_ven and asig_cod

siddharthprabhu · May 8, 2020, 6:36pm

Thank you. In future, it would be great if you could post code required to generate your data. That makes it much easier for community members to help you.

The code below generates the record counts as you want I think.

library(dplyr, warn.conflicts = FALSE)

data <- tribble(~ cod, ~ ser_ven, ~ asig_cod,
                1512, 15, "fhi",
                1513, 30, "sri",
                1514, 800, "fhi",
                1515, 20, NA,
                1516, NA, "sin",
                1517, 30, "otr",
                1518, NA, "sin",
                1519, 38, NA,
                1520, 80, "otr",
                1521, 745, NA)

# Total rows
tally(data)
#> # A tibble: 1 x 1
#>       n
#>   <int>
#> 1    10

# Rows with values only in ser_ven
data %>% 
  filter(is.na(ser_ven) & !is.na(asig_cod)) %>% 
  tally()
#> # A tibble: 1 x 1
#>       n
#>   <int>
#> 1     2

# Rows with values only in asig_cod
data %>% 
  filter(!is.na(ser_ven) & is.na(asig_cod)) %>% 
  tally()
#> # A tibble: 1 x 1
#>       n
#>   <int>
#> 1     3

# Paired rows (values in both columns)
data %>% 
  filter_at(vars(ser_ven, asig_cod), all_vars(!is.na(.))) %>% 
  tally()
#> # A tibble: 1 x 1
#>       n
#>   <int>
#> 1     5

^{Created on 2020-05-09 by the reprex package (v0.3.0)}

There's a fourth case that is not covered above (both columns have values missing). If you want that, use filter_at(vars(ser_ven, asig_cod), all_vars(is.na(.))).

Polita · May 8, 2020, 8:26pm

thanks for your help.

But the records I want are not kept.

See when I do this I see the records and save them, but only the one variable.

validar_dato1 <-dato %>%
select(cod, ser_ven, asig_cod) %>%
filter(is.na(ser_ven))
if (nrow(validar_dato1) > 0) {
guardarExcel('validar_dato1', 'sin dato en ser_ven', validar_dato1)}

Well, I couldn't do them together, but now it's two o'clock, only the count comes out and the result doesn't save me

Rows with values only in ser_ven

validar_dato1 <-dato %>%
select(cod, ser_ven, asig_cod) %>%
filter(is.na(ser_ven) & !is.na(asig_cod)) %>% tally()
if (nrow(validar_dato1) > 0) {
guardarExcel('validar_dato1', 'sin dato en ser_ven', validar_dato1)}

Rows with values only in asig_cod

validar_dato2 <-dato %>%
select(cod, ser_ven, asig_cod) %>%
filter(is.na(ser_ven) & !is.na(asig_cod)) %>% tally()
if (nrow(validar_dato2) > 0) {
guardarExcel('validar_dato2', 'sin dato en asig_cod', validar_dato2)}

Paired rows (values in both columns)

validar_dato3 <-dato %>%
select(cod, ser_ven, asig_cod) %>%
filter_at(vars(ser_ven, asig_cod), all_vars(!is.na(.))) %>% tally()
if (nrow(validar_dato3) > 0) {
guardarExcel('validar_dato3', 'valores ser_ven y asig_cod', validar_dato3)}

#records without values in ser_ven and asig_cod

validar_dato <-dato %>%
select(cod, ser_ven, asig_cod) %>%
filter_at(vars(ser_ven, asig_cod), all_vars(is.na(.))) %>% tally()
if (nrow(validar_dato4) > 0) {
guardarExcel('validar_dato4', 'registros sin dato ser_ven y asig_cod', validar_dato4)}

I do not know what I'm doing wrong

siddharthprabhu · May 8, 2020, 8:40pm

tally() is generating the counts. If you want the records, just remove it and keep only the filter() function.

Polita · May 8, 2020, 8:49pm

yes thanks i did

Polita · May 9, 2020, 2:59pm

Hello again.
You know that the scripts I mentioned before work for excel, but when I apply to read in a .sav base. it does not read the variable assign_cod, which is a nominal string.

Please your help :orar:

siddharthprabhu · May 9, 2020, 5:17pm

If by this you mean SPSS files, then you need to use an appropriate package to read them into R. Check out the foreign package.

Polita · May 9, 2020, 9:07pm

If I have installed it is

library(haven)

siddharthprabhu · May 10, 2020, 12:45am

haven should work too. Can you please post the line of code you've written to read the .sav file and its output?

Polita · May 15, 2020, 7:34pm

Your help please.

I have the same information, but in SPSS.
I use these libraries.
to read
library (haven);

But with the syntax I did, it doesn't read.

data <- tribble (~ cod, ~ ser_ven, ~ assign_cod,
1512, 15, "fhi",
1513, 30, "sri",
1514, 800, "fhi",
1515, 20, NA,
1516, NA, "without",
1517, 30, "otr",
1518, NA, "without",
1519, 38, NA,
1520, 80, "otr",
1521, 745, NA)
tally (data)

data%>%
filter (! is.na (ser_ven) &! is.na (assign_cod))%>%

The variable assign_cod is string.

When I do the same, all the records come out and there are only 5 that have to come out

Polita · May 15, 2020, 7:38pm

What I want to do is very simple, I don't know why I can't

siddharthprabhu · May 15, 2020, 7:49pm

Can you please do what I requested in my previous post? It will help me understand how you are importing the SPSS data into R.

Polita · May 15, 2020, 8:04pm

I apply this,

system.time (read_spss <- read_spss (file_spss))
system.time (data << - read_spss)

if it works with the other variables only the text variable does not count

Polita · May 15, 2020, 9:08pm

in the variable "val5" there are 502 records and in the variable "val6" there are 364 records.

So I want to compare those values (count 502 and 364) and get me from records that have no information on any of the variables

siddharthprabhu · May 16, 2020, 6:54am

I'm not sure why you're doing it like this.

Can you try data <- haven::read_sav(file = "path/filename.sav")? Please report any errors or warnings that you get upon executing that command.

Polita · May 20, 2020, 3:00pm

I already solved it, thanks