Need Help with Plotting a multiline graph

I am taking a Statistics class that uses R, and it is overwhelming. I downloaded data that was in Excel and I am trying to use ggplot, but I think my data needs to be cleaned up. I really hate this class; this is my second time taking it. My code keeps coming up with an error. Any advice or help would be great. I don't think the tidyverse helped at all.

Stressed out grad student

Awilda

Perhaps if you could share your data, we would be able to help you clean it. If you want to do it yourself, the bible for wrangling data is R for Data Science.

Good luck!

Here is an example of how one might do this sort of task. I guessed at what the initial data look like. If this does not help you enough, please show your data or at least part of it. The output of

dput(head(DF))

would be very helpful. That assumes your data frame is named DF.

DF <- data.frame(Xval = 1:5, A = 1:5, B = 2:6, C = 3:7)
DF
#>   Xval A B C
#> 1    1 1 2 3
#> 2    2 2 3 4
#> 3    3 3 4 5
#> 4    4 4 5 6
#> 5    5 5 6 7
library(tidyr)
library(ggplot2)
DFtall <- DF %>% pivot_longer(cols = A:C, names_to = "Type", values_to = "Value")
DFtall
#> # A tibble: 15 x 3
#>     Xval Type  Value
#>    <int> <chr> <int>
#>  1     1 A         1
#>  2     1 B         2
#>  3     1 C         3
#>  4     2 A         2
#>  5     2 B         3
#>  6     2 C         4
#>  7     3 A         3
#>  8     3 B         4
#>  9     3 C         5
#> 10     4 A         4
#> 11     4 B         5
#> 12     4 C         6
#> 13     5 A         5
#> 14     5 B         6
#> 15     5 C         7
ggplot(data = DFtall, mapping = aes(x = Xval, y = Value, group = Type, color = Type)) + 
  geom_line()

Created on 2020-09-16 by the reprex package (v0.3.0)

It is this data. I exported it to an Excel file, but the Excel data doesn't seem to be reading into R. Thank you if you can help. I'll look into the source you gave me.

Awilda Romero

This would be more helpful, I think. The Excel file I exported is from here:
https://gssdataexplorer.norc.org/trends/Gender%20&%20Marriage?measure=cohabfst

Yes! This is what I want to make in R! DF stands for Data Frame?

DF is just the name I gave the variable that holds the data. Have you read in the data or do you still need to do that?

I am afraid I don't know what you mean. This is what I have entered into R so far:
getwd()
setwd
setwd("/Users/awilda/Desktop/Statistics-V506 Dumortier/Homework/Homework 1")
library(openxlsx)
MarriageData=read.xlsx("Marriage Data.xlsx")

I see you have already found a way to import your data. Did you get any errors after running this code? If so, which ones?

I did. It said it couldn't find the data, but I was able to pull it up before, so I don't know what happened. I am leaving the program and starting over again to see if it will pull it up again.

I really appreciate the both of you helping me. I literally started crying because I couldn't figure this out. So thank you.

If the read.xlsx() function cannot find the file, use the getwd() command to check that R is looking in the intended directory, and use the dir() command to see what files are in the current directory. Do you see Marriage Data.xlsx using those commands?
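In case it helps, the same check can be wrapped in a small function. This is only a sketch: check_file() is a made-up helper name, not part of base R or openxlsx. It looks for the exact file name first and, if that fails, for names that differ only in spaces or capitalization. It skips the "~$" lock files that Excel creates while a workbook is open.

```r
# Hypothetical helper: look for a file by name in a directory and
# report near misses that differ only by spaces or letter case.
check_file <- function(name, dir = getwd()) {
  files <- list.files(dir)
  # Ignore Excel's temporary lock files, which start with "~$"
  files <- files[!startsWith(files, "~$")]

  # Exact match: the name can be passed to read.xlsx() as-is
  if (name %in% files) {
    return(name)
  }

  # Otherwise compare names with whitespace removed and case ignored
  squash <- function(x) tolower(gsub("\\s+", "", x))
  files[squash(files) == squash(name)]
}

# Example: check the name used in the read.xlsx() call
check_file("Marriage Data.xlsx")
```

An empty result means nothing similar was found in that directory, so the working directory is probably not the one you intended.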

Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec, :
invalid multibyte string at '{lA<31>x:](mج<84>4Fk<9f>4<84>!NQ<95>F[=wG'
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 3 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 5 appears to contain embedded nulls
4: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
5: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input

Do you have Excel so that you can open this file? You may not have a valid Excel file.
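One rough way to test this from R: the warnings above come from read.table(), which expects plain text, whereas an .xlsx file is a zip archive and so starts with the bytes "PK" followed by 0x03 0x04. The sketch below checks only that signature. looks_like_xlsx() is a made-up name, and passing this check does not prove the workbook itself is intact.

```r
# Hypothetical helper: TRUE if the file starts with the zip signature
# that every .xlsx container begins with (0x50 0x4B 0x03 0x04).
looks_like_xlsx <- function(path) {
  sig <- readBin(path, what = "raw", n = 4)
  identical(sig, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Demonstration on two temporary files (not the real homework file):
zip_like <- tempfile(fileext = ".xlsx")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x00)), zip_like)

text_only <- tempfile(fileext = ".xlsx")
writeLines("a,b\n1,2", text_only)

looks_like_xlsx(zip_like)   # zip signature present
looks_like_xlsx(text_only)  # plain text renamed to .xlsx
```

If the check passes for your file, read it with read.xlsx() rather than read.csv() or read.table().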

Yes, I do have Excel, and I used dir(). This is the output:
dir()
[1] "~$MarriageData.xlsx" "~$mework 1.docx" "bmw.csv" "BoxPlotQ1.png" "Homework 1 (Fall).R"
[6] "Homework 1.docx" "MarriageData.xlsx" "Question 1.R" "Question 2.R" "Question 3.R"
[11] "Rplot.png"

I'll go ahead and open it in Excel again.

I can no longer send any more replies because I am limited as a new member. The cleanup worked. What the assignment requires is for me to replicate the graph:

But I don't know how to continue this correspondence and get help, since I won't be able to reply anymore. Any ideas? I am willing to Skype. You are really a great help. I am currently a grad student in Policy Analysis and this is a required course, and it is not my cup of tea at all. Thoughts?

What is the data structure? Let's start with the output of

dput(head(MarriageData))

dput(head(MarriageData))
structure(list(Gender.&.Marriage:.People.should.live.together.before.marriage.(agree/disagree) = c("Response:",
"Breakdown:", " ", "Subjective class identification", "Lower class",
"Working class"), X2 = c("Strongly agree", "Subjective class identification",
"Year", "1994", "\r\n14.3\r\n\r\n(4.47)\r\n", "\r\n11.9\r\n\r\n(1.41)\r\n"
), X3 = c(NA, NA, NA, "1998", "\r\n23.5\r\n\r\n(5.38)\r\n", "\r\n17.0\r\n\r\n(1.78)\r\n"
), X4 = c(NA, NA, NA, "2002", "\r\n19.1\r\n\r\n(5.63)\r\n", "\r\n20.3\r\n\r\n(1.77)\r\n"
)), row.names = c(NA, 6L), class = "data.frame")

The imported data are quite a mess. I did some manual revisions. Try this code.

MarriageData2 <- structure(list(X1 = c("Response:","Breakdown:", " ", "Subjective class identification", "Lower class","Working class"), 
                     X2 = c("Strongly agree", "Subjective class identification","Year", "1994", "\r\n14.3\r\n\r\n(4.47)\r\n", "\r\n11.9\r\n\r\n(1.41)\r\n"), 
                     X3 = c(NA, NA, NA, "1998", "\r\n23.5\r\n\r\n(5.38)\r\n", "\r\n17.0\r\n\r\n(1.78)\r\n"), 
                     X4 = c(NA, NA, NA, "2002", "\r\n19.1\r\n\r\n(5.63)\r\n", "\r\n20.3\r\n\r\n(1.77)\r\n")), row.names = c(NA, 6L), class = "data.frame")

and then run

View(MarriageData2)

Does that look like what you would expect? If so, what do you want to graph? I know that this is only the first six lines of the data.
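For what it's worth, here is a rough sketch of how the cleanup and the plot could go from here, using only the six rows shown above. It assumes that row 4 holds the years, that the first number in each cell is the percentage, and that the value in parentheses (perhaps a standard error) can be dropped. first_number() is a made-up helper, not a function from any package. The real sheet has more rows, so the row indices would need adjusting.

```r
library(tidyr)
library(ggplot2)

# The six rows posted above, with the first column renamed to X1
MarriageData2 <- data.frame(
  X1 = c("Response:", "Breakdown:", " ", "Subjective class identification",
         "Lower class", "Working class"),
  X2 = c("Strongly agree", "Subjective class identification", "Year", "1994",
         "\r\n14.3\r\n\r\n(4.47)\r\n", "\r\n11.9\r\n\r\n(1.41)\r\n"),
  X3 = c(NA, NA, NA, "1998",
         "\r\n23.5\r\n\r\n(5.38)\r\n", "\r\n17.0\r\n\r\n(1.78)\r\n"),
  X4 = c(NA, NA, NA, "2002",
         "\r\n19.1\r\n\r\n(5.63)\r\n", "\r\n20.3\r\n\r\n(1.77)\r\n"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pull the first number out of each string,
# returning NA where a cell is missing or contains no number
first_number <- function(x) {
  m <- regexpr("[0-9]+(\\.[0-9]+)?", x)
  ok <- !is.na(m) & m > 0
  out <- rep(NA_real_, length(x))
  out[ok] <- as.numeric(regmatches(x, m))
  out
}

# Assumption: row 4 carries the years, rows 5 onward carry the values
years <- first_number(unlist(MarriageData2[4, -1]))
body  <- MarriageData2[-(1:4), ]

# One row per class, one numeric column per year
clean <- data.frame(Class = body$X1, lapply(body[-1], first_number))
names(clean)[-1] <- as.character(years)

# Reshape to long format, as in the earlier DF example, then plot
tall <- pivot_longer(clean, cols = -Class,
                     names_to = "Year", values_to = "Percent")
tall$Year <- as.integer(tall$Year)

p <- ggplot(tall, aes(x = Year, y = Percent, group = Class, color = Class)) +
  geom_line()
p
```

Run View(clean) before plotting to confirm the numbers match what you see in Excel.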

@awildaromero36 I sent you a DM--happy to Skype with you if you still want help with this or anything else.
