I am new to coding, so my code is annotated to help me understand it; sorry if it's not as formally accurate as it should be!
I am trying to fit a linear regression of our estimated time points against the actual ones, to see how accurate our estimation protocol is.
Here is my code:
>library(readr)    #provides read_csv()
>library(ggplot2)  #provides ggplot(), aes(), geom_point()
>MolClockMASTERDATA <- read_csv("~/Desktop/Projects/Molecular Clock/DataAnalysis/MolClockALLDATA.csv")
>head(MolClockMASTERDATA)
# A tibble: 6 x 27 (a sample of my data set)
Run Sample APD_1 APD_5 APD_10 AAstart AAend GoodAAs Seq_cont HXB2nt_start HXB2nt_end Consensus Fragment
<int> <chr> <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int> <chr> <int>
1 3 pt313… 0.0233 0.0180 0.0166 84 476 392 1 1044 2220 cgggcgag… 1
2 3 pt313… 0.0199 0.0182 0.0168 62 467 405 1 978 2193 tcagtatt… 1
3 3 pt313… 0.0210 0.0176 0.0157 159 238 79 1 1269 1506 tcagtatt… 1
4 3 pt258… 0.0157 0.0120 0.0109 515 937 422 1 2337 3603 gcgtcagt… 1
5 3 pt485… 0.0160 0.0120 0.0120 515 979 464 1 2337 3729 tcagtatt… 1
6 3 pt490… 0.0204 0.0101 0.00780 109 359 250 1 1119 1869 gcgagagc… 1
# ... with 14 more variables: `ActualTOI (Month)` <int>, `ActualTOI (year)` <dbl>, MAE1 <dbl>, Slope1 <int>,
# Yint1 <dbl>, CalculatedTOI1 <dbl>, MAE5 <dbl>, Slope5 <int>, Yint5 <dbl>, CalculatedTOI5 <dbl>, MAE10 <dbl>,
# Slope10 <int>, Yint10 <dbl>, CalculatedTOI10 <dbl>
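A side note on column names that may matter for the next two lines: `read_csv()` from readr keeps a header such as `ActualTOI (year)` exactly as written, while base R's `read.csv()` passes it through `make.names()` and produces `ActualTOI..year.`. This sketch writes a tiny made-up CSV to a temporary file (hypothetical values; only the two header names are copied from the printout) and reads it both ways.

```r
library(readr)

# Tiny made-up CSV; only the header names come from the printout above
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ActualTOI (year),CalculatedTOI1",
             "1.5,1.2",
             "2.0,2.4"), csv_path)

# readr keeps the header as written (col_types = cols() just silences the type message)
tib <- read_csv(csv_path, col_types = cols())
names(tib)

# base R's read.csv() runs the header through make.names(),
# turning the space and parentheses into dots
df <- read.csv(csv_path)
names(df)

# With the readr version, the non-syntactic name must be wrapped in backticks
tib$`ActualTOI (year)`
```

So with a tibble read by `read_csv()`, the column would most likely be written as ``MolClockMASTERDATA$`ActualTOI (year)` ``, whereas `ActualTOI..year.` is the name `read.csv()` would give it.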
ETI1=molclockMaster$CalculatedTOI1
TI=molclockMaster$ActualTOI..year.
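Because the error further down complains about length, one quick check is to compare the length of each free-standing vector with the number of rows in the data frame that is passed to `ggplot()`. In this sketch, `master` and `other` are hypothetical stand-ins with made-up values for two differently sized objects.

```r
# Hypothetical stand-ins with made-up values:
# 'master' plays the role of the data frame given to ggplot(),
# 'other' plays the role of a second, differently sized object
master <- data.frame(CalculatedTOI1 = c(1.2, 2.4, 3.1, 4.0))
other  <- data.frame(CalculatedTOI1 = c(1.1, 2.2, 3.3))

ETI1 <- other$CalculatedTOI1   # vector pulled from the *other* object

nrow(master)                   # 4 rows in the data given to ggplot()
length(ETI1)                   # 3 values in the vector used inside aes()
length(ETI1) == nrow(master)   # FALSE: ggplot() would reject this mapping
```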
#I can roughly plot my data without using ggplot
plot(TI, ETI1)
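For the regression itself, base R's `lm()` can fit the estimated times against the actual times, and `abline()` can draw the fitted line on the same base-graphics scatterplot. This is a minimal sketch with made-up numbers standing in for `TI` and `ETI1`; the dashed `y = x` line is an optional extra that marks where perfectly accurate estimates would fall.

```r
# Made-up values standing in for the real vectors (hypothetical data)
TI   <- c(0.5, 1.0, 2.0, 3.0, 4.5, 6.0)   # actual times
ETI1 <- c(0.7, 1.1, 1.8, 3.4, 4.1, 6.3)   # estimated times

fit <- lm(ETI1 ~ TI)    # regress estimated time on actual time
summary(fit)            # intercept, slope, R-squared, p-values

plot(TI, ETI1)          # same kind of scatterplot as above
abline(fit)             # overlay the fitted regression line
abline(0, 1, lty = 2)   # dashed y = x reference line
```

With the real tibble, the model could be written as `` lm(CalculatedTOI1 ~ `ActualTOI (year)`, data = MolClockMASTERDATA) ``, since formulas accept backtick-quoted column names.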
#simple scatterplot ETI vs TI for 1% APD
ggplot2::ggplot(MolClockMASTERDATA, aes(x = TI, y = ETI1)) + geom_point()
This returns:
Error: Aesthetics must be either length 1 or the same as the data (151): x, y
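Judging only from the code shown, a likely cause is that `aes(x = TI, y = ETI1)` refers to free-standing vectors pulled from `molclockMaster`, a different object from `MolClockMASTERDATA`, so their length presumably does not match the 151 rows given to `ggplot()`. The usual ggplot2 pattern is to name the data frame's columns directly inside `aes()`, wrapping non-syntactic names in backticks. The sketch below uses a small hypothetical data frame (two column names copied from the printout, made-up values) and adds `geom_smooth(method = "lm")` to draw the linear regression line.

```r
library(ggplot2)

# Hypothetical stand-in for MolClockMASTERDATA: two column names copied from
# the printout above, made-up values
dat <- data.frame(
  `ActualTOI (year)` = c(0.5, 1.0, 2.0, 3.0, 4.5, 6.0),
  CalculatedTOI1     = c(0.7, 1.1, 1.8, 3.4, 4.1, 6.3),
  check.names = FALSE   # keep the space and parentheses in the column name
)

# Map columns by name inside aes(); backticks are needed for the name with spaces
p <- ggplot(dat, aes(x = `ActualTOI (year)`, y = CalculatedTOI1)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)  # least-squares line with confidence band

print(p)
```

With the real data, the equivalent call would presumably be `` ggplot(MolClockMASTERDATA, aes(x = `ActualTOI (year)`, y = CalculatedTOI1)) ``, with no separate `TI` or `ETI1` vectors needed.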
Help?