I believe the shared PDF contains no text and is entirely image based.
Did you use OCR to 'extract' the text? I'm confused on this point...
Anyway, the example data you posted to this forum can be copied and pasted, so I pasted it into a local file. Once copied, the raw text appeared to be tab delimited, so I globally replaced the tabs with semicolons and re-saved the file. I could probably have kept the tabs and picked a read function that recognises tab delimiters, but I didn't.
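For what it's worth, the tab-replacing step can probably be skipped. A minimal sketch, assuming the pasted text really is tab delimited: `read.delim()` uses `sep = "\t"` by default. The temporary file below holds only a few fields from two rows of the printout, as a stand-in for the real pasted file.

```r
# Stand-in for the pasted text: a header plus two rows, tab separated
tmp <- tempfile(fileext = ".txt")
writeLines(c("H\tTi\tID\tmer",
             "27\tChristina\t3\t587",
             "192\tBetina\t3\t608"),
           tmp)

# read.delim() already defaults to sep = "\t"; it is spelled out here for clarity
tab_version <- read.delim(tmp,
                          sep = "\t",
                          stringsAsFactors = FALSE,
                          colClasses = "character")
tab_version
```

Everything is still read as character, so the later cleaning steps would apply unchanged.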
Once I had the semicolon-delimited file, I used this code:
library(tidyverse)

# read every column as character so nothing is coerced before cleaning
almost_csv <- read.delim("forforum_notquite.csv",
                         sep = ";",
                         stringsAsFactors = FALSE,
                         colClasses = "character") %>%
  as_tibble()

# realign Ti and ID with the "3" rule: if column 2 (Ti) holds "3", the name is
# missing and the fields have slid one column left, so shift them back to the right
almost_csv2 <- as.matrix(almost_csv)
for (r in seq_len(nrow(almost_csv2))) {
  # %in% returns FALSE rather than NA if the cell is missing
  if (almost_csv2[r, 2] %in% "3") {
    almost_csv2[r, 3:15] <- almost_csv2[r, 2:14]
    almost_csv2[r, 2] <- NA_character_
  }
}

# remove all asterisks and turn decimal commas into decimal points, then convert
# H, ID, mer, Z, L, MT, R, xy to integer and Mik, F, E, Hst, F_E, e to double
# (Ti stays character)
almost_csv3 <- as_tibble(almost_csv2) %>%
  transmute_all(~ stringr::str_replace_all(string = ., pattern = "\\*", replacement = "")) %>%
  transmute_all(~ stringr::str_replace_all(string = ., pattern = ",", replacement = ".")) %>%
  mutate_at(.vars = c("H", "ID", "mer", "Z", "L", "MT", "R", "xy"),
            .funs = as.integer) %>%
  mutate_at(.vars = c("Mik", "F", "E", "Hst", "F_E", "e"),
            .funs = as.double)
> almost_csv3
# A tibble: 16 x 15
       H Ti           ID   mer     Z     L    MT   Mik     F     E   Hst   F_E     e     R    xy
   <int> <chr>     <int> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
 1    27 Christina     3   587 54572     3     5  37.9  5.99  3.58   250   1.7  47.1     5    NA
 2    20 NA            3   598 27377     2     7  38.7  3.47  3.45   146     1  36.5     4    NA
 3   192 Betina        3   608 33915     1     8  24.6  6.19   3.5   276   1.8    31     5    NA
 4     2 Antje         3   608 33881     1    10  32.5  4.18  3.31   126   1.3    33     4    NA
 5    65 Soleika       3   608 33887     1    11  32.1  4.29  3.08   140   1.4  32.6     4    NA
 6   179 Carmen        3   587 54567     3    11  41.3  4.53  3.15   125   1.4  43.2     4    NA
 7    17 Gundula       3   598 27413     2    13  44.1  3.68  3.42   211   1.1  42.6     5    NA
 8    71 Annika        3   598 27454     1    14  30.7  5.23  3.22   255   1.6  34.8     5    NA
 9    89 Erna          3   608 33888     1    16  35.8  4.03  3.16   111   1.3  35.4     4    NA
10   152 Erle          3   598 27408     2    19    55  3.33  2.97   145   1.1  49.3     1    NA
11   136 Emily         3   587 54569     3    23    56  4.03  3.22   200   1.3  55.6     5    NA
12    56 Paola         3   587 54607     3    24  46.1  4.04  3.01   238   1.3  45.2     2    NA
13   121 NA            3   598 27376     2    25  35.9  3.67  3.67   210     1  35.2     5    NA
14    98 NA            3   587 54558     3    32  40.9   3.3  3.09   153   1.1  36.8     5    NA
15   182 Anja          3   608 33892     1    35    NA  3.41  2.98   212   1.1  35.3     2    NA
16    93 Paola         3   587 54576     3    56  59.2   2.6  2.95   156   0.9    NA    48     2
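If it helps to see the realignment and the cleaning in isolation, here is a small base-R sketch on a made-up three-column table. The column names `Ti`, `ID` and `Mik` are borrowed from the data, but the rows and the helper name `clean_num` are invented for illustration. It is a vectorised variant of the loop, not the exact code I ran.

```r
# Made-up rows: in row 2 the name is missing, so "3" sits in the Ti column
# and the Mik value has slid into the ID column
toy <- matrix(c("Antje", "3",    "32,5*",
                "3",     "38,7", "",
                "Erna",  "3",    "35,8"),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("Ti", "ID", "Mik")))

# rows to fix: the first column holds "3"
# (%in% gives FALSE rather than NA for missing values)
shifted <- toy[, 1] %in% "3"

# shift those rows one column to the right, then blank the name
toy[shifted, 2:3] <- toy[shifted, 1:2]
toy[shifted, 1] <- NA_character_

# hypothetical helper: drop asterisks, turn decimal commas into points, convert to double
clean_num <- function(x) {
  as.double(gsub(",", ".", gsub("*", "", x, fixed = TRUE), fixed = TRUE))
}

cleaned <- data.frame(Ti  = toy[, "Ti"],
                      ID  = as.integer(toy[, "ID"]),
                      Mik = clean_num(toy[, "Mik"]),
                      stringsAsFactors = FALSE)
cleaned
```

The logical index does the same job as the `for` loop in one step, and because the right-hand side is evaluated before the assignment, the overlapping columns do not overwrite each other.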