I have a .csv file I'm trying to tidy but I failed to do it. The csv shows like this on Rstudio:
I'd like to separate the columns. Thank you.
I have a .csv file I'm trying to tidy but I failed to do it. The csv shows like this on Rstudio:
I'd like to separate the columns. Thank you.
Hi, remember put a reproducible example.
Somes of data and script steps.
Maybe you need use other separator like this
;
read.csv(file, # name of file or path
header = TRUE, # header
sep = ";" )
I tried uploading a txt file but it didn't allow me. I'll paste the first lines of the csv here:
10.20.130.34;projuris-app-clone;141594;CentOS 7 : expat (CESA-2020:3952);Update the affected expat, expat-devel and / or expat-static packages.;;;;;;
10.20.130.34;projuris-app-clone;121124;Apache Tomcat 8.0.x < 8.0.52 / 8.5.x < 8.5.31 / 9.0.x < 9.0.8 Denial of Service;Upgrade to Apache Tomcat version 8.0.52 / 8.5.31 / 9.0.8 or later.;;;;;;
10.20.130.34;projuris-app-clone;147884;CentOS 7 : wpa_supplicant (CESA-2021:0808);Update the affected wpa_supplicant package.;;;;;;
10.20.130.34;projuris-app-clone;152497;CentOS 7 : java-1.8.0-openjdk (CESA-2021:2845);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;150763;CentOS 7 : dhcp (CESA-2021:2357);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;156241;CentOS 7 : samba (CESA-2021:5192);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;100681;Apache Tomcat 7.0.x < 7.0.78 / 8.0.x < 8.0.44 / 8.5.x < 8.5.15 / 9.0.x < 9.0.0.M21 Remote Error Page Manipulation;Upgrade to Apache Tomcat version 7.0.78 / 8.0.44 / 8.5.15 / 9.0.0.M21 or later.;;;;;;
10.20.130.34;projuris-app-clone;150771;CentOS 7 : postgresql (CESA-2021:1512);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;141619;CentOS 7 : kernel (CESA-2020:4060);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;145444;CentOS 7 : net-snmp (CESA-2020:5350);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;146097;CentOS 7 : glibc (CESA-2021:0348);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;141634;CentOS 7 : curl (CESA-2020:3916);Update the affected curl, libcurl and / or libcurl-devel packages.;;;;;;
10.20.130.34;projuris-app-clone;155547;CentOS 7 : kernel (CESA-2021:3801);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;152970;CentOS 7 : kernel (CESA-2021:3327);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;158440;CentOS 7 : openldap (CESA-2022:0621);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;103697;Apache Tomcat 8.0.0.RC1 < 8.0.47 Multiple Vulnerabilities;Upgrade to Apache Tomcat version 8.0.47 or later.;;;;;;
10.20.130.34;projuris-app-clone;155545;CentOS 7 : openssl (CESA-2021:3798);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;155759;CentOS 7 : kernel (CESA-2021:4777);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;142600;CentOS 7 : nss and nspr (CESA-2020:4076);Update the affected packages.;;;;;;
Your data is separated by ";". Just specify read.csv(my.file, sep = ';')
and everything will be fine:
# read the data
read.csv(file = 'sample.csv', sep = ';')
#> X10.20.130.34 projuris.app.clone X141594
#> 1 10.20.130.34 projuris-app-clone 121124
#> 2 10.20.130.34 projuris-app-clone 147884
#> 3 10.20.130.34 projuris-app-clone 152497
#> 4 10.20.130.34 projuris-app-clone 150763
#> 5 10.20.130.34 projuris-app-clone 156241
#> 6 10.20.130.34 projuris-app-clone 100681
#> 7 10.20.130.34 projuris-app-clone 150771
#> 8 10.20.130.34 projuris-app-clone 141619
#> 9 10.20.130.34 projuris-app-clone 145444
#> 10 10.20.130.34 projuris-app-clone 146097
#> 11 10.20.130.34 projuris-app-clone 141634
#> 12 10.20.130.34 projuris-app-clone 155547
#> 13 10.20.130.34 projuris-app-clone 152970
#> 14 10.20.130.34 projuris-app-clone 158440
#> 15 10.20.130.34 projuris-app-clone 103697
#> 16 10.20.130.34 projuris-app-clone 155545
#> 17 10.20.130.34 projuris-app-clone 155759
#> 18 10.20.130.34 projuris-app-clone 142600
#> CentOS.7...expat..CESA.2020.3952.
#> 1 Apache Tomcat 8.0.x < 8.0.52 / 8.5.x < 8.5.31 / 9.0.x < 9.0.8 Denial of Service
#> 2 CentOS 7 : wpa_supplicant (CESA-2021:0808)
#> 3 CentOS 7 : java-1.8.0-openjdk (CESA-2021:2845)
#> 4 CentOS 7 : dhcp (CESA-2021:2357)
#> 5 CentOS 7 : samba (CESA-2021:5192)
#> 6 Apache Tomcat 7.0.x < 7.0.78 / 8.0.x < 8.0.44 / 8.5.x < 8.5.15 / 9.0.x < 9.0.0.M21 Remote Error Page Manipulation
#> 7 CentOS 7 : postgresql (CESA-2021:1512)
#> 8 CentOS 7 : kernel (CESA-2020:4060)
#> 9 CentOS 7 : net-snmp (CESA-2020:5350)
#> 10 CentOS 7 : glibc (CESA-2021:0348)
#> 11 CentOS 7 : curl (CESA-2020:3916)
#> 12 CentOS 7 : kernel (CESA-2021:3801)
#> 13 CentOS 7 : kernel (CESA-2021:3327)
#> 14 CentOS 7 : openldap (CESA-2022:0621)
#> 15 Apache Tomcat 8.0.0.RC1 < 8.0.47 Multiple Vulnerabilities
#> 16 CentOS 7 : openssl (CESA-2021:3798)
#> 17 CentOS 7 : kernel (CESA-2021:4777)
#> 18 CentOS 7 : nss and nspr (CESA-2020:4076)
#> Update.the.affected.expat..expat.devel.and...or.expat.static.packages.
#> 1 Upgrade to Apache Tomcat version 8.0.52 / 8.5.31 / 9.0.8 or later.
#> 2 Update the affected wpa_supplicant package.
#> 3 Update the affected packages.
#> 4 Update the affected packages.
#> 5 Update the affected packages.
#> 6 Upgrade to Apache Tomcat version 7.0.78 / 8.0.44 / 8.5.15 / 9.0.0.M21 or later.
#> 7 Update the affected packages.
#> 8 Update the affected packages.
#> 9 Update the affected packages.
#> 10 Update the affected packages.
#> 11 Update the affected curl, libcurl and / or libcurl-devel packages.
#> 12 Update the affected packages.
#> 13 Update the affected packages.
#> 14 Update the affected packages.
#> 15 Upgrade to Apache Tomcat version 8.0.47 or later.
#> 16 Update the affected packages.
#> 17 Update the affected packages.
#> 18 Update the affected packages.
#> X X.1 X.2 X.3 X.4 X.5
#> 1 NA NA NA NA NA NA
#> 2 NA NA NA NA NA NA
#> 3 NA NA NA NA NA NA
#> 4 NA NA NA NA NA NA
#> 5 NA NA NA NA NA NA
#> 6 NA NA NA NA NA NA
#> 7 NA NA NA NA NA NA
#> 8 NA NA NA NA NA NA
#> 9 NA NA NA NA NA NA
#> 10 NA NA NA NA NA NA
#> 11 NA NA NA NA NA NA
#> 12 NA NA NA NA NA NA
#> 13 NA NA NA NA NA NA
#> 14 NA NA NA NA NA NA
#> 15 NA NA NA NA NA NA
#> 16 NA NA NA NA NA NA
#> 17 NA NA NA NA NA NA
#> 18 NA NA NA NA NA NA
Created on 2022-08-30 by the reprex package (v2.0.1)
Kind regards
Awesome! Thanks! Worked like a charm. Now I need to tidy it, and that's where I get stuck.
10.20.130.34
projuris-app-clone
141636
CentOS 7 : openldap (CESA-2020:4041)
Update the affected packages.
36
10.20.130.34
projuris-app-clone
155757
CentOS 7 : openssh (CESA-2021:4782)
Update the affected packages.
37
10.20.130.34
projuris-app-clone
161196
CentOS 7 : gzip (CESA-2022:2191)
Update the affected gzip package.
38
10.20.130.34
projuris-app-clone
141607
CentOS 7 : dbus (CESA-2020:4032)
Update the affected packages.
39
10.20.130.34
projuris-app-clone
146958
CentOS 7 : bind (CESA-2021:0671)
Update the affected packages.
40
10.20.130.34
projuris-app-clone
141611
CentOS 7 : cups (CESA-2020:3864)
Update the affected packages.
41
10.20.130.34
projuris-app-clone
142603
CentOS 7 : libX11 (CESA-2020:4908)
Update the affected libX11, libX11-common and / or libX11-devel packages.
42
10.20.130.34
projuris-app-clone
156860
Apache Log4j 1.x Multiple Vulnerabilities
Upgrade to a version of Apache Log4j that is currently supported.
43
44
Upgrading to the latest versions for Apache Log4j is highly recommended as intermediate versions / patches have known high severity vulnerabilities and the vendor is updating their advisories often as new research and knowledge about the impact of Log4j is discovered. Refer to Log4j – for the latest versions."
45
10.20.130.34
projuris-app-clone
161374
CentOS 7 : kernel (CESA-2022:4642)
Update the affected packages.
46
10.20.130.34
projuris-app-clone
153767
CentOS 7 : kernel (CESA-2021:3438)
Update the affected packages.
47
10.20.121.30
cjblvjump
163855
Ubuntu 18.04 LTS / 20.04 LTS / 22.04 LTS : OpenJDK vulnerabilities (USN-5546-1)
Update the affected packages.
48
10.20.130.34
projuris-app-clone
148426
CentOS 7 : libldb (CESA-2021:1072)
Update the affected packages.
49
10.20.130.34
projuris-app-clone
143046
CentOS 7 : python (CESA-2020:5009)
Update the affected packages.
50
10.20.130.34
projuris-app-clone
143048
CentOS 7 : python3 (CESA-2020:5010)
Update the affected packages.
51
10.20.130.34
projuris-app-clone
150770
CentOS 7 : kernel (CESA-2021:2314)
Update the affected packages.
52
10.20.130.34
projuris-app-clone
157294
CentOS 7 : samba (CESA-2022:0328)
Update the affected packages.
53
10.20.130.34
projuris-app-clone
156103
Apache Log4j 1.2 JMSAppender Remote Code Execution (CVE-2021-4104)
Upgrade to Apache Log4j version 2.16.0 or later since 1.x is end of life.
What exactly do you want? Deleting empty columns (like X1
etc.)?
Maybe you can specify a bit, what you mean by "tidying" your data and what your expected output would be.
For example:
With the example file you provided there is not such a case. It seems like this is a problem with the *.csv file, since there is text inside the IP column. It might be easier to fix the file you load into R
then doing a clean up.
If you need to do it in R
, you could flag all rows which do not match with the structure of an IP adress (with regex), split the data.frame
based on this flag, rework the problematic columns and then join the tables again. But this is only sufficient if you have (or create) a unique id which can be used for this. Otherwise it will be not that easy to clean your table.
I understand. Thank you.
A common issue with cvs files is that the data itself contains the separator character. If it is then quoted then you need to tell read.csv which character is being used as a quote
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.