Another help with .csv

I have a .csv file I'm trying to tidy but I failed to do it. The csv shows like this on Rstudio:
image

I'd like to separate the columns. Thank you.

1 Like

Hi, remember put a reproducible example.

Somes of data and script steps.

Maybe you need use other separator like this ;

read.csv(file,                 # name of file or path
         header = TRUE,        # header 
         sep = ";" )
1 Like

I tried uploading a txt file but it didn't allow me. I'll paste the first lines of the csv here:

10.20.130.34;projuris-app-clone;141594;CentOS 7 : expat (CESA-2020:3952);Update the affected expat, expat-devel and / or expat-static packages.;;;;;;
10.20.130.34;projuris-app-clone;121124;Apache Tomcat 8.0.x < 8.0.52 / 8.5.x < 8.5.31 / 9.0.x < 9.0.8 Denial of Service;Upgrade to Apache Tomcat version 8.0.52 / 8.5.31 / 9.0.8 or later.;;;;;;
10.20.130.34;projuris-app-clone;147884;CentOS 7 : wpa_supplicant (CESA-2021:0808);Update the affected wpa_supplicant package.;;;;;;
10.20.130.34;projuris-app-clone;152497;CentOS 7 : java-1.8.0-openjdk (CESA-2021:2845);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;150763;CentOS 7 : dhcp (CESA-2021:2357);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;156241;CentOS 7 : samba (CESA-2021:5192);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;100681;Apache Tomcat 7.0.x < 7.0.78 / 8.0.x < 8.0.44 / 8.5.x < 8.5.15 / 9.0.x < 9.0.0.M21 Remote Error Page Manipulation;Upgrade to Apache Tomcat version 7.0.78 / 8.0.44 / 8.5.15 / 9.0.0.M21 or later.;;;;;;
10.20.130.34;projuris-app-clone;150771;CentOS 7 : postgresql (CESA-2021:1512);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;141619;CentOS 7 : kernel (CESA-2020:4060);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;145444;CentOS 7 : net-snmp (CESA-2020:5350);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;146097;CentOS 7 : glibc (CESA-2021:0348);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;141634;CentOS 7 : curl (CESA-2020:3916);Update the affected curl, libcurl and / or libcurl-devel packages.;;;;;;
10.20.130.34;projuris-app-clone;155547;CentOS 7 : kernel (CESA-2021:3801);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;152970;CentOS 7 : kernel (CESA-2021:3327);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;158440;CentOS 7 : openldap (CESA-2022:0621);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;103697;Apache Tomcat 8.0.0.RC1 < 8.0.47 Multiple Vulnerabilities;Upgrade to Apache Tomcat version 8.0.47 or later.;;;;;;
10.20.130.34;projuris-app-clone;155545;CentOS 7 : openssl (CESA-2021:3798);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;155759;CentOS 7 : kernel (CESA-2021:4777);Update the affected packages.;;;;;;
10.20.130.34;projuris-app-clone;142600;CentOS 7 : nss and nspr (CESA-2020:4076);Update the affected packages.;;;;;;

Your data is separated by ";". Just specify read.csv(my.file, sep = ';') and everything will be fine:

# read the data
read.csv(file = 'sample.csv', sep = ';')
#>    X10.20.130.34 projuris.app.clone X141594
#> 1   10.20.130.34 projuris-app-clone  121124
#> 2   10.20.130.34 projuris-app-clone  147884
#> 3   10.20.130.34 projuris-app-clone  152497
#> 4   10.20.130.34 projuris-app-clone  150763
#> 5   10.20.130.34 projuris-app-clone  156241
#> 6   10.20.130.34 projuris-app-clone  100681
#> 7   10.20.130.34 projuris-app-clone  150771
#> 8   10.20.130.34 projuris-app-clone  141619
#> 9   10.20.130.34 projuris-app-clone  145444
#> 10  10.20.130.34 projuris-app-clone  146097
#> 11  10.20.130.34 projuris-app-clone  141634
#> 12  10.20.130.34 projuris-app-clone  155547
#> 13  10.20.130.34 projuris-app-clone  152970
#> 14  10.20.130.34 projuris-app-clone  158440
#> 15  10.20.130.34 projuris-app-clone  103697
#> 16  10.20.130.34 projuris-app-clone  155545
#> 17  10.20.130.34 projuris-app-clone  155759
#> 18  10.20.130.34 projuris-app-clone  142600
#>                                                                                    CentOS.7...expat..CESA.2020.3952.
#> 1                                    Apache Tomcat 8.0.x < 8.0.52 / 8.5.x < 8.5.31 / 9.0.x < 9.0.8 Denial of Service
#> 2                                                                         CentOS 7 : wpa_supplicant (CESA-2021:0808)
#> 3                                                                     CentOS 7 : java-1.8.0-openjdk (CESA-2021:2845)
#> 4                                                                                   CentOS 7 : dhcp (CESA-2021:2357)
#> 5                                                                                  CentOS 7 : samba (CESA-2021:5192)
#> 6  Apache Tomcat 7.0.x < 7.0.78 / 8.0.x < 8.0.44 / 8.5.x < 8.5.15 / 9.0.x < 9.0.0.M21 Remote Error Page Manipulation
#> 7                                                                             CentOS 7 : postgresql (CESA-2021:1512)
#> 8                                                                                 CentOS 7 : kernel (CESA-2020:4060)
#> 9                                                                               CentOS 7 : net-snmp (CESA-2020:5350)
#> 10                                                                                 CentOS 7 : glibc (CESA-2021:0348)
#> 11                                                                                  CentOS 7 : curl (CESA-2020:3916)
#> 12                                                                                CentOS 7 : kernel (CESA-2021:3801)
#> 13                                                                                CentOS 7 : kernel (CESA-2021:3327)
#> 14                                                                              CentOS 7 : openldap (CESA-2022:0621)
#> 15                                                         Apache Tomcat 8.0.0.RC1 < 8.0.47 Multiple Vulnerabilities
#> 16                                                                               CentOS 7 : openssl (CESA-2021:3798)
#> 17                                                                                CentOS 7 : kernel (CESA-2021:4777)
#> 18                                                                          CentOS 7 : nss and nspr (CESA-2020:4076)
#>             Update.the.affected.expat..expat.devel.and...or.expat.static.packages.
#> 1               Upgrade to Apache Tomcat version 8.0.52 / 8.5.31 / 9.0.8 or later.
#> 2                                      Update the affected wpa_supplicant package.
#> 3                                                    Update the affected packages.
#> 4                                                    Update the affected packages.
#> 5                                                    Update the affected packages.
#> 6  Upgrade to Apache Tomcat version 7.0.78 / 8.0.44 / 8.5.15 / 9.0.0.M21 or later.
#> 7                                                    Update the affected packages.
#> 8                                                    Update the affected packages.
#> 9                                                    Update the affected packages.
#> 10                                                   Update the affected packages.
#> 11              Update the affected curl, libcurl and / or libcurl-devel packages.
#> 12                                                   Update the affected packages.
#> 13                                                   Update the affected packages.
#> 14                                                   Update the affected packages.
#> 15                               Upgrade to Apache Tomcat version 8.0.47 or later.
#> 16                                                   Update the affected packages.
#> 17                                                   Update the affected packages.
#> 18                                                   Update the affected packages.
#>     X X.1 X.2 X.3 X.4 X.5
#> 1  NA  NA  NA  NA  NA  NA
#> 2  NA  NA  NA  NA  NA  NA
#> 3  NA  NA  NA  NA  NA  NA
#> 4  NA  NA  NA  NA  NA  NA
#> 5  NA  NA  NA  NA  NA  NA
#> 6  NA  NA  NA  NA  NA  NA
#> 7  NA  NA  NA  NA  NA  NA
#> 8  NA  NA  NA  NA  NA  NA
#> 9  NA  NA  NA  NA  NA  NA
#> 10 NA  NA  NA  NA  NA  NA
#> 11 NA  NA  NA  NA  NA  NA
#> 12 NA  NA  NA  NA  NA  NA
#> 13 NA  NA  NA  NA  NA  NA
#> 14 NA  NA  NA  NA  NA  NA
#> 15 NA  NA  NA  NA  NA  NA
#> 16 NA  NA  NA  NA  NA  NA
#> 17 NA  NA  NA  NA  NA  NA
#> 18 NA  NA  NA  NA  NA  NA

Created on 2022-08-30 by the reprex package (v2.0.1)

Kind regards

1 Like

Awesome! Thanks! Worked like a charm. Now I need to tidy it, and that's where I get stuck.

10.20.130.34
projuris-app-clone
141636
CentOS 7 : openldap (CESA-2020:4041)
Update the affected packages.
36
10.20.130.34
projuris-app-clone
155757
CentOS 7 : openssh (CESA-2021:4782)
Update the affected packages.
37
10.20.130.34
projuris-app-clone
161196
CentOS 7 : gzip (CESA-2022:2191)
Update the affected gzip package.
38
10.20.130.34
projuris-app-clone
141607
CentOS 7 : dbus (CESA-2020:4032)
Update the affected packages.
39
10.20.130.34
projuris-app-clone
146958
CentOS 7 : bind (CESA-2021:0671)
Update the affected packages.
40
10.20.130.34
projuris-app-clone
141611
CentOS 7 : cups (CESA-2020:3864)
Update the affected packages.
41
10.20.130.34
projuris-app-clone
142603
CentOS 7 : libX11 (CESA-2020:4908)
Update the affected libX11, libX11-common and / or libX11-devel packages.
42
10.20.130.34
projuris-app-clone
156860
Apache Log4j 1.x Multiple Vulnerabilities
Upgrade to a version of Apache Log4j that is currently supported.
43
44
Upgrading to the latest versions for Apache Log4j is highly recommended as intermediate versions / patches have known high severity vulnerabilities and the vendor is updating their advisories often as new research and knowledge about the impact of Log4j is discovered. Refer to Log4j – for the latest versions."
45
10.20.130.34
projuris-app-clone
161374
CentOS 7 : kernel (CESA-2022:4642)
Update the affected packages.
46
10.20.130.34
projuris-app-clone
153767
CentOS 7 : kernel (CESA-2021:3438)
Update the affected packages.
47
10.20.121.30
cjblvjump
163855
Ubuntu 18.04 LTS / 20.04 LTS / 22.04 LTS : OpenJDK vulnerabilities (USN-5546-1)
Update the affected packages.
48
10.20.130.34
projuris-app-clone
148426
CentOS 7 : libldb (CESA-2021:1072)
Update the affected packages.
49
10.20.130.34
projuris-app-clone
143046
CentOS 7 : python (CESA-2020:5009)
Update the affected packages.
50
10.20.130.34
projuris-app-clone
143048
CentOS 7 : python3 (CESA-2020:5010)
Update the affected packages.
51
10.20.130.34
projuris-app-clone
150770
CentOS 7 : kernel (CESA-2021:2314)
Update the affected packages.
52
10.20.130.34
projuris-app-clone
157294
CentOS 7 : samba (CESA-2022:0328)
Update the affected packages.
53
10.20.130.34
projuris-app-clone
156103
Apache Log4j 1.2 JMSAppender Remote Code Execution (CVE-2021-4104)
Upgrade to Apache Log4j version 2.16.0 or later since 1.x is end of life.

What exactly do you want? Deleting empty columns (like X1 etc.)?

Maybe you can specify a bit, what you mean by "tidying" your data and what your expected output would be.

For example:


I know the text on line 55 belongs to the text on the far right on line 53. How do I join them?

With the example file you provided there is not such a case. It seems like this is a problem with the *.csv file, since there is text inside the IP column. It might be easier to fix the file you load into R then doing a clean up.

If you need to do it in R, you could flag all rows which do not match with the structure of an IP adress (with regex), split the data.frame based on this flag, rework the problematic columns and then join the tables again. But this is only sufficient if you have (or create) a unique id which can be used for this. Otherwise it will be not that easy to clean your table.

I understand. Thank you.

A common issue with cvs files is that the data itself contains the separator character. If it is then quoted then you need to tell read.csv which character is being used as a quote

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.