How to read these text files (thousands of them) and convert row names into variables? Not able to do it with readr

Shail_Prithvi · August 31, 2022, 6:16am

Could anyone help me read the .txt file below (there are thousands of such files) into tabular form, converting the row names into variable names with the appropriate data types?

Delivery_person_Age 22.000000
Delivery_person_Ratings 4.700000
Restaurant_latitude 18.530963
Restaurant_longitude 73.828972
Delivery_location_latitude 18.560963
Delivery_location_longitude 73.858972
Order_Date 01-03-2022
Time_Orderd 23:35
Time_Order_picked 23:40
Weather conditions Sunny
Road_traffic_density Low
Vehicle_condition 2
Type_of_order Drinks
Type_of_vehicle scooter
multiple_deliveries 0.000000
Festival No
City NaN
Time_taken (min) 22.000000
Name: 13, dtype: object

FactOREO · August 31, 2022, 7:37am

Is this file complete, and is this exactly how your actual *.txt files are saved?
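Your last two lines will cause problems. The row name "Time_taken (min)" is poorly chosen: it contains a space, so reading the file as space-delimited splits it into an extra field (the same happens to "Weather conditions", which ends up as a column named "Weather"). The last line, "Name: 13, dtype: object", with its two ":", is not a name/value pair at all and will break the reading.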
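One way around the space problem, sketched in base R under the assumption that a value never contains spaces (true for every value in the sample above), is to split each line at its last space rather than its first. The helper name parse_record() and the footer pattern are illustrative assumptions, not part of readr:

```r
# Hypothetical helper: turn the lines of one file into a one-row data frame.
# Assumes every value is a single token without spaces, so the row name is
# everything before the LAST run of whitespace on the line.
parse_record <- function(lines) {
  lines <- trimws(lines)
  # Drop blank lines and the trailing "Name: 13, dtype: object" footer
  lines <- lines[nzchar(lines) & !grepl("^Name:.*dtype:", lines)]
  pattern <- "^(.*)\\s+(\\S+)$"
  values <- sub(pattern, "\\2", lines)                 # last token = value
  names(values) <- trimws(sub(pattern, "\\1", lines))  # the rest = row name
  # One row, one character column per row name; keep names like "Time_taken (min)"
  as.data.frame(as.list(values), check.names = FALSE, stringsAsFactors = FALSE)
}

# Example with a few lines from the sample file
lines <- c(
  "Delivery_person_Age 22.000000",
  "Weather conditions Sunny",
  "City NaN",
  "Time_taken (min) 22.000000",
  "Name: 13, dtype: object"
)
rec <- parse_record(lines)
rec[["Weather conditions"]]  # "Sunny"
rec[["Time_taken (min)"]]    # "22.000000"
```

For a real file you would call parse_record(readLines("given_file.txt")). The column names keep their spaces, so you may want to rename "Time_taken (min)" afterwards.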

However, if your files always have exactly 18 name/value rows, you could do the following, which skips the last line by reading only the first 18 rows (n_max = 18):

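library(tidyverse)
# Read the first 18 lines as two space-delimited columns (X1 = row name, X2 = value);
# n_max = 18 skips the trailing "Name: 13, dtype: object" line
data <- read_delim('given_file.txt', col_names = FALSE, n_max = 18) |>
  # Turn the name/value pairs into a single wide row
  pivot_wider(names_from = X1, values_from = X2) |>
  # "Time_taken (min)" was split at its space, so extract just the number
  mutate(Time_taken = str_extract(string = Time_taken, "[0-9\\.,]+"))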
#> Warning: One or more parsing issues, see `problems()` for details
#> Rows: 18 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: " "
#> chr (2): X1, X2
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data
#> # A tibble: 1 × 18
#>   Delivery_per…¹ Deliv…² Resta…³ Resta…⁴ Deliv…⁵ Deliv…⁶ Order…⁷ Time_…⁸ Time_…⁹
#>   <chr>          <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#> 1 22.000000      4.7000… 18.530… 73.828… 18.560… 73.858… 01-03-… 23:35   23:40  
#> # … with 9 more variables: Weather <chr>, Road_traffic_density <chr>,
#> #   Vehicle_condition <chr>, Type_of_order <chr>, Type_of_vehicle <chr>,
#> #   multiple_deliveries <chr>, Festival <chr>, City <chr>, Time_taken <chr>,
#> #   and abbreviated variable names ¹Delivery_person_Age,
#> #   ²Delivery_person_Ratings, ³Restaurant_latitude, ⁴Restaurant_longitude,
#> #   ⁵Delivery_location_latitude, ⁶Delivery_location_longitude, ⁷Order_Date,
#> #   ⁸Time_Orderd, ⁹Time_Order_picked
#> # ℹ Use `colnames()` to see all variable names

Created on 2022-08-31 by the reprex package (v2.0.1)
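Since the question mentions thousands of files, here is a sketch of reading a whole folder into one table with one row per file. It assumes all files sit in one directory with a .txt extension and splits each line at its last space; read_record(), read_all() and the folder name "delivery_files" are hypothetical. dplyr::bind_rows() fills columns that are missing from some files with NA:

```r
library(dplyr)

# Hypothetical per-file reader: one row, one character column per row name.
# Assumes values contain no spaces, so each line is split at its last space.
read_record <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  # Skip blank lines and the "Name: ..., dtype: object" footer
  lines <- lines[nzchar(lines) & !grepl("^Name:.*dtype:", lines)]
  pattern <- "^(.*)\\s+(\\S+)$"
  values <- sub(pattern, "\\2", lines)
  names(values) <- trimws(sub(pattern, "\\1", lines))
  as_tibble(as.list(values))
}

# Read every .txt file in a folder and stack them; the "file" column
# records which file each row came from.
read_all <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  names(files) <- basename(files)
  bind_rows(lapply(files, read_record), .id = "file")
}

# all_data <- read_all("delivery_files")  # hypothetical folder name
```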

You can then clean up the columns yourself, e.g. convert them to the correct types instead of chr.
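For that type clean-up, one option is readr::type_convert(), which re-guesses the type of every character column. A sketch on a few of the all-character columns from the reprex output; treating "NaN" as missing and reading Order_Date as day-month-year are assumptions (01-03-2022 could also be month-day-year):

```r
library(readr)
library(tibble)

# A few all-character columns, like the ones pivot_wider() produces
data <- tibble(
  Delivery_person_Age = "22.000000",
  Order_Date          = "01-03-2022",
  Time_Orderd         = "23:35",
  Festival            = "No",
  City                = "NaN"
)

data <- type_convert(
  data,
  col_types = cols(
    Order_Date = col_date(format = "%d-%m-%Y"),  # assumes day-month-year
    .default   = col_guess()                     # let readr guess the rest
  ),
  na = c("", "NA", "NaN")                        # treat "NaN" as missing
)
```

Check the guessed types with str(data) or glimpse(data) before relying on them, especially for the time columns.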

Kind regards

system · September 21, 2022, 7:38am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
