Apologies if this has already been covered ad nauseam, but I haven’t been able to find an example matching my needs. Here’s an overview of what I’m trying to accomplish:
I have text files (albeit with non-.txt extensions) exported from proprietary software. I would like to parse them into a tibble for management and analysis, then write them back out in the native format to upload any changes. These files have a consistent structure, similar to that of JSON/XML/HTML; ideally, they could be harvested/scraped the way one would scrape a website, but I suspect that’s too ambitious for my current needs.
Regex has only gotten me so far, and I have a feeling there’s a better, more efficient way to do this. Can anyone help identify a method or strategy? Examples below:
There are two ‘components’ in the following sample text that exemplify the entire document:
[ProcedureOfOrigin,Export
  (CatalogType,
    [ComponentReference,Add
      (IsActive,TRUE)
      (ComponentProperties,
        [CatalogDocument,Find
          (Name,"Foo")
          (ScopeOfFunction,"1")
          (DocumentType,"0")
        ])
      (DocumentProperties,
        [DocumentSubType,Find
          (Description,"Foo Document for Production")
          (Name,"Foo Document")
        ])
    ])
]
[ProcedureOfOrigin,Export
  (CatalogType,
    [ComponentReference,Add
      (IsActive,TRUE)
      (ComponentProperties,
        [CatalogDocument,Find
          (Name,"Bar")
          (ScopeOfFunction,"1")
          (DocumentType,"0")
        ])
      (DocumentProperties,
        [DocumentSubType,Find
          (Description,"Bar Document for Production")
          (Name,"Bar Document")
        ])
    ])
]
Viewed as a template, the values I’m looking for follow almost every comma:
[Variable1,Value1
  (SubSection1,
    [Variable2,Value2
      (Variable3,Value3)
      (SubSection2,
        [Variable4,Value4
          (Variable5,"Value5")
          (Variable6,"Value6")
          (Variable7,"Value7:")
        ])
      (SubSection3,
        [Variable8,Value8
          (Variable9,"Value9")
          (Variable10,"Value10")
        ])
    ])
]
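One strategy that seems to fit this template better than regex alone is a small tokenizer feeding a recursive-descent parser, since the brackets nest. The sketch below is in Python rather than R, purely to illustrate the idea; it would need porting to produce a tibble. It assumes, without confirmation from the real files, that quoted values contain no escaped quotes, that bare values contain no spaces, and that keys are unique within one `[...]` block. The names `tokenize`, `parse_block`, `parse_document`, and `flatten` are hypothetical helpers, not part of any library.

```python
import re

# One token: a punctuation mark, a "quoted" string, or a bare word.
# Assumption: quoted strings contain no escaped double quotes.
TOKEN = re.compile(
    r'\s*(?:(?P<punct>[\[\]\(\),])|"(?P<quoted>[^"]*)"|(?P<bare>[^\s\[\]\(\),"]+))'
)

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def expect(tokens, i, symbol):
    if i >= len(tokens) or tokens[i] != ("punct", symbol):
        raise ValueError(f"expected {symbol!r} at token {i}")
    return i + 1

def take(tokens, i):
    if i >= len(tokens) or tokens[i][0] == "punct":
        raise ValueError(f"expected a name or value at token {i}")
    return tokens[i][1], i + 1

def parse_block(tokens, i):
    # block := '[' tag ',' action ( '(' key ',' (value | block) ')' )* ']'
    i = expect(tokens, i, "[")
    tag, i = take(tokens, i)
    i = expect(tokens, i, ",")
    action, i = take(tokens, i)
    fields = {}
    while i < len(tokens) and tokens[i] == ("punct", "("):
        key, i = take(tokens, i + 1)
        i = expect(tokens, i, ",")
        if i < len(tokens) and tokens[i] == ("punct", "["):
            value, i = parse_block(tokens, i)   # nested sub-section
        else:
            value, i = take(tokens, i)
        i = expect(tokens, i, ")")
        fields[key] = value                     # assumes unique keys per block
    i = expect(tokens, i, "]")
    return {"tag": tag, "action": action, "fields": fields}, i

def parse_document(text):
    tokens, blocks, i = tokenize(text), [], 0
    while i < len(tokens):
        block, i = parse_block(tokens, i)
        blocks.append(block)
    return blocks

def flatten(block, prefix=""):
    # One flat dict per top-level block; dotted paths become column names.
    path = f"{prefix}{block['tag']}"
    row = {path: block["action"]}
    for key, value in block["fields"].items():
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{path}.{key}."))
        else:
            row[f"{path}.{key}"] = value
    return row

SAMPLE = '''
[ProcedureOfOrigin,Export (CatalogType, [ComponentReference,Add (IsActive,TRUE)
  (ComponentProperties, [CatalogDocument,Find (Name,"Foo") (ScopeOfFunction,"1") ]) ]) ]
[ProcedureOfOrigin,Export (CatalogType, [ComponentReference,Add (IsActive,TRUE)
  (ComponentProperties, [CatalogDocument,Find (Name,"Bar") (ScopeOfFunction,"1") ]) ]) ]
'''

rows = [flatten(block) for block in parse_document(SAMPLE)]
```

Each element of `rows` corresponds to one top-level component, so the list maps naturally onto the rows of a table. Writing the file back out would additionally require remembering which values were quoted, which this sketch discards.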
Desired Output for the JSON/CSV-Like Document:
The other exported file has a template like the following:
Section1.0:
  SubSection1.1: Value1;;
  SubSection1.2: Value2;;
  SubSection1.3: Value3;;
  SubSection1.4: Value4;;
  SubSection1.5: Value5;;
  SubSection1.6: Value6;;
  SubSection1.7: Value7;;
  SubSection1.8: YYYY-MM-DD;;
  SubSection1.9: Value9;;
Section2.0:
  SubSection2.1: This can be a very large block of text with /* Comments in between */ ;;
  SubSection2.2: Same thing for this and the rest of the following sections. ;;
  SubSection2.3: /****** Comments can sometimes take this form *****/ ;;
  SubSection2.4: And so-on. ;;
Section3.0:
  SubSection3.1: Usually a two-word-phrase;;
  SubSection3.2: /* These comments can be OBNOXIOUS and be multi-
    line
    With any characters in them
    Less important for me to have in general */ ;;
Section4.0: Integer ;;
Section5.0: Block of Text ;;
Section6.0: If-Then Statements + Conclusions. ;;
Section7.0: ;;
Section8.0: Integer;;
Section9.0: End-of-Document
Desired Output for XML-Like Document:
Each Section would be its own tibble (think Normalized Relational Database).
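For this second format, splitting on the keys and treating `;;` as a terminator may be enough. The following is a rough Python sketch (again only illustrative, not R) that returns one dictionary per section, which could each become its own table. It assumes the keys literally look like the placeholders `SectionN.M:` and `SubSectionN.M:`; with real files the `KEY` pattern would have to be replaced by the actual key names. `parse_sections` and the `"_value"` field used for sections without sub-sections are hypothetical names.

```python
import re

# Assumption: keys look like the placeholders in the template above.
KEY = re.compile(r"\b((?:Sub)?Section\d+\.\d+):")
# /* ... */ comments, possibly spanning several lines.
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def parse_sections(text, drop_comments=True):
    if drop_comments:
        text = COMMENT.sub("", text)
    # With a capturing group, split() yields [preamble, key1, body1, key2, body2, ...].
    parts = KEY.split(text)
    sections, current = {}, None
    for key, body in zip(parts[1::2], parts[2::2]):
        body = body.strip()
        # Drop the ';;' terminator if present (the final entry has none).
        value = body[:-2].rstrip() if body.endswith(";;") else body
        if key.startswith("Section"):
            current = sections.setdefault(key, {})
            if body:  # the section carries its own value (possibly empty)
                current["_value"] = value
        else:
            if current is None:
                raise ValueError(f"{key} appears before any section")
            current[key] = value
    return sections

SAMPLE = """Section1.0: SubSection1.1: Value1;; SubSection1.8: YYYY-MM-DD;;
Section2.0: SubSection2.1: Some text /* a comment */ ;;
Section4.0: Integer ;; Section7.0: ;; Section9.0: End-of-Document"""

tables = parse_sections(SAMPLE)
```

Here `tables` maps each section name to its fields, so every entry can be turned into a separate table. Passing `drop_comments=False` keeps comment text inside the values, which would matter when writing the file back out in its native format.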