Win10 Holy Grail or another distraction?


#1

I am trying to get Tesseract-ocr to be "ready for R". I am by no means the developer most of you are, but I did work at Nortel in their Unix Sys Admin group for about 7 years between 1997 and 2004. Most of my knowledge is dated, but most of the good programs with staying power that were around between 1997 and 2004 are still around. I am trying to convert a lot of PDF images into text. During my search, I have discovered that Git, VCPKG (and possibly CMAKE) are the "holy grail": they let me clone a Git package and compile it on my desktop, rather than downloading a ZIP package from GitHub, putting it somewhere I can remember, and trying to get a toolset to compile it on Win10 when it was really written for Linux.

I view R as an extremely good piece of software, and RStudio is also an extremely good organization. The R software package is similar to the X Windows software package in that it has been around a long time and has the "test of time" going in its favor. As with X Windows, certain things have to be maintained in order to have a functioning system. Although I have been beating my way through Ubuntu on Win10 and have a partially running X Windows system on my Win10 desktop, it is not stable. I do not want to run into the same issues with R that I ran into with X Windows, which are still not fully resolved as I write this message.

I know when I need to collaborate, and I need to collaborate now. Reply to this thread if you see some value in collaborating with an old Unix head that is trying to become "modernized".


#2

Ahoy. I will say I'm a little confused by what you are asking. What are you trying to do, and how can folks here help?


#3

I am a little confused by what I am asking, too.

However, I know that a collaboration is slower but steadier than the free-wheeling “do it yourself” approach because pretty soon you run out of “you”.

Since I am in my infancy with the RStudio community, it is time to collaborate so I can “do it myself”. I will eventually go back to collaboration once I’ve mastered RStudio, but that is far in the future.

Mike Mazarick


#4

If I’m reading you correctly (a major assumption!), you’ve been experimenting with the Windows Subsystem for Linux in the hopes that it might finally be the best of both worlds — running Windows without having to give up all the great stuff the *nix world has to offer. And you’ve run into the rather significant limitations on what WSL can actually do right now (and maybe forever? I’m not totally clear on where nouveau-Microsoft is going with this experiment, and maybe they aren’t either!). To answer the question in your topic title: personally, unless your *nix-y desires are fairly limited in scope, I do think the WSL is more of a distraction than a holy grail right now.

When it comes to running R, I’m not sure I’m following you but — speaking for myself, I’d definitely stick to running R for Windows on Windows, rather than trying to like, run R for Linux via the WSL (is that even what you meant?). Windows is not always the easiest platform to run R on, but things are much better than they used to be and generally undramatic. If you want to run R on Linux, (again, speaking only for myself) I think you’re better off just running Linux.

As for Tesseract — you know about this, yes?
https://ropensci.org/blog/2016/11/16/tesseract/
As I understand it, it’s pretty much plug and play if you’re on Windows or Mac (Linux is the platform where it has external dependencies) — maybe another reason to resist the temptation of the WSL rabbit hole? :wink:


#5

Thanks, jcblum! During my sojourn at Nortel, there was an effort to get all desktops running Windows, even the Unix sys admins. I succumbed to a Windows desktop at that time and have tried to make the most of this situation. Windows has definitely gotten better between Win3 and Win10.

You are right, the Ubuntu partition on a Win10 platform via WSL is (what I consider to be) a distraction at this time. What is not a distraction is the combination of GIT, VCPKG (and possibly CMAKE), which solidified in 2017 with vcpkg. One thing I didn't mention was the need to download and install Visual Studio 2017 first.
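For readers who have not seen the Git plus vcpkg workflow, here is a minimal sketch of what it might look like. It assumes a Bash-style shell, the public microsoft/vcpkg repository on GitHub, and that vcpkg offers a port named tesseract; the run helper and the DRY_RUN variable are hypothetical names made up for this sketch. By default it only prints the commands rather than running them.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set,
# so the sketch can be read through safely first.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Fetch vcpkg itself with git.
run git clone https://github.com/microsoft/vcpkg.git

# 2. Build the vcpkg executable. This script is for Linux or WSL;
#    on Windows the equivalent script is bootstrap-vcpkg.bat.
run ./vcpkg/bootstrap-vcpkg.sh

# 3. Ask vcpkg to download and compile a library.
#    The port name "tesseract" is an assumption; check with "vcpkg search".
run ./vcpkg/vcpkg install tesseract
```

Setting DRY_RUN=0 before running would execute the commands for real. Note that vcpkg builds libraries from its own catalog of ports, so this is not quite the same as compiling an arbitrary cloned repository.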

Another possibility is MONO, which lets you run .NET programs on Linux as well as on Windows. It is an open source implementation of the .NET library and C#. I don't want to give you the impression that I actually know what I'm talking about. I don't and I know it.

Mike Mazarick


#6

I am amazed that the post "Win10 Holy Grail or another distraction" has been greyed out. I was under the impression that topics that were greyed out were not public yet and were being moderated. I believe the topic is applicable because (I believe) almost all people have Win10 on their desktop and almost all people want to know about "the Holy Grail" of seeing the large amount of stuff on GITHUB, putting a copy on their local Windows desktop with Git Clone, and compiling what Git Clone has provided with vcpkg. I don't know about CMAKE yet, so I'm asking for opinions on it. This is counter to the normal developer mantra of developing, but it seems to automate the "Big Red Button" of the build process. I think every developer would be interested in the history of Windows vs Linux, which is why I posted it.

What is a mystery to me is how my post can be "greyed out" and still have 3 parties in addition to me that have commented on it. Do I have bad breath? You don't want to leave the reason up to me because I will produce a reason from my imagination and as I said, I have a very over-active imagination.


#7

No moderation actions have been taken on your post. Posts look grayed out to you when there aren’t any new replies since the last time you checked.


#8

I'm currently on a MacBook but looking to buy a Windows laptop in a month or two, so I've been experimenting with WSL a bit. I'm surprised and pleased by how powerful it is, but I think that maybe it's fine to just use the native Windows R for most stuff. The only really big hiccups I know of are:

  • UTF-8 support on Windows (including emoji support and skimr) makes me cry.
  • Line ending compatibility could potentially be a pain, but I haven't actually tried it out yet, and I know most code editors like VSCode allow you to convert line endings when you commit. Plus, I think Notepad allows Mac/Linux line endings as of the last big update.
  • I'm not sure how external tools like git and TeX (and maybe cmake, because I'm interested in exploring that after my PhD) will work on Windows, and whether they present any compatibility problems. That's still on my to-do list, but I'm hoping someone else can comment :slightly_smiling_face:
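On the line ending point, a rough sketch of what the conversion amounts to, assuming a Bash-style shell such as WSL or Git Bash. The file names crlf_sample.R and lf_sample.R are made up for the illustration.

```shell
# Build a small sample file with Windows (CRLF) line endings.
printf 'x <- 1\r\nprint(x)\r\n' > crlf_sample.R

# Strip the carriage returns to get Unix (LF) line endings.
tr -d '\r' < crlf_sample.R > lf_sample.R

# Count the lines that still end in a carriage return in each file.
cr=$(printf '\r')
crlf_before=$(grep -c "${cr}\$" crlf_sample.R)
crlf_after=$(grep -c "${cr}\$" lf_sample.R || true)
echo "CRLF lines before: $crlf_before, after: $crlf_after"
```

For files tracked in git, the core.autocrlf setting can do this kind of conversion automatically at checkout and commit time, which may be less error-prone than converting by hand.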

For me, it's mostly about having an analysis process that could run about as easily on a Linux server (which I often use) as my local machine. One thing that did surprise me is that Windows PowerShell actually supports a lot of *nix aliases for common operations (ls = dir, cp, mv, etc.) even without WSL. So if you just want that familiarity for super basic file manipulation, you probably don't need WSL.


#9

Thanks, Rensa!! As you know already, I am not the developer most here are, but I do have a historical background. In delving into the WSL partition for Linux on a Win10 box, I can readily say that you have to do too much "shucking and jiving" to get it to almost work. For instance, I had to install the secure shell daemon (sshd) and modify the /etc/ssh/sshd_config file in order to get PuTTY to work. I also had to download Xming and start that up on the Win10 desktop. Finally, I had to remember all of the original commands I used on the server when setting up an X Terminal (DISPLAY=your.ip.address:0.0, export DISPLAY, etc.). I eventually got xeyes to work on the Win10 desktop. It crashed every 10 hours or so because there was a timeout error and it couldn't get an IP address during this time. I just gave up and thought you had to know too much to get it to work (I do, but...).
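For anyone retracing the X Terminal steps described above, a minimal sketch of the DISPLAY setup. The address 192.0.2.10 is a placeholder from the documentation range, and the X_SERVER_HOST variable name is hypothetical; substitute the address of the machine running Xming.

```shell
# Hypothetical address of the Windows machine running Xming; replace with your own.
X_SERVER_HOST="192.0.2.10"

# Display 0, screen 0 on that host, in the host:display.screen form used above.
DISPLAY="${X_SERVER_HOST}:0.0"
export DISPLAY

echo "X clients will connect to: $DISPLAY"

# With an X server actually running, a simple client would confirm the setup:
# xeyes &
```

The xeyes line is left commented out so the snippet runs even without an X server available.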

Even in the link that was so kindly provided to me about Tesseract there are some embedded Unix commands. For instance, there is "cat text", and there is the fact that I have to change the \ produced in Windows to a / to get a path to work in R.
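The backslash issue comes up because \ is an escape character inside R strings. One way to convert a copied Windows path before pasting it into R can be sketched as follows, assuming a Bash-style shell; the path itself is a made-up example.

```shell
# A made-up Windows path, single-quoted so the shell keeps the backslashes as-is.
win_path='C:\Users\me\Documents\scan.png'

# Replace every backslash with a forward slash.
r_path=$(printf '%s\n' "$win_path" | tr '\\' '/')

echo "$r_path"
```

The result is a forward-slash path that can be pasted between quotes in an R call.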

I am a real newbie in R and RStudio and know it. I can be a resource to you, but I know a lot about related topics and not much about being an R developer.

Mike


#10

I don't have a whole heap of experience running an X Server on Windows and its compatibility with WSL, but I'm pretty confident that Windows 10 now comes with OpenSSH, so I think you should at least be able to skip installing sshd!

Microsoft has pretty explicitly stated that supporting GUI programs with an X Server is not a support priority for them, but a few folks have had success with WSL and Xming. But it also looks like this is an area that has changed rapidly over the last 18 months.

If using X Server is just about getting plots and Shiny apps up in R, you could skip the X Server altogether and use rmote to deliver them over a local (or remote!) web server. I personally prefer this over X Server for EDA, because XQuartz takes aaaaaaages to start on my MacBook. I haven't tried it with R on WSL, but it's another option to explore :slight_smile:

EDIT: oh, I totally forgot! I can't remember if the local RStudio can run R over WSL, but I have installed RStudio Server on WSL and accessed that through the browser. Works a treat IIRC :slight_smile:


#11

I believe the danger is in doing something that is 98% "there", like WSL is currently. The Ubuntu distro that I placed on my system is similar to this... almost there but not quite. Things that work, work well and things that don't work.... well they can be made to work if there is sufficient time and energy spent on them. Computer languages are an area to watch out for (Python, Ruby, etc.). They are not complete and can be a big time suck to get working. WSL is like having two independent computers inside one computer. It is similar to an epileptic patient who has had their nerve bundle severed and has become, in essence, two people with one brain and body. Although I like the separation, I would like for them to be able to talk with each other.

However, the discussion is really about GIT (clone) and vcpkg. It can be about Mono and Cmake too. These programs are the "holy grail" because they point to an automated way of doing things. The topic of interest to me is "what do you do before there is an R system set up to gain efficiency?" I will ask other R and RStudio questions too, but they will be a lot more basic. You may want to put these packages on first before you install R and RStudio if they make it easier to install all packages. Otherwise, you'll be in the same place I was, trying to install many packages independent of R and RStudio. 5000 packages is a lot of packages that come with R, but it is by no means all of them.

I have seen that R and RStudio have efficiency built in once they are set up. For instance, in the tesseract webpage so kindly provided by jcblum, there is a reference to how to do a lot of something rather than one. The command "text <- ocr("http://jeroen.github.io/images/testocr.png")" is an example. The .png file can be of any size. I am interested in other productivity enhancers inside of R and RStudio, but I am a newbie and don't know how to ask for them. I am so lame that I was trying to run the first command "install.packages("tesseract")" in powershell or cmd until I realized that these are R commands.

I would like to thank jcblum for pointing out to me that the "grey" that I was seeing was because the topic hadn't changed since my last visit. I didn't know that, and pointing it out provides me with a lot of guidance.