I’m evaluating different technologies that I might use as I enhance my author classification tool.
One complication in the project was the number of memory errors that I received as I accessed the resources available on the Gutenberg website. One solution that I implemented involved accessing, cleaning, and saving the data to a different store. No matter what, the cleaned data has to be saved, and I’m going to have to put it somewhere. As I clean and manipulate it, I might want a good version history to go along with it.
An interesting storage and data version tracking solution that I stumbled upon is DagsHub.
The tutorial has some great pointers, including how to spin up a virtual environment in Python, but it left out one key detail that I had to find out on my own. Naturally, as I progress, I may find some easier shortcuts, but for now, I’ll blog that the following command, run from the environment’s Scripts directory, activates the virtual environment:
source activate
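For anyone who would rather not change directories first, here is a minimal sketch of the same step. The environment name `.venv` and the helper name `find_activate` are placeholders of mine, not something from the tutorial. The sketch assumes the usual venv layout, where the activate script sits under `Scripts` on Windows and under `bin` on Linux and macOS.

```shell
# Hypothetical helper: print the path of the activate script inside a
# virtual environment directory, whichever layout the platform uses.
find_activate() {
    env_dir="$1"
    if [ -f "$env_dir/Scripts/activate" ]; then
        # Windows layout (what Git Bash sees)
        echo "$env_dir/Scripts/activate"
    elif [ -f "$env_dir/bin/activate" ]; then
        # Linux / macOS layout
        echo "$env_dir/bin/activate"
    else
        echo "no activate script found in $env_dir" >&2
        return 1
    fi
}

# Usage (assumes the environment is named .venv):
#   py -m venv .venv
#   source "$(find_activate .venv)"
```

Once the script is sourced, the prompt should show the environment name, and `deactivate` returns the shell to normal.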
From there, installing and initializing DVC is all cool beans, so long as you’re certain to be inside a working Git (or other SCM) directory. If you happen to be using Git Bash on Windows, this is a reminder to use the Linux commands.
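To make the "inside a Git directory" requirement harder to forget, here is a small sketch that checks for a Git working tree before initializing DVC. The function names `in_git_repo` and `safe_dvc_init` are hypothetical helpers of my own; the only real commands used are `git rev-parse --is-inside-work-tree`, `pip install dvc`, and `dvc init`.

```shell
# Hypothetical helper: succeed only when the current directory is inside
# a Git working tree.
in_git_repo() {
    [ "$(git rev-parse --is-inside-work-tree 2>/dev/null)" = "true" ]
}

# Hypothetical wrapper: refuse to initialize DVC outside a Git repository.
safe_dvc_init() {
    if in_git_repo; then
        dvc init
    else
        echo "not inside a Git repository; run 'git init' first" >&2
        return 1
    fi
}

# Usage, inside the activated virtual environment:
#   pip install dvc
#   safe_dvc_init
```

The check is cheap, and it fails with a readable message instead of leaving me to decode a DVC error.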
Then there’s the act of pushing with DVC, where the application asks you to provide your password. Should you panic and decide to exit out of the terminal before authentication occurs, this site will provide you with troubleshooting tips for an ‘unable to acquire lock’ error.
That didn’t work for me, so I reached out to the team, where we determined that we weren’t familiar with the terminal I was using.
Git Bash looks exactly the same as Mintty (MSYS2). It even shows up as Mintty in my shortcut bar and in my task manager.
But instead of opening it via my shortcut, I opened it as Git Bash via the search bar.
If I access the shell directly using Mintty; it won’t work. I think it might be because the Cygwin-console-helper and the console host don’t open when I use the MSYS2 terminal to select my shell.
– Kalika Kay Curry, during a DVC Communication
The MSYS2 terminal that I’ve been using to select my shell doesn’t use the Cygwin helper and console host processes, which may be what was needed in order to authenticate. At least, that’s what it looks like, anyway. Now, it works!
Also, for some reason my little terminal doesn’t like the python3 command as provided in the preprocessing portion of the tutorial. Fortunately, I am not alone in my complications. The question was raised on Stack Exchange, and my preprocessing gets accomplished with the following command:
py src/data_preprocessing.py
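If the same script has to run on more than one machine, a small fallback can pick whichever launcher exists. This is only a sketch: `pick_python` is a hypothetical helper name, and the order (`py` first, since that is what worked in my terminal) is my own assumption rather than anything the tutorial recommends.

```shell
# Hypothetical helper: print the first Python launcher found on PATH.
# `py` is tried first because it is the one that worked in Git Bash here.
pick_python() {
    for candidate in py python3 python; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no Python launcher found on PATH" >&2
    return 1
}

# Usage:
#   "$(pick_python)" src/data_preprocessing.py
```

Note that this only checks that a command with that name exists on the PATH; it does not prove the command actually starts a working interpreter.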
Conclusion
This tool is awesome! I have one set of files and a versioning history in a single repository that also holds the results of my experiments. I’m excited to use it.
When I work with DagsHub, I’ll use Git Bash to open the shell, use py to run any Python files in the virtual environment, and use the source command to activate the virtual environment.
That’s it – that’s the Hello-World tutorial on DagsHub.