In a discussion with a colleague about model tracking and data version control, I was introduced to another versioning tool called Weights & Biases. In a previous article, I shared my first impressions of DAGsHub. This is a comparison of DAGsHub and Weights & Biases.
To perform the comparison, I followed this DAGsHub tutorial.
DAGsHub recommends a virtual Python environment, which can be tricky in Git Bash on Windows. If you're a Windows user and you've run into issues running the tutorial's commands, the code below worked for me:
#make
py -m venv .venv

#start
source .venv/Scripts/activate
This issue was opened with DAGsHub for further explanation.
There were complications configuring my login. The tutorial suggested that I could copy and paste the dvc login information directly into my terminal; that was not the case. I returned to the Hello World tutorial, which directed me to copy and paste the information from my repository instead.
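For reference, the DVC remote setup the Hello World tutorial walks you through looks roughly like this. The URL, username, and token below are placeholders; the real values come from your own DAGsHub repository page:

```shell
# Point DVC at the DAGsHub storage remote (URL copied from your repo page)
dvc remote add origin https://dagshub.com/<username>/<repo>.dvc

# Store credentials locally so they stay out of version control
dvc remote modify origin --local auth basic
dvc remote modify origin --local user <username>
dvc remote modify origin --local password <token>
```

The `--local` flag writes the credentials to `.dvc/config.local`, which DVC keeps out of the Git repository.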
I added the following alias to my bash profile to speed up my virtual environment setup.
alias ve='source .venv/Scripts/activate'
That’s it! I completed the tutorial without writing the code to train the model. With proper aliasing and/or scripting, this can be accomplished easily. The repository can be found here.
Weights and Biases
I used the same main.py code and data that was provided by my friends at DAGsHub to explore Weights and Biases.
To emulate the DAGsHub tutorial, I created a separate Python 3.7 environment for this project, named WBTutorial, using Anaconda Navigator. I installed the Spyder IDE to edit and work with the main.py file.
I aimed to run the same model created by DAGsHub and store it on Weights and Biases by updating the main.py file. After waiting several minutes for the data to log, I got an error message.
The dataset wasn't present in the Artifacts section, and no versioning had been done on the Python file.
Who won the comparison of DAGsHub and Weights & Biases?
Weights and Biases has some delicious visualizations. The tool does look like a lot of fun! If you only want to discuss modeling results, it could be a good choice. The tool does not seem to recognize a Logistic Regression Classifier. It takes a while to run and log the results, which led me to a swift exit.
DAGsHub, on the other hand, takes a little extra finesse when committing code. It requires interaction with the command line, and credentials are required. There may be some troubleshooting to get the right commands working on Windows. That said, my regressor model can fail and I can still version my project. It's a lot faster, too.
Hands down, this makes DAGsHub the winner when it comes to what I want from version control. Now I'm going to go delete the WBTutorial virtual environment.