We all know I love coffee. This year, I would like to learn about my tea preferences. Unlike my coffee options; wherein I can use an outside source to help me find my preferred coffee bean styles – tea is more varied. I decided to take part in a Twinings Tea Adventure which encouraged me to track my reviews. Because the review spreadsheet is in Microsoft Excel, I am expanding upon the concept. This year I’m doing review tracking with Excel Power Query.
The tracking source starts out with the Twining Tea sampling. The group has a daily schedule where we sample teas in this Twinings Sample pack.
The review sheet contains columns that represent the number (as an index), the date of the trial, the name of the tea, the class or type of tea, a description of the ingredients, my star rating, and a review. There is potential with that review there, for NLP; but right now my intent with this is to learn whether or not I might want to buy or have this tea in stock for when I’m craving tea.
I have other things that I’ve tried that I’d like to start tracking. I’d like to know the following:
- What I should avoid
- I forget what I’ve tried and would like to refrain from trying and buying the same thing I dislike over and over again.
- What I should buy.
- Maybe these are the characteristics that I like so I might want to grab something similar.
- If I’m not in the mood to try something new; I might like to select one of the items that are in the buy list.
- What is ‘acceptable’ if my buy options are not available?
- When I filter down on something, I might want to know why I’ve made that decision.
Using that above thought process, I typed out a quick spreadsheet to capture the items that I might like:
I have created category, type, brand, name of the product, avoid, buy, explanation columns to represent the situation.
Here are some observations about this current state, as I work:
- I may want to use a 1 or 0 to better represent avoid/buy. Yes or no is a better visual cue and I’m focused on convenience of entry.
- Type, Category, and Brand should probably be a drop down list of options that would prevent me from mistyping, but as it stands – this ad-hoc list in it’s early stages is in a period of uncertainty.
- See that sipsby subscription? Does that belong? Why am I including Sipsby, but excluding crema.co (a subscription service that handles all my coffee ratings)?
- I’m doing the review for the Twining’s tasting on a different spreadsheet. What is that about?
- As I think on categories, do I want to do a separate spreadsheet for each category?
- Would it be better or more manageable to. . .
The questions are endless. This is a start. While the structure may not be ideal, the data is being collected. Updates may be necessary so there are no restrictions at this juncture. This will require some data cleaning in the future.
The Twinings Tea Adventure review sheet does not match my idea. I will continue to rate this way because it’s a fun way of communicating with the group about how I liked the tea. On it’s own, it doesn’t tell me what I want to know when it comes to deciding which tea to buy when I’m grocery shopping.
With that notion, I introduce PowerQuery. I’m going to transform the review sheet to match the ad-hoc one.
The first thing to note is that the review sheet contains ALL of the anticipated teas that I will drink over the next 48 days. I don’t want these empty fields included in my results. Rather than do everything all at once, I decided to fix that first by filtering for only those that are rated, then close and load the table.
Next up in the process is to make the filtered tea reviews match the ad-hoc table. This is done with a separate query.
Why a separate query, do you ask? The answer is a hunch that over the next 48 days, I might want to take a look at something that ONLY involves these rated teas and that I will not want to write ANOTHER query or filter out the table in order to perform this function. I’d like to just – use what I’ve created already.
Another hunch involves troubleshooting. Is that the best way to filter for rated values? I think that null or blank is the right answer; but I’m supposed to limit myself to a star rating from 1-5. What if I type a 21? Do I want the 21 to come through? Probably. That might be a bad idea. As this data gets larger, it’ll be harder for me to recognize that I’m seeing 21 because the user entered 21 rather than an error in the calculation that determines whether or not I should avoid or buy something.
To match the ad-hoc spreadsheet, I started a new query for the transformation. I added some, what one would perceive as, nonsense columns if they weren’t aware of the goal. The category column’s value is beverage, the type column is tea, and the Brand is Twinings. I renamed two columns. The review column is explanation and the tea column is now name.
I created the avoid and buy columns by selecting the stars 1-5 column and using PowerQuery’s option to create a column from examples using selection. For the avoid column, I typed in a yes or a no as to whether or not I would avoid the item based on the rating. For the buy column, I typed in a yes or a no as to whether or not I thought I should buy the item based on the rating.
Using this technique, PowerQuery generated the conditional logic for my columns.
Reviewing the logic, I can see that I might wan to update this later. I feel comfortable with a star rating 2 and blow as being something to avoid, but is 3.9 something to always buy or not? Well, for the time being, the answer is yes, I think. It’s why I rated the Lapsang Souchong as a 3.9. It’s something I intend to buy again.
There is an intention to union the ad-hoc sheet to the transformed tea worksheet. This is why I transformed the data that way; however, I have decided to wait a little while before completing this task.
The reason for the wait is because I’m not confident about the ad-hoc sheet. It feels good to have the Twinings data ready for the union, but to join them at this instant would be a little pointless. I’d have a page full of nothing but Twinings Tea. While the data is being worked on separate (ad-hoc and Twinings have their own table(s)); there is still the overwhelming factor of what and how large is the ad-hoc table going to get? When and how am I going to make that decision to pull a category out of the ad-hoc table?
The answer to this question is, if I need to start another table I can start a new table. I don’t need to worry about what’s in the ad-hoc table. If I find repeats of trials (which will occur); I should be able to clean that up in PowerQuery, with the expectation that the later rating is the correct one. Or maybe it isn’t. A logical step may be introduced to help this decision, but right now; like the union – it’s unnecessary.
Which leads me to add, perhaps I should include a date field – so that when it comes to the avoid or buy decision I can employ some analytics to the case and I don’t have to search for repeats in my entry.
I conclude this blog with an update. I added in a date column to the ad-hoc table and removed the ‘remove date’ step from the Twinings Query.
The Review Tracking with Excel PowerQuery workbook is ready for some experimentation. Once the ad-hoc entries exceed the number of entries in the Twinings table, I may look into setting up a union table and perform some analytics on it to determine what more needs to be done to answer the questions that I forgot I was asking.
Ke’s preferences are also included in this thing, although I stuck to her greens and not the hay. Perhaps she could use her own sheet? Union first. More adjustments, later.