Query Tagger

A tool to effortlessly categorize search terms in order to train Walmart's next generation of machine learning algorithms

User Research

My first step was to talk to my users in order to better understand their current process and the associated pain points. In these user interviews, I asked the following:

  • Walk me through your current process for completing a tagging task

  • What is the most difficult part of this process?

  • What is the least enjoyable part of this process?

  • How many tasks can you complete in an hour?

  • Do you ever feel burned out? After how long?

  • If you could change one thing, what would it be?

From these discussions I learned that while the task itself was rarely difficult, the tedium of manually typing token after token was draining. My users also disliked the research aspect of the task, in which they had to open a web browser to verify taxonomical tags ("Product Type"). As for what they would change, my users wished there were fewer categories to tag, though this was a difficult ask: Engineering were the ultimate stakeholders of these taggings, and it was up to them to decide what needed to be captured.

Lastly, I learned that my users targeted a rate of 40 tasks per hour. In order to be successful, my designs would have to surpass this rate, ideally by at least 20%.

Background

Walmart wanted to build a machine learning algorithm to help categorize search terms (queries), but first they needed human inputs to help calibrate the system. Initially, such inputs were being captured within Excel spreadsheets - a tedious and slow endeavor. My task was to streamline this process by bringing the tagging task online into a web-based tool of my own design.

For clarity, here is some of the terminology surrounding this project:

Query - an entire search term, comprised of tokens

Token - the individual words making up a query

Tag - the act of categorizing a token, i.e. deciding how to define a token (is it a Brand? Size? Color? etc.)

Early Designs

I started with 2 possible designs in mind. The first did not make it past the initial paper sketch. The second worked on paper and received a lo-fi digital mockup, though it ultimately left something to be desired.

Design 1

Query is presented with a dropdown box under each word/token. The user clicks on the dropdown of each relevant token and selects the appropriate tag

Concerns:

  • Dropdowns would be created under every word, whether they are viable tokens or not. Words such as "of", which have no appropriate tag, would still receive a dropdown, creating visual clutter
     

  • Token length varies but dropdown box size is uniform. Single character tokens will require larger space buffers than they need simply to make room for their dropdowns. This will lead to odd spacing of the query, hurting readability

Design 2

Query is presented at the top of the page and each possible tag appears below. Each tag is accompanied by a dropdown box containing every token from the query. The user goes through each tag, selecting the appropriate token from the dropdown

Concerns:

  • A bit click-intensive. Every tag requires at least 2 clicks: one to open the dropdown and one to select the appropriate token
     

  • Long queries will result in longer dropdowns, requiring scrolls in addition to the minimum of 2 clicks
     

  • Breaking down the query and presenting each token as an individual line may hinder comprehension of the query as a whole. Users are being steered to view the tokens individually as opposed to collectively as a phrase
     

  • Potential technical complexity in presenting the query broken up within a dropdown (I had to speak to an engineer to determine the exact level of effort on this)

Design 3

I liked the general format of Design 2 - query at the top with category tags below - but the constant dropdown navigation felt clunky. I began thinking of alternative ways of tagging that involved neither manual typing nor dropdown menus.

 

What if simply clicking a token tagged it? That would be about as simple a process as I could imagine - the challenge then becomes informing the user which tag will be applied by their next click.

I decided to create the following wireframe where the query appears at the top with the category tags below. The user clicks on a token to autofill the highlighted line, with each additional click filling subsequent lines.

Concerns:

  • The core interaction is a bit novel; it may take users time to get used to clicking words to fill lines/forms
     

  • Tagging is based on the order of the category list, not the order of the tokens within the query. This process may not be intuitive
     

  • Certain tags, namely Department and Product Type, would still require dropdowns as they are not tied to explicit query tokens. Presenting dropdowns alongside click-to-fill fields may be confusing
     

  • Technical concerns: it was unclear at the time whether this design was even technically feasible (I had to speak to an engineer, who assured me this interaction would be possible/practical)

Design Elements

Primary Interaction

The blue highlight indicates the active input. Clicking on a token will fill this box next. In this example, the user would click "3"

Users click on tokens within the query to fill the input boxes below
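The click-to-fill behavior described above can be sketched as a small state model. This is only an illustration, not the tool's actual implementation: the class name `TaggingForm`, the example query, and the "Size Unit" field are hypothetical, and the sketch assumes fields are filled strictly in on-page order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaggingForm:
    """Hypothetical click-to-fill state: a click fills the active field, then advances."""
    query: str
    fields: List[str]                       # tag fields in on-page order
    values: Dict[str, str] = field(default_factory=dict)
    active: int = 0                         # index of the highlighted (active) field

    @property
    def tokens(self) -> List[str]:
        # Tokens are the individual words making up the query.
        return self.query.split()

    @property
    def active_field(self) -> Optional[str]:
        # The field that the next click will fill, or None once all are filled.
        if self.active < len(self.fields):
            return self.fields[self.active]
        return None

    def click_token(self, token: str) -> None:
        if token not in self.tokens:
            raise ValueError(f"{token!r} is not a token in the query")
        target = self.active_field
        if target is None:
            return  # every field is already filled; ignore the click
        self.values[target] = token
        self.active += 1  # move the highlight to the next field


# Assumed example query and fields, for illustration only.
form = TaggingForm(query="cuties 3 lb bag", fields=["Size Value", "Size Unit"])
form.click_token("3")   # fills the highlighted "Size Value" field
form.click_token("lb")  # fills the next field
print(form.values)      # {'Size Value': '3', 'Size Unit': 'lb'}
```

Keeping a single "active" index is what makes the highlight meaningful: the user always knows which field the next click will fill.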

Related Items Carousel

"Related items" shows the first 4 search results for the given query, helping users understand the search intent

Clicking a PT value under a related item autofills the Product Type field. Otherwise, users can use the dropdown to search the defined set of PTs making up Walmart's taxonomy. This feature keeps users in the app when they otherwise would have to seek this information out elsewhere

Color Scheme

The primary colors used are blue (#007BC4) and orange (#C44900). This shade of blue is one of Walmart's official colors and this shade of orange is its complementary color
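As a quick check of the pairing, a complementary color can be computed by rotating the hue 180 degrees while keeping saturation and brightness fixed. The sketch below uses Python's standard `colorsys` module; the helper name `complementary` is my own, and hue rotation is assumed to be the definition of "complementary" intended here.

```python
import colorsys


def complementary(hex_color: str) -> str:
    """Return the hue-rotated (180 degree) complement of a #RRGGBB color."""
    # Parse the three hex pairs into floats in the range 0..1.
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Hue is expressed in 0..1, so adding 0.5 is a half turn around the wheel.
    r2, g2, b2 = colorsys.hsv_to_rgb((h + 0.5) % 1.0, s, v)
    return "#{:02X}{:02X}{:02X}".format(*(round(c * 255) for c in (r2, g2, b2)))


print(complementary("#007BC4"))  # #C44900
```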

Text & Learnability

Learnability is a key design focus of mine, particularly for tools such as this one. Not only should new users be able to teach themselves how to use the tool, old users should be able to easily re-learn it should they become unfamiliar.

A challenge here was finding the balance between clear instructions/explanations and pithy wording. With text all over the page, from the core query to all 12 input fields, I did not want to overwhelm my users with even more text.

 

To this end, the instructions at the top of the page as well as the description of the 'Contextual Attributes' section are kept to a single line. Definitions of input fields/attributes are nested behind info bubbles, such as the one below.

Logo Design

Above are the iterations of the logo - left to right, earliest to final.

"Query Tagger" is easily shortened to "QT", which I like to pronounce as "cutie". I wanted to design a logo that leaned into the playfulness of this term.

First, I made attempts to incorporate the initials Q and T into the face of a young woman. Q lends itself to being head-shaped, but I had difficulty incorporating the T. I wasn't satisfied with either of the first 2 designs - first using T as the nose and then trying it as an eye. 

I decided to abandon the face motif and try just a Q with a flower. I again tried to incorporate the T, this time as petals in the flower. However, I still wasn't satisfied with how it looked.

While testing the 3rd design without the blue T within the flower, I began questioning whether initials were even needed for the logo. I decided to scrap the idea of incorporating a Q and T into my design in favor of something much simpler: a single flower. I started with purple petals before deciding it made sense to go with orange, a color incorporated throughout my design.

Final Design

Default (blank) Page
Completed Page

Usability Testing

At this point, I sat down with my users again to see if I was on the right path. I asked the following questions:
 

  • What do you think the blue lettering for "lb" and "cuties" means in the query?

  • What do you think the highlighting of the Size Value line means?

  • What would be your next action in completing this task?

Users had a general understanding of what the blue lettering and highlighting implied, but no one was able to intuit the click-to-tag interaction. It was clear onboarding and/or messaging would be crucial for this design.

Once I told users about the click interactions, I had them simulate tasks for a handful of queries using this proposed design. They were able to pick up the process very quickly, which was reassuring.

Two things I noticed from these simulations:

  • With the streamlined click-to-tag interaction, the majority of the time it took to fully tag a query was spent filling the Product Type field, for which users had to look up the appropriate tag in an external tool.
     

  • As my users had previously mentioned, there are quite a few tags required for each query. My initial mockups included only a subset, but in total there were 13 different categories to tag. 

Now that I was closer to a design that worked, I wanted to address these concerns. 

Data-Driven Decisions

A complexity of this project was that while I was creating a tool for analysts, the entire process was in service to engineers - analysts were creating data for the engineers to use. This meant that the stakeholders in Engineering had the final say on what was included, even if it meant a less enjoyable experience for my users. Whenever disagreements arose between Engineering and me, I had to rely on hard data to justify my position.

Related Items Carousel

To address the time sink of users navigating outside of my tool in order to identify Product Types (PT), I decided to show the first 4 results (determined by screen real estate) for the given query alongside their associated PTs.

The Engineering lead expressed concern about this carousel, stating that it could lead to inaccurate inputs should this data be inaccurate. Walmart's item tagging is far from perfect, so there is a chance users would be shown a wrong PT that they would still feel compelled to select.

I decided to test 50 random queries to see how accurate the PT tagging was within the first 4 results (200 total items). I observed: 

  • 176 (88%) items had the correct PT tagged

  • 46 (92%) queries had at least 3 of their 4 items displaying accurate PTs

  • No queries had more than 2 items with incorrect PTs tagged
     

    • I had users take a look at queries split with 2 accurate items and 2 inaccurate. Most of the time the user was able to determine the correct tag immediately, and in cases where they were unsure, the user researched further instead of making a quick selection. There were no instances in which an incorrect tag was selected.
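The figures above are internally consistent, which the following arithmetic sketch confirms. It uses only the reported counts; the per-query breakdown at the end is derived from them rather than reported, and the variable names are my own.

```python
# Reported counts from the 50-query sample.
queries = 50
items_per_query = 4
correct_items = 176
queries_with_3_plus_correct = 46

total_items = queries * items_per_query                       # 200 items
item_accuracy = correct_items / total_items                   # 0.88
query_share = queries_with_3_plus_correct / queries           # 0.92

# Derived breakdown (an implication of the counts, not separately reported).
incorrect_items = total_items - correct_items                 # 24 incorrect items
# No query had more than 2 incorrect items, so every query outside the
# "at least 3 correct" group had exactly 2 incorrect items.
queries_two_wrong = queries - queries_with_3_plus_correct     # 4 queries
# The remaining incorrect items fall one per query.
queries_one_wrong = incorrect_items - 2 * queries_two_wrong   # 16 queries
queries_all_correct = queries_with_3_plus_correct - queries_one_wrong  # 30 queries

print(f"{item_accuracy:.0%} of items, {query_share:.0%} of queries")
print(queries_all_correct, queries_one_wrong, queries_two_wrong)
```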

Armed with this data, I was able to have another discussion with the head engineer, ultimately convincing her the carousel was relatively safe from a data purity standpoint and well worth the gains in user productivity. 

Abolishing the Department Tag

With my users wanting fewer fields to tag, I had multiple discussions with engineers to see if any cutbacks could be made. After being told that every single field was providing needed data, I started to investigate ways these tags could be inferred technically without the need for human input. 

My first thought was that Product Type tagging could be automated. With ~88% of items having accurate PT tags (per my research), perhaps we could assign a likely PT to a query based on its current search results. Alas, it was hard to get traction with this idea (88% isn't exactly 100%...) so I decided to pick my battles and move on to my next target: the Department tag.

With my experience on the Search team at Walmart, I knew there was a system that automatically predicted the department categorization for any given query. There is also a strong correlation between Product Type and Department. With this knowledge, I could provide more compelling arguments through pointed questions:

  • "Why can't we leverage our current query understanding framework to determine Department algorithmically?"

  • "How well do we understand the correlation between Product Type and department?"

For the first time, the discussion to possibly remove a tag was gaining traction as we realized there were more efficient ways to determine Department. The talks blew wide open when an engineer brought up how difficult it is to do anything with Department tags due to deep-rooted issues with Walmart's taxonomy (the short of it is there are a lot of redundancies/overlap in departments).

After numerous discussions and eventually finding the right questions to ask, Engineering agreed to drop the Department tag requirement. For all this effort, I was able to drop the total number of tags from 13 to 12. 

Measuring Success

Early user research had determined that the average rate of users completing the tagging task within Excel was roughly 40 queries per hour. My goal for this project was to increase the rate by 20%, or ~48 queries per hour. 

A month after launching Query Tagger, we observed the average tagging rate to be 51.2 queries per hour for an increase of 28%, exceeding the goal.
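For reference, the goal and the observed improvement work out as follows. This is a minimal arithmetic sketch using the rates quoted above; the variable names are my own.

```python
baseline_rate = 40.0   # queries per hour in Excel
goal_uplift = 0.20     # targeted improvement
observed_rate = 51.2   # queries per hour a month after launch

target_rate = baseline_rate * (1 + goal_uplift)                   # about 48 queries per hour
observed_uplift = (observed_rate - baseline_rate) / baseline_rate  # about 0.28

print(f"target: {target_rate:.1f}/hr, observed uplift: {observed_uplift:.0%}")
```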

Iteration

Now that I had a tentative feature set and content list, it was time to hammer out final formatting with a final round of usability testing.

Version 1

I observed that users on laptops needed to scroll to see all of the related items. There was also some confusion around the dropdown menus within the tag list - these are tags that are not based on token terms and thus required a different input type.

Actionable feedback

  • Present 'Related Items' horizontally so the full carousel is always visible

  • Differentiate fields that are click-to-fill and fields that are dropdowns

Version 2

The horizontal alignment of the Related Items made the feature more usable. However, it started to blend in with the query and tags. One positive was that users were able to intuit that clicking on a product type within the carousel automatically filled the PT Token field (similar to how clicking tokens fills tag fields). I felt confident that I could remove this explanation from the instructions at the top of the page, reducing its length to a single line.

For the tag section, I decided to indent all dropdown lines in hopes of differentiating them from the click-to-fills. However, many users began to read this as a formatting error.

Actionable feedback

  • Visually differentiate the Related Items carousel from the rest of the page

  • Remove unnecessary text from instructions at top of the page

  • Further differentiate dropdown tags from click-to-fill tags

Version 3

With the 3 dropdown fields moved to the right, Product Type now stood out as a potential alignment issue. Using the aforementioned dropdowns still looked a bit clunky when everything else could be filled via simple clicks. Lastly, Related Items stood out too much after being given a colored background - it drew attention away from the query and tags.

Actionable feedback

  • Present Product Type in a way that looks natural

  • Make Related Items appear less visually dominating, either by removing the colored field or by adding similar elements elsewhere on the page

As I began working on my next iteration (which ended up being what we launched with), I was handed a last-minute requirement to add secondary tags to the design. These are tokens that give context to other tokens. While this last-minute requirement didn't feel great, it got me thinking about how to better distinguish the main tag field, leading to a design that is more visually interesting.