Textual Tagging vs. Visual Tagging

By Yuval Ginor, Aug 18, 2020

When comparing textual and visual tagging, it’s tempting to think that “a picture is worth a thousand words.” But the ecommerce world shows us that this is not necessarily so: when it comes to tagging, a few words go a lot further than an image.
In previous articles we discussed the importance of tagging in ecommerce: it significantly boosts products’ exposure on the website and protects them in times of reduced traffic. Now that we understand the advantages of tagging, we are going to look into two different types: textual tagging and visual tagging. Textual tagging, as the name implies, is the process of extracting structured data from the textual description of a product that the seller provides. Structured data means the product’s different attributes, which are then used to tag the product and appear in navigation refinements. For example, a product described as “New stainless steel automatic 38mm wristwatch with silver dial and leather strap” will be automatically tagged with the attributes dial color (silver), movement (automatic), material (stainless steel) and size (38mm).

Visual tagging works in the same way, with the only difference being that the data is extracted from an image of the product instead of from text. The image is scanned, extraction models are applied, and the product is then tagged in accordance with the attributes detected in the image.
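The textual extraction step described above can be illustrated with a minimal sketch. It is only an assumption-laden stand-in: it uses simple keyword and regular-expression rules rather than the AI extraction models the article refers to, and the vocabularies, the `extract_tags` function and the attribute names are hypothetical, chosen to mirror the wristwatch example.

```python
import re

# Hypothetical vocabularies for illustration only; a real system would rely on
# trained extraction models rather than hand-written lists.
MOVEMENTS = ("automatic", "quartz", "manual")
CASE_MATERIALS = ("stainless steel", "titanium", "gold")
DIAL_COLORS = ("silver", "black", "blue", "white")


def first_match(text, vocabulary):
    """Return the first vocabulary term that appears as a whole phrase in text."""
    for term in vocabulary:
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            return term
    return None


def extract_tags(description):
    """Turn a free-text product description into a dict of attribute tags."""
    text = description.lower()
    tags = {}

    # Size: a number immediately followed by "mm", e.g. "38mm".
    size = re.search(r"\b(\d+(?:\.\d+)?)\s*mm\b", text)
    if size:
        tags["size"] = size.group(1) + "mm"

    movement = first_match(text, MOVEMENTS)
    if movement:
        tags["movement"] = movement

    # Strap material (e.g. "leather strap") would need its own rule.
    material = first_match(text, CASE_MATERIALS)
    if material:
        tags["material"] = material

    # Dial color: a known color word directly before the word "dial".
    dial = re.search(r"\b(" + "|".join(DIAL_COLORS) + r")\s+dial\b", text)
    if dial:
        tags["dial_color"] = dial.group(1)

    return tags


if __name__ == "__main__":
    print(extract_tags(
        "New stainless steel automatic 38mm wristwatch "
        "with silver dial and leather strap"
    ))
```

For the watch description, this sketch yields the four attributes named in the example: size 38mm, movement automatic, material stainless steel and dial color silver. Each tag can then back a navigation refinement.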


One of the biggest advantages of textual tagging over visual tagging is that it provides much more detail about the product, which enriches the structured data. Although taking a photo is much quicker than writing a description, text allows the seller to convey far more about the product than an image can. For example, while it is easy to tag a shirt by scanning its image, there is a great deal that cannot be extracted visually. Looking at an image of a wristwatch may tell you the shape and color of the dial, but not other important attributes such as size, material or condition. You may not agree, but I think these attributes are usually very important when buying a watch. In this case, visual tagging is not enough. The rather short text “New stainless steel automatic 38mm wristwatch with silver dial and leather strap” provides much more information about the product than an image could. In most cases, images don’t provide sufficient information for proper tagging. That is why textual tagging is applicable to a much broader range of categories than visual tagging.


Laptops are an even better example. What will an image of a laptop tell you? Not much. Again, as with the shirts, maybe the brand and the color, but nothing more. Tagging an image of a laptop will not tell you a single thing about the hardware or software inside it, or about screen size, RAM size, processor and other attributes that are arguably very important to anybody buying a laptop.


Textual tagging, on the other hand, allows the seller to include every bit of information about the product that is relevant to the buyer. Sure, it may take a little longer to write a description than to take a photo. But where the image reveals so little, that one extra minute of writing goes a very long way toward increasing the exposure and conversion rate of the product.


This advantage also shows in the ability of textual tagging to cover several variations of a product in a single listing, something visual tagging cannot do. When a seller uploads an image of a red shirt, that is exactly what gets tagged: a red shirt, and only a red shirt. In order to sell a yellow shirt, the seller must upload an image of a yellow shirt, and so forth for every color of shirt sold in the store. Textual tagging, thanks to the flexibility of text, allows sellers to upload any image they want and state in the description that the shirt comes in all colors. In this case, it may certainly be said that “five words are worth five pictures.”
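As a rough illustration of how one description can cover several variations, the sketch below pulls every color mentioned in a listing's text. It is a hypothetical rule-based example, not the extraction models discussed in this article: the `KNOWN_COLORS` list and the `extract_color_variants` function are assumptions, and “all colors” is interpreted here as every color in that assumed store vocabulary.

```python
import re
from typing import List

# Hypothetical store-wide color vocabulary, for illustration only.
KNOWN_COLORS = ("red", "yellow", "blue", "green", "black", "white")


def extract_color_variants(description: str) -> List[str]:
    """Return every color variant that a single text description covers."""
    text = description.lower()

    # Assumption: "all colors" means every color in the store's vocabulary.
    if re.search(r"\ball colou?rs\b", text):
        return list(KNOWN_COLORS)

    # Otherwise collect each known color named in the text, in vocabulary order.
    return [c for c in KNOWN_COLORS if re.search(r"\b" + c + r"\b", text)]


if __name__ == "__main__":
    print(extract_color_variants("Cotton shirt, available in red, yellow and blue"))
    print(extract_color_variants("Cotton shirt, comes in all colors"))
```

A single listing can therefore carry several color tags at once, whereas an image of one red shirt supports only the tag “red.”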


Another big advantage of textual tagging over visual tagging is its applicability across platforms and its possible uses outside the traditional ecommerce world. Above we discussed extracting data from images of shirts and laptops, but how do you visually tag a flight on a travel website, or dishes on a food delivery website? You simply cannot. Visual tagging is useless here. Since text is flexible, unlike images, textual tagging may be used anywhere and everywhere, and can match any product exactly, whether it’s a shirt, a flight or delicious food. Visual tagging is also not helpful when it comes to "one of a kind" products, such as arts and crafts. Building extraction models requires a certain amount of learning and a certain number of examples, typically hundreds to thousands. When attempting to tag an image of such a unique product, the AI model won’t be able to recognize it, because it has never seen enough similar examples. In such cases, textual tagging is the only option.


Visual tagging’s advantage over textual tagging, mentioned throughout this article, is the lower effort required of the seller before tagging. It is indeed faster to take images of products than to type in their descriptions. The tagging process itself, however, uses sophisticated AI technology in both textual and visual tagging, which in both cases reduces tagging time to a few seconds.


Overall, the differences between the two types of tagging are:
  • Visual tagging is more helpful in tagging a narrow-scope inventory, where all the relevant attributes can be extracted from an image of the product.
  • Textual tagging is more suitable for a wide variety of categories and variations of products, whose relevant attributes cannot be sufficiently extracted only from images.


With the skyrocketing penetration of ecommerce and intensifying competition, marketplaces must keep improving their customer experience, and specifically the on-site navigation experience. Tagging provides the solution by enabling customers to navigate through the website and refine their results. Since sellers always provide a textual description and often an image of the product, there is absolutely no reason not to tag the data. The decision on which type of tagging is relevant, textual or visual, depends largely on the store, but if the scope of products is wide, the budget is limited and integration time is important, textual tagging offers a bigger bang for the buck.
