Types of data tagging objectives

By Pavel Zaslavsky, Jan 20, 2022

One of the most significant subfields of artificial intelligence is natural language processing (NLP), which deals with interactions between computers and human language. NLP is widely used online to increase efficiency in a variety of ways, such as chatbots and speech recognition. NLP also has the potential to help people with speech impairments communicate better with others.
To “teach” the algorithms underlying an NLP tool, large sets of data must first be tagged according to what the tool requires. The outputs of NLP algorithms are inference predictions. These predictions can have many types and objectives, but let’s take a brief look at just three common types of data tagging objectives.

  1. Entity Tagging

Entity tagging is the process of scanning text to identify, extract, and tag predetermined entities. There are several types of entity tagging:

  • Named-entity recognition (NER) - locating named entities in text and classifying them into predefined categories, such as person names, organizations, or locations.
  • Keywords and keyphrases tagging - identifying and tagging keywords and keyphrases in a text. 
  • Part-of-speech tagging (POS tagging) - tagging each word in a text with the part of speech it corresponds to (noun, verb, adjective, etc.), based on both its definition and its context.
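
To make the idea of entity tagging more concrete, here is a minimal sketch of what tagged output can look like. It assumes a hypothetical hand-written gazetteer (a dictionary of known entity names) and uses the common BIO convention (B = beginning of an entity, I = inside it, O = outside), which the article itself does not prescribe. Real NER models learn from tagged examples and context rather than from a fixed list, so treat this only as an illustration of the data format.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lowercase token sequences mapped to predefined categories.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("new", "york"): "LOCATION",
    ("apple",): "ORGANIZATION",
    ("pavel", "zaslavsky"): "PERSON",
}


def bio_tag(tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Tag each token with B-<label>, I-<label>, or O using longest-match lookup."""
    lowered = [t.lower() for t in tokens]
    max_len = max(len(key) for key in gazetteer)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first so "new york" wins over shorter entries.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                match = (n, label)
                break
        if match is None:
            i += 1
            continue
        n, label = match
        tags[i] = f"B-{label}"
        for j in range(i + 1, i + n):
            tags[j] = f"I-{label}"
        i += n
    return tags


tokens = "Apple opened a store in New York".split()
print(list(zip(tokens, bio_tag(tokens, GAZETTEER))))
```

The longest-match loop makes "New York" come out as one two-token LOCATION entity (B-LOCATION, I-LOCATION) rather than two unrelated words.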

  2. Text Classification/Categorization

In this type of tagging, the NLP tool is used to determine the class or category of a given text. Unlike entity tagging, text classification does not tag specific words and phrases; instead, it classifies the text as a whole according to its content. There are a few types of text classification:

  • Product categorization - especially important in ecommerce, this is the process of sorting products into matching product categories to improve site navigation and the overall shopping experience. The NLP tool scans the product description, tags the product with relevant attribute tags, and places it in a predefined category.
  • Document classification - the classification of a text according to its content, for example, the subject of the text, or other attributes. 
  • Sentiment tagging - classifying text based on subjective information, such as the mood, emotions, or feelings expressed in it.
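
The key difference from entity tagging is that a single label applies to the whole text. As an illustration, here is a minimal multinomial Naive Bayes classifier, which is one standard text classification technique (the article does not name a specific method). The training examples and category names are made up for this sketch.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train_nb(examples: List[Tuple[str, str]]) -> Model:
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    label_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(text: str, model: Model) -> str:
    """Return the label with the highest log-posterior for the whole text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = "", float("-inf")
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior of the label
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing so an unseen word does not zero out a label.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data for illustration only.
training = [
    ("cotton t-shirt with round neck", "Clothing"),
    ("slim fit denim jeans", "Clothing"),
    ("wireless bluetooth headphones", "Electronics"),
    ("usb c charging cable", "Electronics"),
]
model = train_nb(training)
print(classify("blue cotton jeans", model))  # Clothing
```

The same structure works for document classification or sentiment tagging; only the labels and the training texts change.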

Since the topic of text classification/categorization is pretty wide, we’ll focus on product categorization.

As mentioned above, this type of categorization is very important in ecommerce. Product categorization has two aspects:

Item Classification - the process by which an item is classified into a specific category. The most common challenge in item classification is the sheer breadth of the category range. The number of potential categories into which an item can be classified can be very large, sometimes in the thousands, while the distinctions between these categories can be very small. This makes it challenging to determine the most relevant one.
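
When there are thousands of closely related categories, it can help to look at more than the single best guess. The sketch below assumes some classifier has already produced a confidence score per category (the scores and category paths here are invented) and returns the top-k candidates, flagging the result as ambiguous when the two best scores are closer than a hypothetical `min_margin` threshold.

```python
import heapq
from typing import Dict, List, Tuple


def top_k_categories(scores: Dict[str, float], k: int = 3,
                     min_margin: float = 0.1) -> Tuple[List[Tuple[str, float]], bool]:
    """Return the k highest-scoring categories and whether the choice is ambiguous.

    `scores` maps category paths to model confidence (assumed to be in [0, 1]).
    `min_margin` is a hypothetical review threshold, not a standard value.
    """
    ranked = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    # Ambiguous when the best and second-best categories are nearly tied.
    ambiguous = len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin
    return ranked, ambiguous


# Invented scores for a single listing.
scores = {
    "Clothing > Men > T-Shirts": 0.46,
    "Clothing > Men > Polo Shirts": 0.41,
    "Clothing > Women > T-Shirts": 0.08,
    "Sports > Activewear > Tops": 0.05,
}
ranked, ambiguous = top_k_categories(scores, k=2)
print(ranked)     # [('Clothing > Men > T-Shirts', 0.46), ('Clothing > Men > Polo Shirts', 0.41)]
print(ambiguous)  # True
```

Flagged items could then be routed to a human reviewer instead of being assigned automatically.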

This is a classic problem that marketplaces need to solve when they classify sellers’ listings into a category.

The task becomes even more challenging for sellers that operate on several marketplaces, since they may be required to classify their inventory into a specific category on each individual marketplace. Each marketplace has its own unique category structure, or taxonomy. For example, when classifying a t-shirt, some marketplaces will require it to be placed under the ‘Fashion’ meta category, while others might require it to be placed under the ‘Clothing’ meta category. Because each marketplace’s taxonomy differs, the range of possible categories only grows wider. Businesses therefore need to maintain a separate classification of each item for each marketplace, according to that marketplace’s requirements. Moreover, these taxonomies often change, so sellers need to keep track of the changes continuously.
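
One simple way to keep per-marketplace classifications is a lookup table from an internal item type to each marketplace’s category path. This is a minimal sketch under assumed names: `marketplace_a`, `marketplace_b`, and the category paths are hypothetical, and real marketplaces publish their taxonomies in their own formats. The second function lists mappings that point to paths missing from a marketplace’s current taxonomy, which is one way to notice taxonomy changes.

```python
from typing import Dict, List, Set

# Hypothetical mapping from a seller's internal item type to each marketplace's path.
TAXONOMY_MAP: Dict[str, Dict[str, str]] = {
    "t-shirt": {
        "marketplace_a": "Fashion > Men > Tops > T-Shirts",
        "marketplace_b": "Clothing > Men's Clothing > T-Shirts",
    },
}


def categories_for(item_type: str, marketplaces: List[str]) -> Dict[str, str]:
    """Look up the category path on each marketplace; raise if any mapping is missing."""
    mapping = TAXONOMY_MAP.get(item_type, {})
    missing = [m for m in marketplaces if m not in mapping]
    if missing:
        raise KeyError(f"No category for {item_type!r} on: {', '.join(missing)}")
    return {m: mapping[m] for m in marketplaces}


def stale_mappings(marketplace: str, current_paths: Set[str]) -> List[str]:
    """List item types whose mapped path no longer exists in the current taxonomy."""
    return [item for item, per_marketplace in TAXONOMY_MAP.items()
            if marketplace in per_marketplace
            and per_marketplace[marketplace] not in current_paths]


print(categories_for("t-shirt", ["marketplace_a", "marketplace_b"]))
```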

Another common issue with NLP-based item classification is that the provided product data is sometimes insufficient to classify the item. This is one of the reasons why product data should always be as complete and accurate as possible.
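
A lightweight safeguard is to check product records for obviously missing information before sending them to a classifier. The required fields and the minimum description length below are assumptions chosen for illustration, not rules from any particular marketplace.

```python
from typing import Dict, List

# Hypothetical minimum set of fields a classifier might need.
REQUIRED_FIELDS = ("title", "description", "brand")
MIN_DESCRIPTION_WORDS = 5  # assumption: very short descriptions carry too little signal


def missing_data(product: Dict[str, str]) -> List[str]:
    """Return reasons why a product record may be too sparse to classify reliably."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS
                if not product.get(field, "").strip()]
    description = product.get("description", "")
    if description.strip() and len(description.split()) < MIN_DESCRIPTION_WORDS:
        problems.append("description too short")
    return problems


print(missing_data({"title": "T-shirt", "description": "Blue"}))
# ['missing brand', 'description too short']
```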

Attribute value tagging - similar to item classification, but on a smaller scale. It is the process of tagging products with the values of specific attributes. Unlike categories, whose number can reach the thousands, the range of values for an attribute is usually limited. For example, when a product is tagged with a ‘color’ value, there are only so many colors to choose from.
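
Because attribute values come from a limited set, even a simple lookup against a closed vocabulary can produce a first-pass tag. This sketch assumes a hypothetical list of allowed colors and matches whole words in the product text; it returns every match, so a listing that mentions two colors yields two candidates.

```python
import re
from typing import List, Set

# Hypothetical closed set of allowed values for the 'color' attribute.
COLOR_VALUES: Set[str] = {"black", "white", "red", "blue", "green", "navy"}


def tag_colors(text: str, allowed: Set[str] = COLOR_VALUES) -> List[str]:
    """Return allowed color values mentioned in the text, in order of first appearance."""
    found: List[str] = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in allowed and word not in found:
            found.append(word)
    return found


print(tag_colors("Navy cotton t-shirt with white stripes"))  # ['navy', 'white']
```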

However, attribute value tagging has its own difficulties. One common problem is value ambiguity. Tagging a product with a certain color is quite simple, but classifying a pair of shoes as elegant, formal, or dress shoes may not be, since these values overlap. As value ambiguity increases, classification becomes more difficult.

Another challenge with attribute tagging is value entropy. In most cases, the distribution of products over attribute values is not uniform but Pareto-like, sometimes with a fairly long tail. Training a proper classification model for such attributes can be difficult: examples of products with values in the long tail are scarce, and the number of these values can be significant. Collecting training material for such models can therefore be a real challenge, and failing to provide enough of it will lead to poor precision when predicting the tail values.
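
A first step toward spotting the long-tail problem is simply counting how many labeled examples each attribute value has. The sketch below uses a made-up label distribution and a hypothetical threshold of 50 examples to list values that are probably too rare to train on reliably.

```python
from collections import Counter
from typing import Dict, List


def long_tail_values(labels: List[str], min_examples: int = 50) -> Dict[str, int]:
    """Return attribute values with fewer than `min_examples` labeled products.

    `min_examples` is a hypothetical threshold for "enough training data".
    """
    counts = Counter(labels)
    return {value: n for value, n in counts.items() if n < min_examples}


# Made-up label distribution for a 'color' attribute, for illustration only.
labels = ["black"] * 500 + ["white"] * 300 + ["navy"] * 40 + ["teal"] * 6 + ["mauve"] * 2
tail = long_tail_values(labels)
print(tail)                              # {'navy': 40, 'teal': 6, 'mauve': 2}
print(sum(tail.values()) / len(labels))  # share of labeled products in the tail
```

Values flagged this way are candidates for collecting additional training material.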


  3. Sentiment Analysis

One of the most challenging data tagging objectives is the analysis and interpretation of human emotions. Since even humans sometimes have a hard time identifying an author’s true emotions, it is easy to imagine how difficult it is for a computer to detect humor, sarcasm, and other nuances of human communication.

Sentiment analysis, or opinion mining, is the process by which an NLP tool analyses a given text and tags it according to the emotions, feelings, opinions, or general mood expressed in it. A common use of sentiment analysis is tagging user reviews as either positive or negative.
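
The simplest form of the positive/negative review tagging described above is a lexicon-based scorer. This sketch uses a tiny, hypothetical word list and flips a word’s polarity when it follows a negator such as “not”. Production tools typically rely on trained models, and a rule like this cannot recognize sarcasm or humor, which is exactly the difficulty mentioned earlier.

```python
import re
from typing import Set

# Tiny hypothetical sentiment lexicon, for illustration only.
POSITIVE: Set[str] = {"great", "love", "excellent", "comfortable", "good"}
NEGATIVE: Set[str] = {"bad", "poor", "broken", "terrible", "uncomfortable"}
NEGATORS: Set[str] = {"not", "never", "no"}


def tag_review(text: str) -> str:
    """Tag a review as 'positive', 'negative', or 'neutral' by summing word polarities."""
    score = 0
    words = re.findall(r"[a-z']+", text.lower())
    for i, word in enumerate(words):
        polarity = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        # Flip polarity when the previous word is a simple negator ("not good").
        if polarity and i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(tag_review("Great shoes, very comfortable"))  # positive
print(tag_review("Not good, the sole was broken"))  # negative
```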

Given accurate data, sentiment analysis tools can scour social media networks and review sites and provide useful insights into brand awareness, helping businesses adjust their marketing strategies.
