Big Data Matching

Matching for competitive advantage using big data

The term big data stands for the enormous volume of data on the Internet, which is growing by an estimated 70 terabytes every second. The benefit is that high-performance computers can recognise patterns in this sea of data that an individual cannot detect. From these findings, research and scientific studies can be produced in a very short time. An inherent component of big data, however, is duplicated, incomplete or even false information. Product data is affected too, which causes considerable problems for retailers and manufacturers. To challenge competitors in eCommerce, there is no alternative but to purge false data. This is done through product matching.

But how exactly does false data arise? Duplicates play a major role here. A duplicate means that a product appears twice in a particular shop. This can happen during data entry, for example with the manufacturer's name: in one record it is written HP, in another Hewlett & Packard. It becomes more difficult when the attributes of a product are concerned: for one training shoe, the attributes material, heel height and fastening style are entered, while for another shoe the attributes are colour, width and sole material.

In some cases, a single article number describes several different variations of a product. These problems impede clear identification of the product. And retailers do not only have to struggle with this in their own shops: for market analysis purposes, they also have to scrutinise other online marketplaces and compare the product data there with their own. This is difficult when the product information does not match.

Where one article number covers the different variations of a product, the attributes can differ sharply from one another. This lack of clarity impedes identification of the product when it is to be compared with other offerings on the Internet.

It is therefore worthwhile to improve product data and to delete duplicates. With a huge volume of data, this can hardly be done manually. Consider a sample calculation: if you have 1,000 different training shoes in your lineup, comparing each one with every other would require 499,500 comparisons (1,000 × 999 / 2). And in this example, the number of training shoes is relatively small. Especially at a time when customers make rapid purchasing decisions with just a few clicks and the help of price comparison portals, retailers have to present their product data clearly and unambiguously. This calls for matching, as performed by the blackbee Business Intelligence software. Below, we show you the steps by which blackbee achieves successful matching.
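The comparison count in the sample calculation follows from the number of unordered pairs among n products, n × (n − 1) / 2. The short Python sketch below, in which the function name and the sample catalogue are purely illustrative, shows how quickly that number grows with the size of the lineup.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Cross-check the formula by actually enumerating the pairs of a tiny catalogue.
small_catalogue = ["shoe-a", "shoe-b", "shoe-c", "shoe-d"]
assert pair_count(len(small_catalogue)) == len(list(combinations(small_catalogue, 2)))

for n in (100, 1_000, 10_000):
    print(f"{n:>6} products -> {pair_count(n):>12,} comparisons")
```

For 1,000 products the formula gives 499,500 comparisons, and for 10,000 products it is already close to 50 million, which is why a manual comparison is not realistic.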

Step 1: We extract the data

You, as the client, first provide us with a product list, which our blackbee software uses for the matching. It is irrelevant here whether hundreds or tens of thousands of products are involved. You then determine the sources that blackbee should examine: these can include online marketplaces like Amazon and price comparison portals such as billiger.de, for example. Using a querying strategy, our software then generates a list of all the offerings from your sources in a process known as crawling. In this process, the system adapts itself to the varying URL and page structures of the sources. You can decide whether you wish to run the query daily or weekly.

Step 2: We standardise the attribute values

Before the actual matching begins, blackbee first performs a preprocessing step. Recalling the example of HP and Hewlett & Packard, blackbee now standardises such product data and adds further, missing attributes. With the help of these supplementary attributes, products can then be precisely identified.
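How such a standardisation might look for the HP and Hewlett & Packard example can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not blackbee's actual preprocessing: the alias table and the function names are hypothetical.

```python
import re

# Hypothetical alias table: known spelling variants mapped to one canonical name.
BRAND_ALIASES = {
    "hp": "HP",
    "hewlett packard": "HP",
}


def normalise_key(raw: str) -> str:
    """Lower-case the value and replace every run of punctuation or spaces with one space."""
    return re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()


def standardise_brand(raw: str) -> str:
    """Return the canonical brand if the variant is known, else the trimmed input."""
    return BRAND_ALIASES.get(normalise_key(raw), raw.strip())


for value in ("HP", "Hewlett & Packard", "hewlett-packard", "Acme "):
    print(repr(value), "->", repr(standardise_brand(value)))
```

Unknown values are passed through unchanged, so that a missing alias never silently rewrites a brand name.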

Step 3: We compare the datasets against one another by way of matching

Now comes the actual matching: using the attributes, our blackbee software compares the product records with one another. To organise the results efficiently, the software combines several attribute values. blackbee applies machine learning for this purpose: the software generates so-called training data, which are examples of matches and non-matches. This training data gives the system feedback and shows it where association errors arise and need to be corrected. The system takes note of these corrections, thus learning with each run and attaining a very high accuracy. In this way, the data gathered gains a higher validity.
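The idea of combining several attribute comparisons and learning from labelled matches and non-matches can be illustrated with a deliberately small sketch. It uses a plain perceptron, which is a stand-in chosen for brevity and not a description of blackbee's algorithm. The product records, the two features and the hand-assigned labels are all hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token overlap of two titles: shared tokens divided by all distinct tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def features(p: dict, q: dict) -> list:
    """Combine several attribute comparisons into one feature vector."""
    return [
        jaccard(p["title"], q["title"]),            # title similarity in [0, 1]
        1.0 if p["brand"] == q["brand"] else 0.0,   # same brand?
        1.0,                                        # constant bias term
    ]


def predict(weights: list, x: list) -> int:
    """1 = match, 0 = non-match."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def train(pairs: list, labels: list, epochs: int = 200) -> list:
    """Perceptron rule: weights change only when a pair is misclassified."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        errors = 0
        for (p, q), y in zip(pairs, labels):
            x = features(p, q)
            delta = y - predict(weights, x)
            if delta:
                errors += 1
                weights = [w + delta * xi for w, xi in zip(weights, x)]
        if errors == 0:  # every training pair is classified correctly
            break
    return weights


# Hypothetical, hand-labelled training data (1 = match, 0 = non-match).
printer = {"brand": "HP", "title": "HP Laser Printer X100 duplex"}
printer_short = {"brand": "HP", "title": "HP Laser Printer X100"}
inkjet = {"brand": "HP", "title": "HP Inkjet Printer Z20"}
shoe_black = {"brand": "Acme", "title": "Acme Trail Runner training shoe black"}
shoe = {"brand": "Acme", "title": "Acme Trail Runner training shoe"}
other_shoe = {"brand": "Zenith", "title": "Zenith Sprint training shoe"}

PAIRS = [(printer, printer_short), (printer, inkjet), (shoe_black, shoe), (shoe, other_shoe)]
LABELS = [1, 0, 1, 0]
WEIGHTS = train(PAIRS, LABELS)

print([predict(WEIGHTS, features(p, q)) for p, q in PAIRS])
```

Each misclassified pair nudges the weights, which mirrors the feedback loop described above. With four training pairs this is a toy, and a realistic system would need far more labelled examples and richer features.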

Step 4: We prepare the results

In the final step, reporting, blackbee provides the data to you. The software lets you use the results for further analysis and for a wide variety of reports. If you wish to monitor the price of a particular product, for example, the software can generate a list of the top five providers.
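A report such as the list of the top five providers can be pictured as a simple ranking over the matched offers. The sketch below assumes that "top" means the lowest price and that all prices are in one currency. The `Offer` record and the sample data are hypothetical and do not reflect blackbee's report format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    provider: str
    price: float  # assumed to be in one common currency


def top_providers(offers: list, limit: int = 5) -> list:
    """Keep the cheapest offer per provider, then return the `limit` cheapest providers."""
    best = {}
    for offer in offers:
        current = best.get(offer.provider)
        if current is None or offer.price < current.price:
            best[offer.provider] = offer
    return sorted(best.values(), key=lambda o: o.price)[:limit]


# Hypothetical matched offers for one product.
OFFERS = [
    Offer("shop-a", 59.90), Offer("shop-b", 54.50), Offer("shop-a", 52.00),
    Offer("shop-c", 61.00), Offer("shop-d", 49.99), Offer("shop-e", 70.00),
    Offer("shop-f", 58.00),
]

for rank, offer in enumerate(top_providers(OFFERS), start=1):
    print(rank, offer.provider, f"{offer.price:.2f}")
```

Reducing the data to one offer per provider first keeps a shop with several listings from occupying more than one place in the ranking.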

The use of blackbee’s intuitive matching algorithm, in particular, improves the quality of product data. This is how you secure a strategic advantage over your competitors when dealing with huge data volumes. After all, with highly valid data and error-free product management, blackbee lays the foundation for successful pricing.

For more background on the topic of matching, read our two white papers “Matching of product data” and “Product matching excellence”.

Would you like to clean up your product data and monitor the development of your products on the market? Test blackbee now!