DataFeedWatch Blog | Data feed optimization tips

Ninja Level Data Feed Optimization Using RegExp

Written by Mateusz Miodek | April 17, 2014 9:40:57 AM Z

Those of you who are already using DataFeedWatch might have noticed the word “regexp” in the mapping options. In this article I will explain how RegExp can be used in our app, but first let’s clarify what exactly RegExp is.

A regular expression (RegExp for short) is a special text string that describes a search pattern. You can think of regular expressions as wildcards on steroids.

You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. RegExp works on the same principles but can do so much more.
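To make the wildcard comparison concrete, here is a minimal sketch in Python (chosen only for illustration; the article does not say which RegExp engine DataFeedWatch uses). It checks file names against *.txt with the standard fnmatch module and against the regular expression ^.*\.txt$, then shows a rule that a simple wildcard cannot express.

```python
import fnmatch
import re

# The wildcard *.txt and a regular expression expressing the same idea:
# ".*" plays the role of "*", and "\." is a literal dot.
WILDCARD = "*.txt"
TXT_REGEX = re.compile(r"^.*\.txt$")

for name in ["notes.txt", "report.pdf", "data.txt.bak"]:
    by_wildcard = fnmatch.fnmatchcase(name, WILDCARD)
    by_regex = TXT_REGEX.match(name) is not None
    print(f"{name}: wildcard={by_wildcard}, regex={by_regex}")

# Something a simple wildcard cannot say: "one or more digits, then .txt,
# and nothing else". The wildcard [0-9]*.txt only fixes the first character.
DIGITS_TXT = re.compile(r"^[0-9]+\.txt$")
print(fnmatch.fnmatchcase("2014a.txt", "[0-9]*.txt"))  # True
print(DIGITS_TXT.match("2014a.txt") is not None)       # False
print(DIGITS_TXT.match("2014.txt") is not None)        # True
```

The + quantifier ("at least once") is what lets RegExp describe repetition of a character class, which plain * wildcards cannot.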

It takes some practice to get the hang of RegExp, but once mastered it comes in very handy. For those who are interested in learning regular expressions, I can recommend this tutorial.

Also, it is a good idea to test your RegExp before deploying it. There are many online tools out there to do just that. The tool I use is called Rubular.

So let’s move on to real-life examples to see how RegExp can help with feed optimization.

Example 1

Imagine you need to create a field ‘color’ for your Google Shopping feed. You do not have a field for color in your store, but you know that all your product titles end with a color name (e.g. Adidas Mens Snova Glide 5 Running Shoes Green).

The best way to deal with this situation is to map color from the name and add a replace rule with RegExp that replaces (.*)(\s[^\s]+) with $2.

What it does is:

  1. divide each name into two groups:
    group 1 – everything except the last word, represented by (.*), where
      .* => any single character appearing any number of times
    group 2 – the last word, represented by (\s[^\s]+), where
      \s => any whitespace character
      [^\s]+ => any single character except whitespace, appearing at least once
  2. replace the existing value, which can be described as (.*)(\s[^\s]+), with a new value which is group 2 (written in RegExp notation as $2)

The outcome of this mapping for “Adidas Mens Snova Glide 5 Running Shoes Green” would be “Green”.
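The same replace rule can be tried outside the app with a minimal Python sketch. It assumes Python's re module treats this simple pattern the way the app's RegExp engine does; note that Python writes the group reference as \2 rather than $2. Because group 2 begins with the whitespace character, the raw replacement is " Green" with a leading space, so the sketch trims it (an assumption about how the final value gets cleaned up). The helper name color_from_title is hypothetical.

```python
import re

# Pattern from Example 1: group 1 = everything before the last word,
# group 2 = the last whitespace character plus the last word.
COLOR_PATTERN = re.compile(r"(.*)(\s[^\s]+)")

def color_from_title(title: str) -> str:
    """Hypothetical helper: keep only the last word of a product title."""
    # Python's \2 corresponds to $2 in the app's replace rule.
    replaced = COLOR_PATTERN.sub(r"\2", title)
    # Group 2 still starts with the whitespace character, so trim it.
    return replaced.strip()

print(color_from_title("Adidas Mens Snova Glide 5 Running Shoes Green"))  # Green
```

A title without any whitespace does not match the pattern, so it comes back unchanged.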

Example 2

Imagine you create a price field for a channel that accepts 2 decimal places (e.g. 12.45) while your prices have 4 (12.4500). Again, a replace rule with RegExp comes in handy. To fix the format, set the rule to replace ([0-9]+\.[0-9]{2})([0-9]{2}) with $1.

As in the previous example, this rule:

  1. divides each price into 2 groups:
    group 1 – everything except the last two decimal digits, represented by ([0-9]+\.[0-9]{2}), where
      [0-9]+ => any whole number (one or more digits)
      \. => dot character
      [0-9]{2} => any 2-digit number
    group 2 – the last two decimal digits, represented by ([0-9]{2})
  2. replaces the existing value, which can be described as ([0-9]+\.[0-9]{2})([0-9]{2}), with a new value which is group 1 ($1)

The outcome of this mapping for 12.4500 is 12.45.

Be advised that this mapping does not round the price to two decimal places; it simply cuts off the last two digits.
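The difference between cutting off digits and rounding is easy to see in a minimal Python sketch. It assumes Python's re module behaves like the app's engine for this pattern (Python writes $1 as \1). The round_price helper is a hypothetical comparison using the standard decimal module, not something the replace rule does.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Pattern from Example 2: group 1 = whole number, dot, first two decimals;
# group 2 = the last two decimals, which get dropped.
PRICE_PATTERN = re.compile(r"([0-9]+\.[0-9]{2})([0-9]{2})")

def trim_price(price: str) -> str:
    """Hypothetical helper mirroring the replace rule: keep group 1 only."""
    return PRICE_PATTERN.sub(r"\1", price)

def round_price(price: str) -> str:
    """For comparison only: real rounding to two decimal places."""
    return str(Decimal(price).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

print(trim_price("12.4500"))   # 12.45
print(trim_price("12.4580"))   # 12.45 (digits cut off, not rounded)
print(round_price("12.4580"))  # 12.46
```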

Example 3

Let’s say you want to set product_type for Google Shopping to the main category of your products (e.g. Car parts), but your system only has the full category paths (e.g. Car parts > BMW > 320i > 2013).

What you need to do here is remove everything from “ >” onward. The rule that covers this replaces \s>.* with an empty value,

where
\s>.* => any whitespace character, followed by “>”, followed by any character appearing any number of times

The outcome of this mapping for “Car parts > BMW > 320i > 2013” would be “Car parts”.
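Here is a minimal Python sketch of the same rule, assuming Python's re module handles this pattern the way the app does. Replacing the match with an empty string is how "remove" is expressed; the helper name main_category is hypothetical.

```python
import re

# Pattern from Example 3: a whitespace character, then ">", then the rest.
CATEGORY_TAIL = re.compile(r"\s>.*")

def main_category(path: str) -> str:
    """Hypothetical helper: keep only the first level of a category path."""
    # The first " >" starts the match, and .* runs to the end of the value,
    # so replacing it with "" leaves only the top-level category.
    return CATEGORY_TAIL.sub("", path)

print(main_category("Car parts > BMW > 320i > 2013"))  # Car parts
```

A path with no “ >” in it is returned unchanged, which is usually what you want for products that already sit in a top-level category.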

Example 4

For the last example, imagine a channel that requires UPCs, but not all products in your system have one, and not all the UPCs you do have are in the proper format (12 digits).

If you send a feed with products whose UPCs are empty or improperly formatted, the whole feed could be rejected. What you need to do is exclude those products. This can be achieved with a single exclude rule using, guess what … RegExp.

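What we do here is include only products whose UPC is exactly a 12-digit number. In other words, include products only if the UPC matches the RegExp ^[0-9]{12}$, where ^ and $ mark the start and end of the value, so nothing else may appear before or after the 12 digits.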
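The filtering logic can be sketched in Python as follows. This is a minimal sketch assuming Python's re module treats ^[0-9]{12}$ like the app does; the product list and the has_valid_upc helper are hypothetical, for illustration only.

```python
import re
from typing import Optional

# Pattern from Example 4: ^ and $ anchor the match to the whole value,
# so only values made of exactly 12 digits pass.
UPC_PATTERN = re.compile(r"^[0-9]{12}$")

def has_valid_upc(upc: Optional[str]) -> bool:
    """Hypothetical helper: True only for a UPC of exactly 12 digits."""
    # Missing or empty UPCs never match.
    # Note: Python's $ also matches just before a trailing "\n";
    # re.fullmatch(r"[0-9]{12}", upc) is a stricter alternative.
    return upc is not None and UPC_PATTERN.match(upc) is not None

# Hypothetical sample products.
products = [
    {"title": "Product A", "upc": "012345678905"},
    {"title": "Product B", "upc": ""},
    {"title": "Product C", "upc": "12345-67890"},
    {"title": "Product D", "upc": None},
]

# Keep only products whose UPC matches, mirroring the include/exclude rule.
feed = [p for p in products if has_valid_upc(p["upc"])]
print([p["title"] for p in feed])  # ['Product A']
```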

These are only a few of the countless ways RegExp can be used. The rule of thumb is that whenever you need a complex mapping, RegExp is your “weapon of choice”.

If you have any mapping issues, please describe them in the comments and I will try to find a suitable RegExp to deal with them (if possible).