How N-grams, Levenshtein Distance, and Jaccard Similarity Help SEO and PPC
AI can now generate keywords, and even launch a paid search campaign, within minutes. However, applying AI alone is not enough. For your SEO or Google Ads campaigns to actually perform, you need to understand how search works and how to structure your keywords.
Here are three simple techniques marketers can use to make sense of search data:
- N-grams: Breaking keywords into smaller segments.
- Levenshtein Distance: Finding misspellings and near-identical variants.
- Jaccard Similarity: Finding the similarity between two phrases.
N-grams
An N-gram is a sequence of N consecutive words taken from a keyword. Splitting a keyword into N-grams looks like this:
Keyword: “private caregiver nearby”
- Unigrams (1 word): private, caregiver, nearby
- Bigrams (2 words): private caregiver, caregiver nearby
- Trigrams (3 words): private caregiver nearby
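The split above can be sketched in a few lines of Python. This is a minimal illustration, assuming keywords are lowercased and split on whitespace; the function name `ngrams` is a hypothetical helper, not part of any ad platform's tooling.

```python
def ngrams(keyword: str, n: int) -> list[str]:
    """Return all runs of n consecutive words in the keyword."""
    words = keyword.lower().split()
    # Slide a window of n words across the keyword.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


keyword = "private caregiver nearby"
unigrams = ngrams(keyword, 1)
bigrams = ngrams(keyword, 2)
trigrams = ngrams(keyword, 3)
print(unigrams, bigrams, trigrams)
```

A keyword shorter than `n` words simply yields an empty list.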
Why use N-grams?
- When you have 100,000 search terms, N-grams help you identify which words or phrases perform well and which do not. Poor performers such as “free” (people who search with “free” mostly do not buy) can be added as negative keywords.
- You can identify high-performing words like “nearby” and create campaigns targeting them.
N-grams turn messy search data into small, useful groups, so that you can refine your campaigns or content.
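One way to do this in practice is to add up clicks and conversions for every N-gram across a search terms report. The sketch below assumes each row is a `(search term, clicks, conversions)` tuple. The sample rows and the 50-click threshold are made up for illustration, and `aggregate_by_ngram` is a hypothetical helper.

```python
from collections import defaultdict


def ngrams(keyword: str, n: int) -> list[str]:
    words = keyword.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def aggregate_by_ngram(rows, n=1):
    """Sum clicks and conversions for every n-gram found in the search terms."""
    totals = defaultdict(lambda: {"clicks": 0, "conversions": 0})
    for term, clicks, conversions in rows:
        # set() so a word repeated inside one term is only counted once.
        for gram in set(ngrams(term, n)):
            totals[gram]["clicks"] += clicks
            totals[gram]["conversions"] += conversions
    return dict(totals)


# Illustrative sample data, not real campaign results.
sample_rows = [
    ("private caregiver nearby", 40, 6),
    ("free caregiver", 55, 0),
    ("caregiver nearby", 30, 5),
    ("free private caregiver", 20, 0),
]

stats = aggregate_by_ngram(sample_rows, n=1)

# Words with plenty of clicks but no conversions are negative keyword candidates.
negative_candidates = sorted(
    gram for gram, s in stats.items()
    if s["clicks"] >= 50 and s["conversions"] == 0
)
print(negative_candidates)
```

With this sample, “free” is the only word that gets many clicks and no conversions, while “nearby” converts well.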
Levenshtein Distance
Levenshtein Distance is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one word into another.
- cat → cats = distance 1 (just add “s”)
- cat → dog = distance 3 (c→d, a→o, t→g)
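For readers who want to compute these distances themselves, here is a minimal sketch of the standard dynamic-programming approach in plain Python. It keeps only two rows of the edit table in memory. Dedicated libraries exist for large keyword lists, so treat this as an illustration rather than a production implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("cat", "cats"))
print(levenshtein("cat", "dog"))
```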
Use in SEO and PPC:
- Find misspelled brand or competitor keywords. Example: uber → uver (distance 1)
- Combine very similar keywords to simplify ad groups.
The following is an example of grouping keywords using the Levenshtein Distance:
| Keyword | 24/7 plumber | 24 7 plumber | 247 plumber |
|---|---|---|---|
| 24/7 plumber | 0 | 1 | 1 |
| 24 7 plumber | 1 | 0 | 1 |
| 247 plumber | 1 | 1 | 0 |
All three keywords mean the same thing, so you can manage them under a single ad group. That simplifies reporting, bidding, and campaign control.
Jaccard Similarity
Jaccard Similarity compares two phrases by the words they share: the number of words common to both, divided by the total number of distinct words across both. Word order does not matter.
- “new york plumber” & “plumber new york” = similarity 1 (all words same, order different)
- “new york plumber” & “NYC plumber” = similarity 0.25 (only “plumber” is common)
Use: Combine similar keywords that are just reordered. Note: It won’t understand meaning (NYC vs New York), but it works well for duplicates.
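The calculation is short enough to sketch directly. The version below assumes phrases are lowercased and split on whitespace, and it treats two empty phrases as identical, which is a convention chosen here for convenience.

```python
def jaccard(a: str, b: str) -> float:
    """Shared words divided by all distinct words across both phrases."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # assumption: two empty phrases count as identical
    return len(words_a & words_b) / len(words_a | words_b)


print(jaccard("new york plumber", "plumber new york"))
print(jaccard("new york plumber", "NYC plumber"))
```

In the second pair, only “plumber” is shared out of four distinct words (new, york, nyc, plumber), which gives 1 / 4 = 0.25.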
Combining Techniques for Campaigns
You can use these three techniques together for bigger campaigns:
- Use N-grams to find high-performing words and phrases.
- Use Levenshtein Distance to combine very similar keywords.
- Use Jaccard Similarity to remove reordered or slightly different duplicates.
Example with cybersecurity keywords:
- cybersecurity courses
- cybersecurity online course
- free cybersecurity courses
- online cybersecurity courses
- cybersecurity course
- google cybersecurity course
After combining, you can make four main keyword groups:
- Cybersecurity courses
- Cybersecurity courses online
- Free cybersecurity courses
- Google cybersecurity course
This keeps campaigns simple, structured, and easier to manage.
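One possible way to automate this grouping is to combine the two similarity measures: treat two words as the same when their Levenshtein Distance is at most 1 (so “course” matches “courses”), then group keywords whose word sets fully overlap regardless of order. The sketch below is one assumption about how to wire the pieces together, not a standard algorithm. `fuzzy_jaccard` and `group_keywords` are hypothetical helpers, and the N-gram step (choosing which words matter) is left out.

```python
def levenshtein(a: str, b: str) -> int:
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def fuzzy_jaccard(a: str, b: str, max_edits: int = 1) -> float:
    """Jaccard similarity where words within max_edits of each other count as shared."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    matched_a = sum(any(levenshtein(x, y) <= max_edits for y in words_b) for x in words_a)
    matched_b = sum(any(levenshtein(x, y) <= max_edits for y in words_a) for x in words_b)
    shared = min(matched_a, matched_b)  # keeps the score symmetric and at most 1
    union = len(words_a) + len(words_b) - shared
    return shared / union if union else 1.0


def group_keywords(keywords, threshold=1.0):
    """Greedy grouping: join the first group whose first keyword is similar enough."""
    groups = []
    for keyword in keywords:
        for group in groups:
            if fuzzy_jaccard(keyword, group[0]) >= threshold:
                group.append(keyword)
                break
        else:
            groups.append([keyword])  # no match found, start a new group
    return groups


keywords = [
    "cybersecurity courses",
    "cybersecurity online course",
    "free cybersecurity courses",
    "online cybersecurity courses",
    "cybersecurity course",
    "google cybersecurity course",
]
groups = group_keywords(keywords)
for group in groups:
    print(group)
```

On these six keywords the sketch produces four groups that line up with the list above. A one-edit tolerance can wrongly merge short, unrelated words, so review the groups before restructuring a live campaign.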
Quick Summary
| Scenario | Best Technique | Why |
|---|---|---|
| Find high-intent patterns in big search data | N-grams | Shows themes fast and reduces data size |
| Remove duplicate or similar keywords | Levenshtein Distance | Detects spelling and small differences |
| Combine reordered or slightly changed phrases | Jaccard Similarity | Checks similarity ignoring word order |
| Build scalable keyword clusters | Combine all three | Accurate and compact campaign structure |
Conclusion: N-grams, Levenshtein Distance, and Jaccard Similarity are essential for building structured, high-performing campaigns at a time when AI can generate keywords in minutes. These methods clean your data, eliminate duplicates, and help make your campaigns more profitable.