I came across an incredibly trivial issue at work the other day that left made me stop and think for a second - How do you count occurences of a string in SQL?
The actual issue I was trying to solve was a particular column had multiple values within it, seperated by a semi colon. I needed to extract these and count how many values there were. It wes a very straightforward import that was coming into SQL - I could have passed it through into another pipeline to process it and calculate the character count in Python, but that was a lot of needless steps to get a single column.
To demonstrate the solution I will use an example table called assets_import
(creative name, I know!). It looks as follows
id | asset_title | asset_tags |
---|---|---|
1 | Leather Jacket | leather;jacket;casual;black |
2 | Jeans | denim;pants;slim-fit;casual;blue |
3 | Suit Jacket | business;black;jacket;cotton |
4 | Keychain | keychain |
Appreciating it is quite a niche use case - Lets say you need to add a column to the end of this called “count of tag” - How would you solve it?
The solution is, thankfully, remarkably simple. I will be using postgres, but adapt the inbuilt function names to the needs of your RDBMS:
|
|
Which outputs:
id | asset_title | asset_tags | asset_tag_count |
---|---|---|---|
1 | Leather Jacket | leather;jacket;casual;black | 4 |
2 | Jeans | denim;pants;slim-fit;casual;blue | 5 |
3 | Suit Jacket | business;black;jacket;cotton | 4 |
4 | Keychain | keychain | 1 |
So how does it work?
LENGTH(asset_tags)
LENGTH(REPLACE(asset_tags, ';',''))
and subtract it from the original lengthone
- Has no semi colon but one tagone;two
- Has one semi colon, but two tagsone;two;three
- Has two semi colons, three tagsThe data set I was working with always had atleast one tag, but its worth mentioning you may encounter a scenario where there is an empty string or null value and want present 0 tags.
I’ve added a couple more rows to the database to show how this may look:
id | asset_title | asset_tags |
---|---|---|
5 | Coat Hanger | ’’ or empty string |
6 | Clothes Peg | NULL |
In this scenario we can use a CASE
statement:
|
|
We now get the following output:
id | asset_title | asset_tags | asset_tag_count |
---|---|---|---|
1 | Leather Jacket | leather;jacket;casual;black | 4 |
2 | Jeans | denim;pants;slim-fit;casual;blue | 5 |
3 | Suit Jacket | business;black;jacket;cotton | 4 |
4 | Keychain | keychain | 1 |
5 | Coat Hanger | 0 | |
6 | Clothes Peg | NULL | 0 |
Hopefully you found this useful, I found this incredibly simple problem quite novel in its solution and in the spirit of sharing thought others may be interested too!
I have included a full version of the final code below if you want to give it a try
|
|