I recently came across an interesting client situation where a number of URLs were internally linked to in both an encoded and “decoded” manner.
Canonical tags had the same problem: they weren’t always consistent with the internal links.
A URL might be internally linked to as https://detailed.com/site-advice-(next-level)
But then have a canonical tag as https://detailed.com/site-advice-%28next-level%29
Notice how the left and right parentheses change to %28 and %29 respectively.
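To make the mismatch concrete, here is a minimal Python sketch using the standard library's urllib.parse. The same_page helper is hypothetical, written only for illustration; it assumes that two URLs point to the same page when they match after percent-decoding the path. That assumption is a simplification: decoding %2F, for example, would turn it into a literal slash, so treat this as an audit aid rather than a definitive comparison.

```python
from urllib.parse import quote, unquote, urlsplit


def same_page(url_a: str, url_b: str) -> bool:
    """Hypothetical helper: compare two URLs after percent-decoding the path."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (
        a.scheme.lower() == b.scheme.lower()
        and a.netloc.lower() == b.netloc.lower()
        and unquote(a.path) == unquote(b.path)
        and a.query == b.query
    )


linked = "https://detailed.com/site-advice-(next-level)"
canonical = "https://detailed.com/site-advice-%28next-level%29"

# quote() percent-encodes the parentheses; "/" is left alone by default.
print(quote("/site-advice-(next-level)"))        # /site-advice-%28next-level%29
# unquote() reverses the encoding.
print(unquote("/site-advice-%28next-level%29"))  # /site-advice-(next-level)
# The internal link and the canonical differ as strings but decode to the same path.
print(linked == canonical)                       # False
print(same_page(linked, canonical))              # True
```

Running something like this over a crawl export is one way to list pages where the internal link and the canonical tag differ only in encoding.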
Although I have been doing SEO and auditing websites for more than 15 years, I couldn’t name from memory which characters fit into standard encoding, and there was no single source of truth for how to handle canonical tags in this situation either.
As I’ve done hours of research into this, I decided that I would document my findings and solution for anyone who comes across this problem in the future.
Regarding special characters in URLs, we have multiple classification types.
Reserved Characters for URL Syntax
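The reserved characters are: ! * ' ( ) ; : @ & = + $ , / ? # [ ], plus %, which is not formally reserved but must always be encoded because it introduces the encoding itself.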
Written out with UTF-8 encodings and the name of each character, you get:
- ! – Exclamation mark (%21 with UTF-8 Encoding)
- * – Asterisk (%2A with UTF-8 Encoding)
- ' – Apostrophe / Single quote (%27 with UTF-8 Encoding)
- ( – Left parenthesis (%28 with UTF-8 Encoding)
- ) – Right parenthesis (%29 with UTF-8 Encoding)
- ; – Semicolon (%3B with UTF-8 Encoding)
- : – Colon (%3A with UTF-8 Encoding)
- @ – At sign (%40 with UTF-8 Encoding)
- & – Ampersand (%26 with UTF-8 Encoding)
- = – Equals (%3D with UTF-8 Encoding)
- + – Plus (%2B with UTF-8 Encoding)
- $ – Dollar sign (%24 with UTF-8 Encoding)
- , – Comma (%2C with UTF-8 Encoding)
- / – Forward slash (%2F with UTF-8 Encoding)
- ? – Question mark (%3F with UTF-8 Encoding)
- % – Percent (%25 with UTF-8 Encoding)
- # – Pound sign (%23 with UTF-8 Encoding)
- [ – Left square bracket (%5B with UTF-8 Encoding)
- ] – Right square bracket (%5D with UTF-8 Encoding)
If you want to create URLs with emojis, you probably shouldn’t. Google’s advice is that any special characters in URLs, including non-ASCII characters from other languages, should be written with UTF-8 encoding in mind.
A good real-world example of URLs with special characters in them that can rank is Wikipedia.
You end up with URLs like: https://en.wikipedia.org/wiki/28_(number)
If we look at the canonical tag for the page, it’s written in the same way: with parentheses, rather than %28 and %29.
Another example is https://en.wikipedia.org/wiki/40%25_(song), where 40% is actually referencing the content of the page (a song, named 40%).
They rank first in Google for the name of the song followed by its creator (granted, it is Wikipedia), and a few other well-ranking results include % as well.
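The Wikipedia URL above mixes both styles: the percent sign is encoded as %25 while the parentheses are left as they are. A small Python sketch of that behaviour, assuming we tell urllib.parse.quote to treat parentheses as safe (the title variable is just an illustration):

```python
from urllib.parse import quote, unquote

title = "40%_(song)"

# Keep "(" and ")" literal, but still encode "%" so it cannot be
# mistaken for the start of an encoded sequence.
encoded = quote(title, safe="()")
print(encoded)           # 40%25_(song)

# Decoding gives back the original page title.
print(unquote(encoded))  # 40%_(song)
```

The percent sign is the one character that cannot be left as-is, since a literal % followed by two hex digits would be read as an encoded character.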
If You Can Avoid Special Characters, Then Do So
Google is just one platform on the internet.
Even if you know how things are processed there, links may still break on chat applications, forums, social media platforms like Twitter and Facebook, and so on.
If you can keep URLs simple, that’s always best.
I also understand, though, that if you’re doing SEO for a site, these things may have been set up without your knowledge, or before you joined the project.