TransWikia.com

Why is the use of TAB (%09) characters in the middle of a 'javascript:' URL valid?

Information Security Asked by Paradoxis on October 28, 2021

Some context: I was assigned to a pentest and found an application that let me place my own links in an <a> tag's href attribute. As expected, all strange values like javascript: were correctly filtered by an XSS filter. However, I discovered that you could bypass this filter by injecting a TAB character in the middle of the protocol specification, like this:

<a href="javascri&#9;pt:alert(1)"></a>

Now what I would like to know is: why do browsers accept this as a valid javascript: URL and happily execute the code, whereas other characters such as SPACE are not allowed? Is there a historical reason for allowing this strange format?


Note: Tested on Chrome. For those who would like to test it, it also works with window.location, like so:

window.location.href = 'javascr\x09ipt:alert(1)'

2 Answers

I found an old discussion that you might find interesting for understanding the possible historical reasons behind this choice.

https://bugzilla.mozilla.org/show_bug.cgi?id=87298

That's a 19-year-old bug in Mozilla. The problem was that one website was not working as expected, because Mozilla didn't strip tab characters inside the URL of a link. The page worked as expected in Internet Explorer, which apparently ignored the tabs. Tabs are often used for indentation in HTML files, so you can sometimes expect a few tabs after a new line. Somebody cited an IETF standard suggesting that "whitespace should be ignored when extracting the URI". However, others were not fully convinced that removing all whitespace characters would be a good idea, because you might run across URIs with unencoded spaces (for example: https://www.example.com/path with spaces/), even though that would be wrong, at least according to current standards. They therefore decided to just add tabs to the list of removed characters (carriage-return and line-feed characters were already being removed). Note, though, that spaces are allowed, and ignored, when they are at the beginning or at the end of the URI (example: <a href=" http://www.example.com "></a>).

So I suppose the historical reason for this choice is that they wanted to make sure the following code would work:

<!-- URL with new lines and TABS for indentation -->
<a href="https://www.example.com/?
         param1=foo&
         param2=bar">
   Click on this example link
</a>

<!-- URL with unencoded spaces -->
<a href="https://www.example.com/path with spaces/foo">Click here</a>

However, they did not check exactly where the spaces or tabs were in the URL; they just decided to keep the spaces and remove the tabs. As a result, the first example doesn't work if you use spaces for indentation, and tab characters can be included anywhere in the URL without affecting anything (so even java<tab>script is accepted).
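
The behaviour described above can be checked with the standard URL constructor, which is available in current browsers and in Node.js. This is only a rough sketch: schemeOf is a hypothetical helper name, and the example URLs are the ones used in this answer and in the question.

```javascript
// Hypothetical helper: returns the parsed scheme, or null when the
// URL parser rejects the input (the constructor throws a TypeError).
function schemeOf(input) {
  try {
    return new URL(input).protocol;
  } catch (err) {
    return null;
  }
}

// A tab or a line feed inside the scheme is dropped before parsing.
console.log(schemeOf('javascri\tpt:alert(1)'));   // 'javascript:'
console.log(schemeOf('java\nscript:alert(1)'));   // 'javascript:'

// A space inside the scheme is kept, so the scheme is invalid.
console.log(schemeOf('javascri pt:alert(1)'));    // null

// Leading and trailing spaces are trimmed.
console.log(schemeOf('  javascript:alert(1)  ')); // 'javascript:'

// Newlines and tabs used for indentation vanish from the result,
// while spaces inside the path survive in percent-encoded form.
console.log(new URL('https://www.example.com/?\n\tparam1=foo&\n\tparam2=bar').href);
console.log(new URL('https://www.example.com/path with spaces/foo').pathname);
```

The last two lines correspond to the two HTML examples above: the tab-indented query string collapses to a single line, and the unencoded spaces stay part of the path.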

Answered by reed on October 28, 2021

When an <a> element is clicked, the link is followed and its URL is resolved. This causes the string to be processed by the basic URL parser.

In your example, the parameter is processed in the path state, during which TAB (U+0009) characters, along with carriage return and line feed characters, are ignored and therefore do not form part of the resolved URL.

Related question: Why does anchor tag remove tab character in URL specified inside href attribute?

As per the specification of the basic URL parser, any ASCII tab or newline is removed from the URL (or simply ignored).
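
As a minimal sketch of what that means for a filter like the one in the question: the code below mimics the parser's input clean-up (trim leading and trailing C0 control or space characters, then remove every ASCII tab or newline) before checking the scheme. All three function names are hypothetical, and naiveIsJavascriptUrl is only an assumption about how the bypassed filter might have worked, not its actual implementation.

```javascript
// Hypothetical helper mirroring the URL parser's input clean-up:
// 1. trim leading/trailing C0 control or space (U+0000 to U+0020)
// 2. remove every ASCII tab or newline (U+0009, U+000A, U+000D)
function preprocessUrlInput(input) {
  return input
    .replace(/^[\u0000-\u0020]+|[\u0000-\u0020]+$/g, '')
    .replace(/[\t\n\r]/g, '');
}

// Assumed shape of a naive filter: it inspects the raw string only.
function naiveIsJavascriptUrl(href) {
  return href.toLowerCase().startsWith('javascript:');
}

// The same check, applied after the clean-up the browser will perform.
function isJavascriptUrl(href) {
  return preprocessUrlInput(href).toLowerCase().startsWith('javascript:');
}

const payload = 'javascri\tpt:alert(1)';
console.log(naiveIsJavascriptUrl(payload)); // false: the tab hides the scheme
console.log(isJavascriptUrl(payload));      // true: the tab is removed first
```

This only illustrates the normalisation step. A filter that parses the value with the same URL parser the browser uses and then compares the resulting scheme against a list of permitted schemes would avoid having to replicate these rules by hand.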

Answered by elsadek on October 28, 2021
