// http://example.com/runme.js document.write("I'm running");
And a PHP application directly outputs a string passed into it:
<?php echo '<div>' . $_GET['input'] . '</div>';
If an unchecked GET parameter contains
<script src="http://example.com/runme.js"></script> then the output of the PHP script will be:
As a general rule, never trust input coming from a client. Every GET, POST, and cookie value could be anything at all, and should therefore be validated. When outputting any of these values, escape them so they will not be evaluated in an unexpected way.
Keep in mind that even in the simplest applications data can be moved around and it will be hard to keep track of all sources. Therefore it is a best practice to always escape output.
PHP provides a few ways to escape output depending on the context.
PHPs Filter Functions allow the input data to the php script to be sanitized or validated in many ways. They are useful when saving or outputting client input.
htmlspecialchars will convert any "HTML special characters" into their HTML encodings, meaning they will then not be processed as standard HTML. To fix our previous example using this method:
<?php echo '<div>' . htmlspecialchars($_GET['input']) . '</div>'; // or echo '<div>' . filter_input(INPUT_GET, 'input', FILTER_SANITIZE_SPECIAL_CHARS) . '</div>';
Everything inside the
When outputting a dynamically generated URL, PHP provides the
urlencode function to safely output valid URLs. So, for example, if a user is able to input data that becomes part of another GET parameter:
<?php $input = urlencode($_GET['input']); // or $input = filter_input(INPUT_GET, 'input', FILTER_SANITIZE_URL); echo '<a href="http://example.com/page?input="' . $input . '">Link</a>';
Any malicious input will be converted to an encoded URL parameter.
Sometimes you will want to send HTML or other kind of code inputs. You will need to maintain a list of authorised words (white list) and un-authorized (blacklist).
You can download standard lists available at the OWASP AntiSamy website. Each list is fit for a specific kind of interaction (ebay api, tinyMCE, etc...). And it is open source.
There are libraries existing to filter HTML and prevent XSS attacks for the general case and performing at least as well as AntiSamy lists with very easy use. For example you have HTML Purifier