Don't use .NET System.Uri.UnescapeDataString in URL Decoding
URL encoding should encode Space into "+" or "%20". URL decoding should decode "+" or "%20" into Space. However, by design, System.Uri.UnescapeDataString doesn't decode "+" into Space.
The MSDN remark of Uri.UnescapeDataString says:
“Many Web browsers escape spaces inside of URIs into plus ("+") characters; however, the UnescapeDataString method does not convert plus characters into spaces because this behavior is not standard across all URI schemes.”
The issue arises when your web application has a query string like:
If you use System.Uri.UnescapeDataString to decode the query string value "just+do+it", the result is "just+do+it" instead of "just do it". When the downstream application needs to URL encode the value again, it becomes "just%2bdo%2bit". The final URL will look like
The spaces get lost, and the application could interpret the value as "just+do+it" instead of "just do it".
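The round trip described above can be reproduced with a minimal sketch. The class and method names here are illustrative only, and the sketch assumes the downstream component re-encodes with HttpUtility.UrlEncode (the post does not say which encoder it uses).

```csharp
using System;
using System.Web; // HttpUtility: System.Web.dll on .NET Framework, System.Web.HttpUtility on newer runtimes

public static class PlusDecodingDemo
{
    // Decode with Uri.UnescapeDataString, then re-encode as a downstream component might.
    public static string RoundTripWithUnescape(string encoded)
    {
        string decoded = Uri.UnescapeDataString(encoded); // "+" is left untouched
        return HttpUtility.UrlEncode(decoded);            // the literal "+" becomes "%2b"
    }

    // Same round trip, but decoding with HttpUtility.UrlDecode.
    public static string RoundTripWithUrlDecode(string encoded)
    {
        string decoded = HttpUtility.UrlDecode(encoded);  // "+" becomes a space
        return HttpUtility.UrlEncode(decoded);            // the space becomes "+" again
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripWithUnescape("just+do+it"));  // just%2bdo%2bit
        Console.WriteLine(RoundTripWithUrlDecode("just+do+it")); // just+do+it
    }
}
```

Only the second round trip returns the value in its original encoded form.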
Detailed discussion:
RFC 2396 defines reserved characters such as &, $ and +, and excluded characters such as space, %, < and >. These characters must be escaped (URL encoded) when used as values in the query string of a URL in order to keep the original meaning of the character.
For example: to pass information such as
Products : Windows&Office Price: $200 Comment: In Stock Sign:+
The URL could be
http://www.ms.com/default.aspx?Products=Windows%26Office&Price=%24200&Comment=In%20Stock&sign=%2b
or
http://www.ms.com/default.aspx?Products=Windows%26Office&Price=%24200&Comment=In+Stock&sign=%2b
A URL may be used as the return URL value in another URL. In that case, the URL needs to be encoded, and characters that are already encoded will be double encoded.
http%3a%2f%2fwww.ms.com%2fdefault.aspx%3fProducts%3dWindows%2526Office%26Price%3d%2524200%26Comment%3dIn%2520Stock%26sign%3d%252b
or
http%3a%2f%2fwww.ms.com%2fdefault.aspx%3fProducts%3dWindows%2526Office%26Price%3d%2524200%26Comment%3dIn%2bStock%26sign%3d%252b
Characters | Single Encoded | Double Encoded
& | %26 | %2526
$ | %24 | %2524
+ | %2b | %252b
Space | %20, + | %2520, %2b
% | %25 | %2525
< | %3c | %253c
Notice that the single encoding of Space can be "+", whose double encoding is "%2b", and that "%2b" is also the single encoding of the + sign.
If a function doesn't handle the encoding properly, the original meaning of the character can be lost in transit.
Correct encoding and decoding methods should behave as the table above defines.
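As a rough check of the single and double encoded forms, the following sketch encodes each character once and then encodes the result again. The class and method names are hypothetical. It uses HttpUtility.UrlEncode for the "+" form of Space and Uri.EscapeDataString for the "%20" form.

```csharp
using System;
using System.Web; // HttpUtility: System.Web.dll on .NET Framework, System.Web.HttpUtility on newer runtimes

public static class DoubleEncodingDemo
{
    // Encode once, as for a query string value.
    public static string Single(string raw) => HttpUtility.UrlEncode(raw);

    // Encode the already encoded value again, as happens when the
    // whole URL is embedded in another URL.
    public static string Double(string raw) => HttpUtility.UrlEncode(Single(raw));

    public static void Main()
    {
        foreach (string raw in new[] { "&", "$", "+", " ", "%", "<" })
        {
            // Prints e.g. "'&' | %26 | %2526" and "' ' | + | %2b"
            Console.WriteLine($"'{raw}' | {Single(raw)} | {Double(raw)}");
        }

        // The "%20" form of Space comes from Uri.EscapeDataString.
        string space20 = Uri.EscapeDataString(" ");
        Console.WriteLine($"' ' | {space20} | {Uri.EscapeDataString(space20)}"); // ' ' | %20 | %2520
    }
}
```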
.NET encoding methods
Characters | HttpUtility.UrlEncode | System.Uri.EscapeDataString | System.Uri.EscapeUriString
& | %26 | %26 | &
$ | %24 | %24 | $
+ | %2b | %2B | +
Space | + | %20 | %20
% | %25 | %25 | %25
< | %3c | %3C | %3C
Notice:
1. System.Uri.EscapeUriString doesn't encode RFC reserved characters.
2. UrlEncode encodes Space as "+" and EscapeDataString encodes Space as "%20".
3. To encode a whole URL as a return URL, EscapeUriString should not be used.
.NET Methods | Input URL: http://www.ms.com/default.aspx?Products=Windows%26Office&Price=%24200&Comment=In+Stock&sign=%2b
UrlEncode | http%3a%2f%2fwww.ms.com%2fdefault.aspx%3fProducts%3dWindows%2526Office%26Price%3d%2524200%26Comment%3dIn%2bStock%26sign%3d%252b
EscapeDataString | http%3A%2F%2Fwww.ms.com%2Fdefault.aspx%3FProducts%3DWindows%2526Office%26Price%3D%2524200%26Comment%3DIn%2BStock%26sign%3D%252b
EscapeUriString (not right) | http://www.ms.com/default.aspx?Products=Windows%2526Office&Price=%2524200&Comment=In+Stock&sign=%252b
Or
.NET Methods | Input URL: http://www.ms.com/default.aspx?Products=Windows%26Office&Price=%24200&Comment=In%20Stock&sign=%2b
UrlEncode | http%3a%2f%2fwww.ms.com%2fdefault.aspx%3fProducts%3dWindows%2526Office%26Price%3d%2524200%26Comment%3dIn%2520Stock%26sign%3d%252b
EscapeDataString | http%3A%2F%2Fwww.ms.com%2Fdefault.aspx%3FProducts%3DWindows%2526Office%26Price%3D%2524200%26Comment%3DIn%2520Stock%26sign%3D%252b
EscapeUriString (not right) | http://www.ms.com/default.aspx?Products=Windows%2526Office&Price=%2524200&Comment=In%2520Stock&sign=%252b
There are two decoding methods in .NET
Encoded Characters | HttpUtility.UrlDecode | System.Uri.UnescapeDataString
%26 | & | &
%24 | $ | $
%2b | + | +
%20 | Space | Space
+ | Space | +
%25 | % | %
%3c | < | <
Notice that UrlDecode and UnescapeDataString decode "+" differently. This causes a problem when decoding a return URL that contains a double encoded Space as "%2b".
For example, "Comment%3dIn%2bStock" in an encoded return URL should be double decoded into
Variable: "Comment" Value: "In Stock"
Calling UrlDecode twice on it gives
"Comment%3dIn%2bStock" → "Comment=In+Stock" → "Comment=In Stock"
Calling UnescapeDataString twice on it gives
"Comment%3dIn%2bStock" → "Comment=In+Stock" → "Comment=In+Stock"
The original string "In Stock" is broken by UnescapeDataString.
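The two decoding chains above can be checked with a short sketch. DecodeTwice is a hypothetical helper introduced only for this illustration.

```csharp
using System;
using System.Web; // HttpUtility: System.Web.dll on .NET Framework, System.Web.HttpUtility on newer runtimes

public static class DoubleDecodingDemo
{
    // Apply the given decoder twice, as a receiver of a double encoded
    // return URL value would.
    public static string DecodeTwice(Func<string, string> decode, string value)
    {
        return decode(decode(value));
    }

    public static void Main()
    {
        const string encoded = "Comment%3dIn%2bStock";

        // "%2b" -> "+" -> Space
        Console.WriteLine(DecodeTwice(s => HttpUtility.UrlDecode(s), encoded));    // Comment=In Stock

        // "%2b" -> "+" and the "+" is never turned into Space
        Console.WriteLine(DecodeTwice(s => Uri.UnescapeDataString(s), encoded));   // Comment=In+Stock
    }
}
```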
If the downstream application assumes the URL string has been restored to its unencoded form "In Stock" and uses it as input to encode it again, the single encoding will become
"Comment=In+Stock" → "Comment%3dIn%2bStock"
instead of
"Comment=In Stock" → "Comment%3dIn+Stock"
Conclusion:
Since an application has no control over its upstream (user input or config), it can only assume that the URL query string is correctly encoded: special characters in query string parameter values are single encoded, and Space in particular can be "+" or "%20". When the URL needs to be used as a return URL in a query string, it must be encoded again, so Space will be double encoded as "%2b" or "%2520".
When the receiving application receives the encoded URL and uses a method like UnescapeDataString for decoding, the "%2b" is not decoded into Space. Instead it becomes "+" in the final result.
Developers should avoid encoding Space into "+" (double encoded as "%2b"). The recommendation is to encode URLs with System.Uri.EscapeDataString and decode them with HttpUtility.UrlDecode.
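A minimal sketch of that recommendation, under stated assumptions: the login.aspx page, the "ReturnUrl" parameter name and all class and method names are hypothetical and exist only for this illustration. Values are escaped with Uri.EscapeDataString and decoded with HttpUtility.UrlDecode (directly, and through HttpUtility.ParseQueryString, which URL decodes the values it returns).

```csharp
using System;
using System.Collections.Specialized;
using System.Web; // HttpUtility: System.Web.dll on .NET Framework, System.Web.HttpUtility on newer runtimes

public static class ReturnUrlRoundTrip
{
    // Build the inner URL; EscapeDataString turns Space into "%20", never "+".
    public static string BuildInnerUrl(string comment, string sign)
    {
        return "http://www.ms.com/default.aspx?Comment=" + Uri.EscapeDataString(comment)
             + "&sign=" + Uri.EscapeDataString(sign);
    }

    // Embed the inner URL as the value of a hypothetical "ReturnUrl" parameter.
    public static string BuildOuterUrl(string innerUrl)
    {
        return "http://www.ms.com/login.aspx?ReturnUrl=" + Uri.EscapeDataString(innerUrl);
    }

    // Receiving side: decode the "ReturnUrl" value once with HttpUtility.UrlDecode.
    public static string ExtractReturnUrl(string outerUrl)
    {
        const string marker = "ReturnUrl=";
        int start = outerUrl.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0) throw new ArgumentException("No ReturnUrl parameter found.");
        return HttpUtility.UrlDecode(outerUrl.Substring(start + marker.Length));
    }

    // Final consumer: read the decoded values from the inner URL's query string.
    public static NameValueCollection ReadValues(string innerUrl)
    {
        return HttpUtility.ParseQueryString(innerUrl.Substring(innerUrl.IndexOf('?') + 1));
    }

    public static void Main()
    {
        string inner = BuildInnerUrl("In Stock", "+");
        string outer = BuildOuterUrl(inner);
        NameValueCollection values = ReadValues(ExtractReturnUrl(outer));

        Console.WriteLine(inner);             // http://www.ms.com/default.aspx?Comment=In%20Stock&sign=%2B
        Console.WriteLine(values["Comment"]); // In Stock
        Console.WriteLine(values["sign"]);    // +
    }
}
```

Because Space never travels as "+" in this scheme, the "+" sign and Space cannot be confused at either decoding step.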
Testers should ensure that:
1. Reserved and excluded characters as defined by RFC 2396 are single encoded when used as a value in the query string of a URL, as in the following table (URLs as links, config values or test values).
2. If the URL is used as a return URL or as the value of another query string, the reserved and excluded characters are double encoded, as in the following table.
Characters | Single Encoded | Double Encoded
& | %26 | %2526
$ | %24 | %2524
+ | %2b | %252b
Space | %20, + | %2520, %2b
% | %25 | %2525
< | %3c | %253c
Two test URLs can be
http://www.ms.com/default.aspx?Products=Windows%26Office&Price=%24200&Comment=In%20Stock&sign=%2b
or
http://www.ms.com/default.aspx?Products=Windows%26Office&Price=%24200&Comment=In+Stock&sign=%2b
source: http://blogs.msdn.com/b/yangxind/archive/2006/11/09/don-t-use-net-system-uri-unescapedatastring-in-url-decoding.aspx