Class 19   CS 480-008   7 April 2016

On the board
------------
1. Last time
2. Warmup: shellshock
3. XSS
4. SQL injection
5. other injection
6. cookie issues
7. side channel attacks
8. postMessage
9. other attacks

---------------------------------------------------------------------------

1. Last time

--Web security
--a note about sub-resource integrity
     JavaScript can't do it on its own, but the browser can
     https://www.w3.org/TR/SRI/
     https://hacks.mozilla.org/2015/09/subresource-integrity-in-firefox-43/
--today: focus on defenses. theme will be how difficult it is to truly
  defend.

2. Warmup: shellshock

In bash, one process can export not only a var=value as part of its
environment, but also a *function*:

     VAR1=() { line1; line2; }

But bash parses such functions buggily. In particular, in prior versions
of bash, creating the following as an environment variable:

     VAR2=() { ignored; }; /bin/id

causes bash to execute /bin/id (which displays the UID and GID
information for the current user).

[What's going on? bash passes the variable's value to its interpreter as
a way of getting the function definition "read in", but it doesn't check
that it is reading *only* a function definition, so the trailing command
gets executed.]

An attacker can exploit this via CGI, wherein client-supplied values
show up as environment variables. Example:

     GET /index.cgi?name=value HTTP/1.1
     Host: www.example.com
     Custom-header: Custom-value

So what happens if the attacker replaces www.example.com with the
following?

     () { blah; }; /bin/id

Then bash is given an environment with

     REMOTE_HOST=() { blah; }; /bin/id

Other attacks are possible too (replace Custom-value, replace GET,
replace the path, etc.). For more info:

     http://seclists.org/oss-sec/2014/q3/650

Shellshock is a particular instance of security bugs which arise from
improper content sanitization.
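A minimal way to see the bug from outside the shell is to hand a child
bash a crafted environment. This is just a sketch in Python (the
variable name REMOTE_HOST and the trailing /bin/id mirror the example
above); only a pre-patch bash misbehaves:

     import subprocess

     # A value that looks like an exported bash function definition,
     # with an extra command tacked on after the closing brace.
     evil = {"REMOTE_HOST": "() { ignored; }; /bin/id"}

     # A vulnerable (pre-September-2014) bash executes /bin/id while
     # importing the environment, before it ever runs "echo hello";
     # a patched bash just prints "hello".
     subprocess.run(["/bin/bash", "-c", "echo hello"], env=evil)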
3. XSS

Another type of content sanitization failure occurs during cross-site
scripting (XSS) attacks.

Example: suppose that a CGI script embeds a query string parameter in
the HTML that it generates. That is, the server has template code like
this:

     hello, {{ user }}

and the intended use is something like:

     http://victim.com/index.cgi?user=me

Now, what happens if the attacker convinces the user to submit this?

     http://victim.com/index.cgi?user=<script>alert("XSS")</script>

Real attacks have to be a bit more savvy:
     https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet

Why is cross-site scripting so prevalent?
  -Dynamic web sites incorporate user content in HTML pages (e.g.,
   comments sections).
  -Web sites host uploaded user documents.
     *HTML documents can contain arbitrary Javascript code!
     *Non-HTML documents may be content-sniffed as HTML by browsers.
  -Insecure Javascript programs may directly execute code that comes
   from external parties (e.g., eval(), setTimeout(), etc.).

Defenses

Defense 1: Detect with heuristics
  --Chrome and IE have a built-in feature which uses heuristics to
    detect potential cross-site scripting attacks.
      *Ex: Is a script which is about to execute included in the
       request that fetched the enclosing page?

           http://foo.com?q="><script>...</script>

       --This would result in malformed HTML in the request (and
         reply), but the browser would ultimately execute the result.
      *And: filters can't catch persistent XSS attacks in which the
       server saves attacker-provided data, which is then permanently
       distributed to clients.
        --Classic example: A "comments" section which allows users to
          post HTML messages. The filter won't catch this, because the
          injected script isn't present in the request that fetched
          the page.
        --Another example: Suppose that a dating site allows users to
          include HTML in their profiles (as in Zoobar). An attacker
          can add HTML that will run in a *different* user's browser
          when that user looks at the attacker's profile! The attacker
          could steal that user's cookie.

  [summary:
     "you look at attacker's comment, and you're running its code"
     "you look at attacker's profile, and you're running its code"
     the server should be sanitizing...
  ]

Defense 2: "httponly" cookies
  *A server can tell a browser that client-side JavaScript should not
   be able to access a cookie. [The server does this by adding the
   "HttpOnly" token to a "Set-Cookie" HTTP response header.]
  *This is only a partial defense, since the attacker can still issue
   requests that contain a user's cookies (CSRF).

Defense 3: Privilege separation: use a separate domain for untrusted
content.
  *For example, Google stores untrusted content in googleusercontent.com
   (e.g., cached copies of pages, Gmail attachments).
  *Even if XSS is possible in the untrusted content, the attacker code
   will run in a different origin.
  *There may still be problems if the content in googleusercontent.com
   points to URLs in google.com.

Defense 4: Content sanitization: take untrusted content and encode it
in a way that constrains how it can be interpreted.
  *Ex: Django templates: define an output page as a bunch of HTML that
   has some "holes" where external content can be inserted.
   [https://docs.djangoproject.com/en/dev/topics/templates/#automatic-html-escaping]

   A template might contain code like this . . .

        Hello {{ name }}

   . . . where "name" is a variable that is resolved when the page is
   processed by the Django template engine. That engine will take the
   value of "name" (e.g., from a user-supplied HTTP query string), and
   then automatically escape dangerous characters. For example,

        angle brackets < and >   -->   &lt; and &gt;
        double quotes "          -->   &quot;

   This prevents untrusted content from injecting HTML into the
   rendered page.
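   As a rough sketch of what that auto-escaping step does (this uses
   Python's standard-library html.escape rather than Django's actual
   implementation, and render_hello is a made-up helper):

        import html

        def render_hello(name):
            # Fill the {{ name }} hole only after escaping &, <, >,
            # " and ', which is roughly what Django's auto-escaping does.
            return "Hello {}".format(html.escape(name, quote=True))

        print(render_hello("me"))
        # Hello me
        print(render_hello('<script>alert("XSS")</script>'))
        # Hello &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;

   The browser renders the escaped output as literal text instead of
   executing it.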
   Templates cannot defend against all attacks! For example, suppose a
   template places external content inside an unquoted attribute:

        <div class={{ var }}>...</div>

   Now, what if the attacker sets "var" equal to:

        class1 onmouseover=javascript:func()

   This may succeed as an XSS attack, depending on how the browser
   parses the malformed HTML.
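   To see why escaping alone doesn't save this template, note that the
   payload contains none of the characters the escaper rewrites. A
   quick sketch, with plain string substitution standing in for the
   template engine:

        import html

        # The template leaves the class attribute unquoted (that's the bug).
        template = "<div class={}>...</div>"

        var = "class1 onmouseover=javascript:func()"

        # Escaping changes nothing: the payload has no &, <, >, " or '.
        print(template.format(html.escape(var, quote=True)))
        # <div class=class1 onmouseover=javascript:func()>...</div>
        # The browser now sees an extra onmouseover attribute on the div.

   Quoting the attribute in the template (class="{{ var }}"), combined
   with the escaping above, would close this particular hole.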
  *So, content sanitization kind of works, but it's extremely difficult
   to parse HTML in an unambiguous way.
  *Possibly better approach: completely disallow externally-provided
   HTML, and force external content to be expressed in a smaller
   language (e.g., Markdown:
   http://daringfireball.net/projects/markdown/syntax). Validated
   Markdown can then be translated into HTML.

Defense 5: Content Security Policy (CSP)

Allows a web server to tell the browser which kinds of resources can be
loaded, and the allowable origins for those resources.
  --Server specifies one or more headers of the type
    "Content-Security-Policy".
  --Example:

        Content-Security-Policy: default-src 'self' *.mydomain.com
        //Only allow content from the page's domain and its subdomains.

    You can specify separate policies for where images can come from,
    where scripts can come from, frames, plugins, etc.
  --CSP also prevents inline JavaScript, and JavaScript interfaces like
    eval() which allow for dynamic JavaScript generation.
  --Some browsers allow servers to disable content-type sniffing
    (X-Content-Type-Options: nosniff).

4. SQL injection attacks

--Suppose that the application needs to issue a SQL query based on user
  input:

     query = "SELECT * FROM table WHERE userid=" + userid

--Problem: an adversary can supply a userid that changes the SQL query
  structure, e.g.,

     "0; DELETE FROM table;"

--What if we add quoting around userid?

     query = "SELECT * FROM table WHERE userid='" + userid + "'"

  The vulnerability still exists! The attacker can just add another
  quote as the first byte of userid.
--Real solution: unambiguously encode data.
     Ex: replace ' with \', etc.
     *SQL libraries provide escaping functions; parameterized queries
      (see the sketch after section 5) are another way to keep user
      data out of the query structure.
--Django defines a query abstraction layer which sits atop SQL and
  allows applications to avoid writing raw SQL (although they can do it
  if they really want to).
--(Possibly fake) German license plate which says ";DROP TABLE" to
  avoid speeding cameras which use OCR+SQL to extract the license plate
  number.

5. Other injection

If untrusted entities can supply filenames, that is bad.
--Example: Suppose that a web server reads files based on user-supplied
  parameters.

     open("/www/images/" + filename)

--Problem: filename might look like this:

     ../../../../../etc/passwd

  (chroot jails would help here)
--As with SQL injection, the server must sanitize the user input: the
  server must reject file names with slashes, or encode the slashes in
  some way.
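A minimal sketch of both fixes in Python (the sqlite3 module's "?"
placeholders stand in for whatever parameter-binding API your SQL
library provides; the "users" table and the image directory are made
up):

     import os
     import sqlite3

     def lookup_user(conn, userid):
         # Parameterized query: the driver sends userid as data, so a
         # value like "0; DELETE FROM users;" cannot change the
         # query's structure.
         return conn.execute(
             "SELECT * FROM users WHERE userid = ?", (userid,)
         ).fetchall()

     IMAGE_ROOT = "/www/images"

     def open_image(filename):
         # Resolve the full path and refuse anything that escapes
         # IMAGE_ROOT, e.g., "../../../../../etc/passwd".
         path = os.path.realpath(os.path.join(IMAGE_ROOT, filename))
         if not path.startswith(IMAGE_ROOT + os.sep):
             raise ValueError("rejected filename: %r" % filename)
         return open(path, "rb")

Django's query abstraction layer mentioned above issues parameterized
queries like this on the application's behalf.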
[Aside: the Django web framework
  -Moderately popular web framework, used by some large sites like
   Instagram, Mozilla, and Pinterest.
     *A "web framework" is a software system that provides
      infrastructure for tasks like database accesses, session
      management, and the creation of templated content that can be
      used throughout a site.
     *Other frameworks and platforms are more popular: PHP, Ruby on
      Rails.
     *In the enterprise world, Java servlets and ASP are also widely
      used.
  -Django developers have put some amount of thought into security.
     *So, Django is a good case study to see how people implement web
      security in practice.
  -Django is probably better in terms of security than some of the
   alternatives like PHP or Ruby on Rails, but there are still devils
   in the details.]

6. Cookie issues

Zoobar, Django, and many web frameworks put a random session ID in the
cookie.
  --The session ID refers to an entry in some session table on the web
    server. The entry stores a bunch of per-user information.
  --Session cookies are sensitive: an adversary can use them to
    impersonate a user!
     *As we discussed in the last class, the same-origin policy helps
      to protect cookies . . .
     *. . . but you shouldn't share a domain with sites that you don't
      trust! Otherwise, those sites can launch a session fixation
      attack:
        1) Attacker sets the session ID in the shared cookie.
        2) User navigates to the victim site; the attacker-chosen
           session ID is sent to the server and used to identify the
           user's session entry.
        3) Later, the attacker can navigate to the victim site using
           the attacker-chosen session ID, and access the user's
           state! This works because the attacker knows the ID: the
           attacker chose it.
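For concreteness, a minimal sketch of the session-table idea: the
server, not the client, picks an unguessable ID, and IDs the server
never issued don't resolve to any state. (The names sessions,
new_session, and lookup_session are made up; real frameworks also
regenerate the ID at login.)

     import secrets

     # Server-side session table: session ID -> per-user state.
     sessions = {}

     def new_session(username):
         # 128 bits from the OS CSPRNG; the client never chooses it.
         sid = secrets.token_urlsafe(16)
         sessions[sid] = {"user": username}
         return sid            # sent back in a Set-Cookie header

     def lookup_session(sid):
         # An ID the server never issued (e.g., one planted by an
         # attacker from a sibling domain) maps to nothing.
         return sessions.get(sid)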
-Hmm, but what if we don't want to have server-side state for every
 logged-in user? An alternative to cookies for session management:
   -Use HTML5 local storage, and implement your own authentication in
    Javascript.
     *Some web frameworks like Meteor do this.
     *Benefit: The credential is not automatically sent over the
      network to the server with every request, the way a cookie is.
     *Benefit: Your authentication scheme is not subject to the
      complex same-origin policy for cookies (e.g., DOM storage is
      bound to a single origin, unlike a cookie, which can be bound to
      multiple subdomains).

7. Side channels

-A side channel is a mechanism that allows two applications to
 exchange information, even though the security model in theory
 prohibits those applications from communicating.
   *The channel is called "side" because it doesn't use official
    mechanisms for cross-app communication.
   *(Closely related idea: a _covert_ channel, wherein an observer
    cannot even tell that two entities are communicating.)

-Example #1: CSS-based sniffing attacks
   *Attacker has a website; the user is convinced to visit it.
   *Attacker goal: Figure out the other websites that the user has
    visited (e.g., to determine the user's political views, medical
    history, etc.).
   *Exploit vector: A web browser uses different colors to display
    visited versus unvisited links! So, the attacker page can generate
    a big list of candidate URLs, and then inspect the colors to see
    if the user has visited any of them.
      -Can check thousands of URLs a second!
      -Can go breadth-first, find hits for top-level domains, then go
       depth-first for each hit.
   *Fix: Force getComputedStyle() and related JavaScript interfaces to
    always say that a link is unvisited.
    [https://blog.mozilla.org/security/2010/03/31/plugging-the-css-history-leak/]

-Example #2: Cache-based attacks
   *Attacker setup and goal are the same as before.
   *Exploit vector: It's much faster for a browser to access data
    that's cached instead of fetching it over the network. So, the
    attacker page can generate a list of candidate images, try to load
    them, and see which ones load quickly!
      *This attack can reveal your location if the candidate images
       are geographically specific, e.g., Google Maps tiles.
       [http://w2spconf.com/2014/papers/geo_inference.pdf]
   *Fix: No good ones. A page could never cache objects, but this will
    hurt performance.

 But suppose that a site doesn't cache anything. Is it safe from
 history sniffing? No!

-Example #3: DNS-based attacks
   *Attacker setup and goal are the same as before.
   *Exploit vector: The attacker page generates references to objects
    in various domains. If the user has already accessed objects from
    a domain, that hostname will already reside in the DNS cache,
    making subsequent object accesses faster!
    [http://sip.cs.princeton.edu/pub/webtiming.pdf]
   *Fix: No good ones. Could use raw IP addresses for links, but this
    breaks a lot of things (e.g., DNS-based load balancing).

 However, suppose that a site doesn't cache anything and uses raw IP
 addresses for hostnames. Is it safe from history sniffing? No!

-Example #4: Rendering attacks
   *Attacker setup and goal are the same as before.
   *Exploit vector: The attacker page loads a candidate URL in an
    iframe. Before the browser has fetched the content, the attacker
    page can access . . .

         window.frames[1].location.href

    . . . and read the value that the attacker set. However, once the
    browser has fetched the content, accessing that reference will
    return "undefined" due to the same-origin policy. So, the attacker
    can poll the value and see how long it takes to turn into
    "undefined". If it takes a long time, the page must not have been
    cached!
    [http://lcamtuf.coredump.cx/cachetime/firefox.html]
   *Fix: Stop using computers.

8. postMessage

-Two frames from different origins can use postMessage() to
 asynchronously exchange immutable strings.
   *The sender gets a reference to a window object, and does this:

        window.postMessage(msg, origin);

   *The receiver defines an event handler for the special "message"
    event. The event handler receives the msg and the origin.

-Q: Why does the receiver have to check the origin of a received
    message?
 A: To perform access control on senders! If the receiver implements
    sensitive functionality, it shouldn't respond to requests from
    arbitrary origins.
   *Common mistake: The receiver uses regular expressions to check the
    sender's origin.
   *Even if the origin matches /.foo.com/, that doesn't mean it's from
    foo.com! It could be "xfoo.com", or "www.foo.com.bar.com".
    [More details:
     https://www.cs.utexas.edu/~shmat/shmat_ndss13postman.pdf]

-Q: Why does the sender have to specify the intended origin of the
    receiver?
 A: postMessage() is applied to a window, not an origin.
   *Remember that an attacker may be able to navigate a window to a
    different location.
   *If the attacker navigates the window, another origin may receive
    the message!
   *If the sender explicitly specifies a target origin, the browser
    checks the recipient's origin before delivering the msg.
    [More details:
     https://www.usenix.org/legacy/event/sec08/tech/full_papers/barth/barth.pdf]

9. Other attacks

The web stack has some protocol ambiguities that can lead to security
holes.

-HTTP header injection from XMLHttpRequests
   *Javascript can ask the browser to add extra headers in the
    request. So, what happens if we do this?

        var x = new XMLHttpRequest();
        x.open("GET", "http://foo.com");
        x.setRequestHeader("Content-Length", "7");
                        //Overrides the browser-computed field!
        x.send("Gotcha!\r\n" +
               "GET /something.html HTTP/1.1\r\n" +
               "Host: bar.com");

    The server at foo.com may interpret this as two separate requests!
    Later, when the browser (or an interposed cache) receives the
    response to the second request, it may overwrite a cache entry
    belonging to bar.com with content from foo.com!
   *Solution: Prevent XMLHttpRequests from setting sensitive fields
    like "Host:" or "Content-Length".
   *Take-home point: Unambiguous encoding is critical! Build reliable
    escaping/encoding!

-URL parsing ("The Tangled Web" page 154)
   *Flash had a slightly different URL parser than the browser.
   *Suppose the URL was http://example.com:80@foo.com/
      -Flash would compute the origin as "example.com".
      -The browser would compute the origin as "foo.com".
   *Bad idea: complex parsing rules just to determine the principal.
   *Bad idea: re-implementing complex parsing code.
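   A quick way to see how two parsers can disagree about the same URL
   (Python's urllib.parse is used purely for illustration, next to a
   deliberately naive split):

        from urllib.parse import urlsplit

        url = "http://example.com:80@foo.com/"

        parts = urlsplit(url)
        print(parts.hostname, parts.username, parts.password)
        # foo.com example.com 80
        # Everything before the '@' is userinfo, so the real host is
        # foo.com.

        # A naive parser that takes whatever follows "http://" up to
        # the first ':' concludes that the host is example.com instead:
        naive_host = url.split("://", 1)[1].split(":", 1)[0]
        print(naive_host)
        # example.com

   Two components that compute the principal with different rules (as
   Flash and the browser did) end up trusting different hosts for the
   same URL.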
-Here's a hilarious/terrifying way to launch attacks using Java applets
 that are stored in the .jar format.
   *In 2007, Lifehacker.com posted an article which described how you
    could hide .zip files inside of .gif files.
      *This leverages the fact that image renderers process a file
       top-down, whereas decompressors for .zip files typically start
       from the end and go upwards.
   *Attackers realized that .jar files are based on the .zip format!
   *THUS THE GIFAR WAS BORN: half-gif, half-jar, all-evil.
      -Really simple to make a GIFAR: just concatenate the two files,
       e.g., with "cat" on Linux or "copy /b" on Windows.
      -Suppose that target.com allows external parties to upload only
       image objects. The attacker can upload a GIFAR, and the GIFAR
       will pass target.com's image validation tests!
      -Then, if the attacker can launch an XSS attack, the attacker can
       inject HTML which refers to the ".gif" as an applet.
      -The browser will load that applet and give it the authority of
       target.com!

10. Other aspects

There are many other aspects to building a secure web application.
  -Ex: Ensure proper access control for server-side operations.
     *Django provides Python decorators to check access control rules.
  -Ex: Maintain logs for auditing, and prevent an attacker from
   modifying the log.

---------------------------------------------------------------------------

Acknowledgments: MIT's 6.858 staff, especially James Mickens, for both
content and humor.