Title: Isogeometric BDDC Preconditioners with Deluxe Scaling
Author(s): Beirao Da Veiga, Lourenco; Pavarino, Luca; Scacchi, Simone; Widlund, Olof; Zampni, Stefano
A BDDC (Balancing Domain Decomposition by Constraints) preconditioner with a novel scaling, introduced by Dohrmann for problems with more than one variable coeffcient and here denoted as deluxe scaling, is extended to Isogeometric Analysis of scalar elliptic problems. This new scaling turns out to be more powerful than the standard rho- and stiffness scalings considered in a previous isogeometric BDDC study. Our h-analysis shows that the condition number of the resulting deluxe BDDC preconditioner is scalable with a quasi-optimal polylogarithmic bound which is also independent of coeffcient discontinuities across subdomain interfaces. Extensive numerical experiments support the theory and show that the deluxe scaling yields a remarkable improvement over the older scalings, in particular, for large isogeometric polynomial degree and high regularity.
Title: An Efficient Active Learning Framework for New Relation Types
Candidate: Fu, Lisheng
Advisor(s): Grishman, Ralph; Davis, Ernest
Relation extraction is a fundamental task in information extraction. Different methods have been studied for building a relation extraction system. Supervised training of models for this task has yielded good performance, but at substantial cost for the annotation of large training corpora (About 40K same-sentence entity pairs). Semi-supervised methods can only require a seed set, but the performance is very limited when the seed set is very small, which is not very satisfactory for real relation extraction applications. The trade-off of annotation and performance is also hard to decide in practice. Active learning strategies allow users to gradually improve the model and to achieve comparable performance to supervised methods with limited annotation. Recent study shows active learning on this task needs much fewer labels for each type to build a useful relation extraction application. We feel active learning is a good direction to do relation extraction and presents a more efficient active learning framework. This framework starts from a better balance between positive and negative samples, and boosts by interleaving self-training and co-testing. We also studied the reduction of annotation cost by enforcing argument type constraints. Experiments show a substantial speed-up by comparison to previous state-of-the-art pure co-testing active learning framework. We obtain reasonable performance with only a hundred labels for individual ACE 2004 relation types. We also developed a GUI tool for real human-in-the-loop active learning trials. The goal of building relation extraction systems in a very short time seems to be promising.
Title: Incentive-Centered Design of Money-Free Mechanisms
Candidate: Gkatzelis, Vasilis
Advisor(s): Cole, Richard
This thesis serves as a step toward a better understanding of how to design fair and efficient multiagent resource allocation systems by bringing the incentives of the participating agents to the center of the design process. As the quality of these systems critically depends on the ways in which the participants interact with each other and with the system, an ill-designed set of incentives can lead to severe inefficiencies. The special focus of this work is on the problems that arise when the use of monetary exchanges between the system and the participants is prohibited. This is a common restriction that substantially complicates the designer's task; we nevertheless provide a sequence of positive results in the form of mechanisms that maximize efficiency or fairness despite the possibly self-interested behavior of the participating agents.
The first part of this work is a contribution to the literature on approximate mechanism design without money. Given a set of divisible resources, our goal is to design a mechanism that allocates them among the agents. The main complication here is due to the fact that the agents' preferences over different allocations of these resources may not be known to the system. Therefore, the mechanism needs to be designed in such a way that it is in the best interest of every agent to report the truth about her preferences; since monetary rewards and penalties cannot be used in order to elicit the truth, a much more delicate regulation of the resource allocation is necessary. Our contribution mostly revolves around a new truthful mechanism that we propose, which we call the /Partial Allocation/ mechanism. We first show how to use the two-agent version of this mechanism to create a system with the best currently known worst-case efficiency guarantees for problem instances involving two agents. We then consider fairness measures and prove that the general version of this elegant mechanism yields surprisingly good approximation guarantees for the classic problem of fair division. More specifically, we use the well established solution of /Proportional Fairness/ as a benchmark and we show that for an arbitrary number of agents and resources, and for a very large class of agent preferences, our mechanism provides /every agent/ with a value close to her proportionally fair value. We complement these results by also studying the limits of truthful money-free mechanisms, and by providing other mechanisms for special classes of problem instances. Finally, we uncover interesting connections between our mechanism and the Vickrey-Clarke-Groves mechanism from the literature on mechanism design with money.
The second part of this work concerns the design of money-free resource allocation mechanisms for /decentralized/ multiagent systems. As the world has become increasingly interconnected, such systems are using more and more resources that are geographically dispersed; in order to provide scalability in these systems, the mechanisms need to be decentralized. That is, the allocation decisions for any given resource should not assume global information regarding the system's resources or participants. We approach this restriction by using /coordination mechanisms/: a collection of simple resource allocation policies, each of which controls only one of the resources and uses only local information regarding the state of the system. The system's participants, facing these policies, have the option of choosing which resources they will access. We study a variety of coordination mechanisms and we prove that the social welfare of any equilibrium of the games that these mechanisms induce is a good approximation of the optimal welfare. Once again, we complement our positive results by studying the limits of coordination mechanisms. We also provide a detailed explanation of the seemingly counter-intuitive incentives that some of these mechanisms yield. Finally, we use this understanding in order to design a combinatorial constant-factor approximation algorithm for maximizing the social welfare, thus providing evidence that a game-theoretic mindset can lead to novel optimization algorithms.
Title: Tight Lower Bound on the Probability of a Binomial Exceeding its Expectation
Author(s): Greenberg, Spencer; Mohri, Mehryar
We give the proof of a tight lower bound on the probability that a binomial random variable exceeds its expected value. The inequality plays an important role in a variety of contexts, including the analysis of relative deviation bounds in learning theory and generalization bounds for unbounded loss functions.
Title: Locality Optimization for Data Parallel Programs
Candidate: Hielscher, Eric
Advisor(s): Shasha, Dennis
Productivity languages such as NumPy and Matlab make it much easier to implement data-intensive numerical algorithms than it is to implement them in efficiency languages such as C++. This is important as many programmers (1) aren't expert programmers; or (2) don't have time to tune their software for performance, as their main job focus is not programming per se. The tradeoff is typically one of execution time versus programming time, as unless there are specialized library functions or precompiled primitives for your particular task a productivity language is likely to be orders of magnitude slower than an efficiency language.
In this thesis, we present Parakeet, an array-oriented language embedded within Python, a widely-used productivity language. The Parakeet just-in-time compiler dynamically translates whole user functions to high performance multi-threaded native code. This thesis focuses in particular on our use of data parallel operators as a basis for locality enhancing program optimizations. e transform Parakeet programs written with the classic data parallel operators (Map, Reduce, and Scan; in Parakeet these are called adverbs) to process small local pieces (called tiles) of data at a time. To express this locality we introduce three new adverbs: TiledMap, TiledReduce, and TiledScan. These tiled adverbs are not exposed to the programmer but rather are automatically generated by a tiling transformation.
We use this tiling algorithm to bring two classic locality optimizations to a data parallel setting: cache tiling, and register tiling. We set register tile sizes statically at compile time, but use an online autotuning search to find good cache tile sizes at runtime. We evaluate Parakeet and these optimizations on various benchmark programs, and exhibit excellent performance even compared to typical C implementations.
Title: Diet Planner: Finding a Nutritionally Sound Diet While Following (Most) of a Dieter’s Desires
Author(s): Jermsurawong, Mick Jermsak; Shasha, Dennis
We describe the design and implementation of a diet website currently housed at http://nutrientdata.herokuapp.com/. The site allows users or dieticians to enter nutritional constraints (e.g. at least this much calcium but not more than that amount of calcium) and objectives (e.g. minimize calories), a list of foods/brands the person likes. The site then determines, if possible, the quantities of at least some of those desired foods that would meet the nutritional constraints. If not possible, then the site guides the user in the choice of other foods that may meet the nutritional constraints. The net result is a tailored diet measured in servings.
Title: A General Method for Energy-Error Tradeoffs in Approximate Adders
Author(s): Kedem, Zvi; Muntimadugu, Kirthi Krishna
Approximate adders are adders with conventional architectures run in an overclocked mode. With this mode, erroneous sums may be produced at the savings of energy required to execute the computation. The results presented in this report lead to a procedure for allocating the available energy budgets among the adders modules so as to minimize the expected error. For simplicity, only the uniform distribution of the inputs is considered.
Title: Reversibility of Turing Machine Computations
Author(s): Kedem, Zvi M.
Since Bennett's 1973 seminal paper, there has been a growing interest in general-purpose, reversible computations and they have been studied using both mathematical and physical models. Following Bennett, given a terminating computation of a deterministic Turing Machine, one may be interested in constructing a new Turing Machine, whose computation consists of two stages. The first stage emulates the original Turing Machine computation on its working tape, while also producing the trace of the computation on a new history tape. The second stage reverses the first stage using the trace information. Ideally, one would want the second stage to traverse whole-machine states in the reverse order from that traversed in the first stage. But this is impossible other than for trivial computations. Bennett constructs the second phase by using additional controller states, beyond those used during the first stage. In this report, a construction of the new machine is presented in which the second stage uses the same and only those controller states that the first stage used and they are traversed in the reverse order. The sole element that is not fully reversed is the position of the head on the history tape, where it is out of phase by one square compared to the first stage.
Title: Piecewise Smooth Surfaces with Features
Candidate: Kovacs, Denis
Advisor(s): Zorin, Denis
The creation, manipulation and display of piecewise smooth surfaces has been a fundamental topic in computer graphics since its inception. The applications range from highest-quality surfaces for manufacturing in CAD, to believable animations of virtual creatures in Special Effects, to virtual worlds rendered in real-time in computer games.
Our focus is on improving the a) mathematical representation and b) automatic construction of such surfaces from finely sampled meshes in the presence of features. Features can be areas of higher geometric detail in an otherwise smooth area of the mesh, or sharp creases that contrast the overall smooth appearance of an object.
In the first part, we build on techniques that define piecewise smooth surfaces, to improve their quality in the presence of features. We present a crease technique suitable for real-time applications that helps increases the perceived visual detail of objects that are required to be very compactly represented and efficiently evaluated.
We then introduce a new subdivision scheme that allows the use of T-junctions for better local refinement. It thus reduces the need for extraordinary vertices, which can cause surface artifacts especially on animated objects.
In the second part, we consider the problem of how to build the control meshes of piecewise smooth surfaces, in a way that the resulting surface closely approximates an existing data set (such as a 3D range scan), particularly in the presence of features. To this end, we introduce a simple modification that can be applied to a wide range of parameterization techniques to obtain an anisotropic parameterization. We show that a resulting quadrangulation can indeed better approximate the original surface. Finally, we present a quadrangulation scheme that turns a data set into a quad mesh with T-junctions, which we then use as a T-Spline control mesh to obtain a smooth surface.
Title: Low-level Image Priors and Laplacian Preconditioners for Applications in Computer Graphics and Computational Photography
Candidate: Krishnan, Dilip
Advisor(s): Fergus, Rob
In the first part of this thesis, we develop novel image priors and efficient algorithms for image denoising and deconvolution applications. Our priors and algorithms enable fast, high-quality restoration of images corrupted by noise or blur. In the second part, we develop effective preconditioners for Laplacian matrices. Such matrices arise in a number of computer graphics and computational photography problems such as image colorization, tone mapping and geodesic distance computation on 3D meshes.
The first prior we develop is a spectral prior which models correlations between different spectral bands. We introduce a prototype camera and flash system, used in conjunction with the spectral prior, to enable taking photographs at very low light levels. Our second prior is a sparsity-based measure for blind image deconvolution. This prior gives lower costs to sharp images than blurred ones, enabling the use simple and efficient Maximum a-Posteriori algorithms.
We develop a new algorithm for the non-blind deconvolution problem. This enables extremely fast deconvolution of images blurred by a known blur kernel. Our algorithm uses Fast Fourier Transforms and Lookup Tables to achieve real-time deconvolution performance with non convex gradient-based priors. Finally, for certain image restoration problems with no clear formation model, we demonstrate how learning a direct mapping between original/corrupted patch pairs enables effective restoration.
We develop multi-level preconditioners to solve discrete Poisson equations. Existing multilevel preconditioners have two major drawbacks: excessive bandwidth growth at coarse levels; and the inability to adapt to problems with highly varying coefficients. Our approach tackles both these problems by introducing sparsification and compensation steps at each level. We interleave the selection of fine and coarse-level variables with the removal of weak connections between potential fine-level variables (sparsification) and compensate for these changes by strengthening nearby connections. By applying these operations before each elimination step and repeating the procedure recursively on the resulting smaller systems, we obtain highly efficient schemes. The construction is linear in time and memory. Numerical experiments demonstrate that our new schemes outperform state of the art methods, both in terms of operation count and wall-clock time, over a range of 2D and 3D problems.
Title: A Balancing Domain Decomposition By Constraints Deluxe Method For Numerically Thin Reissner-Mindlin Plates Approximated With Falk-tu Finite Elements
Author(s): Lee, Jong Ho
The Reissner-Mindlin plate models thin plates. The condition numbers of finite element approximations of these plate models increase very rapidly as the thickness of the plate goes to 0. A Balancing Domain Decomposition by Constraints (BDDC) De Luxe method is developed for these plate problems discretized by Falk-Tu finite elements. In this new algorithm, subdomain Schur complements restricted to individual edges are used to define the average operator for the BDDC De Luxe method. It is established that the condition number of this preconditioned iterative method is bounded by C(1 + log (H/h))^2 if t, the thickness of the plate, is on the order of the element size h or smaller; H is the maximum diameter of the subdomains. The constant C is independent of the thickness t as well as H and h. Numerical results, which verify the theory, and a comparison with a traditional BDDC method are also provided.
Title: Cryptographic Security of Macaroon Authorization Credentials
Author(s): Lopez-Alt, Adriana
Macaroons, recently introduced by Birgisson et al., are authorization credentials that provide support for controlled sharing in decentralized systems. Macaroons are similar to cookies in that they are bearer credentials, but unlike cookies, macaroons include caveats that attenuate and contextually confine when, where, by who, and for what purpose authorization should be granted.
In this work, we formally study the cryptographic security of macaroons. We define macaroon schemes, introduce corresponding security definitions and provide several constructions. In particular, the MAC-based and certificate-based constructions outlined by Birgisson et al., can be seen as instantiations of our definitions. We also present a new construction that is privately-verifiable (similar to the MAC-based construction) but where the verifying party does not learn the intermediate keys of the macaroon, a problem already observed by Birgisson et al.
We also formalize the notion of a protocol for "discharging" third-party caveats and present a security definition for such a protocol. The encryption-based protocol outlined by Birgisson et al. can be seen as an instantiation of our definition, and we also present a new signature-based construction.
Finally, we formally prove the security of all constructions in the given security models.
Title: Relation Extraction with Weak Supervision and Distributional Semantics
Candidate: Min, Bonan
Advisor(s): Grishman, Ralph
Relation Extraction aims at detecting and categorizing semantic relations between pairs of entities in unstructured text. It benefits an enormous number of applications such as Web search and Question Answering. Traditional approaches for relation extraction either rely on learning from a large number of accurate human-labeled examples or pattern matching with hand-crafted rules. These resources are very laborious to obtain and can only be applied to a narrow set of target types of interest.
This talk focuses on learning relations with little or no human supervision. First, we examine the approach that treats relation extraction as a supervised learning problem. We develop an algorithm that is able to train a model with approximately 1/3 of the human-annotation cost and that matches the performance of models trained with high-quality annotation. Second, we investigate distant supervision, a weakly supervised algorithm that automatically generates its own labeled training data. We develop a latent Bayesian framework for this purpose. By using a model which provides a better approximation of the weak source of supervision, it outperforms the state-of-the-art methods. Finally, we investigate the possibility of building all relational tables beforehand with an unsupervised relation extraction algorithm. We develop an effective yet efficient algorithm that combines the power of various semantic resources that are automatically mined from a corpus based on distributional semantics. The algorithm is able to extract a very large set of relations from the web at high precision.
Title: Foundations of a Formal Theory of Time Travel
Author(s): Morgenstern, Leora
Although the phenomenon of time travel is common in popular culture, there has been little work in AI on developing a formal theory of time travel. This paper develops such a theory. The paper introduces a branching-time ontology that maintains the classical restriction of forward movement through a temporal tree structure, but permits the representation of paths in which one can perform inferences about time-travel scenarios. Central to the ontology is the notion of an agent embodiment whose beliefs are equivalent to those of an agent who has time-traveled from the future. We show how to formalize an example scenario and demonstrate what it means for such a scenario to be motivated with respect to an agent embodiment.
Title: A BDDC Algorithm for Raviart-Thomas Vector Fields
Author(s): Oh, Duk-Soon; Widlund, Olof B.; Dohrmann, Clark R.
A BDDC preconditioner is defined by a coarse component, expressed in terms of primal constraints and a weighted average across the interface between the subdomains, and local components given in terms of Schur complements of local subdomain problems. A BDDC method for vector field problems discretized with Raviart-Thomas finite elements is introduced. Our method is based on a new type of weighted average developed to deal with more than one variable coefficient. A bound on the condition number of the preconditioned linear system is also provided which is independent of the values and jumps of the coefficients across the interface and has a polylogarithmic condition number bound in terms of the number of degrees of freedom of the individual subdomains. Numerical experiments for two and three dimensional problems are also presented, which support the theory and show the effectiveness of our algorithm even for certain problems not covered by our theory.
Title: Usable Security Mechanisms in the Developing World
Candidate: Paik, Michael
Advisor(s): Subramanian; Lakshminarayanan
Security and privacy are increasingly important in our interconnected world. Cybercrimes, including identity theft, phishing, and other attacks, are on the rise, and computer-assisted crimes such as theft and stalking are becoming commonplace.
Contemporary with this trend is the uptake of technology in the developing world, proceeding at a pace often outstripping that of the developed world. Penetration of mobile phones and services such as healthcare delivery, mobile money, and social networking is higher than that of even amenities like electricity. Connectivity is empowering disenfranchised people, providing information and services to the heretofore disconnected poor.
There are efforts to use technology to enhance physical security and well-being in the developing world, including citizen journalism, education, improving drug security, attendance tracking, etc.
However, there are significant challenges to security both in the digital and the physical domains that are particular to these contexts. Infrastructure is constrained, literacy, numeracy, and familiarity with basic technologies cannot be assumed, and environments are harsh on hardware. These circumstances often prevent security best practices from being transplanted directly to these regions â in many ways, the adoption of technology has overtaken the users ability to use it safely, and their trust in it is oftentimes reater than it should be.
This dissertation describes several systems and methodologies designed to operate in the developing world, using technologies and metaphors that are familiar to users and that are robust against the operating environments.
It begins with an overview of the state of affairs, and several threat models. It continues with a description of Signet, a method to use SIM cards as trusted computing hardware to provide secure signed receipts. Next, Epothecary describes a low-infrastructure system for tracking pharmaceuticals that also significantly and asymmetrically increases costs for counterfeiters. The balance consists of a description of a low-cost Biometric Terminal currently in use by NGOs in India performing DOTS-based tuberculosis treatment, Blacknoise, an investigation into the use of low-cost cameraphones with noisy imaging sensors for image-based steganography, and finally Innoculous, a low-cost, crowdsourcing system for combating the spread of computer viruses, particularly among non-networked computers, while also collecting valuable "epidemiological" data.
Title: Automating Separation Logic Using SMT
Author(s): Piskac, Ruzica; Wies, Thomas; Zufferey, Damien
Separation logic (SL) has gained widespread popularity because of its ability to succinctly express complex invariants of a program's heap configurations. Several specialized provers have been developed for decidable SL fragments. However, these provers cannot be easily extended or combined with solvers for other theories that are important in program verification, e.g., linear arithmetic. In this paper, we present a reduction of decidable SL fragments to a decidable first-order theory that fits well into the satisﬁability modulo theories (SMT) framework. We show how to use this reduction to automate satisfiability, entailment, frame inference, and abduction problems for separafion logic using SMT solvers. Our approach provides a simple method of integrating separation logic into existing verification tools that provide SMT backends, and an elegant way of combining SL fragments with other decidable first-order theories. We implemented this approach in a verification tool and applied it to heap-manipulating programs whose verification involves reasoning in theory combinations.
Title: Inapproximability Reductions and Integrality Gaps
Candidate: Popat, Preyas
Advisor(s): Khot, Subhash
In this thesis we prove intractability results for several well studied problems in combinatorial optimization.
Closest Vector Problem with Preprocessing (CVPP): We show that the preprocessing version of the well known Closest Vector Problem is hard to approximate to an almost polynomial factor unless NP is in quasi polynomial time. The approximability of CVPP is closely related to the security of lattice based cryptosystems.
Pricing Loss Leaders: We show hardness of approximation results for the problem of maximizing profit from buyers with single minded valuations where each buyer is interested in bundles of at most k items, and the items are allowed to have negative prices ("Loss Leaders"). For k = 2, we show that assuming the Unique Games Conjecture, it is hard to approximate the profit to any constant factor. For k > 2, we show the same result assuming P != N P.
Integrality gaps: We show SemiDefinite Programming (SDP) integrality gaps for Unique Games and 2 to 1 Games. Inapproximability results for these problems imply inapproximability results for many fundamental optimization problems. For the first problem, we show "approximate" integrality gaps for super constant rounds of the powerful Lasserre hierarchy. For the second problem we show integrality gaps for the basic SDP relaxation with perfect completeness.
Title: Natural Interaction with a Virtual World
Candidate: Rosenberg, Ilya
Advisor(s): Perlin, Ken
A large portion of computer graphics and human/computer interaction is concerned with the creation, manipulation and use of two and three dimensional objects existing in a virtual world. By creating more natural physical interfaces and virtual worlds which behave in physically plausible ways, it is possible to empower nonexpert users to create, work and play in virtual environments. This thesis is concerned with the design, creation, and optimization of user-input devices which break down the barriers between the real and the virtual as well as the development of software algorithms which allow for the creation of physically realistic virtual worlds.
Title: Parsing and Analyzing POSIX API behavior on different platforms
Candidate: Savvides, Savvas
Advisor(s): Cappos, Justin; Li, Jinyang
Because of the increased variety of operating systems and architectures, developing applications that are supported by multiple platforms has become a cumbersome task. To mitigate this problem, many portable APIs were created which are responsible for hiding the details of the underlying layers of the system and they provide a universal interface on top of which applications can be built. Many times it is necessary to examine the interactions between an application and an API, either to check that the behavior of these interactions is the expected one or to confirm that this behavior is the same across platforms. In this thesis, the POSIX Omni Tracer tool is described. POSIX Omni Tracer provides an easy way to analyze the behavior of the POSIX API under various platforms. The behavior of the POSIX API can be captured in traces during an application�۪s execution using various tracing utilities. These traces are then parsed into a uniform representation. Since the captured behavior from different platforms share the same format, they can be easily analyzed or compared between them.
Title: Security Mechanisms for Physical Authentication
Candidate: Sharma, Ashlesh
Advisor(s): Subramanian; Lakshminarayanan
Counterfeiting of goods is a worldwide problem where the losses are in billions of dollars. It is estimated that 10% of all the world trade is counterfeit. To alleviate counterfeiting, a number of techniques are used from barcodes to holograms. But these technologies are easily reproducible and hence they are ineffective against counterfeiters.
In this thesis, we introduce PaperSpeckle, a novel way to fingerprint any piece of paper based on its unique microscopic properties. Next, we extend and generalize this work to introduce TextureSpeckle, a novel way to fingerprint and characterize the uniqueness of the surface of a material based on the interaction of light with the natural randomness present in the rough structure at the microscopic level of the surface. We show the existence and uniqueness of these fingerprints by analyzing a large number of surfaces (over 20,000 microscopic surfaces and 200 million pairwise comparisons) of different materials. We also define the entropy of the fingerprints and show how each surface can be uniquely identified in a robust manner even in case of damage.
From a theoretical perspective, we consider a discrete approximation model from light scattering theory which allows us to compute the speckle pattern for a given surface. Under this computational model, we show that given a speckle pattern, it is computationally hard to reconstruct the physical surface characteristics by simulating the multiple scattering of light. Using TextureSpeckle as a security primitive, we design secure protocols to enable a variety of scenarios such as: i) supply chain security, where applications range from drug tracking to inventory management, ii) mobile based secure transfer of money (mobile money), where any paper can be changed to an on-demand currency, and iii) fingerprint ecosystem, a cloud based system, where any physical object can be identified and authenticated on-demand.
We discuss the construction of the prototype device ranging from optical lens design to usability aspects and show how our technique can be applied in the real world to alleviate counterfeiting and forgery. In addition, we introduce Pattern Matching Puzzles (PMPs), a usable security mechanism that provides a 'human computable' one-time-MAC (message authentication code) for every transaction,making each transaction information-theoretically secure against various adversarial attacks. The puzzles are easy tosolve even for semi-literate users with simple pattern recognition skills.
Title: Online Machine Learning Algorithms For Currency Exchange Prediction
Author(s): Soulas, Eleftherios; Shasha, Dennis
Using Machine Learning Algorithms to analyze and predict security price patterns is an area of active interest. Most practical stock traders combine computational tools with their intuitions and knowledge to make decisions.
This document explains the algorithms and discusses various metrics of accuracy. It validates the models by applying the model to a real-life trading price stream. Though it is very hard to replace the expertise of an experienced trader, software like this may enhance the trader's performance.
Title: Augmenting Information Flow for Visual Privacy
Candidate: Spiro, Ian
Advisor(s): Bregler, Christopher
In the Information Age, visual media take on powerful new forms. Photographs once printed on paper and stored in physical albums now exist as digital files. With the rise of social media, photo data has moved to the cloud for rapid dissemination. The upside can be measured in terms of increased efficiency, greater reach, or reduced printing costs. But there is a downside that is harder to quantify: the risk of private photos or videos leaking inappropriately. Human imagery is potentially sensitive, revealing private details of a persons body, lifestyle, activities, and more. Images create visceral responses and have the potential to permanently damage a persons reputation.
We employed the theory of contextual integrity to explore privacy aspects of transmitting the human form. In response to privacy threats from new sociotechnical systems, we developed practical solutions that have the potential to restore balance. The main work is a set of client-side, technical interventions that can be used to alter information flows and provide features to support visual privacy. In the first approach, we use crowdsourcing to extract specific, useful human signal from video to decouple it from bundled identity information. The second approach is an attempt to achieve similar ends with pure software. Instead of using information workers, we developed a series of filters that alter video to hide identity information while still revealing motion signal. The final approach is an attempt to control the recipients of photos by encoding them in the visual channel. The software completely protects data from third-parties who lack proper credentials and maintains data integrity by exploiting the visual coherence of uploaded images, even in the face of JPEG compression. The software offers end-to-end encryption that is compatible with existing social media applications.
Title: Toward a computational solution to the inverse problem of how hypoxia arises in metabolically heterogeneous cancer cell populations
Candidate: Sundstrom, Andrew
Advisor(s): Mishra, Bud; Bar-Sagi, Dafna
As a tumor grows, it rapidly outstrips its blood supply, leaving portions of tumor that undergo hypoxia. Hypoxia is strongly correlated with poor prognosis as it renders tumors less responsive to chemotherapy and radiotherapy. During hypoxia, HIFs upregulate production of glycolysis enzymes and VEGF, thereby promoting metabolic heterogeneity and angiogenesis, and proving to be directly instrumental in tumor progression. Prolonged hypoxia leads to necrosis, which in turn activates inflammatory responses that produce cytokines that stimulate tumor growth. Hypoxic tumor cells interact with macrophages and fibroblasts, both involved with inflammatory processes tied to tumor progression. So it is of clinical and theoretical significance to understand: Under what conditions does hypoxia arise in a heterogeneous cell population? Our aim is to transform this biological origins problem into a computational inverse problem, and then attack it using approaches from computer science. First, we develop a minimal, stochastic, spatiotemporal simulation of large heterogeneous cell populations interacting in three dimensions. The simulation can manifest stable localized regions of hypoxia. Second, we employ and develop a variety of algorithms to analyze histological images of hypoxia in xenographed colorectal tumors, and extract features to construct a spatiotemporal logical characterization of hypoxia. We also consider characterizing hypoxia by a linear regression functional learning mechanism that yields a similarity score. Third, we employ a Bayesian statistical model checking algorithm that can determine, over some bounded number of simulation executions, whether hypoxia is likely to emerge under some fixed set of simulation parameters, and some fixed logical or functional description of hypoxia. Driving the model checking process is one of three adaptive Monte Carlo sampling algorithms we developed to explore the high dimensional space of simulation initial conditions and operational parameters. Taken together, these three system components formulate a novel approach to the inverse problem above, and constitute a design for a tool that can be placed into the hands of experimentalists, for testing hypotheses based upon known parameter values or ones the tool might discover. In principle, this design can be generalized to other biological phenomena involving large heterogeneous populations of interacting cells.
Title: PhyloBrowser: A visual tool to explore phylogenetic trees
Candidate: Tershakovec, Tamara
Advisor(s): Shasha, Dennis; Coruzzi, Gloria
Primary acknowledgements go to my research advisor, Dennis Shasha, for his patient and unwavering support. I would also like to thank my second reader, Gloria Coruzzi, for encouraging and showcasing my work. Kranthi Varala helped immensely in explaining the biological and statistical concepts involved in this project. The Virtual Plant team got me started in biological data visualization and was a joy to work with. And finally, thanks to Chris Poultney, who put me in touch with Professor Shasha and started me on my way back to NYU.
Title: Rethinking Information Privacy for the Web
Candidate: Tierney, Matthew
Advisor(s): Subramanian; Lakshminarayanan
In response to Supreme Court Justice Samuel Alitoâs opinion that society should accept a decline in personal privacy with modern technology, Hanni M. Fakhoury, staff attorney with the Electronic Frontier Foundation, argued âTechnology doesnât involve an âinevitableâ tradeoff [of increased convenience] with privacy. The only inevitability must be the demand that privacy be a value built into our technologyâ . Our position resonates with Mr. Fakhouryâs. In this thesis, we present three artifacts that address the balance between usability, efficiency, and privacy as we rethink information privacy for the web.
In the first part of this thesis, we present the design, implementation and evaluation of Cryptagram, a system designed to enhance online photo privacy. Cryptagram enables users to convert photos into encrypted images, which the users upload to Online Social Networks (OSNs). Users directly manage access control to those photos via shared keys that are independent of OSNs or other third parties. OSNs apply standard image transformations (JPEG compression) to all uploaded images so Cryptagram provides image encoding and encryption protocols that are tolerant to these transformations. Cryptagram guarantees that the recipient with the right credentials can completely retrieve the original image from the transformed version of the uploaded encrypted image while the OSN cannot infer the original image. Cryptagramâs browser extension integrates seamlessly with preexisting OSNs, including Facebook and Google+, and currently has over 400 active users.
In the second part of this thesis, we present the design and implementation of Lockbox, a system designed to provide end-to-end private file-sharing with the convenience of Google Drive or Dropbox. Lockbox uniquely combines two important design points: (1) a federated system for detecting and recovering from server equivocation and (2) a hybrid cryptosystem over delta encoded data to balance storage and bandwidth costs with efficiency for syncing end-user data. To facilitate appropriate use of public keys in the hybrid cryptosystem, we integrate a service that we call KeyNet, which is a web service designed to leverage existing authentication media (e.g., OAuth, verified email addresses) to improve the usability of public key cryptography.
In the third part of this thesis, we present the design of Compass, which realizes the philosophical privacy framework of contextual integrity (CI) as a full OSN design. CI), which we believe better captures users privacy expectations in OSNs. In Compass, three properties hold: (a) users are associated with roles in specific contexts; (b) every piece of information posted by a user is associated with a specific context; (c) norms defined on roles and attributes of posts in a context govern how information is shared across users within that context. Given the definition of a context and its corresponding norm set, we describe the design of a compiler that converts the human-readable norm definitions to generate appropriate information flow verification logic including: (a) a compact binary decision diagram for the norm set; and (b) access control code that evaluates how a new post to a context will flow. We have implemented a prototype that shows how the philosophical framework of contextual integrity can be realized in practice to achieve strong privacy guarantees with limited additional verification overhead.
Title: PAC-Learning for Energy-based Models
Candidate: Zhang, Xiang
Advisor(s): LeCun, Yann; Sontag, David
In this thesis we prove that probably approximately correct (PAC) learning is guaranteed for the framework of energy-based models. Starting from the very basic inequalities, we establish our theory based on the existence of metric between hypothesis, to which the energy function is Lipschitz continuous. The result of the theory provides a new scheme of regularization called central regularization, which puts the effect of deep learning and feature learning in a new perspective. Experiments of this scheme shows that it achieved both good generalization error and testing error.
Title: Transaction chains: achieving serializability with low latency in geo-distributed storage systems
Author(s): Zhang, Yang; Power, Russell; Zhou, Siyuan; Sovran, Yair; Aguilera, Marcos K.; Li, Jinyang
Currently, users of geo-distributed storage systems face a hard choice between having serializable transactions with high latency, or limited or no transactions with low latency. We show that it is possible to obtain both serializable transactions and low latency, under two conditions. First, transactions are known ahead of time, permitting an a priori static analysis of conflicts. Second, transactions are structured as transaction chains consisting of a sequence of hops, each hop modifying data at one server. To demonstrate this idea, we built Lynx, a geo-distributed storage system that offers transaction chains, secondary indexes, materialized join views, and geo-replication. Lynx uses static analysis to determine if each hop can execute separately while preserving serializability.if so, a client needs wait only for the first hop to complete, which occurs quickly. To evaluate Lynx, we built three applications: an auction service, a Twitter-like microblogging site and a social networking site. These applications successfully use chains to achieve low latency operation and good throughput.