Zhiqiang Zuo

Postdoctoral Scholar
Department of Computer Science
University of California, Irvine

Office: 3208 Donald Bren Hall
Email: zhiqiang DOT zuo AT uci DOT edu
Address: Donald Bren Hall, Irvine, CA, 92697-3435


About Me

I have been a postdoctoral scholar in the Department of Computer Science at the University of California, Irvine since April 2015, advised by Dr. Guoqing (Harry) Xu. Before joining UCI, I pursued my PhD at the National University of Singapore from August 2010 to February 2015 under the supervision of Dr. Siau-Cheng Khoo.

Here is my brief CV.

Research Interests

My research interests span Programming Languages, Software Engineering, Parallel/Distributed Computing, and Big Data Systems. My recent focus is on building Big Data support for scalable program analysis, program synthesis, and SAT solving. I am also interested in data mining and machine learning techniques, especially their applications to programming languages and software engineering.

News
  • I will join the Department of Computer Science at Nanjing University as an Assistant Professor soon.
  • [2017.12] I have been invited to serve on the Artifact Evaluation Committee of OOPSLA 2018.

Teaching Assistant

  • NUS CS5218: Principles of Program Analysis in Semester 2, 2012/2013
  • NUS CS5218: Principles of Program Analysis in Semester 2, 2013/2014

Artifact Evaluation Committee

  • 2018: OOPSLA

Reviewer

  • 2017: ASPLOS, ENASE, ISCA-AGP
  • 2016: ECOOP, ISMM, FSE-SRC
  • 2015: ISSTA

Projects

  • BigSAT: A Distributed SAT Solver on Spark
    • Abstract: The Boolean Satisfiability (a.k.a. SAT) problem is one of the most important problems in mathematical logic and computing theory. It serves as the foundation for a wide range of applications, including hardware/software verification, model checking, theorem proving, cryptography, computational biology, planning, and so on. Despite the significant advances in SAT-solving research over the past decade, modern SAT solvers are still far from satisfactory. Driven by the insight that multi-core systems and computing clusters are becoming increasingly accessible to regular developers, the BigSAT project aims to develop parallel and distributed solvers that leverage these modern computing resources to solve very large SAT instances. Although many parallel solvers already exist, all of them are based on the Davis-Putnam-Logemann-Loveland (DPLL) procedure with clause learning, which performs backtracking search and suffers from poor scalability. This project revisits the SAT problem from a “Big Data” perspective. We shift our focus to a long-overlooked algorithm, the Davis-Putnam (DP) procedure, which performs a sequence of explicit resolutions to eliminate the propositional variables one at a time. DP consumes more memory but exhibits data parallelism, lending itself to systems optimizations that achieve efficiency and scalability by utilizing large computing resources. Based on this insight, we develop BigSAT, a distributed SAT solver on Spark. We devise a novel Bulk Synchronous Parallel DP algorithm particularly tailored to cluster computing, minimizing network communication costs. Moreover, by applying efficient data structures (e.g., ZBDDs) and effective load balancing strategies, our distributed solver is designed to fully exploit the available computation resources and achieve promising scalability. (A toy sketch of DP's core resolution step appears below.)
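    • Sketch: a minimal, sequential Python illustration of the DP procedure's core step, eliminating one variable at a time by explicit resolution. The clause representation and variable-choice heuristic are illustrative placeholders; the actual distributed BSP algorithm over Spark is far more involved.

      # Clauses are frozensets of non-zero ints; a negative int is a negated literal.
      def resolve(c1, c2, var):
          # resolve c1 (containing var) with c2 (containing -var) on var
          resolvent = (c1 - {var}) | (c2 - {-var})
          if any(-lit in resolvent for lit in resolvent):
              return None                      # tautology: discard
          return frozenset(resolvent)

      def dp_solve(clauses):
          clauses = {frozenset(c) for c in clauses}
          while clauses:
              if frozenset() in clauses:
                  return False                 # empty clause derived: UNSAT
              # pick any remaining variable (a real solver orders these carefully)
              var = abs(next(iter(next(iter(clauses)))))
              pos = {c for c in clauses if var in c}
              neg = {c for c in clauses if -var in c}
              resolvents = set()
              for p in pos:
                  for n in neg:
                      r = resolve(p, n, var)
                      if r is not None:
                          resolvents.add(r)
              # replace every clause mentioning var with the resolvents
              clauses = (clauses - pos - neg) | resolvents
          return True                          # all clauses eliminated: SAT

      print(dp_solve([{1, 2}, {-1, 2}, {-2}]))   # False (unsatisfiable)
      print(dp_solve([{1, 2}, {-1, 2}]))         # True  (satisfiable)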

  • Distributed Graspan: A Scalable Static Analysis Engine in the Cloud
    • Abstract: We developed Graspan, a single-machine disk-based engine for scalable static analysis. Graspan makes it possible to perform fully context-sensitive analyses on the Linux kernel codebase (16M lines of code) by utilizing disk space. However, such an analysis takes more than 10 hours to complete. To reduce this cost and further improve scalability, we plan to extend Graspan to the substantially more challenging distributed setting. We propose Distributed Graspan, a distributed engine that runs on a cluster of hundreds or thousands of commodity computers. Distributed Graspan will be built on top of Spark, implementing a novel distributed edge-pair centric computation model. With advanced partitioning schemes and load balancing strategies, we aim to make Distributed Graspan highly scalable. (A toy sketch of the edge-pair join appears below.)
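    • Sketch: a hypothetical PySpark illustration of one edge-pair centric superstep: edges meeting at a shared vertex are joined, and production rules of the form C ::= A B emit new edges until a fixpoint is reached. The grammar, the graph, and the naive (non-incremental) iteration here are illustrative only; the system itself is not yet built.

      from pyspark import SparkContext

      sc = SparkContext("local[*]", "distributed-graspan-sketch")

      RULES = {("A", "B"): "C", ("C", "C"): "C"}      # C ::= A B | C C
      edges = sc.parallelize([(0, 1, "A"), (1, 2, "B"), (2, 3, "A"), (3, 4, "B")])

      def combine(kv):
          # kv = (v, ((u, a), (w, b))) for the edge pair u -a-> v -b-> w
          v, ((u, a), (w, b)) = kv
          return (u, w, RULES.get((a, b)))

      while True:
          by_dst = edges.map(lambda e: (e[1], (e[0], e[2])))   # key by destination
          by_src = edges.map(lambda e: (e[0], (e[1], e[2])))   # key by source
          new_edges = (by_dst.join(by_src)         # pair edges at the shared vertex
                             .map(combine)
                             .filter(lambda e: e[2] is not None)
                             .distinct()
                             .subtract(edges))
          if new_edges.isEmpty():
              break                                # fixpoint: closure complete
          edges = edges.union(new_edges).cache()

      print(sorted(edges.collect()))
      sc.stop()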

  • Graspan: A Scalable Parallel Disk-based Static Analysis Engine
    • Abstract: Static program analyses are widely used throughout the software development process for bug detection, code optimization, testing, and debugging. Unfortunately, a context-sensitive interprocedural analysis is often not scalable enough to analyze large codebases such as the Linux kernel. The high complexity of context sensitivity makes the analysis both computation- and memory-intensive. Furthermore, most interprocedural analyses are difficult to parallelize, because they frequently involve decision making based on information discovered dynamically. In this work, we revisit the scalability problem of interprocedural static analysis from a Big Data perspective. That is, we turn Big Code analysis into Big Data analytics and leverage novel data processing techniques to solve this traditional programming language problem. Our key observation is that many interprocedural analyses can be formulated as a graph reachability problem. Therefore, we turn programs into graphs and treat the analysis as graph traversal. We represent transitive edges explicitly rather than implicitly, physically adding them to the program graph; this provides a large (evolving) dataset to process and a simple computation logic: matching the labels of consecutive edges against production rules. This approach opens up opportunities to leverage parallel graph processing systems to analyze large programs efficiently. We develop Graspan, a disk-based parallel graph system that uses an edge-pair centric computation model to compute dynamic transitive closures on large program graphs. We implement fully context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations. Moreover, we show that these analyses can be used to augment existing checkers for bug finding; these augmented checkers uncovered a total of 85 potential bugs and 1308 unnecessary NULL tests. (A toy in-memory sketch of the computation logic appears after this entry.)
    • Paper: accepted to ASPLOS’17.
    • Code: released via GitHub.
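    • Sketch: a toy in-memory worklist version of grammar-driven transitive closure: the labels of consecutive edges are matched against production rules (C ::= A B), and the resulting transitive edge is added explicitly. Graspan itself is disk-based and edge-pair centric; this sketch, with an illustrative one-rule grammar, only shows the computation logic.

      from collections import defaultdict

      def closure(edges, rules):
          # edges: set of (src, dst, label); rules: {(A, B): C} for C ::= A B
          out_edges = defaultdict(set)         # src -> {(dst, label)}
          in_edges = defaultdict(set)          # dst -> {(src, label)}
          done = set()
          worklist = list(edges)
          while worklist:
              u, v, a = worklist.pop()
              if (u, v, a) in done:
                  continue
              done.add((u, v, a))
              out_edges[u].add((v, a))
              in_edges[v].add((u, a))
              for w, b in out_edges[v]:        # pair u -a-> v with v -b-> w
                  c = rules.get((a, b))
                  if c is not None:
                      worklist.append((u, w, c))
              for t, b in in_edges[u]:         # pair t -b-> u with u -a-> v
                  c = rules.get((b, a))
                  if c is not None:
                      worklist.append((t, v, c))
          return done

      # the one-rule grammar A ::= A A yields plain transitive closure over A-edges
      print(sorted(closure({(0, 1, "A"), (1, 2, "A"), (2, 3, "A")}, {("A", "A"): "A"})))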

  • Scalable Statistical Debugging via Abstraction Refinement
    • Abstract: Statistical debugging approaches have been well investigated over the past decade; they collect failing and passing executions and apply statistical techniques to identify discriminative elements as potential bug causes. Most of these approaches instrument the entire program to produce execution profiles for debugging, and consequently often incur hefty instrumentation and analysis costs. However, since the major part of the program code is in fact error-free, full-scale program instrumentation is wasteful and unnecessary. We propose a general, rigorous, and automated predicate-space pruning technique for statistical debugging. Our technique only needs to instrument and analyze part of the code. Guided by a mathematically rigorous analysis, it is guaranteed to produce the same debugging results as an exhaustive analysis in deterministic settings. Our technique is inspired by the abstraction refinement framework and is formulated as an instance of it. Our key observation is simple: predicates are concrete program entities constituting a huge space; profiling and analyzing them directly is doomed to result in a high cost. If we raise the abstraction level by first profiling and analyzing data from coarse-grained program entities (e.g., functions), we obtain a bird's-eye view of how each coarse-grained entity correlates with a failure. This view then helps us decide (1) which coarse-grained entities should be refined, with all the fine-grained entities (e.g., predicates) they represent profiled and analyzed, and (2) which coarse-grained entities need not be refined, with all the fine-grained entities they represent pruned away. We apply this technique to two different statistical debugging scenarios: in-house and production-run statistical debugging. The experiments validate that our technique can significantly improve the efficiency of debugging. (A toy sketch of the refinement loop appears after this entry.)
    • Paper: accepted to ISSTA'14 Doctoral Symposium, ISSTA'14, and OOPSLA'16.
    • Code: released via Bitbucket.
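    • Sketch: an illustrative Python rendering of the refinement loop. The score here (failing-run coverage minus passing-run coverage) is a stand-in, chosen because a function's failing coverage soundly upper-bounds the score of every predicate inside it; the paper's actual ranking functions and bounds are more sophisticated.

      def score(fail_hits, pass_hits, n_fail, n_pass):
          return fail_hits / n_fail - pass_hits / n_pass

      def hierarchical_debug(functions, profile_func, profile_preds, n_fail, n_pass):
          # cheap coarse pass: instrument functions only
          counts = {f: profile_func(f) for f in functions}            # f -> (fh, ph)
          bound = {f: fh / n_fail for f, (fh, ph) in counts.items()}  # score upper bound
          best_pred, best_score = None, float("-inf")
          # refine functions in decreasing order of their upper bound
          for f in sorted(bound, key=bound.get, reverse=True):
              if bound[f] <= best_score:
                  break          # no predicate in f (or any later f) can win: prune
              # costly fine pass: instrument predicates inside f only
              for pred, (fh, ph) in profile_preds(f).items():
                  s = score(fh, ph, n_fail, n_pass)
                  if s > best_score:
                      best_pred, best_score = pred, s
          return best_pred, best_score

      # toy demo with hard-coded profiles over 10 failing and 10 passing runs
      fprof = {"f": (9, 2), "g": (3, 3)}
      pprof = {"f": {"p1": (9, 0), "p2": (4, 2)}, "g": {"q1": (3, 0)}}
      print(hierarchical_debug(["f", "g"], fprof.get, pprof.get, 10, 10))  # ('p1', 0.9)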

  • Semantics-directed Specification Mining
    • Abstract: To tackle the lack of precise and complete specifications, specification mining automatically infers specifications of software behavior from execution traces. The majority of specification mining approaches adopt statistical techniques and share a common assumption: significant program properties occur frequently. Due to the presence of semantically insignificant events, directly mining frequent patterns over raw execution traces can produce a great number of meaningless specifications, which severely affects the efficiency and effectiveness of specification mining. To address this lack of semantic significance and further improve efficiency, we introduce semantic information to refine the execution traces before mining, and propose a semantics-directed specification mining framework that discovers semantically significant specifications from execution traces. According to user-specified semantics, we perform a corresponding semantics analysis to extract semantically relevant sequences from raw execution traces, and then run frequent pattern mining over these sequences to generate semantically significant specifications. We instantiate this framework in a system for dataflow-sensitive specification mining, where dataflow semantics is considered. The experimental results indicate that our approach can efficiently discover semantically significant specifications. (A toy sketch of the pipeline appears after this entry.)
    • Paper: accepted to ICFEM'13.
    • Code: released via Bitbucket.
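    • Sketch: a toy Python rendering of the pipeline: slice raw traces by a dataflow relation (here, events sharing an object id, an illustrative stand-in for the dataflow semantics), then mine frequent patterns over the refined sequences. Real mining would use a proper sequential pattern miner rather than bigram counting.

      from collections import Counter, defaultdict

      def dataflow_slices(trace):
          # trace: list of (object_id, event); group events by the flowing object
          slices = defaultdict(list)
          for obj, event in trace:
              slices[obj].append(event)
          return list(slices.values())

      def frequent_bigrams(sequences, min_support):
          counts = Counter()
          for seq in sequences:
              counts.update({(a, b) for a, b in zip(seq, seq[1:])})  # once per sequence
          return {p: c for p, c in counts.items() if c >= min_support}

      # two raw traces; the first interleaves two unrelated objects
      traces = [
          [(1, "open"), (2, "open"), (1, "read"), (2, "close"), (1, "close")],
          [(3, "open"), (3, "read"), (3, "close")],
      ]
      seqs = [s for t in traces for s in dataflow_slices(t)]
      print(frequent_bigrams(seqs, min_support=2))
      # {('open', 'read'): 2, ('read', 'close'): 2} -- the spurious
      # ('open', 'open') pattern caused by interleaving never appears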

Tools

  • Graspan: a disk-based parallel program analysis engine for large-scale code
  • JSampler: a sampled predicate-based instrumentor for Java

Publications

  • Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code [pdf] [slides]
    by Kai Wang, Aftab Hussain, Zhiqiang Zuo, Guoqing (Harry) Xu and Ardalan Amiri Sani.
    In Proceedings of the 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, (ASPLOS'17), Xi'an, China, April 8 - 12, 2017.
  • Low-Overhead and Fully Automated Statistical Debugging with Abstraction Refinement [pdf] [slides]
    by Zhiqiang Zuo, Lu Fang, Siau-Cheng Khoo, Guoqing (Harry) Xu and Shan Lu.
    In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, (OOPSLA'16), Amsterdam, The Netherlands, Oct. 30 - Nov. 4, 2016.
  • Refinement Techniques in Mining Software Behavior [pdf]
    by Zhiqiang Zuo.
    Dissertation, February 2015.
  • Iterative Statistical Bug Isolation via Hierarchical Instrumentation [pdf]
    by Zhiqiang Zuo and Siau-Cheng Khoo.
    In DSpace at School of Computing, NUS, (TRC7-14).
  • Efficient Predicated Bug Signature Mining via Hierarchical Instrumentation [pdf] [slides]
    by Zhiqiang Zuo, Siau-Cheng Khoo and Chengnian Sun.
    In Proceedings of the 2014 International Symposium on Software Testing and Analysis, (ISSTA'14), San Jose, CA, USA, July 21-25, 2014.
  • Efficient Statistical Debugging via Hierarchical Instrumentation [pdf] [slides]
    by Zhiqiang Zuo.
    In Proceedings of the 2014 International Symposium on Software Testing and Analysis, (ISSTA'14 Doctoral Symposium), San Jose, CA, USA, July 21-25, 2014.
  • Mining Dataflow Sensitive Specifications [pdf] [slides]
    by Zhiqiang Zuo and Siau-Cheng Khoo.
    In Proceedings of the 15th International Conference on Formal Engineering Methods, (ICFEM'13), Queenstown, New Zealand, Oct. 29 - Nov. 1, 2013.

Tutorials

  • "Systemized" Program Analyses – A "Big Data" Perspective on Static Analysis Scalability [link] [slides]
    by Guoqing (Harry) Xu and Zhiqiang Zuo.
    The 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, (ASPLOS'17), Xi'an, China, April 8 - 12, 2017.