Dr. Zhiqiang Zuo   (左志强)

Assistant Professor
Software Engineering Group
Department of Computer Science and Technology
Nanjing University, China

719 Computer Science Building
zqzuo AT nju.edu.cn
163 Xianlin Road, Qixia District, Nanjing, China



About Me

I am now an assistant research professor in the Department of Computer Science & Technology, Nanjing University. Before joining NJU, I received my PhD from the National University of Singapore in 2015 and worked as a postdoctoral scholar at the University of California, Irvine from 2015 to 2017.

I am recruiting PhD and Master's students, as well as Nanjing University undergraduates. Students interested in system software and Big Data systems are welcome to email me at any time.

Research Interests

  • System software
  • Big Data systems
  • Dynamic and static analyses

My research interests span Programming Languages, Software Engineering, and Systems. Recently, I have been focusing on building and optimizing customized system software (e.g., Big Data systems, compilers, and operating systems) for various applications and environments, such as program analyses, data analytics, system profiling, and SAT solving.

News

  • [2019.9] Received a grant (General Program) from the Natural Science Foundation of Jiangsu Province.
  • [2019.8] Our Cod paper for coverage profiler testing was accepted to ASE'19.
  • [2019.6] Invited to serve on the external review committee (ERC) for ASPLOS'20.
  • [2019.6] Invited to serve on the program committee (PC) for the 13th Innovations in Software Engineering Conference ISEC'20.
  • [2019.5] The presentation video and audio of Grapple at EuroSys'19 are available.
  • [2019.1] Our BigSpa paper was accepted to IPDPS'19.
  • [2019.1] Invited to give talks on "Systemized Program Analysis" at The University of Queensland and Griffith University, Australia.
  • [2018.12] Our Grapple paper was accepted to EuroSys'19.
  • [2018.12] Our ML-based memory leak detection paper was accepted to Chinese Journal of Software. My first paper in Chinese!
  • [2018.12] Our code coverage testing paper was accepted to ICSE'19.
  • [2018.08] Received a grant from NSFC.
  • [2018.07] Our RStream paper was accepted to OSDI'18.
  • [2018.02] Our context translation paper was accepted to PLDI'18.
  • I will join the Department of Computer Science at Nanjing University as an Assistant Professor soon.
  • [2017.12] Invited to serve on the Artifact Evaluation Committee (AEC) for OOPSLA'18.

Publications

† Research papers of more than 10 pages at top-tier venues* are marked in red.
  • Automatic Self-Validation for Code Coverage Profilers
    by Yibiao Yang, Yanyan Jiang, Zhiqiang Zuo, Yang Wang, Hao Sun, Hongmin Lu, Yuming Zhou, and Baowen Xu.
    In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, (ASE'19), San Diego, CA, USA, November 11-15, 2019.
  • BigSpa: An Efficient Interprocedural Static Analysis Engine in the Cloud
    by Zhiqiang Zuo, Rong Gu, Xi Jiang, Zhaokang Wang, Yihua Huang, Linzhang Wang, and Xuandong Li.
    In Proceedings of the 33rd IEEE International Parallel & Distributed Processing Symposium, (IPDPS'19), Rio de Janeiro, Brazil, May 20-24, 2019.
  • Grapple: A Graph System for Static Finite-State Property Checking of Large-Scale Systems Code
    by Zhiqiang Zuo, John Thorpe, Yifei Wang, Qiuhong Pan, Shenming Lu, Kai Wang, Harry Xu, Linzhang Wang, and Xuandong Li.
    In Proceedings of the European Conference on Computer Systems, (EuroSys'19), Dresden, Germany, March 25-28, 2019.
  • Machine Learning-based Memory Leak Detection for C (In Chinese)
    by Yawei Zhu, Zhiqiang Zuo, Linzhang Wang, and Xuandong Li.
    In Chinese Journal of Software, (软件学报), Volume 5, 2019.
  • Hunting for Bugs in Code Coverage Tools via Randomized Differential Testing
    by Yibiao Yang, Yuming Zhou, Hao Sun, Zhendong Su, Zhiqiang Zuo, Lei Xu, and Baowen Xu.
    In Proceedings of the 41st International Conference on Software Engineering, (ICSE'19), Montréal, QC, Canada, May 25–31, 2019.
  • RStream: Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine
    by Kai Wang, Zhiqiang Zuo, John Thorpe, Tim Nguyen, and Guoqing Xu.
    In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation, (OSDI'18), Carlsbad, CA, USA, October 8–10, 2018.
  • Calling-to-Reference Context Translation via Constraint-Guided CFL-Reachability
    by Cheng Cai, Qirun Zhang, Zhiqiang Zuo, Khanh Nguyen, Harry Xu, and Zhendong Su.
    In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, (PLDI'18), Philadelphia, PA, USA, June 18-22, 2018.
  • Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code
    by Kai Wang, Aftab Hussain, Zhiqiang Zuo, Guoqing (Harry) Xu and Ardalan Amiri Sani.
    In Proceedings of the 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, (ASPLOS'17), Xi'an, China, April 8 - 12, 2017.
  • Low-Overhead and Fully Automated Statistical Debugging with Abstraction Refinement
    by Zhiqiang Zuo, Lu Fang, Siau-Cheng Khoo, Guoqing (Harry) Xu and Shan Lu.
    In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, (OOPSLA'16), Amsterdam, The Netherlands, Oct. 30 - Nov. 4, 2016.
  • Efficient Predicated Bug Signature Mining via Hierarchical Instrumentation
    by Zhiqiang Zuo, Siau-Cheng Khoo and Chengnian Sun.
    In Proceedings of the 2014 International Symposium on Software Testing and Analysis, (ISSTA'14), San Jose, CA, USA, July 21-25, 2014.
  • Efficient Statistical Debugging via Hierarchical Instrumentation
    by Zhiqiang Zuo.
    In Proceedings of the 2014 International Symposium on Software Testing and Analysis, (ISSTA'14 Doctoral Symposium), San Jose, CA, USA, July 21-25, 2014.
  • Mining Dataflow Sensitive Specifications
    by Zhiqiang Zuo and Siau-Cheng Khoo.
    In Proceedings of the 15th International Conference on Formal Engineering Methods, (ICFEM'13), Queenstown, New Zealand, Oct. 29 - Nov. 1, 2013.

Reports

  • "Systemized" Program Analyses – A "Big Data" Perspective on Scaling Large-Scale Code Analyses [pdf]
    by Guoqing (Harry) Xu, Zhiqiang Zuo, Kai Wang, Aftab Hussain, and Khanh Nguyen.
    Technical report, 2017.
  • Refinement Techniques in Mining Software Behavior [pdf]
    by Zhiqiang Zuo.
    Dissertation, February 2015.
  • Iterative Statistical Bug Isolation via Hierarchical Instrumentation [pdf]
    by Zhiqiang Zuo and Siau-Cheng Khoo.
    In DSpace at the School of Computing, NUS (TRC7-14).

Tutorials

  • "Systemized" Program Analyses – A "Big Data" Perspective on Static Analysis Scalability [link]
    by Guoqing (Harry) Xu and Zhiqiang Zuo.
    At the 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, (ASPLOS'17), Xi'an, China, April 8 - 12, 2017.

Talks

  • Grapple: A Graph System for Static Finite-State Property Checking of Large-Scale Systems Code
    At the European Conference on Computer Systems, (EuroSys'19), Dresden, Germany, March 25-28, 2019.
  • "Systemized" Program Analysis – A "Big Data" Perspective
    At Tianjin University, Tianjin, China, Sep. 17, 2019.
    At Griffith University, Brisbane, Australia, Jan. 18, 2019.
    At University of Queensland, Brisbane, Australia, Jan. 15, 2019.
  • Systems Support for Sophisticated Program Analysis
    At the 2018 Workshop on Software Analysis and Verification, (SAVE'2018), Hangzhou, China, May 26-27, 2018.
    At the 2018 Annual Meeting of Jiangsu Provincial Computer Software Committee, Yangzhou, China, May 12-13, 2018.
    At University of California, Los Angeles (UCLA), Los Angeles, CA, USA, Jan. 26, 2018.
  • Towards Highly Scalable Static and Dynamic Program Analysis
    At Nanjing University, Nanjing, China, Sep. 21, 2017.
    At Tianjin University, Tianjin, China, April 20, 2017.
  • "Big Data Thinking" for Highly Scalable Static Program Analyses
    At Shanghai Jiao Tong University, Shanghai, China, May 4, 2017.
    At Nanjing University, Nanjing, China, May 2, 2017.
  • Low-Overhead and Fully Automated Statistical Debugging with Abstraction Refinement
    At the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, (OOPSLA'16), Amsterdam, The Netherlands, Oct. 30 - Nov. 4, 2016.
  • Efficient Predicated Bug Signature Mining via Hierarchical Instrumentation
    At the 2014 International Symposium on Software Testing and Analysis, (ISSTA'14), San Jose, CA, USA, July 21-25, 2014.
  • Statistical Debugging via Hierarchical Instrumentation
    At the 2014 International Symposium on Software Testing and Analysis, (ISSTA'14 Doctoral Symposium), San Jose, CA, USA, July 21-25, 2014.
  • Mining Dataflow Sensitive Specifications
    At the 15th International Conference on Formal Engineering Methods, (ICFEM'13), Queenstown, New Zealand, Oct. 29 - Nov. 1, 2013.

Grants

  • PI, National Natural Science Foundation of China (Young Scientists Program), 2019.1-2021.12, Grant No. 61802168
  • PI, Natural Science Foundation of Jiangsu Province (General Program), 2019.7-2022.6, Grant No. BK20191247
  • Co-PI, National Natural Science Foundation of China (Key Program), 2020.1-2024.12, Grant No. 61932021

Projects

  • Customized Compiler and Runtime Design for Emerging Applications

  • Trustworthy Compiler Toolchain: Validation and Verification

  • BigSpa: An Efficient Interprocedural Static Analysis Engine in the Cloud
    • Static program analysis is widely used in various application areas to solve many practical problems. Although researchers have made significant achievements in static analysis, it remains very challenging to perform sophisticated interprocedural analysis on large-scale modern software. The underlying reason is that interprocedural analysis of large-scale modern software is highly computation- and memory-intensive, leading to poor scalability. We aim to tackle the scalability problem by proposing a novel big data solution for sophisticated static analysis. Specifically, we propose a data-parallel algorithm and a join-process-filter computation model for CFL-reachability-based interprocedural analysis, and develop an efficient distributed static analysis engine in the cloud, called BigSpa. Our experiments show that BigSpa, running on a cluster, scales well to perform precise interprocedural analyses on millions of lines of code, and runs an order of magnitude or more faster than existing state-of-the-art analysis tools.
    • Paper: accepted to IPDPS'19.
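To make the join-process-filter idea more concrete, here is a minimal single-process sketch of round-based CFL-reachability. It is not BigSpa's distributed implementation: it assumes a grammar already normalized to binary rules C ::= A B, with at most one rule per right-hand side, and the names `join_process_filter`, `Edge`, and `Grammar` are hypothetical. Each round joins the newly discovered edges with the known edges on a shared vertex, processes each joined pair against the grammar, and filters out edges already in the closure.

```python
from collections import defaultdict

# (source vertex, edge label, target vertex)
Edge = tuple[int, str, int]
# Hypothetical normalized grammar: (left label, right label) -> C for the rule C ::= left right
Grammar = dict[tuple[str, str], str]


def join_process_filter(edges: set[Edge], grammar: Grammar) -> set[Edge]:
    """Single-process sketch of round-based CFL-reachability closure."""
    closure = set(edges)
    delta = set(edges)  # edges first discovered in the previous round
    while delta:
        # Index every known edge by its source and by its target vertex.
        by_src = defaultdict(list)
        by_dst = defaultdict(list)
        for u, a, v in closure:
            by_src[u].append((a, v))
            by_dst[v].append((u, a))
        candidates = set()
        for u, a, v in delta:
            # Join the new edge u->v with every known edge v->w ...
            for b, w in by_src[v]:
                c = grammar.get((a, b))
                if c is not None:  # ... and process the pair: apply C ::= a b.
                    candidates.add((u, c, w))
            # Join every known edge t->u with the new edge u->v.
            for t, b in by_dst[u]:
                c = grammar.get((b, a))
                if c is not None:
                    candidates.add((t, c, v))
        # Filter: drop edges that are already in the closure.
        delta = candidates - closure
        closure |= delta
    return closure
```

For instance, with the single rule a ::= a a the sketch reduces to plain transitive closure. Only pairs involving at least one new edge are joined in each round, which is what lets the rounds be partitioned and run in parallel.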

  • Graspan: A Scalable Parallel Disk-based Static Analyses Engine
    • Static program analyses are widely used throughout software development for bug detection, code optimization, testing, debugging, etc. Unfortunately, a context-sensitive interprocedural analysis is often not scalable enough to analyze large codebases such as the Linux kernel. The high complexity of context sensitivity makes the analysis both computation- and memory-intensive. Furthermore, most interprocedural analyses are difficult to parallelize, because they frequently involve decision making based on information discovered dynamically. In this work, we revisit the scalability problem of interprocedural static analysis from a Big Data perspective. That is, we turn Big Code analysis into Big Data analytics and leverage novel data processing techniques to solve this traditional programming language problem. Our key observation is that many interprocedural analyses can be formulated as a graph reachability problem. Therefore, we turn programs into graphs and treat the analysis as graph traversal. We represent transitive edges explicitly, by physically adding them to the program graph, rather than implicitly. This gives us a large (evolving) dataset to process and a simple computation logic: matching the labels of consecutive edges against the production rules. This approach opens up opportunities to leverage parallel graph processing systems to analyze large programs efficiently. We develop Graspan, a disk-based parallel graph system that uses an edge-pair centric computation model to compute dynamic transitive closures on large program graphs. We implement fully context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations. Moreover, we show that these analyses can be used to augment existing checkers for bug finding; these augmented checkers uncovered a total of 85 potential bugs and 1308 unnecessary NULL tests.
    • Paper: accepted to ASPLOS'17.
    • Code: released via GitHub.
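The edge-pair centric computation can be illustrated with a small in-memory worklist sketch. It ignores Graspan's disk-based partitioning and parallel scheduling entirely; the function name `edge_pair_closure` and the dictionary encoding of unary (C ::= A) and binary (C ::= A B) rules are hypothetical. Each newly added edge is paired with its predecessor and successor edges, the labels of each consecutive pair are matched against the production rules, and any produced transitive edge is physically added to the graph.

```python
from collections import defaultdict, deque

Edge = tuple[int, str, int]  # (source vertex, label, target vertex)


def edge_pair_closure(edges, unary, binary):
    """Worklist sketch of an edge-pair centric dynamic transitive closure.

    unary:  label A -> labels C for rules C ::= A     (hypothetical encoding)
    binary: (A, B)  -> labels C for rules C ::= A B   (hypothetical encoding)
    """
    closure: set[Edge] = set()
    out_edges = defaultdict(set)  # vertex -> {(label, target)}
    in_edges = defaultdict(set)   # vertex -> {(source, label)}
    worklist = deque()

    def add(edge: Edge) -> None:
        # Physically add a (possibly transitive) edge the first time it appears.
        if edge not in closure:
            closure.add(edge)
            u, label, v = edge
            out_edges[u].add((label, v))
            in_edges[v].add((u, label))
            worklist.append(edge)

    for edge in edges:
        add(edge)
    while worklist:
        u, a, v = worklist.popleft()
        for c in unary.get(a, ()):
            add((u, c, v))
        # Pair the edge u->v with each successor edge v->w (copy: add() mutates the sets).
        for b, w in list(out_edges[v]):
            for c in binary.get((a, b), ()):
                add((u, c, w))
        # Pair each predecessor edge t->u with the edge u->v.
        for t, b in list(in_edges[u]):
            for c in binary.get((b, a), ()):
                add((t, c, v))
    return closure
```

As a usage example, a toy dataflow grammar N ::= n | N n (an assumption for illustration, not the grammar from the paper) applied to the edges 1 -n-> 2 -n-> 3 yields N edges (1, 2), (2, 3), and (1, 3).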

  • Scalable Statistical Debugging via Abstraction Refinement
    • Statistical debugging approaches, which collect failing and passing executions and apply statistical techniques to identify discriminative elements as potential bug causes, have been well investigated over the past decade. Most of these approaches instrument the entire program to produce execution profiles for debugging and, consequently, often incur hefty instrumentation and analysis costs. However, since most program code is in fact error-free, full-scale program instrumentation is wasteful and unnecessary. We propose a general, rigorous, and automated predicate-space pruning technique for statistical debugging that needs to instrument and analyze only part of the code. Guided by a mathematically rigorous analysis, our technique is guaranteed to produce the same debugging results as an exhaustive analysis in deterministic settings. Our technique is inspired by, and formulated as an instance of, the abstraction refinement framework. Our key observation is simple: predicates are concrete program entities constituting a huge space, so profiling and analyzing them directly is doomed to incur a high cost. If we raise the abstraction level by first profiling and analyzing data from coarse-grained program entities (e.g., functions), we may obtain a bird's-eye view of how each coarse-grained entity is correlated with a failure. This view may then help us decide (1) which coarse-grained entities should be refined, with all the fine-grained entities (e.g., predicates) they represent profiled and analyzed, and (2) which coarse-grained entities need not be refined, with all the fine-grained entities they represent pruned away. We apply this technique to two different statistical debugging scenarios: in-house and production-run statistical debugging. The experiments validate that our technique can significantly improve the efficiency of debugging.
    • Paper: accepted to ISSTA'14 DS, ISSTA'14, OOPSLA'16.
    • Code: released via Bitbucket.
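The refine-or-prune decision above can be sketched as a simple bound-and-prune loop. This is a simplified illustration under stated assumptions, not the measure or algorithm from the ISSTA'14/OOPSLA'16 papers: the score (failing-run rate minus passing-run rate), the `Run` record, and `refine_top_predicate` are all hypothetical, and the data must contain at least one failing and one passing run. Because a predicate inside function f can only be true in runs that execute f, scoring f's failing-run coverage with zero passing runs gives an upper bound for every predicate in f; functions whose bound cannot beat the best predicate found so far are pruned without profiling their predicates.

```python
from dataclasses import dataclass


@dataclass
class Run:
    failed: bool
    functions: set[str]   # coarse-grained entities executed in this run
    predicates: set[str]  # fine-grained predicates observed true in this run


def score(n_fail: int, n_pass: int, total_fail: int, total_pass: int) -> float:
    # Hypothetical discriminative score: failing-run rate minus passing-run rate.
    return n_fail / total_fail - n_pass / total_pass


def refine_top_predicate(runs: list[Run], preds_of: dict[str, list[str]]):
    """Find the top-scoring predicate, refining only functions that might contain it."""
    total_fail = sum(r.failed for r in runs)
    total_pass = len(runs) - total_fail
    # Coarse pass: profile function coverage only. A predicate in f is true only in
    # runs that execute f, so score(F(f), 0) upper-bounds every predicate inside f.
    bound = {
        f: score(sum(r.failed and f in r.functions for r in runs), 0, total_fail, total_pass)
        for f in preds_of
    }
    best, best_score, refined = None, float("-inf"), []
    for f in sorted(bound, key=bound.get, reverse=True):
        if bound[f] <= best_score:
            break  # prune f and every remaining function: none can beat best_score
        refined.append(f)
        for p in preds_of[f]:  # fine pass: profile the predicates of f only
            n_fail = sum(r.failed and p in r.predicates for r in runs)
            n_pass = sum(not r.failed and p in r.predicates for r in runs)
            s = score(n_fail, n_pass, total_fail, total_pass)
            if s > best_score:
                best, best_score = p, s
    return best, best_score, refined
```

The returned top score matches what an exhaustive scan of all predicates would find under this score, while `refined` lists the only functions whose predicates were actually examined.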

  • Semantics-directed Specification Mining
    • To tackle the lack of precise and complete specifications, specification mining has been proposed to automatically infer specifications of software behavior from execution traces. Most specification mining approaches adopt a statistical technique and share a common assumption: significant program properties occur frequently. Due to the presence of semantically insignificant events, directly mining frequent patterns over raw execution traces can produce a great number of meaningless specifications, which severely affects the efficiency and effectiveness of specification mining. To address the lack of semantic significance and further improve efficiency, we introduce semantic information to refine execution traces before mining, and propose a semantics-directed specification mining framework that discovers semantically significant specifications from execution traces. A semantics analysis tailored to the user-specified semantics extracts semantically relevant sequences from the raw execution traces, and frequent pattern mining is then performed on these sequences to generate semantically significant specifications. We develop a particular instance, dataflow-sensitive specification mining, in which dataflow semantics is considered. The experimental results indicate that our approach can efficiently discover semantically significant specifications.
    • Paper: accepted to ICFEM'13.
    • Code: released via Bitbucket.
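The two-stage pipeline (extract semantically relevant sequences, then mine frequent patterns) can be sketched in a few lines. This is a deliberately simplified illustration, not the ICFEM'13 system: it assumes "dataflow-related" simply means "operates on the same object id", mines only ordered pairs of events, and the names `dataflow_sequences` and `frequent_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

Event = tuple[str, int]  # (method name, id of the object it operates on)


def dataflow_sequences(trace: list[Event]) -> list[list[str]]:
    """Split a raw trace into per-object subsequences (simplified dataflow relation)."""
    seqs = defaultdict(list)
    for method, obj in trace:
        seqs[obj].append(method)
    return list(seqs.values())


def frequent_pairs(sequences: list[list[str]], min_support: int) -> dict[tuple[str, str], int]:
    """Ordered pairs (a, b), a occurring before b, found in >= min_support sequences."""
    support = defaultdict(int)
    for seq in sequences:
        seen = set()  # count each pair at most once per sequence
        for i, j in combinations(range(len(seq)), 2):
            seen.add((seq[i], seq[j]))
        for pair in seen:
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}
```

For an interleaved trace in which two files are each opened, read, and closed while unrelated `log` events occur on another object, the per-object sequences yield the pairs (open, read), (open, close), and (read, close), whereas mining the raw interleaved trace directly would also relate `open` to the unrelated `log` events.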

Tools

  • Grapple: a graph system for finite-state property checking of large-scale systems code
  • RStream: a single-machine, disk-based graph mining system
  • Graspan: a disk-based parallel program analysis engine for large-scale code
  • JSampler: a sampled predicate-based instrumentor for Java

I'm co-advising many excellent students together with Prof. Xuandong Li, Prof. Linzhang Wang, and other faculty members in the Software Engineering Group.

Current Students

  • Yiyu Zhang (PhD, starting Fall 2019)
  • Zewen Sun (PhD, starting Fall 2020)

  • Shenming Lu (Master, starting Fall 2017)
  • Qiuhong Pan (Master, starting Fall 2017)
  • Lu Lu (Master, starting Fall 2017)
  • Yifei Wang (Master, starting Fall 2018)
  • Guihang Wang (Master, starting Fall 2019)
  • Duanchen Xu (Master, starting Fall 2019)
  • Siyuan He (Master, starting Fall 2019)
  • Xinyue Zhang (Master, starting Fall 2019)
  • Kai Ji (Master, starting Fall 2020)
  • Wei Tao (Master, starting Fall 2020)
  • Yuhui Deng (Master, starting Fall 2020)

  • Tianyu Liu (Undergraduate student)
  • Yi Luo (Undergraduate student)
  • Lanlan Zhang (Undergraduate student)

Graduated Students

  • Chuang Pan (Undergraduate student)
  • Jin Shi (Undergraduate student)
  • Xiutian Yang (Undergraduate student)

Artifact Evaluation Committee

  • 2018: OOPSLA

Reviewer

  • 2017: ASPLOS, ENASE, ISCA-AGP
  • 2016: ECOOP, ISMM, FSE-SRC
  • 2015: ISSTA

Lecturer

  • NJU 22010100: Advanced Programming in C++ in Semester 1, 2019/2020
  • NJU 22000010: Basics of Programming with C in Semester 2, 2018/2019
  • NJU 22010100: Advanced Programming in C++ in Semester 1, 2018/2019

Teaching Assistant

  • NUS CS5218: Principles of Program Analysis in Semester 2, 2012/2013
  • NUS CS5218: Principles of Program Analysis in Semester 2, 2013/2014