Software Analytics: Mining Software Open Datasets and Repositories (STREAM ’24)



In recent years, software engineering has benefited from collecting, processing, and visualizing the large volumes of data produced by the software development process, by the use of software applications, and by the people involved in developing and operating software. Data from systems that support software development and maintenance (such as JIRA, GitHub, and the Maven Repository) make it possible to track the progress of software development, as well as the quality of software processes and products. Lessons learnt from exploiting these data sources through qualitative or quantitative studies are considered extremely useful for improving software processes and products. So far, advances in software engineering have progressively improved software development and made it possible to deploy software products continuously. However, these new software products, systems, and services pose new challenges for their specification, development, quality assurance, and security, which call for innovative solutions. To take the next step in advancing modern software development, we need to apply research methods and tools for mining, analyzing, representing, and properly using large quantities of software data. The mission of this session is to provide a forum in which practitioners and researchers can exchange experiences and ideas around software analytics, mining software repositories, software data visualization, and big data systems in software engineering.



This track will consider papers related to, but not limited to, the following topics:

  • Methods, tools, and applications of software analytics or mining software repositories for software process and product improvement
  • Machine learning and artificial intelligence for software engineering
  • Software engineering for machine learning and artificial intelligence
  • Visualization methods for representing software data that can support software engineering processes including program comprehension, software testing, refactoring, performance analysis, etc.
  • Methods and tools that use software data for the specification, design, development, quality assurance, deployment, and operation of software systems and products
  • Human and social aspects of modern software development approaches, software systems, and products that use developers’ or users’ feedback
  • Security and privacy regarding techniques that use software data and developers’ and users’ feedback in software engineering
  • Empirical studies that rely on software analytics, data science, software data visualization, and mining software repositories
  • Industrial experience with software analytics and data science in software engineering
  • Repository mining and management for modelling artefacts
  • Model searching, indexing, retrieval, storage, and automated program repair
  • Software evolution analysis as mined from software repositories
  • Dependency management: build tools, continuous integration, external dependencies, third-party libraries, and system configuration
  • Experiences with collaborative software development tools (e.g., GitHub, Bitbucket), issue trackers, and bug trackers in industrial and open-source software development
  • Software engineering for emerging technologies such as smart contracts and blockchain