I currently lead the Ultra-scale AIOps Lab. I hold a dual role as Chief Architect and Engineering Manager at
HUAWEI CLOUD in Munich, Germany and Dublin, Ireland. You can find more information about our work here:

Using machine learning and deep learning techniques, we apply AI to various areas related to HUAWEI CLOUD such as: anomaly detection, root cause analysis, failure prediction, reliability and availability, risk estimation and security, network verification, and low-latency object tracking. Our work fits under the AI Engineering umbrella as discussed in IEEE Software, Nov.-Dec. 2022.

Cloud Reliability

Our current work involves the development of the next generation of AI-driven IT Operations tools and platforms. This field is generally called AIOps (artificial intelligence for IT operations). In planet-scale deployments, the Operation and Maintenance (O&M) of cloud platforms can no longer be done manually or with off-the-shelf solutions alone.

It requires in-house automated systems, ideally using AI to provide tools for autonomous cloud operations. Our work looks into how deep learning, machine learning, distributed traces, graph analysis, time-series analysis (sequence analysis), and log analysis can be used to effectively detect and localize anomalous cloud infrastructure behaviours during operations, reducing the workload of human operators. These techniques are typically applied to large volumes of microservice observability data.

We create innovative systems for:

  • Service health analysis: Resource utilization (e.g., memory leaks), anomaly detection using KPI and logs
  • Predictive analytics: fault prevention, SW/HW failure prediction
  • Automated recovery: fault localization and recovery
  • Operational risk analysis: CLI command analysis
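As an illustration of the first item, anomaly detection on a KPI can be sketched with a rolling z-score. This is a minimal sketch only, not how iForesight works: the function name `detect_anomalies`, the window size, the threshold, and the sample latency values are all assumptions made up for this example.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate strongly from the preceding window.

    Hypothetical helper: a point is flagged when it lies more than
    `threshold` standard deviations from the mean of the previous
    `window` samples.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as an anomaly.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response-time KPI (milliseconds) with one obvious spike.
latency_ms = [100, 102, 99, 101, 100, 98, 103, 250, 101, 100]
print(detect_anomalies(latency_ms))  # [7]
```

A fixed window and threshold keep the sketch short; production systems usually have to handle seasonality, trends and noisy baselines, which this example ignores.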

We are currently developing the iForesight system, which is being used to evaluate this new O&M approach. iForesight 7.0 is the result of more than 6 years of R&D with the goal of providing an intelligent new tool for SRE cloud maintenance teams. It uses artificial intelligence to help them quickly detect, localize and predict anomalies when cloud services are slow or unresponsive. Much of our innovation and system development is done in collaboration with the Technical University of Berlin and the Huawei-TUB Innovation Lab for AI-driven Autonomous Operations.

Observability

Design of Cloud Monitoring Services for monitoring and managing the performance, health, and security of global cloud-based infrastructures using machine learning.

Failure Prevention

Design a global, centralized and scalable Cloud Log Service to collect, analyze, and manage petabytes of logs and event data generated by various cloud-based and on-premises systems.

Failure Prediction

Design systems for failure prediction of HDD, SSD, RAM, and optical network transceivers using machine learning.

Anomaly detection

Build a distributed Cloud Trace Service to follow and profile the execution of public cloud services’ requests as they travel across multiple infrastructure services, components, middleware, and systems in a public and private cloud.

Root-cause Analysis

Cloud Root Cause Analysis (RCA) refers to the process of identifying and understanding the underlying causes of issues or incidents that occur in cloud computing environments.
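One simple heuristic for this, shown here purely as an illustration, is to look at the service dependency graph and keep only those anomalous services that do not depend on any other anomalous service. The function `root_cause_candidates`, the service names and the dependency map below are hypothetical and are not taken from our systems.

```python
def root_cause_candidates(dependencies, anomalous):
    """Return anomalous services with no anomalous downstream dependency.

    dependencies: maps each service to the services it calls.
    anomalous: services currently flagged as anomalous.
    Hypothetical heuristic: a service whose dependencies are all healthy
    is a more likely origin than one that calls a failing service.
    """
    anomalous = set(anomalous)
    return sorted(
        service
        for service in anomalous
        if not any(dep in anomalous for dep in dependencies.get(service, []))
    )


# Made-up call graph: frontend -> orders -> {database, cache}.
deps = {
    "frontend": ["orders"],
    "orders": ["database", "cache"],
    "database": [],
    "cache": [],
}
print(root_cause_candidates(deps, {"frontend", "orders", "database"}))  # ['database']
```

The heuristic assumes failures propagate from callee to caller and that the dependency map is complete; real incidents often violate both assumptions, which is why RCA in practice combines traces, logs and metrics.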

Recovery

Recovery or mitigation of cloud failures involves the use of automated processes and tools to identify, respond to, and recover from failures or issues in a cloud computing environment.

  • To come

About me

After ~15 years of working for different industrial and academic research organizations (e.g., SAP Research, The Boeing Company, CCG/Zentrum für Graphische Datenverarbeitung, KIT, University of Dresden, University of Coimbra), Jorge joined Huawei Munich Research Center as a Chief Architect for Ultra-scale AIOps in April 2015 with the mission of building a new team to develop innovative solutions that use AI/ML to operate and troubleshoot HUAWEI CLOUD.

As a strategist, he leads the vision, technical planning and research innovation roadmaps for applied AI in IT operation and maintenance. As a chief architect, he designs and implements AI-driven systems and algorithms. As an engineering manager, he leads 3 teams in the fields of AIOps, Edge AI and AI for Networks.

Jorge enjoys his current role and is always seeking new technological challenges and breakthroughs in the fields of cloud computing, artificial intelligence and the Internet of Things.

In 2021, he co-founded the Huawei-TUB Innovation Lab for AI-driven Autonomous Operations. Jorge has authored over 180 scientific publications in top peer-reviewed conferences and journals in the fields of AI for IT operations, distributed systems, workflow management and semantic web (10000+ citations, h-index 45+), and holds 10 patents in related fields. He has served as an associate editor of IEEE Software since 2014. His latest book, Fundamentals of Service Systems, compiles results from his research work in 2010-2015. He created and, until 2009, led the development of the W3C Unified Service Description Language (USDL).

He participated in European, German, US, and National research projects financed by the European Commission (FP7, EACEA), the German Ministry for Education and Research (BMBF), SAP Research (SAP) and Portuguese NSF (FCT). He is a founding member of the IFIP Working Group 12.7 on Social Semantics.

He is also a Professor at the University of Coimbra, affiliated with the Information Systems Group. He has interests in the fields of Cloud Computing, AI, SRE, BPM, Semantic Web, Web Services, and Enterprise Systems (see Google Scholar, DBLP, and LinkedIn).

Jorge received his Ph.D. in Computer Science from the University of Georgia, USA, and B.S. and M.S. degrees with top honors in Informatics Engineering from the University of Coimbra, Portugal.

Random info

I discovered my passion for programming and computing by chance when I was 14. My first computer was a Timex Computer 2068, and BASIC was the first language I learned.

Our lab’s culture of innovation and R&D is based on 5 main guiding principles:

A good researcher says, “Let’s find out”, others say “Nobody knows”. When a good researcher makes a mistake, he says, “I was wrong”, others say “It wasn’t my fault”. A good researcher works harder than others and has more time. Others are always “too busy” to do what is necessary. [Unknown source]

Contact