Centralisation, repeatability, and automation in a modular SOC

2023-11-26  Cyber Security

The dictionary definition of “modular” leaves a little to be desired: “Employing or involving a module or modules as the basis of design or construction.” What that definition implies, and what I would make explicit, is that parts of the whole can be swapped out easily while the product as a whole keeps working.

Take Framework, for example – it bucks the trend for impenetrable, unrepairable, and uncustomisable laptops by introducing a modular design. Want a different processor? Or a keyboard with a numpad? Or even a coloured bezel around your screen? Easy. Everything simply clips in and out, meaning that your device can be adapted over time without the waste of abandoning it for a whole new model.

The same should apply to your security operations centre (SOC). You’ll always want additional functionality to adapt to advances in technology and changing business requirements, but you should build a foundation that allows for these more tactical changes without wasting time altering the fundamentals, or introducing inefficiency by tacking on extra processes for your team to remember.

CRA as an approach to a modular SOC

To achieve this, I try to keep three things in mind: centralisation, repeatability, and automation.

Centralisation

An initial priority should be to develop one queue to rule them all. Rather than having analysts jump between different security tools to look for detections, those detections should be centralised and integrated into a single pane of glass showing the current state of play across all platforms.
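As a rough illustration of what "one queue" can mean in practice, here is a minimal sketch that normalises alerts from two tools into a common shape and sorts them into a single list. Everything in it is an assumption made for the example: the `Detection` fields, the two source tools, their payload keys, and the severity scale are all hypothetical, and a real SOAR or SIEM would handle much of this for you.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Detection:
    source: str         # which tool raised it
    detection_id: str   # the tool's own identifier
    title: str
    severity: int       # normalised: 1 (low) to 4 (critical)
    raised_at: datetime


# Hypothetical severity vocabulary mapped onto one shared scale.
SEVERITY_MAP = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def from_edr(raw: dict) -> Detection:
    """Normalise an alert from a hypothetical EDR tool."""
    return Detection(
        source="edr",
        detection_id=raw["alert_id"],
        title=raw["name"],
        severity=SEVERITY_MAP[raw["severity"].lower()],
        raised_at=datetime.fromisoformat(raw["created"]),
    )


def from_email_gateway(raw: dict) -> Detection:
    """Normalise an event from a hypothetical email gateway (risk score 0-100)."""
    return Detection(
        source="email_gateway",
        detection_id=str(raw["id"]),
        title=raw["subject"],
        severity=min(4, 1 + raw["risk_score"] // 25),
        raised_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


def build_queue(detections: list[Detection]) -> list[Detection]:
    """Single queue: most severe first, then oldest first."""
    return sorted(detections, key=lambda d: (-d.severity, d.raised_at))
```

The point of the sketch is that each new tool only needs one small adapter function; the queue, and everything downstream of it, stays the same.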

This doesn’t just apply to detections, but also to communications with your users or customers and to any output from automation tools that must be reviewed. Storing all of this in central tickets or cases not only enables quicker responses to incoming messages and replies, but also ensures the information is easily available to analysts working on similar detections and queries in future.

This setup also means analysts themselves can be modular - not in such a way that individuals are viewed as interchangeable in their roles, but in a way that any ticket that enters (or returns to) the queue can be claimed by any available analyst, empowering them to work more efficiently and exposing them to a greater variety of scenarios. For example, by picking up a reply to an existing ticket, a more junior analyst has the opportunity to learn about the analysis performed by the original owner.

Repeatability

Supporting the technology should be processes, templates, and so on that ensure work is completed consistently and that give less experienced analysts documentation to refer to for unfamiliar case types. Templates are huge timesavers for any content that analysts must compose multiple times per day, and pre-cooked log queries and scripts can help to cut the time spent on common investigations.
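To make the template idea concrete, here is a small sketch using `string.Template` from the Python standard library. The wording of the reply and the field names (`name`, `subject`, `verdict`, `action`, `ticket_id`) are invented for the example; most ticketing tools have their own templating feature that would do the same job.

```python
from string import Template

# Hypothetical template for replying to a user who reported a phishing email.
PHISHING_REPLY = Template(
    "Hi $name,\n\n"
    "Thanks for reporting the email with the subject \"$subject\". "
    "We have confirmed it as $verdict and $action.\n\n"
    "Ticket reference: $ticket_id\n"
)


def render_reply(**fields: str) -> str:
    # substitute() raises KeyError if a field is missing, so an incomplete
    # reply fails loudly rather than being sent with a gap in it.
    return PHISHING_REPLY.substitute(**fields)
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice here: a half-filled message reaching a customer is worse than an error in front of the analyst.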

Centralisation helps here as it reduces the number of processes required - for example, if closing a case in the security orchestration, automation, and response (SOAR) tool automatically closes detections via an API, analysts only need to know how to do that, and not how to resolve detections in tools A, B, and C.
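A sketch of that idea, under stated assumptions: the `DetectionSource` interface, the `resolve` method, and the shape of the `case` dictionary are all hypothetical, and `RecordingSource` is a stand-in that records calls rather than talking to any real product's API.

```python
from typing import Protocol


class DetectionSource(Protocol):
    """What the SOAR needs from each integrated tool (hypothetical interface)."""

    def resolve(self, detection_id: str, note: str) -> None: ...


class RecordingSource:
    """Stand-in for a real API client; records calls instead of making them."""

    def __init__(self) -> None:
        self.resolved: list[tuple[str, str]] = []

    def resolve(self, detection_id: str, note: str) -> None:
        self.resolved.append((detection_id, note))


def close_case(case: dict, sources: dict[str, DetectionSource]) -> list[str]:
    """Close a case and resolve each linked detection in its originating tool.

    Returns the detections that could not be resolved, so that they can be
    flagged for manual follow-up instead of being silently left open.
    """
    failed = []
    for source_name, detection_id in case["detections"]:
        client = sources.get(source_name)
        if client is None:
            failed.append(f"{source_name}:{detection_id}")
            continue
        client.resolve(detection_id, note=f"Closed via case {case['id']}")
    case["status"] = "closed" if not failed else "closed_with_errors"
    return failed
```

The analyst only ever calls `close_case`; which tools sit behind it, and how each one expects a detection to be resolved, is a detail of the integration rather than something to memorise.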

By this point, onboarding a new tool should be as simple as performing an API integration and producing response processes for any new detection types. Once detections and tickets are centralised and processes are recorded, the workflow for each can become so standardised that it may even be possible (in theory, at least) to map out the entire analyst-facing process into a single flowchart.

Automation

A great way to save analysts’ time is to automate basic, often-repeated response processes. Even before that, though, the easiest and most impactful automation workflows are often those associated with your SOC’s housekeeping - agent health checks, weekly metrics, sweeps for unresolved detections, and so on.
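Housekeeping jobs like these are often only a few lines long. As an example, here is a minimal sketch of an agent health check, assuming you can already pull a mapping of hostname to last check-in time from your endpoint tooling. The function name, the input shape, and the 24-hour threshold are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_agents(
    last_seen: dict[str, datetime],
    now: datetime,
    max_silence: timedelta = timedelta(hours=24),
) -> list[str]:
    """Return hostnames whose agent has not checked in within max_silence."""
    return sorted(
        host for host, seen in last_seen.items() if now - seen > max_silence
    )


# Example run with made-up hosts and timestamps.
now = datetime(2023, 11, 26, 12, 0, tzinfo=timezone.utc)
seen = {
    "web01": now - timedelta(hours=2),
    "db01": now - timedelta(days=3),
}
overdue = stale_agents(seen, now)
```

Run on a schedule, the result could be written to a ticket in the central queue so that it is reviewed alongside everything else.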

Further down the line, of course, your automation suite will extend into more complex and specialised processes - automatic triage and remediation of commonly seen types of malware, for example. At this point the testing process becomes critical, because a wrong step during eradication can be disastrous.

You might be thinking, “How can I ever fully trust the automation logic to make the right call all the time?” But to this I’d argue that even partial automation is more efficient than none at all. Sending output to a ticket for an analyst to review, or running a triage script as a first step to provide the basis for an analyst to perform remediation on the discovered artefacts, still saves time and makes people’s jobs easier.
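One way to picture partial automation is a triage step that only acts on its own when it is very sure, and otherwise writes its findings to the ticket for a person to review. The sketch below assumes a hypothetical `Ticket` object and artefacts that carry a `path` and a `confidence` score between 0.0 and 1.0 from an earlier triage script; the 0.95 threshold is arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    ticket_id: str
    notes: list[str] = field(default_factory=list)
    needs_review: bool = False


def triage(
    artefacts: list[dict], ticket: Ticket, auto_threshold: float = 0.95
) -> list[dict]:
    """Split artefacts into those to auto-remediate and those for review.

    Returns the artefacts that met the threshold. Everything is noted on the
    ticket, and anything below the threshold marks it for analyst review.
    """
    auto = []
    for artefact in artefacts:
        if artefact["confidence"] >= auto_threshold:
            auto.append(artefact)
            ticket.notes.append(f"Auto-remediation queued: {artefact['path']}")
        else:
            ticket.needs_review = True
            ticket.notes.append(
                f"Analyst review needed: {artefact['path']} "
                f"(confidence {artefact['confidence']:.2f})"
            )
    return auto
```

Setting `auto_threshold` above 1.0 sends everything to an analyst, which is a sensible starting point until the logic has earned some trust.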

Adapting and expanding

Once the above has been implemented, even to a basic level, adapting SOC operations to new challenges becomes far easier than it would be in a more manual, less consolidated environment.

Both IT and cyber security are rapidly changing fields, and therefore it makes sense to build a SOC that is agile and adaptable to those changes. By structuring your SOC with modularity in mind, you’re putting yourself and your team in the best position to respond swiftly when change inevitably occurs.
