Emulation, Simulation, & False Flags


There are many write-ups distinguishing emulation from simulation, but this one is mine (which is why I also added in “False Flags” for more flavorful fun).

First, let’s establish some stodgy academic definitions:

Emulation: (computing definition) reproduction of a function or action on a different computer or software system. This is the older word, in use in English since around the 16th century.

Simulation: imitation of a situation or process; the action of pretending; the production of a computer model for the purpose of study or learning. This is a newer word, first in widespread use during the mid 20th century.

False Flag: a covert operation designed to deceive by creating the appearance of a particular party or group being responsible for the activity, disguising the true source of responsibility.

All of those definitions are very close to each other, so it’s completely understandable that they would be confused. However, there is a landmine of a distinction to be made here: emulation implies an EXACTNESS to the copy, whereas simulation only implies SIMILARITY with some freedom to be different. For example, a video game emulator allows you to play old video game ROMs exactly the way they were in their original condition, while a simulator is a remake of the game using newer (or maybe just different) technology and development tooling.

In the CYBER world, this has direct implications for testing and training an organization to be ready for threat actors.


Emulating an adversary group means duplicating their behavior with precision and exactness. This could literally be the replay of TTPs (Tactics, Techniques, and Procedures) known to be attributed to the threat actor from intelligence gathered during incident response. These TTPs can almost be thought of as programmed behaviors, which means emulating them requires sticking to the program, in order and with no deviations, basically as a sort of replay attack.


Simulating an adversary group means becoming a composite of behaviors (TTPs) from one or more adversary groups, potentially with new behaviors (TTPs) not previously used by that adversary group. Think of it as having the flexibility to introduce some edits to the programmed TTPs from the collected intelligence.
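To make the distinction concrete, here is a minimal Python sketch of how you might label an exercise plan against one observed run of TTPs. The function name, the labels, and the example sequences are my own illustration, not taken from any intel feed; the technique IDs are real ATT&CK IDs, but the ordering is made up and not attributed to any actor.

```python
from typing import Sequence


def classify_exercise(observed: Sequence[str], planned: Sequence[str]) -> str:
    """Label a plan relative to one observed TTP sequence.

    'emulation'  : the plan replays the observed run exactly, in order.
    'simulation' : the plan shares some TTPs but edits, reorders, or adds others.
    'unrelated'  : the plan shares nothing with the observed run.
    """
    if list(planned) == list(observed):
        return "emulation"
    if set(planned) & set(observed):
        return "simulation"
    return "unrelated"


# Hypothetical observed run (order is invented for illustration).
observed_run = ["T1566", "T1059", "T1003", "T1021", "T1041"]

exact_replay = list(observed_run)
# Same run with one step swapped for a different technique.
edited_plan = ["T1566", "T1059", "T1053", "T1021", "T1041"]

print(classify_exercise(observed_run, exact_replay))  # emulation
print(classify_exercise(observed_run, edited_plan))   # simulation
```

Note that even a reordering of the same TTPs falls out as a simulation here, which matches the "no deviations" bar set above.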

Emulation in Practice

In practice, emulating is very hard. First, not all threat actors have publicly or privately available intelligence in the format necessary to complete all of the threat actors’ steps with the precision required to meet the definition. Second, even for those that do, certain key steps may be out of bounds, legally, for the person “replaying them” (such as compromising third party infrastructure). Third, the “programmed TTPs” were collected at a single point in time, and techniques that were used during that string of events may not be reused in the future by that threat actor, so replaying them with precision may not be that valuable of an exercise.

Let’s look at an example: suppose you wanted to emulate FIN7, and during your research you came across MITRE ATT&CK’s list of FIN7's known TTPs.

First problem: those are TTPs collected from a number of breaches over a course of 5+ years, and the members within the collective known as FIN7 have reportedly been fluid, so the techniques may be fluid as well.

Second problem: there isn’t a defined list of which TTPs are used in conjunction with each other — just that these were all used at one point in time by that actor group. At this point, just to continue pursuing emulation over simulation, you will likely have to acquire a private ($) intelligence feed to get a string of TTPs used in a single run event for a proper emulation run.

Third major problem: infrastructure. The best incident response programs today collect threat actors’ infrastructure metadata to inform response decisions. If you simply replay FIN7’s TTPs on a local VM inside your enterprise, it doesn’t look real and the traffic likely won’t traverse the correct portions of your egress network to generate telemetry. If you spin up a simple VM on a cloud host, that hosting provider may not match the infrastructure used by the threat actor, and that choice is most certainly as much of a TTP as their choice in execution methods, so your test is inexact and no longer an emulation. If you choose the proper hosting provider but do not use a DNS domain known to be used by that actor (do you even own a domain that the threat actor used to own?) and used for that particular phase of the TTP sequence, then it’s not exact, and therefore not an emulation. And if you do own that domain, can you keep your ownership of it a secret so that it remains associated and categorized with that threat actor for the purpose of correctly assessing your response program’s full sequence of steps and readiness? These infrastructure details are hard, but they matter. They matter more today than they did a couple years ago, and the trends in cyber intelligence suggest they’ll matter even more in the future.
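One rough way to keep yourself honest about infrastructure is to write down what the actor used next to what you plan to use, and list every field that differs. The sketch below is only that; the `InfraProfile` fields, provider names, and domains are hypothetical placeholders (the domains use the reserved `.example` TLD), and a real profile would carry far more detail.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class InfraProfile:
    """Hypothetical, heavily simplified infrastructure description."""
    hosting_provider: str
    c2_domain: str
    traverses_egress: bool  # does traffic cross the monitored egress path?


def infra_mismatches(observed: InfraProfile, planned: InfraProfile) -> List[str]:
    """Return the names of fields where the plan differs from what was observed."""
    return [
        f.name
        for f in fields(InfraProfile)
        if getattr(observed, f.name) != getattr(planned, f.name)
    ]


# Placeholder values, not real actor infrastructure.
actor_infra = InfraProfile("provider-a", "actor-owned.example", True)
exercise_infra = InfraProfile("provider-b", "redteam.example", True)

diff = infra_mismatches(actor_infra, exercise_infra)
print(diff)  # ['hosting_provider', 'c2_domain']
print("emulation-grade" if not diff else "simulation at best")
```

Any non-empty result means the exercise has already drifted from an exact replay, by the definition used in this post.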

Finally, suppose you get all of those details correct and perform the exercise, which should show your organization where its detections and response processes are strong and where they need improvement. What you learn is that your organization is ready for that actor’s actions as of the date the intelligence was collected, which by this point is likely a couple of years ago at least. Threat actors, just like defenders and other engineers, innovate and improve, which means their TTPs change, and your difficult (and likely expensive) emulation exercise doesn’t mean your organization is prepared for that threat actor today.

So, in practice, emulation is dead out of the gate. It’s simply unrealistic to pull off.

What most InfoSec people mean when they say “emulation” is to emulate a subset of the full TTP sequence. Maybe a subset of the behaviors, maybe ignore the infrastructure details altogether, or maybe similar in nature, but not exact. Interestingly, there is a word for that: simulation. It’s similar, but not exact, with leeway for variances for things that do not matter precisely.

That said, I can see a future where threat intel vendors supply a feed of TTPs in a sequence to a vendor whose product can replay them with precision from infrastructure that matches as well. In that future, it won’t be security professionals with offense backgrounds performing the assessments, either. It will be defenders responsible for unit testing detection telemetry. They may not understand nuances of the tradecraft chosen by the attackers — that’s OK. They’re shooting for a Pareto Principle (80/20 rule) security program: focus on the 20% of threat actors that generate 80% of the likely TTPs they will see in their environments. This future excites me, because it’s putting the knowledge and tooling that once stayed behind veiled paywalls from intel and offensive security vendors into the direct hands of the engineer doing the work. I believe the word du jour is democratizing — it’s democratizing cyber (ooh, the buzzwords in this paragraph).
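As a back-of-the-napkin illustration of that 80/20 idea, the sketch below greedily picks actors until a target share of all known TTPs is covered. The actor names and their TTP sets are invented, the 0.8 target is just the Pareto number from above, and greedy selection is my own choice of heuristic (it is a common approximation for set cover, not a guaranteed optimum).

```python
from typing import Dict, List, Set


def actors_for_coverage(actor_ttps: Dict[str, Set[str]], target: float = 0.8) -> List[str]:
    """Greedily choose actors until `target` share of all listed TTPs is covered."""
    all_ttps: Set[str] = set().union(*actor_ttps.values())
    if not all_ttps:
        return []
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(actor_ttps)
    while remaining and len(covered) / len(all_ttps) < target:
        # Pick the actor adding the most uncovered TTPs; ties go to the
        # alphabetically first name so the result is deterministic.
        best = max(sorted(remaining), key=lambda a: len(remaining[a] - covered))
        if not remaining[best] - covered:
            break  # nobody left adds anything new
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen


# Invented actors and TTP sets, for illustration only.
actor_ttps = {
    "ACTOR_A": {"T1566", "T1059", "T1003", "T1021"},
    "ACTOR_B": {"T1059", "T1041", "T1053"},
    "ACTOR_C": {"T1566"},
    "ACTOR_D": {"T1059", "T1003"},
    "ACTOR_E": {"T1547"},
}

print(actors_for_coverage(actor_ttps))  # ['ACTOR_A', 'ACTOR_B']
```

In this toy data, two of the five actors account for six of the seven TTPs, so those two are where a defender unit testing detections would start.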

There’s also the problem of nascent threat actors, for whom there are no intel feeds indicating their preferred TTPs. An enterprise should never discount the potential for one to pop up in its space for the first time.

All of this leads back to adversary simulation. If emulation in practice is extremely difficult, then the security professionals whose jobs are to train and assess an organization’s readiness to respond to threat actors must embrace some freedom to innovate or mutate their actions. The questions are:

· To what extent should the red team innovate?

· To what extent should the blue team concern themselves with detection coverage for novel techniques?

· Is my red/blue program even mature enough that any of this discussion matters?

Most organizations should just focus on that third question. It probably doesn’t matter. You may not have enough resources or investment to even determine if the red team exercise reflects the real world or not. The red team you hired may not know the difference between FIN7 and APT39. Maybe your blue team doesn’t either. That’s OK. Focus on the plausible, make a list, prioritize the list, and start working the list. Come back to this subject when elements of the red team exercise start to feel significantly different from what you read in breach reports.

If your red team has to innovate to achieve a level of “success” (defining what that means is another long topic for another day), consider that a badge of honor, while at the same time focusing on sharing intelligence information with them to educate them about the threat actors that attack organizations like yours. They simply may not realize how far they’ve strayed away from “normal” while focusing on their objectives. Be careful when addressing them, also, because they may have a hard time coming to terms with the reality that, well, they’re not chasing “reality” anymore. Rely on good team building, assuming and verifying positive intent in all communications, etc., to solve this and keep things moving forward.

Your red team probably will have to innovate on some level, if for no other reason than to innovate past initial access controls because initial access is hard. Yes, “assume breach” models where the blue team hands the red team some sort of seeded access to simulate a user clicking a phish, etc., can be helpful, but they’re an incomplete attack chain — there is no beginning for the responders to chase back to. This is like a partial or limited movement repetition in weightlifting; you’ll train part of the muscle but leave part of the muscle weak and untrained at the beginning or end of the movement when we need functional strength based on the whole movement.

The answer to the second question is probably the hardest. “It depends” is a great standby answer here, because it really does depend on the team, the resources, the time required to research and implement the control, etc. The bottom line is that if both red and blue are focused on improving the security posture, this question should be easy enough to answer.

False Flags

No good discussion like this would be complete without a twist.

To summarize: there’s value in emulation, but it’s basically impossible to be exact and threat actors mutate their behaviors over time. False Flags can be a way to address this.

Imagine having an intelligence-powered red team that is aware, at least on some level, of a threat actor that is of concern for the enterprise they are training. Let that red team take on the identity of a similar but competing group whose goal is to cast doubt on attribution by blaming another threat actor group. This is massively empowering, because it accepts up front that the red team will be simulating, not emulating (it won’t be exact). They will have their own tools and TTPs, but they will intentionally overlap certain TTPs, tooling, and infrastructure choices in order to blame another threat actor for their activity.

The result:

· The blue team will have to contend with the important TTP sequences, tooling, and infrastructure choices of top adversaries

· The red team will look more authentic and less “like a red team”

· The blue team’s intel processes will be challenged to spot the variances and decide whether this is the same threat actor, an imposter, or a brand new one

· All parties will learn more about real actors, breaches, TTPs, tooling, infrastructure, and more, which is always a win

· Rather than the red team feeling like they’re painted into a corner with what they can and can’t do with a simple TTP replay, they’ll have freedom to innovate

· All parties will likely have more fun with the exercise
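For the blue team side of that exercise, here is a small sketch of how observed indicators could be scored against a known actor profile to decide between "same actor", "imposter", and "new". Jaccard similarity is my own pick for the overlap measure, and the 0.8 and 0.3 thresholds, the function names, and the example sets are assumptions for illustration; real attribution weighs far more than a set overlap.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of items the two sets have in common (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def attribution_hint(observed: Set[str], profile: Set[str],
                     same: float = 0.8, overlap: float = 0.3) -> str:
    """Turn an overlap score into a coarse hint; thresholds are arbitrary assumptions."""
    score = jaccard(observed, profile)
    if score >= same:
        return "consistent with known actor"
    if score >= overlap:
        return "partial overlap: possible imposter or evolved actor"
    return "likely a different or new actor"


# Invented profile and red team activity; items could be TTPs, tools, or infra tags.
known_profile = {"T1566", "T1059", "T1003", "T1021", "T1041"}
false_flag_run = {"T1566", "T1059", "T1041", "T1053", "T1547"}

print(attribution_hint(false_flag_run, known_profile))
# partial overlap: possible imposter or evolved actor
```

A false flag red team is deliberately aiming for that middle band: enough overlap to point at the other actor, enough variance for a sharp intel process to question it.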

Give False Flags a shot.

Source: @malcomvetter
