mnemonic security podcast

Reverse Engineering

mnemonic

To kick off 2025, Robby chats with Duncan Ogilvie, a renowned expert in Reverse Engineering (RE), the creator of x64dbg (a popular open-source x64/x32 debugger for Windows), and the mind behind 100+ other cool projects.

Their conversation covers the evolving field of RE, discussing common challenges, practical techniques, and how professionals navigate the landscape. Duncan also shares his insights on the current tools shaping the field, explores the role of "AI" in RE, and speculates on what the future might hold for the industry niche.

Listeners will also get a sneak peek into Duncan’s upcoming course, scheduled for February 20-21 in Oslo. The course will focus on using LLVM for binary analysis and is designed to help intermediate reverse engineers sharpen their skills. If you’re interested, sign up here!
https://www.mnemonic.io/resources/events-webinars/exclusive-training-with-duncan-ogilvie-LLVM-IR-and-binary-lifting/

Speaker 1:

From our headquarters in Oslo, Norway, and on behalf of our host, Robby Peralta, welcome to the Mnemonic Security Podcast.

Speaker 2:

To a guy like me, reverse engineering is like the definition of technical. Whether it's decoding malware or just trying to figure out how a specific type of software works, you kind of have to have knowledge of programming languages, operating systems, computer architecture and, in general, just a solid foundation in security and networking. There are, of course, tools available that can help you, things like disassemblers, debuggers and whatever a hex editor is, but I would guess that the most important thing of all is to be genuinely curious about whatever it is you're looking at, kind of like the bad guys are when a vendor releases a patch. So if you enjoy puzzles, deciphering obfuscated code, exploring evasion techniques, or are simply curious about what reverse engineering means in a cyber context, hopefully this episode will be enlightening for you as well. Duncan Ogilvie, welcome to the podcast.

Speaker 1:

Yeah, thanks. Thanks.

Speaker 2:

We have quite a task ahead of us today. We're supposed to talk about reverse engineering over a podcast.

Speaker 1:

Yeah, it's going to be interesting, for sure.

Speaker 2:

And, to complicate things even more, we had a big Christmas party last night, so my brain is like half working. So, Duncan, tell us a bit about yourself.

Speaker 1:

Yeah, so I'm Duncan, I'm 28 years old at this point and, well, I got into reverse engineering when I was 14.

Speaker 2:

Wow.

Speaker 1:

I was always interested in computers. You know, ever since I was a child, I was building fake computers. Well, they were real parts, but they were broken parts, and I had printouts of fake screens and, you know, I was always into it. So, yeah, that's my story, I guess.

Speaker 2:

Cool. Were you modding games and stuff, or what was your definition of reversing back then?

Speaker 1:

Yeah, no, I was into key generators, basically software piracy. I was super interested in it, so that's where my interest started. You know, at some point I downloaded this key generator, and there was this cool music and these cool graphics. So then I wanted to understand, right, how does it work, how do you get into that? And yeah, that's how I started.

Speaker 2:

And look at you now, 28 years old, and I hear from my colleagues you're like a god within reverse engineering. So congratulations, not even 30 yet.

Speaker 1:

I'm just glad to not be on the Forbes 30 Under 30 list, because all of them are frauds, right?

Speaker 2:

Well, be careful what you say. You still got two more years, you know. Fast forward to today: I know that you're coming to mnemonic here in, I don't know, a couple of months, to hold the reverse engineering course.

Speaker 1:

Yeah, yeah, end of February.

Speaker 2:

So is that what you do for your day job? Do you just travel around teaching people how to do reversing?

Speaker 1:

No, I wish, I wish. That'd be really cool. No, that's something I'm doing on the side. For my day job, I work in mobile security, so, well, reverse engineering is part of that. But basically I try to find ways to detect rooting, to detect abnormal environments, I suppose. Let's say you're trying to do a bank transaction, and the bank wants to know that this transaction came from a trusted environment: a real user initiated it, the device was not rooted, and there were no hooking tools injected into the app or anything. So yeah, that's, I guess, my job right now.

Speaker 2:

What is your definition of reverse engineering?

Speaker 1:

Reverse engineer. Well, I guess the definition is in the word in a way. So you know, when you make something, you engineer it. So the reverse engineering is that you well do that in reverse. You try to figure out well how something works. That's like one part. Or also sometimes you know how it was made like, how it was engineered.

Speaker 1:

There's also reverse engineering on hardware, you know, or even on physical structures. You could reverse engineer those, right? You could saw one open and try to figure it out. But for me, reverse engineering is more about software. So let's say you get a binary that, I don't know, maybe does something malicious, and you want to know what it actually does. Then the process of reverse engineering is trying to figure that out just from the binary.

Speaker 1:

So you disassemble it, you try to find, let's say, strings in it. You try to find behavioral patterns, what it might be doing. That whole thing is reverse engineering, and there's a lot to it. You have static analysis, which is when you just look at the code and try to figure it out. You have dynamic analysis, where you actually run it, maybe in the environment it's supposed to run in, or maybe in a virtual environment, and then you try to figure it out that way. So it's a very broad topic, but I guess that's what it is.

Speaker 2:

What do you do? Do you do both dynamic and static? Do you sometimes actually just look at the code and understand what it does?

Speaker 1:

Yeah, sure, sure. You do a bit of everything. It depends on the task. Sometimes you cannot run the binary because, I don't know, let's say you just have the binary but you don't have the hardware that it's supposed to run on. Then you cannot run it, so you need to look at it statically. You have tools like IDA Pro and Binary Ninja where you can load those binaries, and they help you, basically step by step, to understand the bigger picture of what's going on.

Speaker 2:

And you also created some of your own tools. I had a podcast a long time ago, when I was 28 probably, where I was asking people what their favorite open source tools are, and one of my friends at our biggest bank in Norway named your tool. It was x64dbg.

Speaker 1:

Oh, really? Yeah, yeah. So that's a project that I started at the end of high school.

Speaker 2:

No way.

Speaker 1:

Yeah. I had to pick a project to do, basically, and I was already kind of working on that a bit. So I had to pick between piano tuning, because I wanted to learn how to tune a piano, and this, to, I don't know, make the debugger or somehow make something out of it, right? So I chose that, and actually my German teacher was my supervisor. He didn't understand anything on a technical level, basically, but he helped me to plan it. And so I spent the whole summer on it, I think it was two months or something, full time. My parents went on vacation with my siblings, I think, and I just had the living room table. I put my PC there and I was working the whole day, for the whole vacation, on it. So yeah, that's how that started.

Speaker 2:

You better have gotten an A on that.

Speaker 1:

I actually didn't.

Speaker 2:

What?

Speaker 1:

Yeah, I got 70%, I think, or 75%.

Speaker 2:

That is hilarious. It's just funny how the world works out, isn't it?

Speaker 1:

Yeah, the grade isn't what mattered in the end, right? It's the project.

Speaker 2:

I think you should send this podcast to your teacher from back then. Maybe he'll understand the level of reversing.

Speaker 1:

I will.

Speaker 2:

Yeah.

Speaker 1:

Yeah, so it was kind of, I don't know, lucky timing or something, because back then there was no 64-bit version. There was this debugger called OllyDbg, which was a 32-bit-only debugger, and 64-bit processors and operating systems had just started to come out. So there was kind of this gap, at least in my view. There were a few alternatives: Windows has a debugger called WinDbg, and of course that one supports 64-bit Windows because, you know, Microsoft made Windows and they used it. But yeah, there was kind of this gap, people started using x64dbg, and it just grew from there organically. It was all more or less an accident, I guess.

Speaker 2:

While we're there like a debugger. Can you explain that concept?

Speaker 1:

Yeah, sure, sure. So a debugger is a tool that runs kind of on top of the regular executable. You have an executable, and the debugger runs on top and gets information: okay, now this executable loaded some module, or now it hit this breakpoint. So you can, how to say, I guess it's difficult to explain. I knew this was going to happen.

Speaker 1:

A debugger is a tool that allows you to introspect what happens in a different program on your computer. And introspecting means that you can also say, okay, at this point I want to pause; that's called a breakpoint. So let's say you know that there is some malicious network communication going on. You can put a breakpoint on the function that does the communication, and when the program hits that point it will stop and give control to the debugger. Then you can look in the memory and see, oh, maybe there is some malicious payload being downloaded, or maybe it's nothing, right? That depends on your use case. But yeah, it's for that.
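
Editor's sketch (not from the conversation): the breakpoint idea can be illustrated in pure Python with the standard `sys.settrace` hook. The names `run_with_breakpoint`, `send_payload`, and `program` are made up for this example, and a native debugger such as x64dbg works on compiled machine code rather than on Python frames, so treat this only as an analogy.

```python
import sys


def run_with_breakpoint(target, breakpoint_name, *args):
    """Run target(*args) and record the arguments of every call to a
    function named breakpoint_name, like a breakpoint that logs and resumes."""
    snapshots = []

    def tracer(frame, event, arg):
        # The interpreter reports a "call" event whenever a new function starts.
        if event == "call" and frame.f_code.co_name == breakpoint_name:
            # The "debugger" has control here and can inspect the paused frame.
            snapshots.append(dict(frame.f_locals))
        return None  # no line-by-line tracing inside the function

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = target(*args)
    finally:
        sys.settrace(previous)  # always detach the "debugger" again
    return result, snapshots


def send_payload(host, payload):
    # Hypothetical stand-in for a function that does network communication.
    return len(payload)


def program():
    return send_payload("example.invalid", b"hello")


result, hits = run_with_breakpoint(program, "send_payload")
```

Here `hits` holds one snapshot of the arguments that `send_payload` was called with, which is the kind of information you would look at after a real breakpoint fires.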

Speaker 1:

x64dbg is made for debugging binaries. Normally it's called a debugger because the idea is that it helps you to remove bugs from your software. So when you talk about a debugger, it's usually more that you have some C++ code and something is not quite working, and you use a debugger to figure out what's going on. But x64dbg is optimized to work when you don't have source code. So that makes it a reverse engineering tool.

Speaker 2:

Yeah, interesting. It's kind of like a recorder then. Basically it just records what's going on and allows you to pause time and go dig into stuff, I guess.

Speaker 1:

Well, that also exists, but x64dbg doesn't record. It's a live thing. So if you were to mess something up, you would have to restart the execution from the beginning yourself. There is also this thing called time travel debugging, which is exactly what you said, where you take a recording of the execution. And very often, let's say, software bugs never happen on the developer's machine. I mean, they also happen there, but usually they happen in some niche environment, right? So then you can ask someone, hey, can you record the execution? And you can take that recording and do what you said: step back and forward in time, and maybe find the bug that way. So that also exists, and it's a really powerful tool. But x64dbg doesn't support that.

Speaker 2:

Yeah, well, you can always do another summer project at your parents' house during vacation. Kind of off topic, though, but when you send a crash report to Apple, that's basically what you were just talking about, right?

Speaker 1:

Almost. A crash report usually just records the memory that's close to the crash. So it doesn't actually record a time series of all the stuff that happened, just the state at the crash. It maybe dumps the stack, the register state, maybe some memory. But usually those crash reports are rather small. If you were to save the whole state of the whole PC and you have 64 gigabytes of RAM, you can imagine that it starts to become very big. So usually the crash reports are smaller, and they're scoped to just that crash.

Speaker 2:

Interesting. I was always skeptical of sending crash reports, because I'm like, I'm not sharing shit with you. But now I kind of realize that the team at Apple, right, if they get a crash report and they've never seen it before, somebody's going to dig into that and be like, what happened here? Just because they're probably worried.

Speaker 1:

Yeah, for sure, for sure, at least they will try. I mean, you know, it's hard when you have millions and millions of devices and you get millions and millions of crash reports. But when you have a crash report, you also record the call stack. So it's like, it crashed in this function, but that function was called by another function, and another, and there's this whole list of functions that was involved in it. So they also do statistical analysis. If they get one crash report only once with this specific call stack, they're like, okay, whatever, right, maybe it was something. But if they get hundreds of thousands of some kind of crash report, they try to fingerprint it, and then they can focus their effort. So that's why developers always want crash reports and telemetry, because they can see what is important to them.
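
Editor's sketch (not from the conversation): grouping crash reports by a fingerprint of their call stack could look roughly like this. The function names in the sample stacks are invented, and real crash pipelines normalize stacks far more carefully (symbolication, ignoring runtime frames, and so on), so this only shows the counting idea.

```python
import hashlib
from collections import Counter


def fingerprint(call_stack):
    """Hash the function names of a call stack into a short bucket id."""
    joined = "|".join(call_stack)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


def rank_crashes(reports):
    """Count reports per fingerprint, most frequent bucket first."""
    counts = Counter(fingerprint(stack) for stack in reports)
    return counts.most_common()


# Made-up reports: each one is the list of functions on the stack at the crash.
reports = [
    ["main", "load_image", "decode_png"],
    ["main", "load_image", "decode_png"],
    ["main", "save_file"],
    ["main", "load_image", "decode_png"],
]
ranking = rank_crashes(reports)
```

The bucket at the top of `ranking` is the crash seen most often, which is where a team with limited time would look first.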

Speaker 2:

So yeah, interesting. And I guess, depending on who that specific person or client is, it will maybe go to a different team at Apple, like, we got a crash report from a nuclear defense agency or whatever, and they're like, okay, maybe we'll take it. So Apple has a bunch of people, your friends, I guess, that are looking at interesting things, right?

Speaker 1:

I don't know anyone who works at Apple on that, but yeah, for sure. And also, you know, those crash reports are very anonymized, so you can share them. It's not like they're going to contain your credit cards or anything like that.

Speaker 2:

So, reverse engineering: if there's a patch, right, it goes really quickly from when a CVE comes out to when the threat actors are hopping on it, and they're reverse engineering a patch at that point, right? Can you just explain that one for me, or how do you look at that?

Speaker 1:

Sure, yeah, they might do that. So if you have two versions of a binary, and you know that in this version there was this kind of vulnerability and in that version it was fixed, then you try to diff those two binaries. And that's also a form of reverse engineering, for sure, because it's not that easy to just diff two binaries, right? All the code moves around. If you add some more code in a function higher up, then everything after it moves. So to do that in a consistent way, you need reverse engineering tools to analyze the binary first, and you need to try to find the functions. Sometimes you have the names of them, which makes it easier, but if you don't have the names of the functions, it becomes even more tricky. So, yeah, that's also reverse engineering.
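
Editor's sketch (not from the conversation): a very reduced version of patch diffing, assuming an analysis tool has already split both versions into named functions. The function names and byte strings below are arbitrary placeholders. Comparing per-function bodies instead of file offsets sidesteps the "everything moves" problem in this toy setting; real tools also have to match unnamed functions and ignore addresses embedded in the code.

```python
import hashlib


def digest(code):
    """Hash a function body so equal bodies compare equal wherever they sit."""
    return hashlib.sha256(code).hexdigest()


def diff_functions(old, new):
    """old and new map a function name to the raw bytes of its body."""
    shared = old.keys() & new.keys()
    return {
        "changed": sorted(n for n in shared if digest(old[n]) != digest(new[n])),
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
    }


# Placeholder data standing in for two versions of the same program.
old_version = {
    "parse_header": b"\x55\x48\x89\xe5\x90\xc3",
    "checksum": b"\x31\xc0\xc3",
}
new_version = {
    "parse_header": b"\x55\x48\x89\xe5\x48\x83\xf8\x10\xc3",
    "checksum": b"\x31\xc0\xc3",
    "validate_length": b"\x48\x83\xff\x40\xc3",
}
report = diff_functions(old_version, new_version)
```

In this made-up case the report points at `parse_header` and the new `validate_length`, which is where someone studying the patch would start reading.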

Speaker 1:

And yeah, for sure, I guess threat actors are doing that. I don't know any threat actors in that way, but yeah, for sure: if they know that there was some kind of vulnerability patch, then they will try to diff those two versions and try to find the vulnerability. But very often they're very complicated vulnerabilities; that's also something to add. It's not like the developers added a check that says, if it's vulnerable, don't do it. Usually there are huge chains of exploitation involved. So if you know what was changed, it doesn't necessarily mean that you can reproduce the vulnerability. Especially on modern systems, with all the mitigations and all of that, it's not that easy. You need to be an expert in that field, I guess, to maybe see it.

Speaker 2:

Yeah, so those damn bad guys, they're actually pretty good sometimes. By the way, a really dumb question, but a binary, that's basically just a bunch of code doing something, right?

Speaker 1:

Yeah, well, okay. So when I say a binary, usually what I mean is an exe file on your Windows machine. That is a bunch of code, but it also has information; it's kind of a package. You know, when you run an exe file on your computer, it doesn't just start executing the code at the very beginning and then go through it. There's a file format that explains that there are different sections, so there's some data section, let's say, with some text in it or whatever, and the format also says, here's the entry point where you should start executing the code. So it's like a container, I suppose. That's what a binary is, and there are different ones for macOS, for Linux, for Windows, for Android. They all have different formats, and there are loads of those formats.
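
Editor's sketch (not from the conversation): the "container" idea can be made concrete for the Windows PE format, whose header stores, among other things, the number of sections, a build timestamp, and the entry point. The helper names and the synthetic header below are invented for this example; the field offsets follow the published PE layout, but a real parser should validate sizes and handle far more cases.

```python
import struct


def parse_pe_header(data):
    """Read a few well-known fields from the start of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    machine, section_count, timestamp = struct.unpack_from("<HHI", data, coff)
    # The optional header follows the 20-byte COFF header;
    # AddressOfEntryPoint sits 16 bytes into it.
    (entry_point_rva,) = struct.unpack_from("<I", data, coff + 20 + 16)
    return {
        "machine": machine,
        "section_count": section_count,
        "timestamp": timestamp,
        "entry_point_rva": entry_point_rva,
    }


def build_demo_header():
    """Build a tiny synthetic header (not a runnable program) for the demo."""
    dos = bytearray(0x40)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    coff = struct.pack("<HHIIIHH", 0x8664, 3, 1700000000, 0, 0, 0xF0, 0x22)
    optional = bytearray(0xF0)
    struct.pack_into("<H", optional, 0, 0x20B)    # PE32+ magic
    struct.pack_into("<I", optional, 16, 0x1000)  # AddressOfEntryPoint
    return bytes(dos) + b"PE\0\0" + coff + bytes(optional)


info = parse_pe_header(build_demo_header())
```

Note that the timestamp is just a number that whoever built the file wrote into the header, so it can be set to anything.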

Speaker 2:

So yeah, okay, interesting. And the bad guys just put bad stuff in a binary, like an executable?

Speaker 1:

Yeah, well, sure. If you make a piece of malware that, let's say, is a remote access trojan, a remote access tool on your computer that can watch your screen, then, yeah, they would have to have some kind of executable. So one way is that you just write some code that takes a screenshot every five seconds and uploads it to some server, then you build that into an executable file, and then someone runs it. So, yeah, that could be a binary. But malware authors can also hide their code in legitimate binaries. So you have, let's say, Google Chrome or whatever, which is a very trusted thing. Sometimes malware authors will inject some code in there and then try to make someone run it, essentially. So, yeah, there are different ways to deliver your...

Speaker 2:

Yeah, one of my colleagues was telling me, because we have a reverse engineering team at mnemonic, right, and he said something that surprised me. He said most malware doesn't get reverse engineered. And in my head I was like, well, how does CrowdStrike, or how do these EDR tools, actually stop these things if they don't understand them? Do you agree with what he said, or how do you look at that?

Speaker 1:

I mean, like I said, I'm not working for an antivirus or EDR vendor, so I don't know for sure what CrowdStrike or anyone is doing. But I think what he meant, or probably what is happening, is that, yeah, a lot of malware is not reverse engineered by hand. So you have systems that are trying, for example, to emulate this executable, just to run through it in a completely virtual environment, like a virtual machine or something.

Speaker 2:

Is that called a honeypot? And then they look for honey?

Speaker 1:

Well, no, not exactly. A honeypot is where you purposefully try to get people to log into your system, and then they do bad things, but it's a fake system. That way you get their payloads, whatever they want to inject. So it's a different thing. But I think what he meant is that a lot of stuff gets analyzed by automated systems. So you have, let's say, some kind of emulation where you can look at all the interactions that the malware has with the system, and from that you can make some heuristic decision: okay, it looks kind of bad, so probably we should report it or throw an alert or something. Yeah, that might be what he meant.

Speaker 2:

Yeah, that makes sense, why would you have to reverse engineer something by hand? You know.

Speaker 1:

Well, you have to reverse engineer something by hand because you cannot make a tool to do something automatically if you cannot do it by hand. And you have this escalation. Malware very often has layers of packing, you know, it packs itself into different layers, so sometimes you even have like eight layers of packing. And let's say that your automated system fails on layer three and says, okay, I don't know, this is something that I cannot handle. Then it would go to an actual reverse engineer, who then has to look at it by hand, maybe unpack it. And usually they will also report back to the authors of the system, right? They will say, okay, it wasn't working here, but we think that we can improve the automated system. It's this kind of feedback loop, so you need a human there to improve it.
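
Editor's sketch (not from the conversation): the escalation loop can be mimicked with a toy unpacker that peels known layers and hands the sample to a human when it meets one it does not recognize. Everything here is an assumption made for illustration: real packers do not label their layers, and the two-byte tags (`Z:`, `B:`, `X:`, `P:`) are invented so the example stays short.

```python
import base64
import zlib

# Layers the automated system knows how to undo, keyed by a made-up tag.
HANDLERS = {
    b"Z:": zlib.decompress,
    b"B:": base64.b64decode,
}
PAYLOAD_TAG = b"P:"  # made-up marker for "fully unpacked"


def unpack(blob, max_layers=16):
    """Peel known layers; report needs_human when a layer is unknown."""
    peeled = []
    for _ in range(max_layers):
        tag, body = blob[:2], blob[2:]
        if tag == PAYLOAD_TAG:
            return {"status": "unpacked", "layers": peeled, "data": body}
        handler = HANDLERS.get(tag)
        if handler is None:
            # Unknown layer: stop and escalate to a reverse engineer.
            return {"status": "needs_human", "layers": peeled, "data": blob}
        blob = handler(body)
        peeled.append(tag.decode())
    return {"status": "needs_human", "layers": peeled, "data": blob}


inner = PAYLOAD_TAG + b"take_screenshot()"
two_layers = b"B:" + base64.b64encode(b"Z:" + zlib.compress(inner))
auto = unpack(two_layers)

# Add a layer the tool has never seen (a simple XOR) under a known one.
xored = b"X:" + bytes(b ^ 0x5A for b in two_layers)
escalated = unpack(b"Z:" + zlib.compress(xored))
```

Once a human works out the XOR layer, adding a handler for it to `HANDLERS` is the "report back and improve the system" step from the feedback loop.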

Speaker 2:

Yeah, exactly, to teach the machines. I'm not going to bring out the AI question yet, but it's coming.

Speaker 1:

I have to do that. Everybody wants to know about AI.

Speaker 2:

Exactly. Anyway, questions from my colleagues, actually: the most common evasion techniques that you've observed recently?

Speaker 1:

Evasion techniques. Yeah, so again, I'm not working in the malware space, I only observe from the outside, but I think a trend that I'm seeing, which is also related to my course, coincidentally, is that malware is getting more sophisticated when it comes to code obfuscation. Let's say five years ago, most of the malware that you looked at was just very easy to reverse engineer. You know, it's just some stealer. It tries to grab your Chrome cookies, it tries to grab some other stuff, and you just look at the code and read it. They make lots of mistakes, as always. And I think that in the past five years or so it's become much more common that they will use code obfuscation techniques, quite advanced ones, to, well, obfuscate what they're doing, so you cannot just read the code.

Speaker 1:

Yeah, and also there's kind of a sharing going on there, I think. So you have those dark web market boards, and people are selling obfuscation toolkits to each other, and there's a lot of sharing, usually with some profit, going on. So, yeah, that's definitely a trend that I see. And when it comes to evasion, I think that's a technical term, like how do you hide from your EDR? I don't do that, that's not my job, so I cannot answer that specifically. But I do think that code obfuscation is a topic that's going to get bigger in the future, because the knowledge about it is becoming more widespread. So yeah.

Speaker 2:

Code obfuscation. Can you explain that? What do they do? They can't just blur it out on the screen, right?

Speaker 1:

That's not what code obfuscation is.

Speaker 2:

So how do they do it?

Speaker 1:

How do they do it? Well, there are a lot of mathematical techniques, I guess. But to keep it very simple, let's say you want to calculate a plus b. Then you could calculate a plus 5 billion 325 plus b minus 5 billion 325, right? The end result is still a plus b, but you made it syntactically more complex to read. The behavior is the same, but you try to muddy it with more layers.

Speaker 1:

Or you just insert instructions that do not affect the outcome of the program. So you insert a bunch of instructions that are doing a lot of computations, and in the end the whole result is thrown away and you actually do a plus b. You might say, okay, you would see that, right? But imagine that a function that does something is 300 bytes long, so that's a small function. You could make it 30 megabytes of instructions, with fake branches to other functions or fake calls. You can also encrypt the code. So there are lots of different techniques to turn that very simple function into megabytes of data, and as a human you cannot do much with that. Maybe if you look at a very zoomed-in level you can see one technique, right? Oh, here they add five and then later they subtract five; okay, I see what they're doing. But you can imagine that if this scales up, it becomes much harder to analyze.
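
Editor's sketch (not from the conversation): the two tricks described here, adding and later removing a constant, and inserting computations whose result is never used, look like this at source level. The constant `K`, the junk arithmetic, and the function names are arbitrary choices for the example; real obfuscators work on compiled code and stack many such transformations.

```python
K = 5_000_000_325  # arbitrary constant, added and later subtracted again


def add_plain(a, b):
    return a + b


def add_obfuscated(a, b):
    # Junk computation: looks busy, but nothing below depends on it.
    junk = (a * 31) ^ (b << 3)
    junk = junk * junk + 7

    t = a + K
    # Fake branch: junk * 0 is always 0, so this path is never taken.
    if junk * 0 != 0:
        return -1
    t = t + b
    return t - K  # the constant cancels out, leaving a + b


samples = [(0, 0), (1, 2), (-7, 40), (10**12, -3)]
same_results = all(add_plain(a, b) == add_obfuscated(a, b) for a, b in samples)
```

Both functions return the same value for every input; only the second one makes a reader work to find that out.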

Speaker 1:

So that's the kind of obfuscation that I'm talking about.

Speaker 2:

Cool. And so, yeah, a person would be pulling their hair out trying to understand everything. I guess for a machine that's a little easier, because maybe it'll just do all those things? Or how does that work from a machine perspective?

Speaker 1:

Yeah, that's the difficult part, I guess. That's a whole field of study, and it is actually not that easy to make a system that will generically remove such obfuscation. Mathematically the obfuscated code does the same thing, but that doesn't mean you can find the simplest form of it; that's a difficult problem to solve. So when it comes to that, you have different techniques. You have symbolic execution, where you can try to go through the code and prove things. Let's say here we have a jump that goes to some other function. Then we can try to prove that this jump will actually never happen, and then you can remove it. So you can kind of, yeah, you can build an automated system.
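
Editor's sketch (not from the conversation): proving that a branch can never go one way is what a symbolic execution engine or solver does by reasoning about the condition. As a much simpler stand-in, the example below just tries every 8-bit input, which only works because the input space is tiny. The predicates and helper names are invented for illustration.

```python
def branch_outcomes(predicate, bits=8):
    """Return the set of outcomes a branch condition can take over all
    inputs of the given width (brute force, feasible only for tiny widths)."""
    return {bool(predicate(x)) for x in range(2 ** bits)}


def opaque_condition(x):
    # x * (x + 1) is a product of two consecutive integers, so it is always even.
    return (x * (x + 1)) % 2 == 0


def real_condition(x):
    return x % 3 == 0


opaque_result = branch_outcomes(opaque_condition)
real_result = branch_outcomes(real_condition)
```

`opaque_result` contains a single outcome, so the other side of that branch is dead code that a deobfuscator could delete, while `real_result` contains both outcomes and has to be kept.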

Speaker 1:

Yeah, yeah.

Speaker 2:

Wow. There's a lot of math in this, huh.

Speaker 1:

I mean, yeah, sure, there's math, there can be, but a lot of it is actually very simple stuff. Like I explained, you have a plus b, then you do a plus 500 plus b minus 500. And you can imagine that there are a lot of such simple transformations that you can do to already make it hard. So, yes, there can be a lot of math, but personally I'm not a math guy and I can still reason about these things. It's not that you need some crazy math background to get into this. There's a lot of low-hanging fruit as well, I suppose.

Speaker 2:

You're really good at puzzles, at least.

Speaker 1:

I mean, I guess, you know, reverse engineering is kind of a puzzle.

Speaker 2:

It is a big-ass puzzle.

Speaker 1:

Yeah, so I enjoy puzzles, but I'm not good at Sudoku, for example.

Speaker 2:

Okay, interesting. Can't be good at everything, Duncan. The connection between reverse engineering and attribution: I'm asking you a bunch of stuff that I realize now that maybe you feel like you're not the best person to answer, but you're what we got.

Speaker 1:

I can try, you know, I can try. So, yeah, and again, this is from an outside perspective, so I'm not in this field, but for me this concept of attribution is actually a bit silly, in a way. Well, part of it is, of course, hard science. You can say that, okay, there is some malware, and they have some kind of common sequences in how they organize their code, or they use certain libraries or certain techniques, and then you can say that, okay, that also happened in this other malware, so therefore they are the same family of malware, right? So that is, well, kind of hard science.

Speaker 2:

That's probably the strongest indication, I guess. I've heard before that that's what they base most attribution on.

Speaker 1:

I don't know exactly. But yeah, that definitely has to be a big part of it. The thing is that, I think, you cannot say much about the actors. Let's say that there's some malware, and very often what happens is that this malware author, I don't know, either they get caught or they get sick of it, and they sell their source code to other people. So at the beginning it's like, okay, we saw this one sample and it looks like that one, so it's probably the same. Oh, it's connecting to the same servers, maybe. Then you can say, okay, it's part of the same malware campaign, the same criminal. But at some point it's like, yeah, he sold the code. Then that guy was not supposed to sell the code on, but he gave it to his friend. So then it becomes very blurry.

Speaker 1:

And also, when it comes to nation state stuff, I think it's like, yeah, the binary was built in a Russian time zone, so therefore Russia did it. Right. I mean, you can speculate on that, but I don't know if that's so true. You can easily add fake attribution metadata. There's a lot of metadata in a binary, like which compiler produced it, maybe when it was built, and all that, but you can also fake all of that. So for me, attribution is a bit of a, well, strange concept in that way. But when it comes to families, sure, there are families of malware, and usually it is the same people. There's some kind of phishing campaign and they're hitting 20 different organizations. Sure, then you can say that's the same malware in that way.

Speaker 2:

Hackers need to start having cameras in their rooms so we can actually see them do it. You did it, you pressed the button.

Speaker 1:

I have a camera, but it's always unplugged.

Speaker 2:

Yeah, we'll never call it. Tools: when it comes to reversing, what are you looking forward to, or what do you see? How's the development going to be over the next couple of years?

Speaker 1:

So, tools for reverse engineering. I think most professional reverse engineers use a tool called IDA Pro. There's also Binary Ninja, which I like a lot, and it is much cheaper, so for personal use you could actually feasibly buy it. Those two tools cost money, but they're definitely worth it. I mean, they help a lot. I think IDA Pro was the first tool. IDA stands for Interactive Disassembler, if I get it right.

Speaker 1:

And the nice part is the interactivity, because I think before that, what you would do is basically just read dead listings of assembly instructions. But with IDA you can rename functions, you can add comments, and it's this interactive process. And what's also really cool is that they have decompilers. This is a really hard problem, and they solved it, at least to some extent in a practical sense: you take assembly code and you get back readable C code, and this allows you to reverse engineer much quicker and also at a much higher level. You can also change the function arguments, and if you put in enough time, you can get back almost readable C code, as you would write it. You can really see what happened there. So, yeah, they are definitely super worth it.

Speaker 1:

Binary Ninja is a bit newer in the space, but they are also improving. Every month, every version gets better. They also have their decompiler now, and they're starting to add support for a lot of stuff. So it's definitely worth it. Yeah, they're expensive but worth it, and we use all of them. And also there's Ghidra, by the way, I forgot. Those are the paid tools, but there are also free ones. So x64dbg is open source and free. Ghidra, by the NSA, is also free, and they are also really good. It's a really good tool. It's not like you're a beginner and you need tens of thousands of dollars. No, we're also using a lot of free tools on a daily basis.

Speaker 2:

But what are some of those free tools that you would recommend? Or if I was to do this favorite open source technologies episode again, what would be some of yours in there?

Speaker 1:

I mean, I think Ghidra is really good. It's especially good, I think, for embedded stuff. And then of course I would recommend x64dbg.

Speaker 2:

Shameless plug.

Speaker 1:

Yeah, I mean, if you need to debug on Windows, then I think it's a nice tool, and there is also WinDbg. This is from Microsoft, and especially for kernel-level stuff it's really good. Then there's Cutter, another tool similar to IDA Pro and Binary Ninja. It's kind of trying to compete with them, and, you know, it's open source. There's ImHex, which is more of a hex editor, but also really useful.

Speaker 2:

You must use a lot of tools in your, like, normal work life, whatever it is that you do. Like, how many different tools do you use?

Speaker 1:

Yeah, for sure. So for Android there's JADX, J-A-D-X, I don't know how it's pronounced, so, yeah, we're using that. Frida is also a really popular tool for dynamic analysis. With Frida it's really easy to just hook some functions and then see that, oh, the app you're analyzing is trying to open this file and this file, and it's trying to write there or something. So, yeah, you have different tools that you use.
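The hooking idea can be sketched without Frida itself. The following is a plain-Python analogy, not Frida's API: it temporarily wraps the built-in `open` so that every file access made by a stand-in "app" gets logged before the real function runs. The names `hook_open` and `app_under_analysis` are hypothetical and exist only for this illustration.

```python
import builtins
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def hook_open(log: list):
    """Temporarily replace builtins.open with a wrapper that records each call."""
    original = builtins.open

    def logged_open(file, mode="r", *args, **kwargs):
        log.append((str(file), mode))          # observe the call...
        return original(file, mode, *args, **kwargs)  # ...then forward it unchanged

    builtins.open = logged_open
    try:
        yield
    finally:
        builtins.open = original               # always remove the hook again


def app_under_analysis(path: str) -> None:
    """Stand-in for the program being analyzed: writes a file, then reads it back."""
    with open(path, "w") as f:
        f.write("secret")
    with open(path) as f:
        f.read()


if __name__ == "__main__":
    calls = []
    with tempfile.TemporaryDirectory() as workdir:
        with hook_open(calls):
            app_under_analysis(os.path.join(workdir, "notes.txt"))
    for path, mode in calls:
        print(f"open({path!r}, {mode!r})")
```

A dynamic instrumentation tool does the same kind of interception on functions inside another process, which is what makes it possible to watch which files an app touches without reading its code first.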

Speaker 2:

Are there dedicated tools for code obfuscation?

Speaker 1:

For code obfuscation or for deobfuscation? For deobfuscation, I think right now there is no dedicated tool for it, but you basically have extensions that you can build. For IDA Pro you have a bunch of plugins, as they're called there, that you can add on. That's also why the tools are really good: you can add extensions onto them, and you can put some passes in there that try to simplify the code. So that's something that you can build on top of the tools. Binary Ninja has that, IDA has it, Ghidra has it as well.
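As a rough idea of what such a simplification pass does, here is a minimal Python sketch. It is not the plugin API of IDA, Binary Ninja, or Ghidra; it assumes a made-up expression format (nested `(op, lhs, rhs)` tuples with only `add`, `sub`, and `xor`, strings for variables, 32-bit constants) and applies a few algebraic identities bottom-up.

```python
MASK = 0xFFFFFFFF  # constants are folded as 32-bit values


def simplify(expr):
    """Recursively simplify a tuple-based expression tree using a few identities."""
    if not isinstance(expr, tuple):
        return expr                      # a variable name (str) or a constant (int)
    op, lhs, rhs = expr
    lhs, rhs = simplify(lhs), simplify(rhs)
    # Constant folding: both operands are known, so compute the result now.
    if isinstance(lhs, int) and isinstance(rhs, int):
        folded = {"add": lhs + rhs, "sub": lhs - rhs, "xor": lhs ^ rhs}[op]
        return folded & MASK
    # x ^ x == 0 and x - x == 0
    if op in ("xor", "sub") and lhs == rhs:
        return 0
    # x + 0 == x, x ^ 0 == x, x - 0 == x
    if rhs == 0:
        return lhs
    # 0 + x == x, 0 ^ x == x
    if op in ("add", "xor") and lhs == 0:
        return rhs
    return (op, lhs, rhs)


if __name__ == "__main__":
    # (x ^ x) + ((y + 0) - (5 ^ 5)) is just y wrapped in junk operations.
    junk = ("add", ("xor", "x", "x"), ("sub", ("add", "y", 0), ("xor", 5, 5)))
    print(simplify(junk))
```

Real deobfuscation passes work on the tool's own intermediate representation and need many more rules, but the shape is the same: pattern-match something needlessly complicated and replace it with something equivalent and simpler.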

Speaker 2:

So kind of customize your own workflow sort of thing.

Speaker 1:

Yeah, yeah, you build your own tools on top of that kind of framework. So Ghidra especially, I think this is a really underappreciated feature of Ghidra, that you can also program in Java. You can use it as a kind of library. So you could also do headless analysis, which means that there's no UI, but you just have a server and you put the binary in there and then you have some automation scripts that try to analyze it somehow. So, as we discussed before, you could make some kind of malware analysis pipeline based on Ghidra. I mean, how well it would work, I don't know, but that's possible to do.

Speaker 2:

So, yeah, I mean, back to the other question. There are probably tools for code obfuscation as well, but those are tools on the other side that we don't want to talk about.

Speaker 1:

Well, no, that's a common thing that I hear, but I actually don't agree with that, because if you want to learn about something, you need to be able to use it, right? And there are a lot of open source obfuscators, like on GitHub. So, on different levels: based on LLVM, there's one called OLLVM. This was, I think, a paper from 2015. This is something you can just download, build, and you can obfuscate some binaries, and I definitely encourage people to actually try that if you're interested, because they are just tools. They're not only used for bad, they're also used for good. Obfuscation is not just a tool that malware uses, it's also something that people use to protect their intellectual property.
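For a feel of what an obfuscator does to code, here is a small Python sketch of two textbook tricks: instruction substitution (replacing a simple operation with an equivalent but uglier one) and an opaque predicate (a condition that looks data-dependent but always evaluates the same way). These are generic illustrations with hypothetical function names, not code taken from OLLVM.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit registers


def add_plain(a: int, b: int) -> int:
    return (a + b) & MASK


def add_obfuscated(a: int, b: int) -> int:
    # a + b == (a ^ b) + 2 * (a & b):
    # XOR adds each bit pair without carrying, AND finds where the carries are.
    return ((a ^ b) + ((a & b) << 1)) & MASK


def opaque_true(x: int) -> bool:
    # x * (x + 1) is a product of two consecutive integers, so it is always even.
    # An obfuscator can guard junk code with the never-taken "else" branch.
    return (x * (x + 1)) % 2 == 0


if __name__ == "__main__":
    for a, b in [(1, 2), (0xFFFFFFFF, 1), (123456, 654321)]:
        print(a, b, add_plain(a, b), add_obfuscated(a, b))
```

Both versions of the addition return the same value for every input, but the second one is harder to recognize in a disassembly, which is exactly the kind of thing a simplification pass on the reversing side tries to undo.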

Speaker 2:

I was just going to say that, yeah, if you own your own software, you kind of don't want competitors to be able to reverse it and do it.

Speaker 1:

Right.

Speaker 2:

Yeah, Exactly exactly.

Speaker 1:

So yeah, it's just a tool. It's not evil in itself.

Speaker 2:

So the code obfuscation is actually a legitimate use case in this world.

Speaker 1:

Yeah, yeah, for sure, for sure.

Speaker 2:

Yeah.

Speaker 1:

Yeah, I mean, you know, imagine governments: if you send your rocket somewhere and it explodes, it doesn't matter, but if it lands and doesn't explode, you don't want your enemies to figure out what it's doing, right? So there's definitely code obfuscation in all that type of stuff.

Speaker 2:

Interesting. And just within code obfuscation, how big is that world?

Speaker 1:

Probably quite small, actually.

Speaker 2:

Yeah, no, I'm thinking about the amount of different techniques, because I'm assuming that there are people out there finding new ways to make your life harder as somebody that's trying to reverse engineer things, right? So you kind of have to constantly keep yourself up to date, I guess.

Speaker 1:

Yeah, for sure. There are always new techniques, and, you know, usually you don't talk about it. If you find a new technique to hide your secrets, you're not going to share it.

Speaker 2:

Put that shit on LinkedIn?

Speaker 1:

Yeah, usually not. But, you know, there's also the academic world. In academia there's a huge interest in this topic, and rightfully so, it's super interesting, of course. But yeah, I guess it's a mix. There's definitely some very hidden stuff out there that almost nobody knows about, because they are just not going to say, "Oh, we invented some new technique to hide this rocket's firmware." Nobody's going to talk about that. So you have academia, and they are trying to keep up with the state of the art of the field. But also, very often, if you write an academic paper, you can imagine that you're not going to say, "Oh look, we reverse engineered this commercial piece of software and we found this super cool technique." Nobody's going to be doing that, because it's a lawsuit waiting to happen.

Speaker 2:

Yeah, I was going to say, there are legal consequences for those things.

Speaker 1:

But yeah, there's a lot of techniques.

Speaker 2:

Okay, I'm going to pull it out now. The AI question, the big question: how will AI affect your future, since you're so young?

Speaker 1:

My future? Well, ah, that's a very broad question.

Speaker 2:

Okay, your reverse engineering future.

Speaker 1:

Let me think. So, at least right now, with those large language models as they are, you know, I cannot say what will happen in the future. Maybe there will be AGI and then it will be able to reverse engineer just like us, right, because it's general intelligence. But with the current state of things, as I see it, there are some attempts to automatically reverse engineer things. So, let's say, you give the AI a bunch of assembly code and you ask, "What is this doing?" But the results are, you know, not that great. Sometimes it works on simple stuff, but very often it just makes something up, or it doesn't make something up but just leaves something out. Right, they get sick.

Speaker 1:

So for that, to actually replace me looking at the code and saying what this is doing, I don't think that, at least right now, there is anything out there that does it. But that being said, I do think that AI is very good at summarizing stuff. So if you as a user already named a bunch of functions, let's say you have some kind of binary and you give names to all the functions, like open file, read file, connect to server, this and that, tag it, then from that, because it's kind of human language, you might be able to get a summary, right? Like, what is this function doing? Oh, it opens a file, then it connects to the server, then it uploads the file. It can do that kind of stuff pretty nicely. Also, a friend of mine has a company called UnpacMe, and his company does this report generation automatically.

Speaker 1:

So when you have some kind of malware family, you know, of the last week, then you can generate the report automatically: what it's doing, where it's connecting. It kind of generates this nice executive summary, and AI is extremely good at that. So that's definitely something: this interface between some analyst who gets a bunch of traces and technical information, and translating that to something that, let's say, an executive or manager or anyone can read on a high level. I think AI is going to be used for that a lot.

Speaker 2:

Yeah, I was just thinking while you were speaking about LLMs, because LLMs, they don't know anything. They're just guessing what the next word is going to be. So if they've never seen it before, they're obviously not going to be able to. So let's say there's an AI that sits over Duncan's shoulder and watches everything you do and then kind of copies the way you're doing things.

Speaker 1:

I guess, yeah, maybe. I don't have that experience, I can't say. I mean, there are also other uses for AI. So you have this concept of a vector database. You have those embeddings where you give the model some data, let's say you give it some assembly instructions, and it can give you this kind of vector embedding, whatever it's called, and then you can, for example, use that to look for similar code. This is also AI in a way. It's not AI how you think about it, like, oh, it will be a robot that does the work, but this is also AI, and this is also a very big application, I think. So you can definitely use AI, those machine learning models, to do that type of stuff. So clustering of data, you know, it can be very good at that.
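The similar-code search described here can be sketched in a few lines of Python. As a loudly labeled simplification, the "embedding" below is just a count of instruction mnemonics rather than a vector from a trained model, and the snippets in the example corpus are invented. The lookup itself, ranking stored vectors by cosine similarity to a query vector, is the same idea a vector database is built around.

```python
import math
from collections import Counter


def embed(asm: str) -> Counter:
    """Toy 'embedding': count the mnemonic (first token) of each instruction line."""
    return Counter(line.split()[0] for line in asm.strip().splitlines() if line.strip())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def most_similar(query: str, corpus: dict) -> str:
    """Return the name of the corpus snippet closest to the query."""
    q = embed(query)
    return max(corpus, key=lambda name: cosine(q, embed(corpus[name])))


# Hypothetical, hand-written snippets standing in for previously analyzed functions.
CORPUS = {
    "xor_decrypt_loop": "mov ecx, 16\nxor byte [esi], 0x5a\ninc esi\ndec ecx\njnz loop",
    "call_wrapper": "push ebp\nmov ebp, esp\ncall target\npop ebp\nret",
}

if __name__ == "__main__":
    unknown = "mov ecx, 32\nxor byte [edi], 0x13\ninc edi\ndec ecx\njnz again"
    print(most_similar(unknown, CORPUS))
```

The unknown snippet uses different registers, constants, and labels than the stored decryption loop, yet it still ranks closest, which is the property that makes this approach useful for clustering malware samples.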

Speaker 2:

So I guess, I mean, everything you're saying is there. It is being used in your world, I guess.

Speaker 1:

Yeah, yeah, I mean, you know, I also use it, right? So, like I said, I play with microcontrollers sometimes, and if I see some chip with some kind of protocol for how to communicate with it, then I'm also using ChatGPT to at least get some exploratory information, so that I know the terms to look for, because, you know, when you're in a new field you don't know what stuff is called, so you don't know how to Google it, right, the professional lingo for something. So yeah, for that type of stuff I also use ChatGPT. You know, also to write code for me. I use those autocompletion tools, Copilot, different types of products. So it's not that I don't use AI, but I just don't see it automating my job, I guess, because it's a very high-level way of thinking. Maybe some individual parts can be automated, which is great, it makes my work easier and faster, but the higher-level thinking is just not there. There's no reasoning in those models, at least not by my definition of reasoning.

Speaker 2:

Very off-topic, but I read this article the other day. It was like, you, Duncan, and me, we can talk to an AI for two hours and it can very closely copy our personality, what we are, right, or what we would say and think, which is fucking scary. But hey, that'd be great, you know: "Ah, Duncan, he's busy, but I got his AI right here. I can talk to him and ask him some questions." That'd be one day?

Speaker 1:

Maybe, yeah, maybe. But you know, it's not cloning your brain, it's just cloning your mannerisms and maybe your voice and your sentence structures and all that. But the actual thinking, I don't think so. I mean, it's a very philosophical question. Yeah, right, we're not going to go down there. We don't have any beer with us right now.

Speaker 2:

We'll take that one when you're in Oslo actually.

Speaker 1:

Yes.

Speaker 2:

By the way, in Oslo, what are you going to be teaching at your course? What's your course look like, for those that may be interested?

Speaker 1:

Yeah, so my course is aimed at intermediate reverse engineers. So it's not a beginner's course, unfortunately, but the idea is to teach the fundamentals of how you use LLVM for binary analysis. So that's a lot of words. LLVM is a compiler framework. Normally, when you have some C++ code or C code, whatever, there is this intermediate layer, the LLVM IR, and this LLVM IR is then compiled down to the different machine code. So it's a kind of reusable system: you don't have to write your own compiler for ARM, for x86, for all those, but you compile to this LLVM IR in the middle, and from there it can compile down to the other stuff. So in my course we will basically do the opposite. We will go from assembly code to this LLVM IR and from there basically use that to analyze the binary, to see what it's doing. Yeah, that's the course.
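To give a flavor of going from assembly to LLVM IR, here is a toy Python sketch. It is not how any real lifter (or the course material) works: it assumes a tiny made-up subset of x86 (`mov`, `add`, `sub`, `xor` on 32-bit registers and immediates), ignores CPU flags and memory entirely, and the function name `lift` is hypothetical. It only shows the core move: each register write becomes a fresh SSA value in the emitted IR text.

```python
def lift(asm_lines, inputs=("eax", "ebx"), result="eax"):
    """Translate a tiny subset of x86 into the text of an LLVM IR function."""
    regs = {r: f"%{r}" for r in inputs}   # current SSA value held by each register
    body, counter = [], 0
    for line in asm_lines:
        mnemonic, operands = line.split(None, 1)
        dst, src = [o.strip() for o in operands.split(",")]
        # A source operand is either a register we track or an immediate constant.
        src_val = regs[src] if src in regs else str(int(src, 0))
        if mnemonic == "mov":
            regs[dst] = src_val           # no IR instruction needed, just rebind
            continue
        if mnemonic not in ("add", "sub", "xor"):
            raise ValueError(f"unsupported instruction: {line}")
        counter += 1
        tmp = f"%t{counter}"
        # These three x86 mnemonics happen to share their names with LLVM IR opcodes.
        body.append(f"  {tmp} = {mnemonic} i32 {regs[dst]}, {src_val}")
        regs[dst] = tmp
    params = ", ".join(f"i32 %{r}" for r in inputs)
    body.append(f"  ret i32 {regs[result]}")
    return "\n".join([f"define i32 @lifted({params}) {{", *body, "}"])


if __name__ == "__main__":
    print(lift(["mov ecx, 5", "add eax, ecx", "xor eax, ebx"]))
```

Once the code is in this form, the existing LLVM tooling (optimizers, analyses, fuzzers) can in principle be pointed at it, which is the reuse argument made above.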

Speaker 2:

Who is that? I'm sure you did a great job explaining that. I just don't understand anything. So who uses these? Like who should come?

Speaker 1:

Who should come? I think it's basically all reverse engineers who want to level up their skills, I guess. So it's a fundamental course in the sense that I think right now it's still kind of a niche topic, and my goal is that more people get into this topic, because it's very interesting. So the idea is to teach the fundamentals to people who, you know, know how to reverse engineer, they know how to program, but they just never used LLVM to help them do their job. So it's kind of to bring people into the topic.

Speaker 2:

Is that a product or something, LLVM?

Speaker 1:

Uh, no, it's an open source project, and lots of companies contribute to it. So it's a really open ecosystem, and that's also the reason why it's nice for binary analysis, I mean for anything, because you get to build on top of this ecosystem. So if there are any tools that are used for, let's say, fuzzing or something, you can also use those to fuzz binaries if you know how to get your binary into this ecosystem. There's a lot of stuff you can reuse there, and there's a lot of research on it, a lot of people working on it, so it's kind of a nice place to be. You don't have to do everything yourself from scratch.

Speaker 2:

That reminds me of something. I saw a guy speak at a conference, and he was the author of Fuzz Faster U Fool, if you've heard of that tool. F-F-U-F, I can't say it. Anyway, he is a Finnish guy. What is fuzzing?

Speaker 1:

Okay, fuzzing. So fuzzing is basically: you have some function and you want to find vulnerabilities or crashes. So you have some code that you write, let's say a parser for a file format. That's a very common thing, and very often those parsers are written in a bad way, so that they're exploitable. So fuzzing is this process where you basically throw data at this function, different stuff, and you mutate it in different ways. There's some feedback process there, so you try to go deeper and deeper into the code and you generate more and more data, and then the goal is to find a crash.

Speaker 1:

So you want the program to crash, because then you know that with that input the parser crashes. So if you would send a JPEG file that looks like this, then your phone crashes, which can be used, not always, but can be used to then exploit your phone. So on iOS, for example, and also on Android, all the big exploits usually involve this kind of parser. You send an iMessage with an image, then the parser parses it, then it gets exploited, and then there's this whole chain of exploits to get kernel access.

Speaker 1:

So fuzzing is kind of a thing that can help you find such vulnerabilities or just bad code. You also use it as a software developer.
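The loop described above can be sketched in Python. This is a deliberately minimal mutation fuzzer under stated assumptions: the target is a made-up parser with a planted bug (it trusts a length byte), a Python exception stands in for a crash, and there is none of the coverage feedback that real fuzzers use to go deeper into the code. All names are hypothetical.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy parser with a deliberate bug: it trusts the length byte without bounds checks."""
    length = data[0]
    return bytes(data[1 + i] for i in range(length))


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte of the seed with a random value."""
    out = bytearray(seed)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def fuzz(target, seed: bytes, iterations: int = 10_000, rng_seed: int = 0):
    """Throw mutated inputs at `target`; return the first one that makes it raise, or None."""
    rng = random.Random(rng_seed)        # fixed seed so a run can be reproduced
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except Exception:
            return candidate             # this input is the "crash" to investigate
    return None


if __name__ == "__main__":
    valid = b"\x03abc"                   # length byte 3, followed by 3 payload bytes
    crash = fuzz(parse_record, valid)
    print("crashing input:", crash)
```

Any input the fuzzer returns has a length byte larger than the actual payload, so the parser reads past the end of its buffer. In a memory-unsafe language that out-of-bounds read is the kind of bug that can turn into an exploit.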

Speaker 2:

It was a hacker conference, so I'm pretty sure what he uses it for. Yeah, it's a very common thing you use to find vulnerabilities, I guess. Yeah, it reminded me I needed to text him, because he said, "Yeah, ask me later and I'll come on." But his talk was interesting because his tool, he made it open source. I'm not sure, just what I remember is that it was used by the Russians against Ukraine. And it's interesting: how do you feel about that, you know, you made this tool? That's the downside of open source, I guess, right?

Speaker 1:

Yeah, I guess so. Sure, that's a downside, but on the other hand, you give the knowledge to everyone, not just the Russians, right? So it's not bad in that way. But sure, there's also some ethical dilemma there, I suppose. For me personally, I always think that sharing the knowledge will be better, because if I discovered it, then maybe some criminal also discovered it, or someone else already discovered it and just didn't talk about it because they use it for something malicious. So for me it's usually, yeah, you should publish stuff, because it also helps the whole community, the whole field, to advance. Right, if there's some new technique and it's being hidden for a very long time, sure, that's nice, but then the people who know it can use it while the broader community doesn't know, whereas if you publish about it, in a responsible way, of course, then people can adapt, and I think that's usually better.

Speaker 2:

Well, I don't need to have that podcast anymore. Thank you, Duncan. I think that's what he was going to say too. By the way, when are you coming to Oslo?

Speaker 1:

The 20th of February.

Speaker 2:

February, yes. Bring warm clothes.

Speaker 1:

Yeah, I kind of didn't think about that, because it might be really cold, right?

Speaker 2:

It will not be warm, I'll tell you that. Don't bring shorts.

Speaker 1:

Okay, I will not bring shorts. So yeah, the course is the 20th and 21st of February.

Speaker 2:

That's the course. Well, Duncan, I owe you a beer.

Speaker 1:

Yes, I would love to cash that in.

Speaker 2:

Do you have any closing thoughts?

Speaker 1:

Closing thoughts? I guess not really. We spoke for an hour about reverse engineering.

Speaker 2:

I feel smart. Good, we survived. Nice, cool. Well, thank you so much for your time and for sharing your expertise and for, um, yeah, coming to teach mnemonic and our friends all that stuff that you said that I didn't understand. I love the LinkedIn post. I just said basically what you said, and I was like, God damn, that is technical. And then I was like, that's actually a smart way to put it, because there's only going to be technical people there. So, cool. Well, looking forward to meeting you and shaking your hand in person.

Speaker 1:

Yeah, yeah, me too.

Speaker 2:

Thanks for now. Take care until then.

Speaker 2:

Happy holidays. Later, Duncan. Well, that's all for today, folks. Thank you for tuning in to the mnemonic security podcast. If you have any concepts or ideas that you'd like us to discuss on future episodes, please feel free to hit me up on LinkedIn or send us a mail at podcast@mnemonic.no. Thank you for listening and we'll see you next time.
