Why am I qualified to write about this?
In a former life, I was a malware analyst. It was my job to analyze incoming samples and determine whether they were false positives or false negatives. I also worked on automating the process, and it was a very neat job. Unfortunately for me, economic realities and the precarious nature of startup companies dictated other career paths. Along the way I analyzed literally thousands of samples and took notes on their characteristics to help improve the anti-malware product.
After doing this for a while, I came away with some observations on where malware is going, and I’m going to share some of them in this post.
Growing Internationalization
In the past, an anti-malware company could focus on English-targeted samples. But an increasing percentage of malware samples are international in origin and target international machines. I saw numerous cases of Chinese malware targeting Chinese software or hosts. Determining whether such a sample was actually malware was quite a challenge, for several reasons.
Cultural Impact
One of the most fascinating facets of the increasing internationalization of malware is the cultural assumptions around such software. What is considered malware in the US may be commonly accepted in China or Japan, and this is largely due to the society that it exists in.
Anti-cheating rootkits are very common in games released in these countries. What is considered invasive in North America or Europe is acceptable there. These anti-cheating rootkits would hook into kernel space in a very invasive way, and they had the behavioral characteristics of malware, such as hooking into the keyboard driver. This made it very difficult, from a purely technical standpoint, to distinguish them from malware. These kits were attempting to protect the application from being tampered with while running, i.e. to reduce the incidence of bots, or of modifications to the presentation layer that allow people to see through walls. They would watch for kernel debuggers, or for running processes that exhibited specific characteristic behavior. These very techniques would flag them as malware, since many malware samples behave similarly to evade antivirus or to prevent someone from easily reverse engineering them.
If I applied US standards to these particular samples and declared them true positives, then we would have many angry international customers when their games no longer worked. The same applied to extremely intrusive adware. But these pieces of software could run on US machines as well, so it was a very tricky balance.
Linguistic Barriers
In the past, if I ran into a piece of malware that had foreign-language strings in it, I could muddle through them if they were in a Latin-derived language. Spanish or French gave me no trouble. But with languages from an entirely different root, such as Chinese or Japanese written in hanzi or kanji, I was losing vital clues.
By looking at the behavior of the sample alone, I would declare it malware. But what if it was one of the aforementioned game rootkits? How would I know whether the game actually included it, or whether it was indeed a trojan’ed game? With English-language samples, I would simply look at the strings, or use Google. Here, I had to muddle through pages in a writing system that I could not easily begin to comprehend.
So, if you want to be a malware analyst, it would be in your best interest to become conversant or fluent in one or more languages written in a non-Latin script.
Internationalization of Antimalware Tools
As we deal more and more with malware samples that are international in scope, it becomes important that the tools themselves are internationalized. The automatic tools that I wrote dealt primarily with ASCII, and with the growth of samples targeted at other languages they were becoming inadequate: string and keyword analysis did not work well. Tools need to be Unicode-aware and multilingual.
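To make the ASCII limitation concrete, here is a minimal sketch of string extraction that also looks for UTF-16LE text, the encoding Windows binaries commonly use for wide strings. This is not the tooling I actually wrote; the function names, the minimum length of 4, and the choice of character ranges (printable ASCII, kana, and the main CJK ideograph block) are all assumptions for illustration.

```python
import re


def extract_ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Classic 'strings'-style scan: runs of printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def extract_utf16le_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Scan for runs of printable ASCII, kana, or CJK ideographs
    when the bytes are interpreted as UTF-16LE.

    Both byte alignments are tried, because a string embedded in a
    binary may start at an odd offset.
    """
    # Assumed ranges: printable ASCII, Hiragana/Katakana, CJK Unified Ideographs.
    pattern = re.compile(
        "[\u0020-\u007e\u3040-\u30ff\u4e00-\u9fff]{%d,}" % min_len
    )
    results = []
    for offset in (0, 1):
        chunk = data[offset:]
        chunk = chunk[: len(chunk) - (len(chunk) % 2)]  # keep whole code units
        text = chunk.decode("utf-16-le", errors="replace")
        results.extend(m.group() for m in pattern.finditer(text))
    return results
```

One caveat with this approach: a large share of arbitrary 16-bit values fall inside the CJK ideograph range, so misaligned or non-text data will produce spurious "Chinese" strings. The output needs further filtering (a dictionary check or a language model, for instance) before it is useful for keyword analysis.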
Hints for International Malware Analysis
- Pay close attention to whether samples are signed, and by whom. Once you have verified a signed application, consider it the baseline.
- Once you have multiple samples of what appears to be the same application but with different checksums, pay close attention to the file size and to the version and vendor strings. Interestingly, many trojaned applications do not have the correct version and vendor strings.
- Use entropy to your advantage. Measure the entropy of the binary segments that you have. If two samples have very similar entropy values and differ only by a minor version increment, the probability of the newer one being a trojan is much lower.
- Pay close attention to the vendor and version strings of samples. See if you can get an authoritative version of the application from the vendor’s site and compare it. Once you have a sample that you can declare a false positive, all other similar samples are much easier to analyze.
- Take note of which binary packers they are using. Certain packers have a higher probability of being used by malware. But there are legitimate uses of packers, and some antivirus products will trigger a false positive on a packed application, no matter what.
- Build a library of samples, and understand the cultural context of the country of origin and destination. Categorizing the sample characteristics by these criteria will help you determine if it is a true or a false positive for that particular market.
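The entropy hint above can be sketched in a few lines. This is a minimal illustration, assuming you have already split each binary into segments (for example, its sections) with some other tool; the function names are hypothetical, and no particular threshold for "very similar" is implied.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_delta(baseline: bytes, candidate: bytes) -> float:
    """Absolute entropy difference between a known-good segment and a candidate."""
    return abs(shannon_entropy(baseline) - shannon_entropy(candidate))
```

A small delta between the matching segments of a verified baseline and a new sample is consistent with a routine version bump. A large jump, especially toward 8 bits per byte, suggests that something was packed, encrypted, or appended, and the sample deserves a closer look.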
Conclusion
It is becoming more and more important that the entire infrastructure of malware analysis, from the anti-malware client to the backend infrastructure to the analyst herself, becomes multilingual and multicultural. It is a difficult challenge that is going to crop up more and more in the future.