feeds

by zer0x0ne

some of my favourite websites: The Hacker News, Trail of Bits, Dark Reading, Threatpost, Tripwire, Security Weekly, xkcd



xkcd

Retrieved title: xkcd.com, 3 item(s)
Immune Factory

In the final vote, the doubters were won over by the strength of the name IMMUNION.

Pre-Pandemic Ketchup

I wonder what year I'll discard the last weird food item that I bought online in early 2020.

Bad Map Projection: The Greenland Special

The projection for those who think the Mercator projection gives people a distorted idea of how big Greenland is, but a very accurate idea of how big it SHOULD be.

Security Weekly

Retrieved title: Security Weekly, 3 item(s)
Preventing Criminals from Using Cloud Applications to Inject Chaos Into Work Environments

In 2020, cyber criminals used cloud applications, the cover of a pandemic, and a newly embraced work-from-home culture to serve up ransomware, steal data, and disrupt how companies do business. The year is over, but the challenges and risks remain.  How do we prevent these criminals from injecting chaos into our hybrid work environments? As […]

The post Preventing Criminals from Using Cloud Applications to Inject Chaos Into Work Environments appeared first on Security Weekly.

How the Best Defense Gets Better

Security starts before detection and response, but many organizations focus there first. Mature security teams understand the importance of identification and protection.  Establishing good cyber hygiene and taking proactive measures to secure themselves against the ever-increasing threat landscape is a critical first step in a holistic security program.  How should organizations build a holistic security […]

Making the Case for Supply Chain Behavior Transparency

The Biden Administration’s Cyber Executive Order includes a Software Bill of Materials (SBOM), an electronically readable format designed to provide an inventory of the third-party components that make up a piece of software. It is a critical and necessary first measure for protecting the software supply chain, but is it enough? One of the biggest challenges to supply chain transparency […]

Tripwire

Retrieved title: The State of Security, 3 item(s)
What is a SIEM, And Why Should You Have One?

SIEM (pronounced like “sim” from “simulation”), which stands for Security Information and Event Management, was conceived of as primarily a log aggregation device. However, a SIEM’s primary capabilities are to provide threat detection, better enable incident investigation, and speed up your incident response time, while also giving you a unified, holistic view of your infrastructure. […]… Read More

Last (Executive) Orders Please: Supply Chains, Policy and Modernising Cybersecurity

On May 12th, the President of the USA, Joe Biden, signed an Executive Order (EO) that would bolster the cyber defences of the USA. The EO is intended to protect against “increasingly sophisticated malicious cyber campaigns that threaten the public sector, the private sector, and ultimately the American people’s security and privacy.” An EO is […]… Read More

CISO Interview Series: How Aiming for the Sky Can Help Keep Your Organization Secure

Organizations need the right internal personnel like a CISO to keep their systems and data secure. But what kind of skills do these leaders need? And how should they guide their employers in a way that doesn’t overlook the evolving threat landscape? To find out, I decided to speak with Goher Mohammad. Goher is […]… Read More

The Hacker News

Retrieved title: The Hacker News, 3 item(s)
16-Year-Old Security Bug Affects Millions of HP, Samsung, Xerox Printers

Details have emerged about a high severity security vulnerability affecting a software driver used in HP, Xerox, and Samsung printers that has remained undetected since 2005. Tracked as CVE-2021-3438 (CVSS score: 8.8), the issue concerns a buffer overflow in a print driver installer package named "SSPORT.SYS" that can enable remote privilege escalation and arbitrary code execution. Hundreds of millions of

This New Malware Hides Itself Among Windows Defender Exclusions to Evade Detection

Cybersecurity researchers on Tuesday lifted the lid on a previously undocumented malware strain dubbed "MosaicLoader" that singles out individuals searching for cracked software as part of a global campaign. "The attackers behind MosaicLoader created a piece of malware that can deliver any payload on the system, making it potentially profitable as a delivery service," Bitdefender researchers

US and Global Allies Accuse China of Massive Microsoft Exchange Attack

The U.S. government and its key allies, including the European Union, the U.K., and NATO, formally attributed the massive cyberattack against Microsoft Exchange email servers to state-sponsored hacking crews affiliated with the People's Republic of China's Ministry of State Security (MSS). In a statement issued by the White House on Monday, the administration said, "with a high degree of

Threatpost

Retrieved title: Threatpost, 3 item(s)
Why Your Business Needs a Long-Term Remote Security Strategy

Chris Hass, director of information security and research at Automox, discusses the future of work: A hybrid home/office model that will demand new security approaches.

16-Year-Old HP Printer-Driver Bug Impacts Millions of Windows Machines

The bug could allow cyberattackers to bypass security products, tamper with data and run code in kernel mode.

A New Security Paradigm: External Attack Surface Management

Advanced EASM solutions are crucial to automating the discovery of the downstream third-party (or fourth-party, or fifth-party, etc.) IT infrastructures that your organization is exposed to and that may be vulnerable to attack, posing a critical risk to your organization.

Dark Reading

Retrieved title: Dark Reading, 3 item(s)
Law Firm for Ford, Pfizer, Exxon Discloses Ransomware Attack

Campbell Conroy & O'Neil reports the attack affected personal data including Social Security numbers, passport numbers, and payment card data for some individuals.

US Accuses China of Using Criminal Hackers in Cyber Espionage Operations

DOJ indicts four Chinese individuals for alleged role in attacks targeting intellectual property, trade secrets belonging to defense contractors, maritime companies, aircraft service firms, and others.

How Gaming Attack Data Aids Defenders Across Industries

Web application attacks against the video game industry quadrupled in 2020 compared to the previous year, but companies outside entertainment can learn from the data.

Trail of Bits

Retrieved title: Trail of Bits Blog, 3 item(s)
Solar: Context-free, interactive analysis for Solidity

We’re hiring for our Research + Engineering team! 

By Aaron Yoo, University of California, Los Angeles

As an intern at Trail of Bits, I worked on Solar, a proof-of-concept static analysis framework. Solar is unique because it enables context-free interactive analysis of Solidity smart contracts. A user can direct Solar to explore program paths (e.g., to expand function calls or follow if statements) and assign constraints or values to variables, all without reference to a concrete execution. As a complement to Solar, I created an integrated development environment (IDE)-like tool that illustrates the types of interactivity and analysis that Solar can provide.

Solar user interface.

The Solar UI has two main panels, “source” and “IR” (intermediate representation). The source panel displays the source code of the functions being analyzed and the IR panel translates those functions into an enhanced variant of SlithIR, our open-source IR. The green highlights show the lines of the function that are being analyzed. The two panes display similar information intended for different uses: The source pane serves as a map, aiding in navigation and providing context. The IR pane enables interactivity as well as visualization of the information deduced by Solar.

Analyses built on top of Solar are called “strategies.” Strategies are modular, and each defines a different mode of operation for Solar. Three strategies are shown in the above screenshot: concrete, symbolic, and constraint. Although each analysis has its own workflow, Solar needs all three to support the simplify, downcall, and upcall primitives, which are explained below.

Solar primitives

Simplify

The simplify primitive instructs the framework to make as many inferences as possible based on the current information. Solar can receive new information either through user input or by deriving it as an axiom from the program. In the screencast below, Solar axiomatically deduces the value of the constant, 2. If the user tells Solar to assume that the value of x is 5 and then clicks the “simplify” button, Solar will deduce the return value.
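
The simplify primitive behaves like a fixed-point constant-propagation pass. The sketch below is purely illustrative (a toy tuple IR and a hypothetical simplify function, not Solar's real data structures): given the user-supplied fact x = 5, it deduces the return value of a function computing x + 2, much as in the screencast.

```python
# Illustrative toy "simplify": repeatedly propagate known values through a
# straight-line IR until no new facts can be deduced (a fixed point).
# This sketches the idea only; it is not Solar's implementation.

def simplify(ir, facts):
    """ir: list of (dest, op, lhs, rhs); facts: dict of known variable values."""
    changed = True
    while changed:
        changed = False
        for dest, op, lhs, rhs in ir:
            if dest in facts:
                continue  # already deduced
            a = facts.get(lhs, lhs if isinstance(lhs, int) else None)
            b = facts.get(rhs, rhs if isinstance(rhs, int) else None)
            if a is not None and b is not None:
                facts[dest] = a + b if op == "+" else a * b
                changed = True
    return facts

# The user asserts x == 5; simplify deduces the return value of f(x) = x + 2.
print(simplify([("ret", "+", "x", 2)], {"x": 5})["ret"])  # 7
```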

Downcall

The downcall primitive instructs the framework to inline a function call. Function calls are inlined so that the user can see the entire path in one pane. In a traditional IDE, the user would have to navigate to the function in question. In our framework, the downcalled function is inlined directly into the IR pane, and its source is brought into the source pane.
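
Conceptually, a downcall just splices the callee's instructions into the caller at the call site. A minimal sketch, using a hypothetical tuple-based IR rather than SlithIR:

```python
# Toy "downcall": inline a callee's IR at the call site so the whole program
# path is visible in one instruction list. Sketch only, not Solar's code.

def downcall(caller_ir, callees):
    inlined = []
    for inst in caller_ir:
        if inst[0] == "call" and inst[1] in callees:
            inlined.extend(callees[inst[1]])  # splice the callee body in
        else:
            inlined.append(inst)
    return inlined

print(downcall([("call", "v"), ("ret",)], {"v": [("nop",)]}))
# [('nop',), ('ret',)]
```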

Upcall

The upcall primitive inlines the current context into a calling function. In other words, upcalling implies traversal up the call stack and is generally the opposite of downcalling. By upcalling, Solar can traverse program paths up through function calls, as demonstrated below. (Pay special attention to the green-highlighted lines.)

Together, these three primitives give Solar its defining properties: context insensitivity and interactivity. Solar is context-free (context-insensitive) because the user can start analysis from any function. It is interactive because the exact program path is determined by the upcalls and downcalls—which are chosen by the user.

Demonstration for reasoning about integer overflow

Solar can help a user reason about nontrivial program properties such as integer overflow. Consider the following program:

contract Overflow {
    function z(uint32 x, uint32 y) public returns (uint32) {
        uint32 ret = 0;
        if (x < 1000 && y < 1000) {
            will_it_overflow(x, y);
        }
        return ret;
    }

    function will_it_overflow(uint32 x, uint32 y) public returns (uint32) {
        return x * y;
    }
}

Here, we want to find out whether will_it_overflow will ever cause an integer overflow. Integer overflow occurs when the mathematical result of an arithmetic operation cannot fit into the physical space allocated for its storage.

Looking at the will_it_overflow function, it’s clear that integer overflow may be possible, as two 32-bit numbers are multiplied and placed into a 32-bit result. However, based on the call sites of will_it_overflow, if z calls will_it_overflow, there can never be an integer overflow; this is because z verifies that arguments to will_it_overflow are small. Let’s see how Solar would reach this conclusion.

Performing this analysis with Solar requires use of the constraint strategy, which works by attempting to find a single valid execution of the program. The user can constrain the execution with arbitrary polynomial constraints. To start the analysis, we select the will_it_overflow function from the left pane to indicate it as the desired starting point. Here is the initial analysis view:

Solar provides one possible execution that evaluates all values to zero. The next step is constraining the values of x and y. We can provide the following constraints (in terms of IR variables, not source variables) to the constraint strategy:

  1.   x_1 < (2 ** 32)
  2.   y_1 < (2 ** 32)
  3.   x_1 * y_1 >= (2 ** 32)

The first two constraints bind x_1 and y_1 to 32-bit integers. The third causes the solver to try to find an execution in which x_1 * y_1 overflows. It is common practice to prove properties using a SAT/SMT solver by showing that the negation of the property is unsatisfiable. With that in mind, we are going to use Solar to show that there are no executions in which x_1 * y_1 overflows, thereby implying that will_it_overflow does not overflow. After simplifying with these constraints in place, we get the following:

Again, Solar provides a single possible execution based on the constraints. Upcalling once puts us at line 5:

Because the green-highlighted lines do not form a logical contradiction, Solar can still return a valid program execution. However, upcalling again returns a different result:

The question marks on the right indicate that the solver cannot find any possible execution given the program path. This is because line 4 contradicts the inequality we wrote earlier: x_1 * y_1 >= (2 ** 32). The above steps constitute informal proof that overflow is not possible if will_it_overflow is called from z.
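
Because z's guard bounds both arguments below 1000, the same conclusion can be checked by brute force over the guarded domain. The sketch below only illustrates the proof idea (Solar reasons symbolically; it does not enumerate values): the negated property x * y >= 2**32 has no satisfying assignment under the guard.

```python
# Brute-force check of the proof idea: under z's guard (x < 1000, y < 1000),
# no assignment makes x * y exceed the 32-bit range, so the negation of the
# safety property is unsatisfiable on this path.

overflow_witness = any(
    x * y >= 2**32
    for x in range(1000)
    for y in range(1000)
)
print(overflow_witness)  # False: no overflow is possible under the guard
```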

Conclusion

I am proud of what Solar became. Although it is a prototype, Solar represents a novel type of analysis platform that prioritizes interactivity. Given the potential applications for code auditing and IDE-style semantic checking, I am excited to see what the future holds for Solar and its core ideas. I would like to give a big thank-you to my mentor, Peter Goodman, for making this internship fun and fulfilling. Peter accomplished perhaps the most challenging task for a mentor: striking the delicate balance between providing me guidance and freedom of thought. I would also like to extend thanks to Trail of Bits for hosting the internship. I look forward to seeing the exciting projects that future interns create!

A Year in the Life of a Compiler Fuzzing Campaign

By Alex Groce, Northern Arizona University

In the summer of 2020, we described our work fuzzing the Solidity compiler, solc. So now we’d like to revisit this project, since fuzzing campaigns tend to “saturate,” finding fewer new results over time. Did Solidity fuzzing run out of gas? Is fuzzing a high-stakes project worthwhile, especially if it has its own active and effective fuzzing effort?

The first bugs from that fuzzing campaign were submitted in February of 2020 using an afl variant. Since then, we’ve submitted 74 reports. Sixty-seven have been confirmed as bugs, and 66 of those have been fixed. Seven were duplicates or not considered to be true bugs.


Given that 21 of these bugs were submitted since last December, it’s fair to say that our fuzzing campaign makes a strong case for why independent fuzzing is still important and can find different bugs, even when OSSFuzz-based testing is also involved.

Why is it useful to keep fuzzing such a project perhaps indefinitely? The answer has three parts. First, more fuzzing covers more code execution paths and long fuzzing runs are especially helpful. It would be hard to get anywhere near path-coverage saturation with any fuzzer we know of. Even when running afl for 30 or more days, our tests still find new paths around every 1-2 hours and sometimes find new edges. Some of our reported bugs were discovered only after a month of fuzzing.

A production compiler is fantastically complex, so fuzzing in-depth is useful. It takes time to generate inputs such as the most recent bug we found fuzzing the Solidity level of solc:

pragma experimental SMTChecker;
contract A {
    function f() internal virtual {
        v();
    }
    function v() internal virtual {
    }
}

contract B is A {
    function f() internal virtual override {
        super.f();
    }
}

contract C is B {
    function v() internal override {
        if (0==1)
            f();
    }
}

This code should compile without error, but it didn’t. Until the bug was fixed, it caused the SMT Checker to crash, throwing a fatal “Super contract not available” error, due to incorrect contract context used for variable access in virtual calls inside branches.

Compilers should undergo fuzz testing for long stretches of time because of their complexity and number of possible execution paths. A rule of thumb is that afl hasn’t really started in earnest on any non-trivial target until it hits one million executions, and compilers likely require much more than this. In our experience, compiler runs vary anywhere from less than one execution per second to as many as 40 executions per second. Just getting to one million executions can take a few days!

The second reason we want to fuzz independently alongside OSSFuzz is to approach the target from a different angle. Instead of strictly using a dictionary- or grammar-based approach with traditional fuzzer mutation operators, we used ideas from any-language mutation testing to add “code-specific” mutation operators to afl, and rely mostly (but not exclusively) on those, as opposed to afl’s even more generic mutations, which tend to be focused on binary format data. Doing something different is likely going to be a good solution to fuzzer saturation.

Finally, we keep grabbing the latest code and start fuzzing on new versions of solc. Since the OSSFuzz continuous integration doesn’t include our techniques, bugs that are hard for other fuzzers but easy for our code-mutation approach will sometimes appear, and our fuzzer will find them almost immediately.

But we don’t grab every new release and start over, because we don’t want to lose the ground that was gained with our long fuzzing campaigns. We also don’t continually take up where we left off, since the tens of thousands of test corpora that afl can generate are likely full of uninteresting paths that might make finding bugs in the new code harder. We sometimes resume from an existing run, but only infrequently.

Finding bugs in a heavily-fuzzed program like solc is not easy. The next best independent fuzzing effort to ours, that of Charalambos Mitropoulos, also mentioned by the solc team in their post on OSSFuzz-based fuzzing, has only discovered 8 bugs, even though it’s been ongoing since October 2019.

Other Languages, Other Compilers

Our success with solc inspired us to fuzz other compilers. First, we tried fuzzing the Vyper compiler—a language intended to provide a safer, Python-like alternative to Solidity for writing Ethereum blockchain smart contracts. Our previous Vyper fuzzing uncovered some interesting bugs using essentially a grammar-based approach with the TSTL (Template Scripting Testing Language) Python library via python-afl. We found a few bugs during this campaign, but chose not to go to extremes, because of the poor speed and throughput of the instrumented Python testing.

In contrast, my collaborator, Rijnard van Tonder at Sourcegraph, had much greater success fuzzing the Diem project’s Move language—the language for the blockchain formerly known as Facebook’s Libra. Here, the compiler is fast and the instrumentation is cheap. Rijnard has reported 14 bugs in the compiler, so far, all of which have been confirmed and assigned, and 11 of which have been fixed. Given that the fuzzing began just two months ago, this is an impressive bug haul!

Using Rijnard’s notes on fuzzing Rust code using afl.rs, I tried our tools on Fe, a new smart contract language supported by the Ethereum foundation. Fe is, in a sense, a successor to Vyper, but with more inspiration from Rust and a much faster compiler. I began fuzzing Fe on the date of its first alpha release and submitted my first issue nine days later.

To support my fuzzing campaign, the Fe team changed failures in the Yul backend, which uses solc to compile Yul, to produce Rust panics visible to afl, and we were off to the races. So far, this effort has produced 31 issues, slightly over 18% of all GitHub issues for Fe, including feature requests. Of these, 14 have been confirmed as bugs, and ten of those have been fixed; the remaining bugs are still under review.

We didn’t just fuzz smart contract languages. Rijnard fuzzed the Zig compiler—a new systems programming language that aims at simplicity and transparency—and found two bugs (confirmed, but not fixed).

The Future of Our Fuzzing Campaigns

We uncovered 88 bugs that were fixed during our afl compiler fuzzing campaign, plus an additional 14 that are confirmed but not yet fixed.

What’s interesting is that the fuzzers aren’t using dictionaries or grammars. They know nothing about any of these languages beyond what is expressed by a modest corpus of example programs from the test cases. So how can we fuzz compilers this effectively?

The fuzzers operate at a regular-expression level. They don’t even use context-free language information. Most of the fuzzing has used fast C string-based heuristics to make “code-like” changes, such as removing code between brackets, changing arithmetic or logical operators, or just swapping lines of code, as well as changing if statements to while and removing function arguments. In other words, they apply the kind of changes a mutation testing tool would. This approach works well even though Vyper and Fe aren’t very C-like, and only their Python-style whitespace, comma, and parenthesis usage is represented.
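
To make this concrete, here is a hedged sketch of three such "code-like" mutation operators at the regular-expression level. The operators and the mutate function are illustrative only; the real tool implements its heuristics as fast C string operations.

```python
# Illustrative regex-level mutation operators in the spirit of the approach
# described above (not the actual afl-compiler-fuzzer implementation):
# swap an arithmetic operator, empty a bracketed block, or swap two lines.
import random
import re

def mutate(src, rng):
    choice = rng.randrange(3)
    if choice == 0:
        # change an arithmetic operator: the first '+' becomes '-'
        return re.sub(r"\+", "-", src, count=1)
    if choice == 1:
        # remove the code between the first pair of brackets
        return re.sub(r"\{[^{}]*\}", "{}", src, count=1)
    # swap two adjacent lines of code
    lines = src.splitlines()
    if len(lines) > 1:
        i = rng.randrange(len(lines) - 1)
        lines[i], lines[i + 1] = lines[i + 1], lines[i]
    return "\n".join(lines)

print(mutate("function f() { return a + b; }", random.Random(0)))
```

Note that none of these operators needs a grammar: a modest corpus of example programs plus string-level edits is enough to produce plausibly "code-like" inputs.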

Custom dictionaries and language-aware mutation rules may be more effective, but the goal is to provide compiler projects with effective fuzzing without requiring many resources. We also want to see the impact that a good fuzzing strategy can have on a project during the early stages of development, as with the Fe language. Some of the bugs we’ve reported highlighted tricky corner cases for developers much earlier than might have otherwise been the case. We hope that discussions such as this one will help produce a more robust language and compiler with fewer hacks made to accommodate design flaws detected too late to easily change.

We plan to keep fuzzing most of these compilers since the solc effort has shown that a fuzzing campaign can remain viable for a long time, even if there are other fuzzing efforts targeting the same compiler.

Compilers are complex and most are also rapidly changing. For example, Fe is a brand-new language that isn’t really fully designed, and Solidity is well-known for making dramatic changes to both user-facing syntax and compiler internals.

We’re also talking to Bhargava Shastry, who leads the internal fuzzing effort of Solidity, and applying some of the semantic checks they apply in their protobuf-fuzzing of the Yul optimization level ourselves. We started directly fuzzing Yul via solc’s strict-assembly option, and we already found one amusing bug that was quickly fixed and incited quite a bit of discussion! We hope that the ability to find more than just inputs that crash solc will take this fuzzing to the next level.

The larger issue is that the bugs fuzzing can find are limited by a compiler’s inability to detect many wrong-code errors on its own. Differential comparison of two compilers, or of a compiler with its own output when optimizations are turned off, usually requires a much more restricted form of the program, which limits the bugs you can find, since programs must be compiled and executed to compare results.

One way to get around this problem is to make the compiler crash more often. We imagine a world where compilers include something like a testing option that enables aggressive and expensive checks that wouldn’t be practical in normal runs, such as sanity-checks on register allocation. Although these checks would likely be too expensive for normal runs, they could be turned on both for some fuzzing runs, since the programs compiled are usually small, and, perhaps even more importantly, for final production compilation of extremely critical code (Mars Rover code, nuclear-reactor control code, or high-value smart contracts) to make sure no wrong-code bugs creep into such systems.

Finally, we want to educate compiler developers and developers of other tools that take source code as input, that effective fuzzing doesn’t have to be a high-cost effort requiring significant developer time. Finding crashing inputs for a compiler is often easy, using nothing more than some spare CPU cycles, a decent set of source code examples in the language, and the afl-compiler-fuzzer tool!

We hope you enjoyed learning about our long-term compiler fuzzing project, and we’d love to hear about your own fuzzing experiences on Twitter @trailofbits.

Un-bee-lievable Performance: Fast Coverage-guided Fuzzing with Honeybee and Intel Processor Trace

By Allison Husain, UC Berkeley

Today, we are releasing an experimental coverage-guided fuzzer called Honeybee that records program control flow using Intel Processor Trace (IPT) technology. Previously, IPT has been criticized for severe underperformance due to issues with capture systems and inefficient trace analyses. My winter internship focused on working through these challenges to make IPT-based fuzzing practical and efficient.

IPT is a hardware feature that asynchronously records program control flow, costing a mere 8-15% overhead at record time. However, applying IPT as a fuzzing coverage mechanism isn’t practical except for highly experimental binary-only coverage solutions, since source code instrumentation typically provides far better performance. Honeybee addresses this limitation and makes IPT significantly faster to capture and hundreds of times faster to analyze. So now we have coverage-guided fuzzing—even if source code is unavailable—at performance competitive with, and sometimes faster than, source-level coverage instrumentation. Here, I will describe the development process behind Honeybee and a general overview of its design.

How it started…

IPT is an Intel-specific processor feature that can be used to record the full control flow history of any process for minutes at a time with a minimal performance penalty. IPT drastically improves on Intel’s older hardware tracing systems such as Branch Trace Store, which can have a performance penalty exceeding 100%. Even better, IPT supports the granular selection of the trace target by a specific process or range of virtual addresses.

A hardware mechanism like IPT is especially alluring for security researchers because it can provide code coverage information for closed-source, unmodified target binaries. A less appreciated fact is that not only can IPT be much faster than any existing black-box approach (like QEMU, interrupts, or binary lifting); IPT should also be faster than inserting source-level instrumentation.

Coverage instrumentation inhibits run-time performance by thrashing various CPU caches via frequent, random writes into large coverage buffers. Source-level instrumentation also inhibits many important compile-time optimizations, like automatic vectorization. Further, due to the multi-threaded nature of most programs, instrumentation code needs to operate on bitmaps atomically, which significantly limits pipeline throughput.

IPT should work on both closed source and open source software with no change in coverage generation strategy and incur only a minimal 8-15% performance penalty. Jaw-dropping, I tell you!

How it’s going…

Unfortunately, the 8-15% performance penalty doesn’t tell the whole story. While IPT has low capture overhead, it does not have a low analysis overhead. To capture long traces on commodity hardware, IPT uses various techniques to minimize the amount of data stored per trace. One technique is to record only control flow information not readily available from static analysis of the underlying binary, such as taken/not-taken branches and indirect branch targets. This optimization assumes the IPT decoder has access to the underlying program binary, but that assumption usually holds. (See Figure 1 for example.)

IPT is a very dense binary format. To showcase what information is stored, I’ve converted it to a more readable format in Figure 1. The packet type is in the left column and the packet payload is on the right.

Example IPT trace showing how IPT minimizes data stored during program execution.

    1. Tracing starts while the program executes at 0x7ffff7f427ef.
    2. The program hits a conditional branch and takes it. (The first ! in line 2.)
    3. The program hits two conditional branches and does not take them. (The .. in line 2.)
    4. The program hits a conditional branch and takes it. (The last ! in line 2.)
    5. The program hits a conditional branch and takes it. (Line 4.)
    6. The program hits an indirect branch, at which point it jumps to the last instruction pointer with the lower two bytes replaced with 0x3301.
    7. The program hits a conditional branch and takes it.
    8. The program continues with no other conditional/indirect branches until the last four bytes of the instruction pointer are 0xf7c189a0, at which point tracing stops because the program either exited or another piece of code that did not match the filters began executing.

Despite all the trace data provided, there is still a surprising amount of information that is omitted. The trace provides its beginning virtual address, eventual conditional branches, and whether they are taken. However, unconditional control flow transfers (i.e., direct call and jmp instructions) and conditional branch destinations are not provided. This reduces the trace size, because 1) non-branching code is never recorded, 2) conditional branches are represented as a single bit, and 3) indirect branches are only represented by changes to the instruction pointer.

So how is the real control flow reconstructed from this partial data? An IPT decoder can pair trace data with the underlying binary and “walk” through the binary from the trace start address. When the decoder encounters a control flow transfer that can’t be trivially determined, like a conditional branch, it consults the trace. Data in the trace indicates which branches were taken/not taken and the result of indirect control flow transfers. By walking the binary and trace until the trace ends, a decoder can reconstruct the full flow.
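
A toy version of this walk, using hypothetical data structures rather than libipt's actual API, shows how little of the control flow comes from the trace itself: only conditional taken/not-taken bits and indirect targets are consumed.

```python
# Toy IPT-style decoder walk over a precomputed control flow graph.
# Hypothetical structures for illustration; a real decoder must derive this
# information by disassembling x86-64 instructions, not from a dictionary.

CFG = {
    0x10: ("cond", 0x20, 0x30),  # conditional: (taken target, fallthrough)
    0x20: ("jmp", 0x40),         # unconditional: never recorded in the trace
    0x30: ("jmp", 0x40),
    0x40: ("indirect",),         # target must be read from the trace
    0x50: ("end",),
}

def decode(start, tnt_bits, indirect_targets):
    path, pc = [start], start
    tnt, ind = iter(tnt_bits), iter(indirect_targets)
    while True:
        kind = CFG[pc][0]
        if kind == "cond":
            pc = CFG[pc][1] if next(tnt) else CFG[pc][2]  # consult the trace
        elif kind == "jmp":
            pc = CFG[pc][1]  # statically known; no trace data consumed
        elif kind == "indirect":
            pc = next(ind)   # only this reads an address packet
        else:
            return path
        path.append(pc)

print([hex(a) for a in decode(0x10, [True], [0x50])])
# ['0x10', '0x20', '0x40', '0x50']
```

In a real decoder, building the equivalent of this CFG on the fly by disassembling instructions is where the cost lies.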

But herein lies the gotcha of IPT: although capture is fast, walking through the code is decidedly not, because the decoder must disassemble and analyze multiple x86-64 instructions at every decoding step. While overhead for disassembly and analysis isn’t a problem for debugging and profiling scenarios, it severely hampers fuzzing throughput. Unfortunately, such expensive analysis is fundamentally unavoidable, as traces cannot be decoded without analyzing the original program.

But…is this the end? Was it the beautiful promise of IPT fuzzing just…a dream? A mere illusion? Say it ain’t so!

Making IPT faster!

While profiling Intel’s reference IPT decoder, libipt, I noticed that over 85% of the CPU time was spent decoding instructions during a trace analysis. This is not surprising given that IPT data must be decoded by walking through a binary looking for control flow transfers. An enormous amount of time spent during instruction decoding, however, is actually good news.

Why? A fuzzer needs to decode a multitude of traces against the same binary. It may be reasonable to continuously analyze instructions for a single trace of one binary, but re-analyzing the same instructions millions of times for the same binary is extraordinarily wasteful. If your “hey we should probably use a cache” sense is tingling, you’re totally right! Of course, the importance of instruction decode caches is not a novel realization.

An open-source decoder named libxdc, which claims to be the fastest IPT decoder (more on that later), tries to solve this issue using a fast runtime cache. Using a runtime cache and other performance programming techniques, libxdc operates 10 to 40 times faster than Intel’s reference decoder, which demonstrates that caching is very important.

I thought I could do better though. My critique of libxdc was that its dynamic instruction cache introduced unnecessary and expensive overhead in two ways. First, a dynamic cache typically has expensive lookups, because it needs to calculate the target’s hash and ensure that the cache is actually a hit. This introduces more overhead and complexity to one of the hottest parts of the entire algorithm and cannot be overlooked.

Second, and frankly much worse, a dynamic cache is typically expected to fill up and evict older results. Even the best cache eviction strategy creates future work: any evicted instructions will eventually need to be re-decoded, because a fuzzer never targets code just once. This duplicated effort costs the decoder performance with every single cache eviction.

My idea was to introduce a static, ahead-of-time generated cache that holds all the data that IPT decoding could conceivably require. The cache could be shared between multiple threads without penalty and accessed without expensive hashing or locking. By entirely eliminating binary analysis and cache access overhead, I could decode traces significantly faster than libxdc, because my decoder would simply be doing less work.
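The difference between the two lookup styles can be sketched as follows. The hash function, slot count, and record layout here are all illustrative, not the actual code of Honeybee or libxdc:

```c
#include <stdint.h>

#define CACHE_SLOTS 1024

/* Dynamic cache: every lookup hashes the address and must verify the
 * tag, and a miss means the caller re-decodes and fills the slot.
 * (A slot with tag 0 is treated as empty, so address 0 is unused.) */
struct dyn_entry { uint64_t tag; uint32_t value; };
static struct dyn_entry dyn_cache[CACHE_SLOTS];

static uint64_t slot_of(uint64_t addr) {
    return (addr * 0x9E3779B97F4A7C15ULL >> 54) % CACHE_SLOTS;
}

static void dyn_insert(uint64_t addr, uint32_t value) {
    dyn_cache[slot_of(addr)] = (struct dyn_entry){ addr, value };
}

static int dyn_lookup(uint64_t addr, uint32_t *out) {
    struct dyn_entry *e = &dyn_cache[slot_of(addr)];
    if (e->tag != addr)
        return 0;              /* miss: must decode and fill */
    *out = e->value;
    return 1;
}

/* Static ahead-of-time table: blocks were numbered when the table was
 * generated, so a lookup is a plain array index. No hash, no tag
 * check, no miss, no eviction, and safe to share between threads. */
static const uint32_t static_table[4] = { 10, 20, 30, 40 };
static uint32_t static_lookup(uint32_t block_id) {
    return static_table[block_id];
}
```

The static lookup is not just cheaper per access; because it can never miss, the decode-and-fill path disappears from the hot loop entirely.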

The Honeybee architecture.

Keeping with the theme of Honeybee, I named these static, ahead-of-time generated caches “hives” since they require work to create but only need to be made once. To make the hives, I created hive_generator, which consumes an ELF executable and captures the information that may be needed to decode a trace. The hive_generator searches for all control flow instructions and generates an extremely compact encoding of all basic blocks and where code execution could continue. There are two important features of this new design worth discussing. (Full details are available on GitHub.)

First, this encoding is data cache-friendly: blocks are the size of cache lines, and encoded blocks are stored in the same order as in the original binary. This small but important detail means that Honeybee’s decoder can take full advantage of the original binary’s cache locality, since compilers generally put related basic blocks close to each other. This is generally not possible with a dynamic cache like libxdc’s, whose hash function by design scatters neighboring blocks to unrelated locations, which hurts performance by evicting meaningful data from the CPU’s data cache.
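A hive along these lines could be produced by a single ahead-of-time pass that splits the binary at every control flow terminator and emits block records in binary order. This is only a toy sketch; the field names and the instruction encoding are invented for illustration and are not Honeybee's actual format:

```c
#include <stdint.h>

/* Hypothetical compact "hive" record: one per basic block, stored in
 * the same order the blocks appear in the binary, so the decoder
 * inherits the compiler's code-locality decisions. */
struct hive_block {
    uint32_t taken_idx;   /* block index if branch taken / jump target */
    uint32_t fall_idx;    /* block index on fall-through */
    uint8_t  kind;        /* 0 = direct, 1 = conditional, 2 = indirect */
};

/* Toy instruction stream: 0 = ordinary instruction, 1 = direct jump,
 * 2 = conditional branch, 3 = indirect branch. The generator walks it
 * once, ending a block at every control flow terminator. */
static int build_hive(const uint8_t *insns, int n, struct hive_block *out)
{
    int blocks = 0;
    for (int i = 0; i < n; i++) {
        if (insns[i] == 0)
            continue;                         /* not a block terminator */
        out[blocks].kind = (uint8_t)(insns[i] - 1);
        out[blocks].fall_idx = (uint32_t)(blocks + 1);
        out[blocks].taken_idx = 0;   /* a real generator resolves targets */
        blocks++;
    }
    return blocks;
}
```

Because the records appear in the same order as the code, a decoder walking consecutive blocks touches consecutive cache lines.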

The other important feature is that blocks are encoded in a bitwise-friendly format, so Honeybee can process the compacted blocks using exclusively high-throughput ALU operations. This design makes several critical operations completely branchless — like determining whether a block ends in a direct, indirect, or conditional branch. Combining this with high-throughput ALU operations avoids many costly branch mispredictions and pipeline purges.
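To illustrate the idea (these helpers are illustrative, not Honeybee's actual code): with a two-bit tag packed into the low bits of a block word, classification is a single AND, and a mask-based select replaces the taken/not-taken `if`:

```c
#include <stdint.h>

/* Terminator kind lives in the low two bits of a packed block word. */
#define KIND_MASK   0x3u
#define KIND_DIRECT 0u
#define KIND_COND   1u
#define KIND_INDIR  2u

/* Classify a block: one AND, no branch. */
static inline uint32_t block_kind(uint32_t packed) {
    return packed & KIND_MASK;
}

/* taken must be 0 or 1; returns taken ? a : b without branching. */
static inline uint64_t select_u64(uint64_t taken, uint64_t a, uint64_t b) {
    uint64_t mask = (uint64_t)0 - taken;   /* all-ones or all-zeros */
    return (a & mask) | (b & ~mask);
}

/* Conditional branches consume exactly one TNT bit; everything else
 * consumes zero. Compilers lower this comparison to setcc, so the
 * decoder's hot loop stays free of unpredictable branches. */
static inline uint32_t tnt_bits_consumed(uint32_t packed) {
    return block_kind(packed) == KIND_COND;
}
```

Since trace-dependent branches are close to unpredictable by definition, replacing them with ALU selects is exactly where the misprediction savings come from.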

These changes seemed relatively trivial individually, but I hoped they would combine into a respectable performance boost over the current state of the art, libxdc.

Honeybee decoder benchmarks

To compare the performance of Honeybee’s decoder with Intel’s reference decoder, I ran traces ranging from tens of kilobytes up to 1.25 GB against binaries ranging from 100 KB to 45 MB. Each test was performed 25 times, and I verified that both decoders traveled identical control flow paths for the same trace files.

Comparing Honeybee’s decoding speed to libipt.

These tests showed promising results (Figure 3). On large programs like clang, Honeybee outperformed Intel’s reference decoder and libxdc by an order of magnitude (and two orders of magnitude in one case).

For context, the largest trace in this test suite, “honey_mirror_1/clang_huge.pt,” is 1.25GB and originates from a trace of a complicated static analysis program that disassembled the entire 35MB clang binary.

Honeybee takes only 3.5 seconds to do what Intel’s reference decoder does in two-and-a-half minutes, which is a 44x improvement! This is the difference between stepping away while the trace decodes and being able to take a sip of water while you wait.

This difference is even more pronounced on small traces, which better resemble fuzzing workloads, like “html_fast_parse/6_txt.pt.” In this case, Honeybee needed only 6.6 microseconds to finish what Intel’s reference decoder took 451 microseconds to do: a nearly 70x improvement!

Integrating Honeybee with honggfuzz.

Now to actually integrate this new coverage mechanism into a fuzzer. I chose Google’s honggfuzz since it’s modular and, notably, because it actually already has another slow and partially broken version of IPT-based coverage that uses Intel’s reference decoder. My plan was to simply rip out Intel’s decoder, bolt Honeybee in place, and get a wonderful speedup. However, this was more complicated than I expected.

The challenge lay in how Linux collects IPT data. This is meant to be fairly simple, since the mainline kernel has support for IPT built right into perf, but I discovered that the complex and aggressive filtering mechanisms Honeybee needs to keep IPT data clean expose stability and performance issues in perf.
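For context, capturing IPT through perf starts with perf_event_open(2) against the intel_pt PMU, whose dynamic type number must first be read from sysfs. A minimal sketch of that first step, with error handling reduced to returning -1 when the PMU (or IPT-capable hardware) is absent:

```c
#include <stdio.h>

/* Read the intel_pt PMU's dynamic type number, needed in
 * perf_event_attr.type when opening an IPT perf event.
 * Returns -1 if the PMU is not present on this system. */
static int intel_pt_pmu_type(void) {
    FILE *f = fopen("/sys/bus/event_source/devices/intel_pt/type", "r");
    int type = -1;
    if (f) {
        if (fscanf(f, "%d", &type) != 1)
            type = -1;
        fclose(f);
    }
    return type;
}
```

A real capture would then fill a struct perf_event_attr with this value in attr.type, call perf_event_open, and mmap the AUX area to receive the trace stream; it is that configuration and teardown path that proved slow and fragile under Honeybee's filtering requirements.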

This was problematic. Not only was perf not terribly fast to begin with, it was highly unstable under the complex configurations Honeybee uses, which triggered serious bugs in perf that could misconfigure CPUs and lock them up, requiring a full system reboot to recover. Understandably, these issues made perf unusable for capturing IPT data for any Honeybee-related fuzzing tasks.

But, as my mother says, if you want something done right, sometimes you just need to do it yourself. Following in her footsteps, I wrote a small kernel module for IPT named “honey_driver” that was specifically optimized for fuzzing. While this new kernel module is certainly less featureful than perf and a likely security hazard, honey_driver is extremely fast and enables a user-space client to rapidly reconfigure tracing and analyze the results with little overhead.

And so, with this small constellation of custom code, honggfuzz was ready to roll with IPT data from Honeybee!

Fuzzer benchmarks

Fuzzer performance measurement is complicated, and there are far more reliable and definitive ways to measure performance than what follows, so treat these numbers as a rough benchmark. I persistently fuzzed a small HTML parser using four different coverage strategies, allowing honggfuzz to fuzz the binary with each technique and recording the average number of executions per second over the test period.

Comparing Honeybee to source-level instrumentation.

The first contender in the experiment was no coverage whatsoever. I considered this to be the baseline since it’s essentially as fast as honggfuzz can run on this binary by feeding random input into the test program. In this configuration, honggfuzz achieved an average of 239K executions per second. In the context of my system, this is decently fast but is still certainly limited by the fuzzing target’s CPU performance.

Next, I tested honggfuzz’s source-level software instrumentation by compiling my target with the instrumenting compiler and no other features enabled. This led to an average of 98K executions per second, or 41% of the no-coverage baseline (a 59% drop in throughput). A penalty of this magnitude is generally accepted and expected when fuzzing with instrumentation, due to missed compiler optimizations, many extra function calls, expensive locking, and cache thrashing from essentially random writes into coverage bitmaps.
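For a sense of what that per-branch bookkeeping looks like, here is an AFL-style edge coverage update of the kind instrumenting compilers inject at every basic block. This is a generic sketch of the technique, not honggfuzz's exact instrumentation:

```c
#include <stdint.h>

#define MAP_SIZE (1u << 16)   /* 64 KB coverage bitmap */

static uint8_t coverage_map[MAP_SIZE];
static uint32_t prev_loc;     /* per-thread in real instrumentation */

/* Injected at every basic block: hash the (previous, current) block
 * pair into the bitmap so each *edge* gets its own counter. The XOR
 * sends writes to effectively random offsets, which is what thrashes
 * the data cache. */
static void on_basic_block(uint32_t cur_loc) {
    coverage_map[(cur_loc ^ prev_loc) & (MAP_SIZE - 1)]++;
    prev_loc = cur_loc >> 1;  /* shift so A->B and B->A hash differently */
}
```

Every executed branch performs one of these scattered bitmap writes, which is where much of the instrumentation penalty described above comes from.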

After software instrumentation, we get into the more interesting coverage techniques. As mentioned earlier, honggfuzz has support for processor trace, using libipt for analysis and unfiltered perf data for IPT capture. However, honggfuzz’s existing IPT support does not generate full, or even necessarily correct, coverage information, because it only extracts indirect branch IPs from the trace and completely ignores conditional branches. Additionally, since no filtering is employed in perf, honggfuzz generates coverage for every piece of code in the process, including uninteresting libraries like libc, which pollutes the coverage bitmap.

Even with these shortcuts, honggfuzz’s existing IPT mode achieves an average of only 1.1K executions per second (roughly half a percent of the baseline). Because the coverage data is inaccurate, a true comparison to software instrumentation cannot be made; it is possible that this mode found a more difficult path sooner. Realistically, however, the gap is so enormous that such an effect is unlikely to account for most of the overhead, given the previously established performance issues of both perf and libipt.

Lastly, we have Honeybee with its custom decoder and capture system. Unlike the existing honggfuzz implementation, Honeybee decodes the entire trace and so is able to generate full, correct basic block and edge coverage information. Honeybee achieved an average of 171K executions per second, which is only a 28% performance dip compared to the baseline.

This shouldn’t come as a shock, since IPT capture itself adds only 8–15% record-time overhead. That leaves 13–20% of the baseline’s total execution time for processing IPT data and generating coverage, and given the performance of Honeybee’s decoder, it is entirely plausible that its data processing fits within the observed 28% total penalty.

I analyzed Honeybee’s coverage and confirmed that it was operating normally and processing all the data correctly. As such, I’m happy to say that Honeybee is (at least in this case) able to fuzz both closed and open-source software faster and more efficiently than even conventional software instrumentation methods!

What’s next?

While it is very exciting to claim to have dethroned an industry-standard fuzzing method, these methods have not been rigorously tested or verified at a large scale or across a large array of fuzzing targets. I can attest, however, that Honeybee has been either faster or at least able to trade blows with software instrumentation while fuzzing many different large open source projects like libpcap, libpng, and libjpegturbo.

If these patterns apply more generally, this could mean a great speedup for those who need to perform source-level fuzzing. More excitingly, it is an absolutely wonderful speedup for those who need to perform black-box fuzzing and have been relying on slow and unreliable tools like QEMU instrumentation, Branch Trace Store (BTS), or binary lifting, since it means they can fuzz at equal or greater speeds than if they had source, without making any serious compromises. Even outside fuzzing, Honeybee is a proven and extraordinarily fast IPT decoder, which enables many novel applications ranging from live control flow integrity enforcement to more advanced debuggers and performance analysis tools.

Honeybee is a very young project and is still under active development in its home on GitHub. If you’re interested in IPT, fuzzing, or any combination thereof please feel free to reach out to me over on Twitter where I’m @ezhes_ or over email at allison.husain on the berkeley.edu domain!

Acknowledgments

As mentioned earlier, IPT has caught many researchers’ eyes and so, naturally, many have tried to use it for fuzzing. Before starting my own project, I studied others’ research to learn from their work and see where I might be able to offer improvements. I’d first like to acknowledge the authors behind PTrix (PTrix: Efficient Hardware-Assisted Fuzzing for COTS Binary) and PTfuzz (PTfuzz: Guided Fuzzing With Processor Trace Feedback) as they provided substantial insight into how a fuzzer could be structured around IPT data. Additionally, I’d like to thank the team behind libxdc as their fast IPT packet decoder forms the basis for the packet decoder in Honeybee. Finally, I’d like to give a big thank you to the team at Trail of Bits, and especially my mentors Artem Dinaburg and Peter Goodman, for their support through this project and for having me on as an intern this winter!