diff --git a/tutorials/README.md b/tutorials/README.md index 33549aaaa..69b7075e1 100644 --- a/tutorials/README.md +++ b/tutorials/README.md @@ -2,19 +2,27 @@ ### [Volume 1 — NetWalker Dropper][0x01] -Extract a NetWalker sample and its configuration from a PowerShell loader. The tutorial touches on all fundamental binary refinery concepts. +Extract a NetWalker sample and its configuration from a PowerShell loader. +The tutorial touches on all fundamental binary refinery concepts. ### [Volume 2 — Amadey Loader Strings][0x02] -A short tutorial extracting the strings (including C2 configuration) of an Amadey Loader sample. Revisits most of the concepts that were introduced in the tutorial. +A short tutorial extracting the strings (including C2 configuration) of an Amadey Loader sample. +Revisits most of the concepts that were introduced in the first tutorial. ### [Volume 3 — SedUpLoader C2s][0x03] -In this tutorial, we extract the C2 configuration from a SedUpLoader sample. The tutorial introduces the push/pop mechanic, which is used to first extract a decryption key, store it as a variable, continue to extract the C2 data, and then decrypt the C2 domains using the stored key. +In this tutorial, we extract the C2 configuration from a SedUpLoader sample. +The tutorial introduces the push/pop mechanic, +which is used to first extract a decryption key, +store it as a variable, +continue to extract the C2 data, +and then decrypt the C2 domains using the stored key. ### [Volume 4 — Run Length Encoding][0x04] -A short tutorial about a loader using a custom run-length encoding. The tutorial showcases how to define custom refinery units when it would be too difficult to implement a decoding step using existing units. +A short tutorial about a loader using a custom run-length encoding. +It showcases how to define custom refinery units when it would be too difficult to implement a decoding step using existing units. 
### [Volume 5 — FlareOn 9][0x05] @@ -32,6 +40,11 @@ Another showcase of writing custom units for very specific tasks, in this case r This is a refinery-focused write-up of how to solve FlareOn10. +### [Volume 9 — Layer Cake][0x09] + +The tutorial goes through several layers of a multi-stage downloader. +It illustrates the use of path extraction units and features some steganography. + [0x01]: tbr-files.v0x01.netwalker.dropper.ipynb [0x02]: tbr-files.v0x02.amadey.loader.ipynb @@ -40,4 +53,5 @@ This is a refinery-focused write-up of how to solve FlareOn10. [0x05]: tbr-files.v0x05.flare.on.9.ipynb [0x06]: tbr-files.v0x06.qakbot.decoder.ipynb [0x07]: tbr-files.v0x07.dc.rat.ipynb -[0x08]: tbr-files.v0x08.flare.on.10.ipynb \ No newline at end of file +[0x08]: tbr-files.v0x08.flare.on.10.ipynb +[0x09]: tbr-files.v0x09.exploit.document.ipynb \ No newline at end of file diff --git a/tutorials/tbr-files.v0x09.exploit.document.ipynb b/tutorials/tbr-files.v0x09.exploit.document.ipynb new file mode 100644 index 000000000..489319272 --- /dev/null +++ b/tutorials/tbr-files.v0x09.exploit.document.ipynb @@ -0,0 +1,912 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# The Refinery Files 0x09: Layer Cake\n", + "\n", + "There is a [Malcat blog post][src] discussing an infection chain from an equation editor exploit document to a Formbook payload.\n", + "It seemed like a very good candidate for a refinery tutorial.\n", + "I got to show a glimpse of this on [video][yyt], but this tutorial details how to get from the first stage to the final payload using refinery.\n", + "\n", + "[src]: https://malcat.fr/blog/exploit-steganography-and-delphi-unpacking-dbatloader/\n", + "[yyt]: https://www.youtube.com/live/-B072w0qjNk" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import tutorials.boilerplate as bp" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stage 1 - 
Exploit Document\n", + "\n", + "We begin our journey with the following sample:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "bp.store_sample(\n", + " '13063a496da7e490f35ebb4f24a138db4551d48a1d82c0c876906a03b8e83e05', 'eqn.doc')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's have a very first [peek][] at it, only displaying some metadata and no hex dump of the contents.\n", + "\n", + "[peek]: https://binref.github.io/#refinery.peek" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "------------------------------------------------------------------------------------------------------------------------\n", + " crc32 = 36d72a79\n", + " entropy = 99.55%\n", + " magic = CDFV2 Encrypted\n", + " sha256 = 13063a496da7e490f35ebb4f24a138db4551d48a1d82c0c876906a03b8e83e05\n", + " size = 00.271 MB\n", + "------------------------------------------------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "%emit eqn.doc | peek -mml0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is an encrypted office document that can be decrypted using the [officecrypt][] unit:\n", + "\n", + "[officecrypt]: https://binref.github.io/#refinery.officecrypt" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "------------------------------------------------------------------------------------------------------------------------\n", + "00.265 MB; 97.92% entropy; Microsoft Excel 2007+\n", + "------------------------------------------------------------------------------------------------------------------------\n", + "00000: 50 4B 03 04 14 00 06 00 08 00 00 00 21 00 21 5D 2F 7E 2F 02 00 00 EE 09 00 00 
13 00 PK..........!.!]/~/.........\n", + "0001C: E4 01 5B 43 6F 6E 74 65 6E 74 5F 54 79 70 65 73 5D 2E 78 6D 6C 20 A2 E0 01 28 A0 00 ..[Content_Types].xml....(..\n", + "00038: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................\n", + "00054: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................\n", + ".....: 15 repetitions\n", + "00214: 00 C4 56 4D 6F DA 40 10 BD 57 EA 7F B0 7C 8D EC 05 2A 55 55 05 E4 10 92 53 D5 44 4A ..VMo.@..W...|...*UU....S.DJ\n", + "00230: FA 03 96 DD 01 36 EC 57 77 16 02 FF BE B3 36 20 85 18 1C E4 44 BD 18 E3 F5 BC 79 33 .....6.Ww.....6.....D.....y3\n", + "0024C: FB E6 AD 87 D7 1B A3 B3 35 04 54 CE 8E F2 7E D9 CB 33 B0 C2 49 65 E7 A3 FC CF D3 5D ........5.T...~..3..Ie.....]\n", + "00268: F1 23 CF 30 72 2B B9 76 16 46 F9 16 30 BF 1E 7F FD 32 7C DA 7A C0 8C A2 2D 8E F2 45 .#.0r+.v.F..0....2|.z...-..E\n", + "00284: 8C FE 27 63 28 16 60 38 96 CE 83 A5 95 99 0B 86 47 FA 1B E6 CC 73 B1 E4 73 60 83 5E ..'c(.`8........G....s..s`.^\n", + "------------------------------------------------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "%emit eqn.doc | officecrypt | peek" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The first impulse might be to inspect the contents using [xlxtr][] or look for VBA macros using the [xtvba][] unit,\n", + "but there's nothing there:\n", + "\n", + "[xlxtr]: https://binref.github.io/#refinery.xlxtr\n", + "[xtvba]: https://binref.github.io/#refinery.xtvba" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--------------------------------------------------------------------------------------------------------[empty chunk]---\n", + " entropy = 00.00%\n", + " magic = empty\n", + " size = 00.000 kB\n", + 
"------------------------------------------------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "%emit eqn.doc | officecrypt | xtvba | peek -m" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--------------------------------------------------------------------------------------------------------[empty chunk]---\n", + " entropy = 00.00%\n", + " magic = empty\n", + " size = 00.000 kB\n", + "------------------------------------------------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "%emit eqn.doc | officecrypt | xlxtr | peek -m" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Frustrated, we turn to simply extracting the document contents to look for something interesting.\n", + "The [xt][] unit aims to extract most archive formats that refinery can handle.\n", + "All archive extraction units follow a common interface;\n", + "The `--list` parameter (or `-l` for short) causes these units to list the paths of all items the unit is able to extract from the input:\n", + "\n", + "[xt]: https://binref.github.io/#refinery.xt" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Content_Types].xml\n", + "_rels/.rels\n", + "xl/diagrams/data1.xml\n", + "xl/_rels/workbook.xml.rels\n", + "xl/workbook.xml\n", + "xl/styles.xml\n", + "xl/media/image6.emf\n", + "xl/diagrams/colors1.xml\n", + "xl/diagrams/quickStyle1.xml\n", + "xl/diagrams/layout1.xml\n", + "xl/worksheets/sheet3.xml\n", + "xl/worksheets/sheet2.xml\n", + "xl/worksheets/_rels/sheet1.xml.rels\n", + "xl/worksheets/_rels/sheet2.xml.rels\n", + "xl/drawings/_rels/drawing1.xml.rels\n", + "xl/drawings/_rels/vmlDrawing2.vml.rels\n", + "xl/theme/theme1.xml\n", + 
"xl/media/image5.jpeg\n", + "xl/media/image4.png\n", + "xl/drawings/vmlDrawing1.vml\n", + "xl/embeddings/oleObject1.bin\n", + "xl/drawings/drawing1.xml\n", + "xl/worksheets/sheet1.xml\n", + "xl/drawings/vmlDrawing2.vml\n", + "xl/media/image1.png\n", + "xl/media/image3.png\n", + "xl/media/image2.png\n", + "xl/embeddings/Microsoft_Office_Word_Macro-Enabled_Document1.docm\n", + "xl/printerSettings/printerSettings2.bin\n", + "xl/printerSettings/printerSettings1.bin\n", + "docProps/core.xml\n", + "docProps/app.xml\n" + ] + } + ], + "source": [ + "%emit eqn.doc | officecrypt | xt -l" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `oleObject1.bin` stands out among these files.\n", + "The [xt][] unit expects filename pattern expressions as positional arguments which specify what items to extract.\n", + "The pattern is matched with increasingly fuzzy logic against all available paths until either a match is found or until no match is found using full substring search.\n", + "For example, we can find `oleObject1.bin` by extracting any item matching `ole`:\n", + "\n", + "[xt]: https://binref.github.io/#refinery.xt" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "xl/embeddings/oleObject1.bin\n" + ] + } + ], + "source": [ + "%emit eqn.doc | officecrypt | xt ole -l" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And without the `-l` switch, the unit extracts the corresponding item from the archive:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "------------------------------------------------------------------------------------------------------------------------\n", + "03.584 kB; 52.27% entropy; Composite Document File V2 Document, Cannot read section info\n", + 
"------------------------------------------------------------------------------------------------------------------------\n", + "00000: D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3E 00 03 00 ........................>...\n", + "0001C: FE FF 09 00 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 ............................\n", + "00038: 00 10 00 00 02 00 00 00 01 00 00 00 FE FF FF FF 00 00 00 00 00 00 00 00 FF FF FF FF ............................\n", + "00054: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ............................\n", + ".....: 14 repetitions\n", + "001F8: FF FF FF FF FF FF FF FF FD FF FF FF FE FF FF FF FE FF FF FF 04 00 00 00 05 00 00 00 ............................\n", + "00214: FE FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ............................\n", + "00230: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ............................\n", + ".....: 15 repetitions\n", + "003F0: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 52 00 6F 00 6F 00 74 00 20 00 45 00 ................R.o.o.t...E.\n", + "------------------------------------------------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "%emit eqn.doc | officecrypt | xt ole | peek" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The result is another OLE object:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1]Ole\n", + "[1]oLE10NATive\n" + ] + } + ], + "source": [ + "%emit eqn.doc | officecrypt | xt ole | xt -l" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The pattern matching logic of [xt][] is case sensitive only if there are two paths among the extractible items that would conflict otherwise. 
In this case, we can extract `[1]oLE10NATive` simply by matching, e.g., `native`:\n", + "\n", + "[xt]: https://binref.github.io/#refinery.xt" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1]oLE10NATive\n" + ] + } + ], + "source": [ + "%emit eqn.doc | officecrypt | xt ole | xt -l native" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "------------------------------------------------------------------------------------------------------------------------\n", + "01.215 kB; 94.42% entropy; data\n", + "------------------------------------------------------------------------------------------------------------------------\n", + "00000: 00 2F 1E 02 03 7E 01 EB 47 0A 01 05 75 63 A3 EC 00 00 00 00 00 00 00 00 00 00 00 00 ./...~..G...uc..............\n", + "0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 50 06 45 00 00 00 00 00 00 00 00 00 00 00 00 .............P.E............\n", + "00038: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 29 C3 44 00 00 00 00 57 5F EB 75 .................).D....W_.u\n", + "00054: 81 C7 84 01 00 00 8D AF 6D 02 00 00 EB 57 EB 28 EB EE EB 1C 69 C0 C7 3C 00 7C EB 05 ........m....W.(....i..<.|..\n", + "00070: B4 73 38 66 E1 05 11 50 CA 57 EB 12 EB 3B EB E4 50 58 50 58 EB 31 EB DC EB 46 EB 70 .s8f...P.W...;..PXPX.1...F.p\n", + "0008C: EB 48 EB 71 31 07 9C 53 57 53 81 C3 BC 5F 00 00 81 C3 B7 44 00 00 81 EB 85 2F 00 00 .H.q1..SWS..._.....D...../..\n", + "000A8: 8D 9B 80 2A 00 00 5B 5F 5B 9D 83 C7 04 EB 10 EB 17 6B C0 00 EB 3E EB 04 04 21 57 5F ...*..[_[........k...>...!W_\n", + "000C4: 5F EB BF EB 07 E8 F6 FF FF FF EB F4 39 EF EB 2F EB A6 EB B8 EB 8E 9C 53 8D 9B 48 4F _...........9../.......S..HO\n", + "000E0: 00 00 81 EB 49 1C 00 00 81 C3 35 69 00 00 90 81 EB 67 12 00 00 5B 9D E9 68 FF FF FF ....I.....5i.....g...[..h...\n", + 
"000FC: E9 67 FF FF FF EB 8D 0F 82 5F FF FF FF E9 44 01 00 00 4F 38 58 A5 4A 2E 16 0A BE 30 .g......._....D...O8X.J....0\n", + "------------------------------------------------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "%emit eqn.doc | officecrypt | xt ole | xt native | peek" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This does look a lot like an exploit document and the bytes down there look like shellcode.\n", + "I wouldn't bother to recover the exact logic here since there is a quicker way to get what we are looking for:\n", + "The first few bytes of what might be the beginning of the shellcode are `57 5F EB 75`.\n", + "We can extract this data using the regular expression unit [rex][].\n", + "Feeding this data to the stack string extractor [vstack][] with a very liberal `--wait` parameter already produces some interesting results:\n", + "\n", + "[rex]: https://binref.github.io/#refinery.rex\n", + "[vstack]: https://binref.github.io/#refinery.vstack\n", + "[yara]: https://yara.readthedocs.io/en/stable/writingrules.html#hexadecimal-strings" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "---------------------------------------------------------------------------------------------------------------------\n", + "00.008 kB; 13.27% entropy; Cracklib password index, big endian (\"64-bit\")\n", + "---------------------------------------------------------------------------------------------------------------------\n", + "0D 02 00 00 00 00 00 00 ........ 
\n", + "---------------------------------------------------------------------------------------------------------------------\n", + "00.624 kB; 72.65% entropy; data\n", + "---------------------------------------------------------------------------------------------------------------------\n", + "81 EC 2C 02 00 00 E8 12 00 00 00 6B 00 65 00 72 00 6E 00 65 00 6C 00 33 00 32 00 00 00 ..,........k.e.r.n.e.l.3.2...\n", + "E8 67 01 00 00 89 C3 E8 0D 00 00 00 4C 6F 61 64 4C 69 62 72 61 72 79 57 00 53 E8 C6 01 .g..........LoadLibraryW.S...\n", + "00 00 89 C7 E8 0F 00 00 00 47 65 74 50 72 6F 63 41 64 64 72 65 73 73 00 53 E8 AA 01 00 .........GetProcAddress.S....\n", + "00 89 C6 E8 1A 00 00 00 45 78 70 61 6E 64 45 6E 76 69 72 6F 6E 6D 65 6E 74 53 74 72 69 ........ExpandEnvironmentStri\n", + "6E 67 73 57 00 53 FF D6 68 04 01 00 00 8D 54 24 08 52 E8 22 00 00 00 25 00 50 00 55 00 ngsW.S..h.....T$.R.\"...%.P.U.\n", + "42 00 4C 00 49 00 43 00 25 00 5C 00 76 00 62 00 63 00 2E 00 65 00 78 00 65 00 00 00 FF B.L.I.C.%.\\.v.b.c...e.x.e....\n", + "D0 E8 0E 00 00 00 55 00 72 00 6C 00 4D 00 6F 00 6E 00 00 00 FF D7 E8 13 00 00 00 55 52 ......U.r.l.M.o.n..........UR\n", + "4C 44 6F 77 6E 6C 6F 61 64 54 6F 46 69 6C 65 57 00 50 FF D6 6A 00 6A 00 8D 54 24 0C 52 LDownloadToFileW.P..j.j..T$.R\n", + "E8 42 00 00 00 68 00 74 00 74 00 70 00 3A 00 2F 00 2F 00 31 00 30 00 34 00 2E 00 31 00 .B...h.t.t.p.:././.1.0.4...1.\n", + "36 00 38 00 2E 00 33 00 32 00 2E 00 35 00 30 00 2F 00 30 00 30 00 39 00 2F 00 76 00 62 6.8...3.2...5.0./.0.0.9./.v.b\n", + "---------------------------------------------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "%emit eqn.doc | officecrypt | xt ole | xt native | rex 'W_.u.*' | vstack -w100 [| peek -N ]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There is a URL visible at the end of the emulated memory dump, and we can extract it using [xtp][]:\n", + "\n", + "[xtp]: 
https://binref.github.io/#refinery.xtp" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "http[:]//104.168.32[.]50/009/vbc.exe\n" + ] + } + ], + "source": [ + "%emit eqn.doc | officecrypt | xt ole | xt native | rex 'W_.u.*' | vstack -w100 [| xtp url | defang ]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stage 2 - Delphi Downloader\n", + "\n", + "The URL is already offline at the time of writing.\n", + "Luckily, we know the file that was served while it was active:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "bp.store_sample(\n", + " '3045902d7104e67ca88ca54360d9ef5bfe5bec8b575580bc28205ca67eeba96d', 'vbc.exe')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Some strings, and specifically the existence of a very characteristic linker timestamp, betray this sample as having been written in Delphi,\n", + "most likely compiled in October 2021:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "TimeStamp.Linker : 1992-06-19 22:22:17\n", + "TimeStamp.Delphi : 2021-10-24 13:25:36\n", + "TimeStamp.RsrcTS : 2014-04-24 01:38:58\n" + ] + } + ], + "source": [ + "%emit vbc.exe | pemeta -tT" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The binary unpacks an embedded payload in the function at address `0x46C8F8`.\n", + "It picks a small number of bits from each pixel's red, green, and blue channel value.\n", + "The number of bits taken is computed from the first pixel by the following formula, where `r`, `g`, and `b` represent that pixel's red, green, and blue values, respectively:\n", + "\n", + " (b % 4) + ((g % 2) * 4) + ((r % 2) * 8)\n", + "\n", + "The [stego][] unit can be used to 
extract pixel color values from various image formats.\n", + "Since the malware is using Delphi's `GetScanline` function which reads the raw bytes from the bitmap,\n", + "we have to extract the color channels in reverse (i.e. blue, green, red):\n", + "\n", + "[stego]: https://binref.github.io/#refinery.stego" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3\n", + "2\n", + "2\n" + ] + } + ], + "source": [ + "%emit vbc.exe | perc BBTREX | stego BGR | snip :3 | pack -R" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can tell from the data that the number of bits that will be taken from each channel is `3`.\n", + "The [bitsnip][] unit allows us to extract the payload:\n", + "\n", + "[bitsnip]: https://binref.github.io/#refinery.bitsnip" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "------------------------------------------------------------------------------------------------------------------------\n", + "00.106 MB; 82.63% entropy; data\n", + "------------------------------------------------------------------------------------------------------------------------\n", + "00000: 00 66 01 00 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00 .f..MZ......................\n", + "0001C: 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 @...........................\n", + "00038: 00 00 00 00 00 00 00 00 00 01 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 ....................!..L.!Th\n", + "00054: 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 is.program.cannot.be.run.in.\n", + "------------------------------------------------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ 
+ "%emit vbc.exe | perc BBTREX | stego BGR | snip 3: | bitsnip :3 | peek -l4" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The output has a length prefix and we can use the [struct][] unit to extract the PE file.\n", + "In this case, [struct][] is instructed to first read a 32-bit integer named `n` via `{n:I}`; the `I` here is the Python struct format character for an unsigned 32-bit integer.\n", + "Next, it is instructed to read `n` bytes, i.e. as many bytes as the prefix value indicates.\n", + "By default, [struct][] then emits the last byte string field that was parsed.\n", + "\n", + "[struct]: https://binref.github.io/#refinery.struct" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "------------------------------------------------------------------------------------------------------------------------\n", + " crc32 = b7be8c6d\n", + " entropy = 81.48%\n", + " magic = PE32 executable (DLL) (GUI) Intel 80386, for MS Windows\n", + " sha256 = e232e1cd61ca125fbb698cb32222a097216c83f16fe96e8ea7a8b03b00fe3e40\n", + " size = 91.648 kB\n", + "------------------------------------------------------------------------------------------------------------------------\n", + "00000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00 40 00 00 00 MZ......................@...\n", + "0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................\n", + "00038: 00 00 00 00 00 01 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70 ................!..L.!This.p\n", + "00054: 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 rogram.cannot.be.run.in.DOS.\n", + "00070: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 mode....$...................\n", + "0008C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 ............................\n", + ".....: 3 repetitions\n", + "000FC: 00 00 00 00 50 45 00 00 4C 01 06 00 19 5E 2E 2A 00 00 00 00 00 00 00 00 E0 00 8E A1 ....PE..L....^.*............\n", + "00118: 0B 01 02 19 00 34 01 00 00 2E 00 00 00 00 00 00 B4 42 01 00 00 10 00 00 00 50 01 00 .....4...........B.......P..\n", + "00134: 00 00 40 00 00 10 00 00 00 02 00 00 04 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 ..@.........................\n", + "------------------------------------------------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "%emit vbc.exe | perc BBTREX | stego BGR | snip 3: | bitsnip :3 | struct {n:I}{:n} | peek -mm | dump ldr.exe" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, if we want to create a pipeline that works more generically against other samples of this kind, we can also use [struct][] to parse out the first three color channels and compute the number of bits to extract programmatically:\n", + "\n", + "[struct]: https://binref.github.io/#refinery.struct" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "------------------------------------------------------------------------------------------------------------------------\n", + " crc32 = b7be8c6d\n", + " entropy = 81.48%\n", + " magic = PE32 executable (DLL) (GUI) Intel 80386, for MS Windows\n", + " sha256 = e232e1cd61ca125fbb698cb32222a097216c83f16fe96e8ea7a8b03b00fe3e40\n", + " size = 91.648 kB\n", + "------------------------------------------------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "%%emit vbc.exe\n", + " | perc BBTREX\n", + " | stego BGR\n", + " | struct {b:B}{g:B}{r:B}{} [\n", + " | bitsnip :(b%4)+((g%2)*4)+((r%2)*8) ]\n", + " | struct {n:I}{:n}\n", + " | peek -mml0" + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "As it turns out, the extracted payload is a downloader that reads its payload URL from the parent sample.\n", + "The URL can be found encoded between two occurrences of the magic string `^^Nc`. We can easily extract it using [rex][].\n", + "Since `^` is awkward to escape, I opted for a less restrictive regular expression which still works:\n", + "\n", + "[rex]: https://binref.github.io/#refinery.rex" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "https[:]//cdn.discordapp[.]com/attachments/902132472924479511/902136733435592744/Wbjhzkbevojgqfhfalbqxnykvunmobi" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "%emit vbc.exe | rex ..Nc(.*?)..Nc {1:add[7]:defang}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here, we also make use of [rex][]'s powerful formatter:\n", + "The second argument `{1:add[7]:defang}` instructs [rex][] to compute its output in the following way:\n", + "\n", + "- take the first match group: `{1`\n", + "- [add][] `7` to every byte value: `{1:add[7]`\n", + "- [defang][] the result: `{1:add[7]:defang}`\n", + "\n", + "The suffixes that are supported here are the same as the prefixes supported by all [multibin][] expressions,\n", + "except that they are applied left to right instead of right to left.\n", + "\n", + "[rex]: https://binref.github.io/#refinery.rex\n", + "[add]: https://binref.github.io/#refinery.add\n", + "[defang]: https://binref.github.io/#refinery.defang\n", + "[multibin]: https://binref.github.io/lib/argformats.html" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stage 3 - DBatLoader\n", + "\n", + "The file from the malicious link is no longer available, but here's what it served at the time when it was active:" + ] + }, + { + "cell_type": 
"code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "bp.store_sample(\n", + " 'bb41df67b503fef9bfd8f74757adcc50137365fbc25b92933573a64c7d419c1b', 'obi.bin')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The function to decode the payload is at `0x413b14` in `ldr.exe` and gets called with the hard-coded key value `328`.\n", + "When substituting this key value in the decoder function, a simple expression for the decoding operation can be deduced:\n", + "The encoded byte in each block is subjected to an affine linear transformation; multiplied by `0x81F6` and then increased by `0xF3C7`.\n", + "The high 8 bits of this 16-bit operation yield the XOR key for the next byte.\n", + "Finally, the resulting byte array is reversed.\n", + "This sort of simple encoding is best reverted using the [alu][] unit, and [rev][] to reverse the order of bytes:\n", + "\n", + "[alu]: https://binref.github.io/#refinery.alu\n", + "[rev]: https://binref.github.io/#refinery.rev" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "------------------------------------------------------------------------------------------------------------------------\n", + "00.276 MB; 94.63% entropy; PE32 executable (DLL) (GUI) Intel 80386, for MS Windows\n", + "------------------------------------------------------------------------------------------------------------------------\n", + "00000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00 40 00 00 00 MZ......................@...\n", + "0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................\n", + "00038: 00 00 00 00 00 01 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70 ................!..L.!This.p\n", + "00054: 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 
20 rogram.cannot.be.run.in.DOS.\n", + "00070: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 mode....$...................\n", + "0008C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................\n", + ".....: 3 repetitions\n", + "000FC: 00 00 00 00 50 45 00 00 4C 01 07 00 19 5E 2E 2A 00 00 00 00 00 00 00 00 E0 00 8E A1 ....PE..L....^.*............\n", + "00118: 0B 01 02 19 00 64 01 00 00 CE 02 00 00 00 00 00 40 66 01 00 00 10 00 00 00 80 01 00 .....d..........@f..........\n", + "00134: 00 00 40 00 00 10 00 00 00 02 00 00 04 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 ..@.........................\n", + "------------------------------------------------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "%emit obi.bin | alu B@S -P2 -s64 -e=R(E*0x81F6+0xF3C7,8) | rev | peek | dump yak.{ext}" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "00.271 MB 13063a496da7e490f35ebb4f24a138db4551d48a1d82c0c876906a03b8e83e05 eqn.doc\n", + "00.959 MB 3045902d7104e67ca88ca54360d9ef5bfe5bec8b575580bc28205ca67eeba96d vbc.exe\n", + "91.648 kB e232e1cd61ca125fbb698cb32222a097216c83f16fe96e8ea7a8b03b00fe3e40 ldr.exe\n", + "00.276 MB bb41df67b503fef9bfd8f74757adcc50137365fbc25b92933573a64c7d419c1b obi.bin\n", + "00.276 MB f8fc925d89baa140c9cb436f158ec91209789e9f8e82a0b7252f05587ce8e06f yak.dll\n" + ] + } + ], + "source": [ + "%ls" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The reason we named the payload `yak.dll` is because of its characteristic PE resource named `YAK`:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "RCDATA/YAK/0\n" + ] + } + ], + "source": [ + "%emit yak.dll | perc -l" + ] + }, 
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This resource is a known artifact of the DBatLoader malware and contains the encoded payload.\n", + "The following is a refinery pipeline to unpack it; we will discuss the details below:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Header.Machine : I386\n", + "Header.Subsystem : Windows GUI\n", + "Header.MinimumOS : Windows XP\n", + "Header.RICH[0x0] : [00ab9d1b] 76 STDLIB Visual Studio 2010 10.10 SP1\n", + "Header.RICH[0x1] : [009e9d1b] 03 MASM Visual Studio 2010 10.10 SP1\n", + "Header.RICH[0x2] : [009d9d1b] 01 LINKER Visual Studio 2010 10.10 SP1\n", + "Header.Type : EXE\n", + "Header.ImageBase : 0x00400000\n", + "Header.ImageSize : 167707\n", + "Header.Bits : 32\n", + "Header.EntryPoint : 0x00429000\n", + "TimeStamp.Linker : 2010-01-13 06:39:15\n" + ] + } + ], + "source": [ + "%%emit yak.dll\n", + " | perc\n", + " | alu '[B,((B+14)%94)+33][32