Reverse Engineering My Own Website
Security researcher Alessandro Innocenzi built his website out of C source code and an esoteric programming language, so you have to compile it first in order to get any information out of it.
A month ago I published my new website, http://alesanco.it, but only after hundreds of ideas, drafts, mockups and rough sketches on the way to a final design. Since I'm a nerd and I love coding and security, I wrote the site's source code in C, using things like base64 decoding, an esoteric language, ASCII art, a hex dump and code obfuscation, so you have to compile and run it to get any information about me. I know it's crazy and nobody will do it, but here it is!
Let's take a closer look at my source code.
NOTE: This isn't a C guide. I'm assuming the reader knows the basics of C, so I won't explain every line of code. There is a plethora of resources: general information on Wikipedia, more detail on Wikibooks, a good tutorial on Tutorials Point, free interactive online tutorials and so on.
Compile It
First of all, try to compile it. Let's start with the header comment:
/*
* Hello, you've to compile and run it to get information about me!
*
* Save this code in main.c, GET all other files and run the following command:
*
* gcc -o program main.c brainfuckinterpreter.c asciititle.c base64.c hero.c -std=c99
*
* Then run ./program :)
*/
So, select all of the code and save it in a file named main.c, in any folder we like.
The code continues with some includes and defines:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* You need these files, so GET ["/asciititle.h", "/asciititle.c"] */
#include "asciititle.h"
/* You need these files, so GET ["/brainfuckinterpreter.h", "/brainfuckinterpreter.c"] */
#include "brainfuckinterpreter.h"
/* You need these files, so GET ["/base64.h", "/base64.c"] */
#include "base64.h"
/* You need these files, so GET ["/hero.h", "/hero.c"] */
#include "hero.h"
#define SUCCESS 0
#define FAILURE 1
Reading the comments, we have to GET some files: asciititle.h, asciititle.c, brainfuckinterpreter.h, brainfuckinterpreter.c, base64.h, base64.c, hero.h and hero.c. Save all of these files in the same folder as main.c.
Ok, now let's try to run the gcc command as the header comment says:
gcc -o program main.c brainfuckinterpreter.c asciititle.c base64.c hero.c -std=c99
NOTE: What's -std=c99, and why do we need it? The -std=c99 flag tells the compiler to use ISO/IEC 9899:1999, the version of the C language standard known as C99. If you have an old gcc version and compile without this flag, you may get this error message:
main.c: In function ‘main’:
main.c:50:11: error: ‘for’ loop initial declarations are only allowed in C99 mode
 for (size_t i = 0; i < contacts_size; i++) {
 ^
main.c:50:11: note: use option -std=c99 or -std=gnu99 to compile your code
hero.c: In function ‘print_hero_message’:
hero.c:15:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
 for (unsigned int m = 0; m < sizeof(s); ++m)
 ^
hero.c:15:3: note: use option -std=c99 or -std=gnu99 to compile your code
Declaring a variable in the initialization clause of a for loop wasn't valid before C99.
Ok, now we have an executable named program, so, as the header comment says, run ./program!
   _____  .__
  /  _  \ |  |   ____   ___________    ____   ____  ____
 /  /_\  \|  | _/ __ \ /  ___/\__  \  /    \_/ ___\/  _ \
/    |    \  |_\  ___/ \___ \  / __ \|   |  \  \__(  <_> )
\____|__  /____/\___  >____  >(____  /___|  /\___  >____/
        \/          \/     \/      \/     \/     \/
Segmentation fault (core dumped)
What a beautiful ASCII art!! But uhm... we don't think that segmentation fault is a feature, right? Before trying to debug, let's take a look at the main function, where we can see this:
/* You need this file, so GET ["/hello.bf"] */
if (print_ascii_title() || // xxd -i used.
brainfuck_interpreter("hello.bf") || // Esoteric programming language.
print_hero_message()) { // Obfuscated string.
fprintf(stderr, "What a bad curriculum!\n");
return FAILURE;
}
We probably have to get the hello.bf file too. So get it and run ./program again to get:
   _____  .__
  /  _  \ |  |   ____   ___________    ____   ____  ____
 /  /_\  \|  | _/ __ \ /  ___/\__  \  /    \_/ ___\/  _ \
/    |    \  |_\  ___/ \___ \  / __ \|   |  \  \__(  <_> )
\____|__  /____/\___  >____  >(____  /___|  /\___  >____/
        \/          \/     \/      \/     \/     \/
Hello, I'm Alesanco! A software architect & cybersecurity Padawan!
You've run this code... You're my hero!
https://linkedin.com/in/alessandroinnocenzi
https://twitter.com/Alesanco83
https://github.com/alesanco
https://secjuice.com/author/alesanco
Wow! It works! Awesome! But we want to know why and how, so let's read the source code now.
main.c
Ok, we've already seen the header comment and the include and define sections. After those comes the main function. Let's read the code blocks inside it.
/* You need this file, so GET ["/hello.bf"] */
if (print_ascii_title() || // xxd -i used.
brainfuck_interpreter("hello.bf") || // Esoteric programming language.
print_hero_message()) { // Obfuscated string.
fprintf(stderr, "What a bad curriculum!\n");
return FAILURE;
}
The first block is an if statement that calls the functions print_ascii_title(), brainfuck_interpreter("hello.bf") and print_hero_message() in turn. Each one returns SUCCESS (0) or FAILURE (1), so as soon as one of them returns a non-zero value the || chain stops, the message What a bad curriculum! is printed to stderr and the program exits returning FAILURE. These functions live in other files, so we'll continue with the main.c file for now.
Following the code, we see this:
printf("\n"); // This is H O R R I B L E. Don't try this at home!
Ok, seriously, please don't use this kind of "trick". My eyes are bleeding. There are other ways to put a newline into the output, depending on the context. Try one and comment on this article, or start a flame war on my Twitter account with your solution if you want!
At a quick glance, the next block is about base64:
const char protocol[] = "https://";
char *base64_contacts[] = {
"bGlua2VkaW4uY29tL2luL2FsZXNzYW5kcm9pbm5vY2Vuemk=",
"dHdpdHRlci5jb20vQWxlc2FuY284Mw==",
"Z2l0aHViLmNvbS9hbGVzYW5jbw==",
"c2VjanVpY2UuY29tL2F1dGhvci9hbGVzYW5jbw=="
};
We can see a string containing https:// and an array of strings, or more precisely an array of pointers to string literals, which probably hold base64-encoded text. The block continues with:
int contacts_size = sizeof(base64_contacts) / sizeof(base64_contacts[0]);
char decoded_contact[32] = ""; // Stack Buffer Overflow? Maybe.
for (size_t i = 0; i < contacts_size; i++) {
base64_decode(base64_contacts[i], decoded_contact);
printf("%s%s\n", protocol, decoded_contact);
}
A for loop passes every string in the base64_contacts array to a function called base64_decode(), which, as the name makes obvious, decodes the base64 strings. If we take a look at the base64.c file, we can see that it's a simple base64 encode/decode implementation.
The final printf call prints the protocol constant followed by the decoded_contact variable, which holds the base64-decoded string.
Just one thing about this variable: we see the comment // Stack Buffer Overflow? Maybe. The developer knows that something could go wrong here. Why? Fixing the size of an array, in this case 32 bytes, without knowing what will end up inside it is very dangerous. It can cause a stack buffer overflow, one of the most exploited classes of bug ever. In fact, the first contact decodes to linkedin.com/in/alessandroinnocenzi, which is 35 characters long, so it already doesn't fit.
The main function returns SUCCESS and the program exits.
asciititle.c
As we saw before, the main function contains an if statement with several calls to external functions. The first is:
if (print_ascii_title() || // xxd -i used.
print_ascii_title() is a function in the asciititle.c file. Here is the source code:
#include <stdio.h>
#define SUCCESS 0
#define FAILURE 1
unsigned char asciititle_text[] = {
0x20, 0x20, 0x20, 0x5f, 0x5f, 0x5f, 0x5f, 0x5f, 0x20, 0x20, 0x2e, 0x5f,
0x5f, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x0d, 0x0a,
0x20, 0x20, 0x2f, 0x20, 0x20, 0x5f, 0x20, 0x20, 0x5c, 0x20, 0x7c, 0x20,
0x20, 0x7c, 0x20, 0x20, 0x20, 0x5f, 0x5f, 0x5f, 0x5f, 0x20, 0x20, 0x20,
0x5f, 0x5f, 0x5f, 0x5f, 0x5f, 0x5f, 0x5f, 0x5f, 0x5f, 0x5f, 0x5f, 0x20,
0x20, 0x20, 0x20, 0x5f, 0x5f, 0x5f, 0x5f, 0x20, 0x20, 0x20, 0x5f, 0x5f,
0x5f, 0x5f, 0x20, 0x20, 0x5f, 0x5f, 0x5f, 0x5f, 0x20, 0x20, 0x0d, 0x0a,
0x20, 0x2f, 0x20, 0x20, 0x2f, 0x5f, 0x5c, 0x20, 0x20, 0x5c, 0x7c, 0x20,
0x20, 0x7c, 0x20, 0x5f, 0x2f, 0x20, 0x5f, 0x5f, 0x20, 0x5c, 0x20, 0x2f,
0x20, 0x20, 0x5f, 0x5f, 0x5f, 0x2f, 0x5c, 0x5f, 0x5f, 0x20, 0x20, 0x5c,
0x20, 0x20, 0x2f, 0x20, 0x20, 0x20, 0x20, 0x5c, 0x5f, 0x2f, 0x20, 0x5f,
0x5f, 0x5f, 0x5c, 0x2f, 0x20, 0x20, 0x5f, 0x20, 0x5c, 0x20, 0x0d, 0x0a,
0x2f, 0x20, 0x20, 0x20, 0x20, 0x7c, 0x20, 0x20, 0x20, 0x20, 0x5c, 0x20,
0x20, 0x7c, 0x5f, 0x5c, 0x20, 0x20, 0x5f, 0x5f, 0x5f, 0x2f, 0x20, 0x5c,
0x5f, 0x5f, 0x5f, 0x20, 0x5c, 0x20, 0x20, 0x2f, 0x20, 0x5f, 0x5f, 0x20,
0x5c, 0x7c, 0x20, 0x20, 0x20, 0x7c, 0x20, 0x20, 0x5c, 0x20, 0x20, 0x5c,
0x5f, 0x5f, 0x28, 0x20, 0x20, 0x3c, 0x5f, 0x3e, 0x20, 0x29, 0x0d, 0x0a,
0x5c, 0x5f, 0x5f, 0x5f, 0x5f, 0x7c, 0x5f, 0x5f, 0x20, 0x20, 0x2f, 0x5f,
0x5f, 0x5f, 0x5f, 0x2f, 0x5c, 0x5f, 0x5f, 0x5f, 0x20, 0x20, 0x3e, 0x5f,
0x5f, 0x5f, 0x5f, 0x20, 0x20, 0x3e, 0x28, 0x5f, 0x5f, 0x5f, 0x5f, 0x20,
0x20, 0x2f, 0x5f, 0x5f, 0x5f, 0x7c, 0x20, 0x20, 0x2f, 0x5c, 0x5f, 0x5f,
0x5f, 0x20, 0x20, 0x3e, 0x5f, 0x5f, 0x5f, 0x5f, 0x2f, 0x20, 0x0d, 0x0a,
0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x5c, 0x2f, 0x20, 0x20,
0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x5c, 0x2f, 0x20, 0x20,
0x20, 0x20, 0x20, 0x5c, 0x2f, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x5c,
0x2f, 0x20, 0x20, 0x20, 0x20, 0x20, 0x5c, 0x2f, 0x20, 0x20, 0x20, 0x20,
0x20, 0x5c, 0x2f, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20};
unsigned int asciititle_text_len = 358;
int print_ascii_title()
{
printf("%s\n", asciititle_text);
return SUCCESS;
}
Uhm... interesting... an unsigned char array that is simply printed with printf... The values are probably the hexadecimal codes of the characters. Looking at the comment // xxd -i used. in the main function and using a search engine, we find that xxd produces a hexadecimal or binary dump of a file. In this case, a file containing the ASCII art for the name Alesanco.
And yeah, I used an online ASCII art generator, saved the result in a file and ran xxd with the -i option to generate the C code that prints it!
brainfuckinterpreter.c
The second call to an external function is:
brainfuck_interpreter("hello.bf") || // Esoteric programming language.
The brainfuckinterpreter.c file contains:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SUCCESS 0
#define FAILURE 1
struct bfi { char cmd; struct bfi *next, *jmp; };
struct mem { char val; struct mem *next, *prev; };
int brainfuck_interpreter(char filename[])
{
FILE *ifd = fopen(filename, "r");
int ch;
struct bfi *p = 0, *n = 0, *j = 0, *pgm = 0;
struct mem *m = calloc(1, sizeof*m);
setbuf(stdout, NULL);
while((ch = getc(ifd)) != EOF && (ifd != stdin || ch != '!' || j || !pgm)) {
if (ch == '<' || ch == '>' || ch == '+' || ch == '-' ||
ch == ',' || ch == '.' || ch == '[' || (ch == ']' && j)) {
if ((n = calloc(1, sizeof*n)) == 0) return FAILURE;
if (p) p->next = n; else pgm = n;
n->cmd = ch; p = n;
if (n->cmd == '[') { n->jmp=j; j = n; }
else if (n->cmd == ']') { n->jmp = j; j = j->jmp; n->jmp->jmp = n; }
}
}
while(j) { p = j; j = j->jmp; p->jmp = 0; p->cmd = ' '; }
for (n = pgm; n; n = n->next)
switch(n->cmd)
{
case '+': m->val++; break;
case '-': m->val--; break;
case '.': putchar(m->val); break;
case ',': if((ch = getchar()) != EOF) m->val = ch; break;
case '[': if (m->val == 0) n = n->jmp; break;
case ']': if (m->val != 0) n = n->jmp; break;
case '<': if (!(m = m->prev)) return FAILURE; break;
case '>':
if (m->next == 0) {
if ((m->next = calloc(1, sizeof*m)) == 0) return FAILURE;
m->next->prev = m;
}
m = m->next;
break;
}
return SUCCESS;
}
As the name indicates, this is simply an interpreter for brainfuck, an esoteric programming language, as the comment says. The function takes a filename as input, in this case hello.bf, the file that we forgot to GET when we got the segmentation fault.
Ok, we're curious, so let's see what this file contains.
++++++++++[>+>+++>+++++++>++++++++++<<<<-]>>>++.>+.+++++++..+++.<<++++++++++++++.------------.>
+.<+++++++.>>--.<<-------.>--------.>-.-------.++++++++++++++.------------------.+++++++++++++.
-----------.++++++++++++.<<+.-.>.<.>>++++.----.---------.++++++++++++++.+++.-------------------
---.+++++++++++++++++.-------------.<<.>>----.+++++++++++++++++.---------------.+++++.+.+++++++
++++.---------------.--.+++++++++++++++++.<<.++++++.------.>>-----------------.++++++++++++++++
++++++.-----------------------.+++.+++++++++++++.+.--------------.--.++++++++++++++++++.---.---
------.+++++++++++.+++++.<<.>+++++++++++++++.+++++++++++++++++.+++.---.>--.<.>---------.<<+.
Ok, it's brainfuck code. It'll certainly print Hello, I'm Alesanco! A software architect & cybersecurity Padawan! under the ASCII art (this function is called after the print_ascii_title() function), but to be sure and check it on the fly, we can use an online brainfuck interpreter that confirms it.
However, knowing an esoteric programming language makes you cool!
hero.c
The third and last function is:
print_hero_message()) { // Obfuscated string.
As with the other functions, the comment is useful here. We know that we're dealing with obfuscation. Let's look into the file hero.c:
#include <stdio.h>
#define SUCCESS 0
#define FAILURE 1
unsigned char s[] = {
0x67, 0xbc, 0x41, 0xf6, 0x4e, 0xb2, 0xf2, 0xc5,
0xcb, 0xcb, 0xbe, 0xc3, 0xb4, 0xb6, 0xcb, 0xbb,
0x65, 0xac, 0x6a, 0x64, 0x16, 0x17, 0xa8, 0xa3,
0x6f, 0x64, 0xc9, 0xde, 0x9a, 0x6a, 0xaa, 0x60,
0xe7, 0x91, 0x5e, 0x54, 0xa2, 0xa0, 0x91, 0x73
};
int print_hero_message() {
for (unsigned int m = 0; m < sizeof(s); ++m)
{
unsigned char c = s[m];
c ^= m;
c -= m;
c ^= 0x40;
c = ~c;
c += 0xab;
c ^= 0xdc;
c = ~c;
c ^= m;
c = -c;
c ^= m;
c -= 0xa4;
c = -c;
c += 0x63;
c = -c;
c += m;
s[m] = c;
}
printf("\n%s\n", s);
return SUCCESS;
}
As in the asciititle.c file, there is an unsigned char array here with the hexadecimal representation of some chars, but before printing them there is a for loop full of strange operations. This is the obfuscated part.
Why did I say "some chars"? Because if we check the first hexadecimal value, 0x67, we find the letter g, which doesn't appear in the sentence You've run this code... You're my hero!. The operations in the for loop probably transform the stored values into the real characters.
To be sure about this, we can modify the code to print the result of every single operation, like this:
for (unsigned int m = 0; m < sizeof(s); ++m)
{
unsigned char c = s[m];
printf("01: %c\n", c);
c ^= m;
printf("02: %c\n", c);
c -= m;
printf("03: %c\n", c);
c ^= 0x40;
printf("04: %c\n", c);
c = ~c;
printf("05: %c\n", c);
c += 0xab;
printf("06: %c\n", c);
c ^= 0xdc;
printf("07: %c\n", c);
c = ~c;
printf("08: %c\n", c);
c ^= m;
printf("09: %c\n", c);
c = -c;
printf("10: %c\n", c);
c ^= m;
printf("11: %c\n", c);
c -= 0xa4;
printf("12: %c\n", c);
c = -c;
printf("13: %c\n", c);
c += 0x63;
printf("14: %c\n", c);
c = -c;
printf("15: %c\n", c);
c += m;
printf("16: %c\n", c);
s[m] = c;
break;
}
Please note that I added the break statement to try this with the first char only.
For the first value, 0x67, this prints:
01: g
02: g
03: g
04: '
05:
06:
07: _
08:
09:
10: `
11: `
12:
13: D
14:
15: Y
16: Y
Ok, the letter Y is the first one in the You've run this code... You're my hero! sentence, so we're probably right about the obfuscated part. The lines that look empty are steps where the intermediate value isn't a printable ASCII character.
To keep this article from becoming endless, understanding this simple obfuscated code is left to you as homework.
For your information, I used this online tool with the default settings to obfuscate the following code:
printf("You've run this code... You're my hero!");
My Code Isn't Good
My code contains some errors and some blocks that could be improved.
In the asciititle.c, brainfuckinterpreter.c and hero.c files I define SUCCESS and FAILURE, but I barely use them for real error handling: print_ascii_title() and print_hero_message() can only ever return SUCCESS.
In the brainfuckinterpreter.c file, the brainfuck_interpreter() function would probably be better off taking a stream instead of a filename, so that it could be reused in other situations. Also, why is there a segmentation fault here?
In the asciititle.c file, the asciititle_text_len variable is never used; in the main.c file there is a possible stack buffer overflow with the decoded_contact variable; and so on...
I left them in (along with others not mentioned here, so as not to spoil everything) to see whether anyone would point them out to me. You could improve the code and publish your version!
Conclusion
Why did I go through this hell? First of all, as I said, I was looking for an original layout for my personal website. And I think I found it.
Then, I always want to do things that get me hands-on with everything interesting and make me study something new. Using different things all together is one way to do that.
Also, I like to involve other people and make them curious enough to improve themselves by studying the things I only mention without going into detail.
I hope my website gives you new ideas to work on!
You can find the source code of my website on this GitHub repo.
Main Image Credit: The awesome piece of artwork used to head this article is called 'ASCII Dino' and it was created by graphic designer Alexandra Hanson.