How do you learn how to read a hex dump?

D1G1T4L

I was just looking at Romero's website
http://www.rome.ro/games_mystart.htm

and he said that back in the day he learned how to read a hex dump
"No way would I be able to understand that garbage! I didn't even know hexadecimal! Well, perseverance won out and after a couple more years of manic hacking, I could look at a hex dump like this and tell you pretty much what it all meant."

I was just wondering....
 
A hex dump of what? If you want to look at an executable, I wouldn't bother with a hex dump. Find a decent disassembler.

If you're looking at a binary data file, then you can learn to "think in hex" with a little practice.

Converting unsigned integers in your head from a hex dump isn't too tough. Signed ints are tougher, and I don't even bother with floats; I'll use something like hexdump to sort those out.
 
I guess I didn't really answer the question... sorry. Start with learning how to convert unsigned integers from hex to decimal and back. Study up on how different types (unsigned, signed, ASCII, etc) are represented in binary. Learn about big endian and little endian. Write short C programs to write several different variables and types to a file and look at the file in a hex viewer. Practice.
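
A rough sketch of that exercise, in Python rather than C only so the bytes can be printed inline. The sample values are arbitrary, and `hexdump_line` is a made-up helper for this example, not a standard function.

```python
import struct

def hexdump_line(data):
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

# Arbitrary sample values, one per representation.
samples = {
    "uint32 LE": struct.pack("<I", 1212),  # unsigned, little endian
    "uint32 BE": struct.pack(">I", 1212),  # same value, big endian
    "int32 LE": struct.pack("<i", -2),     # signed, two's complement
    "ASCII": "Hex".encode("ascii"),        # plain text
    "float32 LE": struct.pack("<f", 1.0),  # IEEE 754 single precision
}

for label, blob in samples.items():
    print(f"{label:11} {hexdump_line(blob)}")
```

Comparing the two `uint32` lines shows the byte swap between endiannesses, and the float line shows why floats are hard to decode by eye.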
 
Unless you have a knack for memorizing op codes, I agree with the disassembler suggestion.
 
Two other things:

1. Get familiar with recognizing the patterns of ascii digits. They're all in the range of 0x30-0x39.
2. Get familiar with recognizing the patterns of ascii letters, they're in the ranges 0x41-0x5A and 0x61-0x7A.

If you're debugging, and a pointer of yours looks like 0x54657874, you probably overwrote the storage for that pointer with text, perhaps the result of a stack overflow. In this case, 0x54657874 == "Text" (with no terminator).
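
A small sketch of that check, assuming a 32-bit value. `value_as_text` is a hypothetical helper name. Note that on a little-endian machine the characters sit in memory in the opposite order from how the hex digits are written, which is what the `byteorder` argument is for.

```python
def value_as_text(value, byteorder="big"):
    """Return the ASCII text spelled by a 32-bit value, or None if any byte is unprintable."""
    raw = value.to_bytes(4, byteorder)
    # Printable ASCII runs from 0x20 (space) to 0x7E (~).
    if all(0x20 <= b <= 0x7E for b in raw):
        return raw.decode("ascii")
    return None

print(value_as_text(0x54657874))            # Text
print(value_as_text(0x74786554, "little"))  # Text
print(value_as_text(0x0012FF7C))            # None
```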

Again, with regard to debugging, the Visual C++ runtime uses special patterns to indicate freed or uninitialized memory. Values like 0xCCCCCCCC or 0xCDCDCDCD have special meaning. This page, http://www.docsultant.com/site2/articles\debug_codes.html, documents some of these.
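
As a sketch, a lookup like the one below can flag such values. The table is written from memory of the Visual C++ debug runtime's fill bytes and is not taken from the linked page, so treat the descriptions as a starting point; `describe_fill` is a hypothetical helper name.

```python
# Fill bytes commonly attributed to the Visual C++ debug runtime
# (assumed from memory; verify against your toolchain's documentation).
FILL_BYTES = {
    0xCC: "uninitialized stack memory",
    0xCD: "uninitialized heap memory",
    0xDD: "freed heap memory",
    0xFD: "guard bytes around a heap block",
}

def describe_fill(value):
    """Return a description if all four bytes of a 32-bit value are one known fill byte."""
    raw = value.to_bytes(4, "big")
    if len(set(raw)) == 1:
        return FILL_BYTES.get(raw[0])
    return None

print(describe_fill(0xCCCCCCCC))  # uninitialized stack memory
print(describe_fill(0xCDCDCDCD))  # uninitialized heap memory
print(describe_fill(0x54657874))  # None
```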
 
There's a -big- difference between reading a hex dump of 6502 machine code from a simple 8-bit computer (like a C=64 or Apple][) and trying to make sense of a modern PC application. The 6502 has single-byte opcodes, instructions that are at most three bytes long, and three main registers (A, X, and Y). The x86 instruction set is -much- bigger and more complicated.
 
What amoeba said is spot-on. Modern processors mandate a disassembler. Older procs like the 6502 aren't that hard to get the hang of.

The article you've quoted is a bit silly. "tell you pretty much what it all meant" is hogwash. The screenshot in the article has this code in the first couple of lines:

A9 00 - move 0x00 into the A register
8D FF 44 - store the content of the A register at 44FF
8D FE 44 - store the content of the A register at 44FE
8D FD 44 - store the content of the A register at 44FD
A9 3C - move 0x3C into the A register (0x3C == 60 decimal)
8D FB 44 - store the content of the A register at 44FB
A9 03 - move 0x03 into the A register
8D FC 44 - store the content of the A register at 44FC
20 68 70 - call the subroutine at 7068
4C 93 70 - jump to the code at 7093, unconditionally

So what's that get you? The hex dump doesn't mean anything; it's just code. I just told you what the code did; there's not enough information in the hex dump to give you any idea why it's doing that. What is 60? What is 3? Is it the size of a sprite? The length of a block of data that will be pushed out to disk? The offset of one thing to another? The number of times something is supposed to happen?

It's silly to assert that you can see a hex dump and "know what it all meant".
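
The mechanical part of that decoding fits in a few lines, which supports the point: a table lookup recovers what the code does, not why. This is a minimal sketch covering only the four opcodes in the listing above; `OPCODES` and `disassemble` are names made up for the example, and a real 6502 disassembler needs the full opcode table.

```python
# opcode -> (mnemonic, operand size in bytes, operand format)
OPCODES = {
    0xA9: ("LDA", 1, "#${:02X}"),  # load A with an immediate value
    0x8D: ("STA", 2, "${:04X}"),   # store A at an absolute address
    0x20: ("JSR", 2, "${:04X}"),   # call a subroutine
    0x4C: ("JMP", 2, "${:04X}"),   # unconditional jump
}

def disassemble(code):
    """Decode a byte string into a list of 6502 assembly lines."""
    lines = []
    pos = 0
    while pos < len(code):
        mnemonic, size, fmt = OPCODES[code[pos]]
        # The 6502 stores 16-bit operands low byte first (little endian).
        operand = int.from_bytes(code[pos + 1:pos + 1 + size], "little")
        lines.append(f"{mnemonic} {fmt.format(operand)}")
        pos += 1 + size
    return lines

for line in disassemble(bytes.fromhex("A9 00 8D FF 44 20 68 70 4C 93 70")):
    print(line)
```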
 
Also, you linked to John Romero's site, not Carmack's. Romero co-founded id and later left to make Daikatana.
 
MonkeyShave said:
Two other things:

1. Get familiar with recognizing the patterns of ascii digits. They're all in the range of 0x30-0x39.
2. Get familiar with recognizing the patterns of ascii letters, they're in the ranges 0x41-0x5A and 0x61-0x7A.

If you're debugging, and a pointer of yours looks like 0x54657874, you probably overwrote the storage for that pointer with text, perhaps the result of a stack overflow. In this case, 0x54657874 == "Text" (with no terminator).

Again, with regard to debugging, the Visual C++ runtime uses special patterns to indicate freed or uninitialized memory. Values like 0xCCCCCCCC or 0xCDCDCDCD have special meaning. This page, http://www.docsultant.com/site2/articles\debug_codes.html, documents some of these.


oh that really helps, thanks
 
MonkeyShave said:
Also, you linked to John Romero's site, not Carmack's. Romero co-founded id and later left to make Daikatana.


yea, i dont know why i wrote carmack... weird.. fixed now
 
I agree with the point that getting a disassembler (or writing one yourself) is a much smarter choice. In the end, why waste time looking at a hex dump and trying to disassemble it yourself if you can have the PC do it for you in a fraction of the time? While I understand that training your brain is never a bad idea, I would say there are many other things that would be more useful to spend your time on.

If you end up writing your own disassembler, you are bound to learn more than just memorizing the hex codes. I hold people that build tools in a very high regard. In the end, the ability to build tools customized for our application is what separates us from the rest of the animal world.
 
Thing is, you don't memorize the hex codes. You learn them as a byproduct of reading the disassembly all the time, using the debugger, dumping code where a disassembler can't easily go, and so on.
 
Back in the DOS days I used Norton Utilities 4.5 to look at the hex of some sort of game file (data, save game), and the simplest "hack," if you will, was just to change an arbitrary value.

For example, fire up Sim City, load a city and then pause the game. Write down how much money you have. Then exit the game and load the savegame file into the hex editor. Convert that number to hex and search for it. Then you can manipulate that value. The obvious choice in this example was to write it all over with F's so that when you loaded your game you'd have a nice $4 billion (0xFFFFFFFF is 4,294,967,295 as a 32-bit unsigned value) :cool:
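
The search-and-patch step can be sketched like this. Everything about the file layout is an assumption: the fake `save` blob stands in for a real savegame, and the money field is assumed to be a 32-bit little-endian unsigned integer, which a real game may not use. `find_value` and `patch_value` are hypothetical helper names.

```python
import struct

def find_value(blob, value, fmt="<I"):
    """Return every offset where the packed form of value occurs in blob."""
    needle = struct.pack(fmt, value)
    offsets = []
    start = blob.find(needle)
    while start != -1:
        offsets.append(start)
        start = blob.find(needle, start + 1)
    return offsets

def patch_value(blob, offset, value, fmt="<I"):
    """Return a copy of blob with value written at offset."""
    patched = bytearray(blob)
    struct.pack_into(fmt, patched, offset, value)
    return bytes(patched)

# Fake savegame: a 4-byte tag, the money field, then padding.
save = b"CITY" + struct.pack("<I", 20000) + b"\x00" * 4

offsets = find_value(save, 20000)
print(offsets)                               # [4]
rich = patch_value(save, offsets[0], 0xFFFFFFFF)
print(struct.unpack_from("<I", rich, 4)[0])  # 4294967295
```

If the search returns several offsets, changing the in-game value, saving again, and repeating the search narrows it down.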
 
We had to learn how to read dumps like that for the IBM assembler class I took. You would use the dump of your registers and memory locations to figure out what your program was doing wrong. It wasn't the greatest but after debugging so much you get used to it.
 
I agree with the point that getting a disassembler (or writing one yourself) is a much smarter choice. In the end, why waste time looking at a hex dump and try to disassemble it yourself, if you can have the PC do it for you in a fraction of the time?
Because, what if the disassembler is wrong?
 
"Read a hex dump" means different things depending on what is being performed.

In my line of work, working on wireless transmission interfaces and embedded devices, I am often looking at the hex stream of information coming from them, often reverse engineering the message format and debugging other issues.
 
hrm, pointless comment.

hrm, another pointless comment. :p

Maybe what Romero was saying is he finally figured out how to go between hex, binary, decimal and endian-ness / op codes ? I dunno... that was a part of the first classes we had to take and it's actually helped quite a bit. I actually got the oct 31 = dec 25 (halloween = christmas) joke the first time I saw it...
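
For anyone who wants to check the joke, Python's built-in base conversions make it a one-liner; this only illustrates the arithmetic.

```python
halloween = int("31", 8)  # parse "31" as an octal number
print(halloween)          # 25
print(oct(25))            # 0o31
print(hex(25))            # 0x19
print(bin(25))            # 0b11001
```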

But like Mike said... there's no way to just look at a hexdump and know exactly what's going on. I've used it on some small programming assignments where you need to have identical output to what the teacher is grading with so you use the hexdump to be 100% positive... and use diff. Although I've only been using it with data files, not executables.
 
Eh, as for the original post, Romero was talking about the basic hex dump the Apple could do. You used to be able to load programs that way. The 6502's opcodes were a single byte, so there were at most 256 of them, and it had three main registers: the accumulator (A) plus two index registers, X and Y.
http://www.masswerk.at/6502/6502_instruction_set.html
Programs used relative addressing and could be loaded anywhere in memory. There are a few basic opcodes you can learn for jumping to and returning from procedures. The character dumps help to identify data structures, but I don't remember if they were available. Basic Apple mode was a 40-character screen, and you really did not have much room to do fancy dumps of data. I do believe it had a basic assembler and disassembler. If you remember seeing the original Terminator movie, yeah, that was the output. :)
 
Better off just using a disassembler, but if you want to get an initial idea of what's going on, compare these:

I'm not a coder by any means, but I did manage to throw this together, and it helped me understand a little bit. It just spits out the ABC's.

Code:
.model tiny
.stack
.data
.code

start:

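	mov ah,02h	;DOS int 21h function 02h: write the character in DL to standard output
	mov dl,41h	;first character to write (41h = 'A')
	mov cl,1ah	;loop counter (1Ah = 26 decimal); assumes CH is already 0, since LOOP uses all of CX

	again:
	int 21h		;write the character to the screen
	inc dl		;advance to the next letter
	loop again	;decrement CX and jump back while it isn't 0

	mov ah,4ch	;DOS int 21h function 4Ch: exit the program
	int 21h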

end start

Here it is in hex:

Code:
B4 02 B2 41 B1 1A CD 21 FE C2 E2 FA B4 4C CD 21

You can even run it from command prompt, copy and paste this in notepad and name it abc.exe, then open command prompt and execute it:

´²A±Í!þÂâú´LÍ!
 
I didn't think you could just copy machine code into a file and name it .exe and expect it to run correctly. I would be surprised if it was a compliant .exe file of any type.
 
Sure you can, in this specific case, if you are using an ASCII Notepad and Win16 (maybe Win32 will work).
I think the main point he is making is that programs can be really simple. Complexity comes when languages and operating systems try to do everything for everyone.
 
Open an exe in notepad, do nothing but click 'save', and try to run the program and it will surely fail. Owned by unicode.

Also, complexity comes far before trying to do "everything for everyone" :rolleyes: You get pretty complex programs the instant you want to do anything useful with it. If it wasn't complicated, we never would have needed to use computers in the first place.

 
I didn't think you could just copy machine code into a file and name it .exe and expect it to run correctly. I would be surprised if it was a compliant .exe file of any type.

It's not an .EXE file. It's a .COM file. The OS lets you get away with one here. And it won't work on most newer OSes, since 16-bit code is dead.
 
Interesting. I didn’t know the COM format lacked a header or other metadata.

I’m still surprised the copy-paste technique works, since notepad is going to write-out the ASCII or Unicode representation of the machine code.
 
It's not going to write out the ASCII or Unicode representation; it just writes the bytes you've given it. You're a bit lucky that this code doesn't contain a zero byte. (It does contain 0x1A, the old DOS end-of-file character, as the loop count.)
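
Whether the bytes survive depends on the encoding used at each step, which can be checked without Notepad. The sketch below assumes the text is viewed and saved as Windows-1252 ("ANSI") or as UTF-8; it does not model what a particular browser or clipboard actually does. Two of the sixteen bytes (0x02 and 0x1A) are control characters with no visible glyph, which is one reason copying the characters from a web page is fragile.

```python
machine_code = bytes.fromhex("B4 02 B2 41 B1 1A CD 21 FE C2 E2 FA B4 4C CD 21")

# What a Windows-1252 ("ANSI") viewer would show for these bytes.
as_text = machine_code.decode("cp1252")

# Saving that text as Windows-1252 reproduces the original 16 bytes...
ansi_bytes = as_text.encode("cp1252")
# ...but saving it as UTF-8 turns every byte >= 0x80 into two bytes.
utf8_bytes = as_text.encode("utf-8")

print(ansi_bytes == machine_code)          # True
print(len(machine_code), len(utf8_bytes))  # 16 26

# Printable characters only, i.e. what survives a visual copy.
visible = "".join(ch for ch in as_text if ch.isprintable())
print(len(visible))                        # 14
```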
 
eh, you would think after 20 years microsoft would come up with a notepad that could display unix files properly
 
I can't figure out what "unix files" has to do with this thread. Can anyone help me make the connection?
 
I assume he is referring to the fact that MS OSes typically require a CR+LF pair to move output to the next line, whereas unix OSes require just the LF character.

This means if you open a text file in Windows, which was created in Unix, it is not uncommon for all the text to be on a single line.


edit: Oh yeah it's very much a non sequitur.
 
It's not going to write out the ASCII or Unicode representation; it just writes the bytes you've given it.
My notepad will circumstantially transcode pasted Unicode to ANSI/Windows-1252 on write-out, so I don't think this is strictly true.

In any case, that misses my point which is if you follow the instructions from the post, it requires copying and pasting from your browser window. In practice, I've found that the browser will copy text into the clipboard in some common character-encoding scheme. So in the example given, notepad will write out the ASCII or Unicode representation of the machine code.
 
I assume he is referring to the fact that MS OSes typically require a CR+LF pair to move output to the next line, whereas unix OSes require just the LF character.
Probably, but I still can't connect that to the issue at hand.

My notepad will circumstantially transcode pasted Unicode to ANSI/Windows-1252 on write-out, so I don't think this is strictly true.
The circumstances being that you ask it to: check the "encoding" drop down in the "Save As" dialog.

In any case, that misses my point which is if you follow the instructions from the post, it requires copying and pasting from your browser window. In practice, I've found that the browser will copy text into the clipboard in some common character-encoding scheme. So in the example given, notepad will write out the ASCII or Unicode representation of the machine code.
This is an issue with the browser, which supplies the text copied, not with notepad.
 
The circumstances being that you ask it to: check the "encoding" drop down in the "Save As" dialog.
Even just using the "save" dialog, which doesn't prompt the user for the encoding method, it can do this. If I paste Unicode text which can also be ANSI encoded, then notepad will happily transcode with no indication to the user. This obviously requires a pre-existing file so you can bypass the "save as" dialog.

I'm not at all familiar with the internals of notepad or the Windows clipboard implementation (which I presume uses metadata to indicate the encoding scheme), but it's not a given to me that notepad won't fiddle with the data I input.

This wasn't really my point at all though. I was only trying to say that by following the instructions one would not end up with the desired code in a file.

This is an issue with the browser, which supplies the text copied, not with notepad.
Right, and I don't think I said that notepad was at fault; only that it would do it. And by "do it" I mean "write out the ASCII or Unicode representation of the machine code" under the circumstances presented.

I suppose you're not explicitly accusing me of saying otherwise though.
 