Reverse Engineering Asked by user6916458 on April 14, 2021
I am trying to get the commentary (the caster’s voice) from a Dota 2 game file. I’ve managed to parse the game file and select what I believe is the voice data. It is in an unfamiliar format (CSVCMsg_VoiceData), which has the following struct:
type CSVCMsg_VoiceData struct {
    Client *int32 `protobuf:"varint,1,opt,name=client" json:"client,omitempty"`
    Proximity *bool `protobuf:"varint,2,opt,name=proximity" json:"proximity,omitempty"`
    Xuid *uint64 `protobuf:"fixed64,3,opt,name=xuid" json:"xuid,omitempty"`
    AudibleMask *int32 `protobuf:"varint,4,opt,name=audible_mask" json:"audible_mask,omitempty"`
    VoiceData []byte `protobuf:"bytes,5,opt,name=voice_data" json:"voice_data,omitempty"`
    Caster *bool `protobuf:"varint,6,opt,name=caster" json:"caster,omitempty"`
    Format *VoiceDataFormatT `protobuf:"varint,7,opt,name=format,enum=VoiceDataFormatT,def=1" json:"format,omitempty"`
    SequenceBytes *int32 `protobuf:"varint,8,opt,name=sequence_bytes" json:"sequence_bytes,omitempty"`
    SectionNumber *uint32 `protobuf:"varint,9,opt,name=section_number" json:"section_number,omitempty"`
    UncompressedSampleOffset *uint32 `protobuf:"varint,10,opt,name=uncompressed_sample_offset" json:"uncompressed_sample_offset,omitempty"`
    XXX_unrecognized []byte `json:"-"`
}
This seems to work when reading the data. Logically I’m probably looking for the VoiceData part of the struct when given this:
"format":0,"voice_data":"uz+ACgEAEAELgD4EQgEWAKV4mxnepfmhxKCQxAnKVNaHhKRXPIsmAH5RjXmJV0u+WTmrvgyCKxcraehjo/ZeKcFjksXQZEeOju4hLNv/MAB9KA7ww14Vc0ndYPB7dDXoXTexuxcW0Jg/diMgdH5ijWhe02Ch48KX86qJZYFyZV81AH76qCgh9AXliMdyWEgWTMbRD6xMX37WJALrXlSnxymIloSq2KGwXCcMXzQiSQIrcLVNfqdNJACCluFOIRKPmugUvsLZmnD04X0xhpAuNkwJECK4t51MBOWNWJlCAIDyZlJwWI45EPTjBB6yKyGOclu96qBV2MhFAh1d2J7WDZwe6YxOVu/BGkGcur9qTP85ZRfjANoiQxQrWvpoHFBFBy0AfX6k8XvbSwrk2nUAEP3P6kcmXORKUNKeu8HDnOUflQqtA5AkkTiun77fZrqnimIfWg==","sequence_bytes":23598094,"section_number":1,"sample_rate":16000
I’m able to pull the voice data out like so:
uz+ACgEAEAELgD4EQgEWAKV4mxnepfmhxKCQxAnKVNaHhKRXPIsmAH5RjXmJV0u+WTmrvgyCKxcraehjo/ZeKcFjksXQZEeOju4hLNv/MAB9KA7ww14Vc0ndYPB7dDXoXTexuxcW0Jg/diMgdH5ijWhe02Ch48KX86qJZYFyZV81AH76qCgh9AXliMdyWEgWTMbRD6xMX37WJALrXlSnxymIloSq2KGwXCcMXzQiSQIrcLVNfqdNJACCluFOIRKPmugUvsLZmnD04X0xhpAuNkwJECK4t51MBOWNWJlCAIDyZlJwWI45EPTjBB6yKyGOclu96qBV2MhFAh1d2J7WDZwe6YxOVu/BGkGcur9qTP85ZRfjANoiQxQrWvpoHFBFBy0AfX6k8XvbSwrk2nUAEP3P6kcmXORKUNKeu8HDnOUflQqtA5AkkTiun77fZrqnimIfWg==
However, this is where I’m hitting a bit of a wall. This data is in an unknown format. I’ve done some research on what the format might be and found that Steam started using the SILK codec for voice data in 2011. However, when I write this data to a file and open it with Opus (which I believe supports SILK), the Opus decoder tells me it can’t open the file, so I’m not 100% convinced it is the SILK codec. Recognising audio data isn’t something I have a great deal of experience with, so any advice would be great.
I have noticed there’s a VoiceDataFormatT part of the struct but the only definition I can find for it is this:
type VoiceDataFormatT int32
Which doesn’t seem too helpful! :/
EDIT 1:
As per advice from user Ian Cook I’ve decoded the data from base64 into the following (as hex dump):
BB 3F 80 0A 01 00 10 01 0B 80 3E 04 42 01 16 00 A5 78 9B 19 DE A5 F9 A1 C4 A0 90 C4 09 CA 54 D6 87 84 A4 57 3C 8B 26 00 7E 51 8D 79 89 57 4B BE 59 39 AB BE 0C 82 2B 17 2B 69 E8 63 A3 F6 5E 29 C1 63 92 C5 D0 64 47 8E 8E EE 21 2C DB FF 30 00 7D 28 0E F0 C3 5E 15 73 49 DD 60 F0 7B 74 35 E8 5D 37 B1 BB 17 16 D0 98 3F 76 23 20 74 7E 62 8D 68 5E D3 60 A1 E3 C2 97 F3 AA 89 65 81 72 65 5F 35 00 7E FA A8 28 21 F4 05 E5 88 C7 72 58 48 16 4C C6 D1 0F AC 4C 5F 7E D6 24 02 EB 5E 54 A7 C7 29 88 96 84 AA D8 A1 B0 5C 27 0C 5F 34 22 49 02 2B 70 B5 4D 7E A7 4D 24 00 82 96 E1 4E 21 12 8F 9A E8 14 BE C2 D9 9A 70 F4 E1 7D 31 86 90 2E 36 4C 09 10 22 B8 B7 9D 4C 04 E5 8D 58 99 42 00 80 F2 66 52 70 58 8E 39 10 F4 E3 04 1E B2 2B 21 8E 72 5B BD EA A0 55 D8 C8 45 02 1D 5D D8 9E D6 0D 9C 1E E9 8C 4E 56 EF C1 1A 41 9C BA BF 6A 4C FF 39 65 17 E3 00 DA 22 43 14 2B 5A FA 68 1C 50 45 07 2D 00 7D 7E A4 F1 7B DB 4B 0A E4 DA 75 00 10 FD CF EA 47 26 5C E4 4A 50 D2 9E BB C1 C3 9C E5 1F 95 0A AD 03 90 24 91 38 AE 9F BE DF 66 BA A7 8A 62 1F 5A
I’m still at a loss as to what this information is. I’ve tried converting it to a WAV file using ffmpeg (assuming it is PCM), but it still comes out as white noise.
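For reference, the base64-to-hex step from EDIT 1 can be reproduced with a few lines of Python. This is only a minimal sketch: hex_dump is a hypothetical helper name, and the input is just the first 20 characters of the voice_data sample above.

```python
import base64


def hex_dump(b64_text: str) -> str:
    """Decode base64 text and return space-separated uppercase hex bytes."""
    raw = base64.b64decode(b64_text)
    return " ".join(f"{byte:02X}" for byte in raw)


# First 20 characters of the voice_data sample from the question.
print(hex_dump("uz+ACgEAEAELgD4EQgEW"))
# prints: BB 3F 80 0A 01 00 10 01 0B 80 3E 04 42 01 16
```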
EDIT 2:
So it’s occurred to me that it might help if I include more samples of the data – the decoded hex of the data can be found here (each sample separated by a new line character):
I’ve noticed that each one seems to start with the following hex:
BB 3F 80 0A 01 00 10 01 0B 80 3E 04
Which translates to:
»?€
�€>
I’m still at a loss as to how to convert this to audio data.
EDIT 3:
I’ve uploaded some more datadumps to the following pastebin (More data), it’s not a full dump as it’s roughly 15mb and pastebin crashed when I was trying to paste!
The data file is a Dota 2 demo file (extension .dem), which is a collection of protobuf messages that I parse using Go and the Manta replay parser (found here). This allows me to pull out any type of message. I select OnCSVCMsg_VoiceData, which returns m.Audio.VoiceData of the form CSVCMsg_VoiceData (the struct I display above).
EDIT 4:
Here’s (finally) the link to the file with the concatenated voiceData messages.
And here’s the link to the original file of protobuf messages.
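TL;DR
- Each section n indicates a separate stream of data.
- The sequence_bytes value indicates the order that the frames should be placed in when decoding.
- voice_data is base64-encoded.
- To convert the data to SILK:
  - For each section n, order n's structs in ascending order based on the value of sequence_bytes.
  - Base64-decode each voice_data, then strip the first 14 bytes and the last 4 bytes.
  - Concatenate the results (in order of sequence_bytes).
  - Prepend #!SILK_V3 (the SILK header).
Long version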
Using the sample data you posted, the first thing I had to do was replace the final comma with a ] to make it valid JSON.
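If you'd rather not edit the dump by hand, that comma fix can be sketched in Python like this. It assumes the only defect is a trailing comma where the closing ] should be. load_truncated_array is a hypothetical helper name, and the sample string is synthetic, not real replay data.

```python
import json


def load_truncated_array(text: str):
    """Parse a JSON array whose closing ']' is missing after a trailing comma."""
    stripped = text.rstrip()
    if stripped.endswith(","):
        # Swap the dangling comma for the missing closing bracket.
        stripped = stripped[:-1] + "]"
    return json.loads(stripped)


# Synthetic two-element input, only to show the repair.
sample = '[{"section_number": 1, "sequence_bytes": 10},{"section_number": 1, "sequence_bytes": 20},'
print(len(load_truncated_array(sample)))
```

Input that is already valid JSON passes through unchanged.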
I originally used shell scripts to convert the structs from JSON to SILK, but in the interest of efficiency, I re-implemented the conversion in Python.
    import json
    import base64
    import sys

    def main():
        if len(sys.argv) < 2:
            print("Usage: python3", sys.argv[0], "<CSVCMsg_VoiceData json file>")
            sys.exit(1)
        with open(sys.argv[1], 'r') as infile:
            json_data = json.load(infile)
        # Create dictionary with section number as the key and list of
        # that section's structs as the value
        section_dict = {}
        for obj in json_data:
            sec_num = obj['section_number']
            if sec_num not in section_dict:
                section_dict[sec_num] = []
            section_dict[sec_num].append(obj)
        # Create SILK file for each section number stream
        for section in section_dict.keys():
            filename = f"section_{section}.slk"
            print(f"Generating SILK file {filename} for section {section}...")
            with open(filename, 'wb') as outfile:
                # SILK header
                outfile.write(b"#!SILK_V3")
                # Sort frames in ascending order based on sequence_bytes value
                for frame in sorted(section_dict[section], key=lambda x: x['sequence_bytes']):
                    decoded = base64.b64decode(frame['voice_data'])
                    # strip first 14 bytes and last 4 bytes before writing
                    outfile.write(decoded[14:-4])

    if __name__ == '__main__':
        main()
To decode SILK, I used the official SDK (that's what the decoder linked by Gordon Freeman is built on top of). The SDK can be downloaded from this link, which I found from this page.
After I downloaded the SDK, I extracted it, went into the directory named SILK_SDK_SRC_FIX_v1.0.9, and ran make (I'm on Kali, but pretty much any Linux variant should be fine).
Once make completes, you're left with a couple of executables; the only one we care about is decoder.
Simply run decoder on the SILK payloads generated above, and you'll get a pcm file you can do whatever you want with. For example, ./decoder section_12.slk section_12.pcm. The output file is at 22050 Hz.
Hat tip to @Gordon Freeman for pointing out that the header isn't 18 bytes like I originally suspected and that the last 4 bytes aren't part of the SILK payload.
For posterity, here's how I converted the JSON to SILK files with shell scripts.
I used the following script to extract the data, de-base64 it, and put each struct's data in its own file.
    #!/bin/bash
    # Write each decoded VoiceData to a file with the naming convention
    # <sequence_bytes>_<section_number>
    write_data ()
    {
        filename=$(echo "$1" | cut -d_ -f1,2)
        data=$(echo "$1" | cut -d_ -f3)
        echo -n "$data" | base64 -d > "$filename"
    }
    export -f write_data
    jq -r '.[] | "\(.sequence_bytes)_\(.section_number)_\(.voice_data)"' dota2CasterParse.json | xargs -I '{}' bash -c "write_data '{}'"
I then used the following script to create a SILK file for each section:
    #!/bin/bash
    section_numbers=$(ls [0-9]*_[0-9]* | cut -d_ -f2 | sort -u)
    for section in $section_numbers; do
        output="section_${section}_voiceData.slk"
        echo -n '#!SILK_V3' > "$output"
        for i in $(ls *_${section} | sort -n); do
            # Skip the 14-byte header and drop the last 4 bytes (14 + 4 = 18)
            dd bs=1 skip=14 count=$(($(stat -c "%s" "$i")-18)) if="$i" of="$output" conv=notrunc oflag=append
        done
    done
Answered by hairlessbear on April 14, 2021
There are 3 types of "frame", so I guess there are 3 casters:
BB 3F 80 0A 01 00 10 01 0B 80 3E 04 42 01 (@ 0x0)
D8 76 DD 02 01 00 10 01 0B 80 3E 04 FA 01 (@ 0x5f0)
67 7D 11 05 01 00 10 01 0B 80 3E 04 7E 01 (@ 0x44ccf)
Example for the first one:
BB 3F 80 0A identifier of the caster
01 channel number mono
80 3E = 0x3e80 =16000 the rate
42 01 = 0x142 the size of silk data
After the size, the following 0x142 bytes are the SILK data.
Just prepend the SILK header #!SILK_V3 to it:
23 21 53 49 4C 4B 5F 56 33
I used silk_v3_decoder.exe (perhaps some Python script can do it too):
silk_v3_decoder.exe in.hex out.pcm -Fs_API 16000
then
ffmpeg -f s16le -ar 16000 -ac 1 -i out.pcm out.wav
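If ffmpeg isn't at hand, the same raw-PCM-to-WAV wrapping can be sketched with Python's standard wave module. This assumes the decoder output is signed 16-bit little-endian mono, matching the -f s16le -ac 1 flags above. pcm_to_wav is a hypothetical helper name.

```python
import wave


def pcm_to_wav(pcm_path: str, wav_path: str, sample_rate: int = 16000) -> None:
    """Wrap raw signed 16-bit little-endian mono PCM in a WAV container."""
    with open(pcm_path, "rb") as pcm_file:
        pcm_bytes = pcm_file.read()
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono, like -ac 1
        wav_file.setsampwidth(2)        # 2 bytes per sample, like s16le
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
```

For example, pcm_to_wav("out.pcm", "out.wav") mirrors the ffmpeg call. Pass a different sample_rate if the decoder was run with another -Fs_API value.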
A frame represents a short span of time, so all the data must be concatenated (as hairlessbear said).
Note: at the end of each "frame" there are 4 bytes that could be a checksum.
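The layout described above can be sketched with Python's struct module. This is only an illustration and assumes those offsets hold: a little-endian uint16 sample rate at offset 9, a little-endian uint16 payload size at offset 12, the payload starting at offset 14, and whatever remains being the trailing bytes. split_frame is a hypothetical helper name. The inputs are just the three 14-byte headers listed above, so the payload comes back empty here.

```python
import struct


def split_frame(frame: bytes):
    """Split one decoded voice_data blob into (rate, size, payload, trailer).

    Assumed layout: 9 bytes, uint16 sample rate, 1 byte,
    uint16 payload size, payload, trailing bytes.
    """
    # "<9xHxH": little-endian; skip 9 bytes, read uint16, skip 1 byte, read uint16
    sample_rate, size = struct.unpack_from("<9xHxH", frame, 0)
    payload = frame[14:14 + size]
    trailer = frame[14 + size:]
    return sample_rate, size, payload, trailer


# The three frame headers quoted in this answer.
headers = [
    "BB 3F 80 0A 01 00 10 01 0B 80 3E 04 42 01",
    "D8 76 DD 02 01 00 10 01 0B 80 3E 04 FA 01",
    "67 7D 11 05 01 00 10 01 0B 80 3E 04 7E 01",
]
for text in headers:
    rate, size, _, _ = split_frame(bytes.fromhex(text))
    print(rate, hex(size))
```

Reading the size field instead of slicing a fixed [14:-4] would also let you check that each blob really ends with exactly 4 trailing bytes.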
Answered by Gordon Freeman on April 14, 2021