How to troubleshoot AVRO and JSON files on the command line

When errors prevent the import of an AVRO or JSON file, use these troubleshooting steps to gain insight into the file: its structure, its data types, and its values. They help you find out where and why the import fails and whether the cause is related to structure, types, or values. The steps use the Apache AVRO tools (avro-tools).

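1. Check the file header. An AVRO container file starts with the bytes 4f 62 6a 01 ("Obj" followed by 0x01), and the embedded schema follows as readable JSON under the avro.schema key.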

$ hexdump -C -n 250 test.avro
00000000 4f 62 6a 01 02 16 61 76 72 6f 2e 73 63 68 65 6d |Obj...avro.schem|
00000010 61 e4 1d 7b 22 74 79 70 65 22 3a 22 72 65 63 6f |a..{"type":"reco|
00000020 72 64 22 2c 22 6e 61 6d 65 22 3a 22 4d 65 73 73 |rd","name":"Mess|
00000030 61 67 65 22 2c 22 6e 61 6d 65 73 70 61 63 65 22 |age","namespace"|
...
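
The header check can also be scripted, for example to test many files at once. The following is a minimal sketch: avro_magic_check is a hypothetical helper name and not part of avro-tools, and it only compares the first four bytes against the magic bytes shown in the hexdump above. It says nothing about whether the rest of the file is intact.

```shell
# Hypothetical helper, not part of avro-tools.
# Prints "avro" if the file starts with the bytes 4f 62 6a 01 ("Obj" + 0x01),
# otherwise prints "not-avro" and returns a non-zero status.
avro_magic_check() {
  # Read the first 4 bytes and render them as one lowercase hex string.
  magic=$(head -c 4 "$1" | od -An -tx1 | tr -d '[:space:]')
  if [ "$magic" = "4f626a01" ]; then
    echo "avro"
  else
    echo "not-avro"
    return 1
  fi
}

# Usage (assuming test.avro is in the current directory):
#   avro_magic_check test.avro
```

If the check fails, the file is probably not an AVRO container file at all (for example, plain JSON or a truncated download), and the later steps will not work on it.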

2. Get the metadata stored in the file header, such as the schema and, if set, the compression codec.

$ java -jar /Applications/Java\ Apps/avro-tools-1.7.7.jar getmeta test.avro

3. Get the schema. If you store the schema in a separate file, you can use it again later.

$ java -jar /Applications/Java\ Apps/avro-tools-1.7.7.jar getschema test.avro > test.avsc 
$ cat test.avsc
{
  "type" : "record",
  "name" : "table_name",
  "doc" : "comment",
  "fields" : [ {
    "name" : "ID",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "MIMETYPE",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "THESIZE",
    "type" : [ "null", "int" ],
    "default" : null
  }, {
    "name" : "CONTENT",
    "type" : [ "null", "bytes" ],
    "default" : null
  } ]
}

4. Convert the file to JSON, inspect the values, and try to import the JSON data instead.

$ java -jar /Applications/Java\ Apps/avro-tools-1.7.7.jar tojson --pretty test.avro > test.json
$ cat test.json

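5. Convert the JSON back to binary AVRO using the schema file you created in step 3. At this point, you can also change the compression with the --codec option, for example to snappy. Note that the first command below overwrites the original test.avro; write to a different file name if you want to keep the original.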

$ java -jar /Applications/Java\ Apps/avro-tools-1.7.7.jar fromjson --schema-file test.avsc test.json > test.avro
$ java -jar /Applications/Java\ Apps/avro-tools-1.7.7.jar fromjson --codec snappy --schema-file test.avsc test.json > test.snappy.avro
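
Steps 3 to 5 can be chained into one round trip, which shows at which stage a file first fails. The following is a sketch under assumptions: avro_tools, avro_roundtrip, and the AVRO_TOOLS_JAR variable are hypothetical names introduced here, and the default jar path is simply the one used in the commands above. Only the getschema, tojson, and fromjson subcommands come from the steps themselves.

```shell
# Hypothetical helper names; adjust the jar path to your installation.
AVRO_TOOLS_JAR="${AVRO_TOOLS_JAR:-/Applications/Java Apps/avro-tools-1.7.7.jar}"

# Thin wrapper so the jar path is written only once.
avro_tools() {
  java -jar "$AVRO_TOOLS_JAR" "$@"
}

# Extract the schema and the JSON from $1, then rebuild a new AVRO file.
# Output goes to <name>.avsc, <name>.json and <name>.roundtrip.avro,
# so the original file is left untouched. Stops at the first failing step.
avro_roundtrip() {
  src="$1"
  base="${src%.avro}"
  avro_tools getschema "$src" > "$base.avsc" &&
  avro_tools tojson --pretty "$src" > "$base.json" &&
  avro_tools fromjson --schema-file "$base.avsc" "$base.json" > "$base.roundtrip.avro"
}

# Usage (assuming test.avro is in the current directory):
#   avro_roundtrip test.avro
```

Because the commands are joined with &&, a non-zero exit status means one of the three conversions failed, and the files written so far indicate how far the round trip got.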

Further Information

Generating a JSON schema from JSON data requires additional tools.