Extracting data from a JSON string using MySQL
In modern database management, the JSON format is widely used due to its flexibility. However, when data is stored in JSON, we often need to convert it to a more manageable format. This article will demonstrate, through a specific SQL query example, how to extract and reformat data from a JSON string stored in MySQL.
1. Background Knowledge
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for people to read and write and easy for machines to parse and generate. MySQL has supported a native JSON data type since version 5.7.8, making it possible to store and manipulate JSON data in the database.
In many applications, JSON strings may be stored in a field of a table, and we need to extract and transform this data for further analysis or presentation.
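As a brief illustration of the native JSON support mentioned above, the sketch below runs a few built-in JSON functions on a literal document. The variable @doc and its contents are made up for this example and are not part of the article's data.

```sql
-- Literal document used only for illustration.
SET @doc = '{"a":"875 162","b":"375 98"}';

SELECT
  JSON_VALID(@doc)                        AS is_valid,  -- 1 when the text is well-formed JSON
  JSON_KEYS(@doc)                         AS key_list,  -- JSON array of the top-level keys
  JSON_UNQUOTE(JSON_EXTRACT(@doc, '$.a')) AS a_value;   -- the string stored under key a, without quotes
```

These functions accept JSON held in an ordinary string, so they also work when the data sits in a text column rather than a JSON column.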
2. Sample Data
Suppose we have a field lct in a table wf_lcdy that stores the following JSON string:
{"15775d64e52c4ba3a8eef4bafc5f40e5":"875 162","75b67fab657748a9ab4bba141bfa0d36":"375 98","428299fd90814b3eaf129e8246f82b2a":"155 126"}
We want to convert it into an array in the following format:
[{"id":"15775d64e52c4ba3a8eef4bafc5f40e5","x":875,"y":162},{"id":"75b67fab657748a9ab4bba141bfa0d36","x":375,"y":98},{"id":"428299fd90814b3eaf129e8246f82b2a","x":155,"y":126}]
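To try the example locally, a table like the following could hold the sample row. This is a minimal sketch: only the names wf_lcdy, ID, and lct appear in the article's query, while the column types are assumptions. A text column is assumed because the query depends on the compact text exactly as written; a JSON column would most likely return the document re-serialized with a space after each colon and comma, which the string splitting does not expect.

```sql
-- Hypothetical schema: the table and column names come from the article's query,
-- the column types are assumptions made for this sketch.
CREATE TABLE wf_lcdy (
  ID  VARCHAR(32) NOT NULL PRIMARY KEY,
  lct TEXT
);

-- The sample JSON string, stored under the ID that the query filters on.
INSERT INTO wf_lcdy (ID, lct) VALUES (
  '0c86346993d64d98ad17892974bf8963',
  '{"15775d64e52c4ba3a8eef4bafc5f40e5":"875 162","75b67fab657748a9ab4bba141bfa0d36":"375 98","428299fd90814b3eaf129e8246f82b2a":"155 126"}'
);
```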
3. SQL Query Analysis
The following is the SQL query to achieve this transformation:
SELECT
  CONCAT('[', GROUP_CONCAT(
    CONCAT(
      '{"id":"',
      SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', 1), '"', -1),                     -- key before the colon
      '", "x":',
      CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', -1), ' ', 1) AS UNSIGNED),   -- first number of the value
      ', "y":',
      CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', -1), ' ', -1) AS UNSIGNED),  -- second number of the value
      '}'
    )
  ), ']') AS result
FROM (
  SELECT
    TRIM(BOTH '"' FROM kv) AS kv
  FROM (
    SELECT
      -- strip {, } and ", then take the n-th comma-separated item
      SUBSTRING_INDEX(SUBSTRING_INDEX(REPLACE(REPLACE(REPLACE(lct, '{', ''), '}', ''), '"', ''), ',', numbers.n), ',', -1) AS kv
    FROM wf_lcdy
    JOIN (
      SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
      SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
      SELECT 9 UNION ALL SELECT 10
    ) numbers
    -- keep only as many n values as there are pairs (comma count + 1)
    WHERE CHAR_LENGTH(lct) - CHAR_LENGTH(REPLACE(lct, ',', '')) >= numbers.n - 1
      AND ID = '0c86346993d64d98ad17892974bf8963'
  ) AS temp
) AS kv_pairs;
3.1 Query Structure Parsing
1. Inner query:
- Remove redundant characters: First, use the REPLACE function to remove the {, } and " characters from the lct field. This simplifies subsequent processing.
- Split the string: Use the SUBSTRING_INDEX function to split out each key-value pair. We achieve this with a numbers table (1 to 10), which lets us iterate over the pairs because we cannot know in advance how many key-value pairs the JSON contains.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(REPLACE(REPLACE(REPLACE(lct, '{', ''), '}', ''), '"', ''), ',', numbers.n), ',', -1) AS kv
This code splits the JSON string into individual key-value pairs, one per row, and the kv column will contain values like this:
15775d64e52c4ba3a8eef4bafc5f40e5:875 162
75b67fab657748a9ab4bba141bfa0d36:375 98
428299fd90814b3eaf129e8246f82b2a:155 126
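The nested SUBSTRING_INDEX call is easier to follow on a short literal. The sketch below uses a made-up string in the variable @s (not data from wf_lcdy) to show how the n-th item is isolated and how the comma count bounds n.

```sql
-- Literal input used only for illustration.
SET @s = 'a:1 2,b:3 4,c:5 6';

SELECT
  SUBSTRING_INDEX(@s, ',', 2)                           AS first_two,    -- 'a:1 2,b:3 4'
  SUBSTRING_INDEX(SUBSTRING_INDEX(@s, ',', 2), ',', -1) AS second_item,  -- 'b:3 4'
  CHAR_LENGTH(@s) - CHAR_LENGTH(REPLACE(@s, ',', ''))   AS comma_count;  -- 2, so n may run from 1 to 3
```

When n is larger than the number of items, the inner call returns the whole string and the outer call returns the last item again, which is why the WHERE clause has to limit n.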
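2. Middle query:
- In this query, we further process the kv column. TRIM(BOTH '"' FROM kv) removes any double quotes left at either end so that subsequent operations are not affected. Because the inner query already strips every double quote with REPLACE, this step is a safeguard and leaves the sample data unchanged.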
SELECT TRIM(BOTH '"' FROM kv) AS kv
3. Outer query:
- Aggregation and formatting: In the outer query, GROUP_CONCAT aggregates all kv pairs and CONCAT builds a JSON string in the target format.
- Extracting data: Use SUBSTRING_INDEX to extract the id, x, and y values, and CAST to convert x and y to numbers. The key here is splitting the string at the colon and at the space.
GROUP_CONCAT(
  CONCAT(
    '{"id":"',
    SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', 1), '"', -1),
    '", "x":',
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', -1), ' ', 1) AS UNSIGNED),
    ', "y":',
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', -1), ' ', -1) AS UNSIGNED),
    '}'
  )
)
Final result: The final result is a string in JSON array format (a plain string built by CONCAT, not a value of the JSON data type).
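The extraction step can be checked in isolation on one key-value pair. This sketch puts the first pair from the sample data into a variable, @kv, which is introduced here only for illustration.

```sql
-- One kv value as produced by the inner query.
SET @kv = '15775d64e52c4ba3a8eef4bafc5f40e5:875 162';

SELECT
  SUBSTRING_INDEX(@kv, ':', 1)                                                AS id,  -- text before the colon
  SUBSTRING_INDEX(@kv, ':', -1)                                               AS xy,  -- '875 162'
  CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(@kv, ':', -1), ' ', 1)  AS UNSIGNED)  AS x,   -- 875
  CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(@kv, ':', -1), ' ', -1) AS UNSIGNED)  AS y;   -- 162
```

CAST ... AS UNSIGNED suits the non-negative coordinates in the sample; if negative values were possible, SIGNED would be the safer choice.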
4. Query Results
After running the above query, you get the target array. The string literals in the query put a space after each comma inside an object, which is insignificant whitespace in JSON:
[{"id":"15775d64e52c4ba3a8eef4bafc5f40e5", "x":875, "y":162},{"id":"75b67fab657748a9ab4bba141bfa0d36", "x":375, "y":98},{"id":"428299fd90814b3eaf129e8246f82b2a", "x":155, "y":126}]
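For comparison, on MySQL 8.0.4 or later the same reshaping can be sketched with the native JSON functions instead of string splitting. This is an alternative, not the approach used in this article: JSON_KEYS lists the keys, JSON_TABLE turns them into rows, and JSON_ARRAYAGG with JSON_OBJECT builds the array. The aliases t and k and the VARCHAR(64) column size are assumptions made for illustration.

```sql
-- Alternative sketch using native JSON functions (assumes MySQL 8.0.4+).
SELECT
  JSON_ARRAYAGG(
    JSON_OBJECT(
      'id', k.id,
      -- look up the value for this key, e.g. '875 162', and split it at the space
      'x', CAST(SUBSTRING_INDEX(JSON_UNQUOTE(JSON_EXTRACT(t.lct, CONCAT('$."', k.id, '"'))), ' ', 1)  AS UNSIGNED),
      'y', CAST(SUBSTRING_INDEX(JSON_UNQUOTE(JSON_EXTRACT(t.lct, CONCAT('$."', k.id, '"'))), ' ', -1) AS UNSIGNED)
    )
  ) AS result
FROM wf_lcdy AS t,
     -- one row per top-level key of the stored document
     JSON_TABLE(JSON_KEYS(t.lct), '$[*]' COLUMNS (id VARCHAR(64) PATH '$')) AS k
WHERE t.ID = '0c86346993d64d98ad17892974bf8963';
```

Because MySQL normalizes JSON values, the key order and spacing of the printed result can differ from the hand-built string, although it represents the same data. No numbers table is needed, so the number of pairs is not capped.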
5. Performance Considerations
- Character length calculation: CHAR_LENGTH(lct) - CHAR_LENGTH(REPLACE(lct, ',', '')) counts the commas in lct, so the WHERE clause keeps only as many rows as there are key-value pairs. This calculation has some performance impact, especially for large text values.
- Using a numbers table: Because the number of key-value pairs in the JSON can vary, the numbers table can be extended to support more of them. With the table shown here, any pair beyond the tenth is silently dropped, so in practical applications increase the range of numbers as needed.
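When many more pairs are expected, two adjustments may help. The sketch below assumes MySQL 8.0 for the recursive common table expression, and the upper bound of 100 is an arbitrary value chosen for illustration. It also raises group_concat_max_len, since GROUP_CONCAT truncates its result at that length (1024 by default).

```sql
-- Raise the GROUP_CONCAT output limit for the current session.
SET SESSION group_concat_max_len = 1000000;

-- Generate the numbers 1..100 instead of writing out UNION ALL branches (MySQL 8.0+).
WITH RECURSIVE numbers (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM numbers WHERE n < 100
)
SELECT n FROM numbers;
```

The same WITH RECURSIVE clause could be placed in front of the main query so that it joins against numbers in place of the hand-written derived table.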
6. Summary
Through the SQL query above, we successfully extracted data from a field containing a JSON string and converted it into another structured format. This method demonstrates that MySQL's string functions are flexible enough to reshape simple JSON text. It relies on the keys and values themselves containing no commas, colons, braces, or double quotes.
In practical applications, you can modify queries appropriately according to specific needs to adapt to JSON data with different structures. Furthermore, understanding the use of string manipulation and aggregate functions in SQL is crucial for improving data processing capabilities and efficiency.