Extracting data from a JSON string using MySQL

In modern database management, the JSON format is widely used due to its flexibility. However, when data is stored in JSON, we often need to convert it to a more manageable format. This article will demonstrate, through a specific SQL query example, how to extract and reformat data from a JSON string stored in MySQL.

1. Background Knowledge

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for people to read and write and easy for machines to parse and generate. MySQL has supported a native JSON data type since version 5.7.8, making it possible to store and manipulate JSON data in the database.

In many applications, JSON strings may be stored in a field of a table, and we need to extract and transform this data for further analysis or presentation.
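As a quick illustration of the built-in JSON support mentioned above, the following sketch checks and reads a one-entry JSON string. The user variable @doc and the column aliases are hypothetical names chosen for this example; the functions themselves (JSON_VALID, JSON_KEYS, JSON_EXTRACT, JSON_UNQUOTE) are standard in MySQL 5.7.8 and later.

```sql
-- @doc is an illustrative variable holding a single key-value pair.
SET @doc = '{"15775d64e52c4ba3a8eef4bafc5f40e5":"875 162"}';

SELECT
  JSON_VALID(@doc) AS is_valid,   -- 1 when the string is well-formed JSON
  JSON_KEYS(@doc)  AS doc_keys,   -- JSON array of the top-level keys
  -- The key starts with a digit, so it must be double-quoted inside the path.
  JSON_UNQUOTE(JSON_EXTRACT(@doc, '$."15775d64e52c4ba3a8eef4bafc5f40e5"')) AS first_value;
```

These functions accept a JSON string stored in an ordinary text column as well as a value of the JSON type.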

2. Sample Data

Suppose we have a field lct in a table wf_lcdy that stores the following JSON string:

{"15775d64e52c4ba3a8eef4bafc5f40e5":"875 162","75b67fab657748a9ab4bba141bfa0d36":"375 98","428299fd90814b3eaf129e8246f82b2a":"155 126"}

We want to convert it into an array in the following format:

[{"id":"15775d64e52c4ba3a8eef4bafc5f40e5","x":875,"y":162},{"id":"75b67fab657748a9ab4bba141bfa0d36","x":375,"y":98},{"id":"428299fd90814b3eaf129e8246f82b2a","x":155,"y":126}]
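To follow along, a test row can be set up roughly as shown here. Only the table name wf_lcdy and the columns ID and lct come from the article; the column types are assumptions made for this sketch, and the ID value is the one the article's query filters on.

```sql
-- Assumed schema: the column types are guesses, not taken from the article.
CREATE TABLE wf_lcdy (
  ID  CHAR(32) NOT NULL PRIMARY KEY,
  lct TEXT
);

-- One row holding the sample JSON string.
INSERT INTO wf_lcdy (ID, lct) VALUES (
  '0c86346993d64d98ad17892974bf8963',
  '{"15775d64e52c4ba3a8eef4bafc5f40e5":"875 162","75b67fab657748a9ab4bba141bfa0d36":"375 98","428299fd90814b3eaf129e8246f82b2a":"155 126"}'
);
```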

3. SQL Query Analysis

The following is the SQL query to achieve this transformation:

SELECT
  -- Outer query: build one JSON object per pair and wrap the list in [ ]
  CONCAT('[', GROUP_CONCAT(
    CONCAT(
      '{"id":"',
      SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', 1), '"', -1),
      '", "x":',
      CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', -1), ' ', 1) AS UNSIGNED),
      ', "y":',
      CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', -1), ' ', -1) AS UNSIGNED),
      '}'
    )
  ), ']') AS result
FROM (
  -- Middle query: strip any leftover double quotes
  SELECT
    TRIM(BOTH '"' FROM kv) AS kv
  FROM (
    -- Inner query: remove { } " and pick the n-th comma-separated pair
    SELECT
      SUBSTRING_INDEX(SUBSTRING_INDEX(REPLACE(REPLACE(REPLACE(lct, '{', ''), '}', ''), '"', ''), ',', numbers.n), ',', -1) AS kv
    FROM wf_lcdy
    JOIN (
      SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
      SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
      SELECT 9 UNION ALL SELECT 10
    ) numbers
    -- Keep only as many values of n as there are pairs (commas + 1)
    WHERE CHAR_LENGTH(lct) - CHAR_LENGTH(REPLACE(lct, ',', '')) >= numbers.n - 1
      AND ID = '0c86346993d64d98ad17892974bf8963'
  ) AS temp
) AS kv_pairs;

3.1 Query Structure Parsing

1. Inner query:

  • Remove redundant characters: First, use the REPLACE function to remove {, }, and " from the lct field. This simplifies subsequent processing.
  • Split the string: Use the SUBSTRING_INDEX function to split out each key-value pair. We achieve this using a numeric table (1 to 10). The numeric table lets us iterate through the key-value pairs, since we cannot know in advance how many the JSON contains.

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(REPLACE(REPLACE(REPLACE(lct, '{', ''), '}', ''), '"', ''), ',', numbers.n), ',', -1) AS kv

This code splits the JSON string into individual key-value pairs, so the kv column will contain values like these:

  • 15775d64e52c4ba3a8eef4bafc5f40e5:875 162
  • 75b67fab657748a9ab4bba141bfa0d36:375 98
  • 428299fd90814b3eaf129e8246f82b2a:155 126
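The nested SUBSTRING_INDEX call can be tried in isolation. The sketch below uses a shortened literal (@s, a hypothetical variable with the keys abbreviated to a, b, c) in place of the cleaned lct value.

```sql
-- Stand-in for lct after the braces and quotes have been removed.
SET @s = 'a:875 162,b:375 98,c:155 126';

SELECT
  -- Everything up to (not including) the 2nd comma: 'a:875 162,b:375 98'
  SUBSTRING_INDEX(@s, ',', 2) AS first_two,
  -- The last comma-separated piece of that prefix: 'b:375 98'
  SUBSTRING_INDEX(SUBSTRING_INDEX(@s, ',', 2), ',', -1) AS second_pair;
```

When n is larger than the number of pairs, the inner call returns the whole string, so the last pair would be repeated. This is why the full query filters numbers.n against the comma count.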

2. Middle query:

  • In this query, we process the kv column further. TRIM(BOTH '"' FROM kv) removes any leading or trailing double quotes so that subsequent operations are not affected. Because the inner query has already removed every double quote, this step acts as a safeguard.

SELECT TRIM(BOTH '"' FROM kv) AS kv

3. Outer query:

  • Aggregation and formatting: In the outer query, we use GROUP_CONCAT to aggregate all kv pairs and CONCAT to generate a JSON string in the target format.
  • Extracting data: Use SUBSTRING_INDEX to extract the id, x, and y values and convert them to the appropriate format. The part before the colon is the id. The part after the colon is split on the space, and CAST(... AS UNSIGNED) turns the two halves into the numbers x and y.

GROUP_CONCAT(
  CONCAT(
    '{"id":"',
    SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', 1), '"', -1),
    '", "x":',
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', -1), ' ', 1) AS UNSIGNED),
    ', "y":',
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(kv, ':', -1), ' ', -1) AS UNSIGNED),
    '}'
  )
)

Final result: The outermost CONCAT wraps the aggregated objects in [ and ], so the final result is a string in JSON array format.
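The extraction step can be checked on a single pair. In this sketch, @kv is a hypothetical variable holding one value of the kv column.

```sql
-- One key-value pair as produced by the inner query.
SET @kv = '15775d64e52c4ba3a8eef4bafc5f40e5:875 162';

SELECT
  SUBSTRING_INDEX(@kv, ':', 1) AS id,                                              -- text before the colon
  CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(@kv, ':', -1), ' ', 1)  AS UNSIGNED) AS x,  -- 875
  CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(@kv, ':', -1), ' ', -1) AS UNSIGNED) AS y;  -- 162
```

Note that CAST(... AS UNSIGNED) assumes non-negative integer coordinates; negative or fractional values would need a different target type such as SIGNED or DECIMAL.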

4. Query Results

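After running the above query, you will get a result equivalent to the desired format. The only difference is a space after each comma inside the objects, which comes from the string literals in the query and is insignificant in JSON: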

[{"id":"15775d64e52c4ba3a8eef4bafc5f40e5","x":875,"y":162},{"id":"75b67fab657748a9ab4bba141bfa0d36","x":375,"y":98},{"id":"428299fd90814b3eaf129e8246f82b2a","x":155,"y":126}]

5. Performance Considerations

  • Character length calculation: CHAR_LENGTH(lct) - CHAR_LENGTH(REPLACE(lct, ',', '')) counts the commas in the string, which ensures we only process key-value pairs that actually exist. The calculation scans the whole value for every joined row, so it has some performance impact, especially for large text values.
  • Using numeric tables: Because the number of key-value pairs in the JSON can vary, the numeric table can be extended to support more of them. With the range 1 to 10 used here, any pairs beyond the tenth are silently dropped, so in practical applications you should increase the range as needed.
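The two points above can be combined in a small sketch: count the pairs from the commas, and generate the numeric table instead of hand-writing it. The recursive CTE requires MySQL 8.0 or later, which is an assumption beyond what the article states; @lct and the upper bound of 100 are illustrative choices.

```sql
-- Shortened stand-in for the lct column value.
SET @lct = '{"a":"875 162","b":"375 98","c":"155 126"}';

-- Generate 1..100 rather than listing SELECT 1 UNION ALL SELECT 2 ...
WITH RECURSIVE numbers (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM numbers WHERE n < 100
)
SELECT n
FROM numbers
-- commas + 1 = number of pairs, so this keeps n = 1, 2, 3 for the sample
WHERE n <= CHAR_LENGTH(@lct) - CHAR_LENGTH(REPLACE(@lct, ',', '')) + 1;
```

When the range grows, keep in mind that GROUP_CONCAT truncates its result at group_concat_max_len (1024 bytes by default), so that session variable may need to be raised as well.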

6. Summary

Through the SQL query above, we successfully extracted data from a field containing a JSON string and converted it into another structured format. This method shows that MySQL's string and aggregate functions alone are flexible enough to reshape simple JSON data.

In practical applications, you can modify queries appropriately according to specific needs to adapt to JSON data with different structures. Furthermore, understanding the use of string manipulation and aggregate functions in SQL is crucial for improving data processing capabilities and efficiency.
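As one example of such an adaptation, the same transformation could be written with MySQL's native JSON functions instead of string splitting. This is a different technique from the query in this article, shown only as a sketch: it assumes MySQL 8.0.4 or later (for JSON_TABLE) and that lct always holds valid JSON. The aliases w and k are hypothetical.

```sql
SELECT JSON_ARRAYAGG(
         JSON_OBJECT(
           'id', k.id,
           -- Look up the value for this key, then split "x y" on the space.
           'x', CAST(SUBSTRING_INDEX(
                  JSON_UNQUOTE(JSON_EXTRACT(w.lct, CONCAT('$."', k.id, '"'))), ' ', 1) AS UNSIGNED),
           'y', CAST(SUBSTRING_INDEX(
                  JSON_UNQUOTE(JSON_EXTRACT(w.lct, CONCAT('$."', k.id, '"'))), ' ', -1) AS UNSIGNED)
         )
       ) AS result
FROM wf_lcdy AS w,
     -- One row per top-level key of the JSON object; no numbers table needed.
     JSON_TABLE(JSON_KEYS(w.lct), '$[*]' COLUMNS (id VARCHAR(64) PATH '$')) AS k
WHERE w.ID = '0c86346993d64d98ad17892974bf8963';
```

This version does not depend on a fixed numeric range and is not confused by commas or colons inside values. MySQL may order the keys inside each object and the elements of the array differently from the target format shown earlier.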