Simple SQL Formatter

Someone should inject a comment here about sql being a programming language and they should probably also mention that a true parser etc is required and how an armchair solution will come close, but will be a real b*tch to get it right.

But this is entertaining, so I will refrain from doing so.

True statement… and for my purposes, “close” will be more than good enough, since it really isn’t expected to parse full SQL “programs”, but simple SQL queries that are 10-15 lines long at most…

And I believe that Simon’s purpose mirrors mine almost exactly…

But this is challenging…

I might take a crack at this myself.

Off the top of my head:

  • Scan the SQL into tokens. These are either words, number streams, strings, or comments.
  • Separate the tokens into blocks. Blocks start and end with parens and can embed and be embedded into other blocks.
  • Determine the type of each block. Blocks that start with certain keywords (select, table, insert, etc.) are designated “master” blocks. Keep drilling down to categorize as many blocks as you can and create the hierarchy.
  • Start writing out code.

See, simple! It only took me 5 minutes to write that. :slight_smile:

Here is the process mine follows

  • break into tokens “words”,“numbers”,“quoted strings”, block and line comments and “everything else”
  • utilize a dictionary of keywords, with each word assigned a line break “code” to split before, after, or both
  • reassign case (upper/lower/title) to both keywords and non-keywords based on user choice (default is keywords UPPER, other words “lower”
  • insert the EOL data based on that dictionary (with special logic to handle comments and user embedded EOL inside quotes)

Right now mine does not “care” about ( ) other than they are “token breaks”… but am considering adding logic to see if ( starts a subquery and indent accordingly)

Dave,

I have been using for a few years and happy with SQL Pretty Printer

[quote=325198:@Rich Hatfield]Dave,

I have been using for a few years and happy with SQL Pretty Printer[/quote]
However the purpose of this exercise/contest is to create a method/class that can be integrated into a XOJO project using 100% Xojo code … :slight_smile:

I am really enjoying this!

Ok, latest version. Kem’s complex code:

[code]–
– Some test sql
– with a multi-line comment
– and stuff

with my_with_1 as (
select col1, col2, col3 as “col 3”
from (
select * from table2) as sub_table1 where
id = ‘some string’
)
with my_with_2 as (
select some_function(param1, param2, ‘string param’)
from table3 t3
left join table4 as t4 on (t4.id = t4.table4_id)
where upper(lower(string_function(‘some
long
string’, otherparam))) like my_with_1.col1
)

/* Here come
the select statements, but not here
select * from table1 where i = 4;
*/
(
select col1, col2, col3 as " select "
from my_with_2 – using my_with_2 here
right join table5 t5 – comment in the middle of join
on (
select distinct * from table6, table7 where table6.table7_id = table7.id
) where
t5 = some_other_function(
param1, param
)
group by col3, col2, col1
order by col1

limit 5 offset 4

) union
(
– just a copy of the one above
select col1, col2, col3 as " select "
from my_with_1 – using my_with_1 here
right join table5 t5 – comment in the middle of join
on (
select distinct * from table6, table7 where table6.table7_id = table7.id
) where
t5 = some_other_function(
param1, param
)
group by col3, col2, col1
order by col1

limit 5 offset 4

)[/code]
The output from my formatter:

[code]–
– Some test sql
– with a multi-line comment
– and stuff

WITH my_with_1 AS (SELECT
col1,
col2,
col3 AS “col 3”
FROM
(SELECT
*
FROM
table2) AS sub_table1
WHERE
id = ‘some string’ )
WITH my_with_2 AS (SELECT
some_function(param1,
param2,
‘string param’)
FROM
table3 t3
LEFT JOIN table4 AS t4 ON (t4.id = t4.table4_id)
WHERE
upper(lower(string_function(‘some long string’,
otherparam)))
LIKE my_with_1.col1 )
/* Here come
the select statements, but not here
select * from table1 where i = 4;
*/
(SELECT
col1,
col2,
col3 AS " select "
FROM
my_with_2 – using my_with_2 here
RIGHT JOIN table5 t5 – comment in the middle of join
ON (SELECT
distinct *
FROM
table6,
table7
WHERE
table6.table7_id = table7.id )
WHERE
t5 = some_other_function( param1,
param )
GROUP BY
col3,
col2,
col1
ORDER BY
col1
LIMIT 5 OFFSET 4 )
UNION
( – just a copy of the one above
SELECT
col1,
col2,
col3 AS " select "
FROM
my_with_1 – using my_with_1 here
RIGHT JOIN table5 t5 – comment in the middle of join
ON (SELECT
distinct *
FROM
table6,
table7
WHERE
table6.table7_id = table7.id )
WHERE
t5 = some_other_function( param1,
param )
GROUP BY
col3,
col2,
col1
ORDER BY
col1
LIMIT 5 OFFSET 4 )[/code]
Latest version here: sqlFormatter

[quote=325178:@Kem Tekinay]I might take a crack at this myself.

Off the top of my head:

  • Scan the SQL into tokens. These are either words, number streams, strings, or comments.
  • Separate the tokens into blocks. Blocks start and end with parens and can embed and be embedded into other blocks.
  • Determine the type of each block. Blocks that start with certain keywords (select, table, insert, etc.) are designated “master” blocks. Keep drilling down to categorize as many blocks as you can and create the hierarchy.
  • Start writing out code.

See, simple! It only took me 5 minutes to write that. :)[/quote]
I have not responded to this as my source code is available to all. I have to admit that I started this as a really simple basic formatter for SQLite only. However, your challenge has really intrigued me!

Sorry to interrupt, but could someone explain to me why I might want to use this?

If I have a complex SQL command to write, I write it in a text editor (TextWrangler) and format it so I can read it. I then take the text and add it to the project. (It would be nice if the Xojo IDE would edit text files directly.) I have a folder in the project for queries.

-Bob (who has obviously missed the point)

It’s for when you have retrieved an unformatted sql statement from somewhere and you want to make it human readable. The opposite of your example.

Also, don’t be a party pooper. :stuck_out_tongue:

and that “somewhere” could in fact be another developer who doesn’t like “end of line”… I had a member of my previous team that would just type a query and press enter when he came to the edge of the screen… I had to refactor his code every time I inherited some of it.

Ah. Now it all makes sense!

Thank you. And, it is an interesting problem to solve.

-Bob

(Note to Kem: I did apologize right at the beginning. :))

[quote=325250:@Bob Gordon]Ah. Now it all makes sense!
(Note to Kem: I did apologize right at the beginning. :))[/quote]
From my viewpoint - no offence taken!

Dave’s original post was because he is in the process of making a Database Manager. I chimed in because I, too, have made (and now reviewing) a database manager and thought I had already solved the problem.

However…

It was pointed out to me very quickly that I had not solved the problem and, indeed, was far far away from solving the problem.

This whole thread has been great from my point of view and competition is great!

At the end of the day there is absolutely no benefit from this, apart from making a single text line (SQL) readable to a human. If you look at a complex statement as a single line it is impossible to work it out.

And, anyway, this is fun!

Upping the difficulty a bit…

--
-- Some test sql
-- with a multi-line comment
-- and stuff
--

with my_with_1 as (
select col1, col2, col3 as "col 3"
from (
select * from table2) as sub_table1 where
id = 'some string'
)
with my_with_2 as (
select some_function(param1, param2, 'string param')
from table3 t3
left join table4 as t4 on (t4.id = t4.table4_id)
where upper(lower(string_function('some 
long
string', otherparam))) like my_with_1.col1
)

/* Here come
the select statements, but not here
select * from table1 where i = 4;
*/
(
select col1, col2, col3 as "   select  "
from my_with_2 -- using my_with_2 here
right join table5 t5 -- comment in the middle of join
on (
select distinct * from table6, table7 where table6.table7_id = table7.id
) where 
t5 = some_other_function(
param1, param
)
group by col3, col2, col1
order by col1

limit 5 offset 4

) union
(
-- just a copy of the one above
select col1, col2, col3 as "   select  "
from my_with_1 -- using my_with_1 here
right join table5 t5 -- comment in the middle of join
on (
select distinct * from table6, table7 where table6.table7_id = table7.id
) where 
t5 = some_other_function(
param1, param
)
group by col3, col2, col1
order by col1

limit 5 offset 4

); select '"this is a quote" within a quote',
'this is a --comment within a quote', 'this is /*
another comment within a quote */' /*
'a quote within a comment'
*/, 0--5, some_field -- should be a zero, comment

The way I’d format that…

--
-- Some test sql
-- with a multi-line comment
-- and stuff
--

WITH my_with_1 AS (
  SELECT
    col1,
    col2,
    col3 as "col 3"
  FROM (
    SELECT
      * 
    FROM
      table2
    ) AS sub_table1
  WHERE
    id = 'some string'
)

WITH my_with_2 AS (
  SELECT
    some_function(param1, param2, 'string param')
  FROM
    table3 t3
    LEFT JOIN table4 AS t4 ON (
      t4.id = t4.table4_id
    )
  WHERE
    UPPER(LOWER(string_function('some 
long
string', otherparam))) LIKE my_with_1.col1
)

/* Here come
the select statements, but not here
select * from table1 where i = 4;
*/
(
  SELECT
    col1,
    col2,
    col3 AS "   select  "
  FROM
    my_with_2 -- using my_with_2 here
    RIGHT JOIN table5 t5 -- comment in the middle of join
    ON (
      SELECT DISTINCT
        *
      FROM
        table6,
        table7
      WHERE
        table6.table7_id = table7.id
    )
  WHERE 
    t5 = some_other_function(param1, param)
  GROUP BY
    col3,
    col2,
    col1
  ORDER BY
    col1
  LIMIT 5
  OFFSET 4
) UNION (
  -- just a copy of the one above
  SELECT
    col1,
    col2,
    col3 AS "   select  "
  FROM
    my_with_2 -- using my_with_2 here
    RIGHT JOIN table5 t5 -- comment in the middle of join
    ON (
      SELECT DISTINCT
        *
      FROM
        table6,
        table7
      WHERE
        table6.table7_id = table7.id
    )
  WHERE 
    t5 = some_other_function(param1, param)
  GROUP BY
    col3,
    col2,
    col1
  ORDER BY
    col1
  LIMIT 5
  OFFSET 4
);
SELECT
  '"this is a quote" within a quote',
  'this is a --comment within a quote',
  'this is /*
another comment within a quote */' /*
'a quote within a comment'
*/,
  0--5, some_field -- should be a zero, comment

Just to be clear, the first part of the SQL remains the same. I added the second clause at the end.

--
-- some test sql
-- with a multi-line comment
-- and stuff
--
WITH my_with_1 AS (
SELECT col1, col2, col3 AS "col 3"
  FROM (
SELECT *
  FROM table2) AS sub_table1
 WHERE id = 'some string') WITH my_with_2 AS (
SELECT some_function (param1, param2, 'string param')
  FROM table3 t3
  LEFT JOIN table4 AS t4
    ON (t4.id = t4.table4_id)
 WHERE UPPER(LOWER(string_function ('some 
long
string', otherparam))) LIKEmy_with_1.col1) /* here come
the select statements, but not here
select * from table1 where i = 4;
* / (
SELECT col1, col2, col3 AS "   select  "
  FROM my_with_2 -- using my_with_2 here
 RIGHT JOIN table5 t5 -- comment in the middle of join
    ON (
SELECT DISTINCT *
  FROM table6, table7
 WHERE table6.table7_id = table7.id)
 WHERE t5 = some_other_function (param1, param)
 GROUP BY col3, col2, col1
 ORDER BY col1
 LIMIT 5
OFFSET 4)
 UNION
   (-- just a copy of the one above
SELECT col1, col2, col3 AS "   select  "
  FROM my_with_1 -- using my_with_1 here
 RIGHT JOIN table5 t5 -- comment in the middle of join
    ON (
SELECT DISTINCT *
  FROM table6, table7
 WHERE table6.table7_id = table7.id)
 WHERE t5 = some_other_function (param1, param)
 GROUP BY col3, col2, col1
 ORDER BY col1
 LIMIT 5
OFFSET 4) ;
SELECT ' "this is a quote" within a QUOTE',
' this IS a --comment within a quote', 'this is /*
another comment within a QUOTE* / ' /*
' a QUOTEwithin a comment '
*/, 0--5, some_field -- should be a zero, comment

not “quite” right it seems… I need to fix how it emits the */ … this example split it which is not right

should have been

I will play with Kem’s latest crazy sql tomorrow.

Oh, to try and have a real job with this!