Query To DXL Translator ======================= The Query To DXL Translator, as known as the algebrizer, takes as input a parsed query object and returns its DXL representation, which is an algebrized tree comprised of logical and scalar operations. The serialized DXL is then shipped to GPORCA for optimization. Nomenclature ------------ Logical Operator: Set based operator that maps one or more sets of tuples to a set of tuples. Physical Operator: Implementation of the logical operator, e.g. a logical join operation can be implemented as a nested-loop, hash- or merge-join. Scalar Operator: Maps a single or set of tuples to a single value. Algebrized Representation (Tree): The DXL representation of the GPDB query object Algebrizer Components --------------------- CQueryMutator: normalize the incoming GPDB Query object. CTranslatorQueryToDXL: translate the normalized Query into a DXL tree structure. This DXL structure will then be serialized into DXL document and be shipped to the optimizer. CTranslatorDXLToQuery: translates the DXL back into a GPDB Query, this is only used to verify the correctness of the translation from Query to DXL. Normalized Query Form --------------------- 1) GROUP BY We assert that the group by operator has aggregate functions, grouping functions and grouping columns and nothing else. Expression on aggregate operators or grouping functions are translated as a project on top of the group by operation. To facilitate this we mutate the original query to a form that conforms to this assumption. To illustrate consider the following queries: Example 1: INPUT: SELECT a, count(*) + 1, grouping(a) + 1 FROM foo GROUP BY a OUTPUT: SELECT q_new.a, q_new.ct + 1, q_new.grouping + 1 FROM (SELECT a, count(*) ct, grouping(a) grouping FROM foo GROUP BY a) q_new Example 2: INPUT: SELECT * FROM bar where bar.c = (SELECT count(*) + sum(a) FROM foo GROUP BY a) OUTPUT: SELECT * FROM bar where bar.c = (SELECT q_new.ct + q_new.sm FROM (SELECT a, count(*) ct, sum(b) sm FROM foo GROUP BY a) q_new) 2) HAVING The having clause is translated into a select on top the original query. To accomplish this, we traverse the having clause to collect all aggregates pertaining to the current grouping operation and replacing it with the corresponding variable. To illustrate consider the following queries: Example: INPUT: SELECT a FROM foo GROUP BY a HAVING (count(*) > (SELECT sum(foo.b) FROM bar)) OUTPUT: SELECT q_new.a FROM (SELECT a, count(*) ct, sum(foo.b) sm FROM foo GROUP BY a) q_new) WHERE q_new.ct > (SELECT q_new.sm FROM bar)) 3) WINDOW Similar to GROUP BY, a window operator only has simple window functions rather than expression on window functions. The expression involving window functions are then translated into a project on top of the window operator. The mutation methodology is identical to the process described in GROUP BY. 4) DISTINCT We transform the distinct clause into a GROUP BY on top the original query. Example: INPUT: SELECT distinct a, count(*) FROM foo GROUP BY a, b OUTPUT: SELECT q_new.a, q_new.ct FROM (SELECT a, count(*) ct, FROM foo GROUP BY a) q_new) GROUP BY q_new.a, q_new.ct Query To DXL Translation ------------------------ Input: Query Object CTranslatorQueryToDXL assumes that the query has already been normalized. Therefore the following members in the query objects are NULL: 1. Distinct Columns 2. Having Clause Output: DXL DXL containing: 1. List of output columns along with their alias name 2. List of CTE Producers where each CTE producer has: a. An unique identifier b. List of output columns 3. The DXL representation of the query Translation Algorithm 1. If the query has set operations then a. Process the inputs to the set op. b. Check if the inputs need to casted - If so the optimizer will generate new output column for the casting functions c. Create a logical set operator that contains: - List of input columns from each input - List of output columns of the set op 2. If not, process the from clause (FROM EXPR) a. Translate each range table entry used in the from clause based on the type of the range table entry - Table Entries: translate into logical get or logical external get - TVF: translate into logical tvf - Value Scan: translate into a constant table get - CTE: represented as a logical CTE consumer with it CTE Producer ID and list of output columns. - Translate a derived table to a logical tree 3. Check if the query has window operation a. If the window specification is a computed column then create a project child b. Add window function in the target list to the project list of the window operator 4. Check if the query has GROUP BY (including grouping sets/rollup) a. Expand rollup/grouping sets into a union all over group bys b. Add the aggregates in the target list to the project list of the group by operator 5. Check if the query has ordering clause a. Create a logical limit operator with - Order Spec (sorting columns and sorting functions used) - limit count, and limit offset 6. Generate the list of output columns and the alias name (for display)