#include <item_sum.h>
Public Types | |
enum | Sumfunctype { COUNT_FUNC, COUNT_DISTINCT_FUNC, SUM_FUNC, SUM_DISTINCT_FUNC, AVG_FUNC, AVG_DISTINCT_FUNC, MIN_FUNC, MAX_FUNC, STD_FUNC, VARIANCE_FUNC, SUM_BIT_FUNC, UDF_SUM_FUNC, GROUP_CONCAT_FUNC } |
Public Member Functions | |
bool | has_force_copy_fields () const |
bool | has_with_distinct () const |
void | mark_as_sum_func () |
Item_sum (Item *a) | |
Item_sum (Item *a, Item *b) | |
Item_sum (List< Item > &list) | |
Item_sum (THD *thd, Item_sum *item) | |
enum Type | type () const |
virtual enum Sumfunctype | sum_func () const =0 |
bool | reset_and_add () |
virtual void | reset_field ()=0 |
virtual void | update_field ()=0 |
virtual bool | keep_field_type (void) const |
virtual void | fix_length_and_dec () |
virtual Item * | result_item (Field *field) |
table_map | used_tables () const |
void | update_used_tables () |
bool | is_null () |
void | make_const () |
virtual bool | const_item () const |
virtual bool | const_during_execution () const |
virtual void | print (String *str, enum_query_type query_type) |
void | fix_num_length_and_dec () |
virtual void | no_rows_in_result () |
virtual void | make_unique () |
Item * | get_tmp_table_item (THD *thd) |
virtual Field * | create_tmp_field (bool group, TABLE *table) |
bool | walk (Item_processor processor, bool walk_subquery, uchar *argument) |
virtual bool | clean_up_after_removal (uchar *arg) |
bool | init_sum_func_check (THD *thd) |
bool | check_sum_func (THD *thd, Item **ref) |
bool | register_sum_func (THD *thd, Item **ref) |
st_select_lex * | depended_from () |
Item * | get_arg (uint i) |
Item * | set_arg (uint i, THD *thd, Item *new_val) |
uint | get_arg_count () const |
void | init_aggregator () |
bool | aggregator_setup (THD *thd) |
void | aggregator_clear () |
bool | aggregator_add () |
void | set_distinct (bool distinct) |
int | set_aggregator (Aggregator::Aggregator_type aggregator) |
virtual void | clear ()=0 |
virtual bool | add ()=0 |
virtual bool | setup (THD *thd) |
virtual void | cleanup () |
Public Attributes | |
Item ** | ref_by |
Item_sum * | next |
Item_sum * | in_sum_func |
st_select_lex * | aggr_sel |
int8 | nest_level |
int8 | aggr_level |
int8 | max_arg_level |
int8 | max_sum_func_level |
bool | quick_group |
List< Item_field > | outer_fields |
Static Protected Member Functions | |
static ulonglong | ram_limitation (THD *thd) |
Protected Attributes | |
Aggregator * | aggr |
uint | arg_count |
Item ** | args |
Item * | tmp_args [2] |
Item ** | orig_args |
Item * | tmp_orig_args [2] |
table_map | used_tables_cache |
bool | forced_const |
Friends | |
class | Aggregator_distinct |
class | Aggregator_simple |
Class Item_sum is the base class used for special expressions that SQL calls 'set functions'. These expressions are formed with the help of aggregate functions such as SUM, MAX, GROUP_CONCAT etc.
GENERAL NOTES
A set function cannot be used in every position where an expression is accepted. The restrictions on the usage of set functions are easy to explain.
In the query: SELECT AVG(b) FROM t1 WHERE SUM(b) > 20 GROUP BY a the usage of the set function AVG(b) is legal, while the usage of SUM(b) is illegal. A WHERE condition must contain expressions that can be evaluated for each row of the table, yet the expression SUM(b) can be evaluated only for each group of rows with the same value of column a. In the query: SELECT AVG(b) FROM t1 WHERE c > 30 GROUP BY a HAVING SUM(b) > 20 both set function expressions AVG(b) and SUM(b) are legal.
We can say that in a query without nested selects an occurrence of a set function in an expression of the SELECT list and/or in the HAVING clause is legal, while in the WHERE clause it is illegal.
The general rule to detect whether a set function is legal in a query with nested subqueries is much more complicated.
Consider the following query: SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL (SELECT t2.c FROM t2 WHERE SUM(t1.b) < t2.c). The set function SUM(t1.b) is used here in the WHERE clause of the subquery. Nevertheless it is legal since it is under the HAVING clause of the query to which this function relates. The expression SUM(t1.b) is evaluated for each group defined in the main query, not for groups of the subquery.
Finding the query in which a particular set function is to be aggregated is not as simple as it may seem.
In the query: SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.c FROM t2 GROUP BY t2.c HAVING SUM(t1.a) < t2.c) the set function can be evaluated for both outer and inner selects. If we evaluate SUM(t1.a) for the outer query then we get the value of t1.a multiplied by the cardinality of a group in table t1. In this case SUM(t1.a) is used as a constant in each correlated subquery. But we can also evaluate SUM(t1.a) for the inner query. In this case t1.a is a constant for each correlated subquery and summation is performed for each group of table t2. (Recall that the query SELECT c FROM t GROUP BY a HAVING SUM(1) < a is quite legal in our SQL.)
So depending on what query we assign the set function to we can get different result sets.
The general rule to detect the query where a set function is to be evaluated can be formulated as follows. Consider a set function S(E) where E is an expression with occurrences of column references C1, ..., CN. Resolve these column references against subqueries that contain the set function S(E). Let Q be the innermost subquery of those subqueries. (Note that S(E) can in no way be evaluated in a subquery embedding the subquery Q, otherwise S(E) would refer to at least one unbound column reference.) If S(E) is used in a construct of Q where set functions are allowed then we evaluate S(E) in Q. Otherwise we look for an innermost subquery containing S(E) among those where usage of S(E) is allowed.
Let's demonstrate how this rule is applied to the following queries.
1. SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.b FROM t2 GROUP BY t2.b HAVING t2.b > ALL(SELECT t3.c FROM t3 GROUP BY t3.c HAVING SUM(t1.a+t2.b) < t3.c)) For this query the set function SUM(t1.a+t2.b) depends on t1.a and t2.b with t1.a defined in the outermost query, and t2.b defined for its subquery. The set function is in the HAVING clause of the subquery and can be evaluated in this subquery.
2. SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.b FROM t2 WHERE t2.b > ALL (SELECT t3.c FROM t3 GROUP BY t3.c HAVING SUM(t1.a+t2.b) < t3.c)) Here the set function SUM(t1.a+t2.b) is in the WHERE clause of the second subquery, the outermost subquery where both t1.a and t2.b are defined. If we evaluate the function in this subquery we violate the context rules. So we evaluate the function in the third subquery (over table t3) where it is used under the HAVING clause.
3. SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.b FROM t2 WHERE t2.b > ALL (SELECT t3.c FROM t3 WHERE SUM(t1.a+t2.b) < t3.c)) In this query evaluation of SUM(t1.a+t2.b) is legal neither in the second nor in the third subquery. So this query is invalid.
Mostly set functions cannot be nested. In the query SELECT t1.a FROM t1 GROUP BY t1.a HAVING AVG(SUM(t1.b)) > 20 the expression SUM(t1.b) is not acceptable, though it is under a HAVING clause. Yet it is acceptable in the query: SELECT t1.a FROM t1 GROUP BY t1.a HAVING SUM(t1.b) > 20.
An argument of a set function does not have to be a reference to a table column as in the examples above. It can be a more complex expression: SELECT t1.a FROM t1 GROUP BY t1.a HAVING SUM(t1.b+1) > 20. The expression SUM(t1.b+1) has very clear semantics in this context: we sum up the values of t1.b+1 where t1.b varies over all values within a group of rows that contain the same t1.a value.
A set function for an outer query yields a constant within a subquery. So the semantics of the query SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a IN (SELECT t2.c FROM t2 GROUP BY t2.c HAVING AVG(t2.c+SUM(t1.b)) > 20) is still clear. For a group of the rows with the same t1.a values we calculate the value of SUM(t1.b). This value 's' is substituted into the subquery: SELECT t2.c FROM t2 GROUP BY t2.c HAVING AVG(t2.c+s) > 20, which then returns some result set.
For the same reason the following query with a subquery SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a IN (SELECT t2.c FROM t2 GROUP BY t2.c HAVING AVG(SUM(t1.b)) > 20) is also acceptable.
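The rule and the three numbered queries above can be sketched as a small standalone function. This is only an illustration of the rule as worded here, not the server's code: the helper name choose_aggregation_level, its parameters, and the encoding of "where set functions are allowed" as a bit mask indexed by nest level (0 for the main query, 1 for its subquery, and so on) are all assumptions made for this example.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the MySQL sources.
// allowed_levels: bit n is set when the set function sits under a construct
//                 of the level-n query that accepts set functions
//                 (SELECT list or HAVING clause).
// max_arg_level:  nest level of the query Q, the innermost query that binds
//                 one of the column references in the argument (assumed >= 0).
// nest_level:     nest level of the query that textually contains S(E).
// Returns the nest level where S(E) is aggregated, or -1 if the usage is illegal.
int choose_aggregation_level(std::uint32_t allowed_levels,
                             int max_arg_level, int nest_level) {
  auto allowed = [allowed_levels](int level) {
    return ((allowed_levels >> level) & 1u) != 0;
  };
  // First choice: the query Q itself.
  if (allowed(max_arg_level)) return max_arg_level;
  // Otherwise: the innermost query below Q, down to the one containing S(E),
  // that accepts set functions at this position.
  for (int level = nest_level; level > max_arg_level; --level)
    if (allowed(level)) return level;
  return -1;  // no legal place to aggregate
}
```

With this encoding, query 1 corresponds to the mask 0x7 (HAVING at all three levels), query 2 to 0x5 (WHERE at level 1), and query 3 to 0x1 (WHERE at levels 1 and 2); in all three cases max_arg_level is 1 and nest_level is 2.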
IMPLEMENTATION NOTES
Three methods were added to the class to check the constraints specified in the previous section. These methods utilize several new members.
The field 'nest_level' contains the number of the level for the subquery containing the set function. The main SELECT is of level 0, its subqueries are of level 1, the subqueries of the latter are of level 2 and so on.
The field 'aggr_level' contains the nest level of the subquery where the set function is aggregated.
The field 'max_arg_level' holds the maximum of the nest levels of the unbound column references that occur in the set function. A column reference is unbound within a set function if it is not bound by any subquery used as a subexpression in this function. A column reference is bound by a subquery if it is a reference to the column by which the aggregation of some set function that is used in the subquery is calculated. For the set function used in the query SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.b FROM t2 GROUP BY t2.b HAVING t2.b > ALL(SELECT t3.c FROM t3 GROUP BY t3.c HAVING SUM(t1.a+t2.b) < t3.c)) the value of max_arg_level is equal to 1 since t1.a is bound in the main query, and t2.b is bound by the first subquery whose nest level is 1. Obviously a set function cannot be aggregated in a subquery whose nest level is less than max_arg_level. (Yet it can be aggregated in subqueries whose nest level is greater than max_arg_level.) In the query SELECT t1.a FROM t1 HAVING AVG(t1.a+(SELECT MIN(t2.c) FROM t2)) the value of max_arg_level for the AVG set function is 0 since the reference t2.c is bound in the subquery.
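As a rough illustration of how max_arg_level follows from the column references, the sketch below takes the maximum nest level over the unbound references only. The ColumnRef struct, the function name, and the choice of -1 for "no unbound references" are assumptions made for this example and are not taken from the class declaration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of one column reference inside a set function.
struct ColumnRef {
  int binding_level;      // nest level of the query that defines the column
  bool bound_inside_arg;  // true if a subquery inside the argument binds it
};

// Maximum nest level over the unbound column references.
// Returns -1 when there are none (an assumption of this sketch).
int compute_max_arg_level(const std::vector<ColumnRef>& refs) {
  int level = -1;
  for (const ColumnRef& ref : refs) {
    if (!ref.bound_inside_arg) level = std::max(level, ref.binding_level);
  }
  return level;
}
```

For SUM(t1.a+t2.b) in the first query the references are at levels 0 and 1, both unbound, which gives 1. For AVG(t1.a+(SELECT MIN(t2.c) FROM t2)) the reference t2.c is bound by the inner subquery and is skipped, which gives 0.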
The field 'max_sum_func_level' is to contain the maximum of the nest levels of the set functions that are used as subexpressions of the arguments of the given set function, but not aggregated in any subquery within this set function. A nested set function s1 can be used within set function s0 only if s1.max_sum_func_level < s0.max_sum_func_level. Set function s1 is considered as nested for set function s0 if s1 is not calculated in any subquery within s0.
A set function that is used as a subexpression in an argument of another set function refers to the latter via the field 'in_sum_func'.
The conditions imposed on the usage of set functions are checked when we traverse query subexpressions with the help of the recursive method fix_fields. When we apply this method to an object of the class Item_sum, first, on the descent, we call the method init_sum_func_check that initializes the members used for checking. Then, on the ascent, we call the method check_sum_func that validates the set function usage and reports an error if it is illegal. The method register_sum_func serves to link the items for the set functions that are aggregated in the embedding (sub)queries. Circular chains of such functions are attached to the corresponding st_select_lex structures through the field inner_sum_func_list.
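The descent/ascent pattern described above can be pictured with a toy expression tree. Everything in this sketch is hypothetical (Node, Context, the log of calls, build_example); it only mimics the order in which an init step and a check step would run, and how a nested set function could come to point at its enclosing one through in_sum_func. It is not the actual fix_fields implementation.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node; in_sum_func mirrors the field of the same name.
struct Node {
  std::string name;
  bool is_set_function = false;
  std::vector<std::unique_ptr<Node>> args;
  const Node* in_sum_func = nullptr;  // enclosing set function, if any
};

// Stand-in for per-statement state shared by the recursive traversal.
struct Context {
  const Node* current_sum_func = nullptr;
  std::vector<std::string> log;
};

void fix_fields(Node& node, Context& ctx) {
  const Node* saved = ctx.current_sum_func;
  if (node.is_set_function) {
    // Descent: the analogue of init_sum_func_check.
    node.in_sum_func = saved;
    ctx.current_sum_func = &node;
    ctx.log.push_back("init " + node.name);
  }
  for (auto& arg : node.args) fix_fields(*arg, ctx);
  if (node.is_set_function) {
    // Ascent: the analogue of check_sum_func.
    ctx.log.push_back("check " + node.name);
    ctx.current_sum_func = saved;
  }
}

std::unique_ptr<Node> make_node(const std::string& name, bool is_set_function) {
  auto node = std::make_unique<Node>();
  node->name = name;
  node->is_set_function = is_set_function;
  return node;
}

// Builds the tree for AVG(t2.c + SUM(t1.b)).
std::unique_ptr<Node> build_example() {
  auto sum = make_node("SUM", true);
  sum->args.push_back(make_node("t1.b", false));
  auto plus = make_node("+", false);
  plus->args.push_back(make_node("t2.c", false));
  plus->args.push_back(std::move(sum));
  auto avg = make_node("AVG", true);
  avg->args.push_back(std::move(plus));
  return avg;
}
```

Running fix_fields on the tree from build_example visits AVG first on the way down and last on the way up, so the inner SUM is initialized after AVG and checked before it.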
Exploiting the fact that the members mentioned above are used in one recursive function we could have allocated them on the thread stack. Yet we don't do it now.
We assume that the nesting level of subqueries does not exceed 127. TODO: catch queries where this limit is exceeded, to make the code clean here.
Item_sum::Item_sum | ( | THD * | thd, |
Item_sum * | item | ||
) |
Constructor used in processing select with temporary tables.
bool Item_sum::aggregator_add | ( | ) | [inline] |
Called to add value to the aggregator.
void Item_sum::aggregator_clear | ( | ) | [inline] |
Called to clean up the aggregator.
bool Item_sum::aggregator_setup | ( | THD * | thd | ) | [inline] |
Called to initialize the aggregator.
bool Item_sum::check_sum_func | ( | THD * | thd, |
Item ** | ref | ||
) |
Check constraints imposed on a usage of a set function.
The method verifies whether the context conditions imposed on a usage of any set function are met for this occurrence. It checks whether the set function occurs in a position where it can be aggregated and, when it happens to occur in an argument of another set function, the method checks that these two functions are aggregated in different subqueries. If the context conditions are not met the method reports an error. If the set function is aggregated in some outer subquery the method adds it to the chain of items for such set functions that is attached to the st_select_lex structure for this subquery.
A number of designated members of the object are used to check the conditions. They are specified in the comment before the Item_sum class declaration. Additionally a bitmap variable called allow_sum_func is employed. It is included in the thd->lex structure. The bitmap contains 1 at the n-th position if the set function happens to occur under a construct of the n-th level subquery where usage of set functions is allowed (i.e. either in the SELECT list or in the HAVING clause of the corresponding subquery). Consider the query:
SELECT SUM(t1.b) FROM t1 GROUP BY t1.a HAVING t1.a IN (SELECT t2.c FROM t2 WHERE AVG(t1.b) > 20) AND t1.a > (SELECT MIN(t2.d) FROM t2);
allow_sum_func will contain:
thd | reference to the thread context info |
ref | location of the pointer to this item in the embedding expression |
TRUE | if an error is reported |
FALSE | otherwise |
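The allow_sum_func bitmap described for check_sum_func can be illustrated with a short sketch. The Clause enum, the function name, and the convention that bit n stands for nest level n (counting from 0) are assumptions made for this example; the text above only says that the bitmap has 1 at the n-th position for a level where set functions are allowed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Clause of a query through which the set function is reached.
enum class Clause { SelectList, Where, Having };

// path[n] is the clause of the level-n query that encloses the set function.
// Bit n of the result is set when that clause accepts set functions.
std::uint32_t allow_sum_func_bitmap(const std::vector<Clause>& path) {
  std::uint32_t bits = 0;
  for (std::size_t level = 0; level < path.size(); ++level) {
    if (path[level] == Clause::SelectList || path[level] == Clause::Having) {
      bits |= std::uint32_t{1} << level;
    }
  }
  return bits;
}
```

For the query above, SUM(t1.b) sits in the SELECT list of the main query, AVG(t1.b) sits in the WHERE clause of a subquery under the main HAVING clause, and MIN(t2.d) sits in the SELECT list of a subquery under the main HAVING clause. Under the stated convention these paths give bit 0 only, bit 0 only (bit 1 clear), and bits 0 and 1 respectively.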
bool Item_sum::clean_up_after_removal | ( | uchar * | arg | ) | [virtual] |
Remove the item from the list of inner aggregation functions in the SELECT_LEX it was moved to by Item_sum::register_sum_func().
This is done to undo some of the effects of Item_sum::register_sum_func() so that the item may be removed from the query.
Reimplemented from Item.
bool Item_sum::init_sum_func_check | ( | THD * | thd | ) |
Prepare an aggregate function item for checking context conditions.
The function initializes the members of the Item_sum object created for a set function that are used to check validity of the set function occurrence. If the set function is not allowed in any subquery where it occurs an error is reported immediately.
thd | reference to the thread context info |
TRUE | if an error is reported |
FALSE | otherwise |
virtual void Item_sum::no_rows_in_result | ( | ) | [inline, virtual] |
Mark an aggregate as having no rows.
This function is called by the execution engine to assign 'NO ROWS FOUND' value to an aggregate item, when the underlying result set has no rows. Such value, in a general case, may be different from the default value of the item after 'clear()': e.g. a numeric item may be initialized to 0 by clear() and to NULL by no_rows_in_result().
Reimplemented from Item.
Reimplemented in Item_func_group_concat, Item_sum_hybrid, Item_sum_variance, Item_sum_avg, Item_sum_count, and Item_sum_sum.
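The difference between clear() and no_rows_in_result() can be shown with a toy aggregate. The class ToySum below is purely illustrative and assumes C++17 for std::optional; it models only the example given in the description, where clear() yields 0 and no_rows_in_result() yields NULL.

```cpp
#include <optional>

// Toy numeric aggregate; an empty optional stands for SQL NULL.
class ToySum {
 public:
  // Default value at the start of a group.
  void clear() { value_ = 0; }

  // Accumulate one row; returns false to signal "no error" (assumed convention).
  bool add(long long v) {
    value_ = value_.value_or(0) + v;
    return false;
  }

  // Value to report when the underlying result set has no rows at all.
  void no_rows_in_result() { value_.reset(); }

  std::optional<long long> result() const { return value_; }

 private:
  std::optional<long long> value_ = 0;
};
```

After clear() the result holds 0, while after no_rows_in_result() it holds no value, which is how this sketch represents NULL.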
void Item_sum::print | ( | String * | str, |
enum_query_type | query_type | ||
) | [virtual] |
This method is used to:
For more information about view definition query, INFORMATION_SCHEMA query and why they should be generated from the Item-tree,
Reimplemented from Item.
Reimplemented in Item_func_group_concat.
ulonglong Item_sum::ram_limitation | ( | THD * | thd | ) | [static, protected] |
Calculate the affordable RAM limit for structures like TREE or Unique used in Item_sum_*.
bool Item_sum::register_sum_func | ( | THD * | thd, |
Item ** | ref | ||
) |
Attach a set function to the subquery where it must be aggregated.
The function looks for an outer subquery where the set function must be aggregated. If it finds such a subquery then aggr_level is set to the nest level of this subquery and the item for the set function is added to the list of set functions used in nested subqueries inner_sum_func_list defined for each subquery. When the item is placed there the field 'ref_by' is set to ref.
thd | reference to the thread context info |
ref | location of the pointer to this item in the embedding expression |
FALSE | if the function executes without failures (currently always) |
TRUE | otherwise |
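One way to picture the chain that register_sum_func maintains is a circular singly linked list whose head pointer tracks the most recently added item. The sketch below uses toy types (ToyItemSum, ToySelect) and is an assumption about the shape of such a list, not a copy of the server code; only the names next and inner_sum_func_list come from the text.

```cpp
// Toy stand-ins for Item_sum and st_select_lex.
struct ToyItemSum {
  const char* name;
  ToyItemSum* next = nullptr;  // next item in the circular chain
};

struct ToySelect {
  ToyItemSum* inner_sum_func_list = nullptr;  // most recently added item
};

// Append an item to the circular chain attached to a select.
void register_in(ToySelect& sel, ToyItemSum& item) {
  if (sel.inner_sum_func_list == nullptr) {
    item.next = &item;  // a single item points at itself
  } else {
    item.next = sel.inner_sum_func_list->next;  // link to the first item
    sel.inner_sum_func_list->next = &item;      // old last item links to new one
  }
  sel.inner_sum_func_list = &item;
}

// Walk the chain once and count its items.
int chain_length(const ToySelect& sel) {
  if (sel.inner_sum_func_list == nullptr) return 0;
  int count = 0;
  const ToyItemSum* item = sel.inner_sum_func_list;
  do {
    ++count;
    item = item->next;
  } while (item != sel.inner_sum_func_list);
  return count;
}
```

Keeping the head at the last item makes both appending and reaching the first item (head->next) constant-time, which is one plausible reason to prefer a circular chain here.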
bool Item_sum::reset_and_add | ( | ) | [inline] |
Resets the aggregate value to its default and aggregates the current value of its attribute(s).
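A minimal sketch of the reset-then-add behaviour, using a hypothetical counting aggregate. The class ToyCount and the convention that add() returns false when there is no error are assumptions made for this illustration.

```cpp
// Toy COUNT-like aggregate.
class ToyCount {
 public:
  void clear() { count_ = 0; }

  // Count the current row; false means "no error" (assumed convention).
  bool add() {
    ++count_;
    return false;
  }

  // Start a new group with the current row as its first member.
  bool reset_and_add() {
    clear();
    return add();
  }

  long long value() const { return count_; }

 private:
  long long count_ = 0;
};
```

Calling reset_and_add() on the first row of a new group leaves the aggregate holding that single row's contribution, regardless of what was accumulated for the previous group.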
Aggregator* Item_sum::aggr [protected] |
Aggregator class instance. Not set initially. Allocated only after it is determined whether the incoming data are already distinct.