My Project
Item_sum Class Reference

#include <item_sum.h>

Inheritance diagram for Item_sum (image not reproduced). Classes shown: Item_result_field and Item (base classes); Item_func_group_concat, Item_sum_hybrid, Item_sum_num, Item_sum_max, Item_sum_min, Item_sum_int, Item_sum_sum, Item_sum_udf_decimal, Item_sum_udf_float, Item_sum_udf_int, Item_sum_udf_str, Item_sum_variance (derived classes).


Public Types

enum  Sumfunctype {
  COUNT_FUNC, COUNT_DISTINCT_FUNC, SUM_FUNC, SUM_DISTINCT_FUNC,
  AVG_FUNC, AVG_DISTINCT_FUNC, MIN_FUNC, MAX_FUNC,
  STD_FUNC, VARIANCE_FUNC, SUM_BIT_FUNC, UDF_SUM_FUNC,
  GROUP_CONCAT_FUNC
}

Public Member Functions

bool has_force_copy_fields () const
bool has_with_distinct () const
void mark_as_sum_func ()
 Item_sum (Item *a)
 Item_sum (Item *a, Item *b)
 Item_sum (List< Item > &list)
 Item_sum (THD *thd, Item_sum *item)
enum Type type () const
virtual enum Sumfunctype sum_func () const =0
bool reset_and_add ()
virtual void reset_field ()=0
virtual void update_field ()=0
virtual bool keep_field_type (void) const
virtual void fix_length_and_dec ()
virtual Item * result_item (Field *field)
table_map used_tables () const
void update_used_tables ()
bool is_null ()
void make_const ()
virtual bool const_item () const
virtual bool const_during_execution () const
virtual void print (String *str, enum_query_type query_type)
void fix_num_length_and_dec ()
virtual void no_rows_in_result ()
virtual void make_unique ()
Item * get_tmp_table_item (THD *thd)
virtual Field * create_tmp_field (bool group, TABLE *table)
bool walk (Item_processor processor, bool walk_subquery, uchar *argument)
virtual bool clean_up_after_removal (uchar *arg)
bool init_sum_func_check (THD *thd)
bool check_sum_func (THD *thd, Item **ref)
bool register_sum_func (THD *thd, Item **ref)
st_select_lex * depended_from ()
Item * get_arg (uint i)
Item * set_arg (uint i, THD *thd, Item *new_val)
uint get_arg_count () const
void init_aggregator ()
bool aggregator_setup (THD *thd)
void aggregator_clear ()
bool aggregator_add ()
void set_distinct (bool distinct)
int set_aggregator (Aggregator::Aggregator_type aggregator)
virtual void clear ()=0
virtual bool add ()=0
virtual bool setup (THD *thd)
virtual void cleanup ()

Public Attributes

Item ** ref_by
Item_sum * next
Item_sum * in_sum_func
st_select_lex * aggr_sel
int8 nest_level
int8 aggr_level
int8 max_arg_level
int8 max_sum_func_level
bool quick_group
List< Item_field > outer_fields

Static Protected Member Functions

static ulonglong ram_limitation (THD *thd)

Protected Attributes

Aggregator * aggr
uint arg_count
Item ** args
Item * tmp_args [2]
Item ** orig_args
Item * tmp_orig_args [2]
table_map used_tables_cache
bool forced_const

Friends

class Aggregator_distinct
class Aggregator_simple

Detailed Description

Class Item_sum is the base class used for special expressions that SQL calls 'set functions'. These expressions are formed with the help of aggregate functions such as SUM, MAX, GROUP_CONCAT etc.

GENERAL NOTES

A set function cannot be used in every position where expressions are accepted. For the most part, the restrictions on its usage are easy to explain.

In the query SELECT AVG(b) FROM t1 WHERE SUM(b) > 20 GROUP BY a, the use of the set function AVG(b) is legal, while the use of SUM(b) is illegal. A WHERE condition must contain expressions that can be evaluated for each row of the table. However, the expression SUM(b) can be evaluated only for each group of rows that share the same value of column a. In the query SELECT AVG(b) FROM t1 WHERE c > 30 GROUP BY a HAVING SUM(b) > 20, both set function expressions AVG(b) and SUM(b) are legal.

We can say that, in a query without nested selects, a set function may legally appear in an expression of the SELECT list and/or in the HAVING clause. It may not appear in the WHERE clause.
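The sketch below evaluates SELECT AVG(b) FROM t1 WHERE c > 30 GROUP BY a HAVING SUM(b) > 20 over an in-memory table. It shows that the WHERE predicate runs once per row, before any group exists. SUM(b) and AVG(b) only have values once a group is complete. The Row struct and the evaluate_query function are hypothetical illustrations, not server code.

```cpp
#include <cassert>
#include <map>
#include <vector>

// One row of a hypothetical table t1(a, b, c).
struct Row {
  int a;
  int b;
  int c;
};

// Evaluates SELECT AVG(b) FROM t1 WHERE c > 30 GROUP BY a HAVING SUM(b) > 20
// and returns the AVG(b) value of every surviving group, ordered by a.
std::vector<double> evaluate_query(const std::vector<Row>& t1) {
  struct Group {
    long long sum = 0;
    long long count = 0;
  };
  std::map<int, Group> groups;

  // WHERE is applied to each row before any group exists, so only
  // per-row values such as c are available here.
  for (const Row& r : t1) {
    if (r.c > 30) {
      Group& g = groups[r.a];
      g.sum += r.b;
      g.count += 1;
    }
  }

  // HAVING and the SELECT list are evaluated once per completed group,
  // where SUM(b) and AVG(b) have well-defined values.
  std::vector<double> result;
  for (const auto& entry : groups) {
    const Group& g = entry.second;
    assert(g.count > 0);  // a group only exists if some row reached it
    if (g.sum > 20) {
      result.push_back(static_cast<double>(g.sum) / static_cast<double>(g.count));
    }
  }
  return result;
}
```

For example, the rows (1, 10, 40), (1, 15, 50), (1, 100, 10), (2, 5, 40) and (2, 6, 60) produce the single value 12.5. The third row is removed by WHERE. Group a = 2 has SUM(b) = 11 and fails HAVING.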

The general rule to detect whether a set function is legal in a query with nested subqueries is much more complicated.

Consider the following query: SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL (SELECT t2.c FROM t2 WHERE SUM(t1.b) < t2.c). Here the set function SUM(t1.b) is used in the WHERE clause of the subquery. Nevertheless, it is legal, because it is under the HAVING clause of the query to which the function relates. The expression SUM(t1.b) is evaluated for each group defined in the main query, not for groups of the subquery.

Finding the query in which a particular set function must be aggregated is not as simple as it may seem.

In the query SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.c FROM t2 GROUP BY t2.c HAVING SUM(t1.a) < t2.c), the set function could be evaluated for either the outer or the inner select.

If we evaluate SUM(t1.a) for the outer query, we get the value of t1.a multiplied by the cardinality of a group in table t1. In this case SUM(t1.a) acts as a constant in each correlated subquery.

We could instead evaluate SUM(t1.a) for the inner query. In this case t1.a is a constant for each correlated subquery, and the summation is performed for each group of table t2. (Recall that the query SELECT c FROM t GROUP BY a HAVING SUM(1) < a is perfectly legal in our SQL.)

So the result set can differ depending on which query the set function is assigned to.
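The arithmetic behind the two readings of HAVING SUM(t1.a) < t2.c can be made concrete with a small sketch. Take one outer group with value a and n1 rows in t1. Aggregating in the outer query gives a * n1 for every correlated subquery. Aggregating in the inner query treats a as a constant and gives a * n2 for each t2 group with n2 rows. The function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// SUM(t1.a) aggregated in the outer query: t1.a is summed over the rows
// of the current t1 group, all of which share the same value a.
long long sum_in_outer(long long a, long long t1_group_rows) {
  assert(t1_group_rows > 0);
  return a * t1_group_rows;
}

// SUM(t1.a) aggregated in the inner query: t1.a is a constant for the
// correlated subquery and is added once per row of each t2 group.
std::vector<long long> sum_in_inner(long long a,
                                    const std::vector<long long>& t2_group_rows) {
  std::vector<long long> sums;
  for (long long n : t2_group_rows) {
    assert(n > 0);
    sums.push_back(a * n);
  }
  return sums;
}
```

Suppose a = 3, the t1 group has 4 rows, and the two t2 groups have 1 and 5 rows. The outer reading yields 12 for both t2 groups. The inner reading yields 3 and 15. A comparison such as SUM(t1.a) < t2.c can therefore select different t2 groups.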

The general rule for detecting the query in which a set function is to be evaluated can be formulated as follows.

Consider a set function S(E), where E is an expression containing the column references C1, ..., CN. Resolve these column references against the subqueries that contain S(E). Let Q be the innermost of the subqueries in which they are resolved. (Note that S(E) can never be evaluated in a subquery that embeds Q, because S(E) would then refer to at least one unbound column reference.)

If S(E) is used in a construct of Q where set functions are allowed, we evaluate S(E) in Q. Otherwise we look for the innermost subquery containing S(E) among those where the usage of S(E) is allowed.

Let's demonstrate how this rule is applied to the following queries.

1. SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.b FROM t2 GROUP BY t2.b HAVING t2.b > ALL(SELECT t3.c FROM t3 GROUP BY t3.c HAVING SUM(t1.a+t2.b) < t3.c)). In this query the set function SUM(t1.a+t2.b) depends on t1.a and t2.b. Column t1.a is defined in the outermost query, and t2.b is defined in its subquery (over table t2). The set function is under the HAVING clause of that subquery and can be evaluated there.

2. SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.b FROM t2 WHERE t2.b > ALL (SELECT t3.c FROM t3 GROUP BY t3.c HAVING SUM(t1.a+t2.b) < t3.c)). Here the set function SUM(t1.a+t2.b) is in the WHERE clause of the second subquery (over table t2). That is the innermost of the queries where t1.a and t2.b are defined. Evaluating the function in this subquery would violate the context rules. So we evaluate the function in the third subquery (over table t3), where it is used under the HAVING clause.

3. SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.b FROM t2 WHERE t2.b > ALL (SELECT t3.c FROM t3 WHERE SUM(t1.a+t2.b) < t3.c)). In this query, evaluating SUM(t1.a+t2.b) is legal neither in the second nor in the third subquery. So this query is invalid.
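The rule can be sketched as a function over nest levels: 0 for the main SELECT, 1 for its subqueries, and so on. This is a simplified model of the rule as stated above. It assumes that E contains at least one column reference. The name choose_aggr_level is hypothetical. The function is not the server's check_sum_func() or register_sum_func(), whose search order may differ in details.

In all three queries, SUM(t1.a+t2.b) occurs at level 2 and t2.b is defined at level 1. The sketch gives:
- level 1 for query 1, where set functions are allowed at levels 0, 1 and 2;
- level 2 for query 2, where they are allowed at levels 0 and 2;
- -1, meaning invalid, for query 3, where they are allowed only at level 0.

```cpp
#include <cassert>
#include <cstdint>

// Returns the nest level at which S(E) is aggregated, or -1 if the usage
// is illegal.
//   nest_level     - level of the query in which S(E) textually occurs
//   max_arg_level  - level of Q, the innermost query in which one of the
//                    column references C1..CN of E is resolved
//   allow_sum_func - bit n is set when S(E) occurs under a construct of
//                    the level-n query where set functions are allowed
int choose_aggr_level(int nest_level, int max_arg_level,
                      std::uint64_t allow_sum_func) {
  assert(0 <= max_arg_level && max_arg_level <= nest_level && nest_level < 64);
  auto allowed = [allow_sum_func](int level) {
    return ((allow_sum_func >> level) & 1u) != 0;
  };
  // S(E) can never be aggregated outside Q; use Q when it is allowed there.
  if (allowed(max_arg_level)) return max_arg_level;
  // Otherwise take the innermost query containing S(E) that allows it.
  for (int level = nest_level; level > max_arg_level; --level) {
    if (allowed(level)) return level;
  }
  return -1;
}
```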

In most cases set functions cannot be nested. In the query SELECT t1.a FROM t1 GROUP BY t1.a HAVING AVG(SUM(t1.b)) > 20, the expression SUM(t1.b) is not acceptable, even though it is under a HAVING clause. It is, however, acceptable in the query SELECT t1.a FROM t1 GROUP BY t1.a HAVING SUM(t1.b) > 20.

An argument of a set function does not have to be a reference to a table column, as it was in the examples above. It can be a more complex expression, as in SELECT t1.a FROM t1 GROUP BY t1.a HAVING SUM(t1.b+1) > 20. The expression SUM(t1.b+1) has clear semantics in this context. We sum up the values of t1.b+1, where t1.b ranges over all values within a group of rows that share the same t1.a value.

A set function aggregated in an outer query yields a constant within a subquery. So the semantics of the query SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a IN (SELECT t2.c FROM t2 GROUP BY t2.c HAVING AVG(t2.c+SUM(t1.b)) > 20) is still clear. For each group of rows with the same t1.a value, we calculate the value of SUM(t1.b). This value 's' is substituted into the subquery SELECT t2.c FROM t2 GROUP BY t2.c HAVING AVG(t2.c+s) > 20, which then returns some result set.
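A minimal sketch of this two-step evaluation, assuming in-memory integer columns. The value s is computed once for the outer group. It then enters AVG(t2.c + s) as an ordinary constant. The function name subquery_with_outer_sum is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <vector>

// For one t1 group, evaluates
//   SELECT t2.c FROM t2 GROUP BY t2.c HAVING AVG(t2.c + s) > 20
// where s = SUM(t1.b) is computed over that t1 group first.
std::vector<int> subquery_with_outer_sum(const std::vector<int>& t1_group_b,
                                         const std::vector<int>& t2_c) {
  long long s = 0;  // SUM(t1.b), aggregated in the outer query
  for (int b : t1_group_b) s += b;

  // Group the t2 rows by c; inside the subquery s is just a constant.
  std::map<int, std::vector<int>> groups;
  for (int c : t2_c) groups[c].push_back(c);

  std::vector<int> result;
  for (const auto& entry : groups) {
    const std::vector<int>& rows = entry.second;
    assert(!rows.empty());
    long long total = 0;
    for (int c : rows) total += c + s;  // numerator of AVG(t2.c + s)
    if (static_cast<double>(total) / rows.size() > 20) {
      result.push_back(entry.first);
    }
  }
  return result;
}
```

With t1.b values 4 and 6 (so s = 10) and t2.c values 5, 12, 12 and 15, the subquery returns 12 and 15.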

For the same reason, the following query with a subquery is also acceptable: SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a IN (SELECT t2.c FROM t2 GROUP BY t2.c HAVING AVG(SUM(t1.b)) > 20).
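The examples above suggest a simple reading of when nesting is acceptable. The inner set function must be aggregated in a query strictly outside the one where the enclosing function is aggregated, so that it acts as a constant there.

For AVG(SUM(t1.b)) in the outer HAVING clause, both functions are aggregated at level 0, which is rejected. In the subquery form, SUM(t1.b) is aggregated at level 0 and AVG at level 1, which is accepted.

The sketch checks this condition on hypothetical aggregation levels. It illustrates the examples and is not the server's exact check.

```cpp
#include <cassert>
#include <vector>

// Returns true when every set function nested in the argument of an
// enclosing set function is aggregated at a lower nest level, i.e. in a
// query that encloses the one where the outer function is aggregated.
bool nesting_is_valid(int outer_aggr_level,
                      const std::vector<int>& nested_aggr_levels) {
  assert(outer_aggr_level >= 0);
  for (int level : nested_aggr_levels) {
    if (level >= outer_aggr_level) return false;
  }
  return true;
}
```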

IMPLEMENTATION NOTES

Three methods were added to the class to check the constraints specified in the previous section. These methods utilize several new members.

The field 'nest_level' contains the nest level of the subquery containing the set function. The main SELECT is at level 0, its subqueries are at level 1, the subqueries of those are at level 2, and so on.

The field 'aggr_level' is to contain the nest level of the subquery where the set function is aggregated.

The field 'max_arg_level' holds the maximum of the nest levels of the unbound column references occurring in the set function.

A column reference is unbound within a set function if it is not bound by any subquery used as a subexpression in this function. A column reference is bound by a subquery if it is a reference to the column by which the aggregation of some set function used in the subquery is calculated.

Consider the set function in the query SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.b FROM t2 GROUP BY t2.b HAVING t2.b > ALL(SELECT t3.c FROM t3 GROUP BY t3.c HAVING SUM(t1.a+t2.b) < t3.c)). Its max_arg_level is 1. Column t1.a is bound in the main query, and t2.b is bound by the first subquery, whose nest level is 1.

Obviously, a set function cannot be aggregated in a subquery whose nest level is less than max_arg_level. (It can, however, be aggregated in subqueries whose nest level is greater than max_arg_level.)

In the query SELECT t1.a FROM t1 HAVING AVG(t1.a+(SELECT MIN(t2.c) FROM t2)), max_arg_level for the AVG set function is 0, because the reference t2.c is bound in the subquery.
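A minimal sketch of how max_arg_level follows from this definition. It assumes each column reference in the argument has already been resolved to the nest level of the query that defines it. It also assumes each reference is flagged when a subquery inside the set function binds it. ColumnRef and compute_max_arg_level are hypothetical names.

For SUM(t1.a+t2.b) above, the unbound references sit at levels 0 and 1, giving 1. For AVG(t1.a+(SELECT MIN(t2.c) FROM t2)), only t1.a (level 0) is unbound, giving 0.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A column reference found inside the argument of a set function.
struct ColumnRef {
  int resolved_level;      // nest level of the query that defines the column
  bool bound_in_subquery;  // true if a subquery inside the function binds it
};

// Maximum nest level of the unbound column references, or -1 when the
// argument contains no unbound column reference at all.
int compute_max_arg_level(const std::vector<ColumnRef>& refs) {
  int max_level = -1;
  for (const ColumnRef& ref : refs) {
    assert(ref.resolved_level >= 0);
    if (!ref.bound_in_subquery) {
      max_level = std::max(max_level, ref.resolved_level);
    }
  }
  return max_level;
}
```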

The field 'max_sum_func_level' is to contain the maximum of the nest levels of the set functions that are used as subexpressions of the arguments of the given set function, but not aggregated in any subquery within this set function. A nested set function s1 can be used within set function s0 only if s1.max_sum_func_level < s0.max_sum_func_level. Set function s1 is considered as nested for set function s0 if s1 is not calculated in any subquery within s0.

A set function that is used as a subexpression in an argument of another set function refers to the latter via the field 'in_sum_func'.

The conditions imposed on the usage of set functions are checked while query subexpressions are traversed by the recursive method fix_fields. When this method is applied to an object of the class Item_sum, two checks happen:
- On the descent, we call init_sum_func_check, which initializes the members used for checking.
- On the ascent, we call check_sum_func, which validates the set function usage and reports an error if it is illegal.

The method register_sum_func links the items for set functions that are aggregated in the embedding (sub)queries. Circular chains of such functions are attached to the corresponding st_select_lex structures through the field inner_sum_func_list.
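One way to maintain such a circular chain is a singly linked circle in which the list head points at the most recently added item. That item's next pointer then leads back to the earliest one, so appending takes constant time and the first item is still reachable. The sketch shows this technique with hypothetical simplified types. That inner_sum_func_list points at the last item is an assumption of this sketch, not something stated above.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for Item_sum and st_select_lex.
struct SumFunc {
  SumFunc* next = nullptr;  // link in the circular chain
};

struct SelectLex {
  // Points at the most recently attached set function; its 'next'
  // is the earliest one, which closes the circle.
  SumFunc* inner_sum_func_list = nullptr;
};

// Appends 'item' to the circular chain attached to 'sel'.
void attach_to_chain(SelectLex& sel, SumFunc* item) {
  assert(item != nullptr);
  if (sel.inner_sum_func_list == nullptr) {
    item->next = item;  // a one-element circle
  } else {
    item->next = sel.inner_sum_func_list->next;  // new tail points to head
    sel.inner_sum_func_list->next = item;        // old tail points to new tail
  }
  sel.inner_sum_func_list = item;
}
```

After attaching a, b and c in that order, the list points at c. Following next from c visits a, b and c again.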

Because the members mentioned above are used only within one recursive function, they could have been allocated on the thread stack. Currently, however, they are not.

We assume that the nesting level of subqueries does not exceed 127, the largest value the int8 level fields can hold. TODO: detect queries that exceed this limit so that the code here is clean.


Constructor & Destructor Documentation

Item_sum::Item_sum ( THD *  thd,
Item_sum * item
)

Constructor used when processing a SELECT that uses temporary tables.


Member Function Documentation

bool Item_sum::aggregator_add ( ) [inline]

Called to add a value to the aggregator.

void Item_sum::aggregator_clear ( ) [inline]

Called to cleanup the aggregator.

bool Item_sum::aggregator_setup ( THD *  thd) [inline]

Called to initialize the aggregator.

bool Item_sum::check_sum_func ( THD *  thd,
Item **  ref 
)

Check the constraints imposed on the usage of a set function.

The method verifies whether the context conditions imposed on any set function are met for this occurrence. It checks two things:
- whether the set function occurs in a position where it can be aggregated;
- when it occurs in an argument of another set function, whether the two functions are aggregated in different subqueries.

If the context conditions are not met, the method reports an error. If the set function is aggregated in some outer subquery, the method adds it to the chain of such set functions attached to the st_select_lex structure for that subquery.

The conditions are checked using a number of designated members of the object, described in the comment before the Item_sum class declaration. Additionally, a bitmap variable called allow_sum_func, stored in the thd->lex structure, is used. The bitmap contains 1 at the n-th position if the set function occurs under a construct of the n-th level subquery where set functions are allowed (i.e., either in the SELECT list or in the HAVING clause of that subquery). Consider the query:

       SELECT SUM(t1.b) FROM t1 GROUP BY t1.a
         HAVING t1.a IN (SELECT t2.c FROM t2 WHERE AVG(t1.b) > 20) AND
                t1.a > (SELECT MIN(t2.d) FROM t2);

allow_sum_func will contain:

  • for SUM(t1.b) - 1 at the first position
  • for AVG(t1.b) - 1 at the first position, 0 at the second position
  • for MIN(t2.d) - 1 at the first position, 1 at the second position.
Parameters:
thd: reference to the thread context info
ref: location of the pointer to this item in the embedding expression
Note:
This function is to be called for any item created for a set function object when the traversal of trees built for expressions used in the query is performed at the phase of context analysis. This function is to be invoked at the ascent of this traversal.
Return values:
TRUE: if an error is reported
FALSE: otherwise
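The allow_sum_func values listed above for the example query can be reproduced with a small model. It assumes a set function is reached through exactly one construct per enclosing query level (the SELECT list, WHERE, or HAVING) and ignores all other clauses. Clause and allow_sum_func_for are hypothetical names; this is not the server's code.

Counting positions from the least significant bit, the results are:
- SUM(t1.b) yields 1;
- AVG(t1.b) yields 1, with bit 1 clear;
- MIN(t2.d) yields 3.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Constructs of a SELECT that may enclose a set function (simplified).
enum class Clause { SelectList, Where, Having };

// Builds the bitmap from the (nest level, clause) pairs that lead from the
// main query down to the set function, one pair per enclosing level.
// Bit n is set when the enclosing construct of the level-n query allows
// set functions, i.e. it is the SELECT list or the HAVING clause.
std::uint64_t allow_sum_func_for(
    const std::vector<std::pair<int, Clause>>& path) {
  std::uint64_t bitmap = 0;
  for (const auto& step : path) {
    assert(step.first >= 0 && step.first < 64);
    if (step.second != Clause::Where) {
      bitmap |= std::uint64_t{1} << step.first;
    }
  }
  return bitmap;
}
```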
bool Item_sum::clean_up_after_removal ( uchar *  arg) [virtual]

Remove the item from the list of inner aggregation functions in the SELECT_LEX it was moved to by Item_sum::register_sum_func().

This is done to undo some of the effects of Item_sum::register_sum_func() so that the item may be removed from the query.

Note:
This doesn't completely undo Item_sum::register_sum_func(), as with_sum_func information is left untouched. This means that if this item is removed, aggr_sel and all Item_subselects between aggr_sel and this item may be left with with_sum_func set to true, even if there are no aggregation functions. To our knowledge, this has no impact on the query result.
See also:
Item_sum::register_sum_func()
remove_redundant_subquery_clauses()

Reimplemented from Item.

bool Item_sum::init_sum_func_check ( THD *  thd)

Prepare an aggregate function item for checking context conditions.

The function initializes the members of the Item_sum object created for a set function that are used to check validity of the set function occurrence. If the set function is not allowed in any subquery where it occurs an error is reported immediately.

Parameters:
thd: reference to the thread context info
Note:
This function is to be called for any item created for a set function object when the traversal of trees built for expressions used in the query is performed at the phase of context analysis. This function is to be invoked at the descent of this traversal.
Return values:
TRUE: if an error is reported
FALSE: otherwise
virtual void Item_sum::no_rows_in_result ( ) [inline, virtual]

Mark an aggregate as having no rows.

The execution engine calls this function to assign the 'NO ROWS FOUND' value to an aggregate item when the underlying result set has no rows. In general, this value may differ from the item's default value after clear(). For example, clear() may initialize a numeric item to 0, while no_rows_in_result() sets it to NULL.

Reimplemented from Item.

Reimplemented in Item_func_group_concat, Item_sum_hybrid, Item_sum_variance, Item_sum_avg, Item_sum_count, and Item_sum_sum.
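The difference between the two default states can be illustrated with a hypothetical aggregate that uses std::optional (C++17) to model SQL NULL. It follows the example in the text: 0 after clear() and NULL after no_rows_in_result(). It is not the interface of any real Item_sum subclass.

```cpp
#include <cassert>
#include <optional>

// Hypothetical numeric aggregate with two distinct "empty" states.
class SumLikeAggregate {
 public:
  // Default value used before accumulating a new group.
  void clear() { value_ = 0LL; }
  // Value reported when the underlying result set has no rows: SQL NULL.
  void no_rows_in_result() { value_.reset(); }
  void add(long long v) {
    assert(value_.has_value());  // add() is only meaningful after clear()
    *value_ += v;
  }
  const std::optional<long long>& value() const { return value_; }

 private:
  std::optional<long long> value_ = 0LL;
};
```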

void Item_sum::print ( String * str,
enum_query_type  query_type 
) [virtual]

This method is used:

  • to generate a view definition query (SELECT-statement);
  • to generate a SQL-query for EXPLAIN EXTENDED;
  • to generate a SQL-query to be shown in INFORMATION_SCHEMA;
  • debug.

For more information about view definition queries, INFORMATION_SCHEMA queries, and why they should be generated from the Item tree, refer to the function listed below.

See also:
mysql_register_view().

Reimplemented from Item.

Reimplemented in Item_func_group_concat.

ulonglong Item_sum::ram_limitation ( THD *  thd) [static, protected]

Calculate the affordable RAM limit for structures like TREE or Unique used in Item_sum_*

bool Item_sum::register_sum_func ( THD *  thd,
Item **  ref 
)

Attach a set function to the subquery where it must be aggregated.

The function looks for an outer subquery where the set function must be aggregated. If it finds one, it does two things:
- sets aggr_level to the nest level of that subquery;
- adds the item for the set function to inner_sum_func_list, the per-subquery list of set functions used in nested subqueries.

When the item is placed in that list, the field 'ref_by' is set to ref.

Note:
Currently we 'register' only set functions that are aggregated in outer subqueries. It would make sense to link all set functions of a subquery into one chain, because that would simplify the process of 'splitting' for set functions.
Parameters:
thd: reference to the thread context info
ref: location of the pointer to this item in the embedding expression
Return values:
FALSE: if the function executes without failures (currently always)
TRUE: otherwise
bool Item_sum::reset_and_add ( ) [inline]

Resets the aggregate value to its default and aggregates the current value of its attribute(s).
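In other words, reset_and_add() behaves like clear() followed by add(). A grouping loop needs exactly this when the first row of a new group arrives. Below is a minimal sketch with a hypothetical MAX-like aggregator that reads its argument from the current row through a pointer. The class and its members are illustrative only.

```cpp
#include <cassert>
#include <optional>

// Hypothetical MAX-like aggregator; add() reads the argument's value
// for the current row through a pointer.
class MaxAggregator {
 public:
  explicit MaxAggregator(const int* current_value) : current_(current_value) {}

  void clear() { max_.reset(); }  // default: no value seen yet

  bool add() {  // returns false on success, true on error
    assert(current_ != nullptr);
    if (!max_ || *current_ > *max_) max_ = *current_;
    return false;
  }

  // Start a new group with the current row already aggregated.
  bool reset_and_add() {
    clear();
    return add();
  }

  std::optional<int> result() const { return max_; }

 private:
  const int* current_;
  std::optional<int> max_;
};
```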


Member Data Documentation

Aggregator* Item_sum::aggr [protected]

Aggregator class instance. Not set initially. Allocated only after it is determined if the incoming data are already distinct.
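The reason for deferring the choice can be sketched with hypothetical classes modelled loosely on the names of the friend classes Aggregator_simple and Aggregator_distinct. In this simplified view, a plain aggregator forwards every value, while a distinct one first removes duplicates. SUM over 2, 2, 3 therefore gives 7 in the first case and 5 in the second. The class names, the std::set-based de-duplication and the SumItem wrapper are assumptions of this sketch, not the server's implementation.

```cpp
#include <cassert>
#include <memory>
#include <set>

// Hypothetical, simplified aggregator interface.
class Aggregator {
 public:
  virtual ~Aggregator() = default;
  virtual void add(long long v) = 0;
  virtual long long sum() const = 0;
};

// Forwards every value, as for SUM(x).
class SimpleAggregator : public Aggregator {
 public:
  void add(long long v) override { sum_ += v; }
  long long sum() const override { return sum_; }

 private:
  long long sum_ = 0;
};

// Keeps each distinct value once, as for SUM(DISTINCT x).
class DistinctAggregator : public Aggregator {
 public:
  void add(long long v) override { seen_.insert(v); }
  long long sum() const override {
    long long total = 0;
    for (long long v : seen_) total += v;
    return total;
  }

 private:
  std::set<long long> seen_;
};

// Owner of the aggregator: nothing is allocated until the caller knows
// whether the incoming values still need de-duplication.
class SumItem {
 public:
  void set_aggregator(bool distinct) {
    if (distinct) {
      aggr_ = std::make_unique<DistinctAggregator>();
    } else {
      aggr_ = std::make_unique<SimpleAggregator>();
    }
  }
  void add(long long v) {
    assert(aggr_ && "set_aggregator() must be called first");
    aggr_->add(v);
  }
  long long result() const {
    assert(aggr_ && "set_aggregator() must be called first");
    return aggr_->sum();
  }

 private:
  std::unique_ptr<Aggregator> aggr_;  // null until set_aggregator()
};
```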

Item_sum* Item_sum::next

Intrusive list pointer for free list. If not null, points to the next Item on some Query_arena's free list. For instance, stored procedures have their own Query_arena's.

See also:
Query_arena::free_list

Reimplemented from Item.


The documentation for this class was generated from the following files: