Table of Contents

The optimize module provides a way to perform various optimizations on queries. It assumes that the query is correct.

**Warning**: It is not 100% granted that the optimized query will work. If you find a non working query please submit a bug.

This code will perform all the available general optimizations on the expression, and will return the optimized expression.

expression=optimizer.general_optimize(expression)

The expression tree allows to manipulate the expressions.

To obtain the expression just call this:

str(root)

To generate the tree, just call: (expression is a valid string representing a relational expression)

optimizer.tree(expression)

Each node will have a name, that corresponds to its string representation. Also a kind variable, that specifies if the node is a relation or an operator.

Constants for kind.

optimizer.RELATION=0 optimizer.UNARY=1 optimizer.BINARY=2

- name: string representation
- child: child expression
- props: parameters of the operation

- name: string representation
- left: left expression
- right: right expression

- name: string representation

This class of optimizations do not need to have informations on the relations.

The functions are implemented in the optimizations.py module.

They will work on the expression tree and will modify it. The return value is the number of changes performed on the tree.

Original | Optimized |
---|---|

σ _{k} ( σ _{k}(C)) | σ _{k} (C) |

σ _{k} ( σ _{j}(C)) | σ _{k ⋀ j} (C) |

The first one will work only if the expression is exactly the same. If the expression is equivalent but different (for example 3+2 is equivalent but different to 2+3) the 2nd kind of optimization will be used.

Yes the name is crazy.

Original | Optimized |
---|---|

σ _{k} (a ᑌ b) | σ _{k}(a) ᑌ σ _{k}(b) |

σ _{k} (a ᑎ b) | σ _{k}(a) ᑎ σ _{k}(b) |

σ _{k} (a - b) | σ _{k}(a) - σ _{k}(b) |

It is an optimization because the selection is a O(n) operation while unions, subtractions or intersections are O(n^{2}) operations. So pushing down the selection, we hope to reduce the size of the problem that the O(n^{2}) operation will have to solve.

Original | Optimized |
---|---|

π _{i} ( π _{j} (R)) | π _{i} (R) |

Doing two projection and in fact ignoring the result of the inner one isn't nice.

Original | Optimized |
---|---|

σ _{j} (π _{k} (R)) | π _{k} (σ _{j} (R)) |

Performing the selection first, gives us hope that the projection operation (more complex) will be performed on a smaller set.

Original | Optimized |
---|---|

σ _{k}(ρ _{j}(R)) | ρ _{j}(σ _{k}(R)) |

renaming the attributes used in the selection, so the operation is still valid.

This is not really an heavy optimization, select is *O(n)* and rename is *O(1)*, but it is an improvement and it might make other optimizations possible as well, in the next step.

Original | Optimized |
---|---|

ρ _{k➡k}(R) | completely removes them. If some renames in the list are valid and some are not, the valid ones will be kept. |

This optimization is performed **before** performing *subsequent renames optimization*.

Original | Optimized |
---|---|

ρ _{k}(R)(ρ _{j}(R)) | ρ _{j,k}(R) |

hence using a single rename operation.

If j,k will contain things like *a➡b,b➡c*, they will be replaced with *a➡c*.

If j.k will contain things like *a➡b,b➡a*, they will be removed. If all the transformations are removed, the rename itself is removed.

*A ᑌ A=A ᑎ A=A*.

So this optimization tries to locate unions and intersections that share the same left and right operand, and replaces them with the operand itself.

Also *A - A=∅*.

So it locates subtractions that share the same left and right operand and replaces them with *σ _{False}(A)*. This is not as fast as replacing with an empty relation, but there is no such operator in relational algebra. Anyway Selection is O(n) and subtraction is O(n²), so we're saving time anyway.

This function locates things like:

Original | Optimized |
---|---|

R ᑌ R | R |

R ᑎ R | R |

R - R | σ _{False} (R) |

σ _{k} (R) ᑌ R | R |

σ _{k} (R) ᑎ R | σ _{k} (R) |

σ _{k} (R) - R | σ _{False} (R) |

R - σ _{k} (R) | σ _{¬k} (R) |

*R* doesn't have to be a relation. It can be a subtree too.

Original | Optimized |
---|---|

ρ _{a➡b}(R) ᑌ ρ _{a➡b}(Q) | ρ _{a➡b}(R ᑌ Q) |

ρ _{a➡b}(R) ᑎ ρ _{a➡b}(Q) | ρ _{a➡b}(R ᑎ Q) |

ρ _{a➡b}(R) - ρ _{a➡b}(Q) | ρ _{a➡b}(R - Q) |

This will save the space taken by an extra relation needed to perform the 2nd rename.

Original | Optimized |
---|---|

π _{k}(ρ _{j}(R)) | ρ _{j}(π _{k}(R)) |

This will let rename work on a hopefully smaller set and more important, will hopefully allow further optimizations.

Will also eliminate fields in the rename that are cutted in the projection. So the memory needed to perform the task will be reduced.

Original | Optimized |
---|---|

σ _{i}(a) ᑌ σ_{q}(a) | σ _{i ∨ q}(a) |

This will allow the removal of an O(n²) operation like the union.

Both select must work on the same expression, the selects will be united into one, according to the following table:

original query | resulting query |
---|---|

σ _{i}(a) ᑌ σ_{q}(a) | σ _{i ∨ q}(a) |

σ _{i}(a) ᑎ σ_{q}(a) | σ _{i ∧ q}(a) |

σ _{i}(a) - σ_{q}(a) | σ _{i ∧ ¬q}(a) |

This class of optimizations requires to have knowledge of the specific relations used (meaning that it will need to have access to real instances of the relations to work).

Original | Optimized |
---|---|

σ _{k} (R*Q) | σ _{l} (σ _{j} (R) * σ _{i} (Q)) |

Where j contains only attributes belonging to R, i contains attributes belonging to Q and l contains attributes belonging to both.

Except where otherwise noted, content on this wiki is licensed under the following license: GNU Free Documentation License 1.2