Relational Databases Quick Start
This tutorial explains how to use the Entity framework to construct and execute queries for relational databases.
It is intended as an example-based quick-start guide, rather than a complete reference. It covers most frequent use cases and illustrates those with concrete examples.
After going through this tutorial, one can expect to have gained the core practical understanding of how to construct Entity framework queries and execute them. Some more advanced material, subtleties and corner cases have been intentionally left out, to keep the tutorial of a reasonably small size and introductory level. However, most core query-building machinery is covered and illustrated with examples, which should be sufficient to get started.
Relational databases are an important and widely used means for information storage, processing and lookup. The associated technology has become very mature over the last decades, and modern relational database backends have exceptional capabilities for storing and processing structured data, from small to very large data volumes. It is important, therefore, to be able to seamlessly integrate relational databases into one's workflows, for doing work in a number of areas, from data science to web applications.
One important aspect of this is the ability to process even extremely large datasets on the database side, "out of core". Most common relational database backends have been heavily optimized and apply cutting-edge technology to be able to process such large datasets very efficiently.
In the context of the Wolfram Language, the ability to offload the computations on large datasets to a database engine is particularly important, since many such computations do not require the full computational power of the Wolfram Language, while the size of the resulting dataset can be considerably reduced, so that it can be further processed with ease within the Wolfram Language.
Direct integration of relational databases into one's work may get complicated for a number of reasons. First, many domain experts do not have the knowledge of SQL that is required for direct use of relational databases. Second, there are (often subtle) differences between SQL dialects and supported feature sets for different database backends in common use. Third, one has to build data pipelines manually, connecting the database to one's programming environment. For these reasons, tight integration of relational databases directly into the language is very beneficial. It allows the users to focus on solving their problems rather than on providing the technical infrastructure.
The very high-level symbolic nature of the Wolfram Language sets stiff requirements for the framework that integrates relational databases into the language. It has to be high level, symbolic and idiomatic to use, yet exposing enough structure to not limit the query building and other capabilities too much with respect to raw SQL. In particular, idiomatic Wolfram Language programming gravitates toward functional programming style, function composition and immutability—the features that the connectivity framework is therefore required to embrace and support.
As it turns out, the Entity framework does largely satisfy the requirements outlined. While there are no objects in the OOP sense in the Wolfram Language, the Entity framework for the Wolfram Language is, in the context of relational databases, in many ways similar to what object-relational mappers (ORMs) are for other languages. Somewhat oversimplifying, entities map to single rows of database tables, entity classes map to database tables (existing or virtual), and entity properties map to database columns.
As a result, for the vast number of cases when one needs to query a database, it is possible to stay within the Wolfram Language and communicate with the database via high-level Entity framework queries, constructed using Entity framework functional query language. The framework is then responsible for translating the query into an appropriate SQL dialect, executing it on the database side and returning the results back to the user in the appropriate format.
For a long time, the main tool to interact with relational databases in the Wolfram Language has been DatabaseLink. It is built on top of J/Link and JDBC and provides a full-featured toolbox to interact with relational databases.
It is important to understand the differences between the DatabaseLink toolbox and the technology discussed in this tutorial. The most important differences are in the the level of abstraction offered by either one of these tools, and the supported features for interacting with databases.
The toolbox provided by DatabaseLink is rather low level: while it has some features that allow the database queries to be constructed symbolically in simple cases, the majority of real-world queries would have to be written as SQL strings when used with DatabaseLink. This has a number of implications for the user, from the need for familiarity with SQL (and a particular SQL dialect that corresponds to a specific database backend used) to the inability to use the strong points of the Wolfram Language in terms of its symbolic capabilities and high-level abstractions. All this leads to an incomplete, low-level integration of the databases into the Wolfram Language.
The technology based on the Entity framework has been designed from the ground up to provide a high-level seamless integration of relational databases into the Wolfram Language and contains features (such as symbolic query language, relations, automatic SQL generation and backend specialization, built-in type-checking, etc.) that enable considerably more powerful and high-level workflows involving relational databases in the Wolfram Language.
On the other hand, DatabaseLink supports a number of core features that are required for common workflows involving databases, such as write operations (SQL INSERT / UPDATE / DELETE), transaction support, control over timeouts, connection pools, etc. It also provides low-level access to database result sets for efficiency purposes. All of these the Entity framework-based technology is currently lacking, so one can say that DatabaseLink is at present more feature-complete.
There is currently no deep interoperability between the two technologies, other than the fact that both are integrated into the Wolfram Language. The level of interoperability may become better in future versions of the Wolfram Language.
The first section explains the standard workflow to prepare for working with relational databases: establishing a connection, creating and registering a database-backed EntityStore object.
The second section reviews and illustrates key features of the Entity framework in the context of relational databases.
The third section goes through core query-building blocks that the Entity framework provides to build more complex queries.
The fourth section contains a brief coverage of some additional useful constructs and tools, which may be considered somewhat more advanced, but are extremely useful in practice.
The fifth section illustrates what kind of SQL can be typically generated for various query-building primitives, supported by the Entity framework.
The sixth section describes how one can programmatically generate queries and illustrates with examples why this may be beneficial.
The seventh section describes a few practical techniques of query construction.
The eighth section contains a brief coverage of error handling and typical errors one may encounter when constructing and running queries.
The final section contains several examples of more complex queries, illustrating how one can use different query building blocks together to construct more sophisticated queries.
Examples in this tutorial are based on sample public domain database Classic Models (in particular its SQLite version), which is a retailer of scale models of classic cars database.
It has the following entity-relational diagram (note that the data types on the diagram correspond to a MySQL version of the database):
- offices table has a 1 to N relationship with employees table: there are generally many employees working for the same office. The relationship is realized via the foreign key officeCode of the employees table corresponding to the primary key officeCode of the offices table.
- employees table has a 1 to N relationship with itself: in general, each employee may have 0, 1 or more other employees reporting to them. The relationship is realized via the foreign key reportsTo, linked to the primary key employeeNumber of the same employees table.
- employees table has 1 to N relationship with customers table: each employee may deal with 0, 1 or more customers, while each customer is always dealt with by at most one employee. The relationship is realized via the foreign key salesRepEmployeeNumber of the customers table, which corresponds to the primary key employeeNumber of the employees table.
- customers table has a 1 to N relationship with payments table: each customer may have made 0, 1 or more payments. The relationship is realized via the foreign key customerNumber of the payments table, which corresponds to the primary key customerNumber of the customers table.
- customers table has a 1 to N relationship with orders table: each customer may have placed 0, 1 or more orders. The relationship is realized via the foreign key customerNumber of the orders table, which corresponds to the primary key customerNumber of the customers table.
- orders table has a 1 to N relationship with orderdetails table: each order consists of 1 or more items, with each item of distinct product type having a single record in orderdetails table. The relationship is realized via the foreign key orderNumber of orderdetails table, which corresponds to the primary key orderNumber of the orders table.
- orderdetails table has an N to 1 relationship with products table: each product may have been ordered 0, 1 or more times. The relationship is realized via foreign key productCode of orderdetails table, which corresponds to primary key productCode of products table.
- products table has an N to 1 relationship with productlines table: each product line may have 1 or more products belonging to it. The relationship is realized via foreign key productLine of products table, which corresponds to primary key productLine of productlines table.
- Establish a connection to the database, using DatabaseReference.
- Create a RelationalDatabase object, using that connection (this operation inspects the database and retrieves database metadata about the database schema etc.).
- Register that EntityStore object with the Entity framework.
This uses the connection to construct RelationalDatabase object:
This creates a database-backed EntityStore:
This registers the EntityStore:
First, unregister the EntityStore:
The specific syntax for constructing the database connection may differ for different database backends.
As described in the previous section, one of the steps required to create the database-backed EntityStore consists of creating the RelationalDatabase object. However, this object is also useful in its own right. It contains information about database schema (tables, columns, constraints, etc.) and can be used to both visually inspect and programmatically extract parts of this information that one is interested in.
This creates a RelationalDatabase object:
The formatting for the RelationalDatabase object makes it easy to visually inspect the database structure, using hierarchical group openers, which can be expanded to inspect more details about a given table or column:
The following lists all table names for a given RelationalDatabase object:
The following lists all properties for a given table column that RelationalDatabase object is aware of:
For more details about the various properties and methods, consult the reference page for RelationalDatabase object.
This section covers important core operations supported by the Entity framework, in the way they are used in the context of relational databases. It is not primarily focused on query construction, which has a separate section dedicated to it. Rather, it lays the groundwork and discusses other important aspects, many of which are prerequisites for effective work with relational databases through the Entity framework.
To use the Entity framework effectively in one's work with relational databases, it is important to have at least a basic understanding of how the concepts and constructs of the Entity framework correspond to the core concepts and constructs of relational databases. This topic is discussed first.
Query execution and the so-called "resolvers" (functions that are used to actually transform symbolically represented queries into specific results by initiating the process of query compilation and execution and returning the results) are discussed next.
Next, computed properties and EntityFunction are covered. These are very important building blocks that allow one to define and compute new properties on the fly, where such new properties may require complex computations performed on the database side.
In a number of cases, being able to work with single entities is important. This can be useful for various purposes, from better visualization and understanding of the results to query construction (for debugging and prototyping purposes, it is very valuable to be able to execute parts of the query on a single entity). A brief account of this topic is given next.
For relational databases, the notion of a primary key for a database table is a central one. The Entity framework counterpart for this is the CanonicalName property for single entities. The other part that, together with the canonical name, defines a single entity, is a type of that entity. These topics are considered important enough to merit a separate section.
While most modern workflows involving relational databases deal with database tables that do have a primary key, cases when a table either does not have one, or when it does but that is not enforced on the database schema level, are not unheard of. These are even more practically important in the context of the Entity framework, since for such tables/entity types, only a part of the Entity framework's functionality will work. This issue is discussed next.
While this tutorial does not focus on types and the type system implicitly involved in the interaction with relational databases, there is one type distinction that is very important to understand for effective work. Namely, entity types may contain properties that are entity valued and entity-class valued. In this tutorial, such properties are called relations and are added by the framework to the set of core properties (properties directly corresponding to database table columns) for each entity type.
The last topic covered in this section concerns missing values and ways in which one may encounter those in results, in the context of relational databases.
Since the Entity framework is used in the Wolfram Language—in particular, to represent relational databases—there clearly has to be a mapping for the core concepts and constructs between the Entity framework and relational databases/SQL.
For many features of the Entity framework, such a mapping is rather direct. In some ways, however, the Entity framework represents a richer data model. For example, for the in-memory use of the Entity framework, entity properties can have arbitrary Wolfram Language expressions as their values—which, if used directly, would not comply even with the first normal form for relational databases (e.g. when entity properties are List valued). As another example, the Entity framework is capable of representing types with an infinite number of entities, something that relational databases cannot easily do. Also, the set of easily possible computations is more limited for relational databases w.r.t. the Wolfram Language, the latter naturally having much richer computational functionality available "out of the box".
Restrictions of the relational model have their advantages as well, such as the ability to provide strong guarantees for data consistency, transactions and other related useful features of ACID-compliant databases. These restrictions also mean similar restrictions imposed on the Entity framework for database-backed entity stores.
That said, the mapping between the core relational and Entity framework constructs is relatively straightforward. It is summarized by the following table.
A collection of entity types (database tables)
A (handle to a) collection of entities with a similar set of properties (table rows)
Database table row
Entity (single entity)
A (handle to a) set of properties/values, representing a single "thing" (table row) with a unique identity
Database table field (column name)
Specific property, usually having a specific type
A property or set of properties, guaranteed to be unique (when taken together for a set of properties case) for a given entity type (db table)
Entity property whose value is a single entity (can be of the same or different entity type). For databases, a field that points to a unique row in a table (can be the same or different table).
A (handle to a) collecttion of entities of the same type (either registered or defined by the query). On the database side, the derived table is a virtual table/subquery, used in a FROM clause.
As explained previously, this mapping has a certain asymmetry. It makes it straightforward to represent existing relational databases as entity stores. However, not all existing in-memory entity stores can be easily mapped to relational databases without creating extra abstraction layers. In particular, the following are some of the restrictions that exist:
- The values of the entity properties can only be of a specific limited set of types supported by a given database backend, for entities of a given type to be easily mapped to a relational database table rows.
- Some entity properties have qualifiers. To support such properties on the database side, extra structures would be required (e.g. qualifier values might need a separate database table, etc.).
While these considerations are more important when one needs to back existing in-memory entity stores by a relational database (the functionality not currently supported by the Entity framework), it is still useful to keep these in mind for a better understanding of the overall picture.
There are a few restrictions that work in the other direction too. For example, it is technically possible to have database tables without a primary key and work with those on the database side. However, on the Entity framework side, it is imperative that each single entity have a CanonicalName that is unique within a given entity class. This means that single entities cannot be defined for such database tables, and therefore all single-entity-related functionality of the Entity framework cannot work for such database tables/types. In particular, functions like EntityList and EntityValue with certain modifiers (like e.g. "EntityPropertyAssociation") will not work in such cases. This issue is further discussed in this section.
There are just two "resolvers" (commands that can be used to execute a query) in the Entity framework: EntityValue and EntityList.
The main and most commonly used one is EntityValue. It is typically used to execute a given query and get the results in a variety of forms. In the context of relational databases, queries typically represent (virtual) database tables, which in the Entity framework correspond to (virtual) entity types. The role of EntityValue in this context is to compile a given query to the suitable SQL dialect, execute it on the database, and extract values for a certain set of entity properties (resulting table columns in database terms) in various forms (lists or associations of values).
Following are a few simple examples of use of EntityValue.
The result can be obtained in a different form, for example, as a list of associations, with property names retained:
For extracted properties, one can use the full EntityProperty form in place of short string names, in which case the resulting keys in the data association will also be EntityProperty expressions:
One can also obtain the results in the form of Dataset:
The full set of possible modifiers for the third argument of EntityValue can be found in the documentation for EntityValue.
In some cases, one may want to obtain a list of entities contained in a given entity type/class. This can be accomplished with EntityList.
Note that, with an exception of certain special cases to be discussed in subsequent sections, the ordering of entities in the result is not guaranteed to be the same for repeated calls to EntityList or EntityValue.
In addition to extracting existing properties, one can also extract computed properties—that is, properties created on the fly, which may require complex computations on the database side. Such properties must be expressed using EntityFunction. The semantics of EntityFunction in this context is similar to Function, with a few important differences to be covered in subsequent sections. One can think of EntityFunction as a construct similar to EntityProperty that (in the context of this section) takes a single entity and returns the result of some computation performed with that entity. It is important that at present EntityFunction can only return a scalar, but cannot return e.g. a list of values, an entity or an entity class.
Following is an example of the use of EntityFunction to retrieve a property that requires computation on the database side.
The following query extracts the property "officeCode" and computed property, representing a full string office address, for every office in the "offices" type:
EntityFunction has the HoldAll attribute, just like Function, to prevent premature evaluation of its variables and body. The body of EntityFunction is an expression, where one can use a limited set of primitives that EntityFunction can understand and compile to SQL. The full list of such primitives can be found in EntityFunction documentation, while many more examples of the practical use of EntityFunction can be found in subsequent sections.
Single entities are an important concept and building block in the Entity framework. The ability to work with single entities adds interactivity to one's workflow and often allows one to get a better understanding of the data, even in cases when the end result should be a query that works on many entities (entity class).
It is important to understand that single entities are essentially handles (or references) to the actual data. They do not contain data, except for their type and canonical name. As such, they are lazy: every time one needs to extract data from an entity, one has to run a new query against it. This lazy nature of entities is a very useful feature of the Entity framework, since it allows one to work with entities in a more abstract fashion. However, it comes with a few caveats that one should be aware of. For example, should some properties of a particular entity change in between two successive executions of the same query, the returned results will generally be different too, reflecting those changes. It also may happen that an entity that was there in a given entity class at some point has been deleted and no longer exists. Of course, these complications can only occur for the data that is changing in time.
Property values can be extracted using EntityValue:
It is important to realize that one can extract computed properties for a single entity, using EntityFunction. This can be very useful for both quick lookups and prototyping more complex queries.
For example, this computes the formatted string out of the "city" and "state" properties for a given office, on the database side:
The last example also illustrates the property-like nature of EntityFunction.
In the Entity framework, each entity must be unique within its entity type/entity class. The other way to say it is that each entity must have a unique identity, and no entity class/entity type is allowed to contain duplicate entities.
Syntactically, an entity is represented by an expression containing two parts: entity type and entity unique identifier, that is called canonical name.
Looking at the InputForm of entities:
One can extract the canonical name of an entity also using CanonicalName.
The CanonicalName function also works on a list of entities:
Entities of type "orderdetails" have pairs of integer order number and string product code as their canonical names:
The entity type of an entity is always the first argument of Entity, but not always a string. In particular, in cases when the type is not the registered type, but a "run-time" type, implicitly defined by an Entity framework query, it will be that query itself.
For entity types that correspond to the database tables in the database, the canonical name for an entity is exactly the primary key for the corresponding database table.
In the context of relational databases, the canonical name for an entity can be a list in one of the following cases (or their combination):
- The entity belongs to the registered type, which corresponds to a database table with a composite primary key.
- The entity belongs to the "runtime" type, represented by CombinedEntityClass[...].
- The entity belongs to the "runtime" type, represented by AggregatedEntityClass[...], where entity groups for aggregation are formed based on more than one property.
The previous example belongs to the last category of cases, where the grouping properties are "city" and "country", while the earlier example with entities of "orderdetails" type belongs to the first category, with the primary key of "orderdetails" table being a composite one, made out of properties "orderNumber" and "productCode".
The following defines an entity class that combines types "employees" and "offices". The entities of this class have canonical names that are lists of pairs of employee number and office code:
The case when certain database tables do not have the primary key constraints in the database schema might not be typical but is important in practice. Note that not in all such cases is a column with unique values actually absent from the table. However, at present there is no way to indicate such a column for the Entity framework to use as a primary key for such tables: all the information about the primary keys is currently obtained at database inspection time (when the RelationalDatabase object is constructed) and entirely from the existing database constraints.
What this means for the Entity-framework-based workflows is that for such tables, single entities can not be defined, since there is no clear way to attach a canonical name to a database row in such a table. This does not mean that one cannot do anything useful with such tables, but certain parts of the Entity framework's functionality will not work with those.
To illustrate the limitations existing for such tables in terms of Entity framework functionality, the Titanic database will be used, which is an SQLite database hosted in the Wolfram Cloud and based on the following Titanic dataset:
As can be seen from the structure of this dataset, there is no field or combination of fields there that would naturally work as a primary key. There is even no guarantee that there are no duplicate records in the dataset, since there are not even passenger names present in the data, but just the passenger's travel class, age, sex and whether or not they survived. Still, this dataset is a useful source of information.
A sample workflow illustrates some of the Entity framework operations that are and are not possible for such datasets.
The first step is to create a RelationalDatabase object:
There should be no problems with this step; the database and a single table it has are correctly identified.
The next step is to create and register the EntityStore object based on this database:
As can be seen, a warning is issued by EntityStore that single entities cannot work for this type/database table. In particular, EntityList and some forms of EntityValue will not work.
EntityValue for certain modifiers does not work either:
It is still possible to extract values for specific properties, as long as the form of the resulting data does not involve single entities:
The following query introduces a coarse-grained age class for passengers, where passengers from 0 to 20 years old are in the first group, from 20 to 40 years old in the second, etc. It then computes the fraction of surviving passengers for each passenger class, age group and sex, and sorts the results by that fraction in descending order:
- Database tables with no primary key constraint in the database schema correspond to entity types with no CanonicalName defined. This means that one cannot define single entities for such types.
- It is possible to register such types in the EntityStore object; however, not all Entity framework functionality will work for them.
- While single entities cannot work for such types (and neither can functions such as EntityList and EntityValue for some modifiers), many forms of EntityValue still work, as well as all queries that do not involve single entities of that type.
Properties can be entity valued or entity-class valued. Such properties do not correspond to any of the database table columns of the corresponding database table, but have been added by the framework and represent related tables.
One can expand this entity class into a list of entities by using EntityList:
This picks the first employee for a given office and extracts entity properties and values for that employee:
In addition to the "officeCode" property, which in database terms is the foreign key to the "offices" table, there is a generated property "offices" whose value is the entity representing the office this employee works for:
Note that such entity- and entity-class-valued properties have no direct database column origin, but have been generated by the framework based on the database schema (relations/foreign keys information).
One important observation to be made at this point is that entity-class-valued properties are not automatically resolved to lists of entities, so that EntityList has to be applied explicitly to their values if one wants to get the list of entities as a result.
Generated properties are further discussed in the section on relations.
For some entities, certain properties may not have values in the database table (values may be NULL). On the Wolfram Language side, this corresponds to Missing values.
This office is in Paris, France, and thus has no value for the "state" field. Missing values like this correspond to NULL values on the database side and are represented by Missing["NotAvailable",...] in the result:
Another type of input that results in Missing values is request values for invalid property names. In this case, however, the value will have a different reason for being missing: Missing["UnknownProperty",...].
This section describes the core primitives that the Entity framework provides to build more complex queries.
First, property-like queries are covered. They are explained and illustrated with a number of examples concerning how one can significantly expand the set of available entity properties with computed properties, and how EntityFunction is used to express those computed properties.
The rest of this section is dedicated to various primitives that take entity classes (and other parameters) and return new (transformed) entity classes. The Entity framework provides such primitives for the following typical data-processing operations:
- Extending entities in an entity class with new properties, computed on the fly (SQL counterpart: SELECT list)
- Aggregating over entity class or groups of entities within it (SQL counterpart: aggregation/GROUP BY clause)
- Combining two entity classes together to form a new entity class, with entities having properties from both original entity classes (SQL counterpart: JOIN clause)
These entity-class-transforming primitives can be combined and nested one within another, making it possible to construct more complex queries.
Property-like queries are Wolfram Language expressions used to build parts of Entity framework queries, which get compiled into SQL expressions in the process of query compilation and execution. The main construct used to construct property-like queries is EntityFunction.
Property-like queries are useful in a number of places, such as defining computed properties, predicates for filtering data, sorting criteria, aggregation and more. In this tutorial, some typical examples of them are considered.
One can use standard arithmetic operations: Times, Plus, Subtract, Divide, Mod, etc. on numeric quantities.
The following computes the total amount paid for each ordered item, taking into account ordered items' quantities:
The following extracts the value of the "state" property for each entity in the "offices" type and also computes the Boolean expression, which checks whether this property value starts with the letter "C":
To test for missing values, the MissingQ predicate can be used.
The following example shows how one can use the MissingQ predicate to check for missing values—in this case, for the property "state" for the type "offices":
Standard comparison operators are also supported. Note that Equal and SameQ can be used interchangeably, as well as Unequal and UnsameQ.
The following example shows the use of comparison operators Equal, SameQ, Unequal, UnsameQ in the context of string comparison:
One can construct fairly complex expressions of numeric and other types , which will be translated into SQL and executed on the database side.
The following query computes the ratio of the difference between MSRP and buy price, to the MSRP price, which is the percentage of income the store gets from selling the item, if it sells it under MSRP price:
Boolean expressions are particularly important, since Boolean-valued EntityFunction expressions are frequently used as filtering predicates, conditions for CombinedEntityClass, etc.
The following query computes a Boolean property that yields True for items ordered in quantity >30 and with price per item >= $200, for each entity of the "orderdetails" type, where Wolfram Language-side post-processing selects only those products for which it yields True:
The following query computes a property that is only True for order numbers divisible by 100:
The following query computes a Boolean property that yields True for all offices for which either their city contains letter "a", or their state exists (in not Missing) and their city contains letter "o":
The following query computes a Boolean property that yields True for all orders that are required to be shipped within eight days from the date of placing the order:
In the following example, a Boolean-valued query expression is used as a filtering criteria inside FilteredEntityClass, to find all orders where an item was ordered with quantity >30:
The operations described in the sections that follow, namely FilteredEntityClass, SortedEntityClass, SampledEntityClass, ExtendedEntityClass, AggregatedEntityClass and CombinedEntityClass, can all be called entity class transformers. They all take an entity class (or two in the case of CombinedEntityClass) and return a new entity class.
It is important to realize that these constructs are completely symbolic and inert. No actual work is performed when one of these is applied to an entity class argument. The resulting query stays symbolic, and one needs to call one of the resolver functions (EntityValue or EntityList) to actually perform the query.
Following is an example of a query containing several of these transformers, nested within one another.
This selects the five top-paying customers based on their past payments and lists them in the order of decreasing total amount paid:
When evaluated, the symbolic Entity framework query evaluates to itself, staying as an inert symbolic expression. One can still do a few useful things with it though, such as, for example, finding all entity properties for the entity class that this query represents (this does not require a call to the database).
One uses EntityValue to actually execute the query.
One can execute this query by using EntityValue:
The symbolic and inert nature of Entity framework queries opens the way for programmatic query construction, since queries on their own are inert Wolfram Language expressions and can be constructed from smaller building blocks either manually or programmatically.
One of the most commonly needed operations is to filter database data according to certain criteria. In Entity framework, FilteredEntityClass is used for that purpose, with the filtering criteria expressed using EntityFunction and passed to FilteredEntityClass as a second argument.
The following examples illustrate typical uses of FilteredEntityClass.
The following lists first name, last name and job title for all employees with a job title other than "Sales Rep":
Sorting entities (rows in the database table) according to some criteria is another commonly used operation. On the database side, this is achieved by using ORDER BY clause of SQL. On the Entity framework side, SortedEntityClass can be used to achieve that.
In the simplest case, one needs to sort with respect to the values of a single existing entity property (database column), in an ascending order. In this case, the string field name is passed to SortedEntityClass as a second argument.
In case descending order of results is required, one can use the fieldName -> "Descending" syntax, as shown following.
The following query lists employee number, first name and last name for all employees, sorted by the length of the first name, in an ascending order:
One can sort by more than one property. In this case, values of the second property are used as tie-breakers for groups of items with identical value of the first property, and the same logic extends to the third etc. sorting property. One can attach the "Ascending" or "Descending" qualifiers for each sorting property separately, to control the sorting order within the (sub)groups of entities.
The following query sorts products by product line and then by quantity in stock in descending order:
One can limit the number of results returned by SortedEntityClass, by utilizing its third argument.
Here is one of the previous queries, listing information about employees sorted by the length of their first name, where the result is now limited to the first seven records:
In some cases, it is useful to pick only a certain number of entities from an entity class (or, in database terms, pick only a certain subset of rows from a table or virtual table). The way to do this in the Entity framework is to use SampledEntityClass.
Note that the ordering of results can only be guaranteed if SampledEntityClass is applied to SortedEntityClass. However, even in other cases it can still be quite useful, for example, when one wants to peek into a large dataset or prototype a query using just a small sample first.
In this case, the order is guaranteed by the fact that the dataset has been sorted before subsetting. This same result can be obtained more economically with SortedEntityClass with three arguments:
It is possible to add new, computed properties to the set of available entity properties for a given type/entity class. The result is a new entity class. The newly added property can be used anywhere the original properties can—in EntityValue, further in the outer layers of the query, etc. The construct that creates such a new extended entity class is ExtendedEntityClass.
The following example illustrates a simple use case where one new property is added to an entity class.
The following uses relations (described in more detail here, the property "employees-reportsTo") to add the first and last name of the manager, for employees who have one:
The new property can use various constructs, including conditional logic etc. This capability allows nontrivial computations.
The following query adds a new, adjusted credit limit for the customers by incrementing the limit for those customers who have the current credit limit >=$100,000, by $15000:
Aggregation is a very commonly needed operation. It is used to compute values, aggregated over more than one entity (table row in the database). Typical examples of aggregation include computing totals, averages, or minimal and maximal values of some properties or more complex expressions, over a range of entities. Within the Entity framework, the construct that can be used to perform aggregation is AggregatedEntityClass.
As its name indicates, the result of its application is a new entity class. If the aggregation is performed over the entire original entity class, the resulting entity class will contain just a single entity, having as its properties the computed aggregated properties. One can also first group entities based on values of a certain set of their properties, and then perform aggregation over each such group—in which case the new entity class will contain as many such new aggregated entities as there are such groups.
At present, there is a limited and small set of core aggregation functions that are supported by the Entity framework for database-backed entity stores: Total, Length, Min, Max, Mean, Variance and StandardDeviation. One can use standard arithmetic functions and other property-like query building blocks, however, to combine these core operations and compute more complex expressions.
It is important to note that EntityFunction for the case of aggregation is bound to an entire entity class being aggregated rather than individual entities of that class.
Following is a simple example of use of AggregatedEntityClass to compute a single aggregate value.
To understand the semantics of EntityFunction in this case, it is useful to compute the same thing on the top level, using the Wolfram Language.
This code computes the same quantity using the Wolfram Language for aggregation, where Function is used to make the analogy most apparent:
One can use a simpler syntax, which uses the same name for the resulting aggregated property as the name for the property being aggregated. This works for a simple case when there is a single property being aggregated over in each case, but one can have more than one aggregation:
And a yet more concise form exists for the case when also there is a single aggregation (returning a scalar rather than a list, however):
An important remark on the last example is that for the current version of the Entity framework, this three-argument form of EntityValue, when used for aggregation in a top-level query, will perform the aggregation on the Wolfram Language side rather than the database side (this does not happen when this syntax is used in subqueries). In contrast, all aggregations performed with an explicit use of AggregatedEntityClass will always be performed on the database side.
The following query defines an aggregated entity class with three aggregated properties: maximal, minimal and average quantity of ordered items for all orders split into single-item orders (which is what the "orderdetails" table provides):
One can compute aggregations over certain groups of entities rather than the entire entity class. In this case, the third argument of AggregatedEntityClass has to be used, to indicate a property or a list of properties whose values will define unique groups of entities for aggregation.
The following query computes the total number of customers coming from the same city and country, sorting by the customer number in descending order:
In this case, aggregation is performed over groups of entities having the same combination of city and country.
The following query computes the total amount to be paid for all placed orders (semantically, in this case, both o["priceEach"] and o["quantityOrdered"] represent columns in the database. Their multiplication is allowed, since they belong to the same table (and thus have the same length) and must be understood as vectorized element-wise operations. The result of this multiplication is semantically another column, which is passed to Total):
The following query computes the average amount per ordered item in two different ways: by explicit computation and by using the built-in function (Mean):
In relational databases, data is typically stored in several tables, connected to each other via foreign key constraints. Only in rare cases does one work with just a single table for the real-world queries one typically wants to write, and considering that the data, being normalized, is typically split and stored in a (possibly large) number of related tables.
The most common way to use data from more than one table in an SQL query is to use SQL JOIN operations. The Entity framework counterpart of SQL JOIN is the CombinedEntityClass construct. It takes two entity classes/types, the specification that tells how they are going to be combined, and optionally the type of JOIN. The result is a new entity class, with new entities containing properties from both of the original entity classes.
The following examples illustrate typical use cases for CombinedEntityClass.
The following query combines entity classes "employees" and "offices", via the "officeCode" property, and then extracts properties "employeeNumber", "firstName", "lastName", "city" and "country", where the former three belong to the type "employees" and the latter two to the type "offices":
This can become more apparent if one looks at the full list of properties for the new class (it is more instructive to actually look at their InputForm):
The entities of the new class are of a new type, which can be seen by e.g. using EntityList:
The new entity classes resulting from application of CombinedEntityClass can be further used in more complex queries, just like any other entity classes.
The following query finds all employees who work in France or the UK and lists their employee numbers, first and last names and the country:
The following returns, for customers with a credit limit larger than $120,000, the payment information including customer number, name, amount paid and the date of payment (where the Wolfram Language post-processing step is required since dates are stored in Unix time on the database side):
Note that in the last example it was necessary to explicitly indicate EntityProperty["customers","customerNumber"] in the list of properties in EntityValue, since both types "customers" and "payments" have the property with this name, and one has to disambiguate exactly which property is requested.
The following displays customer name and the first and last name of their sales representative, for customers from France:
It is possible and often desirable to combine a type with itself. In the context of relational database operations, this case corresponds to a self-join. In this case, more care is needed to disambiguate property names—one has to introduce a (string) alias for at least one of the same types being combined. This case is illustrated with the following example.
The following query lists first and last names of employees, together with the names of their managers. The "manager" alias for the second "employees" type is introduced, which is then used to disambiguate the entity properties:
In some cases, the condition that is tested to decide whether a combination of the two entities from two entity classes being combined should be included into the resulting entity class is more complex than a simple equality between some (combinations of) properties of both entities. For such cases, one can use EntityFunction with two arguments, each one bound to an entity of the corresponding entity class being combined. The body of such an EntityFunction must return a Boolean value, but otherwise can be an arbitrarily complicated (compilable) expression.
The following is a rather interesting example, since it illustrates several different aspects of CombinedEntityClass usage. This query represents one way of finding all countries that have more than one sales office. The idea is to combine the "offices" type with itself, and as a condition for CombinedEntityClass use the predicate that checks that the countries of two entities being combined are the same, while office codes are different. Any such combination of two "office" type entities does correspond to a country that has more than one office. So it only remains to extract the country names of the resulting combined entities and remove possible duplicates:
In the last example, it is also necessary to prefix one of the combined "offices" types, to disambiguate properties, just like it was in the previous example of self-join. Finally, DeleteDuplicates is used as a third argument of EntityValue to remove the duplicates from the results (and as was noted before, it is important to remember that in cases like this, the third argument of EntityValue is currently executed on the Wolfram Language side, rather than on the database).
Subqueries are queries that are used as building blocks in more complex queries. They take the simplest form when the result of a subquery is a scalar. In this section, several examples of subqueries are used to illustrate their general uses.
As a starting point, consider a common task of obtaining a single aggregated property from some entity class. This will not yet be a subquery. The following examples illustrate various ways to do this.
The following aggregation query returns the maximal value of the "MSRP" property for all the products:
It is important to note, however, that while the results are the same, the first (shorter) version does in fact perform the aggregation in the Wolfram Language, while the last two versions aggregate on the database side.
However, when such queries are used as subqueries in larger queries, in that case they are always executed on the database side, including the short three-argument form of EntityValue.
The following query uses the previous query as a subquery. It selects all products with the MSRP price within the top 10% of most expensive products, in terms of the price range:
Subqueries like the one in the last example are called uncorrelated, since they do not refer to entities (table rows) of the outer layers of the query. One can also build so-called correlated subqueries, which do contain such references (and therefore, cannot be executed on their own without modifications).
The following example illustrates the steps one typically goes through to construct queries involving more complex and/or correlated subqueries.
One could also do the same computation with the office's entity, where the hard-coded "4" has been removed from the condition, and now there are nested EntityFunction expressions, with the inner one referencing the outer one's variable:
It is now easy to understand the following code, which computes the number of employees for each office and sorts the offices w.r.t. their number of employees in descending order:
The last example is an example of a correlated subquery. The correlation is reflected by the inner EntityFunction's body referencing the outer EntityFunction's variable.
Subqueries are a powerful tool, but they can be rather easily abused. Depending on the case, they may or may not be the optimal way to solve a given problem. Deeply correlated subqueries might result in an inferior performance, so one should use this powerful tool carefully.
The MemberQ predicate in the context of database-backed entity stores is used in Entity framework queries to expose the SQL IN operator. In the simplest form, it is used to test membership of a value in an explicit list of values.
The following property defined by EntityFunction yields True for an office with the office code being one of "2", "4" or "7", and False otherwise:
One can also use MemberQ in conjunction with subqueries, in which case, the first argument of MemberQ is not a literal list, but a result of a subquery. In such cases, a subquery should return a column rather than a scalar, and typically can be written as class["property"] or, equivalently, EntityValue[class,"property"], where class is some entity class (registered or defined by an inner query).
The following query selects customers who live in some of the cities where the offices are located, within the US:
In some cases, one needs to only keep those selected values or groups of values that are distinct. In SQL, a special DISTINCT keyword is used for this purpose. In the Entity framework, one can use DeleteDuplicates as a third argument of EntityValue to achieve a similar effect.
The following query returns a list of all distinct values for the "status" property of the type "orders":
In the previous example, as was noted before, DeleteDuplicates is currently executed on the Wolfram Language side. However, when a query such as the preceding is used as a subquery, the removal of duplicates will happen on the database side.
A somewhat different use case is when one needs to aggregate over distinct values. Also in this case, one can use DeleteDuplicates inside EntityFunction to achieve that.
The following computes the number of distinct exact prices, as well as rounded prices, and also average distinct price and average rounded price (ignoring the price multiplicities):
In this tutorial, entity-valued and entity-class-valued entity properties, generated by the Entity framework for related entity types (database tables), are called relations.
In addition to the properties that correspond to existing table columns, new properties are created by the Entity framework for tables that are related to other tables. They constitute the mechanism to provide a higher-level (w.r.t. explicit joins or subqueries) way to use data from multiple related entity types (database tables) in queries.
Such properties have been already considered previously, but on the most basic level.
It is important to realize that there is nothing that can be done with relations that cannot be done with core primitives. However, using relations in many cases can lead to considerably more concise queries, which require a lot less effort to both construct and understand.
There are four properties that do not correspond to existing database columns in this example: "offices", "customers", "employees-reportsTo" and "employees-reverse".
The properties in question correspond respectively to customers served by this employee, office for which this employee works, the manager of this employee and all employees for whom this employee is the manager:
The following is a more interesting query that returns all employees who work in the same office and report to the same manager as the given employee:
One can use relations to economically formulate queries. This is illustrated with the following examples.
If that property is needed further in the query, one can use ExtendedEntityClass to extend a given entity class with that property:
An important feature of relations is that one can follow them more than once, particularly for entity-valued relations.
The following query demonstrates the use of relations by adding to each order two new properties: the name of the customer who placed this order and the full name of the employee who served that customer. Note how relations are followed to extract the data belonging to related types (database tables):
The following example computes the total number of all customers and also the total number of customers with high credit limit (>$50000), for each employee, and adds these values as new properties. In this case, e["customers"] is an entity-class-valued relation, which can be used e.g. in FilteredEntityClass.
More examples of how relations can be used to build more complex queries are considered in the final section of this tutorial.
The purpose of this section is to give a basic idea about the kind of SQL code that is typically generated by the framework for various supported constructs. The generated SQL provided here is mostly intended for illustration purposes.
While many of the SQL queries presented in this section actually correspond to the SQL code generated by the current version of the Entity framework query compiler (for the SQLite backend), one should not assume that the generated SQL will always take that exact form. It may change from version to version. From the viewpoint of the Entity framework, the exact form of generated SQL is an internal implementation detail, as long as the end result is correct and has reasonable efficiency. So one should not rely in any way on the details of generated SQL code shown in this section.
You can start with some basic EntityValue call:
SELECT officeCode, city, state, country
In cases when one uses computed properties based on EntityFunction in EntityValue, they are aliased on the SQL side by the autogenerated property names.
SELECT REGEXP(state, '^C', 0) AS synthetic_prop_9
Such generated properties as "synthetic_prop_9" are not visible to the user, since EntityValue extracts only the property values when properties based on EntityFunction are requested.
For the case of EntityList, the database call actually extracts the values of the properties that are required to compute the CanonicalName for the entities of a given type (primary key for the corresponding database table).
Computed properties are defined using EntityFunction. These will typically compile into SQL expressions. Here are a few examples.
orderdetails.priceEach * orderdetails.quantityOrdered
offices.state IS NULL
REGEXP(offices.city, 'a', 0) OR REGEXP(offices.city, 'o', 0) AND NOT (offices.state IS NULL)
In simple cases, arithmetic operations will compile into similar SQL-level arithmetic operations. E.g. for the "products" type:
(products.MSRP - products.buyPrice) / products.MSRP
May compile into something like this (where conversions to reals are performed to preserve the semantics of Wolfram Language operations involved here, on the database side):
power(CAST((power(CAST(products.MSRP AS REAL), 2) - power(CAST(products.buyPrice AS REAL), 2)) / (power(CAST(products.MSRP AS REAL), 2) + power(CAST(products.buyPrice" AS REAL), 2)) AS REAL), 0.5)
orderdetails.quantityOrdered > 30 AND orderdetails.priceEach >= 200
orderdetails.orderNumber % 100 = 0
sum(orderdetails.quantityOrdered * orderdetails.priceEach) / CAST(count(orderdetails.orderLineNumber) AS REAL)
WHEN (1 > length(employees.firstName)) THEN NULL
WHEN 1 THEN substr(employees.firstName, 1, 1)
END IN ('M', 'P', 'D')
This construct generally corresponds to the SQL WHERE clause, with the second argument (being the EntityFunction expression) corresponding to the WHERE clause SQL expression.
SELECT employees.firstName, employees.lastName, employees.jobTitle
WHERE employees.jobTitle != 'Sales Rep'
When it can, the query compiler tries to optimize the query. In particular, successively applying several conditions will not usually result in many nested levels of SELECT in the resulting SQL code.
For example, the following query uses three nested FilteredEntityClass constructs to find all customers from USA, CA, who are located in one of the cities: San Francisco, Los Angeles or San Jose:
WHERE customers.country = 'USA' AND customers.state = 'CA' AND customers.city IN ('San Francisco', 'Los Angeles', 'San Jose')
SELECT employees.employeeNumber, employees.lastName, employees.officeCode
ORDER BY employees.officeCode DESC
One notable difference between SQL and the Entity framework sorting facilities is that not all SQL backends directly support expressions in the ORDER BY clause, while for SortedEntityClass the properties with respect to which one sorts can be either simple entity properties or EntityFunction expressions.
SELECT "T308"."employeeNumber", "T308"."firstName", "T308"."lastName"
"employees_T306"."employeeNumber" AS "employeeNumber",
"employees_T306"."firstName" AS "firstName",
"employees_T306"."lastName" AS "lastName",
length("employees_T306"."firstName") AS synthetic_prop_17
FROM employees AS "employees_T306"
) AS "T308"
ORDER BY "T308".synthetic_prop_17
The most direct SQL counterparts for this construct are LIMIT and OFFSET SQL keywords. However, the actual strategy and internal implementation for subsetting may be different, since in certain cases other strategies may be more efficient to obtain the same result.
"payments_T316"."customerNumber" AS "customerNumber",
"payments_T316".amount AS amount
FROM payments AS "payments_T316"
LIMIT 10 OFFSET 10
On the SQL side, this construct corresponds to the use of SQL expressions to define fields in the SELECT list, whenever one wants to compute properties more complex than just the original fields present in the database table or query.
"employees_T328"."employeeNumber" AS "employeeNumber",
("employees_T328"."firstName" || ' ') || "employees_T328"."lastName" AS "fullName"
FROM employees AS "employees_T328"
Or they can be complex and contain e.g. correlated subqueries like, for example, for the following query, which uses relations:
Where in the generated SQL, the last two fields in the SELECT list are represented by (correlated) scalar subqueries:
"employees_T352"."employeeNumber" AS "employeeNumber",
"employees_T352"."firstName" AS "firstName",
"employees_T352"."lastName" AS "lastName",
SELECT "employees_T355"."firstName" AS "firstName_1"
FROM employees AS "employees_T355"
WHERE "employees_T355"."employeeNumber" = "employees_T352"."reportsTo"
) AS "managerFirstName",
SELECT "employees_T358"."lastName" AS "lastName_1"
FROM employees AS "employees_T358"
WHERE "employees_T358"."employeeNumber" = "employees_T352"."reportsTo"
) AS "managerLastName"
FROM employees AS "employees_T352"
When used without the third argument, AggregatedEntityClass represents aggregation performed over the entire first argument (entity class), and corresponds to SQL aggregate queries. In this case, there is typically no special SQL-side keyword that would correspond to the AggregatedEntityClass[...], but the fields in the SELECT list must all use the SQL aggregation functions.
max("orderdetails_T403"."quantityOrdered") AS "maxOrdered",
min("orderdetails_T403"."quantityOrdered") AS "minOrdered",
avg("orderdetails_T403"."quantityOrdered") AS "avgOrdered"
FROM orderdetails AS "orderdetails_T403"
When the third argument of AggregatedEntityClass is used, this corresponds to the SQL GROUP BY clause.
"customers_T434".city AS city,
"customers_T434".country AS country,
count("customers_T434"."customerNumber") AS "customerCount"
FROM customers AS "customers_T434"
GROUP BY "customers_T434".city, "customers_T434".country
It is important to note that for the operations that one may want to further perform on the AggregatedEntityClass, often the generated SQL will have an extra layer of SELECT.
For example, should one wish to sort the entities resulting from the previous query by the value of the "customerCount" field:
"customers_T495".city AS city,
"customers_T495".country AS country,
count("customers_T495"."customerNumber") AS "customerCount"
FROM customers AS "customers_T495"
GROUP BY "customers_T495".city, "customers_T495".country
) AS "T497"
ORDER BY "T497"."customerCount" DESC
"employees_T529"."employeeNumber" AS "employeeNumber",
"employees_T529"."firstName" AS "firstName",
"employees_T529"."lastName" AS "lastName",
FROM employees AS "employees_T529"
"offices_T532".city AS city,
"offices_T532".country AS country,
"offices_T532"."officeCode" AS "officeCode"
FROM offices AS "offices_T532"
) AS "T534"
ON "employees_T529"."officeCode" = "T534"."officeCode"
Here is a considered earlier example of a query using self-join with a more complicated join condition, which finds all countries having at least two different offices:
Which will translate into SQL similar to this (note that currently DeleteDuplicates in the top-level EntityValue is executed on the Wolfram Language side, so there is no DISTINCT keyword in the SQL query):
SELECT "offices_T571".country AS country
FROM offices AS "offices_T571"
"offices_T574".country AS country_1,
"offices_T574"."officeCode" AS "officeCode"
FROM offices AS "offices_T574"
) AS "T576"
ON "offices_T571".country = "T576".country_1 AND "offices_T571"."officeCode" != "T576"."officeCode"
In this tutorial, subqueries normally means parts of the larger query, which can be expressed using EntityValue. On the SQL side, most of the time this corresponds to the inner SELECT statement with a single field, typically returning a scalar (either because there is a single row or because aggregation is being performed), but sometimes also returning a column (to be used in the IN clause, which would correspond to the use of MemberQ on the Entity framework side).
The following example (already considered before in the section on subqueries, and returning expensive products) represents a simple, uncorrelated subquery:
"products_T583"."productName" AS "productName",
"products_T583"."MSRP" AS "MSRP"
FROM products AS "products_T583"
WHERE "products_T583"."MSRP" >= 0.9 * (
FROM products AS "products_T586"
The following more complex version of the previous query extends each product with the number of products that are within a window of $15 from the current product, in terms of MSRP:
Where the subquery inside EntityFunction that defines the "closelyPricedProductsCount" extended property is a correlated subquery, since the filtering of products now depends on the price of the current product, which has to be referenced from that filtering/aggregation subquery.
This results in a query containing a correlated subquery, where the correlated subquery is in the SELECT list of the inner query and is aliased by "closelyPricedProductsCount", becoming a new extended property:
SELECT count("products_T638"."productCode") AS count
FROM products AS "products_T638"
WHERE abs("products_T638"."MSRP" - "products_T635"."MSRP") <= 15
) AS "T640"
) AS "closelyPricedProductsCount",
"products_T635"."MSRP" AS "MSRP",
"products_T635"."productName" AS "productName"
FROM products AS "products_T635"
) AS "T637"
ORDER BY "T637"."closelyPricedProductsCount" DESC, "T637"."MSRP" DESC
There is currently no automatic optimization of the generated SQL code involving subqueries performed by the Entity framework query compiler (such as attempts to convert them into JOINs etc.). One should be aware of the performance implications of the use of correlated subqueries in the Entity framework, which are similar to those for SQL.
Relations provide a high-level way to perform lookups for properties in entity classes/types related to a given one (related database tables), without performing explicit joins. Their internal implementation can utilize different tools to achieve this goal, such as subqueries and/or joins, but these are hidden from the user.
For an example of the type of SQL that can be generated from queries that use relations, consider the following query that computes, for each office, the maximal number of customers that a single employee in that office deals with (it is also considered later in the last section of this tutorial):
For the current version of the query compiler, the SQL generated for this query may look like this (which is admittedly not very easy to read):
"offices_T652"."officeCode" AS "officeCode",
SELECT max("T657"."customerCount") AS synthetic_prop_20
SELECT count("customers_T658"."customerNumber") AS synthetic_prop_21
FROM customers AS "customers_T658"
WHERE "employees_T655"."employeeNumber" = "customers_T658"."salesRepEmployeeNumber"
) AS "T660"
) AS "customerCount"
FROM employees AS "employees_T655"
WHERE "offices_T652"."officeCode" = "employees_T655"."officeCode"
) AS "T657"
) AS "T662"
) AS "maxEmployeeCustomerCount"
FROM offices AS "offices_T652"
The actual generated SQL for queries using relations might change in the future, as a result of optimizations or change of internal implementation.
The symbolic nature of Entity framework queries allows one to generate such queries programmatically. There are a number of use cases for this capability.
One such use case is to reuse certain parts of the query in more than one place. That can be in either several different queries, or in one and the same query.
One important point to remember is that EntityFunction is HoldAll, and therefore, if one needs to use some part of the query stored in a variable, in the body of EntityFunction, one has to use With (or a similar construct) to inject that query part into the body of EntityFunction.
Reusing inert query blocks in several larger queries can be a common need in practice. Since queries are inert, one can easily do so. In particular, one can store smaller parts of a query in variables and use those variables in larger queries.
However, it can also be used to, for example, get total payments of all customers served by a specific employee:
In some cases, one may need to reuse the same query more than once in one and the same larger query. In SQL, the WITH keyword is used to achieve that. While the Entity framework might acquire native support for this construct in the future, one can emulate it rather easily.
Consider the following query, which computes the total number of times each particular product has been ordered, by aggregating over all orders:
But it can also be used to, for example, find all items that were ordered the most, in which case it will appear twice in the query. Note that in this case, With is used, since totalOrderDetails should be injected into the body of EntityFunction:
One can use the standard Wolfram Language techniques to generate query expressions programmatically.
The following example illustrates this by building an aggregate query that computes various aggregated quantities.
First, generate the AggregatedEntityClass:
This example might be somewhat artificial, but it illustrates the main idea, which is that one can use standard Wolfram Language functions and idioms to automate the query construction process.
Programmatic access to queries and their symbolic nature can also be useful for certain more advanced scenarios, such as nontrivial query transformations or generation.
As an example, consider operator forms for common query building blocks. At present, for various reasons there is no direct support for those, for the core Entity framework query building blocks. Yet users often prefer this style of query writing, and it can in some cases make queries more readable by reducing the amount of nesting in the query.
One can start by defining a new set of symbols, which would serve as proxies for the Entity framework query primitives, but would support operator forms:
The following example query is written in operator style and does the following: selects the five top-paying customers and returns their customer number, customer name and total amount they have paid, sorted in the order of decreasing total amount:
Of course, the symbolic nature of the queries and the excellent Wolfram Language capabilities for symbolic expression manipulations open many more possibilities; this is just one such example.
This section contains brief example-driven coverage of a few practical techniques of query construction, which may be useful to have in one's toolbox.
One of the impediments to building complex queries is that the process may look and feel somewhat similar to writing code in compiled languages, where it is hard to test intermediate results or code pieces, and one has to come up with some complete function or program to compile and test it.
This section is intended to show that this is not really the case for Entity framework queries, and that with proper techniques, one can make the query-building process as interactive as most other activities in the Wolfram Language.
The query to illustrate the technique described in this section should do the following: select the five top-paying customers and return their customer number, customer name and total amount they have paid, in the order of decreasing total amount.
Several steps are shown, which start from a very basic query and gradually build the query of interest, being guided by the intermediate results.
The starting point can be to just look at the contents of the "payments" table/entity type. But for prototyping purposes, it may make sense to take a smaller sample of entities of the "payments" type:
One can see that there typically are several payments for a given customer. To compute the total amount each customer has paid, one therefore needs to aggregate over these values, grouping by the customer number:
Since the customer name is also needed, one way to get it is to use CombinedEntityClass, as follows:
Note that in the last query, in the property list it is important to use the full EntityProperty["customers","customerNumber"], rather than just "customerNumber", since CombinedEntityClass[...] now contains two "customerNumber" properties, and it is necessary to disambiguate which one is really requested.
The next step is to sort the result with respect to the total amount paid, by using SortedEntityClass with a proper specification:
Next, select only the five top customers, by using SampledEntityClass:
This example illustrates how one can gradually build the query, one step at a time, taking each time the query built on the previous step as a starting point and checking the intermediate results.
It is often useful to prototype queries using single entities of a given entity type, before actually constructing them for the entire entity type. This section illustrates this process using a concrete example.
Before building the actual query, get what you can using the top level. The starting point is to see which employees work in this office:
Note that this is a very inefficient approach, which can only make sense in the prototyping stage, since it leads to a large number of queries being executed.
The next step is to construct a query that computes the number of customers for each employee. As a first step, one can hard-code the specific employee number. For example, from the preceding it follows that employee with the number 1165 should have six customers.
Where in the last query, an extra layer of EntityFunction is introduced, which allowed the specific employee entity to be passed into the query rather than its number being hard-coded inside of it.
Now one can test the resulting EntityFunction on all employees of a given office, which produces the result one would expect from the above top-level analysis based on EntityList:
Where the previously constructed query has been used as an inner building block, and also the relation o["employees"] has been used, just as previously inside the EntityList.
Everything is now ready for the construction of the original query of interest. It may look as follows:
Note how using single entities, one can incrementally build nested complex queries, making sure that things work at each level. This particular query technically contains a doubly nested subquery, one level of which is correlated.
This logic also works in reverse: given a query that does not function properly and contains complicated inner structure (involving nested EntityFunction expressions etc.), one can break it into pieces and test inner parts of the query on single entities, to quickly locate and fix the problematic place.
Various errors that might occur during the query execution fall into two broad categories: soft and hard errors.
Soft errors are those that result in normal query evaluation, however not returning the results that would have been produced with a fully correct query.
In which case, Missing["UnknownProperty",…] values are returned for nonexistent property.
Another example of a soft error is when one tries to call EntityList or EntityProperties on a nonexistent type:
In which case, Missing values are also returned.
For the purposes of this tutorial, hard errors are those that return Failure objects. The user-level error-handling may therefore amount to checking EntityValue return values for Failure objects.
One example of such an error is using an invalid property in EntityFunction.
Query compilation fails whenever the body of EntityFunction contains uncompilable parts.
The following query fails because of the presence of a global variable y in the query, which has no value and cannot be compiled to SQL:
The same happens here, but this time because the database-side BesselJ computation is not currently supported:
Another common source of hard errors is when one uses expressions of the wrong type in EntityFunction.
In the following query, the Greater operator cannot take string arguments:
While in the following case, an attempt to add together an integer and a string results in a type error:
Usually, the error message is rather informative in such cases in terms of identification of the cause of the error.
Some types of Wolfram Language expressions are not currently supported by the Entity framework in the context of relational databases.
The following does not fail, however, since here the number has been explicitly converted to the machine-precision double:
Complex numbers are not supported by database backends directly, so the following results in an error as well:
Some operations that take EntityFunction as an argument require specific return types. For example, when used in FilteredEntityClass or CombinedEntityClass, EntityFunction should have a Boolean return type.
In the following example, the return type of the filtering predicate EntityFunction is integer, while Boolean type is required:
Here is an example where the result of EntityFunction semantically is a list of values, the type that cannot be returned from EntityFunction:
It is technically possible to construct an invalid aggregation query using relations, by combining columns from different tables in the same aggregation operation. Such queries fail to compile.
The following query fails since the subtraction is attempted essentially with different columns of different tables, which is not the valid query:
Expressions used to compute aggregated properties in EntityFunction for the case of aggregation must use one of the aggregation functions. It is technically possible to construct queries that do not, in which case they will not compile.
The following query fails, because the body of EntityFunction here has essentially a type of "column" (list of values) rather than a scalar—which looks similar to another error already discussed, but here it is in the context of aggregation:
The following does not fail, since it contains an aggregation function (Total in this case), thereby making the body of EntityFunction be a scalar:
Operational errors are errors that are happening on the database side. While the Entity framework makes an effort to intercept most syntactic, type-related and other errors in the queries early, in some cases this either has not yet been done or might not be easily possible to do.
The following example is an attempt to extend a type "offices" with a new property that is in fact an entity class. Such operations are not currently supported by database-backed entities, and the error in this case happens to occur on the database side:
Most interesting real-world questions require queries that combine several query-building primitives in nontrivial ways. This section illustrates how one can achieve this, with a number of more interesting examples.
The following query extends the "customers" type with a property "totalPaid" that gives the total amount of money paid by that customer and then sorts by "totalPaid" value in descending order.
The same can be achieved, using instead CombinedEntityClass and aggregation with grouping (notice that here one has to add those properties of the "customers" type that one wants to be available after aggregation to the list of properties one uses for grouping. Even though there is a single value of "customerName" that corresponds to specific value of "customerNumber", the query does not know that, unless we tell it this explicitly).
The following version uses CombinedEntityClass:
One way of dealing with this type of problem is to use CombinedEntityClass to combine the types together and then aggregate over the combined type, grouping over some of the properties.
The following query computes the total number of customers served by all employees in the office, for each office. It does this by combining three types: "offices", "employees" and "customers", and then performing aggregation over office code values:
Note that in the last query, it was important to use the longer form EntityProperty["offices","officeCode"] instead of just "officeCode" to disambiguate the property, since CombinedEntityClass["offices","employees","officeCode"] contains two properties with the short name "officeCode": EntityProperty["offices","officeCode"] and EntityProperty["employees","officeCode"] (in this case, either one could be used).
The following more complex version of the preceding query computes the total number of customers served by each office and residing in the country where that office is located. In this case one has to use the full EntityProperty for the "country" property to disambiguate properties, since both types "offices" and "customers" have the property "country":
The following query computes, for each office, the maximal number of customers that a single employee in that office deals with. It relies on relations rather heavily:
Just to appreciate how much work relations are doing for the user here, below is the SQL generated for this query by the current version of the query compiler:
"offices_T652"."officeCode" AS "officeCode",
SELECT max("T657"."customerCount") AS synthetic_prop_20
SELECT count("customers_T658"."customerNumber") AS synthetic_prop_21
FROM customers AS "customers_T658"
WHERE "employees_T655"."employeeNumber" = "customers_T658"."salesRepEmployeeNumber"
) AS "T660"
) AS "customerCount"
FROM employees AS "employees_T655"
WHERE "offices_T652"."officeCode" = "employees_T655"."officeCode"
) AS "T657"
) AS "T662"
) AS "maxEmployeeCustomerCount"
FROM offices AS "offices_T652"
This is another showcase to illustrate the power of relations. The task is to select customers who still owe money—in other words, those for whom the amount of total orders exceeds the amount of total payments to date.
To compute total amount to pay, one can start with all the orders for a given customer, which is given by the relation c["orders"] and is an entity class of type "orders". Since each order may contain multiple items, and each item may be ordered in multiple quantities, the next step is to extend this class with a property "totalToPay", which computes the amount to be paid for each order by reaching to all "orderdetails" entities related to this order and computing the total of the product of item price and ordered quantity. Note that here, relation o["orderdetails"] is used for each order. Then one can do the second aggregation, this time over all the orders for a given customer, to obtain the total amount they should pay.
The computation of the total amount the customer already paid is considerably simpler, since it requires a single relational lookup c["payments"]. The small cutoff 0.00001 is introduced to avoid roundoff errors.
The following query is one possible way to do this and uses correlated subqueries and CombinedEntityClass.
The With in the last example is used for readability and is not required; one could have used a single large query. Note however that With is used with := initialization for scoped variables, so that they get injected into the body of the main EntityFunction without evaluation.
The goal for this example is to compute for each employee the total amount to be paid by the five top-paying customers served by that employee, and sort employees in the order of descending total amount (which can be used as a measure of success for a given employee).
The following query is using relations. It illustrates how one can follow relations and compute aggregates conveniently (like in Total[c["payments"]["amount"]] below). Note that in this case, the use of relations allows one to use data from three different database tables ("employees", "customers" and "payments"), in a concise and economical fashion:
In the preceding example, as also in the example before it, With is used mostly for readability.