Let’s discuss the key development moments that affect the performance, resource intensity, reliability and correctness of application solution.
Many of these moments are already fixed as the standards in the System of standards and techniques of configuration development for the 1С:Enterprise 8 platform. Some of these moments are simply historically established practices of correct or, inversely, incorrect use of platform capabilities.
Work with database
If to try to formulate in short what is the most important during the work with database, then, firstly, this will be the effectiveness of queries with which we access to it, and, secondly, the integrity compliance of those actions which we execute with its data. In addition, there are smaller, but not less important moments. We will also talk about them with respect to work with database.
The basic idea which must be always kept in mind when writing the queries is that the query must be adequate to the solved task.
The query is written in order to receive data from database, the sample. And it should seem that this is not necessary to say, but it is required to select only those data that are required to solve the task. If for some reasons it turns out that the query selects the unnecessary data, it is required to get rid of them and minimize the sample volume. In practice, the code often occurs that firstly select a large amount of data from the base and then, in fact, discards them, does not use. This is inefficient. If only one user works with the base, inefficiency of such query can be noticeable. But when multiple users work with the base, the small inefficiency of each one can develop into the big problems for all of them.
Another point is that often the developers overestimate the capabilities of queries and transfer in them, for example, the logic of entire task execution. As a result, the query becomes complicated. Of course, it will perform its task. But already not so fast as the developer would like.
If, for example, the query contains 15 joins inside which there are the selection operations, OR expressions, etc., then this query will not work knowingly in a normal, effective way. DBMS will not simply understand what to do with this query. The obvious sign of this situation is the fact that in the compiled query plan there is a timeout warning. It means that the optimizer did not have enough time to search the best query plan. This complex query will be poorly executed with a high probability. To say the truth, this problem occurs not for all supported DBMSs equally strongly. But since it is not known in advance what exactly DBMS the user will install, it is required to strive for the fact that the queries would work equally effectively for all supported DBMSs.
The reason of complex query in terms of DBMS may be not only the complexity of data processing by itself contained in the query, but also, seemingly, the harmless developer aspiration to make his query more “readable” and understandable using the subqueries. The subqueries greatly complicate for DBMS the understanding of what the developer wants from it. Therefore, they should not be used in principle without a clear need.
Taking a care of query clarity, the conditions of table join must be written always very simple. In this case, if the fields in these conditions are used that are not in the index, then this join will be ineffective.
A number of tables that participated in the query may also greatly affect its effectiveness. If many tables are used, DBMS will not be able to build for such query the effective plan. The time consumed to build the query plan increases almost exponentially in respect to the number of tables. At some point a time out will occur for the query plan construction, and DBMS will select the plan which it was able to do at this time.
What does it mean “many tables”? It can be even 5-7 tables in a single query. There may be more. It is impossible to say in advance, but certainly the reduction of the number of tables is one of the ways to optimize the query that works ineffectively.
Effective query plans
We have already mentioned several times some “query plan” which DBMS tries to construct. Now it’s time to talk about this in more detail.
When we write the query, we say DBMS what we want to receive and what conditions it must correspond to. DBMS can fulfill our desire in different ways. In explicit form we do not have any opportunities to specify for the DBMS how exactly it should fulfill our desire. DBMS decides this by itself. It constructs the query plan, or simply plan. The plan is a set of physical operators that must be executed by DBMS to receive the requested data. Among these operators there are, for example, such as table scanning, index scanning, selection of some records by index, different join options, etc. Certainly, those who interested in the DBMS work saw what the plan looks like.
From the practical point of view, it may seem that the query work, and what else? Why should we go deep into the “labyrinth” of DBMS and be interested in its plans? But the case is that the execution plan of the same query can be both affective and ineffective. For example, if we have millions of records in the table, then the exhaustive search of one record from this million is very ineffective. DBMS will work slowly. If meanwhile this search is performed when joining with another table (NESTED LOOPS), this can be already a million multiplied by a certain number. Such plan will work terribly long.
Ok, you say, DBMS may select ineffective plan. But we have no opportunities to explicitly tell it what plan to use? Yes, there is no explicit opportunity. But we know that on the base of which data the DBMS optimizer constructs one or another plan. On the base of query text, existing indices and statistics. So, indirectly we can still influence the DBMS to construct an effective plan.
We have already talked a lot about query text, let’s say about one of its features. There are the functions in the query language of 1С:Enterprise. The query language allows applying these functions to the query parameters. For example, like in the example below:
Table.Date = BEGINOFPERIOD(&Parameter, DAY)
It is no goo to do so. Because in this case DBMS cannot effectively optimize the query. It will not understand that what you sent is a constant. Although, it is really a constant. It is evident from the “correct” use of the same function:
Table.Date = BEGINOFPERIOD(&Parameter, DAY)
In the “correct” variant the function value is calculated in the script and already after that is passed as the parameter in the query.
In the “correct” variant the function value is calculated in the script and already after that is passed as the parameter in the query.
It would seem that there is no difference. But many like to write as it written incorrectly. Because it is convenient, there is less script code, because the logic is concentrated in one place. Yes, it is, but it work poorly.
When developing more or less complex queries, one would like to be sure that they use the effective query plans. But how to do that? The only existing way – take the query, execute it and view the plan that came out. No other reliable methods.
Thus, it is required to execute the query for real amount of data and in the real stated of DBMS statistics. Amount of data is important, because if there are, for example, ten records in the table, it will be not hard to select two of them. But if in the real table there are ten thousand records, the to find two of them is not a simple task. The relevance of DBMS statistics, roughly speaking, shows how well it knows how many data are in its tables. If to update the statistics and execute the query, the plan can be good. However, in the real work everything can be different. Because, for example, a massive data input precedes the execution of this query after which the statistics stops to be relevant. As a result, DBMS selects less effective, but more “simple” plan.
Some developers for some simple queries are trying to guess the plan. But usually it is no good. It is not so easy to guess.
How to view the plan? It is possible to adjust the technological log of 1С:Enterprise and view the query plans in it, regardless to the used DBMS.
In addition, each DBMS has for this its own tools. For example, when developing in Microsoft SQL Server, it is possible to use SQL Server Profiler.
As we have seen earlier, the indices are the second important component to make the DBMS optimizer constructing the effecting query plan.
In general, the index is required to accelerate the data selection from the table. The indices in 1С:Enterprise tables is created independently and automatically, at the moment of database restructuring. The developer has no opportunity to specify directly in which tables what indices should be created. However, indirectly we can influence it setting for some attributes the indexing sign to the value Index or Index with additional ordering. It does not mean that 1С:Enterprise will construct the compound index for all the fields for which we have set this property. Or that the index will be only for this single field. The index structure will be enough complex and it will depend on the properties of the configuration object whose attributed we have indexed. It is possible to view the structured of indices for the “live” database using the script function GetDBStorageStructureInfo().
An experience shows that often the indices added by the developers are ineffective. Because the developer has placed them according to some not very obvious reasons, probably to make faster. Bu in fact, it turns out that the application solution uses these indices very rare. Besides, an excessive “diligence” in the placement of indices is too bad. When there many indices, there is a high probability that DBMS will not be able to select among them something suitable, and the query plan will become worse because of that.
The index can be used not as often as you would like. For example, there is a join in a query with another table by some fields. Or there is a selection condition by some fields. Let it be the fields А, B and C. А index in the tables is constructed by the fields B, C and D. This index will not be used. To use this index, it is required to do selection by the fields B and C or by the fields B, C and D. That is, all the fields required for the condition must be in the top index. In this case, DBMS may use the physical index seek operation (INDEX SEEK).
If there is no suitable index in the table, DBMS will select the operation of index scanning (INDEX SCAN) or table scanning. In principle, the table scanning is almost impossible in our tables, because there is a cluster index in all the tables. But it does not change significantly the situation. There is nothing good neither in the cluster index scanning nor in the table scanning. In case when we want to select from the table a small amount of data.
If we want to select, for example, 80% or even 50% or records, then the table scanning is probably the most effective way. Therefore, the presence of operation SCAN by itself in the query plan des not mean that the plan is ineffective. The main thing is to commensurate it with what we want to select.
Also, DBMS may select the index scanning in the case when the index is not suitable for the sample by conditions, but all the necessary fields exist in it. In this situation, DBMS has a choice: to scan the table or to scan the index. The index is usually smaller than the table, so DBMS decides that it is easier to scan it.
Another feature of the index usage is associated not with the fields that participate in the selection condition, but with the fields that we want to select from the table. Suppose we have the catalog table. We are interested in those records in which one of the catalog attributes has a certain value. But we select all the attributes existing in the catalog. That is, almost all the fields existing in each records. Under such conditions, the index is almost guaranteed not to be used. Because DBMS must firstly find the necessary records and then perform for each of found records LOOKUP in the table or cluster index in order to receive all the rest fields. DBMS will do that only in the case if a large number of records will get into the sample. If 10% of the table will get into the sample, it will be guaranteed to do SCAN. Because it likes to do simple things.
Another situation when DBMS may select the table or cluster index scanning is when the table is small by itself. Because if the table contains only a few thousands of records, it is much easier to scan it than to “suffer” with some LOOKUPs.
AS we have mentioned before, there is a cluster index in all the tables of 1С:Enterprise. The table has one cluster index. It is created by the platform and the developer cannot change it in any way. Usually it is an index of primary table key, that is, for the tables of object data (catalogs, documents) this index is by field Reference. For the tables of information register this index is by the measures. If the information register is periodical, then one more field Period is added to the fields of measures, etc.
Such another concept as selectivity is associated with the indices. The index selectivity is a value inverse to the number of records that we may select from the table by condition that uses this index. That is, the smaller percentage of records we can select, the better selectivity. The maximum high index selectivity can be for the primary key.
For example, if we want to select something from catalog by reference, the selectivity will be very large. We will select a single record. On the other hand, if we have an attribute with type of Boolean which we indexed in order to accelerate the receipt of information, this will be very ineffective. Because Boolean has only two values and if they are uniformly distributed in the table, then the half of table will be with the values True and another – with the values False. As a result, DBMS will not use such index, because it will see that it will not receive anything useful from this index. Such index will be used only in case if we want to select, for example, the value False and there are few records with this value. For example, 99% is True and 1% is False.
Therefore, it is not necessary to create the indices using the knowingly non-selective fields. The resources are required to maintain the indices, and if in this case these indices are not used, there will be nothing good in that.
The whole work with data DBMS performs in transactions. Transaction is a group of subsequent operations that can be executed as a whole and successfully or not executed at all. In other words, it is impossible to execute one part of transactions operations and not to execute another one. The most common set of requirements for transactions is a set ACID (Atomicity, Consistency, Isolation, Durability). That is, the transactions must be atomic, consistent, independent and reliable (durable, stable).
Why do we need the transactions? The simplest example is the movement of goods from one warehouse to another. We have to write off the goods from one warehouse and record them as received on another warehouse. If we will break this sequence of operations, execute it no in the transaction, it may turn out that the goods has disappeared in one warehouse but did not appear on another. Or vice versa. Therefore, such sequences of operations must be executed in transactions.
It is not required to open explicitly the transaction in script very often. The most part of necessary handlers is already executed in the transactions that the platform opens by itself. For example, the handlers of object module BeforeWrite, OnWrite, BeforeDelete. Of required, it is possible to open transaction explicitly. To do this, the methods are used BeginTransaction(), CommitTransaction(), RollbackTransaction(). And it is impossible to open the second transaction inside the transaction. Under this action, the internal counter of transactions will be simply increased, but the essence will not change. Therefore, if an exception occurs in the transaction of any “nesting level” or we cancel explicitly the transaction ourselves, then the entire transaction is cancelled, because it is only one.
The concept of locks is associated with transactions. If we read some data in transaction and these data can be changed, we should think about that they might be probably locked. If we will not do this, then the competing transactions can change the same data in parallel with use. As a result, it will turn out, for example, that a single unit of goods is written off two times. That would be wrong.
On the other hand, if the data in transactions do not change or we do not assume their frequent change, or we do not need to receive them in a consistent state with other data, then such data can not be locked. There is no need for this. In any case, inside the transaction we cannot read “dirty data”, because the level of DBMS isolation is high enough and it will not allow doing this. In the worst case, if at the moment of our data read somebody changed these data for a long time, we will get time out on the DBMS lock. But there is nothing wrong in that. And if we added in this place the manageable lock, in fact, we would do nothing useful, but only loaded for nothing the lock manager of 1С:Enterprise. That is, the DBMS lock we would replace with the manageable lock of 1С:Enterprise, which carries no benefit by itself.
Here are a few examples. When conducting the document, it is required to get the prices to generate the records. Suppose the user specified these prices in the catalog interactively. If these prices in the conduction transaction are obtained only once and there is no need to get them in the consistent state with other data, then it is possible not to set the lock before reading the prices.
Another example when during the document conduction it is required to select invoice (batch) to generate the records. In this case, before reading data about invoices, it is required to lock the register of measures in terms of which the selection is performed. Otherwise, several documents will be able to select the same invoice simultaneously.
If we are going to change data in transaction, it is required to make an exclusive lock. Because otherwise we can get deadlock. It is not difficult to get deadlock. Suppose we have two documents that are trying to lock the same resource. Firstly, they lock it successfully using the shared lock and then try to write this resource. During this process, the platform is trying to set an exclusive lock and cannot do that in both sessions. The stat of endless waiting occurs from which nothing good will come. As a result, on of transactions will be completed with an error.
Modifications in 8.3
Speaking about the transactions, it is necessary to make a small remark about the platform release 8.3, Microsoft SQL Server and IBM DB2. Before this release all the read processes executed on these DBMSs outside transaction could lead to such concept as “dirty read”. That is, we had the chances to read data of uncommitted transactions, and this occurred in practice very often. For example, all the reports showed data of non-conducted documents. In release 8.3, if you disable the compatibility mode, “dirty read” will not occur. Therefore, it will be possible to set the shared locks only in the case if the same data are read several times within a single transaction.
Use of continuous transactions
The durability of transaction is very important. It is hard to give the quantitative recommendation, but in any case when we make very long transaction, it is bad. Because everything that happens during transaction is written in the transaction log of DBMS. The most of locks DBMS holds to the end of query, but the locks for data modification are held to the end of entire transaction. In case of Microsoft SQL Server they are held directly in the server memory. And the manageable locks in any case are held in the memory of 1С:Enterprise server. If the versioned DBMSs are used (PostgreSQL, Oracle Database), everything is somewhat different, but we have to ensure an effective work on all supported DBMSs, therefore, it is required to take care about a reasonable reduction of transaction duration.
The longer transaction is executed, the longer the sued resources are occupied. The more disk space is consumed for the transaction log. In case of transaction cancel, it will take more time. Therefore, during the design there is a choice: make one long transaction or divide it into multiple short ones, it is better to select the second option.
Use of dynamic lists
In release 8.2 of platform there is a great ability to write the arbitrary queries in dynamic lists. And the developers use this in their configurations very often. We can say that the abuse it. Of course, on the one hand, there is a desire to output in the list as much useful information as it possible, to show it for the user. On the other hand, this desire leads to the fact that the queries occur that contain up to 20 joins with the use of subqueries. Or course, such query will not work well. Why will it not work well? Because the dynamic list is not magical and all the queries executed by it, if desired, you can write your own.
In its work, the dynamic list uses the cursor queries. It can operate in two different modes and the most effective is the mode of dynamic data read. Provided that the query is acceptable by itself, is not overloaded by the joins and restrictions at the records and database field level. In this mode the dynamic list reads the chunk of data, stores the first and the last row. If the user scrolls the list, the next chunk of data is read. With conditions where the values of sort fields are larger or less than previous ones. Depending on whether forward or backward the scroll is performed.
To make this mode fine operating, there must be an index for the table from which the selection is performed, and this index must fit the selection and sort criteria. If the selection and sort by some fields is used, then you should to enable for it Index with additional ordering. If there is no suitable index in this mode, this query “is dumped” in SCAN and the dynamic list starts to work very slowly. And the convenience to use such list is very low. Therefore, you should try to make the queries of dynamic list to be simple.
Another operating variant of dynamic list is when the dynamic data read is disabled, but the basic table is specified. This mode is quire rare, it is used, for example, in the selection filters. In fact, this is also the cursor selection, but by 1000 records stored in the buffer on the server. And the client receives data already from the buffer. This is less efficient than the dynamic read.
And the last operating variant, when there is no both dynamic read and basic table. In this mode the list begins in fact to read table as a whole accumulating it in the buffer. It is clear that this will not work effectively.
Support of different DBMSs
Another reminder concerns the support of different DBMSs. The platform supports five different DBMSs: file-mode, Microsoft SQL Server, IBM DB2, PostgreSQL and Oracle Database. Thereafter, the application solutions must strive to this, however, in practice, the same code does not work always fine in different DBMS. Ideally, we would like to test the applications on all listed DBMSs. But in practice, it is not always possible. Therefore, it is very advisable to check the work of application solution at least on the file variant, because it is very widespread by itself, as well as on one or two third-party DBMSs.
to be continued..