In this paper, the consistency of storage format is called format consistency. Semantic expression refers to the way an object is described: because of synonyms, the same object can be expressed in different ways without any of the expressions being wrong. A semantic dictionary is an effective way to resolve the problems caused by inconsistent semantic expressions. Yao proposed a hashing method to ensure semantic consistency for cross-modal retrieval [23].
The consistency of semantic expression is called semantic consistency. The value of a datum represents the measured result of a physical quantity. Data values are inevitably affected by many factors during measurement. Some impacts come from human factors, including reading errors, recording mistakes, and operational errors; others are objective factors, such as the precision limits of experimental equipment and the differences among tested objects.
The consistency of value is called value consistency. It should be clear that different values in scientific data may simply reflect different objective conditions and phenomena; a difference in value does not by itself constitute an error. Data is the general name for numbers and letters that carry a certain significance. For further analysis, the concept of data is divided into three types: atomic data, data unit, and data set. Atomic data. Atomic data is the smallest item with independent meaning [24]; it cannot be divided into smaller meaningful items. An atomic datum includes not only the value but also its physical unit.
Data unit. A data unit is a combination of atomic data that describes the complete meaning of a phenomenon. It consists of one or more data items and has a definite meaning only when those items are taken together; in the physical sense, a data unit cannot be divided further. Data set. A data set is a set of data units; it contains one or several data units, each of which is called an element of the data set. To make these concepts more concrete, some material creep testing data are taken as an example, shown in Table 1.
Material creep refers to the slow plastic deformation of a material under long-term constant temperature and constant stress. Creep testing characterizes a material's stress performance and service life at a given temperature, especially at high temperature [25]. A typical creep test applies a fixed stress to a specimen at a fixed temperature.
The testing result is the rupture time of the specimen. In Table 1, each cell is an atomic datum. By definition, a data unit must represent a complete physical meaning. Since the creep performance of a material must be described by stress, temperature, and time together, every row in Table 1 can be regarded as a data unit.
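As an illustration (not part of the paper's implementation), the three data concepts can be sketched in Python; the class names and the creep values below are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AtomicData:
    """Smallest item with independent meaning: a value together with its physical unit."""
    value: float
    unit: str

# A data unit: atomic data items that only together carry a complete meaning.
# This one models a single creep-test observation (one row of Table 1);
# the numbers are made up for illustration.
creep_unit = {
    "stress": AtomicData(100.0, "MPa"),
    "temperature": AtomicData(600.0, "degC"),
    "rupture_time": AtomicData(1200.0, "h"),
}

# A data set is a collection of data units; each unit is one of its elements.
data_set = [creep_unit]
```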
By definition, an atomic datum that carries a complete physical meaning is also a data unit: for some simple physical quantities, a single atomic datum can completely represent the physical meaning and can therefore be regarded as a data unit. Several data units constitute a data set; each row (data unit) in Table 1 is an element of the data set. Data consistency. Data consistency is the characteristic that contradictory conclusions cannot be derived from the given data.
Data consistency is characterized by defining a set of constraints. Since atomic data are simple and easy to distinguish, this paper does not discuss them at length; it mainly studies the consistency of data units and data sets. From the above definitions and analysis, several properties of data consistency can be obtained. The proofs are omitted for reasons of space.
Data consistency is reflexive: any data unit A is consistent with itself. Data consistency is symmetric: if data unit A is consistent with B, then B is also consistent with A. Data consistency is partially transitive: complete consistency and strong consistency are transitive, while weak consistency and conditional consistency are not. To describe the degree of consistency between two data units, the concept of consistency degree is proposed. Consistency degree. Consistency degree is a measure that quantifies the degree of consistency between two data; the higher the consistency degree, the more consistent the two data are.
Since data inconsistency arises from storage formats, semantic expressions, and numerical values, after defining the degree of consistency, a quantification method is proposed to assess the degree of data consistency quantitatively. Here, the idea of a vector can be applied [26]: the consistency degree is quantified by a three-dimensional vector C = (Cv, Cs, Cf), where Cv, Cs, and Cf represent the consistency degree of the data value, the semantic expression, and the storage format, respectively.
To express this clearly, the value of each dimension is specified as an integer between 0 and 9; the bigger the value, the more consistent the data. The detailed quantification theory and the calculation methods for Cv, Cs, and Cf are given below. Assuming there are two data units and each unit contains m items, the data units can be regarded as two points in an m-dimensional space, and the distance between the two points describes the deviation between the two data units [27].
Here, d_ij denotes the deviation between data unit i and data unit j. The deviation must satisfy d_ij >= 0, with equality if and only if the m variables of the two units are equal; the smaller the deviation, the closer the two units. The deviation is computed as the Euclidean distance

d_ij = sqrt( sum_{k=1}^{m} (x_ik - x_jk)^2 ),   (2)

where x_ik and x_jk denote the k-th items of data units i and j.
Here, m is the total number of items in the data unit. The quantified Cv and the corresponding deviation ranges are shown in Table 2. Semantics indicates the meaning of a word, and one meaning can be expressed by different words or styles; lexical similarity is therefore used to compare two semantic expressions. This paper adopts a method of calculating lexical similarity based on a synonym word forest [28]. WordNet and Tongyici Cilin are available semantic dictionaries for English and Chinese, respectively [29, 30]. Cs is then determined from the correspondence between lexical similarity ranges and Cs values shown in Table 3.
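To make the Cv quantification concrete, here is a minimal Python sketch; since Table 2 is not reproduced in this excerpt, the deviation thresholds below are illustrative assumptions, not the paper's actual ranges:

```python
import math

def deviation(unit_i, unit_j):
    """Euclidean distance between two data units viewed as points in m-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(unit_i, unit_j)))

# Illustrative thresholds standing in for Table 2 (not reproduced here):
# a smaller deviation maps to a higher Cv score.
CV_THRESHOLDS = [(0.0, 9), (0.01, 8), (0.05, 7), (0.1, 6), (0.2, 5),
                 (0.3, 4), (0.4, 3), (0.6, 2), (0.8, 1)]

def quantify_cv(d):
    """Map a deviation to an integer Cv score in 0..9."""
    for bound, cv in CV_THRESHOLDS:
        if d <= bound:
            return cv
    return 0
```

Two identical data units have deviation 0 and therefore receive the maximum score Cv = 9.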
The calculation of Cf is based on the difference between storage formats. First, the value of Cf for two identical storage formats is 9. To quantify the value for differing formats, rules are defined for structured, semistructured, and unstructured data; unstructured storage formats include documents, images, audio, video, and so on. The rules for quantifying Cf are defined as follows; the details are shown in Figure 1. According to the consistency degree, different grades of consistency can be derived for various applications.
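A minimal sketch of the Cf rules follows; only the score 9 for identical formats comes from the text, while the category table and the scores for differing formats are assumptions standing in for the rules in Figure 1:

```python
# Coarse format categories; the detailed rules of Figure 1 are not reproduced,
# so the scores for differing formats below are illustrative assumptions.
CATEGORY = {
    "csv": "structured", "sql": "structured",
    "json": "semistructured", "xml": "semistructured",
    "pdf": "unstructured", "image": "unstructured",
}

def quantify_cf(fmt_a, fmt_b):
    """Map a pair of storage formats to an integer Cf score in 0..9."""
    if fmt_a == fmt_b:
        return 9          # identical storage format (the rule stated in the text)
    if CATEGORY[fmt_a] == CATEGORY[fmt_b]:
        return 6          # same category, different format (assumed score)
    return 3              # different categories (assumed score)
```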
According to the influence on applications and general presentation customs, data consistency is divided into complete consistency, strong consistency, weak consistency, and conditional consistency. For ease of description, only the relationship between two data units is discussed here; more advanced relationships are described in Section 2. Complete consistency. Two data units are completely consistent if their semantic expressions, storage formats, and data values are all exactly the same.
In other words, if the value of C is (9, 9, 9), the two data are completely consistent. Data from different sources can be considered reliable if they are completely consistent. Completely consistent data are not common among experimental results; usually, completely consistent data come from the same source.
Strong consistency. Two data units are strongly consistent if their semantic meanings and data values are the same; this is also called semantic consistency. Two strongly consistent data units may use different semantic expressions, but strong consistency requires that they have the same value. Strongly consistent data may come from the same source but have undergone different processing. Weak consistency. Two data units are weakly consistent if their values deviate from each other within a certain range. Two weakly consistent data units may use different storage formats and semantic expressions. Weak consistency is common in materials science.
Test data are influenced by various factors, and data collected from different sources are obtained under different parameters, so the trends of similar data can be compared by collecting data for the same kind of material and the same performance [31]. Conditional consistency. Two data units are conditionally consistent if their values meet the requirements of predefined conditions. Conditional consistency is associated with a specific application; conditions can involve experimental parameters, equipment, data models, and so on. Usually, conditional consistency requires a descriptive condition or a threshold, such as an absolute error or a relative error.
Here, creep testing data of materials is taken as an example of conditional consistency. Experiments should be repeated many times in order to reduce error; the creep test data are consistent under this condition only when the resulting creep curves remain consistent with one another. Based on the above definitions and analysis, the ranges of values for the vector C and the corresponding consistency grades can be derived, as shown in Table 4. Besides the consistency of two atomic data, there are more advanced consistency relationships: between two data units, between a data unit and a data set, and between two data sets.
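The mapping from the vector C = (Cv, Cs, Cf) to a consistency grade can be sketched as follows; since Table 4 is not reproduced in this excerpt, the numeric thresholds are illustrative assumptions:

```python
def consistency_grade(cv, cs, cf, condition_met=False):
    """Map a consistency vector (Cv, Cs, Cf) to a grade.
    The thresholds below are illustrative stand-ins for Table 4."""
    if cv == 9 and cs == 9 and cf == 9:
        return "complete"        # identical value, expression, and format
    if cv == 9 and cs >= 7:
        return "strong"          # same value and meaning; format may differ (assumed Cs bound)
    if cv >= 5:
        return "weak"            # bounded value deviation (assumed Cv bound)
    if condition_met:
        return "conditional"     # an application-defined condition holds
    return "inconsistent"
```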
When a data set is composed of multiple data units, the relationship between one data unit and one data set needs to be considered.
The relationship between one data unit and one data set is defined on the basis of the relationship between data units, and is likewise divided into four categories: complete consistency, strong consistency, weak consistency, and conditional consistency. Here, data set B is assumed to be internally consistent, that is, each pair of elements in B is consistent. Complete consistency between a data unit and a data set. If there is an element b_i that is completely consistent with a, the relationship between data unit a and data set B is complete consistency.
Strong consistency between a data unit and a data set. If there is an element b_i that is strongly consistent with a, the relationship between data unit a and data set B is strong consistency. This means that the storage formats and semantic expressions of a and b_i are allowed to differ; after format conversion and semantic conflict processing, the values of a and b_i are exactly the same.
Weak consistency between a data unit and a data set. If there is an element b_i that is weakly consistent with a, the relationship between data unit a and data set B is weak consistency. Weak consistency and strong consistency between data units and data sets are similar: the storage formats and semantic expressions of data unit a and data set B are allowed to differ, but after format conversion and semantic conflict processing, the numerical deviation between a and b_i must lie within the error range defined by the user.
Conditional consistency between a data unit and a data set. The prerequisite of conditional consistency is that all data units in data set B follow the same rule, such as being fitted to a certain shape.
If the distance between data unit a and the fitted shape is within the user-defined threshold, the data unit and the data set are called conditionally consistent. The consistency relation between two data sets is defined on the basis of the consistency between a data unit and a data set, and is likewise divided into four categories: complete consistency, strong consistency, weak consistency, and conditional consistency.
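The unit-to-set relationships (complete, strong, and weak consistency between a data unit and a data set) can be sketched as a small Python function; unit_grade is a hypothetical pairwise grading function, and the shape-fitting check for conditional consistency is omitted:

```python
def unit_set_grade(a, B, unit_grade):
    """Consistency between data unit a and data set B, given a pairwise
    grading function unit_grade(a, b) that returns one of
    'complete' | 'strong' | 'weak' | 'inconsistent'."""
    grades = [unit_grade(a, b) for b in B]
    # Per the definitions, one matching element b_i suffices; the strongest
    # achievable relationship determines the grade.
    for g in ("complete", "strong", "weak"):
        if g in grades:
            return g
    return "inconsistent"
```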
Here, we also assume that data sets A and B are internally consistent, that is, each pair of elements within A and within B is consistent. Complete consistency between data sets. When every data unit a_i in A has a completely consistent counterpart in data set B, the relationship between data set A and data set B is called complete consistency. This requirement is strict: as soon as one data unit in A is not completely consistent with any data unit in B, the two data sets are not considered completely consistent.
In fact, if two data sets are completely consistent, one data set must be a subset of the other. Strong consistency between data sets. When only two kinds of relationship, complete consistency and strong consistency, occur between the units a_i and b_j, A and B have a strong consistency relationship.
Generally speaking, when data sets A and B are strongly consistent, two situations can be distinguished: either all data units in A are strongly consistent with those in B, or some data units in A are strongly consistent with those in B and the rest are completely consistent. Weak consistency between data sets. As long as there is a data unit in data set A that is only weakly consistent with the corresponding data unit in B, the relationship between the two data sets is weak consistency.
When data sets A and B have weak consistency, the following situations can be distinguished: in the first, both complete (or strong) consistency and weak consistency occur between the data units of A and those of B; in the second, there is only weak consistency between the data units of A and B. Conditional consistency between data sets. Conditional consistency of two data sets means that all data units in data set A can be combined with the data units in B to form one unified shape.
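The set-to-set definitions can likewise be sketched; as before, unit_grade is a hypothetical pairwise grading function, and conditional consistency (which requires fitting a shape) is omitted:

```python
ORDER = {"complete": 3, "strong": 2, "weak": 1, "inconsistent": 0}

def best_grade(a, B, unit_grade):
    """Strongest relationship that unit a achieves against any element of B."""
    return max((unit_grade(a, b) for b in B), key=ORDER.get)

def set_set_grade(A, B, unit_grade):
    """Consistency between data sets A and B, following the definitions above:
    complete if every unit of A has a completely consistent counterpart in B;
    strong if only complete/strong relations occur; weak as soon as some unit
    of A is only weakly consistent with B."""
    bests = [best_grade(a, B, unit_grade) for a in A]
    if any(g == "inconsistent" for g in bests):
        return "inconsistent"
    if all(g == "complete" for g in bests):
        return "complete"
    if all(g in ("complete", "strong") for g in bests):
        return "strong"
    return "weak"
```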
The conditional consistency of two data sets means that the data in the two sets obey the same rule. The fragments accessed by a transaction are all assumed to be independent, which is not the case in the real world. This method neglects site information such as storage and processing capacity, and it applies only to a WAN. Son et al. Our previous work in this field dealt with components and tools concerning DDB design [18].
In the latter work, we analyzed and implemented diverse methods to tackle combinatorial optimization problems in distribution design, which are very complex. These methods include exact and heuristic approaches that have been very useful in solving real-life problems. The main issue with exact methods is their applicability to large problems, specifically for the type of NP-complete problems for which there is no guarantee of finding an optimal solution in polynomial time.
A good alternative for large NP-complete combinatorial optimization problems is to find a reasonable solution in a reasonable time. This is the idea behind heuristic methods, which are in general quite simple and based on intuitive, common-sense ideas. The general problem with many heuristics is that they may get stuck in locally optimal solutions. More recently, a number of metaheuristics have evolved that define ways to escape local optima. Metaheuristics are higher-level heuristics designed to guide other processes towards reasonable solutions; they do not in general guarantee an optimal solution, though some of them come with convergence theories.
However, they have been successfully applied to many problems. Here we explore Genetic Algorithms and a much more recent approach, Reinforcement Learning (RL), for solving the harder problem, namely allocation. RL may be interpreted as a conjunction of machine learning and decision-making problems.
The problem of DDB design comprises, first, the fragmentation of database entities and, second, the allocation of these fragments to distributed sites. Two approaches are possible in DDB design: top-down and bottom-up. This paper uses the top-down approach, where the input to the design process is the global conceptual schema (GCS). Statistical information collected from the design activities includes the access patterns of user applications and information about sites and the network.
The output from the design process is a set of local conceptual schemas (LCSs) over the distributed sites. The input to the design process is obtained from system requirements analysis, which defines the system environment and collects an approximation of both the data and processing needs of all potential database users. Providing an easy user interface for entering the distribution requirements, as well as facilitating user control in driving the distribution process, are topics that we addressed when implementing an integrated tool to support the entire DDB design cycle.
Ceri et al. Figure 1: Distribution design activities. As a result, we have created a DDB design tool that integrates various methods for each component of distribution design. Figure 2: Application integration through the design process. Figure 2 shows an abstract representation of the integrated design process for the tool, where information flows from one tool to another by feeding the output of one application as input to the next, though not exactly in a linear fashion.
Rather, the common information is stored in a shared database, namely the design catalogue, which can be accessed by each tool through one common interface and is embedded within the integrated tool. Unfortunately, collecting the large amount of required information is a hard task and requires time and effort. The idea is to provide suitable data through a catalogue as part of a workflow (see Figure 2).
Preceding steps in the workflow fetch data from the catalogue and use them as inputs to the algorithms involved in the design process.
Outputs are placed back into the catalogue so that they can be used in further steps. Unfortunately, the risks of application integration are often disregarded: because integration creates dependencies between applications, it may reduce the ability to adapt to changes. On the other hand, dependencies are beneficial because they save considerable time and effort.
Figure 3: Architectural overview of the integrated tool. This is the central component of the tool. The integrated user interface is responsible for activating any needed tool through the design process workflow. It provides appropriate modules for configuring any aspect of the integrated tool (see Figure 4). The design process is logically iterative and exploratory. This component provides appropriate features for the definition and redefinition of global conceptual schemas with a variety of constructs from the Extended Entity-Relationship Model, using the notation from the literature.
Additionally, this component can check schemas for correctness by means of structural validations: uniqueness of names, use of identifiers, etc. When the conceptual schema is correct, a visualization of the logical schema by means of relations is produced and a script for relation creation is generated. ERECASE was implemented with an easy user interface for collecting schema information, as well as user control in driving the design process (see Figure 5). The tool was implemented with an easy user interface for collecting the distribution requirements, as well as user control in driving the design process (see Figure 6a).
This tool provides the user with advanced features for obtaining supplementary information about transactions relevant to the distribution, and it produces a number of reports presenting the required parameter values of each node in any given simulation.
Because of space restrictions, more detailed views are not given. Relation instances are essentially tables, so the issue is one of finding alternative ways of dividing a table into smaller ones, called fragments. Three fragment types are defined on a database entity. Horizontal fragmentation is the breaking up of a table into a set of horizontal fragments containing only subsets of its tuples [3, 22, 26].
Vertical fragmentation is the breaking up of a table into a set of vertical fragments containing only subsets of its attributes [5, 27, 28, 29, 30]. Hybrid (also called mixed) fragmentation is the breaking up of a table into a set of hybrid fragments containing both subsets of its tuples and subsets of its attributes. A great deal of research has been published on fragmentation and allocation in the relational data model [3, 4, 27, 31] (see Figure 7).
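Horizontal and vertical fragmentation can be illustrated with a toy relation in Python; the table contents and attribute names are invented for the example, and hybrid fragmentation is simply a composition of the two operations:

```python
# A small relation represented as a list of row dicts (illustrative data).
employees = [
    {"id": 1, "name": "Ann",  "dept": "A", "salary": 50},
    {"id": 2, "name": "Bob",  "dept": "B", "salary": 60},
    {"id": 3, "name": "Carl", "dept": "A", "salary": 70},
]

def horizontal_fragment(rows, predicate):
    """Horizontal fragment: the subset of tuples satisfying a predicate."""
    return [r for r in rows if predicate(r)]

def vertical_fragment(rows, attrs):
    """Vertical fragment: a subset of attributes; the key 'id' is kept
    so that fragments can later be rejoined."""
    keep = {"id"} | set(attrs)
    return [{k: r[k] for k in r if k in keep} for r in rows]

dept_a = horizontal_fragment(employees, lambda r: r["dept"] == "A")
names = vertical_fragment(employees, {"name"})
```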
While fragmentation is an important issue, our main concern in this section is how the data should be allocated around the network once it has been partitioned by whatever criteria. Data allocation is a critical aspect of DDBSs: a poorly designed data allocation can lead to inefficient computation, high access costs, and high network loads [3, 32], whereas a well-designed data allocation can enhance data availability, diminish access time, and minimize the overall usage of resources [3, 33]. It is thus very important to provide DDBSs that find a good solution in a reasonable amount of time, achieving data allocations that minimize the cost of answering the given queries.
This section addresses the problem of determining where to place a given set of fragments on a network in order to minimize the cost of answering a given set of queries Q. We assume that fragmentation of the original relations has been carried out before the data allocation phase. Since the allocation problem is highly complex and involves combinatorial optimization, the tool implements Genetic Algorithms and a Q-Learning method for mapping fragments to sites. The outputs of these methods can be compared and then selected for materialization.
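As a rough sketch of the Genetic Algorithm side (the cost model below is deliberately simplified and is not the tool's actual model), an allocation chromosome can assign each fragment to a site and evolve towards a lower query-answering cost:

```python
import random

def allocation_cost(assign, access, unit_cost):
    """Total cost of answering the queries: for each (query_site, fragment)
    access, pay the network cost between the query's site and the site
    holding the fragment. assign[f] is the site of fragment f."""
    return sum(unit_cost[site][assign[frag]] for site, frag in access)

def genetic_allocation(n_frags, n_sites, access, unit_cost,
                       pop=30, gens=200, seed=0):
    """Minimal GA for fragment allocation (requires n_frags >= 2)."""
    rng = random.Random(seed)
    population = [[rng.randrange(n_sites) for _ in range(n_frags)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda ind: allocation_cost(ind, access, unit_cost))
        survivors = population[:pop // 2]          # truncation selection (elitist)
        children = []
        while len(survivors) + len(children) < pop:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_frags)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:                 # mutation: move one fragment
                child[rng.randrange(n_frags)] = rng.randrange(n_sites)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda ind: allocation_cost(ind, access, unit_cost))
```

With a cost matrix that makes local access free, the GA converges on placing each fragment at the site of the queries that use it.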
Data replication is a key technology in distributed systems that enables higher availability and performance. Physical designs are completed by generating scripts that call Transact-SQL stored procedures. This paper has outlined the issues involved in conceptual design, fragmentation, and allocation in a DDBS, and proposed a novel integrated tool for aiding designers in initial distributed database designs. The contribution of this work is the implementation of the integrated tool, built from a variety of applications and methods for performing distributed database designs.
The algorithms necessary to support the design process are implemented, and their complexities are polynomial. A description of the architecture and functions is also provided. The utility of this tool is clear-cut. Unfortunately, many design parameters need to be entered by designers, and their estimation is sometimes difficult.
At this moment we are working on the integration of several algorithms for this allocation problem, specifically Q-Learning, Genetic Algorithms, Bird Flocks, and some other tools developed by our research group. The main goal is to help in the design of distributed databases in a more efficient way, using less effort and time.

References:
Ezeife, K. Distributed and Parallel Databases.
Karlapalem, Navathe, et al. Issues in distribution design of object-oriented databases. In: Dayal, Valduriez (eds.), Distributed Object Management. Morgan Kaufmann.
Principles of Distributed Database Systems, 2nd ed. Upper Saddle River, New Jersey.
Navathe, Karlapalem, et al. A mixed fragmentation methodology for initial distributed database design. College of Computing, Georgia Institute of Technology, Atlanta, Georgia.
Tamhankar, S.
Peddemors, et al. In: Sloot, Bubak, Hertzberger (eds.).
Bellatreche, Karlapalem, et al.
Lee, Park, Jang, Huang, et al. Journal of Information Science and Engineering.
Son, et al. Journal of Systems and Software.