Thursday, April 21, 2011

Some Cloud Computing Information

Cloud computing

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

The essential characteristics of cloud computing are:
1. Broad network access – capabilities are available over the network through heterogeneous thin and thick client platforms.
2. On-demand self-service – computing capabilities can be provisioned with little or no human interaction.
3. Resource pooling – the provider's resources are pooled to serve multiple consumers using a multi-tenant model.
4. Rapid elasticity (fast scaling) – capabilities can be scaled out and back in quickly.
5. Measured service – resource usage can be monitored, controlled and reported, providing transparency for both the provider and the consumer of the utilized service.

Cloud Deployment Models
Private cloud – The cloud infrastructure is operated solely for a single organization. It may be managed by the organization or a third party and may exist on premises or off premises.
Community cloud – The cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns. It may be managed by the organizations or a third party and may exist on premises or off premises.
Public cloud – The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
Hybrid cloud – The cloud infrastructure is a composition of two or more clouds (private, community or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability.

The Cloud Service Models
1. Cloud Software as a Service (SaaS) – The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, storage, or even individual application capabilities. Think of it as on-demand applications.
2. Cloud Platform as a Service (PaaS) – The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications built using programming languages and tools supported by the provider.
3. Cloud Infrastructure as a Service (IaaS) – The capability provided to the consumer is to provision processing, storage, networks and other fundamental computing resources, where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications.

Thursday, May 6, 2010

ETL VS ELT

The order in which the E, T and L steps are applied differs according to the needs of the application, so the terms ETL and ELT as used by IT vendors do not really differentiate tools; they differentiate the application architecture and how the steps are arranged, since ETL and ELT tools do almost the same work.

ETL (Extract, Transform and Load) - is software that transforms and migrates data on most platforms with or without source and target databases.

ELT (Extract, Load and Transform) - is software that transforms and migrates data inside a database engine, often by generating SQL statements and procedures and moving data between tables. It is largely driven by RDBMS vendors, and such tools have tended to be suitable for just one database platform.
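To make the ELT pattern concrete, here is a minimal, hypothetical sketch using plain JDBC: the extracted data is assumed to be already bulk-loaded into a staging table, and the transformation is expressed as generated SQL that the database engine executes, moving data between tables. The connection URL, credentials and table/column names are made up for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EltSketch {
    public static void main(String[] args) throws Exception {
        // "E" and "L": assume the extracted data has already been bulk-loaded into STG_SALES.
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://dwhost/dw", "etl_user", "secret");
             Statement stmt = con.createStatement()) {

            // "T": the transformation runs inside the RDBMS as generated SQL,
            // moving data from the staging table into the target table.
            String transform =
                "INSERT INTO FACT_SALES (customer_id, sale_date, amount_usd) " +
                "SELECT s.customer_id, CAST(s.sale_ts AS DATE), s.amount * fx.rate_to_usd " +
                "FROM STG_SALES s JOIN FX_RATES fx ON fx.currency = s.currency";
            stmt.executeUpdate(transform);
        }
    }
}

An ETL tool, by contrast, would pull the rows out of the source, transform them in its own engine, and only then write them to the target.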

New variations of this terminology keep appearing; you may also hear of ETLT (Extract, Transform, Load and Transform) in the future.

Pros of ETL:
1. It can balance the workload with the RDBMS.
2. It can perform more complex operations.
3. It can scale with separate hardware.
4. It can handle partitioning and parallelism independent of the data model, database layout, and source data model architecture.
5. It can process data in-stream, as it transfers from source to target.
6. It does not require co-location of data sets in order to do its work.
7. It captures a large amount of metadata lineage today.
8. It can run on SMP or MPP hardware.
DATA QUALITY: ETL tools have a head start over ELT in terms of data quality integration. The row-by-row processing method of ETL works well with third-party products such as data quality or business rule engines.
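As a rough illustration of that row-by-row style (the row representation and the rules below are hypothetical, not taken from any particular tool), each record passes through a validation hook before it is written to the target, which is exactly where a data quality or business rule engine can plug in:

import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class RowByRowEtl {
    // Pluggable data quality rules, evaluated per row.
    static final List<Predicate<Map<String, String>>> QUALITY_RULES = List.of(
            row -> row.getOrDefault("email", "").contains("@"),   // basic validity rule
            row -> !row.getOrDefault("amount", "").isBlank()      // completeness rule
    );

    static boolean passesQuality(Map<String, String> row) {
        return QUALITY_RULES.stream().allMatch(rule -> rule.test(row));
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("email", "a@b.com", "amount", "10.00");
        if (passesQuality(row)) {
            System.out.println("row accepted: " + row);   // transform and write to target
        } else {
            System.out.println("row rejected: " + row);   // route to an error/reject file
        }
    }
}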

Pros of ELT:
1. It leverages the RDBMS engine hardware for scalability.
2. All data remains in the RDBMS at all times.
3. Disk I/O is usually optimized at the engine level for faster throughput.
4. All of its capability, however, is limited to what the RDBMS/MPP platform provides.

What really differs between the two is the application architecture and the technology used; timelines and other aspects depend on the needs of the application and its usage.

Thursday, April 29, 2010

Financial Data Models

What is a financial data warehouse model?

A financial data warehouse model is a predefined business model of the bank. It consists of entities and the relationships between these entities. Because it is geared towards use in a data warehouse environment, most of these models also include special entities for aggregation of data and hierarchies.

What objects do they consist of?
The most common objects included in such a model are:
Involved party – a hierarchy which includes both organizations (including the bank's own organization) and individuals;
Product – a hierarchy that not only consists of products but also of services and their features;
Location – the address of a party;
Transaction – a transaction made by the client;
Investments and accounting – positions, balances and transaction accounting environment;
Trading and settlement – Trading, settlement and clearing compliance and regulation;
Global market data – Issue information, identifiers, FX rates and corporate actions and statistics;
Common data – Calendars, time zones, classifications.

Using these objects you can create the most typical banking reports, for instance customer attrition analysis, wallet share analysis, cross sell analysis, campaign analysis, credit profiling, Basel II reporting, liquidity analysis, product profitability, customer lifetime value and customer profitability.
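Purely as an illustration of what such objects look like (this is a made-up sketch, not an excerpt from any vendor model), a few of the objects listed above could be represented as entities with relationships:

import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.List;

class InvolvedParty {              // an organization or an individual in the party hierarchy
    String partyId;
    String name;
    Location address;              // Location: the address of a party
    List<Transaction> transactions;
}

class Location {
    String street;
    String city;
    String country;
}

class Product {                    // a product, service or feature in the product hierarchy
    String productId;
    String name;
    Product parent;                // hierarchy link
}

class Transaction {                // a transaction made by the client
    String transactionId;
    LocalDate bookingDate;
    BigDecimal amount;
    String currency;
    Product product;
}

The real vendor models contain a great many such entities, plus aggregation and hierarchy structures, which is part of both their strength and their overhead.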

Who are the main providers of these models?

There are several main vendors of these models:
· IBM - Banking Data Warehouse Model;
· Financial Technologies International (FTI) – StreetModel;
· Teradata – Financial Management Logical Data Model;
· FiServ – Informent Financial Services Model;
· Oracle – Oracle Financial Data Model.

I am not going into the details of these models here; they may be found on the vendors' websites.

If you are starting a small data warehouse project to model part of the bank, these models are probably not the way to go. The advantage of having a proven ready-to-use model does not outweigh the disadvantages of the high investment, customization to your situation and training that you will need.

If you are thinking of starting a series of data warehouse initiatives that has to lead to a company-wide data warehouse, these models might accelerate the creation of this data warehouse environment considerably. Still, the disadvantages mentioned below will apply.



Advantages of financial models

1. They are very well structured, and they are very extensive;
2. They can be implemented quickly and facilitate re-use of models;
3. They are created using proven financial knowledge and expertise;
4. They create a communication bridge between IT specialists and banking specialists;
5. They facilitate the integration of all financial processes;
6. They support multi-language, multi-currency environments;
7. They come with an extensive library of documentation and release guides;
8. They can be implemented on a variety of platforms;
9. They are completely banking-specific.

Disadvantages of financial models

The following are disadvantages of most, but not all, financial models:

1. You will need to put a lot of effort into aligning your model of the bank with these extensive, detailed financial models;
2. They are based in general on banking in the United States or the United Kingdom, which might differ from banking in the Netherlands or elsewhere in Europe;
3. All banking terms are defined, but does your bank agree with these definitions?
4. You will have to have people working for you who have in-depth knowledge of both the banking processes and the model you use. These people are very scarce, and flying in consultants from one of the model's vendors might be a costly business. Vendors who pretend you don't need a lot of knowledge to use these models are not telling the truth;
5. Because of the interrelated nature of the models and their extensiveness, you will sometimes have to fill objects that you don’t have the data for. This will result in some workaround that is not desirable;
6. You buy the whole model, not pieces of it. For a small project this will result in a lot of overhead and extra cost;
7. Every model comes with a certain set of tools that someone in the department will have to learn;
8. New versions of the models come with changes in the model, which you have to examine to find out the impact on your current models.

The choice of which financial model to use is a difficult one. The models presented are all extensive and could all fit the bill. For particular situations some extensive research will be needed. Hardware and software standards and cost will surely have an impact on this decision.

Wednesday, April 14, 2010

Process - ITIL. COBIT. ISO 9000. Sarbanes-Oxley. Even PMBOK

Processes and models come in many flavors, shapes and sizes. Whether they advocate better quality management, better project management, better corporate governance or better audit-ability and control, their fundamental motivation--at least theoretically--is to, well, make things better. Models don’t start out with the underlying intent of making things worse. That would be unproductive, irrational and entirely unhelpful. The principle is that the model provides a better way of managing than whatever came before.

Something very curious has happened with the countless models that have been implemented under the guise of “making management better”. In many instances, the result has been far from an improvement. The reality is that many implementations have made things worse.

Ironic? Certainly. Unhelpful? Unquestionably. But why? What is it that organizations are doing that takes a well-intentioned, well-meaning and purportedly well-crafted model and turns it into something that is considered bureaucratic, ill-guided and--in a couple of noteworthy instances--downright evil? And what can we do differently that will enable positive results, rather than haunted cries of “not again”?!?

The COBIT framework was developed by the IT Governance Institute, a self-described “research think tank” that was established in 1998 to support the improvement of IT governance. While the purpose is to define what an effective, architecturally driven means of managing IT that supports the enterprise is, the emphasis of COBIT is on controls, not processes. In other words, it doesn’t define how activities and initiatives should be done, but instead what controls should be in place to ensure that functions are being performed correctly.

Once the decision was made to adopt COBIT, however, the resulting activities quickly descended into the creation of a vast amount of rigor, oversight and bureaucracy that went far beyond where anyone in the organization expected or valued. Despite the lack of expectation or perceived value, however, the organization still proceeded down the path it had set for itself. Why didn’t it adjust its course, or even stop? How did this take on a life of its own? And how can future organizations learn a lesson from this experience and not do the same thing next time?

When looking at how industry standard models and frameworks are adopted, there are a number of traps that organizations allow themselves to fall into, which collectively can lead to the same slippery slope that the organization described above found itself on:

Because it’s the right thing to do. As noted, no one implements a model for the sake of it, or simply for the sake of creating bureaucracy. The models that exist do so for a reason. Creating visibility and momentum around this model or that, however, requires marketing and selling. Books are written, conferences staged and consultants bray that organizations that fail to adopt this model or that are at best misguided and at worse “doomed to fail”.

We’re just dealing with growing pains. Once an organization has made the choice to adopt this model or that framework, the implementation necessarily requires effort. Adoption and use requires that much more work. The literature on change management and implementation quite rightly points out the productivity impacts that can be encountered when adopting a change. When faced with the pains of adoption, however, legitimate concerns about the relevance and appropriateness of an approach risk dismissal as just growing pains. Rather than objectively asking whether the expressed concerns are legitimate, those raising concerns run the risk of being perceived as naysayers and “not on board”.

The technical imperative trumps the organizational need. Models are theoretically adopted to deliver business value. The implementation of any improvement initiative is frequently tied to a promise of improved business results that is in fact sold to the business. Like the organization described earlier, however, once agreement or adoption takes place, the actual adoption and implementation tends to be driven more by technical than by business imperatives. The business oversight is assumed to be the decision to proceed in the first place, and the proper level of business scrutiny over what is implemented tends not to occur. The phenomenon of “inmates running the asylum” is far more appropriately described as the technical side implementing what they think is right, without a regular and necessary check-in with the business side of the organization as to whether or not it makes sense.

All of it, and as rigorously as possible. Models provide choices and alternatives. A careful reading of the introduction to the PMBOK, for example, reveals that there isn’t an expectation that every aspect is relevant for all projects. Appropriate and intelligent adaptation and application is essential. Sadly, when implementing a defined model, especially one that has been adopted as a best practice, the presumption is that everything it offers is good, appropriate and valuable. Rather than evaluating trade-offs and choosing what to implement, and how it should be implemented, the default position is that if the model says we should do it, then we should do it. Consequences in terms of the costs of adoption and the diminishing returns of benefits get dismissed in favor of rigorous adherence. After all, if this is what a “best” practice looks like, then any compromise runs the risk of becoming merely good, mediocre or even bad.

Adapting would “undermine the spirit and intent” of the model. Closely related to the presumption that the full model represents the best of all possible implementations is a related assumption: If adaptation were appropriate, the model would already be adapted. Again, the presumption is that because the model is the way it is, its integrity must be preserved. Adaptation is compromise. Compromise is assumed to be sub-optimal. Intelligent application of the model, in the eyes of the true believer, is heresy.

The result of these trends is implementations that are complete, universal and uncompromising in their adherence to what is viewed as “right”, unfortunately losing sight of what is fitting and practical. Models are just that--they are representations of reality. They are not reality, nor are they replacements for reality. They are suggestions of approaches that must be intelligently and reasonably considered by organizations in order to identify what is logical and appropriate, given the culture, context and management style of the organizations adopting them.

What this means is that the project managers and teams that implement models need to take a deep breath before proceeding to really think through what the results will mean for the organization. Often, the kickoff of an improvement effort is participation in a workshop, training course or boot camp to familiarize the team with the model and its purpose. It is at these events where the implementation can take on its sheen of ideology.

After all, the workshops are led by articulate, impassioned and well-meaning advocates for the approach being explored. They believe in what they are teaching and the value the model offers, and they have a host of horror stories to share regarding failures and consequences of incomplete or inappropriate adoption, or of not starting down this path in the first place. While education is fine, the second activity must be a sober reflection of what the implementation will mean for the organization. What fits, and what doesn’t? What makes sense in the context of the organization, and what won’t work? The fundamental question to be asked is how the principles of the model can be adopted and adapted, not how an ideologically pure and perfect version of the model can be shoehorned in and made to fit.

More importantly, organizational oversight is crucial. The executive agreement to adopt and proceed with an implementation requires a level of understanding of what the organization is signing on to when it chooses to proceed. This means that executives need to familiarize themselves with the principles and purposes of the models being considered. More importantly, they need to understand how these principles suit the context of the organization they lead. And most importantly, they need to provide the ongoing oversight of what is proposed to be actually implemented, constantly asking whether what is proposed makes sense, is relevant and will ultimately deliver value.

Models and frameworks abound in today’s marketplace. As organizations take stock of how they are performing, and seek improvement opportunities in the face of an uncertain marketplace, these models become tempting means of short-circuiting and accelerating the real work of improvement. Certainly, models like ITIL and COBIT have a place as a repository of practices and experiences that organizations can consider.

They are not blueprints for improvement, however, nor are they processes that can be adopted wholesale. They are representative principles of what can work. It is up to any organization considering them, however, to figure out what they can do to make them work in their context and environment. As has been said many times before: caveat emptor, let the buyer beware.

Partitioning data (Best Practices) in DataStage E.E. 8.1

In most cases, the default partitioning method (Auto) is appropriate. With Auto partitioning, the Information Server Engine will choose the type of partitioning at runtime based on stage requirements, degree of parallelism, and source and target systems. While Auto partitioning will generally give correct results, it might not give optimized performance. Based on the requirements, partitioning can be optimized within a job and across job flows.

Objective 1

Choose a partitioning method that gives close to an equal number of rows in each partition, while minimizing overhead. This ensures that the processing workload is evenly balanced, minimizing overall run time.

Objective 2
The partition method must match the business requirements and stage functional requirements, assigning related records to the same partition if required.

Any stage that processes groups of related records (generally using one or more key columns) must be partitioned using a keyed partition method. This includes, but is not limited to: Aggregator, Change Capture, Change Apply, Join, Merge, Remove Duplicates, and Sort stages. It might also be necessary for Transformers and BuildOps that process groups of related records.

Objective 3

Unless partition distribution is highly skewed, minimize re-partitioning, especially in cluster or Grid configurations.

Re-partitioning data in a cluster or Grid configuration incurs the overhead of network transport.

Objective 4
Partition method should not be overly complex. The simplest method that meets the above objectives will generally be the most efficient and yield the best performance. Using the above objectives as a guide, the following methodology can be applied:

Start with Auto partitioning (the default).
Specify Hash partitioning for stages that require groups of related records, as follows:
· Specify only the key column(s) that are necessary for correct grouping, as long as the number of unique values is sufficient
· Use Modulus partitioning if the grouping is on a single integer key column
· Use Range partitioning if the data is highly skewed and the key column values and distribution do not change significantly over time (the Range Map can be reused)

If grouping is not required, use Round Robin partitioning to redistribute data equally across all partitions.

· Especially useful if the input Data Set is highly skewed or sequential

Use Same partitioning to optimize end-to-end partitioning and to minimize re-partitioning

· Be mindful that Same partitioning retains the degree of parallelism of the upstream stage
· Within a flow, examine up-stream partitioning and sort order and attempt to preserve them for down-stream processing. This may require re-examining key column usage within stages and re-ordering stages within a flow (if business requirements permit).


Across jobs, persistent Data Sets can be used to retain the partitioning and sort order. This is particularly useful if downstream jobs are run with the same degree of parallelism (configuration file) and require the same partition and sort order.
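To illustrate why keyed methods keep related records together while Round Robin only balances volume, here is a small sketch (not DataStage code; the key values and partition count are arbitrary) of how the three assignment schemes behave:

import java.util.Objects;

public class PartitioningSketch {

    // Hash partitioning: rows with the same key value always land in the same partition.
    static int hashPartition(String keyValue, int numPartitions) {
        return Math.floorMod(Objects.hashCode(keyValue), numPartitions);
    }

    // Modulus partitioning: the same guarantee, for a single integer key column.
    static int modulusPartition(long integerKey, int numPartitions) {
        return (int) Math.floorMod(integerKey, (long) numPartitions);
    }

    // Round Robin partitioning: ignores the data and deals rows out evenly; no grouping guarantee.
    static int roundRobinPartition(long rowNumber, int numPartitions) {
        return (int) (rowNumber % numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 4;
        System.out.println(hashPartition("CUST-1001", partitions)); // same key -> same partition every time
        System.out.println(modulusPartition(1001L, partitions));    // 1001 mod 4 = 1
        System.out.println(roundRobinPartition(0, partitions));     // row 0 -> partition 0, row 1 -> 1, ...
    }
}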

Thursday, August 13, 2009

Enterprise Metadata

1 What is Metadata?

"Meta" is a Greek word meaning transcending, or going above and beyond. Metadata by definition is data that describes other data. Technology describes the term as referring to files or databases with information about another's attributes, structure, processing or changes. It can describe any characteristics of the data such as the occurrence or its quality.


But metadata is not just about the database and how data is placed in it. There can be other types of metadata as well. Let us try to understand the possible types of metadata below.

2 How to classify metadata?
Metadata is widely classified into two types:
· Technical or Administrative meta-data
· Business meta-data
2.1 Technical or Administrative Metadata
Administrative meta-data includes information about such things as data source, update times and any extraction rules and cleansing routines performed on the data.
It includes:
· Definition of source and target
· Schemas, dimensions, hierarchies
· Rules for extraction, cleaning
· Refresh, purging policies
· User profiles, access control
· Data lineage
· Data currency (e.g. active, archive, purged)
· Use Stats, error reports, audit trails
· ETL Tools
2.2 Business Metadata
Business meta-data, on the other hand, allows users to get a clearer understanding of the data on which their decisions are based. It includes information about calculations performed on the data, date and time stamps, and meta-data about the graphic elements of data analysis generated by front-end query tools.
It Includes:
· Business terms and definitions
· Object definitions and object help
· Data ownership, charging
· Data modeling tools
All types of metadata are critical to a successful data mart or warehouse solution. How well the data warehouse replenishment solution you choose manages and integrates metadata may affect the performance of presentation tools and the overall effectiveness of the data warehouse.
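As a simple, purely illustrative sketch (the field names are hypothetical, not from any metadata standard), a single metadata record for one warehouse column might carry both kinds of metadata side by side:

import java.time.Instant;

class ColumnMetadata {
    // Technical / administrative metadata
    String sourceSystem;        // where the data came from
    String extractionRule;      // extraction/cleansing rule applied to the value
    String targetTable;
    String targetColumn;
    Instant lastRefreshed;      // refresh / data currency information
    String lineage;             // e.g. "CRM.CUSTOMER.EMAIL -> STG.CUST.EMAIL -> DW.DIM_CUSTOMER.EMAIL"

    // Business metadata
    String businessTerm;        // e.g. "Customer e-mail address"
    String businessDefinition;  // plain-language definition agreed with the business
    String dataOwner;           // who is accountable for this data
}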
3 Why is Metadata Required?

The following are some of the main reasons why metadata is required:

Metadata is required to understand the data.
Metadata provides information about where the data came from, when it was delivered and what happened to it during transport; these and other descriptions can all be tracked.
Metadata gives context to the information.
The business rules and calculations applied to derive the data can be analyzed or studied.

Metadata is required to manage the data.
Metadata explains how data is stored in a database.
The operational metadata gives us statistics about the data, e.g. error reports, usage statistics, etc.
Metadata helps in impact analysis for changes in the database schema etc.
Metadata helps in correlating/combining data from heterogeneous applications.
Metadata helps to ensure the accuracy, integrity and consistency of data.
Metadata minimizes the risk of system downtime by eliminating the dependency upon specialized system experts.

Metadata is required in taking the business decisions.
Metadata helps the OLAP tools exploit the correct information from the warehouse.
As stated earlier, as the metadata gives the context to the data, it helps the business user to understand the data and relate various data.
Good Metadata makes it easier to use the data warehouse, so that the turn-around for information requests is faster.
Metadata gives us the information about how the historic data is saved. This data can be used for analyzing the business processes.
Metadata improves operational efficiency and customer certainty about the data.

4 How does Metadata help?

Metadata can help various types of users in the following ways.

4.1 To Business Users

· Examining meta-data enhances the end user's understanding of the data they are using.
· It can also facilitate valuable “what if” analysis on the impact of changing data schemas and other elements.
· It helps them identify and locate the information that they need in the warehouse.
· It lets them analyze the aggregation and summarization rules applied to the data.

4.2 To Technical Users

· Metadata helps them ensure data accuracy, integrity and consistency.
· Metadata helps them in correlating/combining data from heterogeneous applications.
During data replenishment, solutions should store meta-data in tables located in the publisher and/or subscriber database, enabling companies to share metadata among heterogeneous applications and databases.
· Metadata helps them in defining the Transformation rules, access patterns and entity relationships.



5 Where is metadata in Data Warehousing?
5.1 Metadata storage
For business-related decision-making, users have to be given easy access to relevant business data. A data warehouse serves as a central repository for recording everything about business information and analysis. The warehouse can be accessed using different methods, which include application, query and analysis tools, etc. Metadata for these warehouse applications should be readily available.

Metadata can be stored in two ways:

· Metadata Repository:
Metadata Repository is an old style of managing metadata, going back to the days of the mainframe. In a repository everything is centralized, i.e. all the metadata is integrated and stored at one location, which is the metadata repository of a warehouse.

· Distributed Metadata:
Here, the metadata is present at different locations, but it can be exchanged as required and the ownership of the metadata is preserved.
In a typical warehouse scenario, metadata is present in its raw form in various layers such as the ETL tool repository, the OLAP tool repository and the source system metadata.
5.2 Metadata in Data Warehouse

Let us analyze what to look for in every stage of Data warehousing:

Staging:
Staging is the phase between extracting data from the source system and loading it into the ETL database. A staging file may be a flat file that contains extracted and transformed data. The file contains all the data that is to be transported to the warehouse. The following information is considered metadata for the staging area.
1. The location of the staging file
2. Duration, volatility, and ownership of staging file
3. The fields in the staging file
4. The format of fields in staging file
5. Security settings for extract files
6. The statistics about the data i.e. duplicate data, erroneous data in staging file
7. The mapping of fields from staging area to database
8. Data staging area archive logs and recovery procedures

Data Mart:
Accessing data from a warehouse is time consuming, because there are a large number of users and large volumes of data. The use of a data mart can ease this problem. It is also called a business area warehouse or a departmental warehouse. The following information about a data mart is considered metadata.
1. Loaded from: operational source or data mart
2. Expected network traffic
3. The users of data mart
4. The department/ business of which the data mart contains the information

ETL (Extraction, Transformation and Loading):
The first data warehouse process extracts data from many data sources. The extraction process extracts the selected data fields from the source. Extraction can take place with the help of routines containing business rules. After extraction comes transformation. The transformation process integrates and transforms data into a consistent and uniform format for the target database. Aggregation and/or summarization can be applied at this stage. The final stage of ETL processing is loading. The loading process involves integrating and cleaning the data and loading it into the warehouse target tables. The following data about ETL processes can be considered metadata.
1. Each column and its format of a warehouse table
2. The table, view, macro definition
3. Fields of interest from the source
4. Transformation rules from source to target table
5. Rules for aggregation, summarization, calculation over a period of time
6. Rules for stripping out fields, and looking up attributes
7. Slowly changing dimension policies
8. Current surrogate key assignments for each production key
9. Data cleaning specifications
10. Refresh frequency for tables
11. Refresh, purging policies
12. Modification logs
13. Business requirements
14. The relationship between the objects being used during ETL processing
15. Data transform run-time logs, success summaries, and time stamps

Storage (Database, RDBMS):
The metadata for storage allows proactive assessment of the impact of changes by providing an enterprise-wide view and the relationships of data gathered from many disparate sources. It also minimizes the risk of system downtime by eliminating the dependency upon specialized system experts. The following information regarding storage can be considered metadata.
1. Schema for the database
2. The number of tables, views, stored procedures, macros in DB and their definitions
3. DBMS-level security privileges and grants
4. The relationship between various objects
5. DB monitoring Statistics
6. Database catalogs
7. DBMS load scripts
8. Partition settings
9. Indexes
10. DBMS backup status, procedures, and security

Application (Source System and Target system):
The source system here refers to the system to which the user actually feeds the data, whereas the target system refers to the application which will be used by the end user. The points mentioned below are considered metadata for the source and target systems.
1. The source of the data
2. The formats and definitions of all the fields from source and target system
3. The mapping information from the source system to data warehouse
4. Business rules, processes, data structures, programs and model definition to eliminate duplicate development efforts.
5. Platform for knowledge sharing across the organization.
6. Data management over time
7. Data dictionaries
8. Data lineage and audit records
9. Source system and target system job schedules
10. Access methods, access rights, privileges, and passwords for source and target access

OLAP and Reporting application:
On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. That is, OLAP is a category of applications and technologies for collecting, managing, processing and presenting multidimensional data for analysis and management purposes. The function of the reporting application is to display the required fields in the layout specified by the business user. The following information about OLAP and reporting applications can be considered metadata.
1. Information related to the objects defined in OLAP/reporting tool e.g. their definition and attributes
2. Transformations applied in reporting tool
3. Business interests i.e., the measures, dimensions, hierarchy…present in the report
4. Slowly changing dimension policies
5. Rules for aggregation, summarization, calculation over a period of time for the fields in the report
6. Business terminologies
7. Report Layout
8. Detailed information of all the reports developed
9. The drill through and drill down specifications
10. The report schedules and the distribution list
11. Transformations required for data mining (for example, interpreting nulls and scaling numerics)
12. Network security user privilege profiles, authentication certificates, and usage statistics, including logon attempts, access attempts, and user ID by location reports
13. Usage and access maps for data elements, tables, views, and reports
6 What to consider in Metadata Design?

For a business to analyze its decision processes, it must convert its data into a reliable, reusable information asset to improve operational efficiency and customer certainty. Metadata management can provide a solution to this. Metadata management is key to eliminating information disparity, rapidly deploying information solutions, integrating disparate data sources, finding and sharing information assets and making the information coherent.

While designing metadata following points should be taken into consideration:

· Business-
The source and the target system should be analyzed and then business rules to migrate the data should be created. These rules should not violate the business considerations/ requirements.
· End User-
The areas which are of interest to the end user should be covered, e.g. all the required dimensions and measures should be present in the report specifications.
· Performance-
The turnaround time for the queries fired should be as low as possible. The data should be made available to the ETL as well as the reporting tools in the minimum possible time.
· Accessibility-
The metadata should be easily accessible to the business as well as technical users.
· Standardization-
All the terminology used across the metadata should be standardized, so that there are no problems when integrating metadata from various locations. The metadata should be consistent across all the locations where it is stored.
· Historical Data-
Data warehouses also store historical information, so the transformation rules applied over a period of time also need to be stored as metadata. The metadata design should make provision for this historical information as well.
· Up to date and accurate-
As the metadata is accessed for information it must be accurate and up-to-date.
· Completeness-
The metadata should include data about all the objects representing a data warehouse. Incomplete metadata may lead to improper analysis or erroneous reporting.

Friday, April 17, 2009

Using Selenium

Recently I have worked with two open source testing tools, and I found Selenium to be one of the best open source web testing tools available. Below I describe how we can use Selenium IDE.

How to create a test plan in Selenium IDE

Creating a test plan in Selenium IDE is very easy, so we will use it to create a few simple tests to begin with.
1. Install Selenium IDE 0.8.7, a Firefox plugin.
2. After installing Selenium please restart your Firefox browser for the plugin to be activated.
3. Now you should see a new added menu item named Selenium IDE under your Firefox Tools menu.
4. Open / browse the site for which you want to prepare a test case.
5. Start Selenium IDE from Firefox Tools->Selenium IDE.
6. Browse some pages.
7. Now click the red button to stop recording.

At this point you will see Selenium automatically recording your actions. Carefully note the commands, targets and values. You can create and insert your own commands, modify them or even delete them. We will show some examples below. In the next section we will see how we can modify the generated tests to suit our needs.

How to create / modify / delete Selenium commands

The default commands generated by Selenium when you are browsing the page as a normal user should be modified to make the test more robust and to add test cases to it.

1. Let's replace all click commands with clickAndWait. click simply clicks the specified link and goes on to execute the next command without waiting. On the other hand, clickAndWait waits for the new page to load before executing the next command. clickAndWait should be used to make test cases more robust. (See the sketch after this list for how these commands map to the exported Selenium RC Java code.)

2. Insert an assertTextNotPresent command after each clickAndWait command to confirm that a given text is not present on the loaded page.

3. Use the assertTextPresent command to confirm that a given text is present on the loaded page.

4. Finally, to test your test plan, click the green arrow button to play the test from the beginning or from the selected start point.

5. Export the test plan as a Java file via Selenium IDE File->Export Test As->Java - Selenium RC (for example, the file name is SeleniumSTSanityTest.java).

6. Then close your Firefox Selenium IDE.
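Before moving on, here is a short sketch of how the IDE commands above map to the exported Selenium RC Java code (the locator and the asserted texts are placeholders, not from a real recording):

import com.thoughtworks.selenium.SeleneseTestCase;

public class CommandMappingExample extends SeleneseTestCase {
    public void setUp() throws Exception {
        setUp("http://floratec.blogspot.com", "*chrome");
    }

    public void testCommandMapping() throws Exception {
        selenium.open("/");
        // "clickAndWait" in the IDE = click + waitForPageToLoad in the exported Java
        selenium.click("link=Archive");                          // placeholder locator
        selenium.waitForPageToLoad("30000");
        // "assertTextPresent" / "assertTextNotPresent"
        assertTrue(selenium.isTextPresent("2003-2008"));
        assertFalse(selenium.isTextPresent("WordPress database error: ["));
    }
}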

How to run the above test plan (the Java file automatically generated by Selenium IDE) from the command line?

1. Download Selenium RC.

2. Unzip it under the same directory where SeleniumSTSanityTest.java (the test plan exported as a Java file from Selenium IDE) was saved.

3. Install junit.

4. Go to the directory where you unzipped the selenium-remote-control-1.0-beta-1-dist.zip file.

5. Open a terminal and follow the steps below:
a. cd selenium-remote-control-1.0-beta-1/selenium-server-1.0-beta-1
b. java -jar selenium-server.jar (to run the server in interactive mode execute java -jar selenium-server.jar -interactive)
c. If you get an error like Error: com.thoughtworks.selenium.SeleniumException: ERROR Server Exception: sessionId should not be null; has this session been started yet? then ensure that the browser is in the PATH before running the server. For example, if you want to run the test in Firefox, you should do the next two steps.

d. locate firefox-bin (for example it returns /usr/lib/firefox-1.5.0.12/firefox-bin)

e. export PATH=$PATH:/usr/lib/firefox-1.5.0.12/firefox-bin;
Note: There is an alternative way to fix the above error (browser not in PATH). Simply replace *chrome with the browser path in the SeleniumSTSanityTest.java file. For example, the line
setUp("http://floratec.blogspot.com", "*chrome");
becomes
setUp("http://floratec.blogspot.com", "*firefox /usr/lib/firefox-1.5.0.12/firefox-bin");
in SeleniumSTSanityTest.java.
To run the test in the Opera browser, replace *chrome with *opera.

6. Now the Selenium server is running, and you have to run the Java client located in selenium-remote-control-1.0-beta-1/selenium-java-client-driver-1.0-beta-1.

7. Open another terminal.

a. export CLASSPATH=.:selenium-remote-control-1.0-beta-1/selenium-java-client-driver-1.0-beta-1/selenium-java-client-driver.jar:/usr/share/java/junit.jar
b. javac SeleniumSTSanityTest.java
c. java SeleniumSTSanityTest

8. The automatically generated Java file SeleniumSTSanityTest.java is likely to have some defects. Fix it by comparing it with the example below:

import com.thoughtworks.selenium.*;
import junit.framework.*;
import java.util.regex.Pattern;

public class SeleniumSTSanityTest extends SeleneseTestCase {

    public void setUp() throws Exception {
        // to run the test in Opera, replace *chrome with *opera
        setUp("http://floratec.blogspot.com", "*chrome");
    }

    public void testSimpleThoughts() throws Exception {
        selenium.open("");
        assertFalse(selenium.isTextPresent("WordPress database error: ["));
        assertTrue(selenium.isTextPresent("2003-2008"));

        selenium.open("/index.php/category/programming/java");
        selenium.waitForPageToLoad("30000");
        assertFalse(selenium.isTextPresent("WordPress database error: ["));
        assertTrue(selenium.isTextPresent("2003-2008"));

        selenium.click("//img[@alt='Übersetzen Sie zum Deutsch/German']");
        selenium.waitForPageToLoad("30000");
        assertFalse(selenium.isTextPresent("WordPress database error: ["));
        assertTrue(selenium.isTextPresent("2003-"));

        selenium.click("//img[@alt='Přeložit do Čech/Czech']");
        selenium.waitForPageToLoad("60000");
        assertFalse(selenium.isTextPresent("WordPress database error: ["));
        assertTrue(selenium.isTextPresent("2003"));
    }

    public static Test suite() {
        return new TestSuite(SeleniumSTSanityTest.class);
    }

    public static void main(String[] args) {
        junit.textui.TestRunner.run(suite());
    }
}