SAP DataServices Certification Study
Flashcard maker: Lily Taylor
Transforms that allow you to extract, transform, and load data. These transforms help ensure data integrity and maximize developer productivity for loading and updating data warehouse environments.
Transforms that help you improve the quality of your data. These transforms can parse, standardize, correct, enrich, match and consolidate your customer and operational information assets.
Transforms that are needed for general data movement operations. These transforms allow you to generate, map, and merge rows from two or more sources, create SQL query operations (expressions, lookups, joins, and filters), perform conditional splitting, and so on.
Text Data Processing
Transforms that help you extract specific information from your text. These transforms can parse large volumes of text, allowing you to identify and extract entities and facts, such as customers, products, locations, and financial information relevant to your organization.
Allows a data flow to split its processing into two sub data flows and push down resource-consuming operations to the database server.
Generates a column filled with date values based on the start and end dates and increment that you provide.
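As a rough illustration, the Date_Generation idea can be sketched in plain Python; the function name and the daily increment below are illustrative, not the transform's actual API:

```python
from datetime import date, timedelta

def date_generation(start, end, increment_days=1):
    """Sketch of the Date_Generation idea: emit one date value per row
    from start to end (inclusive), stepping by the given increment."""
    rows = []
    current = start
    while current <= end:
        rows.append(current)
        current += timedelta(days=increment_days)
    return rows

dates = date_generation(date(2024, 1, 1), date(2024, 1, 5))
```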
Generates an additional “effective to” column based on the primary key’s “effective date”.
Flattens hierarchical data into relational tables so that it can participate in a star schema. Hierarchy flattening can be both vertical and horizontal.
Converts rows flagged as UPDATE to UPDATE plus INSERT, so that the original values are preserved in the target. You specify in which column to look for updated data.
Generates new keys for source data, starting from a value based on existing keys in the table you specify.
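A minimal sketch of the Key_Generation idea, assuming the new keys simply continue from the largest key already present (the real transform reads max(generated key) from the table you specify):

```python
def key_generation(existing_keys, new_rows, key_column="key"):
    """Sketch of the Key_Generation idea: continue numbering from the
    largest key already present in the target table."""
    next_key = max(existing_keys, default=0) + 1
    keyed = []
    for row in new_rows:
        keyed.append({key_column: next_key, **row})  # assign and advance
        next_key += 1
    return keyed

keyed_rows = key_generation([101, 102, 103], [{"name": "a"}, {"name": "b"}])
```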
Sorts input data, maps output data, and resolves before- and after-images for UPDATE rows. While commonly used to support Oracle changed-data capture, this transform supports any data stream if its input requirements are met.
Pivot (Columns to Rows)
Rotates the values in specified columns to rows. (Also see Reverse Pivot.)
Reverse Pivot (Rows to Columns)
Rotates the values in specified rows to columns.
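The column-to-row rotation that Pivot performs can be sketched in plain Python; the header/value column names below are illustrative, not the transform's actual output schema:

```python
def pivot(rows, id_column, pivot_columns):
    """Sketch of Pivot: rotate the values of pivot_columns into rows,
    one output row per (input row, pivoted column) pair."""
    out = []
    for row in rows:
        for col in pivot_columns:
            out.append({id_column: row[id_column],
                        "header": col, "value": row[col]})
    return out

wide = [{"id": 1, "q1": 10, "q2": 20}]
long_rows = pivot(wide, "id", ["q1", "q2"])
```

Reverse Pivot would perform the inverse mapping, collapsing the header/value pairs back into columns.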
Compares two data sets and produces the difference between them as a data set with rows flagged as INSERT and UPDATE.
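A plain-Python sketch of the Table_Comparison idea, assuming a single-column primary key and whole-row comparison (the real transform lets you choose compare columns and comparison methods):

```python
def table_comparison(source_rows, target_rows, key_column):
    """Sketch of Table_Comparison: rows absent from the target are
    flagged INSERT; rows present but different are flagged UPDATE."""
    target_by_key = {row[key_column]: row for row in target_rows}
    out = []
    for row in source_rows:
        existing = target_by_key.get(row[key_column])
        if existing is None:
            out.append(("INSERT", row))
        elif existing != row:
            out.append(("UPDATE", row))
    return out

src = [{"id": 1, "v": "new"}, {"id": 2, "v": "b"}]
tgt = [{"id": 1, "v": "old"}]
delta = table_comparison(src, tgt, "id")
```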
Processes large XML inputs in small batches.
Completes and populates addresses with minimal data, and can offer suggestions for possible matches.
Compares group numbers to find associated matches from different Match transforms.
Parses input data and then identifies the country of destination for each record.
Identifies and parses name, title, and firm data, phone numbers, Social Security numbers, dates, and e-mail addresses. It can assign gender, add prenames, generate Match standards, and convert input sources to a standard format. It can also parse and manipulate various forms of international data, as well as operational and product data.
DSF2 Walk Sequencer
Adds delivery sequence information to your data, which you can use with presorting software to qualify for walk-sequence discounts.
Identifies and appends geographic information, such as latitude and longitude, to address data.
Identifies, parses, validates, and corrects global address data, such as primary number, primary name, primary type, directional, secondary identifier, and secondary number.
Compares records based on your criteria, or business rules, to find matching records in your data.
Identifies, parses, validates, and corrects USA address data according to the US Coding Accuracy Support System (CASS).
Does just about anything that you can write Python code to do. You can use the User_Defined transform to create new records and data sets, or populate a field with a specific value, just to name a few possibilities.
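Because the User_Defined transform runs Python, its logic can be illustrated with ordinary Python. Note this is a plain-list sketch: the real transform works through the embedded Data Services Python API (collection and record objects), not the lists assumed here:

```python
def user_defined(records):
    """Plain-list sketch of a User_Defined transform body: derive a new
    field per record and append a new summary record."""
    out = []
    for record in records:
        record = dict(record)
        # Populate a field with a derived value.
        record["full_name"] = record["first"] + " " + record["last"]
        out.append(record)
    # Create a brand-new record in the data set.
    out.append({"record_type": "SUMMARY", "row_count": len(records)})
    return out

result = user_defined([{"first": "Ada", "last": "Lovelace"}])
```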
Simplifies branch logic in data flows by consolidating case or decision making logic in one transform. Paths are defined in an expression table.
Allows conversions between operation codes.
Unifies rows from two or more sources into a single target.
Retrieves a data set that satisfies conditions that you specify. A Query transform is similar to a SQL SELECT statement.
Generates a column filled with integer values starting at zero and incrementing by one to the end value you specify.
Performs the indicated SQL query operation.
Ensures that the data at any stage in the data flow meets your criteria. You can filter out or replace data that fails your criteria.
Extracts information (entities and facts) from unstructured data and creates structured data that can be used by various business intelligence tools.
In Data Services, functions take input values and produce a return value. Procedures take input values and perform a set of operations without returning a specific value. Input values can be parameters passed into a data flow, values from a column of data, or variables defined inside a script.
Functions compared to Transforms
Some functions can produce the same or similar values as transforms. However, functions and transforms operate in different scopes: functions operate on single values, such as values in specific columns in a data set. Transforms operate on data sets, creating, updating, and deleting rows of data.
Operation of a function
The function’s operation determines where you can call the function. For example, the Lookup database function operates as an iterative function. The lookup function can cache information about the table and columns it is operating on between function calls. By contrast, conversion functions, such as to_char, operate as stateless functions. Conversion functions operate independently in each iteration. An aggregate function, such as max, requires a set of values to operate. Neither the lookup function (iterative) nor the max function (aggregate) can be called from a script or conditional where the context does not support how these functions operate.
Generates a single value from a set of values. Aggregate functions, such as max, min, and count, use the data set specified by the expression in the GROUP BY tab of a query. Can be called only from within a Query transform, not from custom functions or scripts.
Maintains state information from one invocation to another. The life of an iterative function’s state information is the execution life of the query in which it is included. The lookup function is an iterative function. Can be called only from within a Query transform, not from custom functions or scripts.
State information is not maintained from one invocation to the next. Stateless functions such as to_char or month can be used anywhere expressions are allowed.
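The three operation types can be sketched side by side in Python; the names below are illustrative stand-ins for the real to_char, lookup, and max functions:

```python
def stateless_to_char(value):
    """Stateless, like to_char: every call is independent."""
    return str(value)

class IterativeLookup:
    """Iterative, like lookup: caches the lookup table across calls,
    for the life of the query that contains it."""
    def __init__(self, table_rows):
        self.cache = dict(table_rows)   # loaded once, reused per row
    def __call__(self, key):
        return self.cache.get(key)

def aggregate_max(values):
    """Aggregate, like max: needs the whole set of values at once."""
    return max(values)

country_lookup = IterativeLookup([("DE", "Germany"), ("FR", "France")])
```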
Embedded in other editor windows such as the Script Editor, Conditional Editor, and Query Editor. Offers color-coded syntax, a right-click menu, keyboard shortcuts, and a list of available variables, data type formats, and functions that you can use to define a function.
You can use the function wizard to define the parameters for an existing function. The function wizard offers the most help when defining complex functions.
Processes and Threads
Data Services uses processes and threads to execute jobs that extract data from sources, transform the data, and load data into a data warehouse. The number of concurrently executing processes and threads affects the performance of Data Services jobs.
The al_jobserver initiates one process for each Job Server configured on a computer. This process does not use much CPU power because it is only responsible for launching each job and monitoring the job’s execution.
For batch jobs, an al_engine process runs when a job starts and for each of its data flows. Real-time jobs run as a single process.
The number of processes a batch job initiates also depends upon the number of:
parallel work flows, parallel data flows, sub data flows
Pushing down operations provides the following advantages:
Use the power of the database server to execute SELECT operations (such as joins, Group By, and common functions such as decode and string functions). Often the database is optimized for these operations. Minimize the amount of data sent over the network. Fewer rows are retrieved when the SQL statements include filters or aggregations.
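The effect of pushing a filter and aggregation down can be demonstrated with an in-memory SQLite table standing in for the source database (an assumption for illustration); only the one summary row crosses back to the engine:

```python
import sqlite3

# In-memory SQLite stands in for the source database. The WHERE filter
# and SUM aggregation execute inside the database, so only one summary
# row is returned to the engine instead of every detail row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 100), ("EMEA", 250), ("APJ", 75)])

pushed_down = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE region = 'EMEA' GROUP BY region").fetchall()
```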
The following features enable a full push down:
Data_Transfer transform and Database links and linked datastores
Using caches for faster access to data
You can improve the performance of data transformations by caching as much data as possible. By caching data in memory, you limit the number of times the system must access the database.
Bulk loading to the target
The software supports database bulk-loading engines, including the Oracle bulk load API. You can have multiple bulk load processes running in parallel.
Other tuning techniques
Source-based performance options, target-based performance options, job design performance options
Source-based performance options
Join ordering, minimizing extracted data, using array fetch size
Target-based performance options
loading method, rows per commit
Job design performance options
loading only changed data, minimizing data type conversion, minimizing locale conversion, precision in operations
If your jobs have CPU-intensive and memory-intensive operations, you can use the following advanced tuning features to improve performance
Parallel processes, parallel threads, server groups and distribution levels
Individual work flows and data flows can execute in parallel if you do not connect them in the Designer workspace.
The software supports partitioned source tables, partitioned target tables, and degree of parallelism. These options allow you to control the number of instances for a source, target, and transform that can run in parallel within a data flow. Each instance runs as a separate thread and can run on a separate CPU.
Server groups and distribution levels
You can group Job Servers on different computers into a logical component called a server group. A server group automatically measures resource availability on each Job Server in the group and distributes scheduled batch jobs to the computer with the lightest load at runtime. This functionality also provides a hot backup method. If one Job Server in a server group is down, another Job Server in the group processes the job. You can distribute the execution of data flows or sub data flows within a batch job across multiple Job Servers within a Server Group to better balance resource-intensive operations.
By pushing down operations to the source database, Data Services reduces the number of rows and operations that the engine must retrieve and process, which improves performance. When determining which operations to push to the database, Data Services examines the database and its environment.
Full Push-Down operations
All transform operations can be pushed down to the databases and the data streams directly from the source database to the target database.
The software does a full push-down operation to the source and target databases when the following conditions are met:
All of the operations between the source table and target table can be pushed down. The source and target tables are from the same datastore, or they are in datastores that have a database link defined between them.
To enable a full push-down from the source to the target, you can also use the following features:
Data_Transfer transform, Database links
When a full push-down operation is not possible, SAP BusinessObjects Data Services still pushes down the SELECT statement to the source database. Operations within the SELECT statement that the software can push to the database include:
Aggregations, Distinct rows, Filtering, Joins, Ordering, Projections, Functions
SAP BusinessObjects Data Services cannot push some transform operations to the database. For example:
Expressions that include functions that do not have database correspondents, Load operations that contain triggers, Transforms other than Query, Joins between sources that are on different database servers that do not have database links defined between them.
to move data from a source or from another transform into the target datastore and enable a full push-down operation (INSERT INTO… SELECT) to the target.
Use the Data_Transfer transform
You can improve the performance of data transformations that occur in memory by caching as much data as possible.
Use when your data flow processes a small amount of data that fits in memory.
Use when your data flow processes a very large amount of data that does not fit in memory.
Default cache type
No_Cache, Pre_Load_Cache, Demand_Load_Cache
Does not cache any values
Preloads the result column and compare column into memory (it loads the values before executing the lookup).
Loads the result column and compare column into memory as the function executes.
Use this option when looking up highly repetitive values that are a small subset of the data and when missing values are unlikely.
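Demand_Load_Cache behaves like memoization: each distinct key hits the database once and is served from memory afterwards. A sketch, with a fake fetch function standing in for the real database call:

```python
def demand_load_lookup(fetch_from_db):
    """Sketch of Demand_Load_Cache: a value is fetched the first time
    its key is looked up, then served from memory on repeat lookups."""
    cache = {}
    def lookup(key):
        if key not in cache:
            cache[key] = fetch_from_db(key)   # one database hit per key
        return cache[key]
    return lookup

db_calls = []
def fake_fetch(key):            # stand-in for the real database access
    db_calls.append(key)
    return key.upper()

lookup = demand_load_lookup(fake_fetch)
results = [lookup(k) for k in ["a", "b", "a", "a"]]
```

With highly repetitive keys, as the card above describes, most lookups are served from the cache rather than the database.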
Three modes of Table_Comparison caching
Row-by-row select, cached comparison table, sorted input (Of the three, row-by-row select will likely be the slowest and sorted input the fastest)
Memory-intensive operations include the following operations:
Distinct, functions such as count_distinct and lookup_ext, Group By, Hierarchy_Flattening, Order By
None, Range, List
This table is not partitioned.
Each partition contains a set of rows with column values less than those specified.
Each partition contains a set of rows that contain the specified column values.
Degree of Parallelism (DOP)
a property of a data flow that defines how many times each transform defined in the data flow replicates for use on a parallel subset of data.
The Table_Comparison transform replicates when you use the Row-by-row select and Cached comparison table comparison methods:
Map_Operation, History_Preserving, Pivot
When enabling functions to run in parallel, verify that:
Your database will allow a stored procedure to run in parallel, and a custom function set to run in parallel will improve performance.
Parallel process threads option
a performance enhancement for some sources and targets.
the total elapsed time used to read a file source.
Distributing data flow execution provides the following benefits:
Better memory management by taking advantage of more CPU power and physical memory, and better job performance and scalability by taking advantage of grid computing.
When you execute a job, you can specify the following values on the Distribution level option:
job level, data flow level, sub data flow level
An entire job can execute on an available Job Server.
Data flow level
Each data flow within a job can execute on an available Job Server and can take advantage of additional memory for both in-memory and pageable cache on another computer.
Sub data flow level
A resource-intensive operation within a data flow can execute on an available Job Server. Each operation can take advantage of up to two gigabytes of additional memory for both in-memory and pageable cache on another computer.
Some general considerations when using bulk loading and reading are:
Specify bulk-loading options on the Data Services target table editor on the Options and Bulk Loader Options tabs, specify Teradata reading options on the source table editor Teradata options tab, most databases do not support bulk loading with a template table.
SAP BusinessObjects Data Services supports the following bulk loading and reading tools:
Parallel Transporter, FastLoad, MultiLoad, TPump, Load Utility, None (use ODBC)
Generic named pipe
loads a large volume of data by writing to a pipe from which Teradata reads.
Information about administrative tasks such as monitoring, lifecycle management, security, and so on.
Customer Issues Fixed
Information about customer issues fixed in this release.
Information about how to use the SAP BusinessObjects Data Services Designer.
Information about available SAP BusinessObjects Data Services books, languages, and locations.
Installation Guide for Windows
Information about and procedures for installing SAP Data Services in a Windows environment.
Information for third-party developers to access SAP Data Services functionality.
Information about the application, its components and scenarios for planning and designing your system landscape. Information about SAP BusinessObjects Information Steward is also provided in this guide.
Management Console Guide
Information about how to use the SAP BusinessObjects Data Services Administrator and SAP BusinessObjects Data Services Metadata Reports.
Performance Optimization Guide
Information about how to improve the performance of SAP DS.
Important information you need before installing and deploying this version of SAP DS.
Use to manage your production environment including batch job execution, real-time services, Web services, adapter instances, server groups, central and profiler repositories, and more.
Impact and Lineage Analysis
Use to analyze the end-to-end impact and lineage for Data Services tables and columns and SAP BusinessObjects Business Intelligence platform objects such as universes, business views, and reports.
Use to view dashboards of job execution statistics to see at a glance the status and performance of your job executions for one or more repositories over a given time period.
Data Validation Dashboard
Use to evaluate the reliability of your target data based on the validation rules you created in your batch jobs to quickly review, assess, and identify potential inconsistencies or errors in source data.
Use to view, analyze, and print graphical representations of all objects as depicted in the Designer including their relationships, properties, and more.
Data Quality Reports
Use to view and export reports for batch and real-time jobs such as job summaries and data quality transform-specific reports.
Contains the name of the repository associated with the Job Server on which you run the batch jobs. To see batch job status, connect the repository to the Administrator.
Contains the names of the Access Servers associated with a real-time service. To see real-time job status, connect the Access Server to the Administrator.
Contains the name of the repository associated with the Job Server on which you run the adapter.
Contains the name of the repository associated with the Profiler Server.
View status of real-time services and client interfaces supported by this Access Server. Control, restart, and set a service provider interval for this Access Server.
View status for services and service providers, start and stop services, add or remove a service, configure Job Servers for a service.
View status for client interfaces, start and stop interfaces, add or remove an interface.
View list of current Access Server logs, content of each log, clear logs, configure content of logs for display, enable or disable tracing for each Access Server.
View list of historical Access Server logs, view content of each log, delete logs.
The name of the machine that the report server is running on. By default, the current web application server name is used.
The port number of the machine that the report server is running on.
The path where the reports will be exported to.
The format in which the reports can be exported.
Specifies whether existing reports will be overwritten when the reports are exported. If this option is not selected, the reports are exported to a subfolder with the run ID, which specifies a unique identification of an instance of the executed job.
The supported language that the reports are generated in. Note that some reports are designed only to support English.
Data Validation dashboard reports
provide graphical depictions that let you evaluate the reliability of your target data based on the validation rules you created in your SAP DS batch jobs. This feedback allows business users to quickly review, assess, and identify potential inconsistencies or errors in source data.
To enable data validation statistics collection for your reports, you must verify two options
one at the validation transform level and one at the job execution level.
avg, count, count_distinct, max, min, sum
Calculates the average of a given set of values.
Counts the number of values in a table column.
Counts the number of distinct non-null values in a table column.
Returns the maximum value from a list.
Returns the minimum value from a list.
Calculates the sum of a given set of values.
cast, interval_to_char, julian_to_date, load_to_xml, long_to_varchar, num_to_interval, to_char, to_date, to_decimal, to_decimal_ext, varchar_to_long
Converts an expression of one data type to another.
Takes an interval and returns a character representation of the interval.
Converts a Julian date to a calendar date.
Converts the data represented by the given nested table into XML.
Converts a column value of type long to a value of type varchar.
Takes a number and returns an interval.
Converts a date or numeric type to a string.
Converts a string to a date under control of a format string.
Converts a varchar to a decimal.
Converts a column value of type varchar to a value of type long.
decrypt_aes, decrypt_aes_ext, encrypt_aes, encrypt_aes_ext
Returns decrypted text using an AES-generated key and the specified passphrase. The passphrase must be the same as the one used to encrypt the data.
Returns decrypted text using the specified passphrase and salt. The passphrase and salt must be the same as the ones used to encrypt the data.
Returns encrypted text encoded in base64 using an AES-generated key and the specified passphrase.
JDE_Date, JDE_Time, WL_GetKeyValue
Given a Julian date stored in a J.D. Edwards database, returns the equivalent value as a DS date data type.
Given a number representing time in J.D. Edwards, returns a DS character value that represents the equivalent military time in HH24:MI:SS format.
Returns the value of the specified key in the given URL
key_generation, sql, total_rows
Reads the max(generated key) column from the input table, uses that as the starting value for generating new key values.
Executes the specified SQL command against the specified DATASTORE.
Returns the total number of rows in a database table or a memory table.
add_months, concat_date_time, date_diff, date_part, day_in_month, day_in_week, day_in_year, fiscal_day, isweekend, julian, last_date, month, quarter, sysdate, systime, week_in_month, week_in_year, year
Adds a given number of months to a date
Given a date value and a time value, concatenates the two values and returns one datetime value.
Returns the difference between dates.
Extracts a portion of a given date as an integer.
Returns the day number relative to the month for the input date.
Returns the day number relative to the week for the input date.
Returns the day number relative to the year for the input date.
Returns an integer representing the number of fiscal days since the reference date.
Returns 1 if the input date is a weekend, 0 otherwise.
Returns the integer Julian date for the input date.
Returns the last date of the month for a given date.
Returns the month number for the input date.
Returns the number of the quarter for the input date.
Returns the current date in the system
Returns the current time in the system
Returns the week number relative to the month for the input date.
Returns the week number relative to the year for the input date.
Returns the year number of the input date.
get_env, get_error_filename, get_monitor_filename, get_trace_filename, is_set_env, set_env
Returns a varchar value for an environment variable set through the set_env() call. If the environment variable is not set, returns NULL.
Returns the full path and file name for the error log.
Returns the full path and file name for the monitor log.
Returns the full path and file name for the trace log.
Returns 1 if the environment variable is set, otherwise returns 0.
Sets a value for an environment variable.
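A sketch of these functions using os.environ as the variable store — an assumption for illustration, since the real functions manage Data Services job-level environment variables:

```python
import os

# os.environ stands in for the Data Services environment variable store.
def set_env(name, value):
    os.environ[name] = value

def get_env(name):
    return os.environ.get(name)        # NULL (None here) when not set

def is_set_env(name):
    return 1 if name in os.environ else 0

set_env("DS_DEMO_FLAG", "on")
```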
lookup, lookup_ext, lookup_seq
Translates column values using a lookup table.
Translates column values using a lookup table, with expanded options such as multiple return columns and caching control.
Translates column values using a lookup table, using a column that indicates the logical sequence of rows.
abs, ceil, floor, ln, log, mod, power, rand, rand_ext, round, sqrt, trunc
Returns the absolute value of the input number.
Returns the smallest integer value greater than or equal to the input number.
Returns the largest integer value less than or equal to the input number.
Returns the natural logarithm of the given numeric expression.
Returns the base-10 logarithm of the given numeric expression.
Returns the remainder when one number is divided by another.
Returns the value of the given expression to the specified power.
Generates a random number between 0 and 1.
Generates a random number between 0 and 1 using a linear congruential algorithm.
Returns the input number rounded to the specified number of decimal places to the right of the decimal point. The number of decimal places defaults to 0. If the number of decimal places is negative, digits left of the decimal point are rounded.
Returns the square root of the input number.
Returns the input number truncated to the specified number of decimal places to the right of the decimal point. The number of decimal places defaults to 0. If the number of decimal places is negative, digits left of the decimal point are truncated.
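The difference between round and trunc, including the negative-places behavior described above, can be sketched in Python; this simplified version assumes non-negative input values:

```python
def ds_round(value, places=0):
    """Sketch of round(): negative places act on digits left of the
    decimal point. Assumes non-negative input values."""
    factor = 10 ** abs(places)
    if places >= 0:
        return int(value * factor + 0.5) / factor
    return int(value / factor + 0.5) * factor

def ds_trunc(value, places=0):
    """Sketch of trunc(): drops digits instead of rounding them."""
    factor = 10 ** abs(places)
    if places >= 0:
        return int(value * factor) / factor
    return int(value / factor) * factor
```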
Returns the source data after decoding the base64-encoded input.
Returns a base64-encoded stream of the input data.
Retrieves the name of the current configuration of a specified datastore.
Returns the name of the current system configuration.
Returns the data flow name in which this call exists. If the call is not in a data flow, returns NULL.
Retrieves the value of a specified datastore field.
Retrieves the database name of a specified datastore.
Retrieves the real owner name that corresponds to a specified design name in a specified datastore.
Retrieves the database type of a specified datastore.
Retrieves the database version of a specified datastore.
Computes multiple conditions and returns the expression with the TRUE condition. Otherwise, returns the default expression.
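A plain-Python sketch of decode's evaluation order, with condition/result pairs passed as a list rather than decode's actual inline argument syntax:

```python
def decode(condition_result_pairs, default):
    """Sketch of decode(): return the result paired with the first
    TRUE condition, or the default expression when none is TRUE."""
    for condition, result in condition_result_pairs:
        if condition:
            return result
    return default

score = 85
grade = decode([(score >= 90, "A"), (score >= 80, "B")], "C")
```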
Checks to see if a given file exists.
Returns the generated row number for the current instance.
Generates a number of each row in a group beginning with 1 and incremented by 1.
Returns the domain description for the input value. If no description exists, returns NULL.
Returns a specified file attribute.
Returns the greatest value in a list of one or more expressions.
Returns the name of the computer on which the job is executing.
Computes the expression A. If A evaluates to TRUE, returns B. Otherwise, returns C.
Returns 1 if group is changed, 0 otherwise.
Determines if the nested table is empty or not.
Returns the name of the job in which the call to this function exists.
Returns the run identifier of the job in which the call to this function exists.
Returns the lowest value in a list of one or more expressions.
Replaces Input with Replacement if Input is NULL.
Returns column value of previous row.
During execution, pushes the SQL string in the second argument to the database represented by the datastore in the first argument for execution.
Calling this function causes an exception to be generated.
Returns the repository name which is the database connection string and owner name.
Sets the checkpoint for the specified SQL Server CDC datastore with change data tracking (CDC).
Suspends the execution of a data flow or work flow for the specified milliseconds.
Returns the user name used to log into the Job Server operating system.
Retrieves the value of the specified table attribute.
Deletes all rows in a memory table
Notifies of the existence of files that match the input file name pattern.
Returns the innermost work flow name in which this call exists.
Executes the process chain in the specified SAP datastore.
Notifies SAP NetWeaver BW whether DS was able to read the Open Hub table.
Returns the input string with the characters specified in the trim string removed. The input string is scanned from the left. When the first character not in the trim string is encountered, the result is returned.
Returns the input string with the blanks (space characters) on the left removed.
Searches the input string for the specified pattern.
Searches the input string for the specified regular expression pattern.
Searches the input string for the specified simple pattern.
Prints the input string to the trace log file.
Returns a string where every occurrence of a given search string in the input string is replaced by the given replacement string.
Right pads or truncates char1 to display length n with char2, replicated as many times as necessary.
Right pads or truncates char1 to display length n with char2, replicated as many times as necessary. Supports logical characters.
Returns the input string with characters specified in the trim string removed. The input string is scanned from the right. When the first character not in the trim string is encountered, the result is returned.
Returns the input string with the blanks (space characters) on the right removed.
Returns the input string with the blanks (spaces, tabs, new lines, and other non-visible characters) on the right removed.
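The trim-string scanning described for ltrim and rtrim can be sketched in Python; the function names are illustrative stand-ins:

```python
def ds_ltrim(text, trim_chars):
    """Sketch of ltrim(): scan from the left, removing characters that
    appear in the trim string until one that does not is encountered."""
    i = 0
    while i < len(text) and text[i] in trim_chars:
        i += 1
    return text[i:]

def ds_rtrim(text, trim_chars):
    """Sketch of rtrim(): the same scan, mirrored from the right."""
    j = len(text)
    while j > 0 and text[j - 1] in trim_chars:
        j -= 1
    return text[:j]
```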
Performs a search and replace on input parameters by matching specific criteria and values given in the search table specification.
Returns the soundex encoding of the input string.
Returns the portion of the input string specified by the offset and length.
Converts the input string to uppercase.
This function returns one word from an input string. Words are delimited by white space. The word number indicates which word to return.
This function returns one word from an input string. Words are delimited by a user-specified separator character. The word number indicates which word to return.
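A sketch of word and word_ext in Python, assuming 1-based word numbers and an empty string when the word number is out of range (the out-of-range behavior is an assumption for illustration):

```python
def word(text, word_number):
    """Sketch of word(): nth whitespace-delimited word, 1-based."""
    parts = text.split()
    if 1 <= word_number <= len(parts):
        return parts[word_number - 1]
    return ""

def word_ext(text, word_number, separator):
    """Sketch of word_ext(): like word(), with an explicit separator."""
    parts = text.split(separator)
    if 1 <= word_number <= len(parts):
        return parts[word_number - 1]
    return ""
```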
exec, mail_to, smtp_to
Executes an executable or a batch file with the specified parameters.
Sends e-mail (subject and message) to the provided recipients list.
Sends e-mail (subject and message) to the provided recipients list via SMTP.
Returns 1 if the input string is a valid date, 0 otherwise.
Returns 1 if the input string is a valid datetime, 0 otherwise.
Returns 1 if the input string is a valid decimal, 0 otherwise.
Returns 1 if the input string is a valid double, 0 otherwise.
Returns 1 if the input string is a valid integer, 0 otherwise.
Returns 1 if the input string is a valid real, 0 otherwise.
Returns 1 if the input string is a valid time, 0 otherwise.