Extraction of BLOB Content using Pentaho Kettle CE

This document describes the step-by-step method to extract BLOB content and store it on the local machine using Pentaho Kettle Community Edition.

We assume that the database table already contains BLOB data; if not, please create the table and store a png/pdf/txt/etc. file in the BLOB column.

For illustration, we have created the following table in MySQL and stored pdf/png/txt files in it.

 

CREATE TABLE tbl_blob_demo (
id int(11) NOT NULL AUTO_INCREMENT,
filetype varchar(30) NOT NULL,
filecontent mediumblob NOT NULL,
PRIMARY KEY (id)
) CHARSET=utf8;
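
If the table is still empty, a file can be loaded into the BLOB column with a small piece of JDBC code such as the sketch below. This is only a sketch: the connection URL, credentials and file path are placeholders, not values taken from this example.

import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class LoadBlobDemo {
    public static void main(String[] args) throws Exception {
        // Connection URL, credentials and file path are placeholder assumptions.
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb", "user", "password");
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO tbl_blob_demo (filetype, filecontent) VALUES (?, ?)");
        FileInputStream fis = new FileInputStream("C:/sample.png");
        ps.setString(1, "png");
        ps.setBinaryStream(2, fis);   // stream the file into the mediumblob column
        ps.executeUpdate();
        fis.close();
        ps.close();
        con.close();
    }
}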

The data will look like this:

[Screenshot: sample rows in tbl_blob_demo]

Now, create a transformation in Pentaho Kettle to extract the BLOB content (i.e. the file) and store it on the local machine. Below is the step-by-step method to follow in Pentaho Kettle:

  • Take a Table Input step and write the query (for illustration we are extracting a PNG file).

SELECT * FROM tbl_blob_demo WHERE filetype = 'png'

  • Now, take the User Defined Java Class step and write the code below so that it stores the PNG file at the specified location.

import java.io.File;
import java.io.FileOutputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    // Fixed output location; see the note below for building the name dynamically.
    String filename = "C:/test.png";
    File f = new File(filename);
    BufferedOutputStream bos = null;
    try {
        bos = new BufferedOutputStream(new FileOutputStream(f));
        // Read the BLOB column of the current row as a byte array.
        byte[] blobBytes = (byte[]) get(Fields.In, "filecontent").getBinary(r);
        // byte[] blobBytes = (byte[]) get(Fields.In, "<<blob column_name>>").getBinary(r);
        bos.write(blobBytes);
        bos.flush();
    } catch (IOException e) {
        throw new KettleException("Unable to write BLOB content to " + filename, e);
    } finally {
        if (bos != null) {
            try { bos.close(); } catch (IOException e) { /* ignore close failure */ }
        }
    }
    return true;
}

Note: We could also build the path/filename dynamically, but it is fixed here to keep the code easy to follow (see the sketch below).
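
For example, the relevant lines inside processRow() could build the name from the row's own columns. This is only a sketch: the output folder C:/blob_out is an assumption, not part of the original example.

// Build the output filename from the id and filetype columns of the current row.
Long id = get(Fields.In, "id").getInteger(r);
String filetype = get(Fields.In, "filetype").getString(r);
// Output folder is an assumption; adjust it to your environment.
String filename = "C:/blob_out/file_" + id + "." + filetype;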

The finished transformation will look like this:

[Screenshot: Table Input step connected to the User Defined Java Class step]

 

Thanks

Naimish | Sr. BI Developer

 


Integration of LDAP in Pentaho BI-Server CE V5.2

This document describes the step-by-step method to integrate LDAP/Active Directory with Pentaho BI-Server Community Edition version 5.2.

We assume that you have already set up LDAP in your environment with users, roles, organizational units, etc. For simplicity, we have created the following users and related objects in LDAP, as shown below.

  • dc=specindia,dc=com
    • cn=admin
    • ou=system
      • ou=roles
      • ou=users

Note: the admin user has administrator rights.

LDAP setting

  • Now, we need to add the LDAP information to the following Pentaho configuration files, located at “biserver-ce-5.2.0.0-209\biserver-ce\pentaho-solutions\system”:
    • applicationContext-security-ldap.properties
    • spring.properties
    • security.properties
  • applicationContext-security-ldap.properties

We need to provide the LDAP user, password, server IP, and other details in this file. The following are sample lines based on the LDAP structure shown above.

contextSource.providerUrl=ldap\://10.37.55.10\:389/ou\=system,dc\=specindia,dc\=com
contextSource.userDn=cn=admin,dc=specindia,dc=com
contextSource.password=adminpassword

userSearch.searchBase=ou\=users
userSearch.searchFilter=(cn\={0})

populator.convertToUpperCase=false
populator.groupRoleAttribute=cn

populator.groupSearchBase=ou\=roles
populator.groupSearchFilter=(roleOccupant\={0})
populator.rolePrefix=
populator.searchSubtree=true

allAuthoritiesSearch.roleAttribute=cn
allAuthoritiesSearch.searchBase=ou\=roles
allAuthoritiesSearch.searchFilter=(objectClass\=organizationalRole)

allUsernamesSearch.usernameAttribute=uid
allUsernamesSearch.searchBase=ou\=users
allUsernamesSearch.searchFilter=objectClass\=Person

adminRole=cn\=Administrator,ou\=roles
adminUser=uid\=admin,ou\=users
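
Before restarting the BI server, it can help to confirm that these connection details actually bind. The following standalone Java sketch (not part of Pentaho) uses plain JNDI with the same URL and credentials as above; treat it as an optional sanity check, not a required step.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class LdapBindCheck {
    public static void main(String[] args) {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        // Same server and credentials as in applicationContext-security-ldap.properties.
        env.put(Context.PROVIDER_URL, "ldap://10.37.55.10:389/ou=system,dc=specindia,dc=com");
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "cn=admin,dc=specindia,dc=com");
        env.put(Context.SECURITY_CREDENTIALS, "adminpassword");
        try {
            DirContext ctx = new InitialDirContext(env);
            System.out.println("LDAP bind succeeded");
            ctx.close();
        } catch (NamingException e) {
            System.out.println("LDAP bind failed: " + e.getMessage());
        }
    }
}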

  • spring.properties

We do NOT need to update this file if we have created an “admin” user and an “Administrator” role.

In this example, we created the “admin” user with the “Administrator” role, so no changes are required here.

However, if the system administrator has created a user other than “admin” or a role other than “Administrator”, then the following lines of this file need to be updated:

singleTenantAdminDefaultUserName=admin
singleTenantAdminUserName=admin
singleTenantAdminDefaultAuthorityName=Administrator
singleTenantAdminAuthorityName=Administrator

  • security.properties

In this file, we need to change the security provider from jackrabbit to LDAP. After modification, the file contains the following line:

provider=ldap

Now, start the Pentaho CE (Tomcat) server.

 

How to Integrate Kettle/PDI with Hadoop

How to Integrate Kettle with Big Data [Hadoop]

Big data refers to data sets so large and complex that they become difficult to process using traditional data processing tools and applications. The challenges include capture, storage, search, sharing, transfer, analysis and visualization.

Hadoop is a framework for storing and processing Big Data, and it provides the ability to manage very large volumes of data. Hadoop works on the MapReduce model, and writing MapReduce jobs normally requires Java knowledge. Pentaho Kettle removes this barrier: using Kettle, a user can easily build MapReduce jobs that read and write data in Hadoop, and, more importantly, this does not require any Java knowledge. The example below shows how to integrate PDI (Kettle) with Hadoop.

Step 1: The first step is to create the master job, which stores an ordinary file in the HDFS file system and then executes the other transformations (mapper, reducer, etc.).

[Screenshot: master job]
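
For reference, the "copy file into HDFS" part of the master job corresponds roughly to the following Java sketch using the Hadoop FileSystem API. The namenode address and file paths are assumptions, not values from this example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopyExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode address; adjust to your cluster.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);
        // Copy a local web-server log into HDFS (both paths are assumptions).
        fs.copyFromLocalFile(new Path("/tmp/access_log.txt"),
                             new Path("/user/pentaho/input/access_log.txt"));
        fs.close();
    }
}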

Step 2: Create the mapper transformation. The image below shows the mapper transformation. In this example, the mapper outputs the IP address as the key and the accessed URL as the value.

[Screenshot: mapper transformation]

Step 3: Create the reducer transformation. The image below shows the reducer transformation. The reducer calculates the total number of hits on each URL from each individual IP address.

[Screenshot: reducer transformation]

Step 4: Integrate the mapper/reducer transformations with the other job parameters, such as the number of nodes, as shown in the image below.

[Screenshot: integrated MapReduce job configuration]
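
For readers who want to see what the Kettle mapper and reducer correspond to in hand-written code, here is a rough Java MapReduce sketch of the same logic. The log layout (IP in the first field, URL in the seventh field of a space-separated line) is an assumption you would adjust to your own data.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: key = client IP, value = requested URL (assumed field positions).
public class LogHitMapper extends Mapper<LongWritable, Text, Text, Text> {
    protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split(" ");
        if (fields.length > 6) {
            ctx.write(new Text(fields[0]), new Text(fields[6]));
        }
    }
}

// Reducer: for each IP, count how many times each URL was requested.
class LogHitReducer extends Reducer<Text, Text, Text, IntWritable> {
    protected void reduce(Text ip, Iterable<Text> urls, Context ctx)
            throws IOException, InterruptedException {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (Text url : urls) {
            String u = url.toString();
            Integer c = counts.get(u);
            counts.put(u, c == null ? 1 : c + 1);
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            ctx.write(new Text(ip + " " + e.getKey()), new IntWritable(e.getValue()));
        }
    }
}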

PDI kettle community edition gives “Unable to close openFile file” error when writing data in Excel

PDI Kettle Community Edition 4.4 gives an “Unable to close openFile file” error when writing data to an Excel file using the template option on Linux, as shown in the image below:

[Screenshot: “Unable to close openFile file” error]

The user receives this error because the Community version of PDI has a bug on Linux. To work around it, perform the following steps:

  1. Open kitchen.sh
  2. Add “-Dfile.encoding=Cp1252” to the OPT variable, as shown in the image below.

[Screenshot: kitchen.sh with -Dfile.encoding=Cp1252 added to the OPT variable]

 

  3. Save and close.

 

OPatch failed with error code = 73

When installing Patch 10.2.0.5.4 (patchset 4), the user will get the error below:

OPatch failed with error code = 73

Solution:

1. Go to your source directory (e.g. cd /source/db_10g_patch_10205_4/1241922)

2. Set the OPatch directory path (e.g. export PATH=$PATH:$ORACLE_HOME/OPatch)

3. Set the Oracle SID (e.g. export ORACLE_SID=orcl)

4. Execute the OPatch command (opatch apply)

5. Change to the post-patch script directory (cd $ORACLE_HOME/rdbms/admin)

6. Start the database

7. Execute catbundle.sql (SQL> @catbundle.sql psu apply)

8. Execute utlrp.sql (SQL> @utlrp.sql)

 

It is not what we take up, but what we give up, that makes us rich

Kettle.properties file in Kettle (KETTLE_HOME)

We can create the kettle.properties and repositories.xml files at a user-defined path.
 
To implement a user-defined path for KETTLE_HOME, perform the following steps:
 
1) Add the following line to the …../set-pentaho-env.sh file
    export KETTLE_HOME=/pentaho/kettle/config
 
2) Create a .kettle folder at /pentaho/kettle/config and add all the properties files:
    – repositories.xml
    – kettle.properties

User-defined log table in Kettle Transformation

We can specify a user-defined log table in a Kettle transformation, but the transformation log table must contain the following columns:

TRANSNAME
STATUS
ERRORS
LOGDATE
STARTDATE
ENDDATE
DEPDATE

Get Positional String from Comma-Separated String

PL/SQL: retrieve a particular positional string from a comma-separated string.

CREATE OR REPLACE FUNCTION func_get_str (icString VARCHAR2, inPosition NUMBER, icSeperator VARCHAR2 := ',')

RETURN VARCHAR2
IS
v_ret_string VARCHAR2(300);

BEGIN

SELECT substr(str,
       instr(str, icSeperator, 1, rid) + 1,
       (length(substr(str, instr(str, icSeperator, 1, rid) + 1)) - length(substr(str, instr(str, icSeperator, 1, rid + 1))))) finalString
INTO v_ret_string
FROM
(
-- wrap the input in separators so every element sits between two separators
SELECT icSeperator || icString || icSeperator AS str
FROM DUAL
), (
-- row generator; assumes USER_TABLES has at least as many rows as there are elements
SELECT ROWNUM rid
FROM USER_TABLES
)
WHERE rid <= (LENGTH(str) - LENGTH(REPLACE(str, icSeperator)) - 1)
AND rid = inPosition;

RETURN v_ret_string;
EXCEPTION
WHEN OTHERS THEN
RAISE;
END;
/

Calling Function

SELECT func_get_str('''abc'',''prq'',2000,10,20', 6, ',')
FROM DUAL;