Use of Patindex and Parsename To bring Unstructured Record into Structure format

Rajnilari2015
Posted by in Sql Server category on for Beginner level | Points: 250 | Views : 2980 red flag

In this article we will learn the use of PATINDEX and PARSENAME to extract relevant information from unstructured data.

Introduction

Let's say we have some records as under

Data
------
988455XXXX Niladri <html><body>This is Niladri's mobile number</body></html>
988785XXXX Arina <html><body>This is Arina's mobile number</body></html>
45/78,KRRoad,Kanoji-34**** Babai <html><body>This is Babai's house</body></html>

The objective is to bring the record into a relational format such as

MobileNumber OR Address			Owner Name	Details
-----------------------			----------	-------
988455XXXX				Niladri		<html><body>This is Niladri's mobile number</body></html>
988785XXXX				Arina		<html><body>This is Arina's mobile number</body></html>
45/78,KRRoad,Kanoji-34****		Babai		<html><body>This is Babai's house</body></html>

Straight to program

As a first activity, let us separate the HTML portion from the rest. For doing so, we will use PATINDEX to get the starting index of the character(<) as under

SELECT 
 Data,[StartIndex of <]=PATINDEX('%<%',Data)
 FROM @T
 /*

Data											StartIndex of <
----											------------------

988455XXXX Niladri <html><body>This is Niladri's mobile number</body></html>		20
988785XXXX Arina <html><body>This is Arina's mobile number</body></html>		18
45/78,KRRoad,Kanoji-34**** Babai <html><body>This is Babai's house</body></html>	34
 */

We can now use substring to extract the relevant portions as under

SELECT 
 BeforeHtmlContents = SUBSTRING(Data,1,PATINDEX('%<%',Data)-1)
 ,HtmlContents = SUBSTRING(Data,PATINDEX('%<%',Data),LEN(Data))
 FROM @T

 /*

BeforeHtmlContents			HtmlContents
------------------			------------
988455XXXX Niladri 			<html><body>This is Niladri's mobile number</body></html>
988785XXXX Arina 			<html><body>This is Arina's mobile number</body></html>
45/78,KRRoad,Kanoji-34**** Babai 	<html><body>This is Babai's house</body></html>
 */

Now let us replace the white spaces of the BeforeHtmlContents column's data with dot(.) as under

SELECT 
 BeforeHtmlContents = REPLACE(RTRIM(SUBSTRING(Data,1,PATINDEX('%<%',Data)-1)),' ','.')
 --,HtmlContents = SUBSTRING(Data,PATINDEX('%<%',Data),LEN(Data))
 FROM @T
 /*

BeforeHtmlContents
--------------------
988455XXXX.Niladri
988785XXXX.Arina
45/78,KRRoad,Kanoji-34****.Babai
 */

This will be needed at the time of using PARSENAME function. Finally we need to separate the MobileNumber OR Address field from the Owner Name which we are doing as under

SELECT
	[MobileNumber OR Address]=PARSENAME(BeforeHtmlContents, 2)
	,[Owner Name]=PARSENAME(BeforeHtmlContents, 1)	
	,Details = HtmlContents
FROM
(SELECT 
 BeforeHtmlContents = REPLACE(RTRIM(SUBSTRING(Data,1,PATINDEX('%<%',Data)-1)),' ','.')
 ,HtmlContents = SUBSTRING(Data,PATINDEX('%<%',Data),LEN(Data))
 FROM @T)X

 /*

MobileNumber OR Address			Owner Name	Details
-----------------------			----------	-------
988455XXXX				Niladri		<html><body>This is Niladri's mobile number</body></html>
988785XXXX				Arina		<html><body>This is Arina's mobile number</body></html>
45/78,KRRoad,Kanoji-34****		Babai		<html><body>This is Babai's house</body></html>


 */

The complete TSQL Script is given below

DECLARE @T TABLE(Data VARCHAR(100))
INSERT INTO @T VALUES('988455XXXX Niladri <html><body>This is Niladri''s mobile number</body></html>')
INSERT INTO @T VALUES('988785XXXX Arina <html><body>This is Arina''s mobile number</body></html>')
INSERT INTO @T VALUES('45/78,KRRoad,Kanoji-34**** Babai <html><body>This is Babai''s house</body></html>')

SELECT
	[MobileNumber OR Address]=PARSENAME(BeforeHtmlContents, 2)
	,[Owner Name]=PARSENAME(BeforeHtmlContents, 1)	
	,Details = HtmlContents
FROM
(SELECT 
 BeforeHtmlContents = REPLACE(RTRIM(SUBSTRING(Data,1,PATINDEX('%<%',Data)-1)),' ','.')
 ,HtmlContents = SUBSTRING(Data,PATINDEX('%<%',Data),LEN(Data))
 FROM @T)X

Conclusion

This article taught us how to use PARSENAME and PATINDEX together to bring an unstructured record to a structured one. Hope this will be helpful. Thanks for reading.

Page copy protected against web site content infringement by Copyscape

About the Author

Rajnilari2015
Full Name: Niladri Biswas (RNA Team)
Member Level: Platinum
Member Status: Member,Microsoft_MVP,MVP
Member Since: 3/17/2015 2:41:06 AM
Country: India
-- Thanks & Regards, RNA Team


Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)