XML


What is XML?
eXtensible Markup Language
A simplified subset of the lesser-known SGML (Standard Generalized Markup Language)
Originally designed to be a flexible text format for electronic publishing
Plays an important role in data exchange on the Web and in other application areas
Designed for interoperability with SGML as well as HTML

Features of XML
A cross-platform, software- and hardware-independent means of transmitting information
Describes a class of data objects known as XML Documents
Partially describes the behavior of computer programs which process these documents
More expressive than HTML
XML uses a Document Type Definition (DTD) or an XML Schema to describe the data

Expressiveness of XML
HTML has nearly 100 different pre-defined elements with each element having several attributes
XML has no pre-defined elements (tags). The expressiveness of XML is attributed to its simplicity and not a plethora of elements
Tags have to be “invented” by the author of an XML document!
Self descriptive
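
As a minimal sketch of author-invented, self-descriptive tags: the element names and values below (note, to, from, heading, body) are made up for this illustration, and Python's standard xml.etree.ElementTree module reads them without any predefined vocabulary.

```python
import xml.etree.ElementTree as ET

# The tag names here are invented for this example; XML itself predefines none of them.
note_xml = """<note>
  <to>Alice</to>
  <from>Bob</from>
  <heading>Reminder</heading>
  <body>Lunch at noon</body>
</note>"""

note = ET.fromstring(note_xml)

# Because the tags describe the data, a generic loop can recover field names and values.
fields = {child.tag: child.text for child in note}
print(fields)
```

The parser needs no knowledge of what "heading" or "body" mean; the names only have to be used consistently by the author and the consuming application.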

XML and HTML
HTML was designed to display data and to focus on how data looks
XML was designed to describe data and to focus on what data is
XML was designed to carry data – data exchange
XML isn’t a replacement for HTML
XML is a complement to HTML

Data Exchange with XML
Data can be exchanged between incompatible systems 
Converting data to XML can greatly reduce the complexity of exchanging it between incompatible systems over the internet, and produces data that many different types of applications can read
Ex: XML can be used in B2B (Business to Business) systems
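
The exchange idea can be sketched as a round trip: one system serializes a record to XML text and another parses it back. The order record, its field names, and the helper functions to_xml and from_xml are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record held by the sending system.
order = {"id": "1001", "customer": "Acme Corp", "total": "249.90"}

def to_xml(record):
    """Serialize a flat dict of strings to an <order> element (illustrative helper)."""
    root = ET.Element("order")
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Rebuild the dict on the receiving side (illustrative helper)."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}

payload = to_xml(order)       # plain text that any platform can transmit and read
received = from_xml(payload)
print(payload)
```

The two sides only need to agree on the XML vocabulary, not on each other's internal data representation.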

Data Sharing with XML
Since XML is independent of hardware, software and application, you can make your data available to more than just standard HTML browsers
Applications can access your XML files as data sources, much as they would access a database
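
A rough sketch of treating XML as a data source: the catalog below is invented for this example, and the queries use the limited XPath subset that Python's xml.etree.ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; in practice this text might be read from a file.
catalog_xml = """<catalog>
  <book category="web"><title>Learning XML</title><price>39.95</price></book>
  <book category="cooking"><title>Simple Pasta</title><price>30.00</price></book>
  <book category="web"><title>XQuery Basics</title><price>49.99</price></book>
</catalog>"""

catalog = ET.fromstring(catalog_xml)

# Select by attribute value, similar in spirit to a WHERE clause.
web_titles = [b.findtext("title") for b in catalog.findall("book[@category='web']")]

# Filter on a converted value in ordinary Python code.
cheap_titles = [b.findtext("title") for b in catalog if float(b.findtext("price")) < 40]

print(web_titles)
print(cheap_titles)
```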

XML Tags
In XML, every element must be closed; omitting a closing tag is an error
This is unlike HTML, where some closing tags may be omitted
An empty element may use the self-closing form, e.g. <br/>
The XML declaration is not an element, so it has no closing tag
XML tags are case-sensitive
Tags must be properly nested
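
The tag rules above can be checked with any XML parser. This sketch uses Python's xml.etree.ElementTree; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "closed element": "<p>This is a paragraph</p>",
    "missing closing tag": "<p>This is a paragraph",
    "case mismatch": "<Message>hi</message>",
    "improper nesting": "<b><i>text</b></i>",
    "proper nesting": "<b><i>text</i></b>",
    "self-closing element": "<br/>",
    "declaration, not an element": '<?xml version="1.0"?><note/>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'error'}")
```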

Other facets of XML
Attribute values must be quoted, with either double quotes (" ") or single quotes (' ')
In XML, whitespace in element content is preserved, not collapsed into a single space as in HTML
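
Both points can be observed with a parser. The sketch below uses Python's xml.etree.ElementTree with made-up sample elements.

```python
import xml.etree.ElementTree as ET

# Either quote style is accepted for attribute values.
double_quoted = ET.fromstring('<note date="12/11/2002"/>')
single_quoted = ET.fromstring("<note date='12/11/2002'/>")

# An unquoted attribute value is a syntax error.
try:
    ET.fromstring("<note date=12/11/2002/>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False

# Runs of spaces in element content reach the application unchanged.
spaced = ET.fromstring("<msg>Hello      my name is   XML</msg>")

print(double_quoted.get("date"), single_quoted.get("date"), unquoted_accepted)
print(repr(spaced.text))
```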

XML Elements
XML Elements are extensible
An application written to extract specific elements from an XML document and produce an output should still produce the same output when an extra element is added to the document
XML elements have relationships – Parent-Child relationships
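
A small sketch of extensibility, assuming a hypothetical render function that reads only the child elements it knows about: adding a date element to the note does not change its output.

```python
import xml.etree.ElementTree as ET

def render(xml_text):
    """Format a note using only the to, from and body children (illustrative helper)."""
    note = ET.fromstring(xml_text)   # note is the parent; to, from, body are its children
    return f"To: {note.findtext('to')} | From: {note.findtext('from')} | {note.findtext('body')}"

original = "<note><to>Alice</to><from>Bob</from><body>Lunch at noon</body></note>"

# The author later extends the document with an extra child element.
extended = (
    "<note><date>2008-01-10</date><to>Alice</to>"
    "<from>Bob</from><body>Lunch at noon</body></note>"
)

print(render(original))
print(render(extended))
```

This works because the function looks elements up by name rather than by position; code that relied on child order would be more fragile.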

XML Elements
An XML element is everything from (including) the element's start tag to (including) the element's end tag
An element can have element content, mixed content, simple content, or empty content
An element can also have attributes 
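
The four kinds of content can be told apart programmatically. This is a rough sketch using Python's xml.etree.ElementTree; the book document and the content_kind helper are invented for illustration, and whitespace-only text is treated as no text.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book isbn="000-0">'                      # attribute on an element
    "<title>My First XML</title>"              # simple content: text only
    "<para>Some <em>mixed</em> text</para>"    # mixed content: text and child elements
    "<pagebreak/>"                             # empty content
    "</book>"                                  # book itself has element content
)

def content_kind(elem):
    """Classify an element by what sits between its start and end tags."""
    has_children = len(elem) > 0
    has_text = bool((elem.text or "").strip()) or any(
        (child.tail or "").strip() for child in elem
    )
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"

kinds = {elem.tag: content_kind(elem) for elem in [book, *book]}
print(kinds)
print(book.attrib)
```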

Naming Convention for Elements
Names can contain letters, digits, and other characters such as hyphens, underscores, and periods
Names must start with a letter or an underscore, not a digit or other punctuation character
Names starting with the letters xml (or XML or Xml ..) are reserved and should not be used
Names cannot contain spaces
Otherwise any name can be used and no words are reserved, but the idea is to make names descriptive
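
One way to try out the naming rules is to let a parser judge a candidate name. The helper parser_accepts_name below is hypothetical and only covers simple names without markup characters. Note that names beginning with xml are reserved by the specification rather than rejected as syntax errors, so this check does not flag them.

```python
import xml.etree.ElementTree as ET

def parser_accepts_name(name):
    """Return True if <name/> parses as a single empty element called name."""
    try:
        return ET.fromstring(f"<{name}/>").tag == name
    except ET.ParseError:
        return False

candidates = ["note", "_note", "first-name", "1note", "-note", "first name", "xmlnote"]
accepted = {name: parser_accepts_name(name) for name in candidates}

for name, ok in accepted.items():
    print(f"{name!r}: {'accepted' if ok else 'rejected'}")
```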


XML Documents
Two kinds of XML Documents
Valid XML
Well-formed XML
XML with correct syntax is Well-formed XML, whether or not it has a DTD
Well-formed XML that also conforms to its DTD is Valid XML

XML DTD
Document Type Definition
A DTD defines the legal elements of an XML document
It defines the document structure with a list of legal elements
A DTD can be declared inline in your XML document, or as an external reference 
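
The sketch below shows a DTD declared inline and illustrates the gap between well-formed and valid. The note vocabulary is invented for this example. Python's xml.etree.ElementTree is a non-validating parser, so it accepts a document that breaks its own DTD as long as the syntax is correct; checking validity requires a validating parser, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Inline DTD: a note must contain to, from and body, in that order.
dtd = """<!DOCTYPE note [
  <!ELEMENT note (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>"""

conforming = dtd + "<note><to>Alice</to><from>Bob</from><body>Lunch at noon</body></note>"

# Well-formed, but not valid: from and body are missing and signature is undeclared.
violating = dtd + "<note><to>Alice</to><signature>B.</signature></note>"

conforming_tree = ET.fromstring(conforming)
violating_tree = ET.fromstring(violating)   # no error: only syntax is checked here

print([child.tag for child in conforming_tree])
print([child.tag for child in violating_tree])
```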


Why is a DTD needed?
With a DTD, each of your XML files can carry a description of its own format with it
With a DTD, independent groups of people can agree to use a common DTD for interchanging data
Your application can use a standard DTD to verify that the data is valid 

XML Building Blocks – DTD Viewpoint
Elements - “to”, “from”, “heading”, etc.
Tags - used to mark up elements
Attributes - extra information about elements
Entities
PCDATA 
CDATA 

Entities
Entities are variables used to define common text
Entity references are references to entities 
Entities are expanded when a document is parsed by an XML parser
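
A minimal sketch of entity expansion, using Python's xml.etree.ElementTree. The entity names writer and copyright and their replacement text are made up for this example; lt and gt are two of XML's predefined entities.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE note [
  <!ENTITY writer "Alice Example">
  <!ENTITY copyright "Copyright Example Corp.">
]>
<note>
  <author>&writer;</author>
  <footer>&copyright; &lt;draft&gt;</footer>
</note>"""

note = ET.fromstring(doc)

# The application sees the replacement text, not the entity references.
author = note.findtext("author")
footer = note.findtext("footer")
print(author)
print(footer)
```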


PCDATA
PCDATA means parsed character data
Text found between the start tag and the end tag of an XML element
Text that will be parsed
Tags inside the text will be treated as markup and entities will be expanded


CDATA
CDATA means character data
Text, such as the content of a CDATA section (<![CDATA[ ... ]]>), that will NOT be parsed
Tags inside the text will NOT be treated as markup and entities will not be expanded
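
The difference between parsed text and a CDATA section can be seen by feeding the same characters to a parser both ways. This sketch uses Python's xml.etree.ElementTree with made-up sample content.

```python
import xml.etree.ElementTree as ET

# Parsed character data: <b> becomes a child element and &amp; is expanded.
parsed = ET.fromstring("<p>Use <b>bold</b> &amp; italics</p>")

# CDATA section: the same characters are passed through literally.
literal = ET.fromstring("<p><![CDATA[Use <b>bold</b> &amp; italics]]></p>")

print(repr(parsed.text), [child.tag for child in parsed], repr(parsed[0].tail))
print(repr(literal.text), [child.tag for child in literal])
```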


XML Schema
W3C supports an alternative to DTD called XML Schema
An XML schema describes the structure of an XML document
XML Schema Language is used for schema definition


XML Schema
defines elements that can appear in a document 
defines attributes that can appear in a document 
defines which elements are child elements 
defines the order of child elements 
defines the number of child elements 
defines whether an element is empty or can include text 
defines data types for elements and attributes 
defines default and fixed values for elements and attributes 


XML Schema – Successor of DTD
XML Schemas are extensible to future additions 
XML Schemas are richer and more useful than DTDs 
XML Schemas are written in XML 
XML Schemas support data types 
XML Schemas support namespaces (more about this in the forthcoming session)
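
Because a schema is itself an XML document, an ordinary XML parser can read it. The sketch below uses a hypothetical schema for a note element and Python's xml.etree.ElementTree to list the declared child elements, their data types, and whether each is optional. It only reads the schema; the standard library does not validate documents against it.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"   # the XML Schema namespace

# Hypothetical schema written for this illustration.
schema_xml = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="sent" type="xs:date" minOccurs="0"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_xml)
sequence = schema.find(f"{XS}element/{XS}complexType/{XS}sequence")

# xs:sequence fixes the order of children; minOccurs defaults to 1 when omitted.
declared = [(e.get("name"), e.get("type"), e.get("minOccurs", "1")) for e in sequence]
for name, data_type, min_occurs in declared:
    print(f"{name}: type={data_type}, minOccurs={min_occurs}")
```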





