I am trying to create an XML file using VB.NET for a system that accepts XML for publication on its website. Even though special characters like © and ® are acceptable in the XML, they really should be encoded as &copy; and &reg;.
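It helps to know what XML itself accepts. XML predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;). &copy; and &reg; are HTML names, so an XML parser rejects them unless a DTD declares them. The portable XML spelling is a numeric character reference such as &#169; or &#174;. The sketch below just feeds both forms to XmlDocument.LoadXml. The module and function names are hypothetical.

```vbnet
Imports System
Imports System.Xml

Module EntityCheck
    ' Returns True if the string parses as a well-formed XML document.
    Function IsWellFormed(markup As String) As Boolean
        Try
            Dim doc As New XmlDocument()
            doc.LoadXml(markup)
            Return True
        Catch ex As XmlException
            ' Raised for undeclared entities such as &copy;.
            Return False
        End Try
    End Function

    Sub Main()
        ' Numeric character reference: valid in any XML document.
        Console.WriteLine(IsWellFormed("<data1>Company Name 1 &#169;</data1>"))  ' True
        ' HTML named entity: not declared in plain XML, so parsing fails.
        Console.WriteLine(IsWellFormed("<data1>Company Name 1 &copy;</data1>"))  ' False
    End Sub
End Module
```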
The problem I am having is that when I insert &copy; into the XML, it gets double-encoded as &amp;copy;, and if I insert ©, then © is inserted unencoded.
I created the bare-bones example below, which replicates the problem and has the same XML structure as what I need. I'm using HtmlAgilityPack to convert the entities.
Imports System.Xml
Imports HtmlAgilityPack
Private Sub webXML()
Dim oXml As New XmlDocument
oXml.LoadXml("<webTable xmlns=""http://ift.tt/1Jk9NWv"" xmlns:n1=""http://ift.tt/1IP9A1K"" xmlns:xsi=""http://ift.tt/1fQ42a7"" xsi:schemaLocation=""http://ift.tt/1IP9A1L""></webTable>")
''Add Namespace
Dim NS As New Xml.XmlNamespaceManager(oXml.NameTable)
NS.AddNamespace("ns", "http://ift.tt/1Jk9NWv")
NS.AddNamespace("n1", "http://ift.tt/1IP9A1K")
NS.AddNamespace("xsi", "http://ift.tt/1fQ42a7")
''Create XML declaration
Dim xmldecl As XmlDeclaration
xmldecl = oXml.CreateXmlDeclaration("1.0", "UTF-8", Nothing)
xmldecl.Encoding = "UTF-8"
''Add node to document
Dim root As XmlElement = oXml.DocumentElement
oXml.InsertBefore(xmldecl, root)
''info
Dim info As XmlNode = oXml.CreateNode("element", "info", "http://ift.tt/1Jk9NWv")
''data1
Dim data1 As XmlNode = oXml.CreateNode("element", "data1", "http://ift.tt/1Jk9NWv")
Dim data1Value As String = HtmlEntity.Entitize(Trim("Company Name 1 ©"), True)
Dim data1Text As XmlText = oXml.CreateTextNode(data1Value)
data1.AppendChild(data1Text)
info.AppendChild(data1)
Console.WriteLine("Data1 value: " + data1Value)
Console.WriteLine("Data1 text node value: " + data1Text.Value)
Console.WriteLine("Data1 node text value: " + data1.InnerText)
Console.WriteLine("Data1 node XML value: " + data1.InnerXml)
''data2
Dim data2 As XmlNode = oXml.CreateNode("element", "data2", "http://ift.tt/1Jk9NWv")
Dim data2Value As String = Trim(HtmlEntity.Entitize("Company Name 2 ®", False))
data2.InnerText = data2Value
info.AppendChild(data2)
Console.WriteLine("Data2 value: " + data2Value)
Console.WriteLine("Data2 node text value: " + data2.InnerText)
Console.WriteLine("Data2 node XML value: " + data2.InnerXml)
''data3
Dim data3 As XmlNode = oXml.CreateNode("element", "data3", "http://ift.tt/1Jk9NWv")
Dim data3value As String = Trim(HtmlEntity.Entitize("Company Name 3 ®", False))
data3.InnerXml = data3value
info.AppendChild(data3)
Console.WriteLine("Data3 value: " + data3value)
Console.WriteLine("Data3 node text value: " + data3.InnerText)
Console.WriteLine("Data3 node XML value: " + data3.InnerXml)
''Add info to Root
root.AppendChild(info)
oXml.Save(Console.Out)
oXml.Save("C:\Users\Chris\Dropbox\SECUREX\Junk\textXML.xml")
End Sub
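One possible direction, offered as a sketch rather than a confirmed fix: give the DOM the raw © and ® characters (no HtmlEntity.Entitize call) and let the serializer do the escaping. An XmlWriter whose Encoding is ASCII cannot emit © as a byte, so it should write a numeric character reference in its place. BuildSample and SaveAsAscii are hypothetical helper names. Note that the written declaration will then name the writer's encoding rather than UTF-8, although pure ASCII output is also valid UTF-8.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module AsciiSaveDemo
    ' Hypothetical helper: builds a tiny document holding raw (not entitized) text.
    Function BuildSample() As XmlDocument
        Dim doc As New XmlDocument()
        doc.LoadXml("<info><data1/><data2/></info>")
        ' ChrW(169) is the copyright sign, ChrW(174) the registered sign.
        doc.DocumentElement.ChildNodes(0).InnerText = "Company Name 1 " & ChrW(169)
        doc.DocumentElement.ChildNodes(1).InnerText = "Company Name 2 " & ChrW(174)
        Return doc
    End Function

    ' Serializes through an XmlWriter whose encoding is ASCII. Characters that
    ' ASCII cannot represent are emitted as numeric character references.
    Function SaveAsAscii(doc As XmlDocument) As String
        Dim settings As New XmlWriterSettings()
        settings.Encoding = Encoding.ASCII
        settings.Indent = True
        Using ms As New MemoryStream()
            Using writer As XmlWriter = XmlWriter.Create(ms, settings)
                doc.Save(writer)
            End Using
            Return Encoding.ASCII.GetString(ms.ToArray())
        End Using
    End Function

    Sub Main()
        Console.WriteLine(SaveAsAscii(BuildSample()))
    End Sub
End Module
```

To write a file instead of a string, XmlWriter.Create(path, settings) can take the place of the MemoryStream.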
The output I get is:
Data1 value: Company Name 1 &copy;
Data1 text node value: Company Name 1 &copy;
Data1 node text value: Company Name 1 &copy;
Data1 node XML value: Company Name 1 &amp;copy;
Data2 value: Company Name 2 &#174;
Data2 node text value: Company Name 2 &#174;
Data2 node XML value: Company Name 2 &amp;#174;
Data3 value: Company Name 3 &#174;
Data3 node text value: Company Name 3 ®
Data3 node XML value: Company Name 3 ®
<?xml version="1.0" encoding="Windows-1252"?>
<webTable xmlns="http://ift.tt/1Jk9NWv" xmlns:n1="http://ift.tt/1IP9A1K" xmlns:xsi="http://ift.tt/1fQ42a7" xsi:schemaLocation="http://ift.tt/1IP9A1L">
<info>
<data1>Company Name 1 &amp;copy;</data1>
<data2>Company Name 2 &amp;#174;</data2>
<data3>Company Name 3 ®</data3>
</info>
</webTable>
As you can see, I have tried several different ways, and I still can't get the entities encoded correctly.
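This output is consistent with how the DOM treats the two setters. A text node (CreateTextNode or InnerText) stores literal characters, so the & in an already-entitized string is escaped again when the node is serialized. InnerXml, by contrast, parses its input as markup and decodes &#174; back to ®. A minimal illustration, with hypothetical helper names:

```vbnet
Imports System
Imports System.Xml

Module EscapingDemo
    ' Text node route: the string is stored literally, so any "&" it already
    ' contains is escaped again when the node is serialized.
    Function TextNodeXml(text As String) As String
        Dim doc As New XmlDocument()
        Dim el As XmlElement = doc.CreateElement("data1")
        el.AppendChild(doc.CreateTextNode(text))
        Return el.InnerXml
    End Function

    ' InnerXml route: the string is parsed as markup, so character
    ' references are decoded into the characters they name.
    Function ParsedText(markup As String) As String
        Dim doc As New XmlDocument()
        Dim el As XmlElement = doc.CreateElement("data3")
        el.InnerXml = markup
        Return el.InnerText
    End Function

    Sub Main()
        Console.WriteLine(TextNodeXml("&copy;"))             ' &amp;copy;
        Console.WriteLine(ParsedText("&#174;") = ChrW(174))  ' True
    End Sub
End Module
```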
Please note that oXml.Save(Console.Out) shows Windows-1252 encoding, but my output file is otherwise identical and correctly shows UTF-8.
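On the Windows-1252 point: XmlDocument.Save(TextWriter) writes the TextWriter's own Encoding into the declaration, replacing the one stored in the XmlDeclaration node. Console.Out uses the console code page, which would explain the difference from the file save. A commonly used workaround when saving to a string is a StringWriter subclass that reports UTF-8. Utf8StringWriter and SaveToString are illustrative names, not library types.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

' StringWriter reports UTF-16 by default. Overriding Encoding makes
' XmlDocument.Save(TextWriter) write a UTF-8 declaration instead.
Public Class Utf8StringWriter
    Inherits StringWriter

    Public Overrides ReadOnly Property Encoding As System.Text.Encoding
        Get
            ' UTF-8 without a byte order mark.
            Return New System.Text.UTF8Encoding(False)
        End Get
    End Property
End Class

Module DeclarationDemo
    ' Hypothetical helper: serializes a document to a string whose
    ' declaration names UTF-8 rather than the console code page.
    Function SaveToString(doc As XmlDocument) As String
        Using sw As New Utf8StringWriter()
            doc.Save(sw)
            Return sw.ToString()
        End Using
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<?xml version=""1.0"" encoding=""UTF-8""?><webTable />")
        Console.WriteLine(SaveToString(doc))
    End Sub
End Module
```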
I'm using VS 2012 Express.
Any idea what I can do to encode the HTML entities properly?
Thanks in advance.