Excel, при открытии больших html-таблиц тег colspan игнорируется
Есть большие таблицы (~300 000 строк), описанные как HTML-документ. Начало документа интерпретируется правильно, но в какой-то момент столбцы начинают двигаться, то есть тег colspan просто игнорируется.
Таблицы описанные примерно так:
<html>
<table border="2" rules="all">
<col width="50">
<col width="50">
<col width="50">
<col width="50">
<col width="50">
<col width="50">
<thead>
<tr>
<th>col 1</th>
<th>col 2</th>
<th colspan="3" >col 3</th>
<th>col 4</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>2</td>
<td colspan="3">3</td>
<td>4</td>
</tr>
<tr>
<td>5</td>
<td>6</td>
<td colspan="3">7</td>
<td>8</td>
</tr>
<tr>
<td>9</td>
<td>10</td>
<td colspan="3">11</td>
<td>12</td>
</tr>
<tr>
<td>13</td>
<td>14</td>
<td colspan="3">15</td>
<td>16</td>
</tr>
<tr>
<td>17</td>
<td>18</td>
<td colspan="3">19</td>
<td>20</td>
</tr></tbody>
</table>
</html>
Но получается как то так...
_____________________________________________ | col 1 | col 2 | col 3 | col 4 | --------------------------------------------- | 1 | 2 | 3 | | | 1 | 2 | 3 | 4 | | 1 | 2 | 3 | 4 | ... ~20 000 - 250 000 | 1 | 2 | 3 | | | 4 | | 1 | 2 | 3 | | | 4 | ---------------------------------------------
А должно быть так
_____________________________________________ | col 1 | col 2 | col 3 | col 4 | --------------------------------------------- | 1 | 2 | 3 | | | 1 | 2 | 3 | 4 | | 1 | 2 | 3 | 4 | ... ~20 000 - 250 000 | 1 | 2 | 3 | 4 | | 1 | 2 | 3 | 4 | ---------------------------------------------
Я конвертирую HTML в Excel (xlsx\xlsb) с помощью VBS, но тот же эффект будет, если вы просто откроете документ в Excel и позволите ему распознаться. Использую Excel 2019.
Для примера добавлю скрипт на VBS, стабильно ломается на 32766 строке...
dim objFSO, objFile, CNT
CNT = 100000
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.CreateTextFile("c:\raw.html", True)
objFile.WriteLine("<html><table border=2 rules='all'><col width=50><col width=50><col width=50><col width=50><col width=50><col width=50><thead><tr><th>col 1</th><th>col 2</th><th colspan=3 >col 3</th><th>col 4</th></tr></thead><tbody>")
For i = 1 To CNT
objFile.WriteLine("<tr><td>1</td><td>2</td><td colspan=3>3</td><td>4</td></tr>")
Next
objFile.WriteLine("</tbody></table></html>")
objFile.Close
WScript.StdOut.Write "Raw file written: " & CNT & " lines. Star conversion..." & vbCrLf
dim app, wbk
dim indx
Set app = CreateObject("Excel.Application")
Set wbk = app.Workbooks.Open("c:\raw.html")
app.DisplayAlerts = False
wbk.SaveAs "c:\conv.xlsx", 51
WScript.StdOut.Write "The file has been converted. Star checks..." & vbCrLf
set sht1 = wbk.Sheets(1)
indx = 1
For Each cell In sht1.Range("C:C")
If not cell.MergeCells Then
WScript.StdOut.Write "Error in line " & indx & vbCrLf
Exit For
End If
indx = indx + 1
if indx > CNT Then
WScript.StdOut.Write "Everything is fine" & vbCrLf
Exit For
End If
Next
wbk.Close False
app.Quit
set sht1 = Nothing
set wbk = Nothing
set app = Nothing
Нашел аналогичный вопрос у microsoft https://answers.microsoft.com/ru-ru/msoffice/forum/all/%D0%BE%D1%88%D0%B8%D0%B1%D0%BA%D0%B0-%D0%B2-excel/7ef72a2b-7416-4bf7-ac91-4e86f10510d2
https://stackoverflow.com/q/77461415/22894332
Ответы (1 шт):
Обходное решение - встроенные таблицы пачками по 32000 записей
<html>
<table border="2" rules="all">
<col width="50">
<col width="50">
<col width="50">
<col width="50">
<col width="50">
<col width="50">
<thead>
<tr>
<th>col 1</th>
<th>col 2</th>
<th colspan="3" >col 3</th>
<th>col 4</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>2</td>
<td colspan="3">3</td>
<td>4</td>
</tr>
<tr>
<td>5</td>
<td>6</td>
<td colspan="3">7</td>
<td>8</td>
</tr>
<tr>
<td>9</td>
<td>10</td>
<td colspan="3">11</td>
<td>12</td>
</tr>
<tr>
<td>13</td>
<td>14</td>
<td colspan="3">15</td>
<td>16</td>
</tr>
<tr>
<td>17</td>
<td>18</td>
<td colspan="3">19</td>
<td>20</td>
</tr>
</tbody>
<table border="2" rules="all">
<tbody>
<tr>
<td>1</td>
<td>2</td>
<td colspan="3">3</td>
<td>4</td>
</tr>
<tr>
<td>5</td>
<td>6</td>
<td colspan="3">7</td>
<td>8</td>
</tr>
<tr>
<td>9</td>
<td>10</td>
<td colspan="3">11</td>
<td>12</td>
</tr>
<tr>
<td>13</td>
<td>14</td>
<td colspan="3">15</td>
<td>16</td>
</tr>
<tr>
<td>17</td>
<td>18</td>
<td colspan="3">19</td>
<td>20</td>
</tr>
</tbody>
</table>
</table>
</html>