Counting the number of words in a text
There is a task on this site: count the number of words.
Words are continuous sequences of English alphabetic characters (A to Z and a to z).
Here is an example:
Hello there, little user5453 374 ())$. I’d been using my sphere as a stool. Slow-moving target 839342 was hit by OMGd-63 or K4mp.
contains the following "words":
['Hello', 'there', 'little', 'user', 'I', 'd', 'been', 'using', 'my','sphere', 'as', 'a', 'stool', 'Slow', 'moving', 'target', 'was', 'hit', 'by', 'OMGd', 'or', 'K', 'mp']
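The word definition above maps directly onto a regular expression. A minimal sketch, assuming Python's standard `re` module: `re.findall` with the pattern `[A-Za-z]+` returns every maximal run of English letters, which should reproduce the example list. The name `tokenize` is hypothetical.

```python
import re

def tokenize(text):
    # Every maximal run of ASCII letters is one "word"; digits, punctuation
    # and non-ASCII characters such as the curly apostrophe act as separators.
    return re.findall(r"[A-Za-z]+", text)

sample = ("Hello there, little user5453 374 ())$. I’d been using my sphere "
          "as a stool. Slow-moving target 839342 was hit by OMGd-63 or K4mp.")
print(tokenize(sample))
```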
Some of the words must be excluded. Here they are:
"a", "the", "on", "at", "of", "upon", "in" and "as", case-insensitive.
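"Case-insensitive" means the lookup must ignore letter case. A minimal sketch, assuming the exclusion list is stored in lower case and each word is lower-cased before the lookup; `EXCLUDED` and `is_excluded` are hypothetical names.

```python
# Excluded words, stored in lower case; a frozenset gives constant-time lookups.
EXCLUDED = frozenset({"a", "the", "on", "at", "of", "upon", "in", "as"})

def is_excluded(word):
    # Normalize case before the lookup so "The", "THE" and "the" all match.
    return word.lower() in EXCLUDED

print(is_excluded("The"), is_excluded("As"), is_excluded("and"))  # True True False
```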
My solution is as follows:
Split the text into words, stripping everything else. In a loop, check whether each word is an excluded word or an empty string, and otherwise increment the counter.
```python
import codewars_test as test
import re

# Words that must not be counted
except_word = ["a", "the", "on", "at", "of", "upon", "in", "as"]

def word_count(s):
    # Replace every run of non-letters with a space, then split on single spaces
    all_word = re.sub(r'([^A-Za-z]+)', r' ', s).split(' ')
    print(all_word)  # debug output
    cnt = 0
    for word in all_word:
        # Skip excluded words and the empty strings left over by split(' ')
        if word in except_word or word == '':
            continue
        else:
            cnt += 1
    return cnt

if __name__ == '__main__':
    test.assert_equals(word_count("hello there"), 2)
    test.assert_equals(word_count("hello there and a hi"), 4)
    test.assert_equals(word_count("I'd like to say goodbye"), 6)
    test.assert_equals(word_count("Slow-moving user6463 has been here"), 6)
    test.assert_equals(word_count("%^&abc!@# wer45tre"), 3)
    test.assert_equals(word_count("abc123abc123abc"), 3)
    test.assert_equals(word_count("Really2374239847 long ^&#$&(*@# sequence"), 3)
    long_text = r"""
I’d been using my sphere as a stool. I traced counterclockwise circles on it with my fingertips and it shrank until I could palm it. My bolt had shifted while I’d been sitting. I pulled it up and yanked the pleats straight as I careered around tables, chairs, globes, and slow-moving fraas. I passed under a stone arch into the Scriptorium. The place smelled richly of ink. Maybe it was because an ancient fraa and his two fids were copying out books there. But I wondered how long it would take to stop smelling that way if no one ever used it at all; a lot of ink had been spent there, and the wet smell of it must be deep into everything.
"""
    test.assert_equals(word_count(long_text), 112)
```
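The `word == ''` check in this code is needed because replacing every non-letter run with a space and then calling `split(' ')` leaves an empty string wherever the text starts or ends with a non-letter. A small illustration using only the standard `re` module; calling `split()` with no argument discards those empty strings.

```python
import re

text = "%^&abc!@# wer45tre"
spaced = re.sub(r"[^A-Za-z]+", " ", text)  # " abc wer tre"
print(spaced.split(" "))  # ['', 'abc', 'wer', 'tre']: leading empty string
print(spaced.split())     # ['abc', 'wer', 'tre']: no-argument split drops empties
```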
The function passes all the basic tests but gives the wrong answer on the long text.
Here is the basic example:
Example Input 2
I’d been using my sphere as a stool. I traced counterclockwise circles on it with my fingertips and it shrank until I could palm it. My bolt had shifted while I’d been sitting. I pulled it up and yanked the pleats straight as I careered around tables, chairs, globes, and slow-moving fraas. I passed under a stone arch into the Scriptorium. The place smelled richly of ink. Maybe it was because an ancient fraa and his two fids were copying out books there. But I wondered how long it would take to stop smelling that way if no one ever used it at all; a lot of ink had been spent there, and the wet smell of it must be deep into everything.
Example Output 2
112
My function returns 113. Maybe I misunderstand the task (I read it through Google Translate)? On the server's random tests my solution always counts 1 to 3 extra words.
Could someone explain what the cause is? It feels as if I'm forgetting to remove something, but I can't tell what.
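One way to find what is being missed is to list the tokens that the case-sensitive check keeps but a case-insensitive check would drop. A minimal diagnostic sketch, assuming the same letter-run tokenization as the task; `missed_exclusions` is a hypothetical helper name.

```python
import re

EXCEPT_WORD = ["a", "the", "on", "at", "of", "upon", "in", "as"]

def missed_exclusions(text):
    # Words that the case-sensitive `in` check keeps but that match the
    # exclusion list once case is ignored.
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if w not in EXCEPT_WORD and w.lower() in EXCEPT_WORD]

print(missed_exclusions("I passed under a stone arch into the Scriptorium. "
                        "The place smelled richly of ink."))  # ['The']
```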
Answers (1):
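The task says case-insensitive.
But a word can also start with a capital letter, and for the check `word in except_word` the strings "the" and "The" are different. In the long text, the "The" in "The place smelled richly of ink." is therefore not excluded, which gives 113 instead of 112.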
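Following the answer, the smallest change to the question's code is to test `word.lower() in except_word` instead of `word in except_word`. Below is a slightly tidier sketch of the whole function under the same rules. It swaps the `re.sub`/`split` step for `re.findall` (a substitution, not the asker's original method), so no empty-string check is needed; `EXCEPT_WORDS` is a hypothetical name.

```python
import re

# Stored in lower case; membership is tested on the lower-cased word.
EXCEPT_WORDS = frozenset({"a", "the", "on", "at", "of", "upon", "in", "as"})

def word_count(s):
    # re.findall returns only non-empty runs of letters, so no '' check is needed.
    words = re.findall(r"[A-Za-z]+", s)
    # Compare case-insensitively, as the task requires.
    return sum(1 for word in words if word.lower() not in EXCEPT_WORDS)

print(word_count("The place smelled richly of ink."))  # 4
```

With the original case-sensitive check the same sentence would count 5, because "The" slips through.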