Converting column values to a list in Pandas

Question

I have a pandas DataFrame with values in the 'industry_type' column.

I want to convert them so that they are split into:

['Banking', 'E-Commerce', 'Internet', 'Mobile', 'Telecommunications']

for each 'organizationID' respectively, but I don't know how to do it.

To start, I removed the commas using:

cols = ['industry_type']
for col in cols: 
    df[col] = df[col].str.replace(',', ' ')

This gave me the following:

To convert the 'industry_type' column to a list, I used:

list_indus = df['industry_type'].tolist()

But as a result I get one string per row:

['Banking E-Commerce Internet Mobile Telecommunications', 'Advertising Internet Online Games Online Portals Social Media Marketing',...

I'm just starting out with programming and pandas. Please advise what I need to do; maybe some kind of loop...

Answer 1

It's not entirely clear what you want, so here are a couple of guesses. Suppose the column already holds lists:

import pandas as pd

# The trailing ... stands for the remaining rows, elided here
dict_of = {"org_id": [1, 2, 3],
           "org_name": ['bank1', 'bank2', 'bank3'],
           'in_type': [['Banking', 'E-Commerce', 'Internet', 'Mobile', 'Telecommunications'], ...]}
df = pd.DataFrame(dict_of)

>>> df
   org_id org_name                                            in_type
0       1    bank1  [Banking, E-Commerce, Internet, Mobile, Teleco...
1       2    bank2  [Banking, E-Commerce, Internet, Mobile, Teleco...
2       3    bank3  [Banking, E-Commerce, Internet, Mobile, Teleco...

Option 1

>>> print(df.explode('in_type'))

   org_id org_name             in_type
0       1    bank1             Banking
0       1    bank1          E-Commerce
0       1    bank1            Internet
0       1    bank1              Mobile
0       1    bank1  Telecommunications
1       2    bank2             Banking
1       2    bank2          E-Commerce
1       2    bank2            Internet
1       2    bank2              Mobile
1       2    bank2  Telecommunications
2       3    bank3             Banking
2       3    bank3          E-Commerce
2       3    bank3            Internet
2       3    bank3              Mobile
2       3    bank3  Telecommunications
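
A self-contained sketch of Option 1, in case it helps to run it directly. The two-row sample data is made up for illustration (shorter lists than in the question), and `DataFrame.explode` needs pandas 0.25 or newer.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real table
df = pd.DataFrame({
    "org_id": [1, 2],
    "org_name": ["bank1", "bank2"],
    "in_type": [["Banking", "Internet"], ["Mobile"]],
})

# explode() emits one row per list element and repeats the original index label
exploded = df.explode("in_type")

# Optional: replace the repeated labels with a fresh 0..n-1 index
exploded = exploded.reset_index(drop=True)
print(exploded)
```

Whether to reset the index depends on what comes next: the repeated labels are handy if you later want to group the rows back by their original position.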

Option 2

>>> print(df.merge(df.in_type.apply(pd.Series), right_index=True, left_index=True))


   org_id org_name        0           1         2       3                   4
0       1    bank1  Banking  E-Commerce  Internet  Mobile  Telecommunications
1       2    bank2  Banking  E-Commerce  Internet  Mobile  Telecommunications
2       3    bank3  Banking  E-Commerce  Internet  Mobile  Telecommunications
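
If the wide layout of Option 2 is what you need, one possible alternative to `apply(pd.Series)` is to build the extra columns straight from the lists. This is only a sketch: the sample data and the `type_` column prefix are my own assumptions, chosen so the new columns get readable names instead of 0, 1, 2.

```python
import pandas as pd

# Hypothetical sample data with lists of different lengths
df = pd.DataFrame({
    "org_id": [1, 2],
    "in_type": [["Banking", "Internet"], ["Mobile"]],
})

# One column per list position; shorter lists are padded with missing values
wide = pd.DataFrame(df["in_type"].tolist(), index=df.index)

# Prefix the positional column names and attach them to the remaining columns
result = df.drop(columns="in_type").join(wide.add_prefix("type_"))
print(result)
```

Passing `index=df.index` keeps the rows aligned even if the original frame does not have a default 0..n-1 index.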

Or maybe you want something else entirely :)

UPD: if the column holds comma-separated strings, split them into lists first:

df['in_type'] = df['in_type'].apply(lambda x: x.split(', '))

>>> df
   org_id org_name                                            in_type
0       1    bank1  [Banking, E-Commerce, Internet, Mobile, Teleco...
1       2    bank2  [Banking, E-Commerce, Internet, Mobile, Teleco...
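
Putting it together for the original question, here is a rough sketch that goes from the raw comma-separated strings to one list per organization. The column names `organizationID` and `industry_type` come from the question; the ID values and the second row are hypothetical. Note that it splits on the commas instead of replacing them with spaces first: once the commas are gone, a value like "Online Games" can no longer be told apart from two separate values.

```python
import pandas as pd

# Hypothetical IDs and second row; the first string follows the question
df = pd.DataFrame({
    "organizationID": [101, 102],
    "industry_type": [
        "Banking, E-Commerce, Internet, Mobile, Telecommunications",
        "Advertising, Internet, Online Games",
    ],
})

# Split each string on commas, then trim whitespace around every item
df["industry_type"] = (
    df["industry_type"]
    .str.split(",")
    .apply(lambda parts: [p.strip() for p in parts])
)

# Map each organization ID to its list of industries
by_org = df.set_index("organizationID")["industry_type"].to_dict()
print(by_org[101])
```

No explicit loop over rows is needed; `str.split` already works on the whole column at once.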

Answer 2

Suppose we have this DataFrame:

   moo  org industry_type
0   11    1       a, b, c
1   22    2           foo
2   33    3           bar

res = df["industry_type"].str.split(",").explode().str.strip().to_frame().combine_first(df)

res:

  industry_type  moo  org
0             a   11    1
0             b   11    1
0             c   11    1
1           foo   22    2
2           bar   33    3
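
For comparison, the same result can probably be reached with `assign` plus `DataFrame.explode`, which some may find easier to read step by step. This is a sketch of an alternative, not a replacement for the one-liner above; it rebuilds the same three-row sample frame and keeps the columns in their original order.

```python
import pandas as pd

# Same sample frame as in the answer above
df = pd.DataFrame({
    "moo": [11, 22, 33],
    "org": [1, 2, 3],
    "industry_type": ["a, b, c", "foo", "bar"],
})

res = (
    df
    # Turn each comma-separated string into a list
    .assign(industry_type=df["industry_type"].str.split(","))
    # One row per list element; the other columns are repeated
    .explode("industry_type")
    # Trim the spaces left over after splitting on ","
    .assign(industry_type=lambda d: d["industry_type"].str.strip())
)
print(res)
```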
